Commit graph

23 commits

Author SHA1 Message Date
Simon Kornblith
470f7c463f The Voyager scraper now actually works on the search results page. 2006-06-22 20:50:57 +00:00
Simon Kornblith
3890e5f122 - Made ingester automatically create hidden browser objects, given a window object. This should make things much easier for both David and me.
- Multiple-item detection code is now part of the scraperJavaScript rather than the scrapeDetectCode, and code to choose which items to add is part of Scholar.Ingester.Utilities, accessible from inside scrapers. The alternative approach would result in one request (or, in the case of JSTOR, three requests) per new item, whereas in some cases (e.g. Voyager) a single request is enough to get all of the items.
2006-06-22 15:50:46 +00:00
Simon Kornblith
1b74d0b04a Doh! Forgot to update scraper timestamp. 2006-06-22 02:46:30 +00:00
Simon Kornblith
ca3a0e6e5d Beginnings of search result scraping (does not yet actually do the scraping, but does present the menu) 2006-06-22 02:43:40 +00:00
Simon Kornblith
6d1e447154 - Remove load eventListener after it has been called once
- Capture editors from Google Books
2006-06-21 15:18:18 +00:00
Simon Kornblith
f753c1cc2f Add Google Books scraper 2006-06-21 14:28:51 +00:00
Simon Kornblith
7b08c94437 Remember to update modified dates on changed scrapers. 2006-06-21 13:55:55 +00:00
Simon Kornblith
7d3deb5b9f - Make Scholar.Ingester.Utilities.loadDocument() attach an event handler to load rather than DOMContentLoaded to resolve an issue with the Ex Libris/Aleph scraper (VCU)
- When possible, corporate creators/contributors are categorized with their own RDF types (prefixDummy + "corporateCreator" or prefixDummy + "corporateContributor")
- Remove extraneous debug code in extensions
2006-06-21 01:41:07 +00:00
Simon Kornblith
09d79d6dd7 Fix overly optimistic JSTOR scraper 2006-06-20 17:06:41 +00:00
Simon Kornblith
968348a5d1 Add a scraper for Dublin Core metadata embedded in HTML/XHTML META tags 2006-06-20 16:08:13 +00:00
Simon Kornblith
4c34c592da - Better handling of InnoPAC records not returned by searches 2006-06-18 21:00:43 +00:00
Simon Kornblith
20369f41b3 - Move commonly used scraper functions to ingester.js, rather than re-defining them in each scraper. This breaks Piggy Bank compatibility in our scrapers, but we will still be able to export our scrapers in a Piggy Bank compatible form.
- Better handling of scraper RDF to item mapping.
- Improved date handling. All scrapers now return ISO-style dates when possible.
2006-06-18 19:04:32 +00:00
Simon Kornblith
3d881eec13 - Make scrapers return standard ISO-style YYYY-MM-DD dates. Still need to work on journal article scrapers.
- Ingester lets callback function save items, rather than saving them itself.
- Better handling of multiple items in API, although no scrapers currently implement this.
2006-06-17 21:21:15 +00:00
Dan Stillman
70216ea2c7 - Added automatic scraper update mechanism (more details on Basecamp: http://chnm.grouphub.com/C2687015)
- Removed localLastUpdated field from scrapers table and renamed centralLastUpdated to lastUpdated; updated scraper queries accordingly
- Added query in scrapers.sql to update version table 'repository' row to prevent immediate downloads of newly installed scrapers
- Get version property from extension manager in Scholar.init() and assign to Scholar.version
2006-06-15 06:13:02 +00:00
Dan Stillman
d42258b168 Changed schema of scrapers table to use single GUID for scraperID
Assigned GUIDs to scrapers, replaced INSERT queries with REPLACE queries, and removed the table DELETE query at the top -- this will allow scrapers to be updated without deleting any others that may exist (e.g. ones someone is developing, third-party scrapers, etc.)
2006-06-12 15:43:24 +00:00
Simon Kornblith
076ee0fad2 Add PubMed scraper, fix a few other small bugs 2006-06-08 01:26:40 +00:00
Simon Kornblith
f437917016 Add Project MUSE scraper 2006-06-07 21:26:55 +00:00
Simon Kornblith
cef0b19770 Add TLC/YouSeeMore scraper 2006-06-07 18:44:27 +00:00
Simon Kornblith
1e48189c3b Add SIRSI (old) scraper 2006-06-07 17:44:55 +00:00
Simon Kornblith
07dad8fae9 Add DRA, GEAC scrapers 2006-06-07 16:48:03 +00:00
Dan Stillman
393807b152 This isn't quite done (I'm discussing changing the scrapers schema with Simon to better handle scraper updates), but in the interest of getting the scrapers in for testing, I'll commit this now.
Integrated the scrapers with the schema update mechanism. Changed a bunch of schema methods to handle both schema.sql and scrapers.sql (or others, if need be) and altered the version table to track multiple versions for different files. This should theoretically detect that the version table has changed and force a reinitialization of the DB--let me know if there are problems.
2006-06-07 15:27:21 +00:00
Simon Kornblith
0753d78910 - Add VLTS scraper
- Fix loadDocument/processDocuments (broken by r145)
2006-06-06 21:35:23 +00:00
Simon Kornblith
152c9bf9e7 - Small changes to MARC record support
- Implemented loadDocument API, for loading and parsing the DOMs of HTML documents in the background
- Added scraper code to SVN repository (now includes 12 scrapers; see Writeboard for details)

To update to the latest versions of all scrapers, ensure you have an up-to-date version of sqlite3, then run:
sqlite3 ~/Library/Application\ Support/Firefox/Profiles/profileName/scholar.sqlite < scrapers.sql
2006-06-06 18:25:45 +00:00