Commit graph

17 commits

Author SHA1 Message Date
Simon Kornblith
098078627c - Make events listening for DOMContentLoaded listen for load, because DOMContentLoaded does not seem ready for prime time (hey, it's undocumented, what can you expect)
- Make Amazon scraper work with multiple documents
- Fix bugs in processDocuments
- Make Scholar.Ingester.Utilities.getItemArray() willing to take an array of DOM nodes to search for links, and finally take advantage of the fact that objects have no length
2006-06-23 03:02:30 +00:00
Simon Kornblith
470f7c463f The Voyager scraper now actually works on the search results page. 2006-06-22 20:50:57 +00:00
Simon Kornblith
3890e5f122 - Made ingester automatically create hidden browser objects, given a window object. This should make things much easier for both David and me.
- Multiple item detection code is now a part of the scraperJavaScript, rather than the scrapeDetectCode, and code to choose which items to add is part of Scholar.Ingester.Utilities, accessible from inside scrapers. The alternative approach would result in one request (or, in the case of JSTOR, three requests) per new item, while in some cases (e.g. Voyager) only one request is necessary to get all of the items.
2006-06-22 15:50:46 +00:00
Simon Kornblith
ca3a0e6e5d Beginnings of search result scraping (does not yet actually do the scraping, but does present the menu) 2006-06-22 02:43:40 +00:00
David Norton
428eab6a95 A cog menu each for collections and items (the same as the contextual menu, for now)
Moved the capture icon into the URL bar (invisible until you visit a scrapable page. Currently just displays a Book, but will change to the correct item types in the future?)
2006-06-22 00:13:21 +00:00
Simon Kornblith
9a9621f39d Make net appear even before first page has loaded 2006-06-21 18:19:49 +00:00
Simon Kornblith
09d79d6dd7 Fix overly optimistic JSTOR scraper 2006-06-20 17:06:41 +00:00
Simon Kornblith
5af10b1061 - Fix small bug in ingester interface 2006-06-20 14:16:15 +00:00
Simon Kornblith
c983a8e7e4 - Re-named Scholar.Ingester.Interface to Scholar_Ingester_Interface (since Scholar object is defined in XPCOM and thus global) 2006-06-20 00:52:15 +00:00
Simon Kornblith
3d881eec13 - Make scrapers return standard ISO-style YYYY-MM-DD dates. Still need to work on journal article scrapers.
- Ingester lets callback function save items, rather than saving them itself.
- Better handling of multiple items in API, although no scrapers currently implement this.
2006-06-17 21:21:15 +00:00
Simon Kornblith
0753d78910 - Add VLTS scraper
- Fix loadDocument/processDocuments (broken by r145)
2006-06-06 21:35:23 +00:00
Simon Kornblith
152c9bf9e7 - Small changes to MARC record support
- Implemented loadDocument API, for loading and parsing the DOMs of HTML documents in the background
- Added scraper code to SVN repository (now includes 12 scrapers, see Writeboard for details)

To update to the latest versions of all scrapers, ensure you have an up-to-date version of sqlite3, then run:
sqlite3 ~/Library/Application\ Support/Firefox/Profiles/profileName/scholar.sqlite < scrapers.sql
2006-06-06 18:25:45 +00:00
Simon Kornblith
85d8153024 Add library, hooks for scraping MARC records. 2006-06-03 22:26:01 +00:00
Simon Kornblith
93652a137c Fix issues with asynchronous scraping and XMLHttpRequest 2006-06-02 23:53:42 +00:00
Simon Kornblith
bb57e6ba7d Provide visual feedback for scraping 2006-06-02 18:22:34 +00:00
Simon Kornblith
639a006efb XPCOM-ize ingester, fix swapped first and last name in ingested info, stop ingesting pages field (this should be for pages of the source used, not the total number of pages, right?) 2006-06-02 03:19:12 +00:00
Simon Kornblith
551582eb7e Still getting the hang of Subversion...the rest of the ingester code 2006-06-01 06:53:39 +00:00