zotero

Author	SHA1	Message	Date
Simon Kornblith	19504e6746	- closes #73, use chrome for "Scraping Progress..." indicator - multiple and book icons were swapped for Voyager scraper	2006-06-27 02:03:10 +00:00
Simon Kornblith	f1cc809f76	Add a generic scraper that will scrape any website, although it may not always find very much information. It looks at META tags, both Dublin Core and otherwise. When tags are ready, we can pull out META keywords.	2006-06-26 20:44:45 +00:00
Simon Kornblith	4242c62b1b	- Fix redundancy in utilities.js (I accidentally copied and pasted a much larger block of code than I meant to) - Move processDocuments, a function for loading a DOM representation of a document or set of documents, to Scholar.Utilities.HTTP - Add Scholar.Ingester.ingestURL, a simplified function to scrape a URL (closes #33)	2006-06-26 20:02:30 +00:00
Simon Kornblith	4535b220db	Closes #84, make type icon in toolbar match item about to be scraped. It's not perfect, since to get everything right, we'd need to scrape the page as soon as it appears, but it provides a pretty good indication. Multiple items get the folder icon. If there's a better icon out there, it's pretty straightforward to implement.	2006-06-26 18:05:23 +00:00
Simon Kornblith	a33b119dff	grab ISBN from SIRSI 2003+ catalogs	2006-06-26 01:17:29 +00:00
Simon Kornblith	303c6ee68d	closes #41, get library call number	2006-06-26 01:08:59 +00:00
Simon Kornblith	d73127b1b3	update modification times	2006-06-25 22:01:04 +00:00
Simon Kornblith	f6b0d9a541	search results scraping for InfoTrac. closes #15	2006-06-25 22:00:20 +00:00
Simon Kornblith	1ec834cef2	Search results scraping for Project MUSE	2006-06-25 21:12:14 +00:00
Simon Kornblith	6a627fad0a	Search results scraping for LexisNexis	2006-06-25 20:09:27 +00:00
Simon Kornblith	a48ea7dabf	Search results scraping for ProQuest	2006-06-25 19:32:49 +00:00
Simon Kornblith	7402577806	Add search results scraping for History Cooperative	2006-06-25 18:34:23 +00:00
Simon Kornblith	a9c79f6110	Search results scraping for JSTOR	2006-06-25 18:17:00 +00:00
Simon Kornblith	5e73dcdd2e	- Search results scraping for WorldCat. - Make scraperJavaScript run on reload again, because it makes debugging easier - There's not actually a memory leak in the proxyMonitor code.	2006-06-25 16:13:47 +00:00
Simon Kornblith	9e78d62b13	Better handling of itemTypes, and improved date handling in PubMed scraper.	2006-06-25 05:03:01 +00:00
Simon Kornblith	fd2052e63c	Search results scraping for PubMed and Google Books. This marks the end of what I can do with respect to #15 until I'm at home or CHNM, where I'll have access to the gated collections.	2006-06-24 17:33:35 +00:00
Simon Kornblith	260ce80086	- Search results scraping for TLC. This is the last of the library scrapers. - Minor fixes to ingester utilities.	2006-06-24 15:38:53 +00:00
Simon Kornblith	06cf9e7853	Search results scraping for SIRSI (old versions)	2006-06-24 14:35:05 +00:00
Simon Kornblith	6f19b215f5	Search result scraping for GEAC catalogs	2006-06-23 21:27:32 +00:00
Simon Kornblith	2b58ead7aa	Search results scraping for Dynix	2006-06-23 20:53:29 +00:00
Simon Kornblith	2a74e88416	- Make generalized function for finding search results case insensitive - Scrape DRA search results	2006-06-23 20:09:48 +00:00
Simon Kornblith	8fe72b3e3c	Search results scraping for VTLS	2006-06-23 19:22:24 +00:00
Simon Kornblith	641d7054cc	- Fixed some bugs in the InnoPAC scraper (search results) - Made an Aleph search results scraper that works correctly on most sites, and degrades nicely when it doesn't	2006-06-23 17:35:57 +00:00
Simon Kornblith	83c36f330d	Scrapable search results for SIRSI 2003+ scraper	2006-06-23 16:17:53 +00:00
Simon Kornblith	9742283389	InnoPAC scraper now handles search results pages	2006-06-23 14:12:34 +00:00
Simon Kornblith	098078627c	- Make events listening for DOMContentLoaded listen for load, because DOMContentLoaded does not seem ready for prime time (hey, it's undocumented, what can you expect) - Make Amazon scraper work with multiple documents - Fix bugs in processDocuments - Make Scholar.Ingester.Utilities.getItemArray() willing to take an array of DOM nodes to search for links, and finally take advantage of the fact that objects have no length	2006-06-23 03:02:30 +00:00
Simon Kornblith	b4d65420f3	...but I forgot to update the timestamp	2006-06-22 20:51:40 +00:00
Simon Kornblith	470f7c463f	The Voyager scraper now actually works on the search results page.	2006-06-22 20:50:57 +00:00
Simon Kornblith	3890e5f122	- Made ingester automatically create hidden browser objects, given a window object. This should make things much easier for both David and me. - Multiple item detection code is now a part of the scraperJavaScript, rather than the scrapeDetectCode, and code to choose which items to add is part of Scholar.Ingester.Utilities, accessible from inside scrapers. The alternative approach would result in one request (or, in the case of JSTOR, three requests) per new item, while in some cases (e.g. Voyager) only one request is necessary to get all of the items.	2006-06-22 15:50:46 +00:00
Simon Kornblith	1b74d0b04a	Doh! Forgot to update scraper timestamp.	2006-06-22 02:46:30 +00:00
Simon Kornblith	ca3a0e6e5d	Beginnings of search result scraping (does not yet actually do the scraping, but does present the menu)	2006-06-22 02:43:40 +00:00
Simon Kornblith	6d1e447154	- Remove load eventListener after it has been called once - Capture editors from Google Books	2006-06-21 15:18:18 +00:00
Simon Kornblith	f753c1cc2f	Add Google Books scraper	2006-06-21 14:28:51 +00:00
Simon Kornblith	7b08c94437	Remember to update modified dates on changed scrapers.	2006-06-21 13:55:55 +00:00
Simon Kornblith	7d3deb5b9f	- Make Scholar.Ingester.Utilities.loadDocument() attach an event handler to load rather than DOMContentLoaded to resolve an issue with the Ex Libris/Aleph scraper (VCU) - When possible, corporate creators/contributors are categorized with their own RDF types (prefixDummy + "corporateCreator"/"corporateContributor") - Remove extraneous debug code in extensions	2006-06-21 01:41:07 +00:00
Simon Kornblith	09d79d6dd7	Fix overly optimistic JSTOR scraper	2006-06-20 17:06:41 +00:00
Simon Kornblith	968348a5d1	Add a scraper for Dublin Core metadata embedded in HTML/XHTML META tags	2006-06-20 16:08:13 +00:00
Simon Kornblith	4c34c592da	- Better handling of InnoPAC records not returned by searches	2006-06-18 21:00:43 +00:00
Simon Kornblith	20369f41b3	- Move commonly used scraper functions to ingester.js, rather than re-defining them in each scraper. This breaks Piggy Bank compatibility in our scrapers, but we will still be able to export our scrapers in a Piggy Bank compatible form. - Better handling of scraper RDF to item mapping. - Improved date handling. All scrapers now return ISO-style dates when possible.	2006-06-18 19:04:32 +00:00
Simon Kornblith	3d881eec13	- Make scrapers return standard ISO-style YYYY-MM-DD dates. Still need to work on journal article scrapers. - Ingester lets callback function save items, rather than saving them itself. - Better handling of multiple items in API, although no scrapers currently implement this.	2006-06-17 21:21:15 +00:00
Dan Stillman	70216ea2c7	- Added automatic scraper update mechanism (more details on Basecamp: http://chnm.grouphub.com/C2687015 ) - Removed localLastUpdated field from scrapers table and renamed centralLastUpdated to lastUpdated; updated scraper queries accordingly - Added query in scrapers.sql to update version table 'repository' row to prevent immediate downloads of newly installed scrapers - Get version property from extension manager in Scholar.init() and assign to Scholar.version	2006-06-15 06:13:02 +00:00
Dan Stillman	d42258b168	Changed schema of scrapers table to use single GUID for scraperID. Assigned GUIDs to scrapers, replaced INSERT queries with REPLACE queries, and removed table DELETE query at top -- this will allow scrapers to be updated without deleting any others that may exist (e.g. that someone is developing, third-party, etc.)	2006-06-12 15:43:24 +00:00
Simon Kornblith	076ee0fad2	Add PubMed scraper, fix a few other small bugs	2006-06-08 01:26:40 +00:00
Simon Kornblith	f437917016	Add Project MUSE scraper	2006-06-07 21:26:55 +00:00
Simon Kornblith	cef0b19770	Add TLC/YouSeeMore scraper	2006-06-07 18:44:27 +00:00
Simon Kornblith	1e48189c3b	Add SIRSI (old) scraper	2006-06-07 17:44:55 +00:00
Simon Kornblith	07dad8fae9	Add DRA, GEAC scrapers	2006-06-07 16:48:03 +00:00
Dan Stillman	393807b152	This isn't quite done (I'm discussing changing the scrapers schema with Simon to better handle scraper updates) but in the interest of getting the scrapers in for testing, I'll commit this now. Integrated the scrapers with the schema update mechanism. Changed a bunch of schema methods to handle both schema.sql and scrapers.sql (or others, if need be) and altered the version table to track multiple versions for different files. This theoretically should detect that the version table has changed and force a reinitialization of the DB--let me know if there are problems.	2006-06-07 15:27:21 +00:00
Simon Kornblith	0753d78910	- Add VTLS scraper - Fix loadDocument/processDocuments (broken by r145)	2006-06-06 21:35:23 +00:00
Simon Kornblith	152c9bf9e7	- Small changes to MARC record support - Implemented loadDocument API, for loading and parsing the DOMs of HTML documents in the background - Added scraper code to SVN repository (now includes 12 scrapers, see Writeboard for details) To update to the latest versions of all scrapers, ensure you have an up-to-date version of sqlite3, then run: sqlite3 ~/Library/Application\ Support/Firefox/Profiles/profileName/scholar.sqlite < scrapers.sql	2006-06-06 18:25:45 +00:00

50 commits
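Commits 968348a5d1 and f1cc809f76 describe scrapers that read Dublin Core metadata embedded in HTML/XHTML META tags. The following is a minimal sketch of that idea under stated assumptions, not the actual Zotero scraper code: the function name extractDublinCore and the { name, content } input shape are hypothetical, and in a browser the pairs would have to be read from the page's META elements first.

```javascript
// Minimal sketch (hypothetical helper, not Zotero's code): collect Dublin Core
// fields from META tags supplied as { name, content } pairs. In a browser the
// pairs could be read from document.getElementsByTagName("meta").
function extractDublinCore(metaTags) {
  const dc = {};
  for (const tag of metaTags) {
    const name = tag.name;
    const content = tag.content;
    // Skip tags that lack a name or a content value.
    if (!name || content == null) continue;
    // Dublin Core elements are conventionally named "DC.title", "DC.creator",
    // etc.; accept any capitalization of the "DC." prefix.
    const match = /^dc\.([a-z]+)/i.exec(name);
    if (!match) continue; // not Dublin Core (e.g. a plain "keywords" tag)
    const field = match[1].toLowerCase();
    // Elements such as DC.creator may repeat, so keep every value in order.
    if (!dc[field]) dc[field] = [];
    dc[field].push(String(content).trim());
  }
  return dc;
}

// Usage with made-up META values:
const tags = [
  { name: "DC.title", content: "An Example Article" },
  { name: "DC.creator", content: "Smith, Jane" },
  { name: "dc.Creator", content: "Doe, John" },
  { name: "keywords", content: "history, scraping" },
];
console.log(JSON.stringify(extractDublinCore(tags)));
// prints {"title":["An Example Article"],"creator":["Smith, Jane","Doe, John"]}
```

Collecting repeated elements into arrays keeps every creator instead of letting the last one overwrite the rest; mapping these fields onto item fields, and reading non-Dublin Core META tags as the generic scraper also does, is left out of this sketch.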