zotero

Author	SHA1	Message	Date
Simon Kornblith	c64e5c841f	closes #78 , figure out import/export architecture; closes #100, migrate ingester to Scholar.Translate; closes #88, migrate scrapers away from RDF; closes #9, pull out LC subject heading tags; references #87, add fromArray() and toArray() methods to item objects. API changes: - all translation (import/export/web) now goes through Scholar.Translate - all Scholar-specific functions in scrapers start with "Scholar." rather than the jumbled up piggy bank un-namespaced confusion - scrapers no longer specify items through RDF (the beginning of an item.fromArray()-like function exists in Scholar.Translate.prototype._itemDone()) - scrapers can be any combination of import, export, and web (type is the sum of 1/2/4 respectively) - scrapers now contain functions (doImport, doExport, doWeb) rather than loose code - scrapers can call functions in other scrapers or just call the function to translate itself - export accesses items item-by-item, rather than accepting a huge array of items - MARC functions are now in the MARC import translator, and accessed by the web translators. new features: - import now works - rudimentary RDF (unqualified dublin core only), RIS, and MARC import translators are implemented (although they are a little picky with respect to file extensions at the moment) - items appear as they are scraped - MARC import translator pulls out tags, although this seems to slow things down - no icon appears next to the URL when Scholar hasn't detected metadata, since this seemed somewhat confusing. apologies for the size of this diff. i figured if i was going to re-write the API, i might as well do it all at once and get everything working right.	2006-07-17 04:06:58 +00:00
Simon Kornblith	77282c3edc	- fixes a bug that could result in scrapers using utilities.processDocuments malfunctioning - fixes a bug that could result in the Scrape Progress chrome thingy sticking around forever - makes chrome thingy disappear when URL changes or when tabs are switched	2006-06-29 03:22:10 +00:00
Simon Kornblith	45b9234996	addresses #78 , figure out import/export architecture - changes scrapers table to translators table; all import/export/web translators now belong in this table - adds Scholar.Translate to handle translation issues. eventually, Scholar.Ingester.Document will become part of this interface - adds Scholar_File_Interface (in fileInterface.js) to handle UI for export and eventually import. (David, when you have time, please connect Scholar_File_Interface.exportFile to a button.) - adds an export translator for MODS. all of our metadata, but not our hierarchy (projects, etc.) translates directly and unambiguously into valid MODS. eventually, we can use RDF or another format to handle hierarchy. - adds utilities.getVersion() and utilities.inArray() for simplified scraper coding - fixes minor interface issues with the nifty chrome scraping status window	2006-06-29 00:56:50 +00:00
Simon Kornblith	9a7d619122	closes #42 , save directly to project folder by clicking and holding down the icon in the toolbar. you actually have to right click (not just click and hold) for this to work, because 2.0 gets rid of the click-and-hold = contextual menu thing that existed in older versions.	2006-06-27 21:02:26 +00:00
Simon Kornblith	257ed8f69b	closes #68 , figure out way to have scrapers work for gated resources behind proxies. most institutions use EZProxy for their proxy needs (or a more transparent proxy, which we support natively). this implementation is significantly better than the old one, which refused to work after you'd already logged in once, and is also simpler, because it's stateless. it has to observe every HTTP request, but there's no noticeable speed hit. it also still doesn't work when there's a link from one gated site to another gated site, but as far as i can tell, this only happens on the Gale Group site.	2006-06-27 04:08:21 +00:00
Simon Kornblith	19504e6746	- closes #73 , use chrome for "Scraping Progress..." indicator - multiple and book icons were swapped for Voyager scraper	2006-06-27 02:03:10 +00:00
Simon Kornblith	4242c62b1b	- Fix redundancy in utilities.js (I accidentally copied and pasted a much larger block of code than i meant to) - Move processDocuments, a function for loading a DOM representation of a document or set of documents, to Scholar.Utilities.HTTP - Add Scholar.Ingester.ingestURL, a simplified function to scrape a URL (closes #33)	2006-06-26 20:02:30 +00:00
Simon Kornblith	4535b220db	Closes #84 , make type icon in toolbar match item about to be scraped. It's not perfect, since to get everything right, we'd need to scrape the page as soon as it appears, but it provides a pretty good indication. Multiple items get the folder icon. If there's a better icon out there, it's pretty straightforward to implement.	2006-06-26 18:05:23 +00:00
Simon Kornblith	5e73dcdd2e	- Search results scraping for WorldCat. - Make scraperJavaScript run on reload again, because it makes debugging easier - There's not actually a memory leak in the proxyMonitor code.	2006-06-25 16:13:47 +00:00
Dan Stillman	b2247e1dd2	Fixes #66 , Need a function to get typeID given typeName - Added methods getID(idOrName) and getName(idOrName) to Scholar.CreatorTypes and Scholar.ItemTypes to take either typeID or typeName - Removed getTypeName() in each and changed references accordingly - Streamlined both classes to be as similar as possible	2006-06-25 04:35:11 +00:00
Simon Kornblith	22eebc6cdf	Addresses #68 , figure out way to have scrapers work for gated resources behind proxies. We can now access pages through an EZProxy. We need to know what alternatives to EZProxy exist in order to support them. Also, fixes some spacing issues in browser.js.	2006-06-25 04:30:43 +00:00
Simon Kornblith	40fabb888c	Addresses #65 , back button fools ingester, and fixes bugs loading new tabs in the background.	2006-06-24 21:39:36 +00:00
Dan Stillman	97940c7470	Replaced all instances of "Firefox Scholar" (not counting the repository URL) with "Scholar for Firefox" for now	2006-06-24 09:08:12 +00:00
Simon Kornblith	098078627c	- Make events listening for DOMContentLoaded listen for load, because DOMContentLoaded does not seem ready for prime time (hey, it's undocumented, what can you expect) - Make Amazon scraper work with multiple documents - Fix bugs in processDocuments - Make Scholar.Ingester.Utilities.getItemArray() willing to take an array of DOM nodes to search for links, and finally take advantage of the fact that objects have no length	2006-06-23 03:02:30 +00:00
Simon Kornblith	470f7c463f	The Voyager scraper now actually works on the search results page.	2006-06-22 20:50:57 +00:00
Simon Kornblith	3890e5f122	- Made ingester automatically create hidden browser objects, given a window object. This should make things much easier for both David and me. - Multiple item detection code is now a part of the scraperJavaScript, rather than the scrapeDetectCode, and code to choose which items to add is part of Scholar.Ingester.Utilities, accessible from inside scrapers. The alternative approach would result in one request (or, in the case of JSTOR, three requests) per new item, while in some cases (e.g. Voyager) only one request is necessary to get all of the items.	2006-06-22 15:50:46 +00:00
Simon Kornblith	ca3a0e6e5d	Beginnings of search result scraping (does not yet actually do the scraping, but does present the menu)	2006-06-22 02:43:40 +00:00
David Norton	428eab6a95	- A cog menu each for collections and items (the same as the contextual menu, for now) - Moved the capture icon into the URL bar (invisible until you visit a scrapable page; currently just displays a Book, but will change to the correct item types in the future?)	2006-06-22 00:13:21 +00:00
Simon Kornblith	9a9621f39d	Make net appear even before first page has loaded	2006-06-21 18:19:49 +00:00
Simon Kornblith	09d79d6dd7	Fix overly optimistic JSTOR scraper	2006-06-20 17:06:41 +00:00
Simon Kornblith	5af10b1061	- Fix small bug in ingester interface	2006-06-20 14:16:15 +00:00
Simon Kornblith	c983a8e7e4	- Re-named Scholar.Ingester.Interface to Scholar_Ingester_Interface (since Scholar object is defined in XPCOM and thus global)	2006-06-20 00:52:15 +00:00
Simon Kornblith	3d881eec13	- Make scrapers return standard ISO-style YYYY-MM-DD dates. Still need to work on journal article scrapers. - Ingester lets callback function save items, rather than saving them itself. - Better handling of multiple items in API, although no scrapers currently implement this.	2006-06-17 21:21:15 +00:00
Simon Kornblith	0753d78910	- Add VLTS scraper - Fix loadDocument/processDocuments (broken by r145)	2006-06-06 21:35:23 +00:00
Simon Kornblith	152c9bf9e7	- Small changes to MARC record support - Implemented loadDocument API, for loading and parsing the DOMs of HTML documents in the background - Added scraper code to SVN repository (now includes 12 scrapers, see Writeboard for details) To update to the latest versions of all scrapers, ensure you have an up-to-date version of sqlite3, then run: sqlite3 ~/Library/Application\ Support/Firefox/Profiles/profileName/scholar.sqlite < scrapers.sql	2006-06-06 18:25:45 +00:00
Simon Kornblith	85d8153024	Add library, hooks for scraping MARC records.	2006-06-03 22:26:01 +00:00
Simon Kornblith	93652a137c	Fix issues with asynchronous scraping and XMLHttpRequest	2006-06-02 23:53:42 +00:00
Simon Kornblith	bb57e6ba7d	Provide visual feedback for scraping	2006-06-02 18:22:34 +00:00
Simon Kornblith	639a006efb	XPCOM-ize ingester, fix swapped first and last name in ingested info, stop ingesting pages field (this should be for pages of the source used, not the total number of pages, right?)	2006-06-02 03:19:12 +00:00
Simon Kornblith	551582eb7e	Still getting the hang of Subversion...the rest of the ingester code	2006-06-01 06:53:39 +00:00

30 commits
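
Commit c64e5c841f notes that a translator's type is the sum of 1/2/4 for import, export, and web respectively. A minimal sketch of how such a combined value might be built and tested follows. The constant and function names (TRANSLATOR_TYPE, makeType, supports, describeType) are hypothetical and not taken from the Scholar code; only the values 1, 2, and 4 come from the commit message.

```javascript
// Hypothetical names; only the values 1/2/4 come from the commit message.
const TRANSLATOR_TYPE = Object.freeze({ IMPORT: 1, EXPORT: 2, WEB: 4 });

// Combine capabilities. For distinct powers of two, bitwise OR gives the
// same result as summing, and it stays correct if a flag is repeated.
function makeType(...flags) {
  return flags.reduce((type, flag) => type | flag, 0);
}

// Check whether a combined type value includes a given capability.
function supports(type, flag) {
  return (type & flag) === flag;
}

// List the capability names encoded in a type value.
function describeType(type) {
  return Object.keys(TRANSLATOR_TYPE)
    .filter((name) => supports(type, TRANSLATOR_TYPE[name]))
    .map((name) => name.toLowerCase());
}

// Example: a translator that handles both import and web pages.
const importAndWeb = makeType(TRANSLATOR_TYPE.IMPORT, TRANSLATOR_TYPE.WEB);
console.log(importAndWeb, describeType(importAndWeb));
```

Because each capability is a distinct power of two, a single integer column can record any combination, and a lookup for, say, all export-capable translators reduces to a bitwise test on that value.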