Commit graph

48 commits

Author SHA1 Message Date
Simon Kornblith
7b7d3d85e3 - added Washington Post translator
- translation works properly even when a user has switched to a different page
2006-09-08 05:47:47 +00:00
Simon Kornblith
dd0c537ce1 closes #267, MODS export option uses an rdf extension (should be xml)
thanks to Dan for the idea
2006-09-04 22:57:23 +00:00
Simon Kornblith
aa6e2cfab1 closes #264, UMich lib catalog doesn't work on Windows; other issues related to Mirlyn
positions "saving item" window in a slightly better place on Windows

the UMich bug was actually bigger than I though. as it turns out, the HiddenDOMWindow in Windows is not a chrome window, so i had to modify createHiddenBrowser() to attach the hidden browser object to an existing browser window. i don't believe this should have any adverse effects for snapshots, etc., but Dan, correct me if i'm wrong. it would be nice to be able to create a real chrome instance instead of a XUL element, but all of my attempts at doing so have failed.
2006-09-04 20:19:38 +00:00
Simon Kornblith
ed6650c4e7 closes #218, Windows support for Word integration. this solution seems to work with both Word 2003 and Word 2007. i have not tested with earlier versions. Zotero.dot is the Windows verison; Zotero.dot.dmg is the Mac version. the only difference is the function call used to perform SOAP requests.
to get this to work right, you'll need the SOAP toolkit from http://www.microsoft.com/downloads/details.aspx?FamilyID=ba611554-5943-444c-b53c-c0a450b7013c&DisplayLang=en
I may replace the SOAP object with a simple XMLHTTP object, since that page says that the SOAP toolkit is deprecated.
2006-09-04 08:06:04 +00:00
Simon Kornblith
73b5634f62 addresses #215, allow user to select citation style and change citation styles on the fly
addresses #214, add footnote support to word integration

- the third icon on the Zotero Word toolbar is now reserved for "Document Options," which, for now, means on the selection of styles
- the Document Options window will, for now, appear the first time you create a citation. the default style probably belongs in the Scholar preferences window.
- you can now generate citations in both footnote and in-text citation formats. you can't yet switch between them on the fly, but that should be coming soon...
- Ibid is not yet implemented. again, coming soon.
2006-08-29 23:15:13 +00:00
Simon Kornblith
f07cb5a5bc adds an InfoTrac OneFile translator
fixes a bug in ingester progress window handling
2006-08-26 03:50:15 +00:00
Dan Stillman
7d6bd8d0af "project"=>"collection" (already "collection" in most places internally) 2006-08-24 19:43:48 +00:00
Simon Kornblith
0e63958f96 - make proquest work better behind proxies
- improved frame support
2006-08-24 18:00:48 +00:00
Simon Kornblith
26668a6e73 closes #194, EBSCO translator
closes #160, cache regular expressions
closes #188, rewrite MARC handling functions

MARC-based translators should now produce item types besides "book." right now, artwork, film, and manuscript are available. MARC also has codes for various types of audio (speech, music, etc.) and maps.
the EBSCO translator does not yet produce attachments. i sent them an email because their RIS export is invalid (the URLs come after the "end of record" field) and i'm waiting to see if they'll fix it before i try to fix it myself.
the EBSCO translator is unfortunately a bit slow, because it has to make 5 requests in order to get RIS export. the alternative (scraping individual item pages) would be even slower.
regular expression caching can be turned off by disabling extensions.scholar.cacheTranslatorData in about:config. if you leave it on, you'll have to restart Firefox after updating translators.
2006-08-19 18:58:09 +00:00
Simon Kornblith
10ba568ee8 closes #39, auto-ingest of associated files (as recognizable)
closes #3, Overflow metadata dumps into "extra" field

add "extra" data where such data is useful and conveniently accessible (not available for XML-based export or MARC formats yet)
add links to permanent URLs
download associated files from full text sources (if extensions.scholar.downloadAssociatedFiles preference is enabled)
fix WorldCat translator
improve InnoPAC translator (it now works on Georgetown search results pages, albeit slowly, because it must first realize the catalog is misconfigured)
tag items from SIRSI and WorldCat
return to putting the full lengths of books into "pages," because some citation styles require it
fix COinS (broken a few revisions ago)
2006-08-17 07:56:01 +00:00
Simon Kornblith
410e090ecd closes #104, speed up multiple item adds 2006-08-15 23:03:11 +00:00
Simon Kornblith
51108446e3 closes #187, make berkeley's library work
closes #186, stop translators from hanging

when a document loads inside a frameset, we now check whether we can scrape each individual frame.
all functions involving tabs have been vastly simplified, because in the process of figuring this out, i discovered Firefox 2's new tab events.
if a translator throws an exception inside loadDocument(), doGet(), doPost(), or processDocuments(), a translate error message will appear, and the translator will not hang
2006-08-15 19:46:42 +00:00
Simon Kornblith
52fe187328 closes #184, support non-ASCII characters in HTML and RTF. since we use the unicode features of RTF 1.5, this requires Word 97 or later on a PC (or presumably Word 98 or later on a Mac) to read.
fixes one last strict mode bug
2006-08-15 01:05:20 +00:00
Simon Kornblith
c18f75d667 show fewer warnings in strict mode 2006-08-14 22:28:22 +00:00
Simon Kornblith
3195a1c382 closes #112, ingested items should be automatically added to selected project
references #178, changes to various date fields

- updates CSL to work with the latest schema. we can now (almost) generate completely valid APA style. the only issue is that there's no syntax for specifying short forms for page and creator type labels.
- updates scrapers to use date field rather than year field.
- removes now-unnecessary translation engine code pertaining to year field.
2006-08-14 05:12:28 +00:00
Simon Kornblith
3edb6e0286 closes #86, steal EndNote download links
Scholar should now attempt to process citation information from EndNote download links (MIME types application/x-endnote-refer and application/x-research-info-systems). in situations where Scholar cannot process the information, a standard helper app dialog will appear. this behavior is controlled by the preference extensions.scholar.parseEndNoteMIMETypes.
2006-08-08 21:17:07 +00:00
Simon Kornblith
9144b56772 addresses #131, make import/export symmetrical
closes #163, make translator API allow creator types besides author

import and export in the multi-ontology RDF format should now work properly. collections, notes, and see also are all preserved. more extensive testing will be necessary later.
2006-08-05 20:58:45 +00:00
Simon Kornblith
f6c12d3d81 closes #112, ingested items should be automatically added to selected project 2006-08-02 14:17:16 +00:00
Simon Kornblith
c64e5c841f closes #78, figure out import/export architecture
closes #100, migrate ingester to Scholar.Translate
closes #88, migrate scrapers away from RDF
closes #9, pull out LC subject heading tags
references #87, add fromArray() and toArray() methods to item objects

API changes:
all translation (import/export/web) now goes through Scholar.Translate
all Scholar-specific functions in scrapers start with "Scholar." rather than the jumbled up piggy bank un-namespaced confusion
scrapers now longer specify items through RDF (the beginning of an item.fromArray()-like function exists in Scholar.Translate.prototype._itemDone())
scrapers can be any combination of import, export, and web (type is the sum of 1/2/4 respectively)
scrapers now contain functions (doImport, doExport, doWeb) rather than loose code
scrapers can call functions in other scrapers or just call the function to translate itself
export accesses items item-by-item, rather than accepting a huge array of items
MARC functions are now in the MARC import translator, and accessed by the web translators

new features:
import now works
rudimentary RDF (unqualified dublin core only), RIS, and MARC import translators are implemented (although they are a little picky with respect to file extensions at the moment)
items appear as they are scraped
MARC import translator pulls out tags, although this seems to slow things down
no icon appears next to a the URL when Scholar hasn't detected metadata, since this seemed somewhat confusing

apologizes for the size of this diff. i figured if i was going to re-write the API, i might as well do it all at once and get everything working right.
2006-07-17 04:06:58 +00:00
Simon Kornblith
77282c3edc - fixes a bug that could result in scrapers using utilities.processDocuments malfunctioning
- fixes a bug that could result in the Scrape Progress chrome thingy sticking around forever
- makes chrome thingy disappear when URL changes or when tabs are switched
2006-06-29 03:22:10 +00:00
Simon Kornblith
45b9234996 addresses #78, figure out import/export architecture
- changes scrapers table to translators table; all import/export/web translators now belong in this table
- adds Scholar.Translate to handle translation issues. eventually, Scholar.Ingester.Document will become part of this interface
- adds Scholar_File_Interface (in fileInterface.js) to handle UI for export and eventually import. (David, when you have time, please connect Scholar_File_Interface.exportFile to a button.)
- adds an export translator for MODS. all of our metadata, but not our hierarchy (projects, etc.) translates directly and unambiguously into valid MODS. eventually, we can use RDF or another format to handle hierarchy.
- adds utilities.getVersion() and utilities.inArray() for simplified scraper coding
- fixes minor interface issues with the nifty chrome scraping status window
2006-06-29 00:56:50 +00:00
Simon Kornblith
9a7d619122 closes #42, save directly to project folder by clicking and holding down the icon in the toolbar. you actually have to right click (not just click and hold) for this to work, because 2.0 gets rid of the click-and-hold = contextual menu thing that existed in older version. 2006-06-27 21:02:26 +00:00
Simon Kornblith
257ed8f69b closes #68, figure out way to have scrapers work for gated resources behind proxies. most institutions use EZProxy for their proxy needs (or a more transparent proxy, which we support natively). this implementation is significantly better than the old one, which refused to work after you'd already logged in once, and is also simpler, because it's stateless. it has to observe every HTTP request, but there's no noticeable speed hit. it also still doesn't work when there's a link from one gated site to another gated site, but as far as i can tell, this only happens on the Gale Group site. 2006-06-27 04:08:21 +00:00
Simon Kornblith
19504e6746 - closes #73, use chrome for "Scraping Progress..." indicator
- multiple and book icons were swapped for Voyager scraper
2006-06-27 02:03:10 +00:00
Simon Kornblith
4242c62b1b - Fix redundancy in utilities.js (I accidentally copied and pasted a much larger block of code than i meant to)
- Move processDocuments, a function for loading a DOM representation of a document or set of documents, to Scholar.Utilities.HTTP
- Add Scholar.Ingester.ingestURL, a simplified function to scrape a URL (closes #33)
2006-06-26 20:02:30 +00:00
Simon Kornblith
4535b220db Closes #84, make type icon in toolbar match item about to be scraped. It's not perfect, since to get everything right, we'd need to scrape the page as soon as it appears, but it provides a pretty good indication. Multiple items get the folder icon. If there's a better icon out there, it's pretty straightforward to implement. 2006-06-26 18:05:23 +00:00
Simon Kornblith
5e73dcdd2e - Search results scraping for WorldCat.
- Make scraperJavaScript run on reload again, because it makes debugging easier
- There's not actually a memory leak in the proxyMonitor code.
2006-06-25 16:13:47 +00:00
Dan Stillman
b2247e1dd2 Fixes #66, Need a function to get typeID given typeName
- Added methods getID(idOrName) and getName(idOrName) to Scholar.CreatorTypes and Scholar.ItemTypes to take either typeID or typeName

- Removed getTypeName() in each and changed references accordingly

- Streamlined both classes to be as similar as possible
2006-06-25 04:35:11 +00:00
Simon Kornblith
22eebc6cdf Addresses #68, figure out way to have scrapers work for gated resources behind proxies. We can now access pages through an EZProxy. We need to know what alternatives to EZProxy exist in order to support them. Also, fixes some spacing issues in browser.js. 2006-06-25 04:30:43 +00:00
Simon Kornblith
40fabb888c Addresses #65, back button fools ingester, and fixes bugs loading new tabs in the background. 2006-06-24 21:39:36 +00:00
Dan Stillman
97940c7470 Replaced all instances of "Firefox Scholar" (not counting the repository URL) with "Scholar for Firefox" for now 2006-06-24 09:08:12 +00:00
Simon Kornblith
098078627c - Make events listening for DOMContentLoaded listen for load, because DOMContentLoaded does not seem ready for prime time (hey, it's undocumented, what can you expect)
- Make Amazon scraper work with multiple documents
- Fix bugs in processDocuments
- Make Scholar.Ingester.Utilities.getItemArray() willing to take an array of DOM nodes to search for links, and finally take advantage of the fact that objects have no length
2006-06-23 03:02:30 +00:00
Simon Kornblith
470f7c463f The Voyager scraper now actually works on the search results page. 2006-06-22 20:50:57 +00:00
Simon Kornblith
3890e5f122 - Made ingester automatically create hidden browser objects, given a window object. This should make things much easier for both David and me.
- Multiple item detection code is now a part of the scraperJavaScript, rather than the scrapeDetectCode, and code to choose which items to add is part of Scholar.Ingester.Utilities, accessible from inside scrapers. The alternative approach would result in one request (or, in the case of JSTOR, three requests) per new item, while in some cases (e.g. Voyager) only one request is necessary to get all of the items.
2006-06-22 15:50:46 +00:00
Simon Kornblith
ca3a0e6e5d Beginnings of search result scraping (does not yet actually do the scraping, but does present the menu) 2006-06-22 02:43:40 +00:00
David Norton
428eab6a95 A cog menu each for collections and items (the same as the contextual menu, for now)
Moved the capture icon into the URL bar (invisible until you visit a scrapable page. Currently just displays a Book, but will change to the correct item types in the future?)
2006-06-22 00:13:21 +00:00
Simon Kornblith
9a9621f39d Make net appear even before first page has loaded 2006-06-21 18:19:49 +00:00
Simon Kornblith
09d79d6dd7 Fix overly optimistic JSTOR scraper 2006-06-20 17:06:41 +00:00
Simon Kornblith
5af10b1061 - Fix small bug in ingester interface 2006-06-20 14:16:15 +00:00
Simon Kornblith
c983a8e7e4 - Re-named Scholar.Ingester.Interface to Scholar_Ingester_Interface (since Scholar object is defined in XPCOM and thus global) 2006-06-20 00:52:15 +00:00
Simon Kornblith
3d881eec13 - Make scrapers return standard ISO-style YYYY-MM-DD dates. Still need to work on journal article scrapers.
- Ingester lets callback function save items, rather than saving them itself.
- Better handling of multiple items in API, although no scrapers currently implement this.
2006-06-17 21:21:15 +00:00
Simon Kornblith
0753d78910 - Add VLTS scraper
- Fix loadDocument/processDocuments (broken by r145)
2006-06-06 21:35:23 +00:00
Simon Kornblith
152c9bf9e7 - Small changes to MARC record support
- Implemented loadDocument API, for loading and parsing the DOMs of HTML documents in the background
- Added scraper code to SVN repository (now includes 12 scrapers, see Writeboard for details)

To update to the latest versions of all scrapers, ensure you have an up-to-date version of sqlite3, then run:
sqlite3 ~/Library/Application\ Support/Firefox/Profiles/profileName/scholar.sqlite < scrapers.sql
2006-06-06 18:25:45 +00:00
Simon Kornblith
85d8153024 Add library, hooks for scraping MARC records. 2006-06-03 22:26:01 +00:00
Simon Kornblith
93652a137c Fix issues with asynchronous scraping and XMLHttpRequest 2006-06-02 23:53:42 +00:00
Simon Kornblith
bb57e6ba7d Provide visual feedback for scraping 2006-06-02 18:22:34 +00:00
Simon Kornblith
639a006efb XPCOM-ize ingester, fix swapped first and last name in ingested info, stop ingesting pages field (this should be for pages of the source used, not the total number of pages, right?) 2006-06-02 03:19:12 +00:00
Simon Kornblith
551582eb7e Still getting the hang of Subversion...the rest of the ingester code 2006-06-01 06:53:39 +00:00