Commit graph

36 commits

Author SHA1 Message Date
Dan Stillman
da5e74a06a Autocomplete for creators in item pane
Differentiates between single and double fields for the search, but there's a problem in the current implementation in that only one field is editable at once, so displaying two-field names in a drop-down is a little problematic. While I could display the full names, comma-delimited, and get the discrete parts (which is what Scholar.Utilities.AutoComplete.getResultComment(), included in this commit, is for--the creatorID for the row would be hidden in the autocomplete drop-down comment field), it's a bit unclear what should happen when a user selects a comma-separated name from the drop-down of one of the fields. One option would be to have a row for the last name (in case that's all they want to complete) and other rows for "last, first" matches, and selecting one of the two-part names would replace whatever's in the opposite name field with the appropriate text (and save it to the DB, I'm afraid, unless I change how the creator fields work), keeping the focus in the current textbox for easy tabbing. Not great, but it might work.

Other ideas?
2006-09-25 06:38:47 +00:00
Simon Kornblith
d5bc6cbe4b - fixes a bug in capitalizeTitle
- better feedback for search translator errors
2006-09-09 22:45:03 +00:00
Simon Kornblith
d4576d3d55 addresses #69, notification system for broken scrapers
thanks to Dan for his help on the repository side of things
2006-09-09 00:12:09 +00:00
Simon Kornblith
7b7d3d85e3 - added Washington Post translator
- translation works properly even when a user has switched to a different page
2006-09-08 05:47:47 +00:00
Simon Kornblith
e9ba093c15 oops 2006-09-07 01:30:10 +00:00
Simon Kornblith
cf8dc232b1 - new translators: New York Review of Books, Chronicle of Higher Education
- more useful errors in utilities
- fixes minor bugs in citation styling
2006-09-07 01:23:13 +00:00
Simon Kornblith
7f40d696a4 more useful comments in utilities.js 2006-09-06 04:48:13 +00:00
Simon Kornblith
89cf0c7235 closes #276, fix RIS bugs
- import translators no longer fail when trying to import an item with no name
- the T2/BT field becomes the publication title when no JO/JF field is available (fixes newspaper issues)
- Y2 is now treated as part of the date if and only if it is improperly formatted (seriously, why can't Thomson get their own specs straight?)
- work around EndNote's strange behavior of putting article titles into notes for no apparent reason
- RIS export gives dates as per specification
- fixed a bug that could have (potentially) caused problems formatting "January"
- allow translators to access strToDate function
2006-09-06 04:45:19 +00:00
Simon Kornblith
7d93903e2d closes #239, fix embedded RDF translator 2006-09-04 21:43:23 +00:00
Simon Kornblith
aa6e2cfab1 closes #264, UMich lib catalog doesn't work on Windows; other issues related to Mirlyn
positions "saving item" window in a slightly better place on Windows

the UMich bug was actually bigger than I though. as it turns out, the HiddenDOMWindow in Windows is not a chrome window, so i had to modify createHiddenBrowser() to attach the hidden browser object to an existing browser window. i don't believe this should have any adverse effects for snapshots, etc., but Dan, correct me if i'm wrong. it would be nice to be able to create a real chrome instance instead of a XUL element, but all of my attempts at doing so have failed.
2006-09-04 20:19:38 +00:00
Simon Kornblith
2b0bebe7a4 closes #258, MARC translator should capitalize titles 2006-09-04 18:16:50 +00:00
Simon Kornblith
1c8e3fcb02 closes #239, fix embedded RDF translator
modifies scrapers to use dates in the format that comes out of the page, rather than converting to SQL
adds Scholar.Date.formatDate() to provide a pretty representation of dates
2006-08-31 00:04:11 +00:00
Simon Kornblith
0e63958f96 - make proquest work better behind proxies
- improved frame support
2006-08-24 18:00:48 +00:00
Dan Stillman
3ce672756c Fix Utilities.HTTP.doHead() brokenness 2006-08-19 20:45:27 +00:00
Simon Kornblith
26668a6e73 closes #194, EBSCO translator
closes #160, cache regular expressions
closes #188, rewrite MARC handling functions

MARC-based translators should now produce item types besides "book." right now, artwork, film, and manuscript are available. MARC also has codes for various types of audio (speech, music, etc.) and maps.
the EBSCO translator does not yet produce attachments. i sent them an email because their RIS export is invalid (the URLs come after the "end of record" field) and i'm waiting to see if they'll fix it before i try to fix it myself.
the EBSCO translator is unfortunately a bit slow, because it has to make 5 requests in order to get RIS export. the alternative (scraping individual item pages) would be even slower.
regular expression caching can be turned off by disabling extensions.scholar.cacheTranslatorData in about:config. if you leave it on, you'll have to restart Firefox after updating translators.
2006-08-19 18:58:09 +00:00
Simon Kornblith
20486d5053 addresses #103, figure out how to store captured pages in native export format
import/export of file data should work for all file types _except_ snapshots (in this situation, export is working, but import is not yet complete; see #193)
also, fixes a potential security issue that could have allowed malicious web translators to post local data to remote sites (although, given we maintain the central repository and there's no easy way to install a translator, the risk would have been minimal to begin with).
2006-08-18 05:58:14 +00:00
Simon Kornblith
10ba568ee8 closes #39, auto-ingest of associated files (as recognizable)
closes #3, Overflow metadata dumps into "extra" field

add "extra" data where such data is useful and conveniently accessible (not available for XML-based export or MARC formats yet)
add links to permanent URLs
download associated files from full text sources (if extensions.scholar.downloadAssociatedFiles preference is enabled)
fix WorldCat translator
improve InnoPAC translator (it now works on Georgetown search results pages, albeit slowly, because it must first realize the catalog is misconfigured)
tag items from SIRSI and WorldCat
return to putting the full lengths of books into "pages," because some citation styles require it
fix COinS (broken a few revisions ago)
2006-08-17 07:56:01 +00:00
Simon Kornblith
51108446e3 closes #187, make berkeley's library work
closes #186, stop translators from hanging

when a document loads inside a frameset, we now check whether we can scrape each individual frame.
all functions involving tabs have been vastly simplified, because in the process of figuring this out, i discovered Firefox 2's new tab events.
if a translator throws an exception inside loadDocument(), doGet(), doPost(), or processDocuments(), a translate error message will appear, and the translator will not hang
2006-08-15 19:46:42 +00:00
Simon Kornblith
ddb4fc872c remove the doStatus argument from Scholar.Utilities.HTTP 2006-08-12 04:27:49 +00:00
Simon Kornblith
36a402713c rename Scholar.Utilities.Ingester.HTTPUtilities to Scholar.Utilities.Ingester.HTTP for consistency 2006-08-11 16:34:22 +00:00
Simon Kornblith
064ecd17db removes unnecessary pieces of piggy bank API from utilities and updates translators to abide by current translator guidelines 2006-08-11 15:28:18 +00:00
Dan Stillman
f3a66085f5 Closes #173, Try to detect content type of linked pages without loading entire file
Closes #174, Don't load images and attached files when detecting content type in linkFromURL()

If mime type not provided, Scholar.Files.linkFromURL() now uses XMLHTTPRequest HEAD request to get the content type without loading file (thanks Simon for the idea)

If title not provided, try to figure it out from URL, though not particularly intelligently (last slash)

Note that order of title and mimeType parameters is now swapped

This code should be a bit smarter about unexpected conditions
2006-08-09 18:37:34 +00:00
Simon Kornblith
216f0c7581 closes #83, figure out how to implement OpenURL
closes #76, implement extensible search/retrieval architecture for obtaining metadata

OpenURL COinS lookup is now implemented using a real search architecture system. at the moment, it works with Open WorldCat for books, CrossRef for journal articles (provided the COinS object contains a DOI or an ISSN), and PubMed when a PMID is available.
2006-08-08 01:06:33 +00:00
Simon Kornblith
6626eba844 addresses #83, figure out how to implement OpenURL
OpenURL lookup now works for books. this means that all that's necessary to add scrapable book metadata to a page is an ISBN, as shown below:

<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info:ofi/fmt:kev:mtx:book&amp;rft.isbn=1579550088"></span>

also, we can now scrape Open WorldCat and Wikipedia Book Sources pages with no specialized code involved.

i'm still looking for a better way of looking up journal article metadata. it's currently implemented with CrossRef, but CrossRef simply will not work without a DOI, and is also incomplete (only holds the last name of the first author).
2006-08-07 05:15:30 +00:00
Simon Kornblith
fc589a37cf closes #131, make import/export symmetrical
all 4 import/export formats currently supported (MODS, Hybrid RDF, Unqualified Dublin Core, and RIS) now work as both import and export translators
2006-08-06 09:34:51 +00:00
Simon Kornblith
6305e4cada closes #55, export bibliography to printable version
closes #4, Make printable version

- moves functions for creating and deleting hidden browser objects to scholar.js (from ingester.js), since these are necessary for printing as well
- allows saving bibliography in HTML or printing bibliography. style support is not yet complete (pending finalization of 0.9 version of CSL specification).
2006-07-27 23:01:55 +00:00
Simon Kornblith
c64e5c841f closes #78, figure out import/export architecture
closes #100, migrate ingester to Scholar.Translate
closes #88, migrate scrapers away from RDF
closes #9, pull out LC subject heading tags
references #87, add fromArray() and toArray() methods to item objects

API changes:
all translation (import/export/web) now goes through Scholar.Translate
all Scholar-specific functions in scrapers start with "Scholar." rather than the jumbled up piggy bank un-namespaced confusion
scrapers now longer specify items through RDF (the beginning of an item.fromArray()-like function exists in Scholar.Translate.prototype._itemDone())
scrapers can be any combination of import, export, and web (type is the sum of 1/2/4 respectively)
scrapers now contain functions (doImport, doExport, doWeb) rather than loose code
scrapers can call functions in other scrapers or just call the function to translate itself
export accesses items item-by-item, rather than accepting a huge array of items
MARC functions are now in the MARC import translator, and accessed by the web translators

new features:
import now works
rudimentary RDF (unqualified dublin core only), RIS, and MARC import translators are implemented (although they are a little picky with respect to file extensions at the moment)
items appear as they are scraped
MARC import translator pulls out tags, although this seems to slow things down
no icon appears next to a the URL when Scholar hasn't detected metadata, since this seemed somewhat confusing

apologizes for the size of this diff. i figured if i was going to re-write the API, i might as well do it all at once and get everything working right.
2006-07-17 04:06:58 +00:00
Simon Kornblith
c0251085a9 Add export filters for RIS and Dublin Core RDF 2006-07-05 21:44:01 +00:00
Simon Kornblith
77282c3edc - fixes a bug that could result in scrapers using utilities.processDocuments malfunctioning
- fixes a bug that could result in the Scrape Progress chrome thingy sticking around forever
- makes chrome thingy disappear when URL changes or when tabs are switched
2006-06-29 03:22:10 +00:00
Simon Kornblith
45b9234996 addresses #78, figure out import/export architecture
- changes scrapers table to translators table; all import/export/web translators now belong in this table
- adds Scholar.Translate to handle translation issues. eventually, Scholar.Ingester.Document will become part of this interface
- adds Scholar_File_Interface (in fileInterface.js) to handle UI for export and eventually import. (David, when you have time, please connect Scholar_File_Interface.exportFile to a button.)
- adds an export translator for MODS. all of our metadata, but not our hierarchy (projects, etc.) translates directly and unambiguously into valid MODS. eventually, we can use RDF or another format to handle hierarchy.
- adds utilities.getVersion() and utilities.inArray() for simplified scraper coding
- fixes minor interface issues with the nifty chrome scraping status window
2006-06-29 00:56:50 +00:00
Simon Kornblith
257ed8f69b closes #68, figure out way to have scrapers work for gated resources behind proxies. most institutions use EZProxy for their proxy needs (or a more transparent proxy, which we support natively). this implementation is significantly better than the old one, which refused to work after you'd already logged in once, and is also simpler, because it's stateless. it has to observe every HTTP request, but there's no noticeable speed hit. it also still doesn't work when there's a link from one gated site to another gated site, but as far as i can tell, this only happens on the Gale Group site. 2006-06-27 04:08:21 +00:00
Simon Kornblith
4242c62b1b - Fix redundancy in utilities.js (I accidentally copied and pasted a much larger block of code than i meant to)
- Move processDocuments, a function for loading a DOM representation of a document or set of documents, to Scholar.Utilities.HTTP
- Add Scholar.Ingester.ingestURL, a simplified function to scrape a URL (closes #33)
2006-06-26 20:02:30 +00:00
Simon Kornblith
4535b220db Closes #84, make type icon in toolbar match item about to be scraped. It's not perfect, since to get everything right, we'd need to scrape the page as soon as it appears, but it provides a pretty good indication. Multiple items get the folder icon. If there's a better icon out there, it's pretty straightforward to implement. 2006-06-26 18:05:23 +00:00
Simon Kornblith
cb647aa607 remove vestigial code pieces and make usage clearer for Scholar.Utilities.HTTP 2006-06-26 16:32:19 +00:00
Simon Kornblith
ed47e0c84c Forgot to commit updated utilities... 2006-06-26 16:19:44 +00:00
Simon Kornblith
7148852955 make generic Scholar.Utilities class and HTTP-dependent Scholar.Utilities.Ingester and Scholar.Utilities.HTTP classes in preparation for import/export filters; split off into separate javascript file 2006-06-26 14:46:57 +00:00