zotero

Author	SHA1	Message	Date
Simon Kornblith	36a402713c	rename Scholar.Utilities.Ingester.HTTPUtilities to Scholar.Utilities.Ingester.HTTP for consistency	2006-08-11 16:34:22 +00:00
Simon Kornblith	064ecd17db	removes unnecessary pieces of piggy bank API from utilities and updates translators to abide by current translator guidelines	2006-08-11 15:28:18 +00:00
Simon Kornblith	6efd6d2cc4	closes #99 , add options for export	2006-08-08 23:00:33 +00:00
Simon Kornblith	3edb6e0286	closes #86 , steal EndNote download links Scholar should now attempt to process citation information from EndNote download links (MIME types application/x-endnote-refer and application/x-research-info-systems). in situations where Scholar cannot process the information, a standard helper app dialog will appear. this behavior is controlled by the preference extensions.scholar.parseEndNoteMIMETypes.	2006-08-08 21:17:07 +00:00
Simon Kornblith	504ebf8996	closes #162 , do sniffing for import formats import should now work regardless of file extensions. this should make #86 (steal EndNote download links) fairly easy to implement.	2006-08-08 02:46:52 +00:00
Simon Kornblith	216f0c7581	closes #83 , figure out how to implement OpenURL closes #76, implement extensible search/retrieval architecture for obtaining metadata OpenURL COinS lookup is now implemented using a real search architecture system. at the moment, it works with Open WorldCat for books, CrossRef for journal articles (provided the COinS object contains a DOI or an ISSN), and PubMed when a PMID is available.	2006-08-08 01:06:33 +00:00
Simon Kornblith	6626eba844	addresses #83 , figure out how to implement OpenURL OpenURL lookup now works for books. this means that all that's necessary to add scrapable book metadata to a page is an ISBN, as shown below: <span class="Z3988" title="ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.isbn=1579550088"></span> also, we can now scrape Open WorldCat and Wikipedia Book Sources pages with no specialized code involved. i'm still looking for a better way of looking up journal article metadata. it's currently implemented with CrossRef, but CrossRef simply will not work without a DOI, and is also incomplete (only holds the last name of the first author).	2006-08-07 05:15:30 +00:00
Simon Kornblith	e3d062a819	fix inappropriately truncated field values in InnoPAC	2006-08-07 01:49:56 +00:00
Simon Kornblith	2b5b65f4dd	addresses #83 , figure out how to implement OpenURL adds preliminary support for COinS microformat data. does not yet support COinS where there is only a DOI or ISBN.	2006-08-07 00:30:36 +00:00
Simon Kornblith	c0bab22016	bring scrapers into sync with updated database schema	2006-08-06 17:34:41 +00:00
Simon Kornblith	fc589a37cf	closes #131 , make import/export symmetrical all 4 import/export formats currently supported (MODS, Hybrid RDF, Unqualified Dublin Core, and RIS) now work as both import and export translators	2006-08-06 09:34:51 +00:00
Simon Kornblith	9144b56772	addresses #131 , make import/export symmetrical closes #163, make translator API allow creator types besides author import and export in the multi-ontology RDF format should now work properly. collections, notes, and see also are all preserved. more extensive testing will be necessary later.	2006-08-05 20:58:45 +00:00
Simon Kornblith	b4c8dbe700	closes #157 , add database infrastructure for different CSL styles CSL is stored in a new "csl" table. only metadata relevant to updates and selection (ID, date updated, and title) is stored in columns.	2006-08-03 04:54:16 +00:00
Simon Kornblith	6305e4cada	closes #55 , export bibliography to printable version closes #4, Make printable version - moves functions for creating and deleting hidden browser objects to scholar.js (from ingester.js), since these are necessary for printing as well - allows saving bibliography in HTML or printing bibliography. style support is not yet complete (pending finalization of 0.9 version of CSL specification).	2006-07-27 23:01:55 +00:00
Simon Kornblith	c64e5c841f	closes #78 , figure out import/export architecture closes #100, migrate ingester to Scholar.Translate closes #88, migrate scrapers away from RDF closes #9, pull out LC subject heading tags references #87, add fromArray() and toArray() methods to item objects API changes: all translation (import/export/web) now goes through Scholar.Translate all Scholar-specific functions in scrapers start with "Scholar." rather than the jumbled up piggy bank un-namespaced confusion scrapers now longer specify items through RDF (the beginning of an item.fromArray()-like function exists in Scholar.Translate.prototype._itemDone()) scrapers can be any combination of import, export, and web (type is the sum of 1/2/4 respectively) scrapers now contain functions (doImport, doExport, doWeb) rather than loose code scrapers can call functions in other scrapers or just call the function to translate itself export accesses items item-by-item, rather than accepting a huge array of items MARC functions are now in the MARC import translator, and accessed by the web translators new features: import now works rudimentary RDF (unqualified dublin core only), RIS, and MARC import translators are implemented (although they are a little picky with respect to file extensions at the moment) items appear as they are scraped MARC import translator pulls out tags, although this seems to slow things down no icon appears next to a the URL when Scholar hasn't detected metadata, since this seemed somewhat confusing apologizes for the size of this diff. i figured if i was going to re-write the API, i might as well do it all at once and get everything working right.	2006-07-17 04:06:58 +00:00
Simon Kornblith	d65328c830	adds Biblio/DC/FOAF/PRISM/VCard RDF export type. Bruce D'Arcus, author of CiteProc and co-lead on the OpenOffice bibliographic project, is currently using this as his ontology, and we can unambiguously encode all of our metadata with it. caveats: - it's not human readable. mozilla doesn't nest blank nodes, so everything's scattered throughout the file. it would be relatively easy to do post-processing with E4X or even regexps to correct this. - there's no generic callNumber field, so all callNumbers are encoded as LCC. adds container creation routines to dataMode rdf changes Dublin Core export to Unqualified Dublin Core, and removes DC Terms qualifiers	2006-07-07 18:41:21 +00:00
Simon Kornblith	c02666fcd3	add an API for Mozilla's RDF data source, so that import/export translators will be able to create and parse RDF with minimal effort convert Dublin Core export to new API	2006-07-06 21:55:46 +00:00
Simon Kornblith	b7124bd8c1	ack, update scrapers.sql version info	2006-07-06 03:41:18 +00:00
Simon Kornblith	2d8ed16d88	adds export of tags to MODS. adds export of seeAlso info and project hierarchy to RDF. for now, this is embedded in the modsCollection root element. uses nodeIDs for Dublin Core RDF.	2006-07-06 03:39:32 +00:00
Simon Kornblith	c0251085a9	Add export filters for RIS and Dublin Core RDF	2006-07-05 21:44:01 +00:00
Simon Kornblith	8b4a44be0f	fixes a bug that made the Google Books translator not appear adjusts the Google Books translator to work with the latest revision of the site renames the MODS translator to just MODS, because "Metadata Object Description Schema (MODS)" was too long for the export dialog	2006-06-30 19:21:36 +00:00
Simon Kornblith	77282c3edc	- fixes a bug that could result in scrapers using utilities.processDocuments malfunctioning - fixes a bug that could result in the Scrape Progress chrome thingy sticking around forever - makes chrome thingy disappear when URL changes or when tabs are switched	2006-06-29 03:22:10 +00:00
Simon Kornblith	cd25ecc034	I swear I've fixed this bug before, but make multiple item ingest work right for InnoPAC	2006-06-29 02:54:37 +00:00
Simon Kornblith	45b9234996	addresses #78 , figure out import/export architecture - changes scrapers table to translators table; all import/export/web translators now belong in this table - adds Scholar.Translate to handle translation issues. eventually, Scholar.Ingester.Document will become part of this interface - adds Scholar_File_Interface (in fileInterface.js) to handle UI for export and eventually import. (David, when you have time, please connect Scholar_File_Interface.exportFile to a button.) - adds an export translator for MODS. all of our metadata, but not our hierarchy (projects, etc.) translates directly and unambiguously into valid MODS. eventually, we can use RDF or another format to handle hierarchy. - adds utilities.getVersion() and utilities.inArray() for simplified scraper coding - fixes minor interface issues with the nifty chrome scraping status window	2006-06-29 00:56:50 +00:00
Simon Kornblith	19504e6746	- closes #73 , use chrome for "Scraping Progress..." indicator - multiple and book icons were swapped for Voyager scraper	2006-06-27 02:03:10 +00:00
Simon Kornblith	f1cc809f76	Add a generic scraper that will scrape any website, although it may not always find very much information. It looks at META tags, both Dublin Core and otherwise. When tags are ready, we can pull out META keywords.	2006-06-26 20:44:45 +00:00
Simon Kornblith	4242c62b1b	- Fix redundancy in utilities.js (I accidentally copied and pasted a much larger block of code than i meant to) - Move processDocuments, a function for loading a DOM representation of a document or set of documents, to Scholar.Utilities.HTTP - Add Scholar.Ingester.ingestURL, a simplified function to scrape a URL (closes #33)	2006-06-26 20:02:30 +00:00
Simon Kornblith	4535b220db	Closes #84 , make type icon in toolbar match item about to be scraped. It's not perfect, since to get everything right, we'd need to scrape the page as soon as it appears, but it provides a pretty good indication. Multiple items get the folder icon. If there's a better icon out there, it's pretty straightforward to implement.	2006-06-26 18:05:23 +00:00
Simon Kornblith	a33b119dff	grab ISBN from SIRSI 2003+ catalogs	2006-06-26 01:17:29 +00:00
Simon Kornblith	303c6ee68d	closes #41 , get library call number	2006-06-26 01:08:59 +00:00
Simon Kornblith	d73127b1b3	update modification times	2006-06-25 22:01:04 +00:00
Simon Kornblith	f6b0d9a541	search results scraping for InfoTrac. closes #15	2006-06-25 22:00:20 +00:00
Simon Kornblith	1ec834cef2	Search results scraping for Project MUSE	2006-06-25 21:12:14 +00:00
Simon Kornblith	6a627fad0a	Search results scraping for LexisNexis	2006-06-25 20:09:27 +00:00
Simon Kornblith	a48ea7dabf	Search results scraping for ProQuest	2006-06-25 19:32:49 +00:00
Simon Kornblith	7402577806	Add search results scraping for History Cooperative	2006-06-25 18:34:23 +00:00
Simon Kornblith	a9c79f6110	Search results scraping for JSTOR	2006-06-25 18:17:00 +00:00
Simon Kornblith	5e73dcdd2e	- Search results scraping for WorldCat. - Make scraperJavaScript run on reload again, because it makes debugging easier - There's not actually a memory leak in the proxyMonitor code.	2006-06-25 16:13:47 +00:00
Simon Kornblith	9e78d62b13	Better handling of itemTypes, and improved date handling in PubMed scraper.	2006-06-25 05:03:01 +00:00
Simon Kornblith	fd2052e63c	Search results scraping for PubMed and Google Books. This marks the end of what I can do with respect to #15 until I'm at home or CHNM, where I'll have access to the gated collections.	2006-06-24 17:33:35 +00:00
Simon Kornblith	260ce80086	- Search results scraping for TLC. This is the last of the library scrapers. - Minor fixes to ingester utilities.	2006-06-24 15:38:53 +00:00
Simon Kornblith	06cf9e7853	Search results scraping for SIRSI (old versions)	2006-06-24 14:35:05 +00:00
Simon Kornblith	6f19b215f5	Search result scraping for GEAC catalogs	2006-06-23 21:27:32 +00:00
Simon Kornblith	2b58ead7aa	Search results scraping for Dynix	2006-06-23 20:53:29 +00:00
Simon Kornblith	2a74e88416	- Make generalized function for finding search results case insensitive - Scrape DRA search results	2006-06-23 20:09:48 +00:00
Simon Kornblith	8fe72b3e3c	Search results scraping for VTLS	2006-06-23 19:22:24 +00:00
Simon Kornblith	641d7054cc	- Fixed some bugs in the InnoPAC scraper (search results) - Made an Aleph search results scraper that works correctly on most sites, and degrades nicely when it doesn't	2006-06-23 17:35:57 +00:00
Simon Kornblith	83c36f330d	Scrapable search results for SIRSI 2003+ scraper	2006-06-23 16:17:53 +00:00
Simon Kornblith	9742283389	InnoPAC scraper now handles search results pages	2006-06-23 14:12:34 +00:00
Simon Kornblith	098078627c	- Make events listening for DOMContentLoaded listen for load, because DOMContentLoaded does not seem ready for prime time (hey, it's undocumented, what can you expect) - Make Amazon scraper work with multiple documents - Fix bugs in processDocuments - Make Scholar.Ingester.Utilities.getItemArray() willing to take an array of DOM nodes to search for links, and finally take advantage of the fact that objects have no length	2006-06-23 03:02:30 +00:00

74 commits