// Scholar for Firefox Translate Engine

/*
 * Scholar.Translate: a class for translation of Scholar metadata from and to
 * other formats
 *
 * eventually, Scholar.Ingester may be rolled in here (i.e., after we get rid
 * of RDF)
 *
 * type can be:
 * export
 * import
 * web
 * search
 *
 * a typical export process:
 * var translatorObj = new Scholar.Translate();
 * var possibleTranslators = translatorObj.getTranslators();
 * // do something involving nsIFilePicker; remember, each possibleTranslator
 * // object has properties translatorID, label, and targetID
 * translatorObj.setLocation(myNsILocalFile);
 * translatorObj.setTranslator(possibleTranslators[x]); // also accepts only an ID
 * translatorObj.setHandler("done", _translationDone);
 * translatorObj.translate();
 *
 *
 * PUBLIC PROPERTIES:
 *
 * type - the text type of translator (set by constructor; should be read-only)
 * document - the document object to be used for web scraping (read-only; set
 *            with setDocument)
 * translator - the translator currently in use (read-only; set with
 *              setTranslator)
 * location - the location of the target (read-only; set with setLocation)
 *            for import/export, this is an instance of nsILocalFile
 *            for web, this is a URL
 * search - item (in toArray() format) to extrapolate data for (read-only; set
 *          with setSearch)
 * items - items (in Scholar.Item format) to be exported. if this is empty,
 *         Scholar will export all items in the library (read-only; set with
 *         setItems). setting items disables export of collections.
 * path - the path to the target; for web, this is the same as location
 * string - the string content to be used as a file
 * saveItem - whether new items should be saved to the database. defaults to
 *            true; set using the second argument of the constructor.
 * newItems - items created when translate() was called
 * newCollections - collections created when translate() was called
 *
 * PRIVATE PROPERTIES:
 *
 * _numericTypes - possible numeric types as a comma-delimited string
 * _handlers - handlers for various events (see setHandler)
 * _configOptions - options set by the translator modifying the behavior of
 *                  Scholar.Translate
 * _displayOptions - options available to the user for this specific translator
 * _waitForCompletion - whether to wait for asynchronous completion, or return
 *                      immediately when the script has finished executing
 * _sandbox - the sandbox in which translators are executed
 * _streams - streams that need to be closed when execution is complete
 * _IDMap - a map from IDs as specified in Scholar.Item() to IDs of actual items
 * _parentTranslator - set when a translator is called from another translator.
 *                     among other things, disables passing of the translate
 *                     object to handlers and modifies the complete() function
 *                     on returned items
 * _storage - the stored string to be treated as input
 * _storageLength - the length of the stored string
 * _exportFileDirectory - the directory to which files will be exported
 * _hasBOM - whether the file to be imported has a BOM or not
 *
 * WEB-ONLY PRIVATE PROPERTIES:
 *
 * _locationIsProxied - whether the URL being scraped is going through
 *                      an EZProxy
 * _downloadAssociatedFiles - whether to download associated content, according
 *                            to preferences
 */

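The `_handlers` property and `setHandler()` described above imply a simple per-event handler registry: multiple handlers may be registered per event name and are run in registration order. A minimal sketch of that pattern; `HandlerRegistry` and `runHandler` are illustrative names, not Scholar's actual implementation:

```javascript
// minimal per-event handler registry, as implied by _handlers/setHandler;
// names here are illustrative, not Scholar's API
function HandlerRegistry() {
	this._handlers = new Array();
}

// register a handler function for an event type, e.g. "done"
HandlerRegistry.prototype.setHandler = function(type, handler) {
	if(!this._handlers[type]) {
		this._handlers[type] = new Array();
	}
	this._handlers[type].push(handler);
}

// call every handler registered for an event type, collecting return values;
// an event with no registered handlers simply yields an empty list
HandlerRegistry.prototype.runHandler = function(type, argument) {
	var returnValues = [];
	if(this._handlers[type]) {
		for(var i=0; i<this._handlers[type].length; i++) {
			returnValues.push(this._handlers[type][i](argument));
		}
	}
	return returnValues;
}
```

This mirrors the usage in the walkthrough above, where `translatorObj.setHandler("done", _translationDone)` registers a callback to be fired when translation finishes.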
Scholar.Translate = function(type, saveItem) {
	this.type = type;
	
	// import = 0001 = 1
	// export = 0010 = 2
	// web    = 0100 = 4
	// search = 1000 = 8
	
	// combination types are determined by addition or bitwise OR
	// i.e., import+export = 1+2 = 3
	this._numericTypes = "";
	for(var i=0; i<=1; i++) {
		for(var j=0; j<=1; j++) {
			for(var k=0; k<=1; k++) {
				if(type == "import") {
					this._numericTypes += ","+parseInt(i.toString()+j.toString()+k.toString()+"1", 2);
				} else if(type == "export") {
					this._numericTypes += ","+parseInt(i.toString()+j.toString()+"1"+k.toString(), 2);
				} else if(type == "web") {
					this._numericTypes += ","+parseInt(i.toString()+"1"+j.toString()+k.toString(), 2);
				} else if(type == "search") {
					this._numericTypes += ","+parseInt("1"+i.toString()+j.toString()+k.toString(), 2);
				} else {
					throw("invalid translation type");
				}
			}
		}
	}
	this._numericTypes = this._numericTypes.substr(1);
	
	if(saveItem === false) {	// strict equality (===) means that if saveItem
								// is left undefined, this.saveItem is still true
		this.saveItem = false;
	} else {
		this.saveItem = true;
	}
	
	this._handlers = new Array();
	this._streams = new Array();
}
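The type flags and the `_numericTypes` loop in the constructor can be sketched on their own. The flag values (import=1, export=2, web=4, search=8) come from the comments above, but `TRANSLATOR_TYPES`, `combineTypes`, and `numericTypesFor` are illustrative names, not part of Scholar's API:

```javascript
// flag values from the comments above; the names here are illustrative only
var TRANSLATOR_TYPES = {"import": 1, "export": 2, "web": 4, "search": 8};

// a combined type is the sum (equivalently, bitwise OR) of its flags;
// e.g. an import+web scraper has type 1 + 4 = 5
function combineTypes(names) {
	var combined = 0;
	for(var i=0; i<names.length; i++) {
		combined |= TRANSLATOR_TYPES[names[i]];
	}
	return combined;
}

// the constructor's triple loop enumerates every 4-bit value that contains
// a given flag; this is equivalent to testing each value from 1 to 15
function numericTypesFor(flag) {
	var types = [];
	for(var n=1; n<=15; n++) {
		if(n & flag) {
			types.push(n);
		}
	}
	return types.join(",");
}
```

For example, `numericTypesFor(1)` produces `"1,3,5,7,9,11,13,15"`, the same comma-delimited string the constructor builds when `type == "import"`.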

/*
 * (singleton) initializes scrapers, loading from the database and separating
 * into types
 */
Scholar.Translate.init = function() {
	if(!Scholar.Translate.cache) {
		var cachePref = Scholar.Prefs.get("cacheTranslatorData");
		
		if(cachePref) {
			// fetch translator list
			var translators = Scholar.DB.query("SELECT translatorID, type, label, "+
				"target, detectCode IS NULL as noDetectCode FROM translators "+
				"ORDER BY target IS NULL, translatorID = '14763d24-8ba0-45df-8f52-b8d1108e7ac9' DESC");
			var detectCodes = Scholar.DB.query("SELECT translatorID, detectCode FROM translators WHERE target IS NULL");
			
			Scholar.Translate.cache = new Object();
			Scholar.Translate.cache["import"] = new Array();
			Scholar.Translate.cache["export"] = new Array();
			Scholar.Translate.cache["web"] = new Array();
			Scholar.Translate.cache["search"] = new Array();
			
			for each(var translator in translators) {
				var type = translator.type;
				
				// copy the database row into a plain object so additional
				// properties can be attached
				var wrappedTranslator = {translatorID:translator.translatorID,
					label:translator.label,
					target:translator.target};
				
				if(translator.noDetectCode) {
					wrappedTranslator.noDetectCode = true;
				}
				
				// import translator
				var mod = type % 2;
				if(mod) {
					var regexp = new RegExp();
					// "\\." escapes the dot in the regexp; a bare "\." in a
					// string literal is just ".", which matches any character
					regexp.compile("\\."+translator.target+"$", "i");
					wrappedTranslator.importRegexp = regexp;
					Scholar.Translate.cache["import"].push(wrappedTranslator);
					type -= mod;
				}
				// export translator
				var mod = type % 4;
				if(mod) {
					Scholar.Translate.cache["export"].push(wrappedTranslator);
					type -= mod;
				}
				// web translator
				var mod = type % 8;
				if(mod) {
					var regexp = new RegExp();
					regexp.compile(translator.target, "i");
					wrappedTranslator.webRegexp = regexp;
					Scholar.Translate.cache["web"].push(wrappedTranslator);
					
					if(!translator.target) {
						for each(var detectCode in detectCodes) {
							if(detectCode.translatorID == translator.translatorID) {
								wrappedTranslator.detectCode = detectCode.detectCode;
							}
						}
					}
					type -= mod;
				}
				// search translator
				var mod = type % 16;
				if(mod) {
					Scholar.Translate.cache["search"].push(wrappedTranslator);
					type -= mod;
				}
			}
		}
	}
}
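The modulo checks in init() above peel capabilities off a combined numeric type one bit at a time, subtracting each flag once it has been handled. That pattern can be sketched on its own; `decomposeType` is an illustrative name, not Scholar's API:

```javascript
// successive modulos peel each flag bit off a combined type, mirroring the
// "var mod = type % 2 / 4 / 8 / 16" checks in init()
function decomposeType(type) {
	var capabilities = [];
	var flags = [["import", 2], ["export", 4], ["web", 8], ["search", 16]];
	for(var i=0; i<flags.length; i++) {
		var mod = type % flags[i][1];
		if(mod) {
			capabilities.push(flags[i][0]);
			type -= mod;	// clear the bit so later modulos see only higher flags
		}
	}
	return capabilities;
}
```

For example, a combined type of 5 (import+web) decomposes into both capabilities, matching the cache lists the translator would be pushed onto.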

/*
 * sets the document to be used for web translation; also sets the location
 */
Scholar.Translate.prototype.setDocument = function(doc) {
	this.document = doc;
	this.setLocation(doc.location.href);
}

/*
 * sets the item to be used for searching
 */
Scholar.Translate.prototype.setSearch = function(search) {
	this.search = search;
}

/*
 * sets the items to be used for export
 */
Scholar.Translate.prototype.setItems = function(items) {
	this.items = items;
}

/*
 * sets the location to operate upon (file should be an nsILocalFile object or
 * web address)
 */
Scholar.Translate.prototype.setLocation = function(location) {
	if(this.type == "web") {
		// account for proxies
		this.location = Scholar.Ingester.ProxyMonitor.proxyToProper(location);
		if(this.location != location) {
			// figure out if this URL is being proxied
			this.locationIsProxied = true;
		}
		this.path = this.location;
	} else {
		this.location = location;
		if(this.location instanceof Components.interfaces.nsIFile) { // if a file
			this.path = location.path;
		} else { // if a url
			this.path = location;
		}
	}
}

/*
 * sets the string to be used as a file
 */
Scholar.Translate.prototype.setString = function(string) {
	this._storage = string;
	this._storageLength = string.length;
	this._storagePointer = 0;
}

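The `_storage`/`_storagePointer` pair lets import code read from an in-memory string as if it were a file. How the pointer is consumed is not shown in this excerpt; the following is a minimal standalone sketch of the presumed pattern (`makeStringStorage` and its `read` method are hypothetical illustrations, not engine APIs):

```javascript
// Hypothetical sketch: read successive chunks from string-backed storage,
// advancing a pointer the way a file read would.
function makeStringStorage(string) {
	var storage = { _storage: string, _storageLength: string.length, _storagePointer: 0 };
	storage.read = function(bytes) {
		if(this._storagePointer >= this._storageLength) return false; // EOF
		var chunk = this._storage.substr(this._storagePointer, bytes);
		this._storagePointer += chunk.length;
		return chunk;
	};
	return storage;
}
```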
/*
 * sets translator display options. you can also pass a translator (not ID) to
 * setTranslator that includes a displayOptions argument
 */
Scholar.Translate.prototype.setDisplayOptions = function(displayOptions) {
	this._setDisplayOptions = displayOptions;
}

/*
 * sets the translator to be used for import/export
 *
 * accepts either the object from getTranslators() or an ID
 */
Scholar.Translate.prototype.setTranslator = function(translator) {
	if(!translator) {
		throw("cannot set translator: invalid value");
	}

	this._setDisplayOptions = null;

	if(typeof(translator) == "object") { // passed an object and not an ID
		if(translator.translatorID) {
			if(translator.displayOptions) {
				this._setDisplayOptions = translator.displayOptions;
			}

			translator = [translator.translatorID];
		} else {
			// we have an associative array of translators
			if(this.type != "search") {
				throw("cannot set translator: a single translator must be specified when doing "+this.type+" translation");
			}
			// accept a list of objects
			for(var i in translator) {
				if(typeof(translator[i]) == "object") {
					if(translator[i].translatorID) {
						translator[i] = translator[i].translatorID;
					} else {
						throw("cannot set translator: must specify a single translator or a list of translators");
					}
				}
			}
		}
	} else {
		translator = [translator];
	}

	if(!translator.length) {
		return false;
	}

	var where = "";
	for(var i in translator) {
		where += " OR translatorID = ?";
	}
	where = where.substr(4);

	var sql = "SELECT * FROM translators WHERE "+where+" AND type IN ("+this._numericTypes+")";
	this.translator = Scholar.DB.query(sql, translator);
	if(!this.translator) {
		return false;
	}

	return true;
}

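The loop in setTranslator builds one ` OR translatorID = ?` fragment per ID and then strips the leading ` OR ` (four characters) with `substr(4)`, yielding a parameterized WHERE clause. The same construction in isolation (`buildWhere` is a hypothetical name for illustration):

```javascript
// Build a parameterized WHERE fragment for a list of translator IDs,
// mirroring the " OR translatorID = ?" / substr(4) idiom above.
function buildWhere(ids) {
	var where = "";
	for(var i in ids) {
		where += " OR translatorID = ?";
	}
	return where.substr(4);
}
```

Using placeholders this way and passing the ID array to Scholar.DB.query keeps the values out of the SQL string itself.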
/*
 * registers a handler function to be called when a translation event occurs
 *
 * as the first argument, all handlers will be passed the current function. the
 * second argument is dependent on the handler.
 *
 * select
 * valid: web
 * called: when the user needs to select from a list of available items
 * passed: an associative array in the form id => text
 * returns: a numerically indexed array of ids, as extracted from the passed
 * string
 *
 * itemCount
 * valid: export
 * called: when the export begins
 * passed: the number of items to be processed
 * returns: N/A
 *
 * itemDone
 * valid: import, web, search
 * called: when an item has been processed; may be called asynchronously
 * passed: an item object (see Scholar.Item)
 * returns: N/A
 *
 * collectionDone
 * valid: import
 * called: when a collection has been processed, after all items have been
 * added; may be called asynchronously
 * passed: a collection object (see Scholar.Collection)
 * returns: N/A
 *
 * done
 * valid: all
 * called: when all processing is finished
 * passed: true if successful, false if an error occurred
 * returns: N/A
 */
Scholar.Translate.prototype.setHandler = function(type, handler) {
	if(!this._handlers[type]) {
		this._handlers[type] = new Array();
	}
	this._handlers[type].push(handler);
}

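setHandler keeps one array of callbacks per event type, so multiple listeners can observe the same event. A standalone sketch of that registry pattern (`runHandlers` is a hypothetical stand-in for the engine's internal dispatch, which this excerpt does not show):

```javascript
// Minimal handler registry mirroring setHandler: one callback list per
// event type ("itemDone", "done", ...), run in registration order.
var handlers = {};

function setHandler(type, handler) {
	if(!handlers[type]) {
		handlers[type] = [];
	}
	handlers[type].push(handler);
}

function runHandlers(type, argument) {
	for(var i in handlers[type]) {
		handlers[type][i](argument);
	}
}
```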
/*
 * gets all applicable translators
 *
 * for import, you should call this after setFile; otherwise, you'll just get
 * a list of all import filters, not filters equipped to handle a specific file
 *
 * this returns a list of translator objects, of which the following fields
 * are useful:
 *
 * translatorID - the GUID of the translator
 * label - the name of the translator
 * itemType - the type of item this scraper says it will scrape
 */
Scholar.Translate.prototype.getTranslators = function() {
	// clear BOM
	this._hasBOM = null;

	if(Scholar.Translate.cache) {
		var translators = Scholar.Translate.cache[this.type];
	} else {
		var sql = "SELECT translatorID, label, target, detectCode IS NULL as "+
			"noDetectCode FROM translators WHERE type IN ("+this._numericTypes+") "+
			"ORDER BY target IS NULL, translatorID = '14763d24-8ba0-45df-8f52-b8d1108e7ac9' DESC";
		var translators = Scholar.DB.query(sql);
	}

	// create a new sandbox
	this._generateSandbox();

	Scholar.debug("searching for translators for "+(this.path ? this.path : "an undisclosed location"));

	// see which translators can translate
	var possibleTranslators = this._findTranslators(translators);

	this._closeStreams();

	return possibleTranslators;
}

/*
 * finds applicable translators from a list. if the second argument is given,
 * extension-based exclusion is inverted, so that only detectCode is used to
 * determine if a translator can be run.
 */
Scholar.Translate.prototype._findTranslators = function(translators, ignoreExtensions) {
	var possibleTranslators = new Array();
	for(var i in translators) {
		if(this._canTranslate(translators[i], ignoreExtensions)) {
			Scholar.debug("found translator "+translators[i].label);

			// for some reason, and i'm not quite sure what this reason is,
			// we HAVE to do this to get things to work right; we can't
			// just push a normal translator object from an SQL statement
			var translator = {translatorID:translators[i].translatorID,
				label:translators[i].label,
				target:translators[i].target,
				itemType:translators[i].itemType};
			if(this.type == "export") {
				translator.displayOptions = this._displayOptions;
			}

			possibleTranslators.push(translator);
		}
	}

	if(!possibleTranslators.length && this.type == "import" && !ignoreExtensions) {
		Scholar.debug("looking a second time");
		// try search again, ignoring file extensions
		return this._findTranslators(translators, true);
	}

	return possibleTranslators;
}

/*
 * loads a translator into a sandbox
 */
Scholar.Translate.prototype._loadTranslator = function() {
	if(!this._sandbox || this.type == "search") {
		// create a new sandbox if none exists, or for searching (so that it's
		// bound to the correct url)
		this._generateSandbox();
	}

	// parse detect code for the translator
	this._parseDetectCode(this.translator[0]);

	Scholar.debug("parsing code for "+this.translator[0].label);

	try {
		Components.utils.evalInSandbox(this.translator[0].code, this._sandbox);
	} catch(e) {
		var error = e+' in parsing code for '+this.translator[0].label;
		if(this._parentTranslator) {
			throw error;
		} else {
			Scholar.debug(error);
			this._translationComplete(false);
			return false;
		}
	}

	return true;
}

|
|
|
|
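The try/catch above wraps translator parsing so that a failure is reported with the offending translator's label attached. A minimal standalone sketch of that pattern (plain JavaScript, using an eval-like callback in place of Mozilla's Components.utils.evalInSandbox; the function name is illustrative, not part of the actual API):

```javascript
// Evaluate translator code via the supplied eval-like function; on failure,
// rethrow with the translator's label appended for context, mirroring the
// error string built above.
function parseTranslatorCode(translator, evalFn) {
	try {
		evalFn(translator.code);
	} catch(e) {
		throw e + " in parsing code for " + translator.label;
	}
	return true;
}
```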
/*
 * does the actual translation
 */
Scholar.Translate.prototype.translate = function() {
	Scholar.debug("translate called");
	
	/*
	 * initialize properties
	 */
	this.newItems = new Array();
	this.newCollections = new Array();
	this._IDMap = new Array();
	this._complete = false;
	this._hasBOM = null;
	
	if(!this.translator || !this.translator.length) {
		throw("cannot translate: no translator specified");
	}
	
	if(!this.location && this.type != "search" && !this._storage) {
		// searches operate differently, because we could have an array of
		// translators and have to go through each
		throw("cannot translate: no location specified");
	}
	
	if(!this._loadTranslator()) {
		return;
	}
	
	if(this._setDisplayOptions) {
		this._displayOptions = this._setDisplayOptions;
	}
	
	if(this._storage) {
		// enable reading from storage, which we can't do until the translator
		// is loaded
		this._storageFunctions(true);
	}
	
	var returnValue;
	if(this.type == "web") {
		returnValue = this._web();
	} else if(this.type == "import") {
		returnValue = this._import();
	} else if(this.type == "export") {
		returnValue = this._export();
	} else if(this.type == "search") {
		returnValue = this._search();
	}
	
	if(!returnValue) {
		// failure
		this._translationComplete(false);
	} else if(!this._waitForCompletion) {
		// if synchronous, call _translationComplete()
		this._translationComplete(true);
	}
}
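translate() dispatches on a single type string, but per the commit notes at the top of this file a translator's supported modes are stored as a bitmask sum (import = 1, export = 2, web = 4). A small runnable sketch of decoding that encoding (the constant and function names here are illustrative, not part of the actual API):

```javascript
var TRANSLATOR_IMPORT = 1;
var TRANSLATOR_EXPORT = 2;
var TRANSLATOR_WEB = 4;

// true if the translator's type bitmask includes the given mode;
// e.g. a translator with type 3 handles import and export, but not web
function translatorSupports(translatorType, mode) {
	return (translatorType & mode) != 0;
}
```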

/*
 * generates a sandbox for scraping/scraper detection
 */
Scholar.Translate._searchSandboxRegexp = new RegExp();
Scholar.Translate._searchSandboxRegexp.compile("^http://[\\w.]+/");
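_generateSandbox below uses this regular expression to turn a search translator's regexp-style target into a plain scheme-and-host URL for the sandbox: strip the escaping backslashes and ^ anchors, then keep the leading http://host/ prefix. A runnable sketch of that derivation (the helper name and example target are illustrative):

```javascript
var searchSandboxRegexp = /^http:\/\/[\w.]+\//;

function sandboxURLFromTarget(target) {
	// remove regexp escaping and anchors, as _generateSandbox does
	var tempURL = target.replace(/\\/g, "").replace(/\^/g, "");
	var m = searchSandboxRegexp.exec(tempURL);
	// keep only the scheme-and-host prefix, or "" if the target has none
	return m ? m[0] : "";
}
```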
Scholar.Translate.prototype._generateSandbox = function() {
	var me = this;
	
	if(this.type == "web" || this.type == "search") {
		// get sandbox URL
		var sandboxURL = "";
		
		if(this.type == "web") {
			// use real URL, not proxied version, to create sandbox
			sandboxURL = this.document.location.href;
		} else {
			// generate sandbox for search by extracting domain from translator
			// target, if one exists
			if(this.translator && this.translator[0] && this.translator[0].target) {
				// so that web translators work too
				var tempURL = this.translator[0].target.replace(/\\/g, "").replace(/\^/g, "");
				var m = Scholar.Translate._searchSandboxRegexp.exec(tempURL);
				if(m) {
					sandboxURL = m[0];
				}
			}
		}
		
		Scholar.debug("binding sandbox to "+sandboxURL);
		this._sandbox = new Components.utils.Sandbox(sandboxURL);
		this._sandbox.Scholar = new Object();
		
		// add ingester utilities
		this._sandbox.Scholar.Utilities = new Scholar.Utilities.Ingester(this);
		this._sandbox.Scholar.Utilities.HTTP = new Scholar.Utilities.Ingester.HTTP(this);
		
		// set up selectItems handler
		this._sandbox.Scholar.selectItems = function(options) { return me._selectItems(options) };
	} else {
		// use null URL to create sandbox
		this._sandbox = new Components.utils.Sandbox("");
		this._sandbox.Scholar = new Object();
		
		this._sandbox.Scholar.Utilities = new Scholar.Utilities();
	}
	
	if(this.type == "export") {
		// add routines to retrieve items and collections
		this._sandbox.Scholar.nextItem = function() { return me._exportGetItem() };
		this._sandbox.Scholar.nextCollection = function() { return me._exportGetCollection() };
	} else {
		// copy routines to add new items
		this._sandbox.Scholar.Item = Scholar.Translate.GenerateScholarItemClass();
		this._sandbox.Scholar.Item.prototype.complete = function() { me._itemDone(this) };
		
		if(this.type == "import") {
			// add routines to add new collections
			this._sandbox.Scholar.Collection = Scholar.Translate.GenerateScholarItemClass();
			// attach the function to be run when a collection is done
			this._sandbox.Scholar.Collection.prototype.complete = function() { me._collectionDone(this) };
		}
	}
	
	this._sandbox.XPathResult = Components.interfaces.nsIDOMXPathResult;
	
	// for asynchronous operation, use wait()
	// done() is implemented after wait() is called
	this._sandbox.Scholar.wait = function() { me._enableAsynchronous() };
	// for adding configuration options
	this._sandbox.Scholar.configure = function(option, value) { me._configure(option, value) };
	// for adding displayed options
	this._sandbox.Scholar.addOption = function(option, value) { me._addOption(option, value) };
	// for getting the value of displayed options
	this._sandbox.Scholar.getOption = function(option) { return me._getOption(option) };
	
	// for loading other translators and accessing their methods
	this._sandbox.Scholar.loadTranslator = function(type) {
		var translation = new Scholar.Translate(type, false);
		translation._parentTranslator = me;
		
		// check against me (the calling translator), not this, since this is
		// the sandboxed caller's context when loadTranslator() is invoked
		if(type == "export" && (me.type == "web" || me.type == "search")) {
			throw("for security reasons, web and search translators may not call export translators");
		}
		
		// for security reasons, safeTranslator wraps the translator object.
		// note that setLocation() is not allowed
		var safeTranslator = new Object();
		safeTranslator.setSearch = function(arg) { return translation.setSearch(arg) };
		safeTranslator.setBrowser = function(arg) { return translation.setBrowser(arg) };
		safeTranslator.setHandler = function(arg1, arg2) { translation.setHandler(arg1, arg2) };
		safeTranslator.setString = function(arg) { translation.setString(arg) };
		safeTranslator.setTranslator = function(arg) { return translation.setTranslator(arg) };
		safeTranslator.getTranslators = function() { return translation.getTranslators() };
		
		safeTranslator.translate = function() {
			// if the child translation has no handlers of its own, inherit
			// defaults from the parent before translating
			var noHandlers = true;
			for(var i in translation._handlers) {
				noHandlers = false;
				break;
			}
			if(noHandlers) {
				if(type != "export") {
					translation.setHandler("itemDone", function(obj, item) { item.complete() });
				}
				if(type == "web") {
					translation.setHandler("selectItems", me._handlers["selectItems"]);
				}
			}
			
			return translation.translate();
		};
|
|
|
|
		safeTranslator.getTranslatorObject = function() {
			// load the translator into our sandbox
			translation._loadTranslator();
			
			// initialize internal IO
			translation._initializeInternalIO();
			
			var noHandlers = true;
			for(var i in translation._handlers) {
				noHandlers = false;
				break;
			}
			if(noHandlers) {
				if(type != "export") {
					translation.setHandler("itemDone", function(obj, item) { item.complete() });
				}
				if(type == "web") {
					translation.setHandler("selectItems", me._handlers["selectItems"]);
				}
			}
			
			// return sandbox
			return translation._sandbox;
		};
		
		return safeTranslator;
	}
}

/*
 * Check to see if _scraper_ can scrape this document
 */
Scholar.Translate.prototype._canTranslate = function(translator, ignoreExtensions) {
	if((this.type == "import" || this.type == "web") && !this.location) {
		// if no location yet (e.g., getting list of possible web translators),
		// just return true
		return true;
	}
	
	// Test location with regular expression
	if(translator.target && (this.type == "import" || this.type == "web")) {
		var canTranslate = false;
		
		if(this.type == "web") {
			if(translator.webRegexp) {
				var regularExpression = translator.webRegexp;
			} else {
				var regularExpression = new RegExp(translator.target, "i");
			}
		} else {
			if(translator.importRegexp) {
				var regularExpression = translator.importRegexp;
			} else {
				var regularExpression = new RegExp("\\."+translator.target+"$", "i");
			}
		}
		
		if(regularExpression.test(this.path)) {
			canTranslate = true;
		}
		
		if(ignoreExtensions) {
			// if we're ignoring extensions, that means we already tried
			// everything without ignoring extensions and it didn't work
			canTranslate = !canTranslate;
			
			// if a translator has no detectCode, don't offer it as an option
			if(translator.noDetectCode) {
				return false;
			}
		}
	} else {
		var canTranslate = true;
	}
	
	// Test with JavaScript if available and didn't have a regular expression or
	// passed regular expression test
	if(!translator.target || canTranslate) {
		// parse the detect code and execute
		this._parseDetectCode(translator);
		
		if(this.type == "import") {
			try {
				this._importConfigureIO();	// so it can read
			} catch(e) {
				Scholar.debug(e+' in opening IO for '+translator.label);
				return false;
			}
		}
		
		if((this.type == "web" && this._sandbox.detectWeb) ||
			(this.type == "search" && this._sandbox.detectSearch) ||
			(this.type == "import" && this._sandbox.detectImport) ||
			(this.type == "export" && this._sandbox.detectExport)) {
			var returnValue;
			
			try {
				if(this.type == "web") {
					returnValue = this._sandbox.detectWeb(this.document, this.location);
				} else if(this.type == "search") {
					returnValue = this._sandbox.detectSearch(this.search);
				} else if(this.type == "import") {
					returnValue = this._sandbox.detectImport();
				} else if(this.type == "export") {
					returnValue = this._sandbox.detectExport();
				}
|
|
|
|
} catch(e) {
|
|
|
|
Scholar.debug(e+' in executing detectCode for '+translator.label);
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
Scholar.debug("executed detectCode for "+translator.label);
|
|
|
|
|
|
|
|
// detectCode returns text type
|
|
|
|
if(returnValue) {
|
|
|
|
canTranslate = true;
|
|
|
|
|
|
|
|
if(typeof(returnValue) == "string") {
|
|
|
|
translator.itemType = returnValue;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
canTranslate = false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return canTranslate;
|
|
|
|
}

/*
 * parses translator detect code
 */
Scholar.Translate.prototype._parseDetectCode = function(translator) {
	this._configOptions = new Array();
	this._displayOptions = new Array();
	
	if(translator.detectCode) {
		var detectCode = translator.detectCode;
	} else if(!translator.noDetectCode) {
		// get detect code from database
		var detectCode = Scholar.DB.valueQuery("SELECT detectCode FROM translators WHERE translatorID = ?",
			[translator.translatorID]);
	}
	
	if(detectCode) {
		try {
			Components.utils.evalInSandbox(detectCode, this._sandbox);
		} catch(e) {
			Scholar.debug(e+' in parsing detectCode for '+translator.label);
			return;
		}
	}
}

/*
 * sets an option that modifies the way the translator is executed
 *
 * called as configure() in translator detectCode
 *
 * current options:
 *
 * dataMode
 *     valid: import, export
 *     options: rdf, block, line
 *     purpose: selects whether write/read behave as standard text functions
 *              or use Mozilla's built-in support for RDF data sources
 *
 * getCollections
 *     valid: export
 *     options: true, false
 *     purpose: selects whether the export translator will receive an array of
 *              collections and children in addition to the array of items and
 *              children
 */
Scholar.Translate.prototype._configure = function(option, value) {
	this._configOptions[option] = value;
	Scholar.debug("setting configure option "+option+" to "+value);
}

/*
 * adds translator options to be displayed in a dialog
 *
 * called as addOption() in detect code
 *
 * current options are exportNotes and exportFileData
 */
Scholar.Translate.prototype._addOption = function(option, value) {
	this._displayOptions[option] = value;
	Scholar.debug("setting display option "+option+" to "+value);
}

/*
 * gets translator options that were displayed in a dialog
 *
 * called as getOption() in detect code
 */
Scholar.Translate.prototype._getOption = function(option) {
	return this._displayOptions[option];
}
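// The three option helpers above form a tiny key/value store split into
// execution options and dialog-visible options. The following is a
// hypothetical standalone sketch of how a translator's detectCode might
// exercise them; the bare configure/addOption/getOption functions and the
// plain objects stand in for the instance properties used above.

```javascript
// Hypothetical standalone versions of _configOptions/_displayOptions.
var configOptions = {};
var displayOptions = {};

// configure(): execution options such as dataMode or getCollections
function configure(option, value) {
	configOptions[option] = value;
}

// addOption()/getOption(): user-visible options such as exportNotes
function addOption(option, value) {
	displayOptions[option] = value;
}
function getOption(option) {
	return displayOptions[option];
}

// typical detectCode for an RDF-based export translator
configure("dataMode", "rdf");
configure("getCollections", true);
addOption("exportNotes", true);
```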

/*
 * makes translation API wait until done() has been called from the translator
 * before executing _translationComplete
 *
 * called as wait() in translator code
 */
Scholar.Translate.prototype._enableAsynchronous = function() {
	var me = this;
	this._waitForCompletion = true;
	this._sandbox.Scholar.done = function() { me._translationComplete(true) };
}

/*
 * lets the user pick which items to put in the library
 *
 * called as selectItems() in translator code
 */
Scholar.Translate.prototype._selectItems = function(options) {
	// hack to see if there are options
	var haveOptions = false;
	for(var i in options) {
		haveOptions = true;
		break;
	}
	
	if(!haveOptions) {
		throw "translator called select items with no items";
	}
	
	if(this._handlers.select) {
		return this._runHandler("select", options);
	} else {	// no handler defined; assume they want all of them
		return options;
	}
}
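// The for...in "hack" above only tests whether the options object has any
// enumerable keys (Object.keys() did not exist in the JavaScript engines of
// this era). A minimal standalone sketch of the same check; the hasAnyKey
// name is hypothetical.

```javascript
// Returns true as soon as the for...in loop yields any enumerable
// property, mirroring the haveOptions check in _selectItems.
function hasAnyKey(obj) {
	for (var key in obj) {
		return true;
	}
	return false;
}
```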

/*
 * executed on translator completion, either automatically from a synchronous
 * scraper or as done() from an asynchronous scraper
 *
 * finishes things up and calls callback function(s)
 */
Scholar.Translate.prototype._translationComplete = function(returnValue) {
	// to make sure this isn't called twice
	if(!this._complete) {
		this._complete = true;
		
		if(this.type == "search" && !this._itemsFound && this.translator.length > 1) {
			// if we're performing a search and didn't get any results, go on
			// to the next translator
			this.translator.shift();
			this.translate();
		} else {
			Scholar.debug("translation complete");
			
			// close open streams
			this._closeStreams();
			
			if(Scholar.Notifier.isEnabled()) {
				// notify itemTreeView about updates
				if(this.newItems.length) {
					Scholar.Notifier.trigger("add", "item", this.newItems);
				}
				// notify collectionTreeView about updates
				if(this.newCollections && this.newCollections.length) {
					Scholar.Notifier.trigger("add", "collection", this.newCollections);
				}
			}
			
			// call handlers
			this._runHandler("done", returnValue);
		}
	}
}

/*
 * closes open file streams, if any exist
 */
Scholar.Translate.prototype._closeStreams = function() {
	// serialize RDF and unregister dataSource
	if(this._rdf) {
		if(this._rdf.serializer) {
			this._rdf.serializer.Serialize(this._streams[0]);
		}
		
		try {
			var rdfService = Components.classes["@mozilla.org/rdf/rdf-service;1"].
			                 getService(Components.interfaces.nsIRDFService);
			rdfService.UnregisterDataSource(this._rdf.dataSource);
		} catch(e) {}
		
		delete this._rdf.dataSource;
	}
	
	if(this._streams.length) {
		for(var i in this._streams) {
			var stream = this._streams[i];
			
			// stream could be either an input stream or an output stream
			try {
				stream.QueryInterface(Components.interfaces.nsIFileInputStream);
			} catch(e) {
				try {
					stream.QueryInterface(Components.interfaces.nsIFileOutputStream);
				} catch(e) {
				}
			}
			
			// enclose close() in a try block, because the stream may
			// already be closed
			try {
				stream.close();
			} catch(e) {
			}
		}
	}
	
	delete this._streams;
	this._streams = new Array();
	this._inputStream = null;
}

/*
 * imports an attachment from the disk
 */
Scholar.Translate.prototype._itemImportAttachment = function(attachment, sourceID) {
	if(!attachment.path) {
		// create from URL
		if(attachment.url) {
			var attachmentID = Scholar.Attachments.linkFromURL(attachment.url, sourceID,
				(attachment.mimeType ? attachment.mimeType : undefined),
				(attachment.title ? attachment.title : undefined));
			var attachmentItem = Scholar.Items.get(attachmentID);
		} else {
			Scholar.debug("not adding attachment: no path or url specified");
			return false;
		}
	} else {
		// generate nsIFile
		var IOService = Components.classes["@mozilla.org/network/io-service;1"].
		                getService(Components.interfaces.nsIIOService);
		var uri = IOService.newURI(attachment.path, "", null);
		var file = uri.QueryInterface(Components.interfaces.nsIFileURL).file;
		
		if(attachment.url) {
			// import snapshot from nsIFile
			var attachmentID = Scholar.Attachments.importSnapshotFromFile(file,
				attachment.url, attachment.title, attachment.mimeType,
				(attachment.charset ? attachment.charset : null), sourceID);
			var attachmentItem = Scholar.Items.get(attachmentID);
		} else {
			// import from nsIFile
			var attachmentID = Scholar.Attachments.importFromFile(file, sourceID);
			// get attachment item
			var attachmentItem = Scholar.Items.get(attachmentID);
			if(attachment.title) {
				// set title
				attachmentItem.setField("title", attachment.title);
			}
		}
	}
	
	return attachmentItem;
}

/*
 * handles tags and see also data for notes and attachments
 */
Scholar.Translate.prototype._itemTagsAndSeeAlso = function(item, newItem) {
	Scholar.debug("handling notes and see also");
	
	// add to ID map
	if(item.itemID) {
		this._IDMap[item.itemID] = newItem.getID();
	}
	
	// add see alsos
	for each(var seeAlso in item.seeAlso) {
		if(this._IDMap[seeAlso]) {
			newItem.addSeeAlso(this._IDMap[seeAlso]);
		}
	}
	
	// add tags
	for each(var tag in item.tags) {
		newItem.addTag(tag);
	}
}
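// The _IDMap used above translates translator-local itemIDs into the IDs
// items received when saved to the database, so that seeAlso references
// between scraped items can be resolved afterward. A self-contained sketch
// of that remapping follows; recordSaved/resolveSeeAlso are hypothetical
// names for illustration.

```javascript
// Map from translator-local IDs to saved database IDs.
var idMap = {};

function recordSaved(localID, savedID) {
	idMap[localID] = savedID;
}

// Resolve only the seeAlso references whose targets have already been
// saved, mirroring the if(this._IDMap[seeAlso]) guard above.
function resolveSeeAlso(seeAlsoIDs) {
	var resolved = [];
	for (var i = 0; i < seeAlsoIDs.length; i++) {
		var saved = idMap[seeAlsoIDs[i]];
		if (saved) {
			resolved.push(saved);
		}
	}
	return resolved;
}

recordSaved("local-1", 101);
recordSaved("local-2", 102);
```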

/*
 * executed when an item is done and ready to be loaded into the database
 */
Scholar.Translate.prototype._itemDone = function(item) {
|
2006-08-08 01:06:33 +00:00
|
|
|
if(!this.saveItem) { // if we're not supposed to save the item, just
|
|
|
|
// return the item array
|
|
|
|
|
|
|
|
// if a parent sandbox exists, use complete() function from that sandbox
|
|
|
|
if(this._parentTranslator) {
|
|
|
|
var pt = this._parentTranslator;
|
|
|
|
item.complete = function() { pt._itemDone(this) };
|
|
|
|
Scholar.debug("done from parent sandbox");
|
|
|
|
}
|
|
|
|
this._runHandler("itemDone", item);
|
|
|
|
return;
|
2006-08-17 07:56:01 +00:00
|
|
|
}
|
|
|
|
|
2006-08-05 20:58:45 +00:00
|
|
|
|
2006-08-15 23:03:11 +00:00
|
|
|
var notifierStatus = Scholar.Notifier.isEnabled();
|
|
|
|
if(notifierStatus) {
|
|
|
|
Scholar.Notifier.disable();
|
|
|
|
}
|
|
|
|
|
2006-08-30 00:43:09 +00:00
|
|
|
try { // make sure notifier gets turned back on when done
|
|
|
|
// Get typeID, defaulting to "website"
|
|
|
|
var type = (item.itemType ? item.itemType : "website");
|
2006-08-05 20:58:45 +00:00
|
|
|
|
2006-08-30 00:43:09 +00:00
|
|
|
if(type == "note") { // handle notes differently
|
|
|
|
var myID = Scholar.Notes.add(item.note);
|
|
|
|
// re-retrieve the item
|
|
|
|
var newItem = Scholar.Items.get(myID);
|
|
|
|
} else if(type == "attachment") {
|
|
|
|
if(this.type == "import") {
|
|
|
|
var newItem = this._itemImportAttachment(item, null);
|
|
|
|
var myID = newItem.getID();
|
|
|
|
} else {
|
|
|
|
Scholar.debug("discarding standalone attachment");
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
} else {
|
2006-09-06 04:45:19 +00:00
|
|
|
if(!item.title && this.type == "web") {
|
2006-08-30 00:43:09 +00:00
|
|
|
throw("item has no title");
|
|
|
|
}
|
|
|
|
|
|
|
|
// create new item
|
|
|
|
var typeID = Scholar.ItemTypes.getID(type);
|
|
|
|
var newItem = Scholar.Items.getNewItemByType(typeID);
|
|
|
|
|
|
|
|
// makes looping through easier
|
|
|
|
item.itemType = item.complete = undefined;
|
2006-08-05 20:58:45 +00:00
|
|
|
|
2006-08-30 21:56:52 +00:00
|
|
|
// automatically set access date if URL is set
|
2006-09-08 05:47:47 +00:00
|
|
|
if(item.url && !item.accessDate && this.type == "web") {
|
2006-08-30 21:56:52 +00:00
|
|
|
item.accessDate = (new Date()).toLocaleString();
|
|
|
|
}
|
|
|
|
|
2006-08-30 00:43:09 +00:00
|
|
|
var fieldID, field;
|
|
|
|
for(var i in item) {
|
|
|
|
// loop through item fields
|
|
|
|
data = item[i];
|
|
|
|
|
|
|
|
if(data) { // if field has content
|
|
|
|
if(i == "creators") { // creators are a special case
|
|
|
|
for(var j in data) {
|
|
|
|
var creatorType = 1;
|
|
|
|
// try to assign correct creator type
|
|
|
|
if(data[j].creatorType) {
|
|
|
|
try {
|
|
|
|
var creatorType = Scholar.CreatorTypes.getID(data[j].creatorType);
|
|
|
|
} catch(e) {
|
|
|
|
Scholar.debug("invalid creator type "+data[j].creatorType+" for creator index "+j);
|
|
|
|
}
|
2006-08-05 20:58:45 +00:00
|
|
|
}
|
2006-08-30 00:43:09 +00:00
|
|
|
|
|
|
|
newItem.setCreator(j, data[j].firstName, data[j].lastName, creatorType);
|
2006-08-05 20:58:45 +00:00
|
|
|
}
|
2006-08-30 00:43:09 +00:00
|
|
|
} else if(i == "title") { // skip checks for title
|
2006-08-05 20:58:45 +00:00
|
|
|
newItem.setField(i, data);
|
2006-08-30 00:43:09 +00:00
|
|
|
} else if(i == "seeAlso") {
|
|
|
|
newItem.translateSeeAlso = data;
|
|
|
|
} else if(i != "note" && i != "notes" && i != "itemID" &&
|
|
|
|
i != "attachments" && i != "tags" &&
|
|
|
|
(fieldID = Scholar.ItemFields.getID(i))) {
|
|
|
|
// if field is in db
|
|
|
|
if(Scholar.ItemFields.isValidForType(fieldID, typeID)) {
|
|
|
|
// if field is valid for this type
|
|
|
|
// add field
|
|
|
|
newItem.setField(i, data);
|
|
|
|
} else {
|
|
|
|
Scholar.debug("discarded field "+i+" for item: field not valid for type "+type);
|
|
|
|
}
|
2006-08-05 20:58:45 +00:00
|
|
|
} else {
|
2006-08-30 00:43:09 +00:00
|
|
|
Scholar.debug("discarded field "+i+" for item: field does not exist");
|
2006-08-05 20:58:45 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2006-08-30 00:43:09 +00:00
|
|
|
|
|
|
|
// save item
|
|
|
|
var myID = newItem.save();
|
|
|
|
if(myID == true) {
|
|
|
|
myID = newItem.getID();
|
2006-08-05 20:58:45 +00:00
|
|
|
}
|
2006-08-30 00:43:09 +00:00
|
|
|
|
|
|
|
// handle notes
|
|
|
|
if(item.notes) {
|
|
|
|
for each(var note in item.notes) {
|
|
|
|
var noteID = Scholar.Notes.add(note.note, myID);
|
2006-08-18 05:58:14 +00:00
|
|
|
|
2006-08-30 00:43:09 +00:00
|
|
|
// handle see also
|
|
|
|
var myNote = Scholar.Items.get(noteID);
|
|
|
|
this._itemTagsAndSeeAlso(note, myNote);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// handle attachments
|
|
|
|
if(item.attachments) {
|
|
|
|
for each(var attachment in item.attachments) {
|
|
|
|
if(this.type == "web") {
|
|
|
|
if(!attachment.url && !attachment.document) {
|
|
|
|
Scholar.debug("not adding attachment: no URL specified");
|
2006-08-30 01:41:51 +00:00
|
|
|
} else if(attachment.downloadable && this._downloadAssociatedFiles) {
|
2006-08-30 00:43:09 +00:00
|
|
|
if(attachment.document) {
|
|
|
|
attachmentID = Scholar.Attachments.importFromDocument(attachment.document, myID);
|
|
|
|
|
|
|
|
// change title, if a different one was specified
|
|
|
|
if(attachment.title && (!attachment.document.title
|
|
|
|
|| attachment.title != attachment.document.title)) {
|
|
|
|
var attachmentItem = Scholar.Items.get(attachmentID);
|
|
|
|
attachmentItem.setField("title", attachment.title);
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
Scholar.Attachments.importFromURL(attachment.url, myID,
|
|
|
|
(attachment.mimeType ? attachment.mimeType : attachment.document.contentType),
|
|
|
|
(attachment.title ? attachment.title : attachment.document.title));
|
|
|
|
}
|
2006-08-17 07:56:01 +00:00
|
|
|
} else {
|
2006-08-30 00:43:09 +00:00
|
|
|
if(attachment.document) {
|
|
|
|
attachmentID = Scholar.Attachments.linkFromURL(attachment.document.location.href, myID,
|
|
|
|
(attachment.mimeType ? attachment.mimeType : attachment.document.contentType),
|
|
|
|
(attachment.title ? attachment.title : attachment.document.title));
|
|
|
|
} else {
|
|
|
|
if(!attachment.mimeType || attachment.title) {
|
|
|
|
Scholar.debug("notice: either mimeType or title is missing; attaching file will be slower");
|
|
|
|
}
|
|
|
|
|
|
|
|
attachmentID = Scholar.Attachments.linkFromURL(attachment.url, myID,
|
|
|
|
(attachment.mimeType ? attachment.mimeType : undefined),
|
|
|
|
(attachment.title ? attachment.title : undefined));
|
2006-08-17 07:56:01 +00:00
|
|
|
}
|
|
|
|
}
|
2006-08-30 00:43:09 +00:00
|
|
|
} else if(this.type == "import") {
|
|
|
|
var attachmentItem = this._itemImportAttachment(attachment, myID);
|
|
|
|
if(attachmentItem) {
|
|
|
|
this._itemTagsAndSeeAlso(attachment, attachmentItem);
|
|
|
|
}
|
2006-08-20 04:35:04 +00:00
|
|
|
}
|
2006-08-17 07:56:01 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2006-08-30 00:43:09 +00:00
|
|
|
|
|
|
|
if(item.itemID) {
|
|
|
|
this._IDMap[item.itemID] = myID;
|
|
|
|
}
|
|
|
|
this.newItems.push(myID);
|
|
|
|
|
|
|
|
// handle see also
|
|
|
|
if(item.seeAlso) {
|
|
|
|
for each(var seeAlso in item.seeAlso) {
|
|
|
|
if(this._IDMap[seeAlso]) {
|
|
|
|
newItem.addSeeAlso(this._IDMap[seeAlso]);
|
|
|
|
}
|
closes #78, figure out import/export architecture
closes #100, migrate ingester to Scholar.Translate
closes #88, migrate scrapers away from RDF
closes #9, pull out LC subject heading tags
references #87, add fromArray() and toArray() methods to item objects
API changes:
all translation (import/export/web) now goes through Scholar.Translate
all Scholar-specific functions in scrapers start with "Scholar." rather than the jumbled up piggy bank un-namespaced confusion
scrapers now longer specify items through RDF (the beginning of an item.fromArray()-like function exists in Scholar.Translate.prototype._itemDone())
scrapers can be any combination of import, export, and web (type is the sum of 1/2/4 respectively)
scrapers now contain functions (doImport, doExport, doWeb) rather than loose code
scrapers can call functions in other scrapers or just call the function to translate itself
export accesses items item-by-item, rather than accepting a huge array of items
MARC functions are now in the MARC import translator, and accessed by the web translators
new features:
import now works
rudimentary RDF (unqualified dublin core only), RIS, and MARC import translators are implemented (although they are a little picky with respect to file extensions at the moment)
items appear as they are scraped
MARC import translator pulls out tags, although this seems to slow things down
no icon appears next to a the URL when Scholar hasn't detected metadata, since this seemed somewhat confusing
apologizes for the size of this diff. i figured if i was going to re-write the API, i might as well do it all at once and get everything working right.
2006-07-17 04:06:58 +00:00
|
|
|
}
|
|
|
|
}
|
2006-08-30 00:43:09 +00:00
|
|
|
|
|
|
|
if(item.tags) {
|
|
|
|
for each(var tag in item.tags) {
|
|
|
|
newItem.addTag(tag);
|
|
|
|
}
|
2006-08-20 04:35:04 +00:00
|
|
|
}
|
2006-08-30 00:43:09 +00:00
|
|
|
|
|
|
|
delete item;
|
|
|
|
} catch(e) {
|
|
|
|
if(notifierStatus) {
|
|
|
|
Scholar.Notifier.enable();
|
|
|
|
}
|
|
|
|
throw(e);
|
2006-08-20 04:35:04 +00:00
|
|
|
}
|
|
|
|
|
2006-08-15 23:03:11 +00:00
|
|
|
// only re-enable if notifier was enabled at the beginning of scraping
|
|
|
|
if(notifierStatus) {
|
|
|
|
Scholar.Notifier.enable();
|
|
|
|
}
|
closes #78, figure out import/export architecture
closes #100, migrate ingester to Scholar.Translate
closes #88, migrate scrapers away from RDF
closes #9, pull out LC subject heading tags
references #87, add fromArray() and toArray() methods to item objects
API changes:
all translation (import/export/web) now goes through Scholar.Translate
all Scholar-specific functions in scrapers start with "Scholar." rather than the jumbled up piggy bank un-namespaced confusion
scrapers now longer specify items through RDF (the beginning of an item.fromArray()-like function exists in Scholar.Translate.prototype._itemDone())
scrapers can be any combination of import, export, and web (type is the sum of 1/2/4 respectively)
scrapers now contain functions (doImport, doExport, doWeb) rather than loose code
scrapers can call functions in other scrapers or just call the function to translate itself
export accesses items item-by-item, rather than accepting a huge array of items
MARC functions are now in the MARC import translator, and accessed by the web translators
new features:
import now works
rudimentary RDF (unqualified dublin core only), RIS, and MARC import translators are implemented (although they are a little picky with respect to file extensions at the moment)
items appear as they are scraped
MARC import translator pulls out tags, although this seems to slow things down
no icon appears next to a the URL when Scholar hasn't detected metadata, since this seemed somewhat confusing
apologizes for the size of this diff. i figured if i was going to re-write the API, i might as well do it all at once and get everything working right.
2006-07-17 04:06:58 +00:00
|
|
|
this._runHandler("itemDone", newItem);
|
|
|
|
}
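// A minimal standalone sketch (not Scholar code; the validFields table and all
// names are invented for illustration) of the field-filtering loop above: each
// non-empty field is kept only if it is valid for the item's type, and anything
// unknown or invalid is logged and discarded rather than saved.

```javascript
// hypothetical per-type field schema standing in for Scholar.ItemFields
var validFields = {
	website: { title: true, url: true, accessDate: true },
	book: { title: true, publisher: true }
};

function filterFieldsForType(item, type) {
	var kept = {}, discarded = [];
	for (var field in item) {
		if (!item[field]) continue;        // skip empty fields
		if (validFields[type][field]) {
			kept[field] = item[field];     // field is valid for this type
		} else {
			discarded.push(field);         // logged and dropped, as above
		}
	}
	return { kept: kept, discarded: discarded };
}

// "publisher" is not valid for a website, so it is discarded
var out = filterFieldsForType({ title: "A", url: "http://x/", publisher: "P" }, "website");
```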

/*
 * executed when a collection is done and ready to be loaded into the database
 */
Scholar.Translate.prototype._collectionDone = function(collection) {
	var newCollection = this._processCollection(collection, null);
	
	this._runHandler("collectionDone", newCollection);
}

/*
 * recursively processes collections
 */
Scholar.Translate.prototype._processCollection = function(collection, parentID) {
	var newCollection = Scholar.Collections.add(collection.name, parentID);
	var myID = newCollection.getID();
	
	this.newCollections.push(myID);
	
	for each(var child in collection.children) {
		if(child.type == "collection") {
			// do recursive processing of collections
			this._processCollection(child, myID);
		} else {
			// add mapped items to collection
			if(this._IDMap[child.id]) {
				Scholar.debug("adding "+this._IDMap[child.id]);
				newCollection.addItem(this._IDMap[child.id]);
			} else {
				Scholar.debug("could not map "+child.id+" to an imported item");
			}
		}
	}
	
	return newCollection;
}
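// A minimal standalone sketch (not Scholar code; flat objects stand in for
// Scholar.Collections and the _IDMap) of the recursion above: subcollections
// are processed depth-first under their parent, and item children are added
// only if their translator-local IDs were mapped to saved items.

```javascript
var created = [];

function processCollection(collection, parentName, idMap) {
	var name = parentName ? parentName + "/" + collection.name : collection.name;
	var newCollection = { name: name, items: [] };
	created.push(newCollection);
	for (var i = 0; i < collection.children.length; i++) {
		var child = collection.children[i];
		if (child.type == "collection") {
			processCollection(child, name, idMap);      // recurse into subcollections
		} else if (idMap[child.id]) {
			newCollection.items.push(idMap[child.id]);  // add mapped item
		}
	}
	return newCollection;
}

var root = processCollection(
	{ name: "root", children: [
		{ type: "item", id: "a" },
		{ type: "collection", name: "sub", children: [{ type: "item", id: "b" }] }
	] },
	null,
	{ a: 1, b: 2 }
);
```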

/*
 * calls a handler (see setHandler above)
 */
Scholar.Translate.prototype._runHandler = function(type, argument) {
	var returnValue;
	
	if(this._handlers[type]) {
		for(var i in this._handlers[type]) {
			Scholar.debug("running handler "+i+" for "+type);
			try {
				if(this._parentTranslator) {
					returnValue = this._handlers[type][i](null, argument);
				} else {
					returnValue = this._handlers[type][i](this, argument);
				}
			} catch(e) {
				if(this._parentTranslator) {
					// throw handler errors if they occur when a translator is
					// called from another translator, so that the
					// "Could Not Translate" dialog will appear if necessary
					throw(e+' in handler '+i+' for '+type);
				} else {
					// otherwise, fail silently, so as not to interfere with
					// interface cleanup
					Scholar.debug(e+' in handler '+i+' for '+type);
				}
			}
		}
	}
	return returnValue;
}
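// A minimal standalone sketch (not Scholar code; setHandler/runHandler are
// illustrative re-implementations) of the handler-registry pattern above: each
// event type holds a list of callbacks, every callback runs even if an earlier
// one throws, and the last successful return value wins.

```javascript
var handlers = {};

function setHandler(type, handler) {
	if (!handlers[type]) handlers[type] = [];
	handlers[type].push(handler);
}

function runHandler(type, argument) {
	var returnValue;
	if (handlers[type]) {
		for (var i = 0; i < handlers[type].length; i++) {
			try {
				returnValue = handlers[type][i](argument);
			} catch (e) {
				// fail silently, mirroring the non-nested case above
			}
		}
	}
	return returnValue;   // last successful handler's return value
}

var log = [];
setHandler("itemDone", function(item) { log.push(item); return item * 2; });
setHandler("itemDone", function(item) { throw "boom"; });  // must not abort the loop
var rv = runHandler("itemDone", 21);
```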

/*
 * does the actual web translation
 */
Scholar.Translate.prototype._web = function() {
	this._downloadAssociatedFiles = Scholar.Prefs.get("downloadAssociatedFiles");
	
	try {
		this._sandbox.doWeb(this.document, this.location);
	} catch(e) {
		var error = e+' in executing code for '+this.translator[0].label;
		if(this._parentTranslator) {
			throw error;
		} else {
			Scholar.debug(error);
			return false;
		}
	}
	
	return true;
}

/*
 * does the actual search translation
 */
Scholar.Translate.prototype._search = function() {
	try {
		this._sandbox.doSearch(this.search);
	} catch(e) {
		Scholar.debug(e+' in executing code for '+this.translator[0].label);
		return false;
	}
	
	return true;
}

/*
 * does the actual import translation
 */
Scholar.Translate.prototype._import = function() {
	this._importConfigureIO();
	
	try {
		this._sandbox.doImport();
	} catch(e) {
		Scholar.debug(e.toSource());
		var error = e+' in executing code for '+this.translator[0].label;
		if(this._parentTranslator) {
			throw error;
		} else {
			Scholar.debug(error);
			return false;
		}
	}
	
	return true;
}

/*
 * sets up import for IO
 */
Scholar.Translate.prototype._importConfigureIO = function() {
	if(this._storage) {
		if(this._configOptions.dataMode == "rdf") {
			this._rdf = new Object();
			
			// read string out of storage stream
			var IOService = Components.classes['@mozilla.org/network/io-service;1']
							.getService(Components.interfaces.nsIIOService);
			this._rdf.dataSource = Components.classes["@mozilla.org/rdf/datasource;1?name=in-memory-datasource"].
							createInstance(Components.interfaces.nsIRDFDataSource);
			var parser = Components.classes["@mozilla.org/rdf/xml-parser;1"].
							createInstance(Components.interfaces.nsIRDFXMLParser);
			
			// get URI and parse
			var baseURI = (this.location ? IOService.newURI(this.location, "utf-8", null) : null);
			parser.parseString(this._rdf.dataSource, baseURI, this._storage);
closes #78, figure out import/export architecture
closes #100, migrate ingester to Scholar.Translate
closes #88, migrate scrapers away from RDF
closes #9, pull out LC subject heading tags
references #87, add fromArray() and toArray() methods to item objects
API changes:
all translation (import/export/web) now goes through Scholar.Translate
all Scholar-specific functions in scrapers start with "Scholar." rather than the jumbled up piggy bank un-namespaced confusion
scrapers now longer specify items through RDF (the beginning of an item.fromArray()-like function exists in Scholar.Translate.prototype._itemDone())
scrapers can be any combination of import, export, and web (type is the sum of 1/2/4 respectively)
scrapers now contain functions (doImport, doExport, doWeb) rather than loose code
scrapers can call functions in other scrapers or just call the function to translate itself
export accesses items item-by-item, rather than accepting a huge array of items
MARC functions are now in the MARC import translator, and accessed by the web translators
new features:
import now works
rudimentary RDF (unqualified dublin core only), RIS, and MARC import translators are implemented (although they are a little picky with respect to file extensions at the moment)
items appear as they are scraped
MARC import translator pulls out tags, although this seems to slow things down
no icon appears next to a the URL when Scholar hasn't detected metadata, since this seemed somewhat confusing
apologizes for the size of this diff. i figured if i was going to re-write the API, i might as well do it all at once and get everything working right.
2006-07-17 04:06:58 +00:00
|
|
|
|
2006-08-08 21:17:07 +00:00
|
|
|
// make an instance of the RDF handler
|
2006-08-14 20:34:13 +00:00
|
|
|
this._sandbox.Scholar.RDF = new Scholar.Translate.RDF(this._rdf.dataSource);
|
2006-08-08 21:17:07 +00:00
|
|
|
} else {
|
2006-08-31 07:45:03 +00:00
|
|
|
this._storageFunctions(true);
|
|
|
|
this._storagePointer = 0;
|
2006-08-08 21:17:07 +00:00
|
|
|
}
|
|
|
|
} else {
|
2006-09-05 07:51:55 +00:00
|
|
|
var me = this;
|
|
|
|
|
2006-08-08 21:17:07 +00:00
|
|
|
if(this._configOptions.dataMode == "rdf") {
|
2006-09-05 07:51:55 +00:00
|
|
|
if(!this._rdf) {
|
|
|
|
this._rdf = new Object()
|
|
|
|
|
|
|
|
var IOService = Components.classes['@mozilla.org/network/io-service;1']
|
|
|
|
.getService(Components.interfaces.nsIIOService);
|
|
|
|
var fileHandler = IOService.getProtocolHandler("file")
|
|
|
|
.QueryInterface(Components.interfaces.nsIFileProtocolHandler);
|
|
|
|
var URL = fileHandler.getURLSpecFromFile(this.location);
|
|
|
|
|
|
|
|
var RDFService = Components.classes['@mozilla.org/rdf/rdf-service;1']
|
|
|
|
.getService(Components.interfaces.nsIRDFService);
|
|
|
|
this._rdf.dataSource = RDFService.GetDataSourceBlocking(URL);
|
|
|
|
|
|
|
|
// make an instance of the RDF handler
|
|
|
|
this._sandbox.Scholar.RDF = new Scholar.Translate.RDF(this._rdf.dataSource);
|
|
|
|
}
|
2006-08-08 21:17:07 +00:00
|
|
|
} else {
|
|
|
|
// open file and set read methods
|
2006-09-05 07:51:55 +00:00
|
|
|
if(this._inputStream) {
|
|
|
|
this._inputStream.QueryInterface(Components.interfaces.nsISeekableStream)
|
|
|
|
.seek(Components.interfaces.nsISeekableStream.NS_SEEK_SET, 0);
|
|
|
|
this._inputStream.QueryInterface(Components.interfaces.nsIFileInputStream);
|
|
|
|
} else {
|
|
|
|
this._inputStream = Components.classes["@mozilla.org/network/file-input-stream;1"]
|
|
|
|
.createInstance(Components.interfaces.nsIFileInputStream);
|
|
|
|
this._inputStream.init(this.location, 0x01, 0664, 0);
|
|
|
|
this._streams.push(this._inputStream);
|
|
|
|
}
|
2006-08-08 21:17:07 +00:00
|
|
|
|
2006-09-05 07:51:55 +00:00
|
|
|
var filePosition = 0;
|
2006-09-08 20:44:05 +00:00
|
|
|
var intlStream = this._importDefuseBOM();
|
|
|
|
if(intlStream) {
|
|
|
|
// found a UTF BOM at the beginning of the file; don't allow
|
|
|
|
// translator to set the character set
|
|
|
|
this._sandbox.Scholar.setCharacterSet = function() {}
|
|
|
|
} else {
|
|
|
|
// allow translator to set charset
|
|
|
|
this._sandbox.Scholar.setCharacterSet = function(charset) {
|
|
|
|
// seek
|
|
|
|
if(filePosition != 0) {
|
|
|
|
me._inputStream.QueryInterface(Components.interfaces.nsISeekableStream)
|
|
|
|
.seek(Components.interfaces.nsISeekableStream.NS_SEEK_SET, filePosition);
|
|
|
|
me._inputStream.QueryInterface(Components.interfaces.nsIFileInputStream);
|
|
|
|
}
|
|
|
|
|
|
|
|
intlStream = Components.classes["@mozilla.org/intl/converter-input-stream;1"]
|
|
|
|
.createInstance(Components.interfaces.nsIConverterInputStream);
|
|
|
|
try {
|
|
|
|
intlStream.init(me._inputStream, charset, 1024,
|
|
|
|
Components.interfaces.nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);
|
|
|
|
} catch(e) {
|
|
|
|
throw "Text encoding not supported";
|
|
|
|
}
|
|
|
|
me._streams.push(intlStream);
|
2006-09-05 07:51:55 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
var str = new Object();
|
|
|
|
if(this._configOptions.dataMode == "line") { // line by line reading
|
|
|
|
this._inputStream.QueryInterface(Components.interfaces.nsILineInputStream);
|
2006-08-08 21:17:07 +00:00
|
|
|
|
|
|
|
this._sandbox.Scholar.read = function() {
|
2006-09-08 20:44:05 +00:00
|
|
|
if(intlStream && intlStream instanceof Components.interfaces.nsIUnicharLineInputStream) {
|
|
|
|
Scholar.debug("using intlStream");
|
2006-09-05 07:51:55 +00:00
|
|
|
var amountRead = intlStream.readLine(str);
|
|
|
|
} else {
|
|
|
|
var amountRead = me._inputStream.readLine(str);
|
|
|
|
}
|
|
|
|
if(amountRead) {
|
|
|
|
filePosition += amountRead;
|
|
|
|
return str.value;
|
2006-08-08 21:17:07 +00:00
|
|
|
} else {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
} else { // block reading
|
2006-09-05 07:51:55 +00:00
|
|
|
var sStream;
|
2006-08-08 21:17:07 +00:00
|
|
|
|
|
|
|
this._sandbox.Scholar.read = function(amount) {
|
2006-09-05 07:51:55 +00:00
|
|
|
if(intlStream) {
|
|
|
|
// read from international stream, if one is available
|
|
|
|
var amountRead = intlStream.readString(amount, str);
|
|
|
|
|
|
|
|
if(amountRead) {
|
|
|
|
filePosition += amountRead;
|
|
|
|
return str.value;
|
|
|
|
} else {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
// allocate sStream on the fly
|
|
|
|
if(!sStream) {
|
|
|
|
sStream = Components.classes["@mozilla.org/scriptableinputstream;1"]
|
|
|
|
.createInstance(Components.interfaces.nsIScriptableInputStream);
|
|
|
|
sStream.init(me._inputStream);
|
|
|
|
}
|
|
|
|
|
|
|
|
// read from the scriptable input stream
|
|
|
|
var string = sStream.read(amount);
|
|
|
|
filePosition += string.length;
|
|
|
|
return string;
|
|
|
|
}
|
2006-08-08 21:17:07 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// attach sStream to stack of streams to close
|
|
|
|
this._streams.push(sStream);
|
|
|
|
}
|
2006-06-29 00:56:50 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
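The sandboxed `Scholar.read` closures above give translators a uniform way to pull either whole lines or fixed-size blocks from the input, tracking a read position as they go. The same closure pattern can be sketched over an in-memory string, without XPCOM; `makeReader` is a hypothetical helper for illustration, not part of the Scholar API:

```javascript
// Sketch of the Scholar.read closure pattern over an in-memory string.
// makeReader is illustrative only; it mirrors the "line" vs block
// dataMode dispatch in _importConfigureIO.
function makeReader(data, dataMode) {
	var position = 0;
	if(dataMode == "line") {
		// line-by-line reading: one line per call, false at EOF
		return function() {
			if(position >= data.length) return false;
			var newline = data.indexOf("\n", position);
			if(newline == -1) newline = data.length;
			var line = data.substring(position, newline);
			position = newline + 1;
			return line;
		};
	} else {
		// block reading: up to `amount` characters per call, "" at EOF
		return function(amount) {
			var block = data.substr(position, amount);
			position += block.length;
			return block;
		};
	}
}
```

A translator-style caller would loop `while((line = read()) !== false) { … }`, just as import translators do against the real `Scholar.read`.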
/*
 * searches for a UTF BOM at the beginning of the input stream. if one is found,
 * returns an appropriate converter-input-stream for the UTF type, and sets
 * _hasBOM to the UTF type. if one is not found, returns false, and sets
 * _hasBOM to false to prevent further checking.
 */
Scholar.Translate.prototype._importDefuseBOM = function() {
	// if already found not to have a BOM, skip
	if(this._hasBOM === false) {
		return;
	}
	
	if(!this._hasBOM) {
		// if not checked for a BOM, open a binary input stream and read
		var binStream = Components.classes["@mozilla.org/binaryinputstream;1"]
			.createInstance(Components.interfaces.nsIBinaryInputStream);
		binStream.setInputStream(this._inputStream);
		
		// read the first byte
		var byte1 = binStream.read8();
		
		// at the moment, we don't support UTF-32 or UTF-7. while mozilla
		// supports these encodings, they add slight additional complexity to
		// the function, and anyone using them for storing bibliographic
		// metadata is insane.
		if(byte1 == 0xEF) {			// UTF-8: EF BB BF
			var byte2 = binStream.read8();
			if(byte2 == 0xBB) {
				var byte3 = binStream.read8();
				if(byte3 == 0xBF) {
					this._hasBOM = "UTF-8";
				}
			}
		} else if(byte1 == 0xFE) {	// UTF-16BE: FE FF
			var byte2 = binStream.read8();
			if(byte2 == 0xFF) {
				this._hasBOM = "UTF-16BE";
			}
		} else if(byte1 == 0xFF) {	// UTF-16LE: FF FE
			var byte2 = binStream.read8();
			if(byte2 == 0xFE) {
				this._hasBOM = "UTF-16LE";
			}
		}
		
		if(!this._hasBOM) {
			// seek back to beginning of file
			this._inputStream.QueryInterface(Components.interfaces.nsISeekableStream)
				.seek(Components.interfaces.nsISeekableStream.NS_SEEK_SET, 0);
			this._inputStream.QueryInterface(Components.interfaces.nsIFileInputStream);
			
			// say there's no BOM
			this._hasBOM = false;
			
			return false;
		}
	} else {
		// if it had a BOM the last time, it has one this time, too. seek to
		// the correct position.
		if(this._hasBOM == "UTF-8") {
			var seekPosition = 3;
		} else {
			var seekPosition = 2;
		}
		
		this._inputStream.QueryInterface(Components.interfaces.nsISeekableStream)
			.seek(Components.interfaces.nsISeekableStream.NS_SEEK_SET, seekPosition);
		this._inputStream.QueryInterface(Components.interfaces.nsIFileInputStream);
	}
	
	// if we know what kind of BOM it has, generate an input stream
	var intlStream = Components.classes["@mozilla.org/intl/converter-input-stream;1"]
		.createInstance(Components.interfaces.nsIConverterInputStream);
	intlStream.init(this._inputStream, this._hasBOM, 1024,
		Components.interfaces.nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);
	return intlStream;
}
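The byte-level checks in `_importDefuseBOM` can be exercised outside XPCOM. A standalone sketch of the same BOM table follows; `detectBOM` is a hypothetical helper, and like the method above it deliberately ignores UTF-32 and UTF-7 signatures:

```javascript
// Map the leading bytes of a file to the charset implied by its BOM,
// mirroring the byte comparisons in _importDefuseBOM.
// Returns a charset name, or false when no recognized BOM is present.
function detectBOM(bytes) {
	if(bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF) {
		return "UTF-8";		// EF BB BF
	} else if(bytes[0] == 0xFE && bytes[1] == 0xFF) {
		return "UTF-16BE";	// FE FF
	} else if(bytes[0] == 0xFF && bytes[1] == 0xFE) {
		return "UTF-16LE";	// FF FE
	}
	return false;
}
```

The falsy return mirrors `_hasBOM = false`: once a file is known to lack a BOM, no further sniffing is needed and the translator may set the charset itself.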
/*
 * does the actual export, after code has been loaded and parsed
 */
Scholar.Translate.prototype._export = function() {
	// get items
	if(this.items) {
		this._itemsLeft = this.items;
	} else {
		this._itemsLeft = Scholar.getItems();
	}
	
	// run handler for items available
	this._runHandler("itemCount", this._itemsLeft.length);
	
	// get collections, if requested
	if(this._configOptions.getCollections && !this.items) {
		this._collectionsLeft = Scholar.getCollections();
	}
	
	Scholar.debug(this._displayOptions);
	
	// export file data, if requested
	if(this._displayOptions["exportFileData"]) {
		// generate directory
		var directory = Components.classes["@mozilla.org/file/local;1"]
			.createInstance(Components.interfaces.nsILocalFile);
		directory.initWithFile(this.location.parent);
		
		// get name, stripping off any file extension
		var name = this.location.leafName;
		var extensionMatch = /^(.*)\.[a-zA-Z0-9]+$/;
		var m = extensionMatch.exec(name);
		if(m) {
			name = m[1];
		}
		directory.append(name);
		
		// create directory
		directory.create(Components.interfaces.nsIFile.DIRECTORY_TYPE, 0700);
		
		// generate a new location
		var originalName = this.location.leafName;
		this.location = Components.classes["@mozilla.org/file/local;1"]
			.createInstance(Components.interfaces.nsILocalFile);
		this.location.initWithFile(directory);
		this.location.append(originalName);
		
		// create files directory
		this._exportFileDirectory = Components.classes["@mozilla.org/file/local;1"]
			.createInstance(Components.interfaces.nsILocalFile);
		this._exportFileDirectory.initWithFile(directory);
		this._exportFileDirectory.append("files");
		this._exportFileDirectory.create(Components.interfaces.nsIFile.DIRECTORY_TYPE, 0700);
	}
	
	// configure IO
	this._exportConfigureIO();
	
	try {
		this._sandbox.doExport();
	} catch(e) {
		Scholar.debug(e+' in executing code for '+this.translator[0].label);
		return false;
	}
	
	return true;
}
/*
 * configures IO for export
 */
Scholar.Translate.prototype._exportConfigureIO = function() {
	// open file
	var fStream = Components.classes["@mozilla.org/network/file-output-stream;1"]
		.createInstance(Components.interfaces.nsIFileOutputStream);
	fStream.init(this.location, 0x02 | 0x08 | 0x20, 0664, 0);	// write, create, truncate
	// attach to stack of streams to close at the end
	this._streams.push(fStream);
	
	if(this._configOptions.dataMode == "rdf") {	// rdf io
		this._rdf = new Object();
		
		// create data source
		this._rdf.dataSource = Components.classes["@mozilla.org/rdf/datasource;1?name=xml-datasource"]
			.createInstance(Components.interfaces.nsIRDFDataSource);
		// create serializer
		this._rdf.serializer = Components.classes["@mozilla.org/rdf/xml-serializer;1"]
			.createInstance(Components.interfaces.nsIRDFXMLSerializer);
		this._rdf.serializer.init(this._rdf.dataSource);
		this._rdf.serializer.QueryInterface(Components.interfaces.nsIRDFXMLSource);
		
		// make an instance of the RDF handler
		this._sandbox.Scholar.RDF = new Scholar.Translate.RDF(this._rdf.dataSource, this._rdf.serializer);
	} else {
		// regular io; write just writes to file
		var intlStream = null;
		
		// allow setting of character sets
		this._sandbox.Scholar.setCharacterSet = function(charset) {
			intlStream = Components.classes["@mozilla.org/intl/converter-output-stream;1"]
				.createInstance(Components.interfaces.nsIConverterOutputStream);
			intlStream.init(fStream, charset, 1024, "?".charCodeAt(0));
		};
		
		this._sandbox.Scholar.write = function(data) {
			if(intlStream) {
				intlStream.writeString(data);
			} else {
				fStream.write(data, data.length);
			}
		};
	}
}
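The `setCharacterSet`/`write` pair above lets a translator opt into charset conversion: once a converter stream exists, writes route through it, and otherwise fall back to raw byte writes. The dispatch can be sketched with plain arrays standing in for the two streams; `makeWriter` and the sinks are illustrative, not part of the Scholar API:

```javascript
// Sketch of the Scholar.write fallback in _exportConfigureIO: output
// goes through the "converter" sink once setCharacterSet has been
// called, and to the raw sink before that.
function makeWriter(rawSink, convertedSink) {
	var converted = false;
	return {
		setCharacterSet: function(charset) {
			converted = true;	// stands in for creating the converter stream
		},
		write: function(data) {
			if(converted) {
				convertedSink.push(data);
			} else {
				rawSink.push(data);
			}
		}
	};
}
```

Keeping the dispatch inside the closure means translators call one `write()` regardless of whether they ever set a charset, which is the design point of the real sandbox functions.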
2006-08-18 05:58:14 +00:00
|
|
|
/*
|
|
|
|
* copies attachment and returns data, given an attachment object
|
|
|
|
*/
|
|
|
|
Scholar.Translate.prototype._exportGetAttachment = function(attachment) {
	var attachmentArray = new Object();
	
	var attachmentID = attachment.getID();
	var linkMode = attachment.getAttachmentLinkMode();
	
	// get url if one exists
	if(linkMode == Scholar.Attachments.LINK_MODE_LINKED_URL ||
			linkMode == Scholar.Attachments.LINK_MODE_IMPORTED_URL) {
		var url = attachment.getURL();
		attachmentArray.url = url;
	} else if(!this._displayOptions["exportFileData"]) {
		// only export urls, not files, if exportFileData is off
		return false;
	}
	
	// add item ID
	attachmentArray.itemID = attachmentID;
	
	// get title
	attachmentArray.title = attachment.getField("title");
	
	// get mime type
	attachmentArray.mimeType = attachment.getAttachmentMimeType();
	
	// get charset
	attachmentArray.charset = attachment.getAttachmentCharset();
	
	// get seeAlso
	attachmentArray.seeAlso = attachment.getSeeAlso();
	
	// get tags
	attachmentArray.tags = attachment.getTags();
	
	if(linkMode != Scholar.Attachments.LINK_MODE_LINKED_URL &&
			this._displayOptions["exportFileData"]) {
		// add path and filename if not an internet link
		var file = attachment.getFile();
		attachmentArray.filename = file.leafName;
		attachmentArray.path = "files/"+attachmentID+"/"+file.leafName;
		
		if(linkMode == Scholar.Attachments.LINK_MODE_LINKED_FILE) {
			// create a new directory
			var directory = Components.classes["@mozilla.org/file/local;1"].
					createInstance(Components.interfaces.nsILocalFile);
			directory.initWithFile(this._exportFileDirectory);
			directory.append(attachmentID);
			directory.create(Components.interfaces.nsIFile.DIRECTORY_TYPE, 0700);
			// copy file
			file.copyTo(directory, attachmentArray.filename);
		} else {
			// copy imported files from the Scholar directory
			var directory = Scholar.getStorageDirectory();
			directory.append(attachmentID);
			directory.copyTo(this._exportFileDirectory, attachmentID);
		}
	}
	
	Scholar.debug(attachmentArray);
	
	return attachmentArray;
}

/*
 * gets the next item to process (called as Scholar.nextItem() from code)
 */
Scholar.Translate.prototype._exportGetItem = function() {
	if(this._itemsLeft.length != 0) {
		var returnItem = this._itemsLeft.shift();
		
		// skip files if exportFileData is off, or if the file isn't standalone
		if(returnItem.isAttachment() &&
				(!this._displayOptions["exportFileData"] ||
				returnItem.getSource())) {
			return this._exportGetItem();
		}
		
		// export file data for single files
		if(returnItem.isAttachment()) {	// an independent attachment
			var returnItemArray = this._exportGetAttachment(returnItem);
			returnItemArray.itemType = "attachment";
			return returnItemArray;
		} else {
			var returnItemArray = returnItem.toArray();
			
			// get attachments, although only urls will be passed if
			// exportFileData is off
			returnItemArray.attachments = new Array();
			var attachments = returnItem.getAttachments();
			for each(var attachmentID in attachments) {
				var attachment = Scholar.Items.get(attachmentID);
				var attachmentInfo = this._exportGetAttachment(attachment);
				
				if(attachmentInfo) {
					returnItemArray.attachments.push(attachmentInfo);
				}
			}
		}
		
		this._runHandler("itemDone", returnItem);
		
		return returnItemArray;
	}
	
	return false;
}

/*
 * gets the next collection to process (called as Scholar.nextCollection() from code)
 */
Scholar.Translate.prototype._exportGetCollection = function() {
	if(!this._configOptions.getCollections) {
		throw("getCollections configure option not set; cannot retrieve collection");
	}
	
	if(this._collectionsLeft && this._collectionsLeft.length != 0) {
		var returnItem = this._collectionsLeft.shift();
		
		var collection = new Object();
		collection.id = returnItem.getID();
		collection.name = returnItem.getName();
		collection.type = "collection";
		collection.children = returnItem.toArray();
		
		return collection;
	}
}

/*
 * sets up internal IO in such a way that both reading and writing are possible
 * (for inter-scraper communications)
 */
Scholar.Translate.prototype._initializeInternalIO = function() {
	if(this.type == "import" || this.type == "export") {
		if(this._configOptions.dataMode == "rdf") {
			this._rdf = new Object();
			
			// use an in-memory data source for internal IO
			this._rdf.dataSource = Components.classes["@mozilla.org/rdf/datasource;1?name=in-memory-datasource"].
					createInstance(Components.interfaces.nsIRDFDataSource);
			
			// make an instance of the RDF handler
			this._sandbox.Scholar.RDF = new Scholar.Translate.RDF(this._rdf.dataSource);
		} else {
			this._storage = "";
			this._storageLength = 0;
			this._storagePointer = 0;
			this._storageFunctions(true, true);
		}
	}
}

/*
 * sets up functions for reading/writing to a storage stream
 */
Scholar.Translate.prototype._storageFunctions = function(read, write) {
	var me = this;
	
	// add setCharacterSet method that does nothing
	this._sandbox.Scholar.setCharacterSet = function() {};
	
	if(write) {
		// set up write() method
		this._sandbox.Scholar.write = function(data) {
			me._storage += data;
			me._storageLength += data.length;
		};
	}
	
	if(read) {
		// set up read methods
		if(this._configOptions.dataMode == "line") {	// line by line reading
			this._sandbox.Scholar.read = function() {
				if(me._storagePointer >= me._storageLength) {
					return false;
				}
				
				var oldPointer = me._storagePointer;
				var lfIndex = me._storage.indexOf("\n", me._storagePointer);
				
				if(lfIndex != -1) {
					// in case we have a CRLF
					me._storagePointer = lfIndex+1;
					if(lfIndex > 0 && me._storage[lfIndex-1] == "\r") {
						lfIndex--;
					}
					return me._storage.substr(oldPointer, lfIndex-oldPointer);
				}
				
				var crIndex = me._storage.indexOf("\r", me._storagePointer);
				if(crIndex != -1) {
					me._storagePointer = crIndex+1;
					return me._storage.substr(oldPointer, crIndex-oldPointer);
				}
				
				me._storagePointer = me._storageLength;
				return me._storage.substr(oldPointer);
			}
		} else {	// block reading
			this._sandbox.Scholar.read = function(amount) {
				if(me._storagePointer >= me._storageLength) {
					return false;
				}
				
				var oldPointer = me._storagePointer;
				if((me._storagePointer+amount) > me._storageLength) {
					me._storagePointer = me._storageLength+1;
					return me._storage.substr(oldPointer);
				}
				
				me._storagePointer += amount;
				return me._storage.substr(oldPointer, amount);
			}
		}
	}
}
2006-08-05 20:58:45 +00:00
|
|
|
/* Scholar.Translate.ScholarItem: a class for generating a new item from
|
closes #78, figure out import/export architecture
closes #100, migrate ingester to Scholar.Translate
closes #88, migrate scrapers away from RDF
closes #9, pull out LC subject heading tags
references #87, add fromArray() and toArray() methods to item objects
API changes:
all translation (import/export/web) now goes through Scholar.Translate
all Scholar-specific functions in scrapers start with "Scholar." rather than the jumbled up piggy bank un-namespaced confusion
scrapers now longer specify items through RDF (the beginning of an item.fromArray()-like function exists in Scholar.Translate.prototype._itemDone())
scrapers can be any combination of import, export, and web (type is the sum of 1/2/4 respectively)
scrapers now contain functions (doImport, doExport, doWeb) rather than loose code
scrapers can call functions in other scrapers or just call the function to translate itself
export accesses items item-by-item, rather than accepting a huge array of items
MARC functions are now in the MARC import translator, and accessed by the web translators
new features:
import now works
rudimentary RDF (unqualified dublin core only), RIS, and MARC import translators are implemented (although they are a little picky with respect to file extensions at the moment)
items appear as they are scraped
MARC import translator pulls out tags, although this seems to slow things down
no icon appears next to a the URL when Scholar hasn't detected metadata, since this seemed somewhat confusing
apologizes for the size of this diff. i figured if i was going to re-write the API, i might as well do it all at once and get everything working right.
2006-07-17 04:06:58 +00:00
|
|
|
* inside scraper code
|
|
|
|
*/
|
|
|
|
|
2006-09-08 05:47:47 +00:00
|
|
|
Scholar.Translate.GenerateScholarItemClass = function() {
|
|
|
|
	var ScholarItem = function(itemType) {
		// assign item type
		this.itemType = itemType;
		// generate creators array
		this.creators = new Array();
		// generate notes array
		this.notes = new Array();
		// generate tags array
		this.tags = new Array();
		// generate see also array
		this.seeAlso = new Array();
		// generate file array
		this.attachments = new Array();
	};
	
	return ScholarItem;
}
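// Illustrative sketch (not in the original source): how scraper code might
// use the generated item class. The property names and the complete() call
// below follow the conventions visible in this file but are assumptions here.
//
//     var item = new Scholar.Item("book");
//     item.title = "A Sample Title";
//     item.creators.push({lastName:"Doe", firstName:"Jane", creatorType:"author"});
//     item.tags.push("sample tag");
//     item.complete();    // hands the finished item back to Scholar.Translate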

/* Scholar.Translate.Collection: a class for generating a new top-level
 * collection from inside scraper code
 */
Scholar.Translate.GenerateScholarCollectionClass = function() {
	var ScholarCollection = Scholar.Translate.ScholarCollection = function() {};
	
	return ScholarCollection;
}

/* Scholar.Translate.RDF: a class for handling RDF IO
 *
 * If an import/export translator specifies dataMode RDF, this is the interface,
 * accessible from model.
 *
 * In order to simplify things, all classes take in their resource/container
 * as either the Mozilla native type or a string, but all
 * return resource/containers as Mozilla native types (use model.toString to
 * convert)
 */
Scholar.Translate.RDF = function(dataSource, serializer) {
	this._RDFService = Components.classes['@mozilla.org/rdf/rdf-service;1']
		.getService(Components.interfaces.nsIRDFService);
	this._AtomService = Components.classes["@mozilla.org/atom-service;1"]
		.getService(Components.interfaces.nsIAtomService);
	this._RDFContainerUtils = Components.classes["@mozilla.org/rdf/container-utils;1"]
		.getService(Components.interfaces.nsIRDFContainerUtils);
	
	this._dataSource = dataSource;
	this._serializer = serializer;
}

// turn an nsISimpleEnumerator into an array
Scholar.Translate.RDF.prototype._deEnumerate = function(enumerator) {
	if(!(enumerator instanceof Components.interfaces.nsISimpleEnumerator)) {
		return false;
	}
	
	var resources = new Array();
	
	while(enumerator.hasMoreElements()) {
		var resource = enumerator.getNext();
		try {
			resource.QueryInterface(Components.interfaces.nsIRDFLiteral);
			resources.push(resource.Value);
		} catch(e) {
			resource.QueryInterface(Components.interfaces.nsIRDFResource);
			resources.push(resource);
		}
	}
	
	if(resources.length) {
		return resources;
	} else {
		return false;
	}
}

// get a resource as an nsIRDFResource, instead of a string
Scholar.Translate.RDF.prototype._getResource = function(about) {
	try {
		if(!(about instanceof Components.interfaces.nsIRDFResource)) {
			about = this._RDFService.GetResource(about);
		}
	} catch(e) {
		throw("invalid RDF resource: "+about);
	}
	return about;
}

// USED FOR OUTPUT

// writes an RDF triple
Scholar.Translate.RDF.prototype.addStatement = function(about, relation, value, literal) {
	about = this._getResource(about);
	
	if(!(value instanceof Components.interfaces.nsIRDFResource)) {
		if(literal) {
			value = this._RDFService.GetLiteral(value);
		} else {
			value = this._RDFService.GetResource(value);
		}
	}
	
	this._dataSource.Assert(about, this._RDFService.GetResource(relation), value, true);
}

// creates an anonymous resource
Scholar.Translate.RDF.prototype.newResource = function() {
	return this._RDFService.GetAnonymousResource();
};

// creates a new container
Scholar.Translate.RDF.prototype.newContainer = function(type, about) {
	about = this._getResource(about);
	
	type = type.toLowerCase();
	if(type == "bag") {
		return this._RDFContainerUtils.MakeBag(this._dataSource, about);
	} else if(type == "seq") {
		return this._RDFContainerUtils.MakeSeq(this._dataSource, about);
	} else if(type == "alt") {
		return this._RDFContainerUtils.MakeAlt(this._dataSource, about);
	} else {
		throw "Invalid container type in model.newContainer";
	}
}

// adds a new container element (index optional)
Scholar.Translate.RDF.prototype.addContainerElement = function(about, element, literal, index) {
	if(!(about instanceof Components.interfaces.nsIRDFContainer)) {
		about = this._getResource(about);
		var container = Components.classes["@mozilla.org/rdf/container;1"].
			createInstance(Components.interfaces.nsIRDFContainer);
		container.Init(this._dataSource, about);
		about = container;
	}
	if(!(element instanceof Components.interfaces.nsIRDFResource)) {
		if(literal) {
			element = this._RDFService.GetLiteral(element);
		} else {
			element = this._RDFService.GetResource(element);
		}
	}
	
	if(index) {
		about.InsertElementAt(element, index, true);
	} else {
		about.AppendElement(element);
	}
}
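// Illustrative sketch (not in the original source): creating a Seq container
// and appending elements through the model object handed to translators.
// The URIs are placeholders.
//
//     var seq = model.newContainer("seq", "http://example.com/#container");
//     model.addContainerElement(seq, "First entry", true);             // literal
//     model.addContainerElement(seq, "http://example.com/#item1");     // resource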

// gets container elements as an array
Scholar.Translate.RDF.prototype.getContainerElements = function(about) {
	if(!(about instanceof Components.interfaces.nsIRDFContainer)) {
		about = this._getResource(about);
		var container = Components.classes["@mozilla.org/rdf/container;1"].
			createInstance(Components.interfaces.nsIRDFContainer);
		container.Init(this._dataSource, about);
		about = container;
	}
	
	return this._deEnumerate(about.GetElements());
}

// sets a namespace
Scholar.Translate.RDF.prototype.addNamespace = function(prefix, uri) {
	if(this._serializer) {	// silently fail, in case the reason the scraper
							// is failing is that we're using internal IO
		this._serializer.addNameSpace(this._AtomService.getAtom(prefix), uri);
	}
}

// gets a resource's URI
Scholar.Translate.RDF.prototype.getResourceURI = function(resource) {
	if(typeof(resource) == "string") {
		return resource;
	}
	
	resource.QueryInterface(Components.interfaces.nsIRDFResource);
	return resource.ValueUTF8;
}

// USED FOR INPUT

// gets all RDF resources
Scholar.Translate.RDF.prototype.getAllResources = function() {
	var resourceEnumerator = this._dataSource.GetAllResources();
	return this._deEnumerate(resourceEnumerator);
}

// gets arcs going in
Scholar.Translate.RDF.prototype.getArcsIn = function(resource) {
	resource = this._getResource(resource);
	
	var arcEnumerator = this._dataSource.ArcLabelsIn(resource);
	return this._deEnumerate(arcEnumerator);
}

// gets arcs going out
Scholar.Translate.RDF.prototype.getArcsOut = function(resource) {
	resource = this._getResource(resource);
	
	var arcEnumerator = this._dataSource.ArcLabelsOut(resource);
	return this._deEnumerate(arcEnumerator);
}

// gets source resources
Scholar.Translate.RDF.prototype.getSources = function(resource, property) {
	property = this._getResource(property);
	resource = this._getResource(resource);
	
	var enumerator = this._dataSource.GetSources(property, resource, true);
	return this._deEnumerate(enumerator);
}

// gets target resources
Scholar.Translate.RDF.prototype.getTargets = function(resource, property) {
	property = this._getResource(property);
	resource = this._getResource(resource);
	
	var enumerator = this._dataSource.GetTargets(resource, property, true);
	return this._deEnumerate(enumerator);
}
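// Illustrative sketch (not in the original source): walking an imported graph
// with the input-side helpers above. The Dublin Core predicate URI is real,
// but the traversal itself is only an example of the intended calling pattern.
//
//     var resources = model.getAllResources();
//     if(resources) {
//         for(var i in resources) {
//             var titles = model.getTargets(resources[i],
//                 "http://purl.org/dc/elements/1.1/title");
//             // titles is an array of literal values/resources, or false
//         }
//     }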