// Scholar for Firefox Utilities

/////////////////////////////////////////////////////////////////
//
// Scholar.Utilities
//
/////////////////////////////////////////////////////////////////

Scholar.Utilities = function () {}

Scholar.Utilities.prototype.debug = function(msg) {
	Scholar.debug(msg, 4);
}

/*
 * See Scholar.Date
 */
Scholar.Utilities.prototype.formatDate = function(date) {
	return Scholar.Date.formatDate(date);
}

Scholar.Utilities.prototype.strToDate = function(date) {
	return Scholar.Date.strToDate(date);
}

/*
 * Cleans extraneous punctuation off an author name
 */
Scholar.Utilities.prototype.cleanAuthor = function(author, type, useComma) {
	if(typeof(author) != "string") {
		throw "cleanAuthor: author must be a string";
	}
	
	author = author.replace(/^[\s\.\,\/\[\]\:]+/, '');
	author = author.replace(/[\s\,\/\[\]\:\.]+$/, '');
	author = author.replace(/ +/g, ' ');
	if(useComma) {
		// Add period for initials
		if(author.substr(author.length-2, 1) == " " || author.substr(author.length-2, 1) == ".") {
			author += ".";
		}
		var splitNames = author.split(/, ?/);
		if(splitNames.length > 1) {
			var lastName = splitNames[0];
			var firstName = splitNames[1];
		} else {
			var lastName = author;
		}
	} else {
		var spaceIndex = author.lastIndexOf(" ");
		var lastName = author.substring(spaceIndex+1);
		var firstName = author.substring(0, spaceIndex);
	}
	// TODO: take type into account
	return {firstName:firstName, lastName:lastName, creatorType:type};
}
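
// Usage sketch (hypothetical inputs, assuming `utilities` is a
// Scholar.Utilities instance; not part of the original code):
//   utilities.cleanAuthor("Smith, John,", "author", true)
//     => {firstName: "John", lastName: "Smith", creatorType: "author"}
//   utilities.cleanAuthor("John Smith", "author", false)
//     => {firstName: "John", lastName: "Smith", creatorType: "author"}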

/*
 * Cleans whitespace off a string and replaces multiple spaces with one
 */
Scholar.Utilities.prototype.cleanString = function(s) {
	if(typeof(s) != "string") {
		throw "cleanString: argument must be a string";
	}
	
	s = s.replace(/[\xA0\r\n\s]+/g, " ");
	s = s.replace(/^\s+/, "");
	return s.replace(/\s+$/, "");
}
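
// Usage sketch (hypothetical input): non-breaking spaces, newlines, and runs of
// spaces collapse to single spaces, and the ends are trimmed:
//   utilities.cleanString("  one\n  two  ") => "one two"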

/*
 * Cleans any non-word non-parenthesis characters off the ends of a string
 */
Scholar.Utilities.prototype.superCleanString = function(x) {
	if(typeof(x) != "string") {
		throw "superCleanString: argument must be a string";
	}
	
	x = x.replace(/^[^\w(]+/, "");
	return x.replace(/[^\w)]+$/, "");
}
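
// Usage sketch (hypothetical input): surrounding punctuation is stripped, but
// parentheses are kept:
//   utilities.superCleanString("[Annual report (draft)]") => "Annual report (draft)"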

/*
 * Eliminates HTML tags, replacing <br>s with newlines
 */
Scholar.Utilities.prototype.cleanTags = function(x) {
	if(typeof(x) != "string") {
		throw "cleanTags: argument must be a string";
	}
	
	x = x.replace(/<br[^>]*>/gi, "\n");
	return x.replace(/<[^>]+>/g, "");
}
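
// Usage sketch (hypothetical input):
//   utilities.cleanTags("<p>First line<br/>second line</p>")
//     => "First line\nsecond line"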

/*
 * Test if a string is an integer
 */
Scholar.Utilities.prototype.isInt = function(x) {
	if(parseInt(x) == x) {
		return true;
	}
	return false;
}
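
// Usage sketch (hypothetical inputs): the loose parseInt() comparison accepts
// "42" but rejects "4.5" (4 != "4.5") and "4abc" (4 != NaN):
//   utilities.isInt("42") => true
//   utilities.isInt("4.5") => false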

/*
 * Get current Scholar version
 */
Scholar.Utilities.prototype.getVersion = function() {
	return Scholar.version;
}

/*
 * Get a page range, given a user-entered set of pages
 */
Scholar.Utilities.prototype._pageRangeRegexp = /^\s*([0-9]+)-([0-9]+)\s*$/;
Scholar.Utilities.prototype.getPageRange = function(pages) {
	var pageNumbers;
	var m = this._pageRangeRegexp.exec(pages);
	if(m) {
		// A page range
		pageNumbers = [m[1], m[2]];
	} else {
		// Assume start and end are the same
		pageNumbers = [pages, pages];
	}
	return pageNumbers;
}
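
// Usage sketch (hypothetical inputs):
//   utilities.getPageRange("112-125") => ["112", "125"]
//   utilities.getPageRange("37") => ["37", "37"]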

/*
 * Provide inArray function
 */
Scholar.Utilities.prototype.inArray = Scholar.inArray;

/*
 * Pads a number or other string with a given string on the left
 */
Scholar.Utilities.prototype.lpad = function(string, pad, length) {
	string = string.toString();	// accept numbers as well as strings
	while(string.length < length) {
		string = pad + string;
	}
	return string;
}
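
// Usage sketch (hypothetical input): pad a two-digit value out to three digits:
//   utilities.lpad("20", "0", 3) => "020"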

/*
 * Returns true if an item type exists, false if it does not
 */
Scholar.Utilities.prototype.itemTypeExists = function(type) {
	if(Scholar.ItemTypes.getID(type)) {
		return true;
	} else {
		return false;
	}
}

/*
 * Cleans a title, capitalizing the proper words and replacing " :" with ":"
 */
Scholar.Utilities.capitalizeSkipWords = ["but", "or", "yet", "so", "for", "and",
	"nor", "a", "an", "the", "at", "by", "from", "in", "into", "of", "on", "to",
	"with", "up", "down"];
Scholar.Utilities.prototype.capitalizeTitle = function(title) {
	title = this.cleanString(title);
	title = title.replace(/ : /g, ": ");
	var words = title.split(" ");
	
	// always capitalize first
	words[0] = words[0][0].toUpperCase() + words[0].substr(1);
	if(words.length > 1) {
		var lastWordIndex = words.length-1;
		// always capitalize last
		words[lastWordIndex] = words[lastWordIndex][0].toUpperCase() + words[lastWordIndex].substr(1);
		
		if(words.length > 2) {
			for(var i=1; i<lastWordIndex; i++) {
				// if not a skip word
				if(Scholar.Utilities.capitalizeSkipWords.indexOf(words[i].toLowerCase()) == -1 ||
						(words[i-1].length && words[i-1][words[i-1].length-1] == ":")) {
					words[i] = words[i][0].toUpperCase() + words[i].substr(1);
				} else {
					words[i] = words[i].toLowerCase();
				}
			}
		}
	}
	
	return words.join(" ");
}
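
// Usage sketch (hypothetical input): skip words stay lowercase unless they open
// a subtitle after a colon:
//   utilities.capitalizeTitle("the history of science : an introduction")
//     => "The History of Science: An Introduction"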

/*
 * END SCHOLAR FOR FIREFOX EXTENSIONS
 */

/////////////////////////////////////////////////////////////////
//
// Scholar.Utilities.Ingester
//
/////////////////////////////////////////////////////////////////

// Scholar.Utilities.Ingester extends Scholar.Utilities, offering additional
// classes relating to data extraction specifically from HTML documents.

Scholar.Utilities.Ingester = function(translate, proxiedURL) {
	this.translate = translate;
}

Scholar.Utilities.Ingester.prototype = new Scholar.Utilities();

// Takes an XPath query and returns the results
Scholar.Utilities.Ingester.prototype.gatherElementsOnXPath = function(doc, parentNode, xpath, nsResolver) {
	var elmts = [];
	
	var iterator = doc.evaluate(xpath, parentNode, nsResolver, Components.interfaces.nsIDOMXPathResult.ANY_TYPE, null);
	var elmt = iterator.iterateNext();
	var i = 0;
	while (elmt) {
		elmts[i++] = elmt;
		elmt = iterator.iterateNext();
	}
	return elmts;
}
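
// Usage sketch (inside a hypothetical scraper; the XPath and class name are
// invented for illustration):
//   var rows = this.gatherElementsOnXPath(doc, doc,
//       '//table[@class="results"]//tr', nsResolver);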

/*
 * Gets a given node as a string containing all child nodes
 */
Scholar.Utilities.Ingester.prototype.getNodeString = function(doc, contextNode, xpath, nsResolver) {
	var elmts = this.gatherElementsOnXPath(doc, contextNode, xpath, nsResolver);
	var returnVar = "";
	for(var i=0; i<elmts.length; i++) {
		returnVar += elmts[i].nodeValue;
	}
	return returnVar;
}

/*
 * Grabs items based on URLs
 */
Scholar.Utilities.Ingester.prototype.getItemArray = function(doc, inHere, urlRe, rejectRe) {
	var availableItems = new Object();	// Technically, associative arrays are objects
	
	// Require link to match this
	if(urlRe) {
		if(urlRe.exec) {
			var urlRegexp = urlRe;
		} else {
			var urlRegexp = new RegExp();
			urlRegexp.compile(urlRe, "i");
		}
	}
	// Do not allow text to match this
	if(rejectRe) {
		if(rejectRe.exec) {
			var rejectRegexp = rejectRe;
		} else {
			var rejectRegexp = new RegExp();
			rejectRegexp.compile(rejectRe, "i");
		}
	}
	
	if(!inHere.length) {
		inHere = new Array(inHere);
	}
	
	for(var j=0; j<inHere.length; j++) {
		var links = inHere[j].getElementsByTagName("a");
		for(var i=0; i<links.length; i++) {
			if(!urlRe || urlRegexp.test(links[i].href)) {
				var text = links[i].textContent;
				if(text) {
					text = this.cleanString(text);
					if(!rejectRe || !rejectRegexp.test(text)) {
						if(availableItems[links[i].href]) {
							if(text != availableItems[links[i].href]) {
								availableItems[links[i].href] += " "+text;
							}
						} else {
							availableItems[links[i].href] = text;
						}
					}
				}
			}
		}
	}
	
	return availableItems;
}
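
// Usage sketch (inside a hypothetical scraper): collect links that look like
// record pages, skipping navigation links, keyed by href:
//   var items = this.getItemArray(doc, doc, /\/record\/[0-9]+/, "^(Next|Previous)$");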

Scholar.Utilities.Ingester.prototype.lookupContextObject = function(co, done, error) {
	return Scholar.OpenURL.lookupContextObject(co, done, error);
}

Scholar.Utilities.Ingester.prototype.parseContextObject = function(co, item) {
	return Scholar.OpenURL.parseContextObject(co, item);
}

// Ingester adapters for Scholar.Utilities.HTTP to handle proxies

Scholar.Utilities.Ingester.prototype.loadDocument = function(url, succeeded, failed) {
	this.processDocuments([ url ], succeeded, null, failed);
}

Scholar.Utilities.Ingester._protocolRe = new RegExp();
Scholar.Utilities.Ingester._protocolRe.compile("^(?:(?:http|https|ftp):|[^:]*/)", "i");
Scholar.Utilities.Ingester.prototype.processDocuments = function(urls, processor, done, exception) {
	if(this.translate.locationIsProxied) {
		for(var i in urls) {
			urls[i] = Scholar.Ingester.ProxyMonitor.properToProxy(urls[i]);
			// check for a protocol colon
			if(!Scholar.Utilities.Ingester._protocolRe.test(urls[i])) {
				throw("invalid URL in processDocuments");
			}
		}
	}
	
	// unless the translator has proposed some way to handle an error, handle it
	// by throwing a "scraping error" message
	if(!exception) {
		var translate = this.translate;
		exception = function(e) {
			translate._translationComplete(false, e);
		}
	}
	
	Scholar.Utilities.HTTP.processDocuments(null, urls, processor, done, exception);
}
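
// Usage sketch (hypothetical scraper code; scrapeOne() and finished() are
// assumed helpers, not part of this file):
//   utilities.processDocuments(itemURLs,
//       function(newDoc) { scrapeOne(newDoc); },  // run on each loaded document
//       function() { finished(); });              // run once after the last one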

Scholar.Utilities.Ingester.HTTP = function(translate) {
	this.translate = translate;
}

Scholar.Utilities.Ingester.HTTP.prototype.doGet = function(url, onDone) {
	if(this.translate.locationIsProxied) {
		url = Scholar.Ingester.ProxyMonitor.properToProxy(url);
	}
	if(!Scholar.Utilities.Ingester._protocolRe.test(url)) {
		throw("invalid URL in doGet");
	}
	
	var translate = this.translate;
	Scholar.Utilities.HTTP.doGet(url, function(xmlhttp) {
		try {
			onDone(xmlhttp.responseText, xmlhttp);
		} catch(e) {
			translate._translationComplete(false, e);
		}
	})
}

Scholar.Utilities.Ingester.HTTP.prototype.doPost = function(url, body, onDone) {
	if(this.translate.locationIsProxied) {
		url = Scholar.Ingester.ProxyMonitor.properToProxy(url);
	}
	if(!Scholar.Utilities.Ingester._protocolRe.test(url)) {
		throw("invalid URL in doPost");
	}
	
	var translate = this.translate;
	Scholar.Utilities.HTTP.doPost(url, body, function(xmlhttp) {
		try {
			onDone(xmlhttp.responseText, xmlhttp);
		} catch(e) {
			translate._translationComplete(false, e);
		}
	})
}

// These are front ends for XMLHttpRequest. XMLHttpRequest can't actually be
// accessed outside the sandbox, and even if it could, it wouldn't let scripts
// access across domains, so everything's replicated here.
Scholar.Utilities.HTTP = new function() {
	this.doGet = doGet;
	this.doPost = doPost;
	this.doHead = doHead;
	this.doOptions = doOptions;
	this.browserIsOffline = browserIsOffline;
	
	/**
	 * Send an HTTP GET request via XMLHTTPRequest
	 *
	 * Returns false if browser is offline
	 *
	 * doGet can be called as:
	 * Scholar.Utilities.HTTP.doGet(url, onDone)
	 **/
	function doGet(url, onDone, onError) {
		Scholar.debug("HTTP GET "+url);
		if (this.browserIsOffline()){
			return false;
		}
		
		var xmlhttp = Components.classes["@mozilla.org/xmlextras/xmlhttprequest;1"]
			.createInstance();
		
		xmlhttp.open('GET', url, true);
		
		xmlhttp.onreadystatechange = function(){
			_stateChange(xmlhttp, onDone);
		};
		
		xmlhttp.send(null);
		
		return true;
	}
	
	/**
	 * Send an HTTP POST request via XMLHTTPRequest
	 *
	 * Returns false if browser is offline
	 *
	 * doPost can be called as:
	 * Scholar.Utilities.HTTP.doPost(url, body, onDone)
	 **/
	function doPost(url, body, onDone) {
		Scholar.debug("HTTP POST "+body+" to "+url);
		if (this.browserIsOffline()){
			return false;
		}
		
		var xmlhttp = Components.classes["@mozilla.org/xmlextras/xmlhttprequest;1"]
			.createInstance();
		
		xmlhttp.open('POST', url, true);
		xmlhttp.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
		
		xmlhttp.onreadystatechange = function(){
			_stateChange(xmlhttp, onDone);
		};
		
		xmlhttp.send(body);
		
		return true;
	}
	
	function doHead(url, onDone) {
		Scholar.debug("HTTP HEAD "+url);
		if (this.browserIsOffline()){
			return false;
		}
		
		var xmlhttp = Components.classes["@mozilla.org/xmlextras/xmlhttprequest;1"]
			.createInstance();
		
		xmlhttp.open('HEAD', url, true);
		
		xmlhttp.onreadystatechange = function(){
			_stateChange(xmlhttp, onDone);
		};
		
		xmlhttp.send(null);
		
		return true;
	}
	
	/**
	 * Send an HTTP OPTIONS request via XMLHTTPRequest
	 *
	 * doOptions can be called as:
	 * Scholar.Utilities.HTTP.doOptions(url, body, onDone)
	 *
	 * The status handler, which doesn't really serve a very noticeable purpose
	 * in our code, is required for compatibility with the Piggy Bank project
	 **/
	function doOptions(url, body, onDone) {
		Scholar.debug("HTTP OPTIONS "+url);
		if (this.browserIsOffline()){
			return false;
		}
		
		var xmlhttp = Components.classes["@mozilla.org/xmlextras/xmlhttprequest;1"]
			.createInstance();
		
		xmlhttp.open('OPTIONS', url, true);
		
		xmlhttp.onreadystatechange = function(){
			_stateChange(xmlhttp, onDone);
		};
		
		xmlhttp.send(body);
		
		return true;
	}
	
	function browserIsOffline() {
		return Components.classes["@mozilla.org/network/io-service;1"]
			.getService(Components.interfaces.nsIIOService).offline;
	}
	
	function _stateChange(xmlhttp, onDone){
		switch (xmlhttp.readyState){
			// Request not yet made
			case 1:
				break;
			
			// Called multiple times while download is in progress
			case 3:
				break;
			
			// Download complete
			case 4:
				if(onDone){
					onDone(xmlhttp);
				}
				break;
		}
	}
}
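
// Usage sketch (hypothetical URL): fetch a page and log how much came back:
//   Scholar.Utilities.HTTP.doGet("http://www.example.com/", function(xmlhttp) {
//       Scholar.debug("got "+xmlhttp.responseText.length+" characters");
//   });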

// Downloads and processes documents with processor()
// firstDoc - the first document to process with the processor (if null,
//            first document is processed without processor)
// urls - an array of URLs to load
// processor - a function to execute to process each document
// done - a function to execute when all document processing is complete
// exception - a function to execute if an exception occurs (exceptions are
//             also logged in the Scholar for Firefox log)
// saveBrowser - whether to save the hidden browser object; usually, you don't
//               want to do this, because it makes it easier to leak memory
Scholar.Utilities.HTTP.processDocuments = function(firstDoc, urls, processor, done, exception, saveBrowser) {
	var hiddenBrowser = Scholar.Browser.createHiddenBrowser();
	hiddenBrowser.docShell.allowImages = false;
	var prevUrl, url;
	
	if (urls.length == 0) {
		if(firstDoc) {
			processor(firstDoc, done);
		} else {
			done();
		}
		return;
	}
	var urlIndex = -1;
	
	var removeListeners = function() {
		hiddenBrowser.removeEventListener("load", onLoad, true);
		if(!saveBrowser) {
			Scholar.Browser.deleteHiddenBrowser(hiddenBrowser);
		}
	}
	var doLoad = function() {
		urlIndex++;
		if (urlIndex < urls.length) {
			url = urls[urlIndex];
			try {
				Scholar.debug("loading "+url);
				hiddenBrowser.loadURI(url);
			} catch (e) {
				removeListeners();
				if(exception) {
					exception(e);
					return;
				} else {
					throw(e);
				}
			}
		} else {
			removeListeners();
			if(done) {
				done();
			}
		}
	};
	var onLoad = function() {
		Scholar.debug(hiddenBrowser.contentDocument.location.href+" has been loaded");
		if(hiddenBrowser.contentDocument.location.href != prevUrl) {	// Just in case it fires too many times
			prevUrl = hiddenBrowser.contentDocument.location.href;
			try {
				processor(hiddenBrowser.contentDocument);
			} catch (e) {
				removeListeners();
				if(exception) {
					exception(e);
					return;
				} else {
					throw(e);
				}
			}
			doLoad();
		}
	};
	var init = function() {
		hiddenBrowser.addEventListener("load", onLoad, true);
		
		if (firstDoc) {
			processor(firstDoc, doLoad);
		} else {
			doLoad();
		}
	}
	
	init();
}
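
// Usage sketch (hypothetical URLs): load two pages in a hidden browser, handing
// each DOM document to processor() before cleaning up:
//   Scholar.Utilities.HTTP.processDocuments(null,
//       ["http://www.example.com/1", "http://www.example.com/2"],
//       function(doc) { Scholar.debug(doc.title); },
//       function() { Scholar.debug("all done"); });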