2006-02-21 17:01:06 +00:00
2006-08-30 07:05:57 +00:00
GUID: 'zotero@chnm.gmu.edu',
DB_FILE: 'zotero.sqlite',
Renamed DB to scholar.sqlite, since that seems to be the current fashion
Added some new core functions:
- Scholar.varDump(), after PHP's var_dump()
- Scholar.flattenArguments(), to flatten mixed array/literal argument lists into a single array
- Scholar.join() -- a version of join() that operates externally, for use on, for example, the arguments object (safer than extending Object)
- Scholar.Hash, a slightly smarter associative array -- not perfect, but brings a proper length property and a few convenience methods (and allows for other additions) -- should probably be limited to places where the length property or the other additions are needed, since its use is a little non-standard (e.g. you have to remember to do _for (i in arr.items)_ rather than just _for (i in arr)_, to use set(), etc.)
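A minimal standalone sketch of the helpers listed above, for illustration only. It is not the code from this revision: the null handling in flattenArguments(), the example() wrapper, and everything about Hash beyond the `items` property, the `length` property and `set()` are assumptions.

```javascript
// Flatten a mix of arrays and scalar values into a single array (one level deep).
function flattenArguments(args) {
    // A lone scalar (or null) is wrapped so the loop below can treat it uniformly
    if (typeof args !== 'object' || args === null) {
        args = [args];
    }
    var flat = [];
    for (var i = 0; i < args.length; i++) {
        var arg = args[i];
        if (arg !== null && typeof arg === 'object' && typeof arg.length === 'number') {
            // Array-like member: copy its elements
            for (var j = 0; j < arg.length; j++) {
                flat.push(arg[j]);
            }
        } else {
            flat.push(arg);
        }
    }
    return flat;
}

// join() for array-like objects such as `arguments`, without extending Object
function join(obj, delim) {
    var parts = [];
    for (var i = 0, len = obj.length; i < len; i++) {
        parts.push(obj[i]);
    }
    return parts.join(delim);
}

// Hypothetical caller that accepts arrays and/or individual values
function example() {
    return join(flattenArguments(arguments), ', ');
}

// Assumed shape of a Hash: values live in `items`, `length` counts distinct keys
function Hash() {
    this.length = 0;
    this.items = {};
}
Hash.prototype.set = function (key, value) {
    if (!Object.prototype.hasOwnProperty.call(this.items, key)) {
        this.length++;
    }
    this.items[key] = value;
};

var tags = new Hash();
tags.set('history', 1);
tags.set('maps', 2);

// Iterate over tags.items, not over tags itself
var keys = [];
for (var key in tags.items) {
    keys.push(key);
}

console.log(example(1, [2, 3], 'four')); // prints "1, 2, 3, four"
console.log(tags.length, keys);
```

Keeping the stored values in `items` is what makes the non-standard loop necessary: iterating over the Hash object itself would also visit `length` and `items`.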
2006-03-14 11:45:19 +00:00
DB_REBUILD: false, // erase DB and recreate from schema
2006-03-22 18:53:26 +00:00
2006-06-13 14:53:38 +00:00
DEBUG_TO_CONSOLE: true, // dump debug messages to console rather than (much slower) Debug Logger
2006-08-31 22:52:27 +00:00
REPOSITORY_URL: 'http://www.zotero.org/repo',
2006-07-31 03:38:02 +00:00
2006-06-25 20:14:11 +00:00
2006-02-21 17:01:06 +00:00
* Core functions
2006-03-20 21:47:22 +00:00
var Scholar = new function(){
2006-05-27 00:20:27 +00:00
var _initialized = false;
2006-08-01 18:01:56 +00:00
var _shutdown = false;
2006-05-27 00:20:27 +00:00
var _localizedStringBundle;
2006-03-20 21:47:22 +00:00
2006-05-27 00:20:27 +00:00
// Privileged (public) methods
2006-03-20 21:47:22 +00:00
this.init = init;
2006-08-01 18:01:56 +00:00
this.shutdown = shutdown;
2006-07-27 08:45:48 +00:00
this.getProfileDirectory = getProfileDirectory;
this.getScholarDirectory = getScholarDirectory;
this.getStorageDirectory = getStorageDirectory;
2006-08-01 23:10:31 +00:00
this.getScholarDatabase = getScholarDatabase;
this.backupDatabase = backupDatabase;
2006-03-20 21:47:22 +00:00
this.debug = debug;
this.varDump = varDump;
2006-05-27 00:20:27 +00:00
this.getString = getString;
2006-03-20 21:47:22 +00:00
this.flattenArguments = flattenArguments;
Closes #259, auto-complete of tags
Addresses #260, Add auto-complete to search window
- New XPCOM autocomplete component for Zotero data -- can be used by setting the autocompletesearch attribute of a textbox to 'zotero' and passing a search scope with the autocompletesearchparam attribute. Additional parameters can be passed by appending them to the autocompletesearchparam value with a '/', e.g. 'tag/2732' (to exclude tags that show up in item 2732)
- Tag entry now uses more or less the same interface as metadata -- no more popup window -- note that tab isn't working properly yet, and there's no way to quickly enter multiple tags (though it's now considerably quicker than it was before)
- Autocomplete for tags, excluding any tags already set for the current item
- Standalone note windows now register with the Notifier (since tags needed item modification notifications to work properly), which will help with #282, "Notes opened in separate windows need item notification"
- Tags are now retrieved in alphabetical order
- Scholar.Item.replaceTag(oldTagID, newTag), with a single notify
- Scholar.getAncestorByTagName(elem, tagName) -- walk up the DOM tree from an element until an element with the specified tag name is found (also checks with 'xul:' prefix, for use in XBL), or false if not found -- probably shouldn't be used too widely, since it's doing string comparisons, but better than specifying, say, nine '.parentNode' properties, and makes for more resilient code
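The ancestor walk described in the last item can be tried outside the browser with plain objects standing in for DOM nodes. This is a sketch under that assumption; the node objects below are hypothetical and carry only the two properties the walk reads.

```javascript
// Walk up from elem until an ancestor with the given tag name is found.
// Also accepts the 'xul:'-prefixed form of the tag name; returns false if none.
function getAncestorByTagName(elem, tagName) {
    while (elem.parentNode) {
        elem = elem.parentNode;
        if (elem.tagName === tagName || elem.tagName === 'xul:' + tagName) {
            return elem;
        }
    }
    return false;
}

// Hypothetical stand-ins for a textbox nested inside an hbox inside a row
var row = { tagName: 'xul:row', parentNode: null };
var hbox = { tagName: 'hbox', parentNode: row };
var textbox = { tagName: 'textbox', parentNode: hbox };

console.log(getAncestorByTagName(textbox, 'row') === row); // prints true
console.log(getAncestorByTagName(textbox, 'grid'));        // prints false
```

The caller names the ancestor it wants rather than counting `.parentNode` hops, so the lookup keeps working if a wrapper element is added or removed in between.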
A few notes:
- Autocomplete in Minefield seems to self-destruct after using it in the same field a few times, taking down saving of the field with it -- this may or may not be my fault, but it makes Zotero more or less unusable in 3.0 at the moment. Sorry. (I use 3.0 myself for development, so I'll work on it.)
- This would have been much, much easier if having an autocomplete textbox (which uses an XBL-generated popup for the suggestions) within a popup (as it is in the independent note edit panes) didn't introduce all sorts of crazy bugs that had to be defeated with annoying hackery -- one side effect of this is that at the moment you can't close the tags popup with the Escape key
- Independent note windows now need to pull in itemPane.js to function properly, which is a bit messy and not ideal, but less messy and more ideal than duplicating all the dual-state editor and tabindex logic would be
- Hitting tab in a tag field not only doesn't work but also breaks things until the next window refresh.
- There are undoubtedly other bugs.
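The autocompletesearchparam convention from this commit (a search scope, optionally followed by '/'-separated parameters, as in 'tag/2732') could be split roughly as follows. parseSearchParam() and the field names `scope` and `extra` are hypothetical and are not part of the component.

```javascript
// Split a value such as 'tag/2732' into the search scope and any extra parameters.
function parseSearchParam(searchParam) {
    var parts = String(searchParam).split('/');
    return {
        scope: parts[0],        // e.g. 'tag'
        extra: parts.slice(1)   // e.g. ['2732'], kept as strings
    };
}

var parsed = parseSearchParam('tag/2732');
console.log(parsed.scope, parsed.extra); // scope is 'tag', extra holds '2732'
```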
2006-09-07 08:07:48 +00:00
this.getAncestorByTagName = getAncestorByTagName;
2006-03-20 21:47:22 +00:00
this.join = join;
2006-06-20 17:32:40 +00:00
this.inArray = inArray;
this.arraySearch = arraySearch;
Fulltext search support
There are currently two types of fulltext searching: an SQL-based word index and a file scanner. They each have their advantages and drawbacks.
The word index is very fast to search and is currently used for the find-as-you-type quicksearch. However, indexing files takes some time, so we should probably offer a preference to turn it off ("Index attachment content for quicksearch" or something). There's also an issue with Chinese characters (which are indexed by character rather than word, since there are no spaces to go by, so a search for a word with common characters could produce erroneous results). The quicksearch doesn't use a left-bound index (since that would probably upset German speakers searching for "musik" in "nachtmusik," though I don't know for sure how they think of words) but still seems pretty fast.
* Note: There will be a potentially long delay when you start Firefox with this revision as it builds a fulltext word index of your existing items. We obviously need a notification/option for this. *
The file scanner, used in the Attachment Content condition of the search dialog, offers phrase searching as well as regex support (both case-sensitive and not, and defaulting to multiline). It doesn't require an index, though it should probably be optimized to use the word index, if available, for narrowing the results when not in regex mode. (It does only scan files that pass all the other search conditions, which speeds it up considerably for multi-condition searches, and skips non-text files unless instructed otherwise, but it's still relatively slow.)
Both convert HTML to text before searching (with the exception of the binary file scanning mode).
There are some issues with which files get indexed and which don't that we can't do much about and that will probably confuse users immensely. Dan C. suggested some sort of indicator (say, a green dot) to show which files are indexed.
Also added (very ugly) charset detection (anybody want to figure out getCharsetFromString(str)?), a setTimeout() replacement in the XPCOM service, an arrayToHash() method, and a new header to timedtextarea.xml, since it's really not copyright CHNM (it's really just a few lines off from the toolkit timed-textbox binding--I tried to change it to extend timed-textbox and just ignore Return keypress events so that we didn't need to duplicate the Mozilla code, but timed-textbox's reliance on html:input instead of html:textarea made things rather difficult).
To do:
- Pref/buttons to disable/clear/rebuild fulltext index
- Hidden prefs to set maximum file size to index/scan
- Don't index words of fewer than 3 non-Asian characters
- MRU cache for saved searches
- Use word index if available to narrow search scope of fulltext scanner
- Cache attachment info methods
- Show content excerpt in search results (at least in advanced search window, when it exists)
- Notification window (a la scraping) to show when indexing
- Indicator of indexed status
- Context menu option to index
- Indicator that a file scanning search is in progress, if possible
- Find other ways to make it index the NYT front page in under 10 seconds
- Probably fix lots of bugs, which you will likely start telling me about...now.
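To make the left-bound point above concrete, here is a small sketch comparing prefix matching with substring matching over a list of indexed words, plus a loop-based arrayToHash() for quick membership tests. The word list and the two match functions are assumptions for illustration; the real index is SQL-based, where the same distinction would roughly correspond to LIKE 'musik%' versus LIKE '%musik%'.

```javascript
// Turn an array of values into an object usable for membership tests
function arrayToHash(array) {
    var hash = {};
    for (var i = 0; i < array.length; i++) {
        hash[array[i]] = true;
    }
    return hash;
}

// Left-bound match: the query must appear at the start of the word
function leftBoundMatch(words, query) {
    return words.filter(function (word) {
        return word.indexOf(query) === 0;
    });
}

// Substring match: the query may appear anywhere in the word
function substringMatch(words, query) {
    return words.filter(function (word) {
        return word.indexOf(query) !== -1;
    });
}

var indexedWords = ['eine', 'kleine', 'nachtmusik'];
var wordHash = arrayToHash(indexedWords);

console.log(leftBoundMatch(indexedWords, 'musik')); // prints []
console.log(substringMatch(indexedWords, 'musik')); // prints [ 'nachtmusik' ]
console.log(wordHash['kleine'] === true);           // prints true
```

A left-bound search can use an ordinary index on the word column, which is why it is the faster choice in general; the substring form trades that away to find "musik" inside "nachtmusik".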
2006-09-21 00:10:29 +00:00
this.arrayToHash = arrayToHash;
2006-06-01 19:46:57 +00:00
this.randomString = randomString;
2006-06-03 20:23:19 +00:00
this.getRandomID = getRandomID;
2006-08-01 23:10:31 +00:00
this.moveToUnique = moveToUnique;
2006-05-27 00:20:27 +00:00
2006-06-15 06:13:02 +00:00
// Public properties
2006-08-30 19:18:43 +00:00
2006-09-06 07:04:02 +00:00
2006-08-30 21:56:52 +00:00
2006-06-15 06:13:02 +00:00
2006-02-21 17:01:06 +00:00
* Initialize the extension
2006-03-20 21:47:22 +00:00
function init(){
2006-05-27 00:20:27 +00:00
if (_initialized){
return false;
2006-03-20 21:47:22 +00:00
2006-05-27 00:20:27 +00:00
2006-08-01 18:01:56 +00:00
// Register shutdown handler to call Scholar.shutdown()
var observerService = Components.classes["@mozilla.org/observer-service;1"]
observe: function(subject, topic, data){
Scholar.shutdown(subject, topic, data)
}, "xpcom-shutdown", false);
2006-06-25 07:31:01 +00:00
// Load in the preferences branch for the extension
2006-06-15 06:13:02 +00:00
// Load in the extension version from the extension manager
var nsIUpdateItem = Components.interfaces.nsIUpdateItem;
var gExtensionManager =
var itemType = nsIUpdateItem.TYPE_EXTENSION;
= gExtensionManager.getItemForID(SCHOLAR_CONFIG['GUID']).version;
2006-05-27 00:20:27 +00:00
2006-08-30 19:18:43 +00:00
// OS platform
var win = Components.classes["@mozilla.org/appshell/appShellService;1"]
this.platform = win.navigator.platform;
2006-08-30 19:57:23 +00:00
this.isMac = (this.platform.substr(0, 3) == "Mac");
2006-07-27 08:45:48 +00:00
2006-09-06 07:04:02 +00:00
// Locale
var localeService = Components.classes['@mozilla.org/intl/nslocaleservice;1'].
this.locale = localeService.getLocaleComponentForUserAgent();
2006-05-27 00:20:27 +00:00
// Load in the localization stringbundle for use by getString(name)
var src = 'chrome://scholar/locale/scholar.properties';
var localeService =
var appLocale = localeService.getApplicationLocale();
var stringBundleService =
_localizedStringBundle = stringBundleService.createBundle(src, appLocale);
2006-06-15 06:13:02 +00:00
// Trigger updating of schema and scrapers
2006-08-30 00:16:07 +00:00
// Initialize integration web server
2006-05-27 00:20:27 +00:00
_initialized = true;
return true;
2006-03-20 21:47:22 +00:00
2006-02-21 17:01:06 +00:00
2006-03-14 11:45:19 +00:00
2006-08-01 18:01:56 +00:00
function shutdown(subject, topic, data){
// Called twice otherwise, for some reason
if (_shutdown){
return false;
_shutdown = true;
2006-08-01 23:10:31 +00:00
return true;
2006-08-01 18:01:56 +00:00
2006-07-27 08:45:48 +00:00
function getProfileDirectory(){
return Components.classes["@mozilla.org/file/directory_service;1"]
.get("ProfD", Components.interfaces.nsIFile);
function getScholarDirectory(){
var file = Scholar.getProfileDirectory();
2006-08-30 07:05:57 +00:00
2006-07-27 08:45:48 +00:00
// If it doesn't exist, create
if (!file.exists() || !file.isDirectory()){
2006-08-27 21:17:49 +00:00
file.create(Components.interfaces.nsIFile.DIRECTORY_TYPE, 0755);
2006-07-27 08:45:48 +00:00
return file;
function getStorageDirectory(){
var file = Scholar.getScholarDirectory();
// If it doesn't exist, create
if (!file.exists() || !file.isDirectory()){
2006-08-27 21:17:49 +00:00
file.create(Components.interfaces.nsIFile.DIRECTORY_TYPE, 0755);
2006-07-27 08:45:48 +00:00
return file;
2006-08-01 23:10:31 +00:00
function getScholarDatabase(ext){
ext = ext ? '.' + ext : '';
var file = Scholar.getScholarDirectory();
file.append(SCHOLAR_CONFIG['DB_FILE'] + ext);
return file;
* Back up the main database file
* This could probably create a corrupt file fairly easily if all changes
* haven't been flushed to disk -- proceed with caution
function backupDatabase(){
if (Scholar.DB.transactionInProgress()){
Scholar.debug('Transaction in progress--skipping DB backup', 2);
return false;
Scholar.debug('Backing up database');
var file = Scholar.getScholarDatabase();
var backupFile = Scholar.getScholarDatabase('bak');
// Copy via a temporary file so we don't run into disk space issues
// after deleting the old backup file
var tmpFile = Scholar.getScholarDatabase('tmp');
if (tmpFile.exists()){
try {
file.copyTo(file.parent, tmpFile.leafName);
catch (e){
// TODO: deal with low disk space
throw (e);
// Remove old backup file
if (backupFile.exists()){
tmpFile.moveTo(tmpFile.parent, backupFile.leafName);
return true;
2006-07-27 08:45:48 +00:00
2006-02-21 17:01:06 +00:00
* Debug logging function
* Uses DebugLogger extension available from http://mozmonkey.com/debuglogger/
2006-05-15 21:52:18 +00:00
* if available, otherwise the console (in which case boolean browser.dom.window.dump.enabled
* must be created and set to true in about:config)
2006-02-21 17:01:06 +00:00
* Defaults to log level 3 if level not provided
2006-03-20 21:47:22 +00:00
function debug(message, level) {
2006-02-21 17:01:06 +00:00
return false;
2006-06-01 18:43:44 +00:00
if (typeof message!='string'){
message = Scholar.varDump(message);
2006-02-21 17:01:06 +00:00
if (!level){
level = 3;
2006-03-22 18:53:26 +00:00
try {
var logManager =
2006-02-21 17:01:06 +00:00
2006-08-30 07:05:57 +00:00
var logger = logManager.registerLogger("Zotero");
2006-03-22 18:53:26 +00:00
catch (e){}
2006-02-21 17:01:06 +00:00
if (logger){
logger.log(level, message);
else {
2006-08-30 07:05:57 +00:00
dump('zotero(' + level + '): ' + message + "\n\n");
2006-02-21 17:01:06 +00:00
return true;
2006-03-20 21:47:22 +00:00
2006-03-14 11:45:19 +00:00
* PHP var_dump equivalent for JS
* Adapted from http://binnyva.blogspot.com/2005/10/dump-function-javascript-equivalent-of.html
2006-03-20 21:47:22 +00:00
function varDump(arr,level) {
2006-03-14 11:45:19 +00:00
var dumped_text = "";
if (!level){
level = 0;
// The padding given at the beginning of the line.
var level_padding = "";
for (var j=0;j<level+1;j++){
level_padding += " ";
if (typeof(arr) == 'object') { // Array/Hashes/Objects
for (var item in arr) {
var value = arr[item];
if (typeof(value) == 'object') { // If it is an array,
dumped_text += level_padding + "'" + item + "' ...\n";
dumped_text += arguments.callee(value,level+1);
else {
if (typeof value == 'function'){
dumped_text += level_padding + "'" + item + "' => function(...){...} \n";
else {
dumped_text += level_padding + "'" + item + "' => \"" + value + "\"\n";
else { // Strings/Chars/Numbers etc.
dumped_text = "===>"+arr+"<===("+typeof(arr)+")";
return dumped_text;
2006-03-20 21:47:22 +00:00
2006-03-14 11:45:19 +00:00
2006-05-27 00:20:27 +00:00
function getString(name){
2006-06-26 20:41:09 +00:00
try {
var l10n = _localizedStringBundle.GetStringFromName(name);
catch (e){
throw ('Localized string not available for ' + name);
return l10n;
2006-05-27 00:20:27 +00:00
2006-03-14 11:45:19 +00:00
* Flattens mixed arrays/values in a passed _arguments_ object and returns
* an array of values -- allows for functions to accept both arrays of
* values and/or an arbitrary number of individual values
2006-03-20 21:47:22 +00:00
function flattenArguments(args){
2006-06-02 20:51:34 +00:00
// Put passed scalar values into an array
2006-09-12 06:53:48 +00:00
if (typeof args!='object' || args===null){
2006-06-02 20:51:34 +00:00
args = [args];
2006-03-14 11:45:19 +00:00
var returns = new Array();
for (var i=0; i<args.length; i++){
if (typeof args[i]=='object'){
2006-08-25 19:15:03 +00:00
if(args[i]) {
for (var j=0; j<args[i].length; j++){
2006-03-14 11:45:19 +00:00
else {
return returns;
2006-03-20 21:47:22 +00:00
2006-03-14 11:45:19 +00:00
2006-09-07 08:07:48 +00:00
function getAncestorByTagName(elem, tagName){
while (elem.parentNode){
elem = elem.parentNode;
if (elem.tagName==tagName || elem.tagName=='xul:' + tagName){
return elem;
return false;
2006-03-14 11:45:19 +00:00
* A version of join() that operates externally for use on objects other
* than arrays (e.g. _arguments_)
* Note that this is safer than extending Object()
2006-03-20 21:47:22 +00:00
function join(obj, delim){
2006-03-14 11:45:19 +00:00
var a = [];
for (var i=0, len=obj.length; i<len; i++){
return a.join(delim);
2006-03-20 21:47:22 +00:00
2006-03-14 11:45:19 +00:00
2006-06-20 17:32:40 +00:00
* PHP's in_array() for JS -- returns true if a value is contained in
* an array, false otherwise
function inArray(needle, haystack){
for (var i in haystack){
if (haystack[i]==needle){
return true;
return false;
* PHP's array_search() for JS -- searches an array for a value and
* returns the key if found, false otherwise
function arraySearch(needle, haystack){
for (var i in haystack){
if (haystack[i]==needle){
return i;
return false;
2006-09-21 00:10:29 +00:00
function arrayToHash(array){
var hash = {};
for each(var val in array){
hash[val] = true;
return hash;
2006-06-01 19:46:57 +00:00
* Generate a random string of length 'len' (defaults to 8)
function randomString(len) {
var chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
if (!len){
len = 8;
var randomstring = '';
for (var i=0; i<len; i++) {
var rnum = Math.floor(Math.random() * chars.length);
randomstring += chars.substring(rnum,rnum+1);
return randomstring;
2006-06-03 20:23:19 +00:00
/**
 * Find a unique random id for use in a DB table
 **/
function getRandomID(table, column, max){
	if (!table){
		throw('Table not provided');
	}
	if (!column){
		throw('Column not provided');
	}
	var sql = 'SELECT COUNT(*) FROM ' + table + ' WHERE ' + column + '=';
	if (!max){
		max = 16383;
	}
	var tries = 3; // # of tries to find a unique id
	do {
		// If no luck after number of tries, try a larger range
		if (!tries){
			max = max * 128;
		}
		var rnd = Math.floor(Math.random()*max);
		var exists = Scholar.DB.valueQuery(sql + rnd);
		tries--;
	}
	while (exists);
	return rnd;
}
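The "retry a few times, then widen the range" idea in getRandomID() can be tried outside the extension. The sketch below is an illustration only: it assumes a `Set` named `taken` as a stand-in for the database lookup, and it changes the signature accordingly, so it is not the function above.

```javascript
// Sketch of the retry-then-widen strategy behind getRandomID().
// `taken` is a hypothetical stand-in for the database lookup.
function getRandomID(taken, max) {
	if (!max) {
		max = 16383;
	}
	var tries = 3; // attempts before widening the range
	var rnd;
	do {
		// After three failed draws, enlarge the range once
		if (tries === 0) {
			max = max * 128;
		}
		rnd = Math.floor(Math.random() * max);
		tries--;
	} while (taken.has(rnd));
	return rnd;
}

// With every id below 4 taken, the first three draws always collide,
// so the function can only succeed after the range grows to 4 * 128.
var taken = new Set([0, 1, 2, 3]);
var id = getRandomID(taken, 4);
```

The range is widened only once, when the counter reaches zero; later draws keep using the enlarged range until a free id turns up.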
2006-08-01 23:10:31 +00:00
function moveToUnique(file, newFile){
	newFile.createUnique(Components.interfaces.nsIFile.NORMAL_FILE_TYPE, 0644);
	var newName = newFile.leafName;
	
	// Move file to unique name
	file.moveTo(newFile.parent, newName);
	return file;
}
2006-03-14 11:45:19 +00:00
2006-06-13 15:14:22 +00:00
2006-06-25 07:31:01 +00:00
Scholar.Prefs = new function(){
// Privileged methods
this.init = init;
this.get = get;
this.set = set;
this.register = register;
this.unregister = unregister;
this.observe = observe;
// Public properties
this.prefBranch; // set in Scholar.init()
function init(){
	var prefs = Components.classes["@mozilla.org/preferences-service;1"]
		.getService(Components.interfaces.nsIPrefService);
	this.prefBranch = prefs.getBranch("extensions.scholar.");
	
	// Register observer to handle pref changes
	this.register();
}
/**
 * Retrieve a preference
 **/
function get(pref){
	try {
		switch (this.prefBranch.getPrefType(pref)){
			case this.prefBranch.PREF_BOOL:
				return this.prefBranch.getBoolPref(pref);
			case this.prefBranch.PREF_STRING:
				return this.prefBranch.getCharPref(pref);
			case this.prefBranch.PREF_INT:
				return this.prefBranch.getIntPref(pref);
		}
	}
	catch (e){
		throw ("Invalid preference '" + pref + "'");
	}
}
/**
 * Set a preference
 **/
function set(pref, value){
	try {
		switch (this.prefBranch.getPrefType(pref)){
			case this.prefBranch.PREF_BOOL:
				return this.prefBranch.setBoolPref(pref, value);
			case this.prefBranch.PREF_STRING:
				return this.prefBranch.setCharPref(pref, value);
			case this.prefBranch.PREF_INT:
				return this.prefBranch.setIntPref(pref, value);
		}
	}
	catch (e){
		throw ("Invalid preference '" + pref + "'");
	}
}
// Methods to register a preferences observer
function register(){
	this.prefBranch.addObserver("", this, false);
}

function unregister(){
	// Nothing to remove if init() never set up the branch
	if (!this.prefBranch){
		return;
	}
	this.prefBranch.removeObserver("", this);
}
function observe(subject, topic, data){
// subject is the nsIPrefBranch we're observing (after appropriate QI)
// data is the name of the pref that's been changed (relative to subject)
switch (data){
2006-06-25 07:34:03 +00:00
case "automaticScraperUpdates":
if (this.get('automaticScraperUpdates')){
else {
2006-06-25 07:31:01 +00:00
2006-06-13 15:14:22 +00:00
* Class for creating hash arrays that behave a bit more sanely
* Hashes can be created in the constructor by alternating key and val:
* var hasharray = new Scholar.Hash('foo','foovalue','bar','barvalue');
* Or using hasharray.set(key, val)
* _val_ defaults to true if not provided
* If using foreach-style looping, be sure to use _for (i in arr.items)_
* rather than just _for (i in arr)_, or else you'll end up with the
* methods and members instead of the hash items
* Most importantly, hasharray.length will work as expected, even with
* non-numeric keys
* Adapted from http://www.mojavelinux.com/articles/javascript_hashes.html
* (c) Mojavelinux, Inc.
* License: Creative Commons
Scholar.Hash = function(){
	this.length = 0;
	this.items = new Array();
	
	// Public methods defined on prototype below
	
	for (var i = 0; i < arguments.length; i += 2) {
		if (typeof(arguments[i + 1]) != 'undefined') {
			this.items[arguments[i]] = arguments[i + 1];
			this.length++;
		}
	}
}
2006-03-14 11:45:19 +00:00
Scholar.Hash.prototype.get = function(in_key){
	return this.items[in_key] ? this.items[in_key] : false;
}
2006-03-14 11:45:19 +00:00
Scholar.Hash.prototype.set = function(in_key, in_value){
	// Default to a boolean hash if value not provided
	if (typeof(in_value) == 'undefined'){
		in_value = true;
	}
	// Count the key only the first time it is stored
	if (typeof(this.items[in_key]) == 'undefined') {
		this.length++;
	}
	this.items[in_key] = in_value;
	return in_value;
}
Scholar.Hash.prototype.remove = function(in_key){
	var tmp_value;
	if (typeof(this.items[in_key]) != 'undefined') {
		this.length--;
		tmp_value = this.items[in_key];
		delete this.items[in_key];
	}
	return tmp_value;
}
2006-02-21 17:01:06 +00:00
2006-03-14 11:45:19 +00:00
Scholar.Hash.prototype.has = function(in_key){
	return typeof(this.items[in_key]) != 'undefined';
}
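The behaviour the Scholar.Hash comments describe (a length that follows non-numeric keys, a default value of true, and iteration over `items`) can be sketched without the rest of the extension. This is an approximation, not the class itself: it assumes a plain object for `items` and uses a hypothetical standalone name, `Hash`.

```javascript
// Sketch of a hash that tracks its own length, in the spirit of Scholar.Hash.
function Hash() {
	this.length = 0;
	this.items = {};
	// Constructor arguments alternate key, value, key, value, ...
	for (var i = 0; i < arguments.length; i += 2) {
		if (typeof arguments[i + 1] != "undefined") {
			this.set(arguments[i], arguments[i + 1]);
		}
	}
}

Hash.prototype.has = function (key) {
	return Object.prototype.hasOwnProperty.call(this.items, key);
};

Hash.prototype.set = function (key, value) {
	if (typeof value == "undefined") {
		value = true; // default to a boolean hash
	}
	if (!this.has(key)) {
		this.length++;
	}
	this.items[key] = value;
	return value;
};

Hash.prototype.remove = function (key) {
	var old;
	if (this.has(key)) {
		old = this.items[key];
		delete this.items[key];
		this.length--;
	}
	return old;
};

var h = new Hash("foo", "foovalue", "bar", "barvalue");
h.set("baz");
h.remove("foo");

// Iterate over h.items, not h, to see only the stored keys
var keys = [];
for (var k in h.items) {
	keys.push(k);
}
```

Looping over `h` directly would also visit `length` and `items`, which is the pitfall the comment block warns about.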
2006-06-13 14:53:38 +00:00
2006-06-13 15:07:08 +00:00
Scholar.Date = new function(){
this.sqlToDate = sqlToDate;
2006-08-30 23:21:49 +00:00
this.strToDate = strToDate;
2006-08-31 00:04:11 +00:00
this.formatDate = formatDate;
2006-08-01 23:10:31 +00:00
this.getFileDateString = getFileDateString;
this.getFileTimeString = getFileTimeString;
2006-06-13 15:07:08 +00:00
* Convert an SQL date in the form '2006-06-13 11:03:05' into a JS Date object
* Can also accept just the date part (e.g. '2006-06-13')
2006-08-14 03:19:01 +00:00
function sqlToDate(sqldate, isUTC){
2006-06-13 15:07:08 +00:00
try {
var datetime = sqldate.split(' ');
var dateparts = datetime[0].split('-');
if (datetime[1]){
	var timeparts = datetime[1].split(':');
}
else {
	timeparts = [false, false, false];
}
2006-08-14 03:19:01 +00:00
Cross-posting to BC for discussion: http://chnm.grouphub.com/projects/310105/msg/cat/2114872/3538662/comments
Changes as per my discussions with Dan:
- Separated snapshot functionality into two individual buttons, Create New Item From Current Page and Take Snapshot of Current page
- Updated schema to support primary, secondary and hidden item types (and future user customizations)
- Reorganized New Item menu, moving secondary items into sub-menu
- Removed ability to create link attachments, since it never really made much sense -- will simply use the webpage item type instead. Underlying functionality still exists for the time being, as people have existing links in their libraries--I think we're gonna have to just warn beta testers and delete them in a transition step, as converting nested links really wouldn't be worth the effort.
- Moved file link/add functions into new item menu and removed attachment drop-down
- Large, prominent View and Locate buttons in edit pane for going to an associated URL and looking up in OpenURL, respectively -- buttons gray out as appropriate
- New Item from Page stores the URL and access date (Item.save() checks for the string "CURRENT_TIMESTAMP" for accessDate and doesn't bind it as a string)
- "Website" to "Web Page" (do we prefer "Webpage"? they both look a bit funky in uppercase)
More coming.
Bugs/Known Issues:
- Since snapshots from the toolbar are now top-level in the current collection, there needs to be a way to drag them into items
- The camera icon for adding snapshots, despite being a famfamfam icon, really doesn't read too well (or perhaps just clashes with the rest of our icons). Anybody have a better one? (It also may be able to just be lightened up a bit.)
- Trying the large View/Locate buttons after discussions with Dan, but this approach may not work -- 1) a large View button for the URL makes a lot less sense when you have a parent item with a child snapshot, since people will end up clicking it all the time when they really want to view the snapshot, and 2) the Locate button is awfully big for something that only applies to certain types of items, may not get used very often when it does, and probably won't work when it is
- The access date is stored in UTC and displayed with toLocaleString() like Date Added and Date Modified, but, unlike those two, it's also user-editable. This is clearly a problem. Probably need to parse to Date on blur() with strToDate() and insert as UTC, discarding anything left over.
- Item type itself is still "website" -- should probably change that while we still can
Closes #253, OpenURL arrow should provide visual feedback on mouseover and/or look more button-like
Addresses #304, change references to "website" to "web page"
Addresses #207, openurl arrow functionality
2006-09-27 08:12:09 +00:00
// Invalid date part
if (dateparts.length==1){
	return false;
}

if (isUTC){
	return new Date(Date.UTC(dateparts[0], dateparts[1]-1, dateparts[2],
		timeparts[0], timeparts[1], timeparts[2]));
}

return new Date(dateparts[0], dateparts[1]-1, dateparts[2],
	timeparts[0], timeparts[1], timeparts[2]);
}
catch (e){
	Scholar.debug(sqldate + ' is not a valid SQL date', 2);
	return false;
}
}
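The conversion done by sqlToDate() can be exercised on its own. The following is a sketch under a few assumptions: it uses only standard Date APIs, it pads a missing time with zeros instead of `false`, and it is somewhat stricter than the function above about what it rejects.

```javascript
// Sketch of sqlToDate() using only standard Date APIs.
function sqlToDate(sqldate, isUTC) {
	var datetime = String(sqldate).split(" ");
	var dateparts = datetime[0].split("-").map(Number);
	var timeparts = datetime[1] ? datetime[1].split(":").map(Number) : [];

	// Pad a missing or partial time with zeros
	while (timeparts.length < 3) {
		timeparts.push(0);
	}

	// Reject anything that is not year-month-day (plus optional h:m:s)
	if (dateparts.length != 3 || dateparts.concat(timeparts).some(isNaN)) {
		return false;
	}

	// JS months are zero-based, SQL months are one-based
	var args = [dateparts[0], dateparts[1] - 1, dateparts[2]].concat(timeparts);
	if (isUTC) {
		return new Date(Date.UTC.apply(null, args));
	}
	return new Date(args[0], args[1], args[2], args[3], args[4], args[5]);
}

var utc = sqlToDate("2006-06-13 11:03:05", true);
var dateOnly = sqlToDate("2006-06-13", true);
var local = sqlToDate("2006-06-13 11:03:05");
var invalid = sqlToDate("not a date");
```

With `isUTC` set, the parts are interpreted as UTC; without it, the same parts are read in the local time zone, which is why the two results can differ by the zone offset.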
2006-08-01 23:10:31 +00:00
2006-08-30 23:21:49 +00:00
* converts a string to an object containing:
* day: integer form of the day
* month: integer form of the month (indexed from 0, not 1)
* year: 4 digit year (or, year + BC/AD/etc.)
* part: anything that does not fall under any of the above categories
* (e.g., "Summer," etc.)
2006-09-06 07:04:02 +00:00
var _slashRe = /\b([0-9]{1,4})([\-\/\.\u5e74])([0-9]{1,2})([\-\/\.\u6708])([0-9]{1,4})\b/;
var _yearRe = /^(.*)\b((?:circa |around |about |c\.? ?)?[0-9]{1,4}(?: ?B\.? ?C\.?(?: ?E\.?)?| ?C\.? ?E\.?| ?A\.? ?D\.?)|[0-9]{3,4})\b(.*)$/i;
var _monthRe = null;
var _dayRe = null;
2006-08-30 23:21:49 +00:00
function strToDate(string) {
var date = new Object();
// skip empty things
if(!string) {
return date;
2006-08-31 01:49:46 +00:00
string = string.toString().replace(/^\s+/, "").replace(/\s+$/, "").replace(/\s+/g, " ");
2006-08-30 23:21:49 +00:00
2006-09-06 07:04:02 +00:00
// first, directly inspect the string
var m = _slashRe.exec(string);
if(m && (m[2] == m[4] || (m[2] == "\u5e74" && m[4] == "\u6708"))) {
// figure out date based on parts
if(m[1].length == 4 || m[2] == "\u5e74") {
// ISO 8601 style date (big endian)
date.year = m[1];
date.month = m[3];
date.day = m[5];
} else {
// local style date (middle or little endian)
date.year = m[5];
var country = Scholar.locale.substr(3);
if(country == "US" || // The United States
country == "FM" || // The Federated States of Micronesia
country == "PW" || // Palau
country == "PH") { // The Philippines
date.month = m[1];
date.day = m[3];
} else {
date.month = m[3];
date.day = m[1];
2006-08-30 23:21:49 +00:00
2006-09-06 07:04:02 +00:00
date.year = parseInt(date.year, 10);
date.month = parseInt(date.month, 10);
date.day = parseInt(date.day, 10);
if(date.month > 12) {
	// swap day and month
	var tmp = date.day;
	date.day = date.month;
	date.month = tmp;
}
if(date.month <= 12) {
if(date.year < 100) { // for two digit years, determine proper
// four digit year
var today = new Date();
var year = today.getFullYear();
var twoDigitYear = year % 100;
var century = year - twoDigitYear;
if(date.year <= twoDigitYear) {
// assume this date is from our century
date.year = century + date.year;
} else {
// assume this date is from the previous century
date.year = century - 100 + date.year;
date.month--; // subtract one for JS style
Scholar.debug("DATE: retrieved with algorithms: "+date.toSource());
return date;
2006-08-30 23:21:49 +00:00
2006-09-06 07:04:02 +00:00
// couldn't use algorithms; use regexp
2006-08-30 23:21:49 +00:00
// first, see if we have anything resembling a year
2006-09-06 07:04:02 +00:00
var m = _yearRe.exec(string);
2006-08-30 23:21:49 +00:00
if(m) {
date.year = m[2];
date.part = m[1]+m[3];
Scholar.debug("DATE: got year ("+date.year+", "+date.part+")");
2006-08-31 00:04:11 +00:00
// get short month strings from CSL interpreter
2006-09-08 22:26:59 +00:00
if(!months) {
2006-09-11 01:05:26 +00:00
var months = Scholar.CSL.getMonthStrings("short");
2006-09-08 22:26:59 +00:00
2006-09-06 07:04:02 +00:00
if(!_monthRe) {
// then, see if have anything resembling a month anywhere
_monthRe = new RegExp("^(.*)\\b("+months.join("|")+")[^ ]* (.*)$", "i");
2006-08-31 00:04:11 +00:00
2006-09-06 07:04:02 +00:00
var m = _monthRe.exec(date.part);
2006-08-30 23:21:49 +00:00
if(m) {
date.month = months.indexOf(m[2][0].toUpperCase()+m[2].substr(1).toLowerCase());
date.part = m[1]+m[3];
Scholar.debug("DATE: got month ("+date.month+", "+date.part+")");
// then, see if there's a day
2006-09-06 07:04:02 +00:00
if(!_dayRe) {
var daySuffixes = Scholar.getString("date.daySuffixes").replace(/, ?/g, "|");
2006-09-11 01:05:26 +00:00
_dayRe = new RegExp("\\b([0-9]{1,2})(?:"+daySuffixes+")?\\b(.*)", "i");
2006-09-06 07:04:02 +00:00
var m = _dayRe.exec(date.part);
2006-08-30 23:21:49 +00:00
if(m) {
2006-09-11 01:05:26 +00:00
date.day = parseInt(m[1], 10);
if(m.index > 0) {
	date.part = date.part.substr(0, m.index);
	if(m[2]) {
		date.part += " "+m[2];
	}
} else {
	date.part = m[2];
}
2006-08-30 23:21:49 +00:00
Scholar.debug("DATE: got day ("+date.day+", "+date.part+")");
if(date.part) {
date.part = date.part.replace(/^[^A-Za-z0-9]+/, "").replace(/[^A-Za-z0-9]+$/, "");
2006-09-06 04:45:19 +00:00
if(!date.part.length) {
date.part = undefined;
2006-08-30 23:21:49 +00:00
return date;
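The two-digit-year rule used in the numeric branch of strToDate() is easier to follow in isolation. A minimal sketch, assuming a hypothetical helper `expandTwoDigitYear` that takes the current year as a parameter so that the result is reproducible (strToDate() itself reads the year from `new Date()`).

```javascript
// Sketch of the two-digit-year rule from strToDate(). The current year is
// passed in (a hypothetical parameter) so the result is reproducible.
function expandTwoDigitYear(yy, currentYear) {
	var twoDigitYear = currentYear % 100;
	var century = currentYear - twoDigitYear;
	if (yy <= twoDigitYear) {
		// assume this date is from our century
		return century + yy;
	}
	// assume this date is from the previous century
	return century - 100 + yy;
}

var sameCentury = expandTwoDigitYear(5, 2006);  // "05" read in 2006
var lastCentury = expandTwoDigitYear(98, 2006); // "98" read in 2006
```

The rule never produces a year in the future: anything larger than the current two-digit year is pushed back one century.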
2006-08-01 23:10:31 +00:00
2006-08-31 00:04:11 +00:00
/**
 * does pretty formatting of a date object returned by strToDate()
 **/
function formatDate(date) {
	var string = "";
	
	if(date.part) {
		string += date.part+" ";
	}
	
	if(!months) {
		var months = Scholar.CSL.getMonthStrings("short");
	}
	if(date.month != undefined && months[date.month]) {
		// get long month strings from CSL interpreter
		var months = Scholar.CSL.getMonthStrings("long");
		string += months[date.month];
		if(date.day) {
			string += " "+date.day+", ";
		} else {
			string += " ";
		}
	}
	
	if(date.year) {
		string += date.year;
	}
	
	return string;
}
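The output format of formatDate() can be checked with a few sample objects. This sketch assumes a hard-coded English month list (`MONTHS`, a name made up for the example) in place of Scholar.CSL.getMonthStrings("long"), which is not available outside the extension.

```javascript
// Sketch of formatDate() with a hard-coded English month list standing in
// for Scholar.CSL.getMonthStrings("long") (an assumption for this example).
var MONTHS = ["January", "February", "March", "April", "May", "June", "July",
	"August", "September", "October", "November", "December"];

function formatDate(date) {
	var string = "";
	if (date.part) {
		string += date.part + " ";
	}
	// Months are zero-based, as returned by strToDate()
	if (date.month != undefined && MONTHS[date.month]) {
		string += MONTHS[date.month];
		string += date.day ? " " + date.day + ", " : " ";
	}
	if (date.year) {
		string += date.year;
	}
	return string;
}

var full = formatDate({ year: 2006, month: 8, day: 21 });
var monthOnly = formatDate({ year: 2006, month: 8 });
var season = formatDate({ year: 2006, part: "Summer" });
```

A free-text `part` such as a season is placed first, followed by whichever of month, day and year are present.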
2006-08-01 23:10:31 +00:00
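function getFileDateString(file){
	var date = new Date();
	// nsIFile.lastModifiedTime is in milliseconds since the epoch
	date.setTime(file.lastModifiedTime);
	return date.toLocaleDateString();
}
function getFileTimeString(file){
	var date = new Date();
	date.setTime(file.lastModifiedTime);
	return date.toLocaleTimeString();
}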
2006-07-27 23:01:55 +00:00
Scholar.Browser = new function() {
this.createHiddenBrowser = createHiddenBrowser;
this.deleteHiddenBrowser = deleteHiddenBrowser;
2006-09-21 00:10:29 +00:00
function createHiddenBrowser(myWindow) {
2006-07-27 23:01:55 +00:00
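if(!myWindow) {
	// Fall back to the most recent browser window
	myWindow = Components.classes["@mozilla.org/appshell/window-mediator;1"]
		.getService(Components.interfaces.nsIWindowMediator)
		.getMostRecentWindow("navigator:browser");
}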
2006-07-27 23:01:55 +00:00
2006-09-21 00:10:29 +00:00
// Create a hidden browser
2006-07-27 23:01:55 +00:00
var newHiddenBrowser = myWindow.document.createElement("browser");
2006-09-04 20:19:38 +00:00
2006-09-21 00:10:29 +00:00
Scholar.debug("created hidden browser ("
+ myWindow.document.getElementsByTagName('browser').length + ")");
2006-07-27 23:01:55 +00:00
return newHiddenBrowser;
function deleteHiddenBrowser(myBrowser) {
// Delete a hidden browser
delete myBrowser;
Scholar.debug("deleted hidden browser");
2006-06-06 19:53:27 +00:00