zotero/userdata.sql

276 lines
7.7 KiB
MySQL
Raw Normal View History

2007-10-23 07:11:59 +00:00
-- 36
-- This file creates tables containing user-specific data -- any changes
-- to existing tables made here must be mirrored in transition steps in
2007-10-23 07:11:59 +00:00
-- schema.js::_migrateSchema()
CREATE TABLE IF NOT EXISTS version (
schema TEXT PRIMARY KEY,
version INT NOT NULL
);
CREATE INDEX IF NOT EXISTS schema ON version(schema);
2007-10-23 07:11:59 +00:00
CREATE TABLE IF NOT EXISTS settings (
setting TEXT,
key TEXT,
value,
PRIMARY KEY (setting, key)
);
-- Show or hide pre-mapped fields for system item types
CREATE TABLE IF NOT EXISTS userFieldMask (
itemTypeID INT,
fieldID INT,
hide INT,
PRIMARY KEY (itemTypeID, fieldID),
FOREIGN KEY (itemTypeID, fieldID) REFERENCES itemTypeFields(itemTypeID, fieldID)
);
-- User-defined item types -- itemTypeIDs must be >= 1000
CREATE TABLE IF NOT EXISTS userItemTypes (
2007-10-23 07:11:59 +00:00
itemTypeID INTEGER PRIMARY KEY,
typeName TEXT,
2007-10-23 07:11:59 +00:00
templateItemTypeID INT
);
Cross-posting to BC for discussion: http://chnm.grouphub.com/projects/310105/msg/cat/2114872/3538662/comments Changes as per my discussions with Dan: - Separated snapshot functionality into two individual buttons, Create New Item From Current Page and Take Snapshot of Current page - Updated schema to support primary, secondary and hidden item types (and future user customizations) - Reorganized New Item menu, moving secondary items into sub-menu - Removed ability to create link attachments, since it never really made much sense -- will simply use the webpage item type instead. Underlying functionality still exists for the time being, as people have existing links in their libraries--I think we're gonna have to just warn beta testers and delete them in a transition step, as converting nested links really wouldn't be worth the effort. - Moved file link/add functions into new item menu and removed attachment drop-down - Large, prominent View and Locate buttons in edit pane for going to an associated URL and looking up in OpenURL, respectively -- buttons gray out as appropriate - New Item from Page stores the URL and access date (Item.save() checks for the string "CURRENT_TIMESTAMP" for accessDate and doesn't bind it as a string) - "Website" to "Web Page" (do we prefer "Webpage"? they both look a bit funky in uppercase) More coming. Bugs/Known Issues: - Since snapshots from the toolbar are now top-level in the current collection, there needs to be a way to drag them into items - The camera icon for adding snapshots, despite being a famfamfam icon, really doesn't read too well (or perhaps just clashes with the rest of our icons). Anybody have a better one? (It also may be able to just be lightened up a bit.) - Trying the large View/Locate buttons after discussions with Dan, but this approach may not work -- 1) a large View button for the URL makes a lot less sense when you have a parent item with a child snapshot, since people will end up clicking it all the time when they really want to view the snapshot, and 2) the Locate button is awfully big for something that only applies to certain types of items, may not get used very often when it does, and probably won't work when it is - The access date is stored in UTC and displayed with toLocaleString() like Date Added and Date Modified, but, unlike those two, it's also user-editable. This is clearly a problem. Probably need to parse to Date on blur() with strToDate() and insert as UTC, discarding anything left over. - Item type itself is still "website" -- should probably change that while we still can Closes #253, OpenURL arrow should provide visual feedback on mouseover and/or look more button-like Addresses #304, change references to "website" to "web page" Addresses #207, openurl arrow functionality
2006-09-27 08:12:09 +00:00
-- Control visibility and placement of system and user item types
CREATE TABLE IF NOT EXISTS userItemTypeMask (
2007-10-23 07:11:59 +00:00
itemTypeID INTEGER PRIMARY KEY,
Cross-posting to BC for discussion: http://chnm.grouphub.com/projects/310105/msg/cat/2114872/3538662/comments Changes as per my discussions with Dan: - Separated snapshot functionality into two individual buttons, Create New Item From Current Page and Take Snapshot of Current page - Updated schema to support primary, secondary and hidden item types (and future user customizations) - Reorganized New Item menu, moving secondary items into sub-menu - Removed ability to create link attachments, since it never really made much sense -- will simply use the webpage item type instead. Underlying functionality still exists for the time being, as people have existing links in their libraries--I think we're gonna have to just warn beta testers and delete them in a transition step, as converting nested links really wouldn't be worth the effort. - Moved file link/add functions into new item menu and removed attachment drop-down - Large, prominent View and Locate buttons in edit pane for going to an associated URL and looking up in OpenURL, respectively -- buttons gray out as appropriate - New Item from Page stores the URL and access date (Item.save() checks for the string "CURRENT_TIMESTAMP" for accessDate and doesn't bind it as a string) - "Website" to "Web Page" (do we prefer "Webpage"? they both look a bit funky in uppercase) More coming. Bugs/Known Issues: - Since snapshots from the toolbar are now top-level in the current collection, there needs to be a way to drag them into items - The camera icon for adding snapshots, despite being a famfamfam icon, really doesn't read too well (or perhaps just clashes with the rest of our icons). Anybody have a better one? (It also may be able to just be lightened up a bit.) - Trying the large View/Locate buttons after discussions with Dan, but this approach may not work -- 1) a large View button for the URL makes a lot less sense when you have a parent item with a child snapshot, since people will end up clicking it all the time when they really want to view the snapshot, and 2) the Locate button is awfully big for something that only applies to certain types of items, may not get used very often when it does, and probably won't work when it is - The access date is stored in UTC and displayed with toLocaleString() like Date Added and Date Modified, but, unlike those two, it's also user-editable. This is clearly a problem. Probably need to parse to Date on blur() with strToDate() and insert as UTC, discarding anything left over. - Item type itself is still "website" -- should probably change that while we still can Closes #253, OpenURL arrow should provide visual feedback on mouseover and/or look more button-like Addresses #304, change references to "website" to "web page" Addresses #207, openurl arrow functionality
2006-09-27 08:12:09 +00:00
display INT, -- 0 == hide, 1 == show, 2 == primary
2007-10-23 07:11:59 +00:00
FOREIGN KEY (itemTypeID) REFERENCES userItemTypes(itemTypeID)
Cross-posting to BC for discussion: http://chnm.grouphub.com/projects/310105/msg/cat/2114872/3538662/comments Changes as per my discussions with Dan: - Separated snapshot functionality into two individual buttons, Create New Item From Current Page and Take Snapshot of Current page - Updated schema to support primary, secondary and hidden item types (and future user customizations) - Reorganized New Item menu, moving secondary items into sub-menu - Removed ability to create link attachments, since it never really made much sense -- will simply use the webpage item type instead. Underlying functionality still exists for the time being, as people have existing links in their libraries--I think we're gonna have to just warn beta testers and delete them in a transition step, as converting nested links really wouldn't be worth the effort. - Moved file link/add functions into new item menu and removed attachment drop-down - Large, prominent View and Locate buttons in edit pane for going to an associated URL and looking up in OpenURL, respectively -- buttons gray out as appropriate - New Item from Page stores the URL and access date (Item.save() checks for the string "CURRENT_TIMESTAMP" for accessDate and doesn't bind it as a string) - "Website" to "Web Page" (do we prefer "Webpage"? they both look a bit funky in uppercase) More coming. Bugs/Known Issues: - Since snapshots from the toolbar are now top-level in the current collection, there needs to be a way to drag them into items - The camera icon for adding snapshots, despite being a famfamfam icon, really doesn't read too well (or perhaps just clashes with the rest of our icons). Anybody have a better one? (It also may be able to just be lightened up a bit.) - Trying the large View/Locate buttons after discussions with Dan, but this approach may not work -- 1) a large View button for the URL makes a lot less sense when you have a parent item with a child snapshot, since people will end up clicking it all the time when they really want to view the snapshot, and 2) the Locate button is awfully big for something that only applies to certain types of items, may not get used very often when it does, and probably won't work when it is - The access date is stored in UTC and displayed with toLocaleString() like Date Added and Date Modified, but, unlike those two, it's also user-editable. This is clearly a problem. Probably need to parse to Date on blur() with strToDate() and insert as UTC, discarding anything left over. - Item type itself is still "website" -- should probably change that while we still can Closes #253, OpenURL arrow should provide visual feedback on mouseover and/or look more button-like Addresses #304, change references to "website" to "web page" Addresses #207, openurl arrow functionality
2006-09-27 08:12:09 +00:00
);
-- User-defined fields
CREATE TABLE IF NOT EXISTS userFields (
2007-10-23 07:11:59 +00:00
userFieldID INTEGER PRIMARY KEY,
fieldName TEXT
);
-- Map custom fields to system and custom item types
CREATE TABLE IF NOT EXISTS userItemTypeFields (
itemTypeID INT,
userFieldID INT,
orderIndex INT,
PRIMARY KEY (itemTypeID, userFieldID),
FOREIGN KEY (userFieldID) REFERENCES userFields(userFieldID)
);
-- The foundational table; every item collected has a unique record here
CREATE TABLE IF NOT EXISTS items (
itemID INTEGER PRIMARY KEY,
itemTypeID INT,
dateAdded DATETIME DEFAULT CURRENT_TIMESTAMP,
dateModified DATETIME DEFAULT CURRENT_TIMESTAMP
);
2007-10-23 07:11:59 +00:00
CREATE TABLE IF NOT EXISTS itemDataValues (
valueID INTEGER PRIMARY KEY,
value
);
-- Type-specific data for individual items
2007-10-23 07:11:59 +00:00
--
-- Triggers specified in schema.js due to lack of trigger IF [NOT] EXISTS in Firefox 2.0
CREATE TABLE IF NOT EXISTS itemData (
itemID INT,
fieldID INT,
2007-10-23 07:11:59 +00:00
valueID,
PRIMARY KEY (itemID, fieldID),
FOREIGN KEY (itemID) REFERENCES items(itemID),
2007-10-23 07:11:59 +00:00
FOREIGN KEY (fieldID) REFERENCES fields(fieldID),
FOREIGN KEY (valueID) REFERENCES itemDataValues(valueID)
);
-- Note data for note items
CREATE TABLE IF NOT EXISTS itemNotes (
2007-10-23 07:11:59 +00:00
itemID INTEGER PRIMARY KEY,
sourceItemID INT,
note TEXT,
FOREIGN KEY (itemID) REFERENCES items(itemID),
FOREIGN KEY (sourceItemID) REFERENCES items(itemID)
);
CREATE INDEX IF NOT EXISTS itemNotes_sourceItemID ON itemNotes(sourceItemID);
2007-10-23 07:11:59 +00:00
CREATE TABLE IF NOT EXISTS itemNoteTitles (
itemID INTEGER PRIMARY KEY,
title TEXT,
FOREIGN KEY (itemID) REFERENCES itemNotes(itemID)
);
-- Metadata for attachment items
CREATE TABLE IF NOT EXISTS itemAttachments (
2007-10-23 07:11:59 +00:00
itemID INTEGER PRIMARY KEY,
sourceItemID INT,
linkMode INT,
mimeType TEXT,
charsetID INT,
path TEXT,
originalPath TEXT,
FOREIGN KEY (itemID) REFERENCES items(itemID),
FOREIGN KEY (sourceItemID) REFERENCES items(sourceItemID)
);
CREATE INDEX IF NOT EXISTS itemAttachments_sourceItemID ON itemAttachments(sourceItemID);
CREATE INDEX IF NOT EXISTS itemAttachments_mimeType ON itemAttachments(mimeType);
-- Individual entries for each tag
CREATE TABLE IF NOT EXISTS tags (
2007-10-23 07:11:59 +00:00
tagID INTEGER PRIMARY KEY,
tag TEXT,
tagType INT,
UNIQUE (tag, tagType)
);
-- Associates items with keywords
CREATE TABLE IF NOT EXISTS itemTags (
itemID INT,
tagID INT,
PRIMARY KEY (itemID, tagID),
FOREIGN KEY (itemID) REFERENCES items(itemID),
FOREIGN KEY (tagID) REFERENCES tags(tagID)
);
CREATE INDEX IF NOT EXISTS itemTags_tagID ON itemTags(tagID);
CREATE TABLE IF NOT EXISTS itemSeeAlso (
itemID INT,
linkedItemID INT,
PRIMARY KEY (itemID, linkedItemID),
FOREIGN KEY (itemID) REFERENCES items(itemID),
FOREIGN KEY (linkedItemID) REFERENCES items(itemID)
);
CREATE INDEX IF NOT EXISTS itemSeeAlso_linkedItemID ON itemSeeAlso(linkedItemID);
-- Names of each individual "creator" (inc. authors, editors, etc.)
CREATE TABLE IF NOT EXISTS creators (
2007-10-23 07:11:59 +00:00
creatorID INTEGER PRIMARY KEY,
firstName TEXT,
lastName TEXT,
2007-10-23 07:11:59 +00:00
fieldMode INT
);
-- Associates single or multiple creators to items
CREATE TABLE IF NOT EXISTS itemCreators (
itemID INT,
creatorID INT,
creatorTypeID INT DEFAULT 1,
orderIndex INT DEFAULT 0,
PRIMARY KEY (itemID, creatorID, creatorTypeID, orderIndex),
FOREIGN KEY (itemID) REFERENCES items(itemID),
FOREIGN KEY (creatorID) REFERENCES creators(creatorID)
FOREIGN KEY (creatorTypeID) REFERENCES creatorTypes(creatorTypeID)
);
-- Collections for holding items
CREATE TABLE IF NOT EXISTS collections (
2007-10-23 07:11:59 +00:00
collectionID INTEGER PRIMARY KEY,
collectionName TEXT,
parentCollectionID INT,
FOREIGN KEY (parentCollectionID) REFERENCES collections(collectionID)
);
-- Associates items with the various collections they belong to
CREATE TABLE IF NOT EXISTS collectionItems (
collectionID INT,
itemID INT,
orderIndex INT DEFAULT 0,
PRIMARY KEY (collectionID, itemID),
FOREIGN KEY (collectionID) REFERENCES collections(collectionID),
FOREIGN KEY (itemID) REFERENCES items(itemID)
);
CREATE INDEX IF NOT EXISTS itemID ON collectionItems(itemID);
CREATE TABLE IF NOT EXISTS savedSearches (
2007-10-23 07:11:59 +00:00
savedSearchID INTEGER PRIMARY KEY,
savedSearchName TEXT
);
CREATE TABLE IF NOT EXISTS savedSearchConditions (
savedSearchID INT,
searchConditionID INT,
condition TEXT,
operator TEXT,
value TEXT,
required NONE,
2007-10-23 07:11:59 +00:00
PRIMARY KEY (savedSearchID, searchConditionID),
FOREIGN KEY (savedSearchID) REFERENCES savedSearches(savedSearchID)
);
Fulltext search support There are currently two types of fulltext searching: an SQL-based word index and a file scanner. They each have their advantages and drawbacks. The word index is very fast to search and is currently used for the find-as-you-type quicksearch. However, indexing files takes some time, so we should probably offer a preference to turn it off ("Index attachment content for quicksearch" or something). There's also an issue with Chinese characters (which are indexed by character rather than word, since there are no spaces to go by, so a search for a word with common characters could produce erroneous results). The quicksearch doesn't use a left-bound index (since that would probably upset German speakers searching for "musik" in "nachtmusik," though I don't know for sure how they think of words) but still seems pretty fast. * Note: There will be a potentially long delay when you start Firefox with this revision as it builds a fulltext word index of your existing items. We obviously need a notification/option for this. * The file scanner, used in the Attachment Content condition of the search dialog, offers phrase searching as well as regex support (both case-sensitive and not, and defaulting to multiline). It doesn't require an index, though it should probably be optimized to use the word index, if available, for narrowing the results when not in regex mode. (It does only scan files that pass all the other search conditions, which speeds it up considerably for multi-condition searches, and skips non-text files unless instructed otherwise, but it's still relatively slow.) Both convert HTML to text before searching (with the exception of the binary file scanning mode). There are some issues with which files get indexed and which don't that we can't do much about and that will probably confuse users immensely. Dan C. suggested some sort of indicator (say, a green dot) to show which files are indexed. Also added (very ugly) charset detection (anybody want to figure out getCharsetFromString(str)?), a setTimeout() replacement in the XPCOM service, an arrayToHash() method, and a new header to timedtextarea.xml, since it's really not copyright CHNM (it's really just a few lines off from the toolkit timed-textbox binding--I tried to change it to extend timed-textbox and just ignore Return keypress events so that we didn't need to duplicate the Mozilla code, but timed-textbox's reliance on html:input instead of html:textarea made things rather difficult). To do: - Pref/buttons to disable/clear/rebuild fulltext index - Hidden prefs to set maximum file size to index/scan - Don't index words of fewer than 3 non-Asian characters - MRU cache for saved searches - Use word index if available to narrow search scope of fulltext scanner - Cache attachment info methods - Show content excerpt in search results (at least in advanced search window, when it exists) - Notification window (a la scraping) to show when indexing - Indicator of indexed status - Context menu option to index - Indicator that a file scanning search is in progress, if possible - Find other ways to make it index the NYT front page in under 10 seconds - Probably fix lots of bugs, which you will likely start telling me about...now.
2006-09-21 00:10:29 +00:00
2007-10-23 07:11:59 +00:00
CREATE TABLE IF NOT EXISTS fulltextItems (
itemID INTEGER PRIMARY KEY,
version INT,
indexedPages INT,
totalPages INT,
indexedChars INT,
totalChars INT,
FOREIGN KEY (itemID) REFERENCES items(itemID)
);
CREATE INDEX IF NOT EXISTS fulltextItems_version ON fulltextItems(version);
Fulltext search support There are currently two types of fulltext searching: an SQL-based word index and a file scanner. They each have their advantages and drawbacks. The word index is very fast to search and is currently used for the find-as-you-type quicksearch. However, indexing files takes some time, so we should probably offer a preference to turn it off ("Index attachment content for quicksearch" or something). There's also an issue with Chinese characters (which are indexed by character rather than word, since there are no spaces to go by, so a search for a word with common characters could produce erroneous results). The quicksearch doesn't use a left-bound index (since that would probably upset German speakers searching for "musik" in "nachtmusik," though I don't know for sure how they think of words) but still seems pretty fast. * Note: There will be a potentially long delay when you start Firefox with this revision as it builds a fulltext word index of your existing items. We obviously need a notification/option for this. * The file scanner, used in the Attachment Content condition of the search dialog, offers phrase searching as well as regex support (both case-sensitive and not, and defaulting to multiline). It doesn't require an index, though it should probably be optimized to use the word index, if available, for narrowing the results when not in regex mode. (It does only scan files that pass all the other search conditions, which speeds it up considerably for multi-condition searches, and skips non-text files unless instructed otherwise, but it's still relatively slow.) Both convert HTML to text before searching (with the exception of the binary file scanning mode). There are some issues with which files get indexed and which don't that we can't do much about and that will probably confuse users immensely. Dan C. suggested some sort of indicator (say, a green dot) to show which files are indexed. Also added (very ugly) charset detection (anybody want to figure out getCharsetFromString(str)?), a setTimeout() replacement in the XPCOM service, an arrayToHash() method, and a new header to timedtextarea.xml, since it's really not copyright CHNM (it's really just a few lines off from the toolkit timed-textbox binding--I tried to change it to extend timed-textbox and just ignore Return keypress events so that we didn't need to duplicate the Mozilla code, but timed-textbox's reliance on html:input instead of html:textarea made things rather difficult). To do: - Pref/buttons to disable/clear/rebuild fulltext index - Hidden prefs to set maximum file size to index/scan - Don't index words of fewer than 3 non-Asian characters - MRU cache for saved searches - Use word index if available to narrow search scope of fulltext scanner - Cache attachment info methods - Show content excerpt in search results (at least in advanced search window, when it exists) - Notification window (a la scraping) to show when indexing - Indicator of indexed status - Context menu option to index - Indicator that a file scanning search is in progress, if possible - Find other ways to make it index the NYT front page in under 10 seconds - Probably fix lots of bugs, which you will likely start telling me about...now.
2006-09-21 00:10:29 +00:00
CREATE TABLE IF NOT EXISTS fulltextWords (
wordID INTEGER PRIMARY KEY,
word TEXT UNIQUE
);
CREATE INDEX IF NOT EXISTS fulltextWords_word ON fulltextWords(word);
2007-10-23 07:11:59 +00:00
CREATE TABLE IF NOT EXISTS fulltextItemWords (
Fulltext search support There are currently two types of fulltext searching: an SQL-based word index and a file scanner. They each have their advantages and drawbacks. The word index is very fast to search and is currently used for the find-as-you-type quicksearch. However, indexing files takes some time, so we should probably offer a preference to turn it off ("Index attachment content for quicksearch" or something). There's also an issue with Chinese characters (which are indexed by character rather than word, since there are no spaces to go by, so a search for a word with common characters could produce erroneous results). The quicksearch doesn't use a left-bound index (since that would probably upset German speakers searching for "musik" in "nachtmusik," though I don't know for sure how they think of words) but still seems pretty fast. * Note: There will be a potentially long delay when you start Firefox with this revision as it builds a fulltext word index of your existing items. We obviously need a notification/option for this. * The file scanner, used in the Attachment Content condition of the search dialog, offers phrase searching as well as regex support (both case-sensitive and not, and defaulting to multiline). It doesn't require an index, though it should probably be optimized to use the word index, if available, for narrowing the results when not in regex mode. (It does only scan files that pass all the other search conditions, which speeds it up considerably for multi-condition searches, and skips non-text files unless instructed otherwise, but it's still relatively slow.) Both convert HTML to text before searching (with the exception of the binary file scanning mode). There are some issues with which files get indexed and which don't that we can't do much about and that will probably confuse users immensely. Dan C. suggested some sort of indicator (say, a green dot) to show which files are indexed. Also added (very ugly) charset detection (anybody want to figure out getCharsetFromString(str)?), a setTimeout() replacement in the XPCOM service, an arrayToHash() method, and a new header to timedtextarea.xml, since it's really not copyright CHNM (it's really just a few lines off from the toolkit timed-textbox binding--I tried to change it to extend timed-textbox and just ignore Return keypress events so that we didn't need to duplicate the Mozilla code, but timed-textbox's reliance on html:input instead of html:textarea made things rather difficult). To do: - Pref/buttons to disable/clear/rebuild fulltext index - Hidden prefs to set maximum file size to index/scan - Don't index words of fewer than 3 non-Asian characters - MRU cache for saved searches - Use word index if available to narrow search scope of fulltext scanner - Cache attachment info methods - Show content excerpt in search results (at least in advanced search window, when it exists) - Notification window (a la scraping) to show when indexing - Indicator of indexed status - Context menu option to index - Indicator that a file scanning search is in progress, if possible - Find other ways to make it index the NYT front page in under 10 seconds - Probably fix lots of bugs, which you will likely start telling me about...now.
2006-09-21 00:10:29 +00:00
wordID INT,
itemID INT,
2007-10-23 07:11:59 +00:00
PRIMARY KEY (wordID, itemID),
FOREIGN KEY (wordID) REFERENCES fulltextWords(wordID),
FOREIGN KEY (itemID) REFERENCES items(itemID)
);
CREATE INDEX IF NOT EXISTS fulltextItemWords_itemID ON fulltextItemWords(itemID);
CREATE TABLE IF NOT EXISTS translators (
translatorID TEXT PRIMARY KEY,
minVersion TEXT,
maxVersion TEXT,
lastUpdated DATETIME,
inRepository INT,
priority INT,
translatorType INT,
label TEXT,
creator TEXT,
target TEXT,
detectCode TEXT,
code TEXT
);
CREATE INDEX IF NOT EXISTS translators_type ON translators(translatorType);
CREATE TABLE IF NOT EXISTS csl (
cslID TEXT PRIMARY KEY,
updated DATETIME,
title TEXT,
csl TEXT
);
CREATE TABLE IF NOT EXISTS annotations (
annotationID INTEGER PRIMARY KEY,
itemID INT,
parent TEXT,
textNode INT,
offset INT,
x INT,
y INT,
cols INT,
rows INT,
text TEXT,
collapsed BOOL,
dateModified DATE,
FOREIGN KEY (itemID) REFERENCES itemAttachments(itemID)
);
CREATE INDEX IF NOT EXISTS annotations_itemID ON annotations(itemID);
CREATE TABLE IF NOT EXISTS highlights (
highlightID INTEGER PRIMARY KEY,
itemID INTEGER,
startParent TEXT,
startTextNode INT,
startOffset INT,
endParent TEXT,
endTextNode INT,
endOffset INT,
dateModified DATE,
FOREIGN KEY (itemID) REFERENCES itemAttachments(itemID)
Fulltext search support There are currently two types of fulltext searching: an SQL-based word index and a file scanner. They each have their advantages and drawbacks. The word index is very fast to search and is currently used for the find-as-you-type quicksearch. However, indexing files takes some time, so we should probably offer a preference to turn it off ("Index attachment content for quicksearch" or something). There's also an issue with Chinese characters (which are indexed by character rather than word, since there are no spaces to go by, so a search for a word with common characters could produce erroneous results). The quicksearch doesn't use a left-bound index (since that would probably upset German speakers searching for "musik" in "nachtmusik," though I don't know for sure how they think of words) but still seems pretty fast. * Note: There will be a potentially long delay when you start Firefox with this revision as it builds a fulltext word index of your existing items. We obviously need a notification/option for this. * The file scanner, used in the Attachment Content condition of the search dialog, offers phrase searching as well as regex support (both case-sensitive and not, and defaulting to multiline). It doesn't require an index, though it should probably be optimized to use the word index, if available, for narrowing the results when not in regex mode. (It does only scan files that pass all the other search conditions, which speeds it up considerably for multi-condition searches, and skips non-text files unless instructed otherwise, but it's still relatively slow.) Both convert HTML to text before searching (with the exception of the binary file scanning mode). There are some issues with which files get indexed and which don't that we can't do much about and that will probably confuse users immensely. Dan C. suggested some sort of indicator (say, a green dot) to show which files are indexed. Also added (very ugly) charset detection (anybody want to figure out getCharsetFromString(str)?), a setTimeout() replacement in the XPCOM service, an arrayToHash() method, and a new header to timedtextarea.xml, since it's really not copyright CHNM (it's really just a few lines off from the toolkit timed-textbox binding--I tried to change it to extend timed-textbox and just ignore Return keypress events so that we didn't need to duplicate the Mozilla code, but timed-textbox's reliance on html:input instead of html:textarea made things rather difficult). To do: - Pref/buttons to disable/clear/rebuild fulltext index - Hidden prefs to set maximum file size to index/scan - Don't index words of fewer than 3 non-Asian characters - MRU cache for saved searches - Use word index if available to narrow search scope of fulltext scanner - Cache attachment info methods - Show content excerpt in search results (at least in advanced search window, when it exists) - Notification window (a la scraping) to show when indexing - Indicator of indexed status - Context menu option to index - Indicator that a file scanning search is in progress, if possible - Find other ways to make it index the NYT front page in under 10 seconds - Probably fix lots of bugs, which you will likely start telling me about...now.
2006-09-21 00:10:29 +00:00
);
2007-10-23 07:11:59 +00:00
CREATE INDEX IF NOT EXISTS highlights_itemID ON highlights(itemID);