Commit graph

48 commits

Author SHA1 Message Date
Dan Stillman
ab13c3980a Fulltext search support
There are currently two types of fulltext searching: an SQL-based word index and a file scanner. They each have their advantages and drawbacks.

The word index is very fast to search and is currently used for the find-as-you-type quicksearch. However, indexing files takes some time, so we should probably offer a preference to turn it off ("Index attachment content for quicksearch" or something). There's also an issue with Chinese characters (which are indexed by character rather than word, since there are no spaces to go by, so a search for a word with common characters could produce erroneous results). The quicksearch doesn't use a left-bound index (since that would probably upset German speakers searching for "musik" in "nachtmusik," though I don't know for sure how they think of words) but still seems pretty fast.

* Note: There will be a potentially long delay when you start Firefox with this revision as it builds a fulltext word index of your existing items. We obviously need a notification/option for this. *

The file scanner, used in the Attachment Content condition of the search dialog, offers phrase searching as well as regex support (both case-sensitive and not, and defaulting to multiline). It doesn't require an index, though it should probably be optimized to use the word index, if available, for narrowing the results when not in regex mode. (It does only scan files that pass all the other search conditions, which speeds it up considerably for multi-condition searches, and skips non-text files unless instructed otherwise, but it's still relatively slow.)

Both convert HTML to text before searching (with the exception of the binary file scanning mode).

There are some issues with which files get indexed and which don't that we can't do much about and that will probably confuse users immensely. Dan C. suggested some sort of indicator (say, a green dot) to show which files are indexed.

Also added (very ugly) charset detection (anybody want to figure out getCharsetFromString(str)?), a setTimeout() replacement in the XPCOM service, an arrayToHash() method, and a new header to timedtextarea.xml, since it's really not copyright CHNM (it's really just a few lines off from the toolkit timed-textbox binding--I tried to change it to extend timed-textbox and just ignore Return keypress events so that we didn't need to duplicate the Mozilla code, but timed-textbox's reliance on html:input instead of html:textarea made things rather difficult).

To do:

- Pref/buttons to disable/clear/rebuild fulltext index
- Hidden prefs to set maximum file size to index/scan
- Don't index words of fewer than 3 non-Asian characters
- MRU cache for saved searches
- Use word index if available to narrow search scope of fulltext scanner
- Cache attachment info methods
- Show content excerpt in search results (at least in advanced search window, when it exists)
- Notification window (a la scraping) to show when indexing
- Indicator of indexed status
- Context menu option to index
- Indicator that a file scanning search is in progress, if possible
- Find other ways to make it index the NYT front page in under 10 seconds
- Probably fix lots of bugs, which you will likely start telling me about...now.
2006-09-21 00:10:29 +00:00
Dan Stillman
302f0105bf Fixes #311, after deleting database, no new database is created
Well that probably would've been mildly frustrating for new users.
2006-09-20 02:15:49 +00:00
Dan Stillman
287e082805 Changed schema update system yet again -- removed DROP TABLE IF EXIST's from user.sql in favor of CREATE TABLE IF NOT EXIST's and changed schema.js to automatically migrate and then reload user.js if the version number has gone up
This lets us add tables to user.sql without writing migration steps for them yet still have the ability to change existing user tables and migrate data if necessary.

Also added _getDropCommands() to do a regex on the SQL file and create the DROP TABLE|INDEX steps necessary to use the DB_REBUILD flag without DROP commands in the SQL file itself, before I realized that it probably made the most sense to just delete the SQL file and storage directory. Changed _initializeSchema() to do that instead. Leaving _getDropCommands() in, in case there's ever a need for it.
2006-09-13 21:34:37 +00:00
Dan Stillman
14e3b05ca4 Separated schema into two files, system.sql and user.sql -- the former contains tables that can be wiped and reinitialized at any time *as long as ids are kept the same* (like scrapers.sql), whereas the latter contains user data that has to be migrated from one version to the other with transition steps
This should make development much easier, as we can, for example, add 80 item types without having to write transition steps

Pretty sure this won't delete anyone's data. Might want to test that theory, though.
2006-09-10 20:08:59 +00:00
Simon Kornblith
d4576d3d55 addresses #69, notification system for broken scrapers
thanks to Dan for his help on the repository side of things
2006-09-09 00:12:09 +00:00
Dan Stillman
58a1fe2bf2 Goodbye to my senior year Sociology research paper bibliography
Hello to the Zotero Quick Start Guide

After this, schema.sql will no longer apply to existing users
2006-08-31 22:21:30 +00:00
Dan Stillman
1693fa85d4 Don't coerce text values in itemData to numbers (like, say, stripping zeroes from ISBN's--thanks Simon)
(Apparently 'columnName NONE' in SQLite is functionally equivalent to 'columnName ILUVTIDDLYWINKS' in setting column affinity--as in, they both use the default of NUMERIC--which frankly I feel the docs could've been slightly more clear on, since they scatter the word NONE in capital letters all about the page with the other affinity keywords...)
2006-08-31 05:58:32 +00:00
Dan Stillman
200ca86989 Closes #237, add accessDate field to all item types with URL field 2006-08-30 16:02:21 +00:00
Dan Stillman
19763cc78a - Updated outward-facing "Scholar" references to "Zotero", along with a few of the internal ones that could be problematic to change later (DB, directory, GUID) -- let me know if I missed any
- About panel now gets version number automatically

- Change version from 1.0a1 to 1.0b1


* Important: If you're on an SVN install, you need to rename the scholar@chnm.gmu.edu text file in your profile extension directory to zotero@chnm.gmu.edu *

XPI installs will (I think) update automatically, since I kept an entry in updates.rdf with the old GUID
2006-08-30 07:05:57 +00:00
Dan Stillman
bec5e78417 Changed itemTypeFields schema to support default show/hide setting for fields in particular item types
Updated item type manager to support changing default show/hide
2006-08-26 10:35:47 +00:00
Dan Stillman
ebb9122e0a Forgot to remove removed field from attachment item type 2006-08-25 07:38:16 +00:00
Dan Stillman
9a897e15e8 Closes #205, eliminate "source" field, and add "url" field on all item types 2006-08-24 20:33:29 +00:00
Dan Stillman
496daebd8e Oops 2006-08-13 22:30:25 +00:00
Dan Stillman
cfc13ee9ac Closes #178, various changes to date fields
Closes #111, minor modifications to field list schema

Changes per above tickets and comments at http://chnm.grouphub.com/projects/310105/msg/cat/2333974/2995041/comments

- date, year and lastModified merged into date
- added date field to journalArticle
- added series, seriesTitle and seriesText to journalArticle
- added seriesNumber (along with series) to book and bookSection
2006-08-13 20:31:11 +00:00
Dan Stillman
bf96eca337 Use alternate method of differentiating between no network connection and invalid repository response in _updateScrapersRemoteCallback(), since xmlhttp.noNetwork property is no longer available
(xmlhttp.status seems to turn into a very large integer when there's a network error--I can't imagine this a fairly reliable test, but, then, neither was the previous one, and at least at the moment we're not actually using the test for anything other than an appropriate debug message)
2006-08-12 04:44:58 +00:00
Dan Stillman
f07ff9ac2a Renamed "Files" to "Attachments" -- since Files could be links as well as actually files (or both, for web page snapshots), things were getting just about as confusing as when Items were called Objects.
If you have attachments to the old terminology, feel free to file a complaint.

Changed interface code too, since David is gone (or at the very least has more important things to do with his remaining time)
2006-08-12 00:18:20 +00:00
Dan Stillman
1e8aa81c02 Fixes #108, Delay repository check by a few seconds if DB transaction is already in progress 2006-08-11 05:51:55 +00:00
Dan Stillman
9c7f33e21a Extended itemCreators primary key to include orderIndex and removed artificial restriction on adding the same creator/creatorType more than once for the same source -- who knows, maybe they just have the same name...
Properly ignore firstName for institutional creators in Item.setCreator() and Item.creatorExists() (which is now unused)
2006-08-11 05:15:56 +00:00
Dan Stillman
957b220cd3 Closes #56, add "institution" field in creators table to deal with institutional authors
- 'isInstitution' parameter added to Item.setCreator(), Creators.getID(), Creators.add()

- 'isInstitution' property added to return from Creators.get() and Item.getCreator()


var obj = Scholar.Items.getNewItemByType(1);
obj.setField('title', 'Digital History for Dummies');
obj.setCreator(0, '', 'Center for History and New Media', 1, true); // true == institutional creator
var id = obj.save();


Note: 'firstName' field is ignored when 'isInstitution' is true
2006-08-11 04:36:44 +00:00
Dan Stillman
7d48fdbeda Closes #175, Add ability to specify certain conditions as required in an ANY search
Conditions in ANY queries can be made required by passing 'true' as an extra parameter to addCondition() and updateCondition() -- this can be used for limiting ANY queries to particular collections (in place of the removed 'context' condition), but if there was an elegant way to expose it to the user for all ANY queries, it's something users might find very useful.
2006-08-10 21:05:28 +00:00
Dan Stillman
f8739ee6c5 Closes #135, Associate MIME types with abstract file types and implement Scholar.FileTypes.getIDFromMIMEType()
MIME type prefixes are handled using wildcards (e.g. audio/foobar will return the audio file type since it matches 'audio/%')
2006-08-08 08:23:23 +00:00
Dan Stillman
d67d96c321 Closes #7, Add advanced search functionality to data layer
Implemented advanced/saved search architecture -- to use, you create a new search with var search = new Scholar.Search(), add conditions to it with addCondition(condition, operator, value), and run it with search(). The standard conditions with their respective operators can be retrieved with Scholar.SearchConditions.getStandardConditions(). Others are for special search flags and can be specified as follows (condition, operator, value):

'context', null, collectionIDToSearchWithin
'recursive', 'true'|'false' (as strings!--defaults to false if not specified, though, so should probably just be removed if not wanted), null
'joinMode', 'any'|'all', null

For standard conditions, currently only 'title' and the itemData fields are supported -- more coming soon.

Localized strings created for the standard search operators


API:

search.setName(name) -- must be called before save() on new searches
search.load(savedSearchID)
search.save() -- saves search to DB and returns a savedSearchID
search.addCondition(condition, operator, value)
search.updateCondition(searchConditionID, condition, operator, value)
search.removeCondition(searchConditionID)
search.getSearchCondition(searchConditionID) -- returns a specific search condition used in the search
search.getSearchConditions() -- returns search conditions used in the search
search.search() -- runs search and returns an array of item ids for results
search.getSQL() -- will be used by Dan for search-within-search

Scholar.Searches.getAll() -- returns an array of saved searches with 'id' and 'name', in alphabetical order
Scholar.Searches.erase(savedSearchID) -- deletes a given saved search from the DB

Scholar.SearchConditions.get(condition) -- get condition data (operators, etc.)
Scholar.SearchConditions.getStandardConditions() -- retrieve conditions for use in drop-down menu (as opposed to special search flags)
Scholar.SearchConditions.hasOperator() -- used by Dan for error-checking
2006-08-08 02:04:02 +00:00
Dan Stillman
d4acec8a77 Addresses #111, minor modifications to field list schema, and http://chnm.grouphub.com/P2995041
Still some outstanding questions on Basecamp, though
2006-08-06 16:10:28 +00:00
Simon Kornblith
b4c8dbe700 closes #157, add database infrastructure for different CSL styles
CSL is stored in a new "csl" table. only metadata relevant to updates and selection (ID, date updated, and title) is stored in columns.
2006-08-03 04:54:16 +00:00
Dan Stillman
526d368aaf Closes #117, permit dashes and commas in "pages" field
Closes #118, add "translator" creator type
Closes #122, add DOI and abbreviated journal title fields
Addresses #45, reorder item fields -- source/rights moved down to bottom; date fields not yet moved
2006-07-31 04:31:44 +00:00
Dan Stillman
c50dedc90a Addresses #17, add filesystem/ability to store files
Not finished, but enough to give David something to work with

No BLOBs -- just linking/importing of files and loaded documents


New Scholar.Item methods:

incrementFileCount() (used internally)
decrementFileCount() (used internally)
isFile()
numFiles()
getFile() -- returns nsILocalFile or false if associated file doesn't exist (note: always returns false for items with LINK_MODE_LINKED_URL, since they have no files -- use getFileURL() instead)
getFileURL() -- returns URL string
getFileLinkMode() -- compare to Scholar.Files.LINK_MODE_* constants: LINKED_FILE, IMPORTED_FILE, LINKED_URL, IMPORTED_URL
getFileMimeType() -- mime type of file (e.g. text/plain)
getFileCharset() -- charsetID of file
getFiles() -- array of file itemIDs this file is a source for

New Scholar.Files methods:

importFromFile(nsIFile file [, int sourceItemID])
linkFromFile(nsIFile file [, int sourceItemID])
importFromDocument(nsIDOMDocument document [, int sourceItemID])
linkFromDocument(nsIDOMDocument document [, int sourceItemID])

New class Scholar.FileTypes -- partially implemented, not yet used

New class  Scholar.CharacterSets -- same as other *Types classes:
getID(idOrName)
getName(idOrName)
getTypes() (aliased to getAll(), which I'll probably change the others to as well)

Charsets table with all official character sets (copied from Mozilla source)

Renamed Item.setNoteSource() to setSource() and Item.getNoteSource() to getSource() and adjusted to handle both notes and files
2006-07-27 09:16:02 +00:00
Dan Stillman
6c88563ded Changed translators.type column to INT and added index
Updated scraper updated mechanism to handle new translator schema, roughly
2006-06-29 07:58:50 +00:00
Dan Stillman
2d23ba97e5 See Also functionality for items and notes
Item.addSeeAlso(itemID)
Item.removeSeeAlso(itemID)
Item.getSeeAlso() -- array of linked items

Relationships are mutual

Closes #82, Notes: see also table
2006-06-29 01:37:53 +00:00
Simon Kornblith
45b9234996 addresses #78, figure out import/export architecture
- changes scrapers table to translators table; all import/export/web translators now belong in this table
- adds Scholar.Translate to handle translation issues. eventually, Scholar.Ingester.Document will become part of this interface
- adds Scholar_File_Interface (in fileInterface.js) to handle UI for export and eventually import. (David, when you have time, please connect Scholar_File_Interface.exportFile to a button.)
- adds an export translator for MODS. all of our metadata, but not our hierarchy (projects, etc.) translates directly and unambiguously into valid MODS. eventually, we can use RDF or another format to handle hierarchy.
- adds utilities.getVersion() and utilities.inArray() for simplified scraper coding
- fixes minor interface issues with the nifty chrome scraping status window
2006-06-29 00:56:50 +00:00
Dan Stillman
52f7ed62d0 Closes #31, tag data infrastructure
New methods:

Item.addTag(tag)
Item.getTags() -- array of tagIDs
Item.removeTag(tagID)

Tags.getName(tagID)
Tags.getID(tag)
Tags.add(text) -- returns tagID of new tag
Tags.purge() -- purge obsolete tags

The last two are for use by Item.addTag() and Item.removeTag(), respectively, and probably don't need to be used elsewhere.
2006-06-28 18:06:36 +00:00
Dan Stillman
cb8f961639 Removed Scholar.HTTP, now that its functionality has been incorporated back into Scholar.Utilities.HTTP 2006-06-27 23:24:02 +00:00
Dan Stillman
d6fa0455e1 Fixes #29, independent notes
- Added 'note' item type

- Updated API to support independent note creation

Notes are, more or less, just regular items, with an item type of 1. They're created through Scholar.Notes.add(text, sourceItemID), which returns the itemID of the new note. sourceItemID is optional--if left out, an independent note will be created. (There's currently nothing stopping you from doing getNewItemByType(1) yourself, but the note would be contentless and broken, so you shouldn't do that.) Note data could've been stuffed into itemData, but I kept it separate in itemNotes to keep metadata searching faster and to keep things cleaner.

Methods calls that can be called on all items:

isNote() (same as testing for itemTypeID 1)

Method calls that can be called on source items only:

numNotes()
getNotes() (array of note itemIDs for a source)

Method calls that can be called on note items only:

updateNote(text)
setNoteSource(sourceItemID) (for changing source--use empty or false to make independent, which is currently what happens when you delete a note--will get option with #91)
getNote() (note content)
getNoteSource() (sourceItemID of a note)

Calling the above methods on the wrong item types will throw an error.


*** This will break note creation/display until David updates interface code. ***
2006-06-27 15:14:07 +00:00
Dan Stillman
05c8b0e467 Fixes #60, make sure it works well offline
- Added detection for network failure -- debug message is output and noNetwork property is added to the xmlhttp object

- Removed onStatus callback from HTTP.doGet and HTTP.doPost -- that was copied over from the Piggy Bank API, but the onDone callback has to handle errors anyway, so it can just check the status code if it actually cares to differentiate non-200 status codes from any other error

- Added error handling for empty responseXML to Schema._updateScrapersRemoteCallback

- Renamed SCHOLAR_CONFIG['REPOSITORY_CHECK_RETRY'] to SCHOLAR_CONFIG['REPOSITORY_RETRY_INTERVAL']
2006-06-25 20:14:11 +00:00
Dan Stillman
480f9d56f6 Fixes #14, Add a callNumber field to all item types, and fixes #75, Add an "extra" field to all item types
I also added accessionNumber to all types (except website), since that's what Endnote does, but we may or may not think that's necessary
2006-06-25 18:10:27 +00:00
Dan Stillman
dc8c695855 Fixes #11, Observe user pref to turn off automatic scraper updates 2006-06-25 07:34:03 +00:00
Dan Stillman
720960feb9 Addresses #5, Add as many item types as possible
New item types from Elena
2006-06-24 08:28:37 +00:00
Dan Stillman
726364d091 Scholar.History -- i.e. undo/redo functionality
Partially integrated into data layer, but I'm waiting to commit that part until I'm sure it won't break everything
2006-06-22 14:01:54 +00:00
Dan Stillman
4904c04e3e Add dateCreated and dateModified columns to itemNotes
Update itemNotes.dateModified on item update
2006-06-16 16:09:18 +00:00
Dan Stillman
fd85af40f8 Many-to-one item note support in the schema and data layer -- still some issues on (I think) the interface side 2006-06-16 07:32:48 +00:00
Dan Stillman
8e97675cc9 Added a timer to run repository checks while the browser is up
Added a separate retry interval so that the extension retries sooner after failures (browser offline, request failure, etc.)

Revision 200 -- w00t i am victorious
2006-06-15 21:06:24 +00:00
Dan Stillman
11a056ecd4 _getDBVersion() caches the version number, so make sure _updateDBVersion() updates it 2006-06-15 16:28:11 +00:00
Dan Stillman
70216ea2c7 - Added automatic scraper update mechanism (more details on Basecamp: http://chnm.grouphub.com/C2687015)
- Removed localLastUpdated field from scrapers table and renamed centralLastUpdated to lastUpdated; updated scraper queries accordingly

- Added query in scrapers.sql to update version table 'repository' row to prevent immediate downloads of newly installed scrapers

- Get version property from extension manager in Scholar.init() and assign to Scholar.version
2006-06-15 06:13:02 +00:00
Dan Stillman
d42258b168 Changed schema of scrapers table to use single GUID for scraperID
Assigned guids to scrapers, replaced INSERT queries with REPLACE queries, and removed table DELETE query at top -- this will allow scrapers to be updated without deleting any others that may exist (e.g. that someone is developing, third-party, etc.)
2006-06-12 15:43:24 +00:00
Dan Stillman
9daa0d3303 Added 'notes' field to fields table and sample data (changes are less messy than they look--I just put 'notes' after 'rights' and 'source' and incremented the other fieldIDs)
scholar.properties addition is just to keep metadata panel from breaking and could probably be removed once interface is hard-coded to not display the notes field there
2006-06-08 21:30:22 +00:00
Dan Stillman
393d19ca85 Fix SQL error on middle creator remove
Added creatorTypeID to the PK in itemCreators, because a creator could conceivably have two roles on an item

Throw an error if attempt to save() an item with two identical creator/creatorTypeID combinations, since, assuming this shouldn't be allowed (i.e. we don't have a four-column PK), there's really no way to handle this elegantly on my end -- interface code can use new method Item.creatorExists(firstName, lastName, creatorTypeID, skipIndex), with skipIndex set to the current index, to make sure this isn't done
2006-06-08 00:16:35 +00:00
Dan Stillman
393807b152 This isn't quite done (I'm discussing changing the scrapers schema with Simon to better handle scraper updates) but in the interest of getting the scrapers in for testing, I'll commit this now.
Integrated the scrapers with the schema update mechanism. Changed a bunch of schema methods to handle both schema.sql and scrapers.sql (or others, if need be) and altered the version table to track mu
ltiple versions for different files. This theoretically should detect that the version table has changed and force a reinitialization of the DB--let me know if there are problems.
2006-06-07 15:27:21 +00:00
Dan Stillman
882a96ee40 Fixed brokenness on schema reinitialization 2006-06-07 14:35:04 +00:00
Dan Stillman
448fde9ff1 Made the schema update system moderately less convoluted
- Broke schema functions into separate object and got rid of DB_VERSION config constant in favor of a toVersion variable in the _migrateSchema command (which isn't technically necessary either, since the version number at the top of schema.sql is now always compared to the DB version at startup) but will help reduce the chance that someone will update the schema file without adding migration steps)

- Removed Amazon scraper from schema.sql, as it will be loaded with the rest of the scrapers
2006-06-07 01:02:59 +00:00