587 commits

Author SHA1 Message Date
Dan Stillman
237db5ed58 Copied out scraping progress window for general use -- I'll use this for fulltext indexing notification, and ideally the scraper will use this instead now (Simon, let me know if there's any problem with that)
Example usage:

var windowWatcher = Components.classes["@mozilla.org/embedcomp/window-watcher;1"].
					getService(Components.interfaces.nsIWindowWatcher);
var progress = new Scholar.ProgressWindow(windowWatcher.activeWindow);
progress.changeHeadline('Indexing item...');
progress.addLines(['All About Foo'], ['chrome://scholar/skin/treeitem-book.png']);
progress.addDescription('Bar bar bar bar bar');
progress.show();
progress.fade();
2006-09-21 07:54:18 +00:00
Dan Stillman
ab13c3980a Fulltext search support
There are currently two types of fulltext searching: an SQL-based word index and a file scanner. They each have their advantages and drawbacks.

The word index is very fast to search and is currently used for the find-as-you-type quicksearch. However, indexing files takes some time, so we should probably offer a preference to turn it off ("Index attachment content for quicksearch" or something). There's also an issue with Chinese characters (which are indexed by character rather than word, since there are no spaces to go by, so a search for a word with common characters could produce erroneous results). The quicksearch doesn't use a left-bound index (since that would probably upset German speakers searching for "musik" in "nachtmusik," though I don't know for sure how they think of words) but still seems pretty fast.
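
To make the indexing trade-offs above concrete, here is a minimal sketch of a tokenizer that indexes Chinese text by character and everything else by word, plus the two matching modes being compared. It is not the extension's actual indexer: `tokenize`, `matchesLeftBound` and `matchesAnywhere` are hypothetical names, and the CJK range is deliberately narrowed to U+4E00-U+9FFF.

```javascript
// Hypothetical sketch, not the real indexer.
// Characters in the CJK Unified Ideographs block (U+4E00-U+9FFF) are each
// indexed as their own "word"; other text is split on anything that is not
// a letter or digit (rough Latin range only).
function tokenize(text) {
  var words = [];
  var current = '';
  var lower = text.toLowerCase();
  for (var i = 0; i < lower.length; i++) {
    var ch = lower.charAt(i);
    var code = lower.charCodeAt(i);
    if (code >= 0x4e00 && code <= 0x9fff) {
      if (current) { words.push(current); current = ''; }
      words.push(ch);
    } else if (/[a-z0-9\u00c0-\u024f]/.test(ch)) {
      current += ch;
    } else if (current) {
      words.push(current);
      current = '';
    }
  }
  if (current) words.push(current);
  return words;
}

// Left-bound (prefix) match: index-friendly, but misses "musik" in "nachtmusik".
function matchesLeftBound(word, query) {
  return word.indexOf(query) === 0;
}

// Unbound match: finds the query anywhere inside the word.
function matchesAnywhere(word, query) {
  return word.indexOf(query) !== -1;
}
```

With this sketch, `matchesLeftBound('nachtmusik', 'musik')` is false while `matchesAnywhere('nachtmusik', 'musik')` is true, which is the case the quicksearch is meant to handle.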

* Note: There will be a potentially long delay when you start Firefox with this revision as it builds a fulltext word index of your existing items. We obviously need a notification/option for this. *

The file scanner, used in the Attachment Content condition of the search dialog, offers phrase searching as well as regex support (both case-sensitive and not, and defaulting to multiline). It doesn't require an index, though it should probably be optimized to use the word index, if available, for narrowing the results when not in regex mode. (It does only scan files that pass all the other search conditions, which speeds it up considerably for multi-condition searches, and skips non-text files unless instructed otherwise, but it's still relatively slow.)

Both convert HTML to text before searching (with the exception of the binary file scanning mode).

There are some issues with which files get indexed and which don't that we can't do much about and that will probably confuse users immensely. Dan C. suggested some sort of indicator (say, a green dot) to show which files are indexed.

Also added (very ugly) charset detection (anybody want to figure out getCharsetFromString(str)?), a setTimeout() replacement in the XPCOM service, an arrayToHash() method, and a new header to timedtextarea.xml, since it's really not copyright CHNM (it's really just a few lines off from the toolkit timed-textbox binding--I tried to change it to extend timed-textbox and just ignore Return keypress events so that we didn't need to duplicate the Mozilla code, but timed-textbox's reliance on html:input instead of html:textarea made things rather difficult).

To do:

- Pref/buttons to disable/clear/rebuild fulltext index
- Hidden prefs to set maximum file size to index/scan
- Don't index words of fewer than 3 non-Asian characters
- MRU cache for saved searches
- Use word index if available to narrow search scope of fulltext scanner
- Cache attachment info methods
- Show content excerpt in search results (at least in advanced search window, when it exists)
- Notification window (a la scraping) to show when indexing
- Indicator of indexed status
- Context menu option to index
- Indicator that a file scanning search is in progress, if possible
- Find other ways to make it index the NYT front page in under 10 seconds
- Probably fix lots of bugs, which you will likely start telling me about...now.
2006-09-21 00:10:29 +00:00
Dan Stillman
93c15fc061 Allow for single objects as bound parameters without wrapping in array (e.g. DB.query(sql, {string:isbn})) 2006-09-20 23:20:11 +00:00
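
One way a query wrapper could accept either a single bound parameter or an array of them is to normalize the argument up front. This is only a sketch of the idea: `normalizeParams` is a hypothetical helper, not a function from db.js, and the `{string: ...}` shape is taken from the example in the commit message.

```javascript
// Hypothetical helper: always hand the binding code an array.
function normalizeParams(params) {
  if (params === undefined || params === null) {
    return [];
  }
  return Array.isArray(params) ? params : [params];
}

// Both call styles end up in the same shape:
var single = normalizeParams({ string: '0123456789' });
var several = normalizeParams([{ int: 5 }, { string: 'foo' }]);
```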
Dan Stillman
302f0105bf Fixes #311, after deleting database, no new database is created
Well that probably would've been mildly frustrating for new users.
2006-09-20 02:15:49 +00:00
Simon Kornblith
0f8c3e7669 - makes proxy detection work with domain-based EZProxies (I think)
- fixes Word Integration bugs
2006-09-14 02:51:45 +00:00
Dan Stillman
fe319f033b Schema and Item Type Manager updates to handle item type templates
Note that there's no code for user types and fields yet -- just the schema (actually there's a tiny bit of code in the item type manager, since we'll probably use some of the same methods for managing user types, but not much)

Templates for primary item types are currently only used by the item type manager to make creating new types easier and to prevent the removal of fields from an item type that are associated with its template item type -- the fields are all still recorded in itemTypeFields, since they might have different orders or default visibility settings from their templates
2006-09-13 22:04:54 +00:00
Dan Stillman
287e082805 Changed schema update system yet again -- removed the DROP TABLE IF EXISTS statements from user.sql in favor of CREATE TABLE IF NOT EXISTS, and changed schema.js to automatically migrate and then reload user.sql if the version number has gone up
This lets us add tables to user.sql without writing migration steps for them yet still have the ability to change existing user tables and migrate data if necessary.

Also added _getDropCommands() to do a regex on the SQL file and create the DROP TABLE|INDEX steps necessary to use the DB_REBUILD flag without DROP commands in the SQL file itself, before I realized that it probably made the most sense to just delete the SQL file and storage directory. Changed _initializeSchema() to do that instead. Leaving _getDropCommands() in, in case there's ever a need for it.
2006-09-13 21:34:37 +00:00
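
The regex approach described for `_getDropCommands()` might look roughly like the following. This is a guess at the technique, not the code from schema.js: the function name, the pattern, and the use of `IF EXISTS` in the generated statements are all assumptions.

```javascript
// Hypothetical sketch: derive DROP statements from the CREATE statements
// in a SQL file. Only plain CREATE TABLE / CREATE INDEX are recognized
// (CREATE UNIQUE INDEX, for example, would be missed by this pattern).
function getDropCommands(sql) {
  var re = /^\s*CREATE\s+(TABLE|INDEX)\s+(?:IF\s+NOT\s+EXISTS\s+)?([^\s(]+)/gim;
  var drops = [];
  var match;
  while ((match = re.exec(sql)) !== null) {
    drops.push('DROP ' + match[1].toUpperCase() + ' IF EXISTS ' + match[2]);
  }
  return drops;
}

var exampleSql =
  'CREATE TABLE IF NOT EXISTS items (\n  itemID INT\n);\n' +
  'CREATE INDEX tags_name ON tags(name);\n';
var dropCommands = getDropCommands(exampleSql);
// dropCommands: ['DROP TABLE IF EXISTS items', 'DROP INDEX IF EXISTS tags_name']
```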
Dan Stillman
91def29078 Closes #189, "extra" field should allow multiple lines
Using Shift-Enter as the save keystroke within the Extra textbox so that people can use Enter to create multiple lines of text. Shift-Enter would normally be the newline command, but that's probably a convention that non-technical users of Zotero wouldn't know... Tab (and other triggers for blur()) also saves, and since Extra is the last field, tabbing away functions the same as hitting Enter does for other fields, so it's probably not that big of a deal.
2006-09-12 08:47:24 +00:00
Dan Stillman
b84181766d Fix bug in db.js::statementQuery() that prevented binding a single string parameter without putting it in an array
Also fixed array detection in flattenArguments() to handle a null value
2006-09-12 06:53:48 +00:00
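
The null case is easy to trip over because `typeof null` is `'object'` in JavaScript, so a naive "is it an array?" test will try to read `.length` from null and throw. A minimal sketch of a null-safe check follows; the one-level flattening shown is an assumption about what `flattenArguments()` does, since the log does not show its body.

```javascript
// Hypothetical sketch of null-safe array detection.
function isArrayLike(value) {
  // Check for null first: typeof null === 'object'.
  return value !== null &&
    typeof value === 'object' &&
    typeof value.length === 'number';
}

// Assumed behavior: flatten one level of nested arrays, keep other values.
function flattenArguments(args) {
  var flat = [];
  for (var i = 0; i < args.length; i++) {
    if (isArrayLike(args[i])) {
      for (var j = 0; j < args[i].length; j++) {
        flat.push(args[i][j]);
      }
    } else {
      flat.push(args[i]);
    }
  }
  return flat;
}
```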
Dan Stillman
cc726ef333 Not that it should happen, but survive an item with an item type of 0 or undefined (and more importantly, let you delete or change it) 2006-09-12 05:20:43 +00:00
Simon Kornblith
7c3e054ebc addresses #301, COinS bugs/enhancements; remaining issue blocked by #3 (add as many item types as possible) 2006-09-11 22:34:39 +00:00
Simon Kornblith
ecfff1393f - closes #225, ability to cite a specific page/paragraph/etc in Word integration. the output isn't quite right at the moment, but the interface works.
- removes net icons that haven't been used in months
- fixes another date bug (the last one, i hope)
- renames CSL class to Scholar.CSL
2006-09-11 01:05:26 +00:00
Dan Stillman
b6f78acfd8 Don't reuse item type and field ids in the item type manager anymore 2006-09-10 20:16:48 +00:00
Dan Stillman
14e3b05ca4 Separated schema into two files, system.sql and user.sql -- the former contains tables that can be wiped and reinitialized at any time *as long as ids are kept the same* (like scrapers.sql), whereas the latter contains user data that has to be migrated from one version to the other with transition steps
This should make development much easier, as we can, for example, add 80 item types without having to write transition steps

Pretty sure this won't delete anyone's data. Might want to test that theory, though.
2006-09-10 20:08:59 +00:00
Simon Kornblith
3dfca25879 - closes #277, disambiguation and notifier updates for Word integration
- closes #217, ability to exclude notes/attachments from select items window
- closes #244, ability to quick search from select items window
- fixes a bug with footnotes in Word integration
- fixes a bug in InnoPAC translator where items would sometimes appear twice
2006-09-10 17:38:17 +00:00
Simon Kornblith
d5bc6cbe4b - fixes a bug in capitalizeTitle
- better feedback for search translator errors
2006-09-09 22:45:03 +00:00
Simon Kornblith
14c5c40a50 - closes #279, Refer/EndNote translator
- fixes a bug in text handling that was previously masked by another
2006-09-09 22:00:04 +00:00
Simon Kornblith
67f6ae3ed2 - closes #69, notification system for broken scrapers
- don't put "Page" before page in WaPo scraper
2006-09-09 19:47:47 +00:00
Simon Kornblith
d4576d3d55 addresses #69, notification system for broken scrapers
thanks to Dan for his help on the repository side of things
2006-09-09 00:12:09 +00:00
Dan Stillman
e2aa1a06db Closes #247, Add interface option for institutional creators in item edit pane
Here's a shot at a single/double creator field toggle switch -- let me know what you think

A few issues:

- There's currently no comma between the last name and first name when in two-field mode -- I removed it to greatly simplify the code, hoping to be able to use the CSS :after pseudo-element, but that seems to not work with XUL -- I'll figure out a clean solution and add it back ( refs #288 ) 

- It's not very smart about switching between single-field mode and two-field mode, as it currently just keeps the last word (even if it's "Jr." or "III") as the last name and puts the rest in the first name field -- not a big deal, but it should at least be a bit smarter about this ( refs #289 )

- There are probably some other bugs
2006-09-08 22:46:49 +00:00
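
The "keep the last word as the last name" behavior described above can be sketched as follows, which also shows why it goes wrong for suffixes. `splitName` is a hypothetical name for illustration, not the function used in the item pane.

```javascript
// Hypothetical sketch of the naive single-field -> two-field split:
// the last whitespace-separated word becomes the last name.
function splitName(fullName) {
  var parts = fullName.replace(/^\s+|\s+$/g, '').split(/\s+/);
  var lastName = parts.pop();
  return { firstName: parts.join(' '), lastName: lastName };
}

var plain = splitName('Jane Q. Public');
// plain: { firstName: 'Jane Q.', lastName: 'Public' }
var suffixed = splitName('Martin Luther King Jr.');
// suffixed: { firstName: 'Martin Luther King', lastName: 'Jr.' } -- the case refs #289 is about
```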
Simon Kornblith
60422e032e - closes #261, work around content-disposition: attachment on endnote links. this workaround is far from the most elegant, but it seemed nicer than writing a stream converter component that didn't really convert streams
- fixes bugs in RIS import
2006-09-08 22:26:59 +00:00
Simon Kornblith
539957a93b - closes #281, look for BOM when importing to override charset. the BOM is a nice way to detect UTF encodings, although it won't help distinguish, e.g., ISO 8859-1 from MacRoman. since EndNote adds a BOM to all of its export files, this means non-ASCII characters should now be preserved when exported from EndNote.
- better error handling for translators ("Could Not Add Item" should now pop up in all circumstances)
2006-09-08 20:44:05 +00:00
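
For reference, BOM sniffing amounts to comparing the first few bytes of the file against the well-known byte order marks. The sketch below is not the translator code; `detectBOM` is a hypothetical helper that takes an array of byte values, and it ignores UTF-32.

```javascript
// Hypothetical sketch: return the encoding implied by a leading BOM,
// or null if the data does not start with one. UTF-32 is not handled.
function detectBOM(bytes) {
  if (bytes.length >= 3 &&
      bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return 'UTF-8';
  }
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return 'UTF-16BE';
  }
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return 'UTF-16LE';
  }
  return null;
}
```

A null result means the importer still has to fall back on some other guess, which is the ISO 8859-1 versus MacRoman problem mentioned in the message.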
Dan Stillman
ad5ce20c82 Fixes #286, if a quick search is entered in the item pane, and a new item is added to the library, the item appears regardless
New logic in itemTreeView notify() target:

- Items are only selected on add in the active window -- this fixes a fairly major flaw in the previous system that would cause new items to be selected in all open windows

- If a quicksearch is open in the active window and a new item is added, clear the search

- If quicksearch and active window and item modify, rerun search

- If quicksearch and not active window, rerun search

- If not quicksearch and not active window, update list but retain previous selection
2006-09-08 06:01:29 +00:00
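
The notify() rules listed above reduce to a small decision function. The sketch below only models the choice of action; `chooseAction` and the action strings are hypothetical, and the two branches marked as assumptions cover combinations the commit message does not spell out.

```javascript
// Hypothetical sketch of the decision logic, not the itemTreeView code.
function chooseAction(event, isActiveWindow, quickSearchOpen) {
  if (quickSearchOpen) {
    if (!isActiveWindow) {
      return 'rerun-search';
    }
    if (event === 'add') {
      return 'clear-search-and-select';
    }
    if (event === 'modify') {
      return 'rerun-search';
    }
    return 'refresh'; // assumption: other events are not covered in the message
  }
  if (!isActiveWindow) {
    return 'refresh-keep-selection';
  }
  // Assumption: with no quicksearch in the active window, only the
  // select-on-add behavior is stated in the message.
  return event === 'add' ? 'refresh-and-select' : 'refresh';
}
```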
Simon Kornblith
7b7d3d85e3 - added Washington Post translator
- translation works properly even when a user has switched to a different page
2006-09-08 05:47:47 +00:00
Simon Kornblith
b8ddba3a67 CiteSeer translator 2006-09-08 01:59:22 +00:00
Dan Stillman
eba2974ce1 Fixes #251, UI refresh problem with page snapshot
Also fixes the snapshot-add breakage when another tab is selected, which was caused by the blur() fix from the other day
2006-09-08 01:23:22 +00:00
Dan Stillman
ea72af8be8 Don't break new note creation 2006-09-08 00:42:46 +00:00
Dan Stillman
bbedd3bd93 Closes #274, autosave in notes pane puts cursor at top of pane 2006-09-08 00:20:42 +00:00
Simon Kornblith
5028880d38 closes #280, BibTeX translator
- fixes date bugs
- fixes (again) an issue that would cause the "unresponsive script" dialog to appear when importing or exporting
2006-09-07 22:10:26 +00:00
Dan Stillman
f011c9b8be Fixes #282, Notes opened in separate windows need item notification 2006-09-07 08:53:04 +00:00
Dan Stillman
14b24f3638 Closes #259, auto-complete of tags
Addresses #260, Add auto-complete to search window

- New XPCOM autocomplete component for Zotero data -- can be used by setting the autocompletesearch attribute of a textbox to 'zotero' and passing a search scope with the autocompletesearchparam attribute. Additional parameters can be passed by appending them to the autocompletesearchparam value with a '/', e.g. 'tag/2732' (to exclude tags that show up in item 2732)

- Tag entry now uses more or less the same interface as metadata -- no more popup window -- note that tab isn't working properly yet, and there's no way to quickly enter multiple tags (though it's now considerably quicker than it was before)

- Autocomplete for tags, excluding any tags already set for the current item

- Standalone note windows now register with the Notifier (since tags needed item modification notifications to work properly), which will help with #282, "Notes opened in separate windows need item notification"

- Tags are now retrieved in alphabetical order

- Scholar.Item.replaceTag(oldTagID, newTag), with a single notify

- Scholar.getAncestorByTagName(elem, tagName) -- walk up the DOM tree from an element until an element with the specified tag name is found (also checks with 'xul:' prefix, for use in XBL), or false if not found -- probably shouldn't be used too widely, since it's doing string comparisons, but better than specifying, say, nine '.parentNode' properties, and makes for more resilient code
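
A rough idea of what the ancestor walk looks like, written as a standalone function so it can be tried outside the browser. This is a sketch rather than the Scholar method itself: starting the search at the parent (not the element itself) is an assumption, and the plain objects below merely stand in for DOM nodes.

```javascript
// Hypothetical sketch: walk up parentNode links until a node with the
// given tag name (with or without an 'xul:' prefix) is found.
function getAncestorByTagName(elem, tagName) {
  while (elem.parentNode) {
    elem = elem.parentNode;
    if (elem.tagName === tagName || elem.tagName === 'xul:' + tagName) {
      return elem;
    }
  }
  return false;
}

// Stand-ins for DOM nodes, for illustration only.
var rootNode = { tagName: 'window', parentNode: null };
var boxNode = { tagName: 'xul:vbox', parentNode: rootNode };
var labelNode = { tagName: 'label', parentNode: boxNode };
var foundBox = getAncestorByTagName(labelNode, 'vbox'); // boxNode
```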


A few notes:

- Autocomplete in Minefield seems to self-destruct after using it in the same field a few times, taking down saving of the field with it -- this may or may not be my fault, but it makes Zotero more or less unusable in 3.0 at the moment. Sorry. (I use 3.0 myself for development, so I'll work on it.)

- This would have been much, much easier if having an autocomplete textbox (which uses an XBL-generated popup for the suggestions) within a popup (as it is in the independent note edit panes) didn't introduce all sorts of crazy bugs that had to be defeated with annoying hackery -- one side effect of this is that at the moment you can't close the tags popup with the Escape key

- Independent note windows now need to pull in itemPane.js to function properly, which is a bit messy and not ideal, but less messy and more ideal than duplicating all the dual-state editor and tabindex logic would be

- Hitting tab in a tag field not only doesn't work but also breaks things until the next window refresh.

- There are undoubtedly other bugs.
2006-09-07 08:07:48 +00:00
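
On the component side, the `autocompletesearchparam` value described in this commit (e.g. 'tag/2732') has to be split back into a scope and its extra parameters. A minimal sketch of that step, assuming a simple split on '/'; `parseSearchParam` and the returned property names are hypothetical.

```javascript
// Hypothetical sketch: the first segment is the search scope, anything
// after it is passed along as additional parameters.
function parseSearchParam(searchParam) {
  var parts = searchParam.split('/');
  return { scope: parts[0], extra: parts.slice(1) };
}

var parsed = parseSearchParam('tag/2732');
// parsed.scope is 'tag'; parsed.extra is ['2732'] (the item whose tags are excluded)
```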
Simon Kornblith
e9ba093c15 oops 2006-09-07 01:30:10 +00:00
Simon Kornblith
cf8dc232b1 - new translators: New York Review of Books, Chronicle of Higher Education
- more useful errors in utilities
- fixes minor bugs in citation styling
2006-09-07 01:23:13 +00:00
Simon Kornblith
451be4b3a3 closes #242, internationalized date handling 2006-09-06 07:04:02 +00:00
Simon Kornblith
7f40d696a4 more useful comments in utilities.js 2006-09-06 04:48:13 +00:00
Simon Kornblith
89cf0c7235 closes #276, fix RIS bugs
- import translators no longer fail when trying to import an item with no name
- the T2/BT field becomes the publication title when no JO/JF field is available (fixes newspaper issues)
- Y2 is now treated as part of the date if and only if it is improperly formatted (seriously, why can't Thomson get their own specs straight?)
- work around EndNote's strange behavior of putting article titles into notes for no apparent reason
- RIS export gives dates as per specification
- fixed a bug that could have (potentially) caused problems formatting "January"
- allow translators to access strToDate function
2006-09-06 04:45:19 +00:00
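
The publication-title fallback described in this commit can be illustrated with a record represented as a tag-to-value map. This is only a sketch: `pickPublicationTitle` is a hypothetical name, and the precedence within each pair (JO before JF, T2 before BT) is an assumption.

```javascript
// Hypothetical sketch: prefer the journal fields, and fall back to
// T2/BT only when neither JO nor JF is present.
function pickPublicationTitle(record) {
  return record.JO || record.JF || record.T2 || record.BT || null;
}

var newspaperTitle = pickPublicationTitle({ TI: 'Some headline', T2: 'The Daily Example' });
// newspaperTitle: 'The Daily Example'
```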
Simon Kornblith
858c0145e6 closes #216, support for non-ascii characters in word integration 2006-09-06 03:49:41 +00:00
Dan Stillman
697fcedc58 Capitalization typo in new item drop-down 2006-09-06 01:22:06 +00:00
Simon Kornblith
b3bb6b9013 remove unnecessary debug code 2006-09-05 07:59:25 +00:00
Simon Kornblith
045780d9ac closes #250, figure out proper text encodings for import/export
MODS uses the encoding as specified in the <?xml tag, or else UTF-8
RIS uses IBM850, since the spec says "IBM Extended Character Set" and it's the only code page Mozilla supports. (should I do this? or just use unicode?)
MARC uses UTF-8, since I don't think there's any way to get full MARC-8 support, and UTF-8 is now the preferred encoding anyway
2006-09-05 07:51:55 +00:00
Simon Kornblith
05f56aa489 closes #273, no location asked for in bibliography export (i think)
- improved bibliography (especially Chicago Manual of Style)
- improved error handling for import/export/bibliography
- bibliographic export now ignores notes and standalone attachments (before, they made export silently fail). an error appears if you try to generate a bibliography from only notes or standalone attachments.
2006-09-05 02:03:59 +00:00
Simon Kornblith
e0f6f023d8 various fixes to citation formatting (mostly Chicago Manual of Style) 2006-09-05 01:09:04 +00:00
Simon Kornblith
dd0c537ce1 closes #267, MODS export option uses an rdf extension (should be xml)
thanks to Dan for the idea
2006-09-04 22:57:23 +00:00
Simon Kornblith
7d93903e2d closes #239, fix embedded RDF translator 2006-09-04 21:43:23 +00:00
Simon Kornblith
370fe48388 - remove extraneous debug code
- update scrapers.sql version (do not put into the repository)
2006-09-04 20:21:38 +00:00
Simon Kornblith
aa6e2cfab1 closes #264, UMich lib catalog doesn't work on Windows; other issues related to Mirlyn
positions "saving item" window in a slightly better place on Windows

the UMich bug was actually bigger than I thought. as it turns out, the HiddenDOMWindow in Windows is not a chrome window, so i had to modify createHiddenBrowser() to attach the hidden browser object to an existing browser window. i don't believe this should have any adverse effects for snapshots, etc., but Dan, correct me if i'm wrong. it would be nice to be able to create a real chrome instance instead of a XUL element, but all of my attempts at doing so have failed.
2006-09-04 20:19:38 +00:00
Simon Kornblith
2b0bebe7a4 closes #258, MARC translator should capitalize titles 2006-09-04 18:16:50 +00:00
Simon Kornblith
e5404f4938 closes #269, For some COinS pages "could not save item" error 2006-09-04 17:37:07 +00:00
Simon Kornblith
0ab9e8b36c references #268, occasional problems with ingest of pages with multiple references
i've fixed the Amazon.com bug (i think) and made the translator show a "Could Not Save Item" prompt rather than show an empty list, but if you see any other pages where this happens, let me know
2006-09-04 17:09:44 +00:00
Simon Kornblith
ed6650c4e7 closes #218, Windows support for Word integration. this solution seems to work with both Word 2003 and Word 2007. i have not tested with earlier versions. Zotero.dot is the Windows version; Zotero.dot.dmg is the Mac version. the only difference is the function call used to perform SOAP requests.
to get this to work right, you'll need the SOAP toolkit from http://www.microsoft.com/downloads/details.aspx?FamilyID=ba611554-5943-444c-b53c-c0a450b7013c&DisplayLang=en
I may replace the SOAP object with a simple XMLHTTP object, since that page says that the SOAP toolkit is deprecated.
2006-09-04 08:06:04 +00:00