Commit graph

90 commits

Author SHA1 Message Date
Abe Jellinek
da1eb6fda9 Feeds: Prefer content to summary when available 2024-04-30 02:14:57 -04:00
Abe Jellinek
a5393ca0e5 Merge: Keep external annotations on master, don't erase on other
PDFWorker only re-imports external annotations when the file changes on
disk. Keep annotation items corresponding to external annotations so
that they don't disappear after the merge (and then come back when the
file is edited).

Tweaks behavior introduced in 2aa34a6.

https://forums.zotero.org/discussion/113943/zotero-7-beta-merging-pdf-files-leaves-ghost-external-annotations
2024-04-25 16:35:09 -04:00
Abe Jellinek
8b13f717b4 SAXXMLReader: Handle non-UTF-8 encodings (#3846) 2024-04-20 06:41:51 -04:00
Abe Jellinek
1f599283df Fix indexing files with text content types that Firefox won't display (#3708) 2024-02-19 05:11:16 -05:00
Abe Jellinek
2ef560f7d8 Extract ISBNs and DOIs from EPUB content (#64)
And move EPUB functionality to class.
2023-08-07 16:07:55 -04:00
Abe Jellinek
cab0fa93e7 Extend Retrieve Metadata to support EPUBs (#57) 2023-08-07 16:07:55 -04:00
Abe Jellinek
ba1b1b0639 Add EPUB format to Zotero.MIME 2023-08-07 16:07:52 -04:00
Abe Jellinek
2639981dda Block remote content when indexing HTML file (#3157) 2023-06-12 23:43:18 -04:00
Dan Stillman
3ba78e28bb Update connectorTypeSchemaData and test data for Dataset Number removal 2023-04-27 03:44:25 -04:00
Tom Najdek
a6042d3958 Mendeley importer: Fix issue with empty tags (#3018)
Also adds a test for this particular case and for importing tags in
general.
2023-04-06 17:10:13 +02:00
Tom Najdek
96022847d7 Mendeley importer: Fix issue with empty creators (#3016)
It does not appear to be possible to create a creator with no values in
Mendeley; however, we got reports of these causing imports to fail.
This tweak makes the importer more resilient by discarding empty/invalid
creators.
2023-04-06 17:10:13 +02:00
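The resilience tweak described in this commit can be pictured as a filter that runs before creators are saved. This is only a minimal sketch: the helper names and the `firstName`/`lastName`/`name` fields are assumptions for illustration, not the importer's actual code.

```javascript
// Hypothetical helper: a creator counts as usable only if at least one
// name field is a non-empty string. The field names are assumed.
function isUsableCreator(creator) {
    if (creator === null || typeof creator !== 'object') {
        return false;
    }
    return ['firstName', 'lastName', 'name'].some(
        field => typeof creator[field] === 'string' && creator[field].trim() !== ''
    );
}

// Drop empty or malformed creators instead of letting them abort the import.
function discardEmptyCreators(creators) {
    return (creators || []).filter(isUsableCreator);
}
```

Filtering up front keeps the rest of the import path free of special cases for records that should never have existed in the first place.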
Tom Najdek
4b523555d6 Mendeley Import: Auth using direct login
* Importer will now ask the user for a login and password via a form and
  will perform sign-in directly using credentials rather than OAuth
* Signing in this way enables the importer to obtain the desktop document
  ID, which is now stored for each item
* It's possible to switch back to the old method (OAuth) by setting
  `import.mendeleyUseOAuth` pref to `true`.
* New option to only import new items. This option only appears if
  the database contains previously imported items.
* Importer will now update mendeleyDB:documentUUID on existing items to
  match the value used in Mendeley Desktop if available
* Importer will no longer create collections when no new items are
  imported
* Importer will only report the number of new items imported on
  re-import
* Importer will now preserve dateAdded on re-import

Co-authored-by: Dan Stillman <dstillman@zotero.org>
2023-04-06 17:10:12 +02:00
Dan Stillman
ab2e163234 Update test sample data for authority/legislativeBody base mapping 2023-04-03 00:41:35 -04:00
Dan Stillman
e27c1b5335 Add Dataset and Standard item types
zotero/zotero-bits#22
zotero/zotero-bits#52
2023-04-01 16:34:43 -04:00
Dan Stillman
9869eb4bc8 Restore cell.csl for style tests
Removed in a28908f0b4
2022-10-27 05:07:46 -04:00
Adomas Venčkauskas
a28908f0b4 Change integration test citation style to APA and some refactoring 2022-10-25 14:50:05 +03:00
Tom Najdek
223f44fdfd Map annotation colors on import #2819 (#2822) 2022-09-12 19:58:21 -04:00
Abe Jellinek
1f9e518581 Duplicates Merge: Preserve embedded annotations (#2728) 2022-08-11 03:52:40 -04:00
Tom Najdek
141258d564 Fix a bug in regex extracting fields to "extra"
Because the regex is built using a template string, \s* is actually escaped
into s*, i.e. a literal "s" appearing 0 or more times. In most cases this
means the output can have its spacing slightly off. In the extreme case,
when an identifier starts with the letter "s", this could lead to the
identifier being stored incorrectly.

Also adjusted tests to be more strict and mock data to cover this case.
2022-08-11 02:46:37 -04:00
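The escaping problem described in this commit is easy to reproduce in isolation. The snippet below is a sketch rather than the importer's real code: the `DOI` label, the pattern shape, and the `extract` helper are assumptions chosen to show the effect.

```javascript
// Hypothetical field label; the real pattern is not shown in the log.
const label = 'DOI';

// Buggy: in a template literal, "\s" is not a recognized escape, so it
// collapses to a plain "s" before RegExp ever sees it.
const buggy = new RegExp(`^${label}:\s*(.+)$`);

// Fixed: double the backslash so the regex engine receives "\s".
const fixed = new RegExp(`^${label}:\\s*(.+)$`);

// Return the first capture group, or null when the line does not match.
function extract(re, line) {
    const match = re.exec(line);
    return match ? match[1] : null;
}

// Spacing case: the buggy pattern keeps the leading space in the value.
const spacedBuggy = extract(buggy, 'DOI: 10.1000/182');
const spacedFixed = extract(fixed, 'DOI: 10.1000/182');

// Extreme case: the buggy pattern swallows the identifier's leading "s".
const leadingSBuggy = extract(buggy, 'DOI:s123');
const leadingSFixed = extract(fixed, 'DOI:s123');
```

Inspecting `buggy.source` shows `s*` where `\s*` was intended. `String.raw` is another way to keep the backslash intact when a pattern is assembled from a template.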
Dan Stillman
65318a442e Fix support tests after 28ed3e34b 2022-08-11 02:29:44 -04:00
Dan Stillman
cb2594f53f Feed import: Don't fail on OPML entry with no title or text
https://forums.zotero.org/discussion/96841/impossible-dimporter-ompl-rss
2022-08-11 02:28:03 -04:00
Dan Stillman
bbdcb92042 Add missing test data files 2022-06-21 01:39:07 -04:00
Dan Stillman
df3e7a600e Update Zotero.File.getContentsAsync() tests 2022-06-19 18:59:01 -04:00
Dan Stillman
6a2949be8a fx-compat: Add HiddenBrowser.jsm
Remove Zotero.Browser and add HiddenBrowser.jsm. Post-Fission, web/file
content loads in a separate process, so it's not possible (as best as I
can tell) to directly access the contents of a hidden browser -- it just
appears as about:blank in the parent process. We now use Mozilla's
JSWindowActor mechanism [1] to get page data, including character set
and body text for full-text indexing. We'll have to evaluate other uses
of hidden browsers to see how to handle them.

This also adds include.jsm for loading the Zotero object into a JSM.

[1] https://firefox-source-docs.mozilla.org/dom/ipc/jsactors.html
2022-06-17 20:28:58 -04:00
Abe Jellinek
9829ea7009 Update utilities, move tests, add to CI (#2584) 2022-04-30 04:55:11 -04:00
Tom Najdek
56321a7a6d Fix regression: download() fails for non-http URLs (#2497)
This fixes compatibility with some addons that use Zotero.File.download
to extract files from their XPI bundle.
2022-03-30 12:15:40 -04:00
Tom Najdek
776769f480 Citavi import: Tweak how page label is determined (#2494)
Instead of attempting to extract `PageRange` value we now let pdf worker
always determine page label.

Also improved citavi tests and fixtures.
2022-03-30 09:34:07 -04:00
Abe Jellinek
ef82becf00 Merge attachments and update notes (#2336)
We follow a different merge procedure for each attachment type:

- For PDF attachments, compare by MD5. If no match, get the top 50 words
  in the attachment's text and hash those, then check again for a match.
  Update references to item keys in notes and annotations.
- For web (snapshot / link) attachments, compare by title and URL.
  Prefer a title + URL match but accept a title-only match.
- For other attachment types, keep all attachments from all items being
  merged.

Also:

- Move most merge tests from Duplicates to Items#merge(). It just doesn't
  make sense to worry about the UI in these.
2022-03-09 17:26:26 -05:00
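The two-stage PDF comparison described in this commit (exact hash first, then a hash of leading words) might look roughly like the sketch below. Several things here are assumptions: "top 50 words" is read as the first 50 whitespace-separated words, the `md5` and `text` properties are hypothetical, and a small FNV-1a function stands in for whatever hash the real code uses so that the example has no dependencies.

```javascript
// Stand-in 32-bit FNV-1a hash (an assumption, used only to keep the
// sketch dependency-free).
function fnv1a(str) {
    let hash = 0x811c9dc5;
    for (let i = 0; i < str.length; i++) {
        hash ^= str.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    return hash.toString(16);
}

// Assumption: "top 50 words" means the first 50 words, case-insensitive.
function textFingerprint(text, wordCount = 50) {
    const words = text.toLowerCase().split(/\s+/).filter(Boolean).slice(0, wordCount);
    return fnv1a(words.join(' '));
}

// Stage 1: exact file hash. Stage 2: fall back to the text fingerprint.
// Returns the matching candidate, or null if neither stage finds one.
function findMatchingPDF(master, candidates) {
    const byHash = candidates.find(c => c.md5 === master.md5);
    if (byHash) {
        return byHash;
    }
    const fingerprint = textFingerprint(master.text);
    return candidates.find(c => textFingerprint(c.text) === fingerprint) || null;
}
```

The fallback exists because two copies of the same PDF can differ byte-for-byte (for example after metadata edits) while still containing the same text.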
Tom Najdek
1ad2056674 Add support for importing Citavi annotations (#2351) 2022-03-09 04:06:44 -05:00
Dan Stillman
1ce47bc404 Add Preprint item type to additional sample data for tests 2022-03-05 07:23:50 -05:00
Dan Stillman
32fc1cad9c Add Preprint item type to sample data for tests 2022-03-05 06:03:44 -05:00
Tom Najdek
092459dbfc Mendeley Import: Tests for group annotations
Extended Mendeley Import test to include a scenario where other users
attached an annotation to an item in a group library that also exists
in user's library.
2021-11-15 11:39:57 +01:00
Tom Najdek
7664fedf70 Mendeley Import: Test skipping mismatched annotations 2021-11-15 11:37:34 +01:00
Tom Najdek
f10649483e Mendeley Import: Add more tests for the importer
Also rephrased a comment in the importer code and renamed tests file
to mendeleyImportTest.js for consistency.
2021-11-15 11:19:19 +01:00
Tom Najdek
7940915bb0 Fix arXiv ID not imported. Fix #2236. (#2238)
Mendeley online schema uses "arxiv", local DB uses "arxivId" hence it
was skipped. This commit adds mapping and a test.
2021-11-04 15:32:35 -04:00
Tom Najdek
882ecc205e Mendeley import: Remove code to patch after earlier imports (#2234)
Fixes #2233, Mendeley import: Invalid-field-for-type error
2021-11-03 23:32:36 -04:00
Dan Stillman
eac98d1c2e Add test for 4.0 → 5.0 DB upgrade
With a mechanism for specifying a zipped DB copy to use as the initial
DB when resetting the DB in tests
2021-08-17 00:41:59 -04:00
J. Ryan Stinnett
bc4aafa8e4 Add feed reader tests for parsing behavior
This adds extra tests to check parsing behavior such as entities, tag handling,
CDATA, etc. This will help ensure the new feed processor matches the previous
behavior.
2021-06-16 20:59:57 +01:00
Dan Stillman
4b60c6ca27 Type/field handling overhaul
This changes the way item types, item fields, creator types, and CSL
mappings are defined and handled, in preparation for updated types and
fields.

Instead of being predefined in SQL files or code, type/field info is
read from a bundled JSON file shared with other parts of the Zotero
ecosystem [1], referred to as the "global schema". Updates to the
bundled schema file are automatically applied to the database at first
run, allowing changes to be made consistently across apps.

When syncing, invalid JSON properties are now rejected instead of being
ignored and processed later, which will allow for schema changes to be
made without causing problems in existing clients. We considered many
alternative approaches, but this approach is by far the simplest,
safest, and most transparent to the user.

For now, there are no actual changes to types and fields, since we'll
first need to do a sync cut-off for earlier versions that don't reject
invalid properties.

For third-party code, the main change is that type and field IDs should
no longer be hard-coded, since they may not be consistent in new
installs. For example, code should use `Zotero.ItemTypes.getID('note')`
instead of hard-coding `1`.

[1] https://github.com/zotero/zotero-schema
2019-09-16 02:27:22 -04:00
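For third-party code, the advice in this commit amounts to resolving type IDs by name at runtime. The sketch below runs outside Zotero by using a stand-in object in place of `Zotero.ItemTypes`; the numeric IDs, the `isNoteItem` helper, and the `itemTypeID` property are assumptions for illustration.

```javascript
// Stand-in for Zotero.ItemTypes so the sketch runs outside Zotero.
// The numeric IDs are made up; in a real install they come from the schema.
const fakeItemTypes = {
    ids: { note: 28, book: 7 },
    getID(name) {
        return name in this.ids ? this.ids[name] : false;
    }
};

// Fragile approach, now discouraged: const NOTE_TYPE_ID = 1;
// Preferred approach: look the ID up by name each time it is needed.
function isNoteItem(item, itemTypes) {
    return item.itemTypeID === itemTypes.getID('note');
}

// Inside Zotero, the second argument would be Zotero.ItemTypes.
const noteCheck = isNoteItem({ itemTypeID: 28 }, fakeItemTypes);
```

Passing the lookup object in as a parameter also makes such helpers easy to test without a running Zotero instance.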
Frank Bennett
e618410eb2 Export CSL JSON with title-short rather than shortTitle 2019-04-04 00:45:32 +09:00
Dan Stillman
8f1f1f1fba Update itemFromCSLJSON test for podcast broadcast mapping
Follow up to bf4deeff8f
2019-03-21 02:20:38 -04:00
Dan Stillman
1061893998 "Attachment Content" search improvements
- Fix incorrect results for ANY search with multiple "Attachment
  Content" conditions and no other conditions
- Dramatically speed up single-word searches by avoiding unnecessary
  text scans (which probably addresses #1595)
- Clean up code
2019-02-19 04:10:25 -05:00
Dan Stillman
9c2d0d7272 Add skipped test for importing related items from Zotero RDF
This is hard to do currently because the natural place to do it (and
where the previous seeAlso stuff was done) is translate_item.js, but
with async import translators that now only gets one item at a time,
whereas saving item relations requires all items to be saved. So this
would probably need to be done in the import code in translate.js.

It might also require undoing
https://github.com/zotero/zotero/pull/453 so that getResourceURI() works
on notes and figuring out another solution for the problem that was
trying to solve.
2019-01-14 02:36:59 -05:00
Adomas Venčkauskas
4072d444e7 Ensure Test Import Translator.js #doImport() does not rely on #detectImport() 2018-12-21 15:16:27 +02:00
Dan Stillman
2a7f31813e Disable JS in hidden browser when indexing HTML files without a charset
This could cause imports that linked to HTML files to hang, possibly
from network requests that failed.
2018-06-18 20:19:02 -04:00
Martynas Bagdonas
c0a4fa43f0 Add a test for PDF recognition by DOI (#1496) 2018-05-04 03:14:26 -04:00
Dan Stillman
8f39e9cb36 Rename PDF recognizer tests to reflect arXiv ID lookup
Addresses #1494 and #1495
2018-05-04 01:16:04 -04:00
Dan Stillman
4f9847da04 Save parent item to correct library when recognizing PDF without DOI 2018-04-02 15:34:22 -04:00
Dan Stillman
676ab7852b Fix date parsing from Atom feeds
Use Atom namespace when getting fields, and use `<updated>` date before
`<published>`. (The dates are also available on the nsIFeedContainer
(`feedEntry`), but we're getting them directly from the fields for some
reason.)
2017-10-31 02:21:21 -04:00
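The field lookup described in this commit can be sketched with the standard DOM method `getElementsByTagNameNS`. The `getAtomEntryDate` helper and its return convention are assumptions, and `entry` is taken to be the DOM element for an Atom `<entry>`; this is not the feed reader's actual code.

```javascript
// The Atom namespace URI, as defined by the Atom syndication format.
const ATOM_NS = 'http://www.w3.org/2005/Atom';

// Hypothetical helper: read the date from namespaced Atom fields,
// preferring <updated> and falling back to <published>.
// Returns a Date, or null when neither field has content.
function getAtomEntryDate(entry) {
    for (const name of ['updated', 'published']) {
        const nodes = entry.getElementsByTagNameNS(ATOM_NS, name);
        if (nodes.length > 0) {
            const text = nodes[0].textContent.trim();
            if (text) {
                return new Date(text);
            }
        }
    }
    return null;
}
```

Looking fields up with the namespace avoids accidentally matching same-named elements from other vocabularies embedded in the feed.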
Adomas Venčkauskas
269a250b4f Fetch a style if it is not installed on document preferences load 2017-04-10 11:24:22 +03:00