41 commits

Author SHA1 Message Date
Dan Stillman
13adfd131c fx-compat: Update full-text indexing
Use the new PageData mechanism for character set detection, don't try to
index HTML files directly without properly detecting the charset, and
generally simplify the indexing code.

HTML files are now considered cached files that require indexing and
won't be indexed automatically in Zotero.FullText.findTextInItems(),
which breaks certain expectations, including in some tests. This will
need to be addressed.
2022-06-17 20:29:01 -04:00
Adomas Ven
4405b59044 Add a function to download PDFs via a browser (#2248)
Fixes zotero/translators#2739
2021-12-02 04:27:33 -05:00
Dan Stillman
0bc6b2ccc6 Transfer annotations when converting linked files to stored files
Previously, any annotations on the linked file were partially deleted,
leaving broken `items` rows without `itemAnnotations` rows.
2021-03-12 06:35:21 -05:00
Dan Stillman
4142f4b316 Replace occurrences of .getNote() with .note 2021-03-02 17:36:05 -05:00
Dan Stillman
d9cf53725a Make Find PDF test timing a bit more forgiving 2021-02-09 17:12:25 -05:00
Dan Stillman
4c048f6fd2 Relax HTML checking in SingleFile tests
Somehow the saved page starts with "<html style>" instead of "<html>" on
GitHub Actions
2021-01-18 23:06:13 -05:00
Fletcher Hazlehurst
a2620b757d Update SingleFile and fix several bugs
- Using `sandboxPrototype` properly uses window as prototype
- This commit removes the need for our patch in babel-worker.js:
  3d0bc4cf9f
- We properly inject into frames in the client if we ever include frames
2020-11-02 17:24:14 -07:00
fletcherhaz
76ae5d9f59 Switch back to SingleFile from SingleFileZ (#1904)
Our SingleFileZ integration would save images inside directories following the
SingleFileZ format. However, Zotero does not support syncing sub-directories of
attachments. This commit switches back to a single HTML file with base64-encoded
resources. We think that the 33% increase in resource size will be offset by the
compression of HTML and removal of JavaScript and unused CSS.

This commit does not fix past snapshots that were saved using SingleFileZ.
2020-10-23 19:39:07 -04:00
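
The 33% figure mentioned above follows from how base64 works: every 3 input bytes become 4 output characters. A minimal sketch that checks this with Python's standard library (the function name `base64_overhead` is hypothetical and not part of Zotero or SingleFile):

```python
import base64


def base64_overhead(num_bytes: int) -> float:
    """Return the fractional size increase from base64-encoding num_bytes bytes.

    Assumes num_bytes is positive. The payload content does not matter,
    only its length, so zero bytes are used.
    """
    raw = bytes(num_bytes)
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw) - 1.0


# For inputs whose length is a multiple of 3, the overhead is exactly one third.
overhead = base64_overhead(300_000)
```

Padding adds at most two extra characters for lengths that are not a multiple of 3, which is negligible for typical image resources.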
Fletcher Hazlehurst
bb8325ff9b Fix tests for new version of SingleFileZ 2020-10-13 11:08:23 -06:00
Fletcher Hazlehurst
1c5cefaffd Fix handling of network errors for SingleFile save 2020-09-28 10:43:32 -07:00
Fletcher Hazlehurst
0fba08b3c9 Use SingleFile to create snapshots of web pages 2020-09-23 09:37:09 -07:00
Dan Stillman
b2e902746a Strip HTML tags from titles when generating filenames 2020-06-02 17:06:42 -04:00
Dan Stillman
86a5c46b1e Find Available PDF: Don't mark URLs that redirect as tried
https://forums.zotero.org/discussion/81182
2020-02-02 23:47:40 -05:00
Dan Stillman
bb59429664 Add "Convert Linked Files to Stored Files…" menu option
In new File → Manage Attachments submenu

Closes #1637
2019-08-19 05:00:32 -04:00
Dan Stillman
2d71b13ce0 Fix some spurious failures in PDF retrieval test 2019-01-29 07:35:39 -05:00
Dan Stillman
6137aeddb8 Follow meta redirects for Find Available PDF
This fixes direct and VPN-based retrieval of PDFs for Elsevier (e.g.,
ScienceDirect) items that have a DOI but no URL, since Elsevier resolves
DOIs through an intermediate page.
2018-11-26 00:57:48 -07:00
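
As a rough illustration of what following a meta redirect involves, the sketch below extracts the target of a `<meta http-equiv="refresh">` tag from an intermediate page and resolves it against the page URL. It is not Zotero's implementation; the names `meta_redirect_url` and `_MetaRefreshParser` and the example URLs are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

# Matches content values such as "0; url=/pdf/article.pdf"
_REFRESH_RE = re.compile(r"\s*\d+\s*[;,]\s*url\s*=\s*['\"]?([^'\"]+)", re.I)


class _MetaRefreshParser(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() != "refresh":
            return
        match = _REFRESH_RE.match(attr_map.get("content", ""))
        if match:
            self.target = match.group(1).strip()


def meta_redirect_url(html: str, base_url: str) -> Optional[str]:
    """Return the absolute meta-refresh target, or None if the page has none."""
    parser = _MetaRefreshParser()
    parser.feed(html)
    return urljoin(base_url, parser.target) if parser.target else None
```

A caller would fetch the returned URL and repeat the check, ideally with a cap on the number of hops so that redirect loops terminate.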
Dan Stillman
4b81e03f28 Improve reliability of PDF retrieval delay tests 2018-10-09 19:14:59 -04:00
Dan Stillman
1b9811c31d Fix test failures after 18f79f9796 2018-10-06 01:38:32 -04:00
Dan Stillman
d899134e7c Automatically delay between PDF retrieval requests to the same domain
Delay requests to the same domain by 1 second, respect a Retry-After
header if present for 429 and 503, and delay for 10 seconds on 429 or
5xx otherwise.
2018-09-22 04:03:25 -04:00
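
The delay rules described in this commit can be sketched as a small decision function. This is an illustration of the stated policy only, not the actual Zotero code; the function name `retry_delay` and the constants are hypothetical, and only the delay-in-seconds form of `Retry-After` is handled (the HTTP-date form falls back to the default).

```python
from typing import Optional

SAME_DOMAIN_DELAY = 1.0  # seconds between ordinary requests to one domain
ERROR_DELAY = 10.0       # seconds to wait after 429 or 5xx without Retry-After


def retry_delay(status: int, retry_after: Optional[str] = None) -> float:
    """Return how long to wait before the next request to the same domain."""
    # Retry-After is honored only for 429 and 503, as the commit describes.
    if status in (429, 503) and retry_after is not None:
        try:
            return max(0.0, float(int(retry_after)))
        except ValueError:
            pass  # not an integer number of seconds; use the default below
    if status == 429 or 500 <= status <= 599:
        return ERROR_DELAY
    return SAME_DOMAIN_DELAY
```

For example, `retry_delay(503, "30")` yields 30 seconds, while `retry_delay(500, "30")` yields 10 seconds because the header is ignored for other 5xx responses.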
Dan Stillman
7cf466a0b6 Save OA PDFs when the DOI resolves directly to the file 2018-09-06 16:44:11 -04:00
Dan Stillman
05d8e7a8a3 Check Extra field for DOIs for PDF retrieval
E.g., a book with a DOI in Extra

Closes #1551
2018-08-30 16:52:24 -04:00
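
A minimal sketch of pulling a DOI out of the Extra field, assuming the DOI is stored on its own line in the form `DOI: 10.xxxx/suffix`. The line format, the regular expression, and the function name `doi_from_extra` are assumptions for illustration, not taken from the Zotero source.

```python
import re
from typing import Optional

# Assumed format: a line of the Extra field reading "DOI: 10.<registrant>/<suffix>"
_DOI_LINE_RE = re.compile(r"^DOI:[ \t]*(10\.\d{4,9}/\S+)[ \t]*$", re.I | re.M)


def doi_from_extra(extra: str) -> Optional[str]:
    """Return the first DOI found on a 'DOI:' line of the Extra field, if any."""
    match = _DOI_LINE_RE.search(extra)
    return match.group(1) if match else None
```

A retrieval routine could fall back to this value when the item type (e.g., a book) has no dedicated DOI field.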
Dan Stillman
ce5be0bc75 Automatically download open-access PDFs when saving via the connector
If there's no translated PDF or the translated PDF fails and the item
has a DOI, check Zotero's Unpaywall mirror for possible sources and try
to download one of those.

Unlike with "Add Item by Identifier" and "Find Available PDF" in the
item context menu, this does not try the DOI/URL page, since it would
result in more data leakage and most of the time you'd be saving from
the DOI page already. We could consider offering it as an option, but
for it to be useful, you'd have to have an institutional subscription,
be on-campus or connected via VPN (for now), and be saving from
somewhere other than the main page.

A new connector endpoint, sessionProgress, takes the place of
attachmentProgress. Unlike attachmentProgress, sessionProgress can show
new attachments that have been added to the save, and with a little more
work should also be able to show when a parent item has been recognized
for a directly saved PDF.

This also adds support for custom PDF resolvers, available to all PDF
retrieval methods. I'll document those separately.

Closes #1542
2018-08-16 00:57:22 -04:00
Dan Stillman
679a6d5cc7 PDF retrieval improvements
- Add the ability to extract a PDF URL from a given webpage using the
  translation framework
- Add the ability to get open-access PDFs from landing pages from
  Unpaywall data in addition to direct PDF URLs
- Use the above functionality to improve PDF retrieval for "Add Item by
  Identifier"
- Add "Find Available PDFs" option to the item context menu to retrieve
  PDFs for existing items from the DOI or URL page or using Unpaywall
  data. The option appears for single items with a DOI or URL and no PDF,
  and it always appears when selecting multiple top-level items (but
  skips ineligible items).

PDF extraction from DOI/URL pages will currently only work with
unauthenticated access (i.e., on-campus or VPN, but not via a web-based
proxy).

Supersedes and closes #948
2018-08-07 04:58:15 -04:00
Dan Stillman
99584dc918 Import base-directory-relative linked files
Zotero RDF contained 'attachments:' paths when files weren't included,
but they weren't imported properly.
2018-06-30 09:19:09 +02:00
Dan Stillman
2a7f31813e Disable JS in hidden browser when indexing HTML files without a charset
This could cause imports that linked to HTML files to hang, possibly
from network requests that failed.
2018-06-18 20:19:02 -04:00
Dan Stillman
7386b376f3 Fix linked attachment base directory handling at drive root
The first letter of the relative path was being removed on save if the
base directory was set to the drive root (e.g. D:\ instead of D:\foo).
2017-08-18 16:06:56 +02:00
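
One plausible way such an off-by-one arises (an assumption; the commit does not show the faulty code) is stripping `len(base) + 1` characters from the path, which is one too many when the base directory already ends with a separator, as `D:\` does. A sketch of a prefix strip that handles both cases, with the hypothetical name `relative_to_base`:

```python
def relative_to_base(path: str, base: str, sep: str = "\\") -> str:
    """Return path relative to base, whether or not base ends with a separator."""
    # Normalize the prefix so exactly one separator is stripped along with it.
    prefix = base if base.endswith(sep) else base + sep
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not inside {base!r}")
    return path[len(prefix):]
```

With this normalization, `D:\foo` and `D:\` as base directories both leave the first letter of the relative path intact.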
Dan Stillman
80f888f374 Fix replacement of existing item storage directories
- Make Zotero.Attachments.createDirectoryForItem() delete existing
  directory instead of moving it to orphaned-files; also now returns a
  string path instead of an nsIFile
- Use above function during file sync instead of
  _deleteExistingAttachmentFiles(), which was partly broken
- Fix throwing on errors when saving some attachment types
2016-12-12 04:06:01 -05:00
Dan Stillman
b5b8f2cd2f Update test for fcb6e0c06 2016-06-02 16:37:26 -04:00
Dan Stillman
fcb6e0c068 Save snapshots via nsIWebBrowserPersist instead of WebPageDump
WPD code hasn't been updated in many years, and there was an issue with
document permissions in 5.0. We'll need to replace nsIWBP in Electron,
but this will do for now.

Attachments are opened using file:// URIs instead of
zotero://attachment, which is what Standalone does anyway. Ancient HTML
annotations and highlights won't be displayed anymore, but I'm not sure
they worked anyway, and it hasn't been possible to create them in years.
We might be able to write out existing annotations to notes.

iframes are skipped during saving, in an attempt to reduce the number of
junk ad files. JS can still cause problems with viewing, so we might
still want to either disable scripts or force the viewed page offline
(if such a thing is possible).

There might be issues with auxiliary filename length/characters during
cross-platform file syncing. (We modified the WPD code to shorten/clean
them.)
2016-06-02 16:14:29 -04:00
Dan Stillman
cb8b2bda1b Windows file path fixes
- Fix conversion of Mozilla-style attachments/storage file paths on upgrade
  (requires re-upgrade)
- Save relative paths using forward slashes for consistency, and convert
  to platform-appropriate slashes on use
2016-05-09 02:30:00 -04:00
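
The slash convention in the second item can be sketched as a pair of conversions: store relative paths with forward slashes, and switch to the platform separator only when the path is used. The function names and the explicit `platform_sep` parameter are hypothetical and chosen so that the example behaves the same on any operating system.

```python
def to_stored_form(rel_path: str, platform_sep: str) -> str:
    """Convert a platform-style relative path to the stored forward-slash form."""
    # On platforms that already use "/", a backslash is an ordinary filename
    # character and must not be rewritten.
    if platform_sep == "/":
        return rel_path
    return rel_path.replace(platform_sep, "/")


def to_platform_form(stored: str, platform_sep: str) -> str:
    """Convert a stored forward-slash path to the platform's separator."""
    if platform_sep == "/":
        return stored
    return stored.replace("/", platform_sep)
```

In real code, `platform_sep` would typically come from `os.sep` rather than being passed by hand.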
Dan Stillman
23e01fcefd Fix saving to My Library if Zotero pane hasn't been opened 2016-04-09 18:34:54 -04:00
Dan Stillman
74cf2a3c22 Fix hang on import that includes an HTML attachment
Closes #734, for the moment
2016-03-22 01:31:20 -04:00
Dan Stillman
ad0d6765d7 Fix Zotero.Attachments.linkFromDocument() 2016-02-11 02:54:52 -05:00
Dan Stillman
d9b5e17c9c Asyncify Zotero.Attachments.getNumFiles() and add hasMultipleFiles()
The latter is probably all that's needed.
2015-09-22 04:11:31 -04:00
Dan Stillman
fc1137b769 Asyncify Zotero.Attachments.getTotalFileSize() 2015-09-22 04:11:30 -04:00
Dan Stillman
75bcfcb685 Clean up initialization of attachments tests 2015-06-01 20:23:21 -04:00
Dan Stillman
2154673dd3 Return a Zotero.Item from all Zotero.Attachments methods
These previously returned an itemID, but now that newly saved items can be edited
without a refetch, they should just return the new item.

(It's possible I missed a few spots where these are called.)
2015-05-29 05:33:54 -04:00
Dan Stillman
1e7c822ab0 Fix linked file creation 2015-05-29 01:09:24 -04:00
Dan Stillman
3fc09add3a Attachment fixes
Change all attachment functions to take parameter objects, including a
'collections' property to assign collections. (Previously, calling code
assigned collections separately, which required a nested transaction,
which is no longer possible.)

Fixes #723, Can't attach files by dragging
2015-05-23 04:43:37 -04:00
Dan Stillman
14d435b8d8 Closes #711, Remove support for nested transactions 2015-05-10 18:32:10 -04:00
Dan Stillman
0471a393eb Always return promise from Zotero.Attachments._postProcessFile() 2015-04-26 18:08:26 -04:00