By moving the setAutoAttachmentTitle() calls to importFromFile() /
_addToDB().
Also:
- Chop off file extension when setting the parent's title based on the
filename in Create Parent Item -> Manual Entry.
- Fix Manual Entry not renaming the attachment correctly by awaiting
createEmptyParent().
- Currently enabled only for ScienceDirect. Can be enabled via a whitelist
- Matches the HiddenBrowser loaded HTML page for a captcha element. If
the captcha element class changes, this will break (but the
alternative is potentially displaying a captcha clearing window when
something else that is not a captcha guard is loaded).
- Captcha clear timeout for 60s.
- Doesn't automatically switch focus back to the browser which intiated
the item save via the Connector.
- Stores the cookies used to clear the captcha for future saves from the
same domain. Discards Connector supplied User Agent, since CF bot
detector checks UA header against actual UA behavior like TLS handshake
and if the UA acts different to what it's supposed to, the bot
challenge is not cleared.
Other changes:
- Adjusted the cookie sandbox to allow multiple cookie sandboxes to be
active (and simplified some legacy code that was meant to cover a bug
in old FX codebase).
- HiddenBrowser API changed to be Object oriented, translator tester
in the translate repo will need to be updated after a merge (have the
change ready).
- Improved Connector Server attachment progress handling
Passing unformatted = true to Item#getField() now returns a bidi control
character-less result, and we use that in Reader#updateTitle() and
getFileBaseNameFromItem() to prevent bidi control characters from showing up in
filenames and window titles (the former everywhere, the latter on Windows only).
We also strip bidi control characters in getValidFileName() to be extra safe.
HTML files are now indexed instead of read directly, and indexing was
previous skipped in tests and otherwise performed on a delay, so set a
flag in the affected tests that triggers inline indexing.
Use the new PageData mechanism for character set detection, don't try to
index HTML files directly without properly detecting the charset, and
generally simplify the indexing code.
HTML files are now considered cached files that require indexing and
won't be indexed automatically in Zotero.FullText.findTextInItems(),
which breaks certain expectations, including in some tests. This will
need to be addressed.
- Using `sandboxPrototype` properly uses window as prototype
- This commit removes the need for our patch in babel-worker.js:
3d0bc4cf9f
- We properly inject into frames in the client if we ever include frames
Our SingleFileZ integration would save images inside directories following the
SingleFileZ format. However, Zotero does not support syncing sub-directories of
attachments. This commit switch back to a single HTML file with base64 encoded
resources. We think that the 33% increase in resources will be offset by the
compression of HTML and removal of JavaScript and unused CSS.
This commit does not fix past snapshots that were saved using SingleFileZ.
This fixes direct and VPN-based retrieval of PDFs for Elsevier (e.g.,
ScienceDirect) items that have a DOI but no URL, since Elsevier resolves
DOIs through an intermediate page.
Delay requests to the same domain by 1 second, respect a Retry-After
header if present for 429 and 503, and delay for 10 seconds on 429 or
5xx otherwise.
If there's no translated PDF or the translated PDF fails and the item
has a DOI, check Zotero's Unpaywall mirror for possible sources and try
to download one of those.
Unlike with "Add Item by Identifier" and "Find Available PDF" in the
item context menu, this does not try the DOI/URL page, since it would
result in more data leakage and most of the time you'd be saving from
the DOI page already. We could consider offering it as an option, but
for it to be useful, you'd have to have an institutional subscription,
be on-campus or connected via VPN (for now), and be saving from
somewhere other than the main page.
A new connector endpoint, sessionProgress, takes the place of
attachmentProgress. Unlike attachmentProgress, sessionProgress can show
new attachments that have been added to the save, and with a little more
work should also be able to show when a parent item has been recognized
for a directly saved PDF.
This also adds support for custom PDF resolvers, available to all PDF
retrieval methods. I'll document those separately.
Closes#1542
- Add the ability to extract a PDF URL from a given webpage using the
translation framework
- Add the ability to get open-access PDFs from landing pages from
Unpaywall data in addition to direct PDF URLs
- Use the above functionality to improve PDF retrieval for "Add Item by
Identifier"
- Add "Find Available PDFs" option to the item context menu to retrieve
PDFs for existing items from the DOI or URL page or using Unpaywall
data. The option appears for single items with a DOI or URL and no PDF,
and it always appears when selecting multiple top-level items (but
skips ineligible items).
PDF extraction from DOI/URL pages will currently only work with
unauthenticated access (i.e., on-campus or VPN, but not via a web-based
proxy).
Supersedes and closes#948
- Make Zotero.Attachments.createDirectoryForItem() delete existing
directory instead of moving it to orphaned-files; also now returns a
string path instead of an nsIFile
- Use above function during file sync instead of
_deleteExistingAttachmentFiles(), which was partly broken
- Fix throwing on errors when saving some attachment types
WPD code hasn't been updated in many years, and there was an issue with
document permissions in 5.0. We'll need to replace nsIWBP in Electron,
but this will do for now.
Attachments are opened using file:// URIs instead of
zotero://attachment, which is what Standalone does anyway. Ancient HTML
annotations and highlights won't be displayed anymore, but I'm not sure
they worked anyway, and it hasn't been possible to create them in years.
We might be able to write out existing annotations to notes.
iframes are skipped during saving, in an attempt to reduce the number of
junk ad files. JS can still cause problems with viewing, so we might
still want to either disable scripts or force the viewed page offline
(if such a thing is possible).
There might be issues with auxiliary filename length/characters during
cross-platform file syncing. (We modified the WPD code to shorten/clean
them.)
- Fix upgrading of Mozilla-style attachments/storage file paths on upgrade
(requires re-upgrade)
- Save relative paths using forward slashes for consistency, and convert
to platform-appropriate slashes on use