* New `attachmentTitle` field, returns the title of the current attachment (or
the future title of the attachment being created)
* New function `match` to enable testing values with a regex.
* New function `start` to enable truncating from the beginning.
* Ignore new line characters in the template for easier editing.
* Avoid repeated characters when changing case (snake/dash)
* Increase the size of the template input field.
Closes#3252
By moving the setAutoAttachmentTitle() calls to importFromFile() /
_addToDB().
Also:
- Chop off file extension when setting the parent's title based on the
filename in Create Parent Item -> Manual Entry.
- Fix Manual Entry not renaming the attachment correctly by awaiting
createEmptyParent().
- Currently enabled only for ScienceDirect. Can be enabled via a whitelist
- Matches the HiddenBrowser loaded HTML page for a captcha element. If
the captcha element class changes, this will break (but the
alternative is potentially displaying a captcha clearing window when
something else that is not a captcha guard is loaded).
- Captcha clear timeout for 60s.
- Doesn't automatically switch focus back to the browser which intiated
the item save via the Connector.
- Stores the cookies used to clear the captcha for future saves from the
same domain. Discards Connector supplied User Agent, since CF bot
detector checks UA header against actual UA behavior like TLS handshake
and if the UA acts different to what it's supposed to, the bot
challenge is not cleared.
Other changes:
- Adjusted the cookie sandbox to allow multiple cookie sandboxes to be
active (and simplified some legacy code that was meant to cover a bug
in old FX codebase).
- HiddenBrowser API changed to be Object oriented, translator tester
in the translate repo will need to be updated after a merge (have the
change ready).
- Improved Connector Server attachment progress handling
Passing unformatted = true to Item#getField() now returns a bidi control
character-less result, and we use that in Reader#updateTitle() and
getFileBaseNameFromItem() to prevent bidi control characters from showing up in
filenames and window titles (the former everywhere, the latter on Windows only).
We also strip bidi control characters in getValidFileName() to be extra safe.
HTML files are now indexed instead of read directly, and indexing was
previous skipped in tests and otherwise performed on a delay, so set a
flag in the affected tests that triggers inline indexing.
Use the new PageData mechanism for character set detection, don't try to
index HTML files directly without properly detecting the charset, and
generally simplify the indexing code.
HTML files are now considered cached files that require indexing and
won't be indexed automatically in Zotero.FullText.findTextInItems(),
which breaks certain expectations, including in some tests. This will
need to be addressed.
- Using `sandboxPrototype` properly uses window as prototype
- This commit removes the need for our patch in babel-worker.js:
3d0bc4cf9f
- We properly inject into frames in the client if we ever include frames
Our SingleFileZ integration would save images inside directories following the
SingleFileZ format. However, Zotero does not support syncing sub-directories of
attachments. This commit switch back to a single HTML file with base64 encoded
resources. We think that the 33% increase in resources will be offset by the
compression of HTML and removal of JavaScript and unused CSS.
This commit does not fix past snapshots that were saved using SingleFileZ.
This fixes direct and VPN-based retrieval of PDFs for Elsevier (e.g.,
ScienceDirect) items that have a DOI but no URL, since Elsevier resolves
DOIs through an intermediate page.
Delay requests to the same domain by 1 second, respect a Retry-After
header if present for 429 and 503, and delay for 10 seconds on 429 or
5xx otherwise.
If there's no translated PDF or the translated PDF fails and the item
has a DOI, check Zotero's Unpaywall mirror for possible sources and try
to download one of those.
Unlike with "Add Item by Identifier" and "Find Available PDF" in the
item context menu, this does not try the DOI/URL page, since it would
result in more data leakage and most of the time you'd be saving from
the DOI page already. We could consider offering it as an option, but
for it to be useful, you'd have to have an institutional subscription,
be on-campus or connected via VPN (for now), and be saving from
somewhere other than the main page.
A new connector endpoint, sessionProgress, takes the place of
attachmentProgress. Unlike attachmentProgress, sessionProgress can show
new attachments that have been added to the save, and with a little more
work should also be able to show when a parent item has been recognized
for a directly saved PDF.
This also adds support for custom PDF resolvers, available to all PDF
retrieval methods. I'll document those separately.
Closes#1542
- Add the ability to extract a PDF URL from a given webpage using the
translation framework
- Add the ability to get open-access PDFs from landing pages from
Unpaywall data in addition to direct PDF URLs
- Use the above functionality to improve PDF retrieval for "Add Item by
Identifier"
- Add "Find Available PDFs" option to the item context menu to retrieve
PDFs for existing items from the DOI or URL page or using Unpaywall
data. The option appears for single items with a DOI or URL and no PDF,
and it always appears when selecting multiple top-level items (but
skips ineligible items).
PDF extraction from DOI/URL pages will currently only work with
unauthenticated access (i.e., on-campus or VPN, but not via a web-based
proxy).
Supersedes and closes#948