- Add the ability to extract a PDF URL from a given webpage using the
translation framework
- Add the ability to get open-access PDFs via landing pages listed in
Unpaywall data, in addition to direct PDF URLs
- Use the above functionality to improve PDF retrieval for "Add Item by
Identifier"
- Add "Find Available PDFs" option to the item context menu to retrieve
PDFs for existing items from the DOI or URL page or using Unpaywall
data. The option appears for single items with a DOI or URL and no PDF,
and it always appears when selecting multiple top-level items (but
skips ineligible items).
PDF extraction from DOI/URL pages currently works only when no login is
required (i.e., on campus or via VPN, but not through a web-based
proxy).
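The menu rule above can be sketched roughly as follows. This is only an
illustration: the item shape ({ doi, url, hasPDF, isTopLevel }) and the
function names are hypothetical, not Zotero's item API, and "multiple
top-level items" is read here as "every selected item is top-level".

```javascript
// Hypothetical item shape: { doi, url, hasPDF, isTopLevel }.
// An item is eligible if it has a DOI or URL and no PDF yet.
function isEligible(item) {
  return Boolean(item.doi || item.url) && !item.hasPDF;
}

// Decide whether the "Find Available PDFs" option should be shown.
function showFindPDFOption(selection) {
  if (selection.length === 0) return false;
  // Single item: shown only if that item is eligible
  if (selection.length === 1) return isEligible(selection[0]);
  // Multiple items: shown whenever all of them are top-level (assumed reading)
  return selection.every(item => item.isTopLevel);
}

// Items that would actually be processed; ineligible ones are skipped.
function itemsToProcess(selection) {
  return selection.filter(isEligible);
}
```

With a multi-item selection the option is shown even if some items are
ineligible; itemsToProcess() then drops those before any lookup happens.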
Supersedes and closes #948
Handle an array of objects with 'url' and 'version' rather than just an
array of URLs.
Also:
- Don't throw an error from addOpenAccessPDF() if there's an error from
getOpenAccessPDFURLs()
- Make addPDFFromURLs() a separate function so URL lookup can be done
separately from download
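A minimal sketch of the shape of this change, under stated assumptions:
the helper names below are hypothetical stand-ins (they are not the
actual addOpenAccessPDF(), getOpenAccessPDFURLs(), or addPDFFromURLs()
implementations), the lookup and download functions are passed in by the
caller, and accepting plain URL strings alongside the new objects is an
assumption made for illustration.

```javascript
// Hypothetical helper: accept the old shape (array of URL strings) as well
// as the new one (array of { url, version } objects).
function normalizeURLEntries(entries) {
  return entries.map(entry =>
    typeof entry === 'string'
      ? { url: entry, version: null }
      : { url: entry.url, version: entry.version || null }
  );
}

// Lookup step: a failed lookup is reported as "nothing found", not thrown.
// `lookupFn` stands in for the URL lookup; its signature is assumed.
async function findURLs(lookupFn, doi) {
  try {
    return normalizeURLEntries(await lookupFn(doi));
  } catch (e) {
    return [];
  }
}

// Download step, kept separate from lookup. `downloadFn` is hypothetical
// and is assumed to resolve to true when the download succeeded.
async function addFromURLs(entries, downloadFn) {
  for (const entry of entries) {
    if (await downloadFn(entry.url)) return entry;
  }
  return null;
}
```

Keeping the two steps apart lets a caller run the lookup first and decide
later whether (and when) to download anything.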
Previously the handler would be called even on error pages, which often
meant that an import translator (e.g., BibTeX) would fail to find
anything on the page and the save popup would just close silently. The
popup will now show an error message as soon as the error occurs.
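One way to picture the new behavior is the guard below. It is a sketch
only: the response shape ({ status, body }) and the handler and showError
callbacks are hypothetical and do not reflect the actual code.

```javascript
// Hypothetical guard: call the page handler only for successful responses
// and surface an error immediately otherwise.
function handleResponse(response, handler, showError) {
  if (response.status < 200 || response.status >= 300) {
    showError(`Request failed with status ${response.status}`);
    return false;
  }
  handler(response.body);
  return true;
}
```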
At some point Mendeley seems to have changed the default path to the
data directory on Windows to remove the period. For people with the old
directory, we were linking to attachment files from "Downloaded" rather
than storing them.
E.g., if a .pdf is really an HTML file, we try to load it in a hidden
browser (because we properly detect the content type), but then the .pdf
extension causes the hidden browser to launch it via the OS and the
hidden browser never finishes loading it. This adds a 5-second timeout
to abort the process.
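A generic timeout of this kind can be sketched with Promise.race(). The
withTimeout() helper is hypothetical and is not the code from this
change; note that it only stops waiting and does not by itself cancel
the underlying load.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage (loadInHiddenBrowser is a hypothetical name):
//   await withTimeout(loadInHiddenBrowser(url), 5000);
```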
When the associated-files pref is enabled, Add Item by Identifier uses a
Zotero Unpaywall mirror to find available open-access PDFs. No details
about the contents of searches are logged.
For each PDF with an associated URL in the Downloaded directory, we were
copying all files in the directory (!) to the attachment's storage
directory. (Zotero imports always have files in separate directories,
and this was a function used to save both single files and HTML
snapshots.)
We'll clean up the extra files in a separate step.
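To illustrate the intended behavior (copy only the attachment file, never
its siblings), here is a small sketch. It uses Node's fs and path modules
purely for illustration, which is not the file API the application
itself uses, and copySingleFile() is a hypothetical name.

```javascript
const fs = require('fs');
const path = require('path');

// Copy only `sourceFile` into `destDir`, leaving any other files in the
// source directory untouched. Returns the path of the copy.
function copySingleFile(sourceFile, destDir) {
  fs.mkdirSync(destDir, { recursive: true });
  const dest = path.join(destDir, path.basename(sourceFile));
  fs.copyFileSync(sourceFile, dest);
  return dest;
}
```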