66 commits

Author SHA1 Message Date
Abe Jellinek
dd1601793c Don't set default attachment title if not renaming file (#4459)
Except when using Rename File from Parent Metadata.
2024-07-31 01:39:25 -04:00
Tom Najdek
00ae8bb9b2 Add more features to the file renaming functionality (#4424)
* New `attachmentTitle` field, returns the title of the current attachment (or
  the future title of the attachment being created)
* New function `match` to enable testing values with a regex.
* New function `start` to enable truncating from the beginning.
* Ignore newline characters in the template for easier editing.
* Avoid repeated characters when changing case (snake/dash)
* Increase the size of the template input field.

Closes #3252
2024-07-28 02:57:14 -04:00
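A minimal Python sketch of four of the behaviors listed above, assuming plain string inputs. The helper names `truncate`, `matches`, `change_case`, and `strip_newlines` are hypothetical and are not Zotero's template functions; the real `match` and `start` options may behave differently.

```python
import re


def truncate(value: str, length: int, from_start: bool = False) -> str:
    """Shorten `value` to `length` characters.

    With from_start=True, characters are dropped from the beginning,
    so the end of the string is kept.
    """
    if length <= 0:
        return ""
    return value[-length:] if from_start else value[:length]


def matches(value: str, pattern: str) -> bool:
    """Return True if the regex `pattern` matches anywhere in `value`."""
    return re.search(pattern, value) is not None


def change_case(value: str, sep: str) -> str:
    """Lowercase and join words with `sep` (e.g. "_" or "-").

    Runs of spaces, underscores, and hyphens collapse into a single
    separator, so the output never contains e.g. "__" or "--".
    """
    words = re.split(r"[\s_\-]+", value)
    return sep.join(w.lower() for w in words if w)


def strip_newlines(template: str) -> str:
    """Drop line breaks so a template can be written across several lines."""
    return re.sub(r"[\r\n]+", "", template)
```

Collapsing separator runs before joining is one simple way to get the "no repeated characters" result; it also drops leading and trailing separators.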
Grace Dinh
29f4aece24 Add regex replace feature for file renaming (#3562) 2024-07-28 02:56:19 -04:00
Abe Jellinek
4187819cd1 File renaming: Match content type prefixes, add UI (#4431) 2024-07-27 03:01:43 -04:00
Abe Jellinek
7020d60351 Generalize Find Available PDF -> Find Full Text (#4397) 2024-07-27 02:11:22 -04:00
Tom Najdek
f227aeb6e0 File renaming: suppress duplicate suffixes #3317 (#4389) 2024-07-16 01:59:07 -04:00
Abe Jellinek
833ecca364 Set automatic titles in more or less all cases (#4369)
By moving the setAutoAttachmentTitle() calls to importFromFile() /
_addToDB().

Also:

- Chop off file extension when setting the parent's title based on the
  filename in Create Parent Item -> Manual Entry.
- Fix Manual Entry not renaming the attachment correctly by awaiting
  createEmptyParent().
2024-07-14 23:37:24 -04:00
Tom Najdek
1b751d675b Trim spaces from values in getFileBaseNameFromItem (#3711) 2024-02-19 04:43:19 -05:00
Tom Najdek
abe8def0f1 Trim leading/trailing space in filename format string. Fix #3701 2024-02-16 10:17:03 +01:00
Adomas Ven
8b77c96e97 Displays a browser window to clear captcha when saving attachments. (#3526)
- Currently enabled only for ScienceDirect. Can be enabled via a whitelist.
- Matches the HiddenBrowser loaded HTML page for a captcha element. If
  the captcha element class changes, this will break (but the
  alternative is potentially displaying a captcha clearing window when
  something else that is not a captcha guard is loaded).
- Captcha clearing times out after 60 s.
- Doesn't automatically switch focus back to the browser that initiated
  the item save via the Connector.
- Stores the cookies used to clear the captcha for future saves from the
  same domain. Discards the Connector-supplied User Agent, since the Cloudflare
  (CF) bot detector checks the UA header against actual UA behavior, such as
  the TLS handshake, and if the UA acts differently from what it's supposed
  to, the bot challenge is not cleared.

Other changes:
- Adjusted the cookie sandbox to allow multiple cookie sandboxes to be
  active (and simplified some legacy code that was meant to cover a bug
  in the old Firefox (FX) codebase).
- HiddenBrowser API changed to be object-oriented; translator tester
  in the translate repo will need to be updated after a merge (have the
  change ready).
- Improved Connector Server attachment progress handling
2023-12-27 04:43:50 -05:00
Dan Stillman
fb96cd595d Add startHTTPServer() support function
Centralize httpd creation and add automatic retry to try to deal with
NS_ERROR_SOCKET_ADDRESS_IN_USE errors in CI.
2023-08-16 01:16:49 -04:00
Dan Stillman
69ba2310a2 Actually fix NS_ERROR_SOCKET_ADDRESS_IN_USE during tests 2023-08-09 08:05:30 -04:00
Dan Stillman
4ac3128b17 Use more ports for attachment tests
To try to avoid this stupid NS_ERROR_SOCKET_ADDRESS_IN_USE error
2023-08-07 17:36:19 -04:00
Dan Stillman
9b0ce9558c attachmentRenameFormatString → attachmentRenameTemplate (#3249) 2023-08-04 05:58:15 -04:00
Dan Stillman
be1ab236c8 Better cycling through httpd.js ports to avoid CI failures 2023-07-26 07:25:22 -04:00
Abe Jellinek
676f820f87 Strip bidi control characters in filenames and elsewhere (#3208)
Passing unformatted = true to Item#getField() now returns a bidi control
character-less result, and we use that in Reader#updateTitle() and
getFileBaseNameFromItem() to prevent bidi control characters from showing up in
filenames and window titles (the former everywhere, the latter on Windows only).

We also strip bidi control characters in getValidFileName() to be extra safe.
2023-07-22 03:30:28 -04:00
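For reference, a minimal sketch of bidi-control stripping, assuming "bidi control characters" means the twelve code points with the Unicode Bidi_Control property. `strip_bidi_controls` is a hypothetical helper, not the code in getValidFileName().

```python
import re

# The 12 code points with the Unicode Bidi_Control property:
# ALM (U+061C), LRM/RLM (U+200E-U+200F), LRE/RLE/PDF/LRO/RLO (U+202A-U+202E),
# and LRI/RLI/FSI/PDI (U+2066-U+2069).
BIDI_CONTROLS = re.compile("[\u061c\u200e\u200f\u202a-\u202e\u2066-\u2069]")


def strip_bidi_controls(text: str) -> str:
    """Remove bidi control characters, leaving all other text untouched."""
    return BIDI_CONTROLS.sub("", text)


# A title wrapped in FSI ... PDI isolates.
print(strip_bidi_controls("\u2068Title\u2069.pdf"))  # Title.pdf
```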
Tom Najdek
0ba766f2e0 Customizable renaming rules #1413 (#2297) 2023-07-20 06:50:34 -04:00
Dan Stillman
14f7d3acad Cycle through httpd ports to prevent CI failures
We didn't seem to be doing this anymore, even though we had a comment
for it, and some tests were failing with NS_ERROR_SOCKET_ADDRESS_IN_USE.
2023-06-19 06:54:19 -04:00
Dan Stillman
0858960d33 Better logging for request count mismatches in Find Available PDF tests 2023-04-29 17:50:49 -04:00
Dan Stillman
2796e6c80a Fix attachment tests that depend on HTML indexing
HTML files are now indexed instead of read directly, and indexing was
previously skipped in tests and otherwise performed on a delay, so set a
flag in the affected tests that triggers inline indexing.
2023-04-15 00:24:35 -04:00
Abe Jellinek
0612a9e6f5 fx-compat: Run translation and SingleFile in [hidden] browser
And replace loadDocuments().
2023-04-14 11:44:44 -04:00
Dan Stillman
7ffc509ee6 Fix response content type in Find Available PDF test 2022-11-21 01:14:07 -05:00
Dan Stillman
c6df0b586c Use clearer name for Find Available PDF tests 2022-11-21 01:14:07 -05:00
Dan Stillman
b5862ba780 Handle relative PDF links when using custom PDF resolver 2022-11-21 01:14:07 -05:00
Dan Stillman
8e59e49d29 Avoid infinite/excessive loops in Find Available PDF
https://forums.zotero.org/discussion/100634/potential-infinite-loop-when-trying-to-find-available-pdf

Closes #2883
2022-10-30 04:44:31 -04:00
Dan Stillman
13adfd131c fx-compat: Update full-text indexing
Use the new PageData mechanism for character set detection, don't try to
index HTML files directly without properly detecting the charset, and
generally simplify the indexing code.

HTML files are now considered cached files that require indexing and
won't be indexed automatically in Zotero.FullText.findTextInItems(),
which breaks certain expectations, including in some tests. This will
need to be addressed.
2022-06-17 20:29:01 -04:00
Adomas Ven
4405b59044 Add a function to download PDFs via a browser (#2248)
Fixes zotero/translators#2739
2021-12-02 04:27:33 -05:00
Dan Stillman
0bc6b2ccc6 Transfer annotations when converting linked files to stored files
Previously, any annotations on the linked file were partially deleted,
leaving broken `items` rows without `itemAnnotations` rows.
2021-03-12 06:35:21 -05:00
Dan Stillman
4142f4b316 Replace occurrences of .getNote() with .note 2021-03-02 17:36:05 -05:00
Dan Stillman
d9cf53725a Make Find PDF test timing a bit more forgiving 2021-02-09 17:12:25 -05:00
Dan Stillman
4c048f6fd2 Relax HTML checking in SingleFile tests
Somehow the saved page starts with "<html style>" instead of "<html>" on
GitHub Actions
2021-01-18 23:06:13 -05:00
Fletcher Hazlehurst
a2620b757d Update SingleFile and fix several bugs
- Using `sandboxPrototype` properly uses window as prototype
- This commit removes the need for our patch in babel-worker.js:
3d0bc4cf9f
- We properly inject into frames in the client if we ever include frames
2020-11-02 17:24:14 -07:00
fletcherhaz
76ae5d9f59 Switch back to SingleFile from SingleFileZ (#1904)
Our SingleFileZ integration would save images inside directories following the
SingleFileZ format. However, Zotero does not support syncing sub-directories of
attachments. This commit switches back to a single HTML file with base64-encoded
resources. We think that the roughly 33% size increase from base64 encoding will be
offset by the compression of HTML and removal of JavaScript and unused CSS.

This commit does not fix past snapshots that were saved using SingleFileZ.
2020-10-23 19:39:07 -04:00
Fletcher Hazlehurst
bb8325ff9b Fix tests for new version of SingleFileZ 2020-10-13 11:08:23 -06:00
Fletcher Hazlehurst
1c5cefaffd Fix handling of network errors for SingleFile save 2020-09-28 10:43:32 -07:00
Fletcher Hazlehurst
0fba08b3c9 Use SingleFile to create snapshots of web pages 2020-09-23 09:37:09 -07:00
Dan Stillman
b2e902746a Strip HTML tags from titles when generating filenames 2020-06-02 17:06:42 -04:00
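A minimal sketch of tag stripping, assuming titles contain only simple inline rich-text tags such as `<i>` or `<sub>`. `strip_html_tags` is a hypothetical helper, and a regex like this is not a full HTML parser.

```python
import re

# An opening or closing tag whose name starts with a letter, e.g. <i>, </i>,
# <sub>, <span class="nocase">. A bare "<" followed by a space is left alone.
_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def strip_html_tags(title: str) -> str:
    """Remove inline markup from a title before it is used in a filename."""
    return _TAG.sub("", title)


print(strip_html_tags("The <i>E. coli</i> genome"))  # The E. coli genome
```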
Dan Stillman
86a5c46b1e Find Available PDF: Don't mark URLs that redirect as tried
https://forums.zotero.org/discussion/81182
2020-02-02 23:47:40 -05:00
Dan Stillman
bb59429664 Add "Convert Linked Files to Stored Files…" menu option
In new File → Manage Attachments submenu

Closes #1637
2019-08-19 05:00:32 -04:00
Dan Stillman
2d71b13ce0 Fix some spurious failures in PDF retrieval test 2019-01-29 07:35:39 -05:00
Dan Stillman
6137aeddb8 Follow meta redirects for Find Available PDF
This fixes direct and VPN-based retrieval of PDFs for Elsevier (e.g.,
ScienceDirect) items that have a DOI but no URL, since Elsevier resolves
DOIs through an intermediate page.
2018-11-26 00:57:48 -07:00
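Roughly how a meta refresh target can be extracted and resolved against the page URL, as a sketch using Python's standard `html.parser`. `meta_refresh_target` is a hypothetical name, and the URLs in the tests are placeholders rather than real resolver pages.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _MetaRefreshParser(HTMLParser):
    """Records the content attribute of the first <meta http-equiv="refresh">."""

    def __init__(self) -> None:
        super().__init__()
        self.content: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.content is not None:
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("http-equiv", "").lower() == "refresh":
            self.content = attributes.get("content", "")


def meta_refresh_target(html: str, page_url: str) -> Optional[str]:
    """Return the absolute URL a meta refresh points to, or None."""
    parser = _MetaRefreshParser()
    parser.feed(html)
    if not parser.content:
        return None
    # content looks like "0; url=https://..." (quotes around the URL are optional)
    match = re.search(r"url\s*=\s*['\"]?([^'\"\s]+)", parser.content, re.IGNORECASE)
    if match is None:
        return None
    # Relative targets are resolved against the page that contained the tag.
    return urljoin(page_url, match.group(1))
```

A retrieval loop would fetch the returned URL next, ideally with a cap on the number of hops so a redirect cycle cannot loop forever.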
Dan Stillman
4b81e03f28 Improve reliability of PDF retrieval delay tests 2018-10-09 19:14:59 -04:00
Dan Stillman
1b9811c31d Fix test failures after 18f79f9796 2018-10-06 01:38:32 -04:00
Dan Stillman
d899134e7c Automatically delay between PDF retrieval requests to the same domain
Delay requests to the same domain by 1 second, respect a Retry-After
header if present for 429 and 503, and delay for 10 seconds on 429 or
5xx otherwise.
2018-09-22 04:03:25 -04:00
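The delay rules above, sketched as a per-domain throttle. Assumptions: Retry-After has already been parsed to seconds (the HTTP-date form is ignored), the delay is measured from when a response arrives, and a Retry-After value never shortens the 1-second minimum. `DomainThrottle` is hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

DEFAULT_DELAY = 1.0  # seconds between requests to the same domain
ERROR_DELAY = 10.0   # after a 429 or 5xx without a usable Retry-After


class DomainThrottle:
    """Tracks, per domain, the earliest time the next request may be sent."""

    def __init__(self) -> None:
        self._next_allowed: Dict[str, float] = {}

    @staticmethod
    def _domain(url: str) -> str:
        return urlsplit(url).hostname or ""

    def wait_time(self, url: str, now: float) -> float:
        """Seconds to wait before requesting `url` (0 if it can go now)."""
        return max(0.0, self._next_allowed.get(self._domain(url), 0.0) - now)

    def record_response(self, url: str, status: int, now: float,
                        retry_after: Optional[float] = None) -> None:
        delay = DEFAULT_DELAY
        if status in (429, 503) and retry_after is not None:
            # Honor the server's Retry-After, but never go below the 1 s minimum.
            delay = max(DEFAULT_DELAY, retry_after)
        elif status == 429 or 500 <= status < 600:
            delay = ERROR_DELAY
        self._next_allowed[self._domain(url)] = now + delay
```

A caller would sleep for `wait_time(url, time.monotonic())` before each request and call `record_response()` with the same clock afterwards; passing `now` explicitly keeps the logic testable without real sleeps.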
Dan Stillman
7cf466a0b6 Save OA PDFs when the DOI resolves directly to the file 2018-09-06 16:44:11 -04:00
Dan Stillman
05d8e7a8a3 Check Extra field for DOIs for PDF retrieval
E.g., a book with a DOI in Extra

Closes #1551
2018-08-30 16:52:24 -04:00
Dan Stillman
ce5be0bc75 Automatically download open-access PDFs when saving via the connector
If there's no translated PDF or the translated PDF fails and the item
has a DOI, check Zotero's Unpaywall mirror for possible sources and try
to download one of those.

Unlike with "Add Item by Identifier" and "Find Available PDF" in the
item context menu, this does not try the DOI/URL page, since it would
result in more data leakage and most of the time you'd be saving from
the DOI page already. We could consider offering it as an option, but
for it to be useful, you'd have to have an institutional subscription,
be on-campus or connected via VPN (for now), and be saving from
somewhere other than the main page.

A new connector endpoint, sessionProgress, takes the place of
attachmentProgress. Unlike attachmentProgress, sessionProgress can show
new attachments that have been added to the save, and with a little more
work should also be able to show when a parent item has been recognized
for a directly saved PDF.

This also adds support for custom PDF resolvers, available to all PDF
retrieval methods. I'll document those separately.

Closes #1542
2018-08-16 00:57:22 -04:00
Dan Stillman
679a6d5cc7 PDF retrieval improvements
- Add the ability to extract a PDF URL from a given webpage using the
  translation framework
- Add the ability to get open-access PDFs from landing pages from
  Unpaywall data in addition to direct PDF URLs
- Use the above functionality to improve PDF retrieval for "Add Item by
  Identifier"
- Add "Find Available PDFs" option to the item context menu to retrieve
  PDFs for existing items from the DOI or URL page or using Unpaywall
  data. The option appears for single items with a DOI or URL and no PDF,
  and it always appears when selecting multiple top-level items (but
  skips ineligible items).

PDF extraction from DOI/URL pages will currently only work with
unauthenticated access (i.e., on-campus or VPN, but not via a web-based
proxy).

Supersedes and closes #948
2018-08-07 04:58:15 -04:00
Dan Stillman
99584dc918 Import base-directory-relative linked files
Zotero RDF contained 'attachments:' paths when files weren't included,
but they weren't imported properly.
2018-06-30 09:19:09 +02:00
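A minimal sketch of resolving such a path, assuming the stored form is `attachments:` followed by a slash-separated path relative to the Linked Attachment Base Directory. `resolve_attachment_path` is hypothetical and not Zotero's import code; it uses `PurePosixPath` for deterministic output, where real code would use the platform's path handling.

```python
from pathlib import PurePosixPath
from typing import Optional

BASE_DIR_PREFIX = "attachments:"


def resolve_attachment_path(stored: str, base_dir: Optional[str]) -> Optional[str]:
    """Turn a stored 'attachments:' path into an absolute path.

    Paths without the prefix are returned unchanged. A relative path
    cannot be resolved without a base directory, so None is returned.
    """
    if not stored.startswith(BASE_DIR_PREFIX):
        return stored
    if base_dir is None:
        return None
    relative = stored[len(BASE_DIR_PREFIX):]
    return str(PurePosixPath(base_dir) / relative)
```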
Dan Stillman
2a7f31813e Disable JS in hidden browser when indexing HTML files without a charset
This could cause imports that linked to HTML files to hang, possibly
from network requests that failed.
2018-06-18 20:19:02 -04:00