incremental verification for web special remote

Except when the configuration makes curl be used. It did not seem worth
trying to tail the file while curl is downloading it.
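
For illustration only, here is a minimal sketch of what incremental verification
during an in-process http download can look like: each chunk is written to disk
and fed to an incremental hash context as it arrives, so no separate hashing pass
is needed afterwards. This is not git-annex's actual code; the function name
`downloadVerifying`, the use of http-client, and SHA256 are assumptions made for
the example.

```haskell
-- Sketch only: incrementally hash an http download as it is written to disk.
import Crypto.Hash (Digest, SHA256, hashFinalize, hashInit, hashUpdate)
import qualified Data.ByteString as B
import Network.HTTP.Client
import Network.HTTP.Client.TLS (newTlsManager)
import System.IO

downloadVerifying :: String -> FilePath -> IO (Digest SHA256)
downloadVerifying url dest = do
    mgr <- newTlsManager
    req <- parseRequest url
    withFile dest WriteMode $ \h ->
        withResponse req mgr $ \resp -> do
            let go ctx = do
                    chunk <- brRead (responseBody resp)
                    if B.null chunk
                        then return (hashFinalize ctx)
                        else do
                            -- write and hash each chunk as it arrives
                            B.hPut h chunk
                            go (hashUpdate ctx chunk)
            go hashInit
```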

But when an interrupted download is resumed, it does not read the whole
existing file to hash it, for the same reason discussed in
commit 7eb3742e4b: that could take a long
time with no progress being displayed. Also, there's an open http
request that needs to be consumed, and taking a long time to hash the file
might cause it to time out.
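
To make that tradeoff concrete, here is a hedged sketch of the resume decision,
with illustrative names rather than git-annex's API: if a partial file exists,
resume with a Range request and report the result as unverified, since the bytes
already on disk were never hashed; a fresh download is handed off to an action
that hashes incrementally, such as the `downloadVerifying` sketch above.

```haskell
-- Sketch only: resume without re-hashing the existing part of the file.
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Network.HTTP.Client
import Network.HTTP.Types.Header (hRange)
import System.Directory (doesFileExist, getFileSize)
import System.IO

-- Stand-in for git-annex's notion of whether the content got verified
-- while it was being transferred.
data Verification = Verified | UnVerified
    deriving Show

resumeOrDownload :: Manager -> String -> FilePath -> IO Verification -> IO Verification
resumeOrDownload mgr url dest freshDownload = do
    exists <- doesFileExist dest
    offset <- if exists then getFileSize dest else return 0
    if offset == 0
        then freshDownload  -- no partial file: hash incrementally while downloading
        else do
            -- Partial file: resume with a Range request and append to it.
            -- The bytes already on disk are deliberately not re-hashed here;
            -- that could take a long time with no progress shown, while the
            -- open http request sits idle and may time out. So the result is
            -- UnVerified and a whole-file check has to happen later.
            -- (A real implementation would also check for a 206 response.)
            req0 <- parseRequest url
            let req = req0
                    { requestHeaders =
                        (hRange, B8.pack ("bytes=" ++ show offset ++ "-"))
                            : requestHeaders req0
                    }
            withFile dest AppendMode $ \h ->
                withResponse req mgr $ \resp ->
                    let go = do
                            chunk <- brRead (responseBody resp)
                            if B.null chunk
                                then return ()
                                else B.hPut h chunk >> go
                    in go
            return UnVerified
```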

Also in passing implemented it for git and external special remotes when
downloading from the web. Several others like S3 are within striking
distance now as well.

Sponsored-by: Dartmouth College's DANDI project
Joey Hess 2021-08-18 14:49:01 -04:00
parent 88b63a43fa
commit d154e7022e
15 changed files with 101 additions and 67 deletions


@@ -5,24 +5,14 @@
 content="""
 The concurrency problem is fixed now.
 
-Directory and webdav now also do incremental hashing.
+Directory and webdav and web now also do incremental hashing.
 
-There seems to have been a reversion in annex.verify handling;
-I'm seeing directory do incremental hashing even when annex.verify is
-false. Noticed while benchmarking it to see how much incremental hashing
-sped it up. Seems that in Remote.Helper.Special, it uses
-RemoteVerify baser, but when shouldVerify checks that value, it
-sees that Types.Remote.isExportSupported is true. Despite the remote
-not actually being an export remote. Because adjustExportImport gets
-run after that point, I think.. (update: this is fixed)
-
-As well as the web special remote, these do not do incremental hashing
+These do not do incremental hashing
 still: gitlfs, S3, httpalso. Problem is, these open the file
 for write. That prevents tailVerify re-opening it for read, because the
 haskell RTS actually does not allowing opening a file for read that it has
 open for write. The new `fileRetriever\`` can be used instead to fix these,
-but will take some more work. Also, the git remote, when accessing a
-repository over http does not do incremental hashing.
+but will take some more work.
 
 Also, retrieval from export/import special remotes does not do incremental
 hashing (except for versioned ones, which sometimes use retrieveKeyFile).
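
The note above says gitlfs, S3, and httpalso open the destination file for write
themselves, and that the GHC runtime's per-process file locking then stops
tailVerify from opening the same file for read, which is the constraint the new
`fileRetriever'` is meant to work around. A tiny standalone illustration of that
locking behaviour (the filename and the printed message are just examples):

```haskell
import Control.Exception (IOException, try)
import System.IO

main :: IO ()
main = withFile "partial.tmp" WriteMode $ \_w -> do
    -- A second open of the same file, for reading, in the same process:
    r <- try (openFile "partial.tmp" ReadMode) :: IO (Either IOException Handle)
    case r of
        -- GHC's single-writer/multiple-reader locking rejects this with
        -- something like "partial.tmp: openFile: resource busy (file is locked)"
        Left e  -> putStrLn ("read open failed: " ++ show e)
        Right h -> hClose h >> putStrLn "read open succeeded"
```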