IncrementalVerifier moved to Utility.Hash, which will let Utility.Url
use it later.
It's perhaps not really specific to hashing, but making a separate
module just for the data type seemed unncessary.
Sponsored-by: Dartmouth College's DANDI project
This fixes the recent reversion that annex.verify is not honored,
because retrieveChunks was passed RemoteVerify baser, but baser
did not have export/import set up.
Sponsored-by: Dartmouth College's DANDI project
Added fileRetriever', which will let the remaining special remotes
eventually also support incremental verify.
Sponsored-by: Dartmouth College's DANDI project
As happens when using the directory special remote, gitlfs, webdav, and
S3. But not external, adb, gcrypt, hook, or rsync.
Sponsored-by: Dartmouth College's DANDI project
Now it's run in VerifyStage.
I thought about keeping the file handle open, and resuming reading where
tailVerify left off. But that risks leaking open file handles, until the
GC closes them, if the deferred verification does not get resumed. Since
that could perhaps happen if there's an exception somewhere, I decided
that was too unsafe.
Instead, re-open the file, seek, and resume.
Sponsored-by: Dartmouth College's DANDI project
Wait for the file to get modified, not only opened. This way, if a
remote does not support resuming, and opens a new file over top of the
existing file, it will wait until that remote starts writing, and open
the file it's writing to, not the old file.
Sponsored-by: Dartmouth College's DANDI project
I saw this:
.git/annex/tmp/SHA256E-s1234376--5ba8e06e0163b217663907482bbed57684d7188024155ddc81da0710dfd2687d: openBinaryFile: resource busy (file is locked)
guess catching IO exceptions did not catch that one.
It uses tailVerify to hash the file while it's being written.
This is able to sometimes avoid a separate checksum step. Although
if the file gets written quickly enough, tailVerify may not see it
get created before the write finishes, and the checksum still happens.
Testing with the directory special remote, incremental checksumming did
not happen. But then I disabled the copy CoW probing, and it did work.
What's going on with that is the CoW probe creates an empty file on
failure, then deletes it, and then the file is created again. tailVerify
will open the first, empty file, and so fails to read the content that
gets written to the file that replaces it.
The directory special remote really ought to be able to avoid needing to
use tailVerify, and while other special remotes could do things that
cause similar problems, they probably don't. And if they do, it just
means the checksum doesn't get done incrementally.
Sponsored-by: Dartmouth College's DANDI project
Not yet used, but this will let all remotes verify incrementally if it's
acceptable to pay the performance price. See comment for details of when
it will perform badly. I anticipate using this for all special remotes
that use fileRetriever. Except perhaps for a few like GitLFS that could
feed the incremental verifier themselves despite using that.
Sponsored-by: Dartmouth College's DANDI project
Simply feed each chunk in turn to the incremental verifier.
When resuming an interrupted retrieve, it does not do incremental
verification. That would need to read the file, up to the resume point,
and feed it to the incremental verifier. That seems easy to get wrong.
Also it would mean extra work done before the transfer can start. Which
would complicate displaying progress, and would perhaps not appear to the
user as if it was resuming from where it left off. Instead, in that
situation, return UnVerified, and let the verification be done in a
separate pass.
Granted, Annex.CopyFile does manage all that, but it's not complicated
by dealing with chunks too.
Sponsored-by: Dartmouth College's DANDI project