IncrementalVerifier moved to Utility.Hash, which will let Utility.Url
use it later.
It's perhaps not really specific to hashing, but making a separate
module just for the data type seemed unncessary.
Sponsored-by: Dartmouth College's DANDI project
Now it's run in VerifyStage.
I thought about keeping the file handle open, and resuming reading where
tailVerify left off. But that risks leaking open file handles, until the
GC closes them, if the deferred verification does not get resumed. Since
that could perhaps happen if there's an exception somewhere, I decided
that was too unsafe.
Instead, re-open the file, seek, and resume.
Sponsored-by: Dartmouth College's DANDI project
It uses tailVerify to hash the file while it's being written.
This is able to sometimes avoid a separate checksum step. Although
if the file gets written quickly enough, tailVerify may not see it
get created before the write finishes, and the checksum still happens.
Testing with the directory special remote, incremental checksumming did
not happen. But then I disabled the copy CoW probing, and it did work.
What's going on with that is the CoW probe creates an empty file on
failure, then deletes it, and then the file is created again. tailVerify
will open the first, empty file, and so fails to read the content that
gets written to the file that replaces it.
The directory special remote really ought to be able to avoid needing to
use tailVerify, and while other special remotes could do things that
cause similar problems, they probably don't. And if they do, it just
means the checksum doesn't get done incrementally.
Sponsored-by: Dartmouth College's DANDI project
As yet unused.
Backend.External could perhaps implement it too, although that would
involve sending chunks of data to it via a pipe or something, so likely
to be slow.
Lots of nice wins from this in avoiding unncessary work, and I think
nothing got slower.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
This is groundwork for external backends, but also makes sense to keep
this information with the rest of a Backend's implementation.
Also, removed isVerifiable. I noticed that the same information is
encoded by whether a Backend implements verifyKeyContent or not.
retrieveExport is part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.
getKey very rarely fails, and when it does it's always for the same reason
(user configured annex.backend to url for some reason). So, this will
avoid dealing with Nothing everywhere it's used.
This commit was sponsored by Ilya Shlyakhter on Patreon.
No behavior changes, but this shows everywhere that a progress meter
could be displayed when hashing a file to add to the annex.
Many of the places don't make sense to display a progress meter though,
eg when importing the copy of the file probably swamps the hashing of
the file.
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.
Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.
(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
Added annex.maxextensionlength for use cases where extensions longer than 4
characters are needed.
This commit was sponsored by Henrik Riomar on Patreon.
Where before the "name" of a key and a backend was a string, this makes
it a concrete data type.
This is groundwork for allowing some varieties of keys to be disabled
in file2key, so git-annex won't use them at all.
Benchmarks ran in my big repo:
old git-annex info:
real 0m3.338s
user 0m3.124s
sys 0m0.244s
new git-annex info:
real 0m3.216s
user 0m3.024s
sys 0m0.220s
new git-annex find:
real 0m7.138s
user 0m6.924s
sys 0m0.252s
old git-annex find:
real 0m7.433s
user 0m7.240s
sys 0m0.232s
Surprising result; I'd have expected it to be slower since it now parses
all the key varieties. But, the parser is very simple and perhaps
sharing KeyVarieties uses less memory or something like that.
This commit was supported by the NSF-funded DataLad project.
Needed for resuming from chunks.
Url keys are considered not stable. I considered treating url keys with a
known size as stable, but just don't feel that is enough information.
* Bugfix: Remove leading \ from checksums output by sha*sum commands,
when the filename contains \ or a newline. Closes: #696384
* fsck: Still accept checksums with a leading \ as valid, now that
above bug is fixed.
* migrate: Remove leading \ in checksums
Kqueue needs to remember which files failed to be added due to being open,
and retry them. This commit gets the data in place for such a retry thread.
Broke KeySource out into its own file, and added Eq and Ord instances
so it can be stored in a Set.
Fscking a remote is now supported. It's done by retrieving
the contents of the specified files from the remote, and checking them,
so can be an expensive operation.
(Several optimisations are possible, to speed it up, of course.. This is
the slow and stupid remote fsck to start with.)
Still, if the remote is a special remote, or a git repository that you
cannot run fsck in locally, it's nice to have the ability to fsck it.
If you have any directory special remotes, now would be a good time to
fsck them, in case you were hit by the data loss bug fixed in the
previous release!
The only remaining vestiage of backends is different types of keys. These
are still called "backends", mostly to avoid needing to change user interface
and configuration. But everything to do with storing keys in different
backends was gone; instead different types of remotes are used.
In the refactoring, lots of code was moved out of odd corners like
Backend.File, to closer to where it's used, like Command.Drop and
Command.Fsck. Quite a lot of dead code was removed. Several data structures
became simpler, which may result in better runtime efficiency. There should
be no user-visible changes.