Commit graph

40397 commits

Author SHA1 Message Date
Joey Hess
c4aba8e032
better handling of finishing up incomplete incremental verify
Now it's run in VerifyStage.

I thought about keeping the file handle open, and resuming reading where
tailVerify left off. But that risks leaking open file handles, until the
GC closes them, if the deferred verification does not get resumed. Since
that could perhaps happen if there's an exception somewhere, I decided
that was too unsafe.

Instead, re-open the file, seek, and resume.

Sponsored-by: Dartmouth College's DANDI project
2021-08-16 14:52:59 -04:00
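
The "re-open the file, seek, and resume" step from commit c4aba8e032 can be sketched roughly as below. This is a Python illustration, not git-annex's Haskell code: the names `hash_prefix` and `resume_hash` and the choice of SHA-256 are assumptions made for the example. The point is that no file handle outlives either call, so nothing leaks if the deferred half never runs.

```python
import hashlib


def hash_prefix(path, limit, bufsize=65536):
    """Hash up to `limit` bytes of the file, then close it.

    Returns the partially fed hasher and the offset reached, so the
    work can be resumed later without keeping a handle open.
    (Hypothetical helper, standing in for a verifier that fell behind.)
    """
    hasher = hashlib.sha256()
    offset = 0
    with open(path, "rb") as f:
        while offset < limit:
            block = f.read(min(bufsize, limit - offset))
            if not block:
                break
            hasher.update(block)
            offset += len(block)
    return hasher, offset


def resume_hash(path, hasher, offset, bufsize=65536):
    """Re-open the file, seek to `offset`, and feed the rest to `hasher`."""
    with open(path, "rb") as f:
        f.seek(offset)
        for block in iter(lambda: f.read(bufsize), b""):
            hasher.update(block)
    return hasher.hexdigest()
```

A caller would keep only the `(hasher, offset)` pair between the two stages and pass it to `resume_hash` once the transfer has finished.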
Joey Hess
e0b7f391bd
improve tailVerify
Wait for the file to get modified, not only opened. This way, if a
remote does not support resuming, and opens a new file over top of the
existing file, it will wait until that remote starts writing, and open
the file it's writing to, not the old file.

Sponsored-by: Dartmouth College's DANDI project
2021-08-16 14:47:37 -04:00
Joey Hess
e46a7dff6f
fix windows build 2021-08-13 16:36:33 -04:00
Joey Hess
037bf68269
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-13 16:35:20 -04:00
Joey Hess
751242b55e
status update 2021-08-13 16:34:18 -04:00
Joey Hess
16dd3dd4ca
catch more exceptions
I saw this:

  .git/annex/tmp/SHA256E-s1234376--5ba8e06e0163b217663907482bbed57684d7188024155ddc81da0710dfd2687d: openBinaryFile: resource busy (file is locked)

I guess catching IO exceptions did not catch that one.
2021-08-13 16:16:46 -04:00
Joey Hess
dadbb510f6
incremental hashing for fileRetriever
It uses tailVerify to hash the file while it's being written.

This is able to sometimes avoid a separate checksum step. Although
if the file gets written quickly enough, tailVerify may not see it
get created before the write finishes, and the checksum still happens.

Testing with the directory special remote, incremental checksumming did
not happen. But then I disabled the copy CoW probing, and it did work.
What's going on with that is the CoW probe creates an empty file on
failure, then deletes it, and then the file is created again. tailVerify
will open the first, empty file, and so fails to read the content that
gets written to the file that replaces it.

The directory special remote really ought to be able to avoid needing to
use tailVerify, and while other special remotes could do things that
cause similar problems, they probably don't. And if they do, it just
means the checksum doesn't get done incrementally.

Sponsored-by: Dartmouth College's DANDI project
2021-08-13 15:43:29 -04:00
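
The idea in commit dadbb510f6 of hashing a file while another process is still writing it can be sketched as follows. This is a Python approximation under stated assumptions: it polls instead of using inotify, it assumes the expected size is known up front, and `tail_hash` with its parameters is a hypothetical name, not git-annex's `tailVerify`.

```python
import hashlib
import os
import time


def tail_hash(path, expected_size, poll=0.01, timeout=5.0):
    """Hash `path` while a writer is still appending to it.

    Returns the hex digest once `expected_size` bytes have been read,
    or None if that does not happen within `timeout` seconds, in which
    case the caller falls back to a separate checksum pass.
    """
    deadline = time.monotonic() + timeout
    # Wait for the file to be created.
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            return None
        time.sleep(poll)
    hasher = hashlib.sha256()
    seen = 0
    # Unbuffered, so each read asks the OS for newly appended data.
    with open(path, "rb", buffering=0) as f:
        while seen < expected_size:
            block = f.read(min(65536, expected_size - seen))
            if block:
                hasher.update(block)
                seen += len(block)
            elif time.monotonic() > deadline:
                return None
            else:
                time.sleep(poll)  # at current end of file; wait for more
    return hasher.hexdigest()
```

The sketch shares the weakness the commit message describes: if the file it opened is deleted and a new one is created under the same name, the open handle still refers to the old file, no further data arrives, and the function gives up with `None`.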
Joey Hess
ff2dc5eb18
INotify.removeWatch can crash
Unsure why, possibly if the file has been replaced by another file.
2021-08-13 15:35:18 -04:00
Joey Hess
7503b8448b
inotify reports paths relative to directory being watched
Sponsored-by: Dartmouth College's DANDI project
2021-08-13 14:51:15 -04:00
Joey Hess
e07625df8a
convert tailVerify to not finalize the verification
Added failIncremental so it can force failure to verify.

Sponsored-by: Dartmouth College's DANDI project
2021-08-13 13:39:02 -04:00
Joey Hess
9d533b347f
tailVerify: return deferred action when it gets behind
Sponsored-by: Dartmouth College's DANDI project
2021-08-13 12:32:01 -04:00
jkniiv@b330fc3a602d36a37a67b2a2d99d4bed3bb653cb
41ef5da4e0 the fact that I needed a modification/patch to build mentioned 2021-08-13 03:42:10 +00:00
jkniiv@b330fc3a602d36a37a67b2a2d99d4bed3bb653cb
3dc6c7a9a0 prop_view_roundtrips fails (occasionally) 2021-08-13 03:31:45 +00:00
jkniiv@b330fc3a602d36a37a67b2a2d99d4bed3bb653cb
57884e5442 windows build fails as of 7550ef9a2 2021-08-13 02:17:50 +00:00
Joey Hess
7550ef9a2c
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-12 14:50:12 -04:00
Joey Hess
51d59fb260
comment 2021-08-12 14:49:48 -04:00
Joey Hess
b6efba8139
add tailVerify
Not yet used, but this will let all remotes verify incrementally if it's
acceptable to pay the performance price. See comment for details of when
it will perform badly. I anticipate using this for all special remotes
that use fileRetriever. Except perhaps for a few like GitLFS that could
feed the incremental verifier themselves despite using that.

Sponsored-by: Dartmouth College's DANDI project
2021-08-12 14:38:02 -04:00
yarikoptic
6318c0f27f a report on the flood of failing tests on discovery 2021-08-11 20:25:51 +00:00
Joey Hess
2e54564061
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-11 14:51:05 -04:00
jasonb@ab4484d9961a46440958fa1a528e0fc435599057
285026eb91 Added a comment: I have this behavior consistently on the 2 repos I use 2021-08-11 18:49:41 +00:00
Joey Hess
7eb3742e4b
incremental verify for chunked remotes
Simply feed each chunk in turn to the incremental verifier.

When resuming an interrupted retrieve, it does not do incremental
verification. That would need to read the file, up to the resume point,
and feed it to the incremental verifier. That seems easy to get wrong.
Also it would mean extra work done before the transfer can start. Which
would complicate displaying progress, and would perhaps not appear to the
user as if it was resuming from where it left off. Instead, in that
situation, return UnVerified, and let the verification be done in a
separate pass.

Granted, Annex.CopyFile does manage all that, but it's not complicated
by dealing with chunks too.

Sponsored-by: Dartmouth College's DANDI project
2021-08-11 14:42:49 -04:00
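
The behaviour described in commit 7eb3742e4b (feed each chunk to the incremental verifier, but return UnVerified when resuming) might look like this in outline. It is a Python sketch, not the actual Haskell: `retrieve_chunks`, the `Verification` enum, the use of SHA-256, and raising `ValueError` on a mismatch are all assumptions for illustration.

```python
import hashlib
from enum import Enum


class Verification(Enum):
    VERIFIED = "Verified"
    UNVERIFIED = "UnVerified"


def retrieve_chunks(chunks, out, expected_sha256, resume_offset=0):
    """Write chunks to `out`, verifying incrementally when possible.

    `chunks` yields the byte strings still to be fetched. When
    `resume_offset` is non-zero, the earlier bytes were never fed to
    the hasher, so no incremental verification is attempted and the
    caller must verify in a separate pass.
    """
    hasher = hashlib.sha256() if resume_offset == 0 else None
    for chunk in chunks:
        out.write(chunk)
        if hasher is not None:
            hasher.update(chunk)
    if hasher is None:
        return Verification.UNVERIFIED
    if hasher.hexdigest() != expected_sha256:
        raise ValueError("content does not match expected checksum")
    return Verification.VERIFIED
```

Skipping the hasher on resume avoids re-reading the already downloaded prefix before the transfer can start, which is the trade-off the commit message argues for.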
Lukey
e134f411d4 Added a comment 2021-08-11 18:25:51 +00:00
Joey Hess
c20358b671
incremental verify for byteRetriever special remotes
Several special remotes verify content while it is being retrieved,
avoiding a separate checksum pass. They are: S3, bup, ddar, and
gcrypt (with a local repository).

Not done when using chunking, yet.

Complicated by Retriever needing to change to be polymorphic. Which in turn
meant RankNTypes is needed, and also needed some code changes. The
change in Remote.External does not change behavior at all but avoids
the type checking failing because of a "rigid, skolem type" which
"would escape its scope". So I refactored slightly to make the type
checker's job easier there.

Unfortunately, directory uses fileRetriever (except when chunked),
so it is not among the improved ones. Fixing that would need a way for
FileRetriever to return a Verification. But, since the file retrieved
may be encrypted or chunked, it would be extra work to always
incrementally checksum the file while retrieving it. Hm.

Some other special remotes use fileRetriever, and so don't get incremental
verification, but could be converted to byteRetriever later. One is
GitLFS, which uses downloadConduit, which writes to the file, so could
verify as it goes. Other special remotes like web could too, but don't
use Remote.Helper.Special and so will need to be addressed separately.

Sponsored-by: Dartmouth College's DANDI project
2021-08-11 14:20:38 -04:00
gabrielhidasy@c3d26e2c0b3e669d012f06736616088b42ad0dbe
b9a9273a87 2021-08-11 16:29:37 +00:00
Joey Hess
9518aca2f5
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-11 12:16:49 -04:00
Joey Hess
fa62c98910
simplify and speed up Utility.FileSystemEncoding
This eliminates the distinction between decodeBS and decodeBS', encodeBS
and encodeBS', etc. The old implementation truncated at NUL, and the
primed versions had to do extra work to avoid that problem. The new
implementation does not truncate at NUL, and is also a lot faster.
(Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the
primed versions.)

Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation,
and upgrading to it will speed up to/fromRawFilePath.

AFAIK, nothing relied on the old behavior of truncating at NUL. Some
code used the faster versions in places where I was sure there would not
be a NUL. So this change is unlikely to break anything.

Also, moved s2w8 and w82s out of the module, as they do not involve
filesystem encoding really.

Sponsored-by: Shae Erisson on Patreon
2021-08-11 12:13:31 -04:00
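
The NUL truncation mentioned in commit fa62c98910 can be reproduced in miniature as follows. This is only an analogy in Python: the commit does not say why the old Haskell implementation truncated, and NUL-terminated C strings are used here simply as a familiar way to show the symptom. The helper names are hypothetical, and `surrogateescape` stands in for a lossless filesystem encoding.

```python
import ctypes


def via_c_string(raw: bytes) -> bytes:
    """Read bytes back through a NUL-terminated C string view."""
    return ctypes.c_char_p(raw).value


def decode_path(raw: bytes) -> str:
    """Decode arbitrary path bytes losslessly, embedded NUL included."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_path(text: str) -> bytes:
    """Inverse of decode_path."""
    return text.encode("utf-8", errors="surrogateescape")
```

With these definitions, `via_c_string(b"foo\x00bar")` yields only `b"foo"`, while `encode_path(decode_path(b"foo\x00bar\xff"))` returns the original bytes unchanged.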
Joey Hess
a38b724bfa
remove unused function 2021-08-10 20:04:17 -04:00
Ilya_Shlyakhter
2df44abad8 Added a comment: sorry 2021-08-10 16:28:33 +00:00
Joey Hess
d424f43116
comment 2021-08-09 16:00:57 -04:00
Joey Hess
885bbed2d4
sheeeeeeeeesh 2021-08-09 15:33:59 -04:00
Joey Hess
a331321d2a
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-09 15:20:19 -04:00
Joey Hess
a871bcfe77
simplify 2021-08-09 15:17:48 -04:00
yarikoptic
8ede4b606d Added a comment 2021-08-09 17:58:55 +00:00
yarikoptic
c7e4af1652 Added a comment 2021-08-09 17:34:39 +00:00
Ilya_Shlyakhter
c4b166aa17 Added a comment: standalone build version vs standard release version 2021-08-09 17:28:31 +00:00
Joey Hess
f54b9f2389
comment 2021-08-09 13:03:19 -04:00
Joey Hess
9d684e4dfa
response 2021-08-09 12:46:10 -04:00
Joey Hess
56fbf57e5f
typo 2021-08-09 12:44:20 -04:00
Joey Hess
5990942b6c
don't use changelog version in commit message
changelog may have a new unreleased version open already
2021-08-09 12:31:48 -04:00
Joey Hess
c9b1b7d067
close 2021-08-09 12:31:36 -04:00
Joey Hess
15ef5e62d2
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-09 12:11:47 -04:00
Joey Hess
f1176f82a5
rsync special remote: Stop displaying rsync progress, and use git-annex's own progress display
Reasons are same as in commit cee14f147a.
(It was already done when using -J.)

Sponsored-by: Mark Reidenbach on Patreon
2021-08-09 12:06:10 -04:00
alex
1801400bbb 2021-08-09 04:21:11 +00:00
jgsuess@732b8c62c50d8595d7b1d58eea11e5019c2308b1
251c24b388 Added a comment: Automatic watch for the heuristic 2021-08-07 12:25:02 +00:00
yarikoptic
4bebc46ce5 get failing to get if with --debug 2021-08-06 22:11:46 +00:00
yarikoptic
29fad2ec55 reporting on odds in downloads. 2021-08-06 21:53:42 +00:00
Lukey
768bcd18a7 Added a comment 2021-08-06 06:02:38 +00:00
Rob
649079413b Added a comment: creating directory special remote "in-place" 2021-08-05 18:13:36 +00:00
Joey Hess
c5abe37141
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-04 12:40:56 -04:00
Joey Hess
8886ff1cff
done! 2021-08-04 12:40:25 -04:00