Commit graph

40542 commits

Author SHA1 Message Date
Joey Hess
16dd3dd4ca
catch more exceptions
I saw this:

  .git/annex/tmp/SHA256E-s1234376--5ba8e06e0163b217663907482bbed57684d7188024155ddc81da0710dfd2687d: openBinaryFile: resource busy (file is locked)

 guess catching IO exceptions did not catch that one.
2021-08-13 16:16:46 -04:00
Joey Hess
dadbb510f6
incremental hashing for fileRetriever
It uses tailVerify to hash the file while it's being written.

This is able to sometimes avoid a separate checksum step. Although
if the file gets written quickly enough, tailVerify may not see it
get created before the write finishes, and the checksum still happens.

Testing with the directory special remote, incremental checksumming did
not happen. But then I disabled the copy CoW probing, and it did work.
What's going on with that is the CoW probe creates an empty file on
failure, then deletes it, and then the file is created again. tailVerify
will open the first, empty file, and so fails to read the content that
gets written to the file that replaces it.

The directory special remote really ought to be able to avoid needing to
use tailVerify, and while other special remotes could do things that
cause similar problems, they probably don't. And if they do, it just
means the checksum doesn't get done incrementally.

Sponsored-by: Dartmouth College's DANDI project
2021-08-13 15:43:29 -04:00
Joey Hess
ff2dc5eb18
INotify.removeWatch can crash
Unsure why, possibly if the file has been replaced by another file.
2021-08-13 15:35:18 -04:00
Joey Hess
7503b8448b
inotify reports paths relative to directory being watched
Sponsored-by: Dartmouth College's DANDI project
2021-08-13 14:51:15 -04:00
Joey Hess
e07625df8a
convert tailVerify to not finalize the verification
Added failIncremental so it can force failure to verify.

Sponsored-by: Dartmouth College's DANDI project
2021-08-13 13:39:02 -04:00
Joey Hess
9d533b347f
tailVerify: return deferred action when it gets behind
Sponsored-by: Dartmouth College's DANDI project
2021-08-13 12:32:01 -04:00
jkniiv@b330fc3a602d36a37a67b2a2d99d4bed3bb653cb
41ef5da4e0 the fact that I needed a modification/patch to build mentioned 2021-08-13 03:42:10 +00:00
jkniiv@b330fc3a602d36a37a67b2a2d99d4bed3bb653cb
3dc6c7a9a0 prop_view_roundtrips fails (occasionally) 2021-08-13 03:31:45 +00:00
jkniiv@b330fc3a602d36a37a67b2a2d99d4bed3bb653cb
57884e5442 windows build fails as of 7550ef9a2 2021-08-13 02:17:50 +00:00
Joey Hess
7550ef9a2c
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-12 14:50:12 -04:00
Joey Hess
51d59fb260
comment 2021-08-12 14:49:48 -04:00
Joey Hess
b6efba8139
add tailVerify
Not yet used, but this will let all remotes verify incrementally if it's
acceptable to pay the performance price. See comment for details of when
it will perform badly. I anticipate using this for all special remotes
that use fileRetriever. Except perhaps for a few like GitLFS that could
feed the incremental verifier themselves despite using that.

Sponsored-by: Dartmouth College's DANDI project
2021-08-12 14:38:02 -04:00
yarikoptic
6318c0f27f a report on the flood of failing tests on discovery 2021-08-11 20:25:51 +00:00
Joey Hess
2e54564061
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-11 14:51:05 -04:00
jasonb@ab4484d9961a46440958fa1a528e0fc435599057
285026eb91 Added a comment: I have this behavior consistently on the 2 repos I use 2021-08-11 18:49:41 +00:00
Joey Hess
7eb3742e4b
incremental verify for chunked remotes
Simply feed each chunk in turn to the incremental verifier.

When resuming an interrupted retrieve, it does not do incremental
verification. That would need to read the file, up to the resume point,
and feed it to the incremental verifier. That seems easy to get wrong.
Also it would mean extra work done before the transfer can start. Which
would complicate displaying progress, and would perhaps not appear to the
user as if it was resuming from where it left off. Instead, in that
situation, return UnVerified, and let the verification be done in a
separate pass.

Granted, Annex.CopyFile does manage all that, but it's not complicated
by dealing with chunks too.

Sponsored-by: Dartmouth College's DANDI project
2021-08-11 14:42:49 -04:00
Lukey
e134f411d4 Added a comment 2021-08-11 18:25:51 +00:00
Joey Hess
c20358b671
incremental verify for byteRetriever special remotes
Several special remotes verify content while it is being retrieved,
avoiding a separate checksum pass. They are: S3, bup, ddar, and
gcrypt (with a local repository).

Not done when using chunking, yet.

Complicated by Retriever needing to change to be polymorphic. Which in turn
meant RankNTypes is needed, and also needed some code changes. The
change in Remote.External does not change behavior at all but avoids
the type checking failing because of a "rigid, skolem type" which
"would escape its scope". So I refactored slightly to make the type
checker's job easier there.

Unfortunately, directory uses fileRetriever (except when chunked),
so it is not amoung the improved ones. Fixing that would need a way for
FileRetriever to return a Verification. But, since the file retrieved
may be encrypted or chunked, it would be extra work to always
incrementally checksum the file while retrieving it. Hm.

Some other special remotes use fileRetriever, and so don't get incremental
verification, but could be converted to byteRetriever later. One is
GitLFS, which uses downloadConduit, which writes to the file, so could
verify as it goes. Other special remotes like web could too, but don't
use Remote.Helper.Special and so will need to be addressed separately.

Sponsored-by: Dartmouth College's DANDI project
2021-08-11 14:20:38 -04:00
gabrielhidasy@c3d26e2c0b3e669d012f06736616088b42ad0dbe
b9a9273a87 2021-08-11 16:29:37 +00:00
Joey Hess
9518aca2f5
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-11 12:16:49 -04:00
Joey Hess
fa62c98910
simplify and speed up Utility.FileSystemEncoding
This eliminates the distinction between decodeBS and decodeBS', encodeBS
and encodeBS', etc. The old implementation truncated at NUL, and the
primed versions had to do extra work to avoid that problem. The new
implementation does not truncate at NUL, and is also a lot faster.
(Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the
primed versions.)

Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation,
and upgrading to it will speed up to/fromRawFilePath.

AFAIK, nothing relied on the old behavior of truncating at NUL. Some
code used the faster versions in places where I was sure there would not
be a NUL. So this change is unlikely to break anything.

Also, moved s2w8 and w82s out of the module, as they do not involve
filesystem encoding really.

Sponsored-by: Shae Erisson on Patreon
2021-08-11 12:13:31 -04:00
Joey Hess
a38b724bfa
remove unused function 2021-08-10 20:04:17 -04:00
Ilya_Shlyakhter
2df44abad8 Added a comment: sorry 2021-08-10 16:28:33 +00:00
Joey Hess
d424f43116
comment 2021-08-09 16:00:57 -04:00
Joey Hess
885bbed2d4
sheeeeeeeeesh 2021-08-09 15:33:59 -04:00
Joey Hess
a331321d2a
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-09 15:20:19 -04:00
Joey Hess
a871bcfe77
simplify 2021-08-09 15:17:48 -04:00
yarikoptic
8ede4b606d Added a comment 2021-08-09 17:58:55 +00:00
yarikoptic
c7e4af1652 Added a comment 2021-08-09 17:34:39 +00:00
Ilya_Shlyakhter
c4b166aa17 Added a comment: standalone build version vs standard release version 2021-08-09 17:28:31 +00:00
Joey Hess
f54b9f2389
comment 2021-08-09 13:03:19 -04:00
Joey Hess
9d684e4dfa
response 2021-08-09 12:46:10 -04:00
Joey Hess
56fbf57e5f
typo 2021-08-09 12:44:20 -04:00
Joey Hess
5990942b6c
don't use changelog version in commit message
changelog may have a new unreleased version open already
2021-08-09 12:31:48 -04:00
Joey Hess
c9b1b7d067
close 2021-08-09 12:31:36 -04:00
Joey Hess
15ef5e62d2
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-09 12:11:47 -04:00
Joey Hess
f1176f82a5
rsync special remote: Stop displaying rsync progress, and use git-annex's own progress display
Reasons are same as in commit cee14f147a.
(It was already done when using -J.)

Sponsored-by: Mark Reidenbach on Patreon
2021-08-09 12:06:10 -04:00
alex
1801400bbb 2021-08-09 04:21:11 +00:00
jgsuess@732b8c62c50d8595d7b1d58eea11e5019c2308b1
251c24b388 Added a comment: Automatic watch for the heuristic 2021-08-07 12:25:02 +00:00
yarikoptic
4bebc46ce5 get failing to get if with --debug 2021-08-06 22:11:46 +00:00
yarikoptic
29fad2ec55 reporting on odds in downloads. 2021-08-06 21:53:42 +00:00
Lukey
768bcd18a7 Added a comment 2021-08-06 06:02:38 +00:00
Rob
649079413b Added a comment: creating directory special remote "in-place" 2021-08-05 18:13:36 +00:00
Joey Hess
c5abe37141
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-04 12:40:56 -04:00
Joey Hess
8886ff1cff
done! 2021-08-04 12:40:25 -04:00
Joey Hess
9b9b5759b0
Merge branch 'vectorclock' 2021-08-04 12:39:54 -04:00
Joey Hess
1acdd18ea8
deal better with clock skew situations, using vector clocks
* Deal with clock skew, both forwards and backwards, when logging
  information to the git-annex branch.
* GIT_ANNEX_VECTOR_CLOCK can now be set to a fixed value (eg 1)
  rather than needing to be advanced each time a new change is made.
* Misuse of GIT_ANNEX_VECTOR_CLOCK will no longer confuse git-annex.

When changing a file in the git-annex branch, the vector clock to use is now
determined by first looking at the current time (or GIT_ANNEX_VECTOR_CLOCK
when set), and comparing it to the newest vector clock already in use in
that file. If a newer time stamp was already in use, advance it forward by
a second instead.

When the clock is set to a time in the past, this avoids logging with
an old timestamp, which would risk that log line later being ignored in favor
of "newer" line that is really not newer.

When a log entry has been made with a clock that was set far ahead in the
future, this avoids newer information being logged with an older timestamp
and so being ignored in favor of that future-timestamped information.
Once all clocks get fixed, this will result in the vector clocks being
incremented, until finally enough time has passed that time gets back ahead
of the vector clock value, and then it will return to usual operation.

(This latter situation is not ideal, but it seems the best that can be done.
The issue with it is, since all writers will be incrementing the last
vector clock they saw, there's no way to tell when one writer made a write
significantly later in time than another, so the earlier write might
arbitrarily be picked when merging. This problem is why git-annex uses
timestamps in the first place, rather than pure vector clocks.)

Advancing forward by 1 second is somewhat arbitrary. setDead
advances a timestamp by just 1 picosecond, and the vector clock could
too. But then it would interfere with setDead, which wants to be
overrulled by any change. So it could use 2 picoseconds or something,
but that seems weird. It could just as well advance it forward by a
minute or whatever, but then it would be harder for real time to catch
up with the vector clock when forward clock slew had happened.

A complication is that many log files contain several different peices of
information, and it may be best to only use vector clocks for the same peice
of information. For example, a key's location log file contains
InfoPresent/InfoMissing for each UUID, and it only looks at the vector
clocks for the UUID that is being changed, and not other UUIDs.

Although exactly where the dividing line is can be hard to determine.
Consider metadata logs, where a field "tag" can have multiple values set
at different times. Should it advance forward past the last tag?
Probably. What about when a different field is set, should it look at
the clocks of other fields? Perhaps not, but currently it does, and
this does not seems like it will cause any problems.

Another one I'm not entirely sure about is the export log, which is
keyed by (fromuuid, touuid). So if multiple repos are exporting to the
same remote, different vector clocks can be used for that remote.
It looks like that's probably ok, because it does not try to determine
what order things occurred when there was an export conflict.

Sponsored-by: Jochen Bartl on Patreon
2021-08-04 12:33:46 -04:00
Ilya_Shlyakhter
af0fcf81a1 Added a comment: downloading torrent files to annex 2021-08-04 15:47:12 +00:00
Joey Hess
1a48d51e5b
devblog 2021-08-03 17:14:06 -04:00
Joey Hess
5c23489859
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-03 17:06:27 -04:00