git-annex

Author	SHA1	Message	Date
Joey Hess	16dd3dd4ca	catch more exceptions I saw this: .git/annex/tmp/SHA256E-s1234376--5ba8e06e0163b217663907482bbed57684d7188024155ddc81da0710dfd2687d: openBinaryFile: resource busy (file is locked) guess catching IO exceptions did not catch that one.	2021-08-13 16:16:46 -04:00
Joey Hess	dadbb510f6	incremental hashing for fileRetriever It uses tailVerify to hash the file while it's being written. This is able to sometimes avoid a separate checksum step. Although if the file gets written quickly enough, tailVerify may not see it get created before the write finishes, and the checksum still happens. Testing with the directory special remote, incremental checksumming did not happen. But then I disabled the copy CoW probing, and it did work. What's going on with that is the CoW probe creates an empty file on failure, then deletes it, and then the file is created again. tailVerify will open the first, empty file, and so fails to read the content that gets written to the file that replaces it. The directory special remote really ought to be able to avoid needing to use tailVerify, and while other special remotes could do things that cause similar problems, they probably don't. And if they do, it just means the checksum doesn't get done incrementally. Sponsored-by: Dartmouth College's DANDI project	2021-08-13 15:43:29 -04:00
Joey Hess	ff2dc5eb18	INotify.removeWatch can crash Unsure why, possibly if the file has been replaced by another file.	2021-08-13 15:35:18 -04:00
Joey Hess	7503b8448b	inotify reports paths relative to directory being watched Sponsored-by: Dartmouth College's DANDI project	2021-08-13 14:51:15 -04:00
Joey Hess	e07625df8a	convert tailVerify to not finalize the verification Added failIncremental so it can force failure to verify. Sponsored-by: Dartmouth College's DANDI project	2021-08-13 13:39:02 -04:00
Joey Hess	9d533b347f	tailVerify: return deferred action when it gets behind Sponsored-by: Dartmouth College's DANDI project	2021-08-13 12:32:01 -04:00
jkniiv@b330fc3a602d36a37a67b2a2d99d4bed3bb653cb	41ef5da4e0	the fact that I needed a modification/patch to build mentioned	2021-08-13 03:42:10 +00:00
jkniiv@b330fc3a602d36a37a67b2a2d99d4bed3bb653cb	3dc6c7a9a0	prop_view_roundtrips fails (occasionally)	2021-08-13 03:31:45 +00:00
jkniiv@b330fc3a602d36a37a67b2a2d99d4bed3bb653cb	57884e5442	windows build fails as of `7550ef9a2`	2021-08-13 02:17:50 +00:00
Joey Hess	7550ef9a2c	Merge branch 'master' of ssh://git-annex.branchable.com	2021-08-12 14:50:12 -04:00
Joey Hess	51d59fb260	comment	2021-08-12 14:49:48 -04:00
Joey Hess	b6efba8139	add tailVerify Not yet used, but this will let all remotes verify incrementally if it's acceptable to pay the performance price. See comment for details of when it will perform badly. I anticipate using this for all special remotes that use fileRetriever. Except perhaps for a few like GitLFS that could feed the incremental verifier themselves despite using that. Sponsored-by: Dartmouth College's DANDI project	2021-08-12 14:38:02 -04:00
yarikoptic	6318c0f27f	a report on the flood of failing tests on discovery	2021-08-11 20:25:51 +00:00
Joey Hess	2e54564061	Merge branch 'master' of ssh://git-annex.branchable.com	2021-08-11 14:51:05 -04:00
jasonb@ab4484d9961a46440958fa1a528e0fc435599057	285026eb91	Added a comment: I have this behavior consistently on the 2 repos I use	2021-08-11 18:49:41 +00:00
Joey Hess	7eb3742e4b	incremental verify for chunked remotes Simply feed each chunk in turn to the incremental verifier. When resuming an interrupted retrieve, it does not do incremental verification. That would need to read the file, up to the resume point, and feed it to the incremental verifier. That seems easy to get wrong. Also it would mean extra work done before the transfer can start. Which would complicate displaying progress, and would perhaps not appear to the user as if it was resuming from where it left off. Instead, in that situation, return UnVerified, and let the verification be done in a separate pass. Granted, Annex.CopyFile does manage all that, but it's not complicated by dealing with chunks too. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:42:49 -04:00
Lukey	e134f411d4	Added a comment	2021-08-11 18:25:51 +00:00
Joey Hess	c20358b671	incremental verify for byteRetriever special remotes Several special remotes verify content while it is being retrieved, avoiding a separate checksum pass. They are: S3, bup, ddar, and gcrypt (with a local repository). Not done when using chunking, yet. Complicated by Retriever needing to change to be polymorphic. Which in turn meant RankNTypes is needed, and also needed some code changes. The change in Remote.External does not change behavior at all but avoids the type checking failing because of a "rigid, skolem type" which "would escape its scope". So I refactored slightly to make the type checker's job easier there. Unfortunately, directory uses fileRetriever (except when chunked), so it is not among the improved ones. Fixing that would need a way for FileRetriever to return a Verification. But, since the file retrieved may be encrypted or chunked, it would be extra work to always incrementally checksum the file while retrieving it. Hm. Some other special remotes use fileRetriever, and so don't get incremental verification, but could be converted to byteRetriever later. One is GitLFS, which uses downloadConduit, which writes to the file, so could verify as it goes. Other special remotes like web could too, but don't use Remote.Helper.Special and so will need to be addressed separately. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:20:38 -04:00
gabrielhidasy@c3d26e2c0b3e669d012f06736616088b42ad0dbe	b9a9273a87		2021-08-11 16:29:37 +00:00
Joey Hess	9518aca2f5	Merge branch 'master' of ssh://git-annex.branchable.com	2021-08-11 12:16:49 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	a38b724bfa	remove unused function	2021-08-10 20:04:17 -04:00
Ilya_Shlyakhter	2df44abad8	Added a comment: sorry	2021-08-10 16:28:33 +00:00
Joey Hess	d424f43116	comment	2021-08-09 16:00:57 -04:00
Joey Hess	885bbed2d4	sheeeeeeeeesh	2021-08-09 15:33:59 -04:00
Joey Hess	a331321d2a	Merge branch 'master' of ssh://git-annex.branchable.com	2021-08-09 15:20:19 -04:00
Joey Hess	a871bcfe77	simplify	2021-08-09 15:17:48 -04:00
yarikoptic	8ede4b606d	Added a comment	2021-08-09 17:58:55 +00:00
yarikoptic	c7e4af1652	Added a comment	2021-08-09 17:34:39 +00:00
Ilya_Shlyakhter	c4b166aa17	Added a comment: standalone build version vs standard release version	2021-08-09 17:28:31 +00:00
Joey Hess	f54b9f2389	comment	2021-08-09 13:03:19 -04:00
Joey Hess	9d684e4dfa	response	2021-08-09 12:46:10 -04:00
Joey Hess	56fbf57e5f	typo	2021-08-09 12:44:20 -04:00
Joey Hess	5990942b6c	don't use changelog version in commit message changelog may have a new unreleased version open already	2021-08-09 12:31:48 -04:00
Joey Hess	c9b1b7d067	close	2021-08-09 12:31:36 -04:00
Joey Hess	15ef5e62d2	Merge branch 'master' of ssh://git-annex.branchable.com	2021-08-09 12:11:47 -04:00
Joey Hess	f1176f82a5	rsync special remote: Stop displaying rsync progress, and use git-annex's own progress display Reasons are same as in commit `cee14f147a`. (It was already done when using -J.) Sponsored-by: Mark Reidenbach on Patreon	2021-08-09 12:06:10 -04:00
alex	1801400bbb		2021-08-09 04:21:11 +00:00
jgsuess@732b8c62c50d8595d7b1d58eea11e5019c2308b1	251c24b388	Added a comment: Automatic watch for the heuristic	2021-08-07 12:25:02 +00:00
yarikoptic	4bebc46ce5	get failing to get if with --debug	2021-08-06 22:11:46 +00:00
yarikoptic	29fad2ec55	reporting on odds in downloads.	2021-08-06 21:53:42 +00:00
Lukey	768bcd18a7	Added a comment	2021-08-06 06:02:38 +00:00
Rob	649079413b	Added a comment: creating directory special remote "in-place"	2021-08-05 18:13:36 +00:00
Joey Hess	c5abe37141	Merge branch 'master' of ssh://git-annex.branchable.com	2021-08-04 12:40:56 -04:00
Joey Hess	8886ff1cff	done!	2021-08-04 12:40:25 -04:00
Joey Hess	9b9b5759b0	Merge branch 'vectorclock'	2021-08-04 12:39:54 -04:00
Joey Hess	1acdd18ea8	deal better with clock skew situations, using vector clocks * Deal with clock skew, both forwards and backwards, when logging information to the git-annex branch. * GIT_ANNEX_VECTOR_CLOCK can now be set to a fixed value (eg 1) rather than needing to be advanced each time a new change is made. * Misuse of GIT_ANNEX_VECTOR_CLOCK will no longer confuse git-annex. When changing a file in the git-annex branch, the vector clock to use is now determined by first looking at the current time (or GIT_ANNEX_VECTOR_CLOCK when set), and comparing it to the newest vector clock already in use in that file. If a newer time stamp was already in use, advance it forward by a second instead. When the clock is set to a time in the past, this avoids logging with an old timestamp, which would risk that log line later being ignored in favor of "newer" line that is really not newer. When a log entry has been made with a clock that was set far ahead in the future, this avoids newer information being logged with an older timestamp and so being ignored in favor of that future-timestamped information. Once all clocks get fixed, this will result in the vector clocks being incremented, until finally enough time has passed that time gets back ahead of the vector clock value, and then it will return to usual operation. (This latter situation is not ideal, but it seems the best that can be done. The issue with it is, since all writers will be incrementing the last vector clock they saw, there's no way to tell when one writer made a write significantly later in time than another, so the earlier write might arbitrarily be picked when merging. This problem is why git-annex uses timestamps in the first place, rather than pure vector clocks.) Advancing forward by 1 second is somewhat arbitrary. setDead advances a timestamp by just 1 picosecond, and the vector clock could too. But then it would interfere with setDead, which wants to be overruled by any change. So it could use 2 picoseconds or something, but that seems weird. It could just as well advance it forward by a minute or whatever, but then it would be harder for real time to catch up with the vector clock when forward clock slew had happened. A complication is that many log files contain several different pieces of information, and it may be best to only use vector clocks for the same piece of information. For example, a key's location log file contains InfoPresent/InfoMissing for each UUID, and it only looks at the vector clocks for the UUID that is being changed, and not other UUIDs. Although exactly where the dividing line is can be hard to determine. Consider metadata logs, where a field "tag" can have multiple values set at different times. Should it advance forward past the last tag? Probably. What about when a different field is set, should it look at the clocks of other fields? Perhaps not, but currently it does, and this does not seem like it will cause any problems. Another one I'm not entirely sure about is the export log, which is keyed by (fromuuid, touuid). So if multiple repos are exporting to the same remote, different vector clocks can be used for that remote. It looks like that's probably ok, because it does not try to determine what order things occurred when there was an export conflict. Sponsored-by: Jochen Bartl on Patreon	2021-08-04 12:33:46 -04:00
Ilya_Shlyakhter	af0fcf81a1	Added a comment: downloading torrent files to annex	2021-08-04 15:47:12 +00:00
Joey Hess	1a48d51e5b	devblog	2021-08-03 17:14:06 -04:00
Joey Hess	5c23489859	Merge branch 'master' of ssh://git-annex.branchable.com	2021-08-03 17:06:27 -04:00
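
The clock-selection rule described in commit 1acdd18ea8 can be sketched roughly as below. This is an illustrative Python sketch, not git-annex's Haskell implementation: the function name `choose_vector_clock`, its parameters, and the use of float seconds are all assumptions made for the example. Treating an equal timestamp like a newer one is also an assumption, made so that a fixed GIT_ANNEX_VECTOR_CLOCK value still moves forward on each change.

```python
import time
from typing import Iterable, Optional


def choose_vector_clock(
    existing_clocks: Iterable[float],
    override: Optional[float] = None,
    now: Optional[float] = None,
) -> float:
    """Pick the timestamp for a new log line (hypothetical helper).

    existing_clocks: timestamps already in use in the log file for the
        piece of information being changed.
    override: stands in for GIT_ANNEX_VECTOR_CLOCK when it is set.
    now: current time in seconds; injectable so the rule can be tested.
    """
    # Start from the override if present, else the (possibly skewed) clock.
    if override is not None:
        candidate = override
    elif now is not None:
        candidate = now
    else:
        candidate = time.time()

    # Newest timestamp already logged, or None for an empty log.
    newest = max(existing_clocks, default=None)

    # If the log already holds a timestamp at or past the candidate,
    # advance that timestamp by one second instead of logging an older one.
    if newest is not None and newest >= candidate:
        return newest + 1.0
    return candidate
```

With a correct clock the candidate time is used unchanged. With a clock set in the past, or a log entry stamped far in the future, each new write lands one second after the newest existing entry until real time catches up.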

40542 commits