Commit graph

42001 commits

Author SHA1 Message Date
Lukey
768bcd18a7 Added a comment 2021-08-06 06:02:38 +00:00
Rob
649079413b Added a comment: creating directory special remote "in-place" 2021-08-05 18:13:36 +00:00
Joey Hess
c5abe37141
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-04 12:40:56 -04:00
Joey Hess
8886ff1cff
done! 2021-08-04 12:40:25 -04:00
Joey Hess
9b9b5759b0
Merge branch 'vectorclock' 2021-08-04 12:39:54 -04:00
Joey Hess
1acdd18ea8
deal better with clock skew situations, using vector clocks
* Deal with clock skew, both forwards and backwards, when logging
  information to the git-annex branch.
* GIT_ANNEX_VECTOR_CLOCK can now be set to a fixed value (eg 1)
  rather than needing to be advanced each time a new change is made.
* Misuse of GIT_ANNEX_VECTOR_CLOCK will no longer confuse git-annex.

When changing a file in the git-annex branch, the vector clock to use is now
determined by first looking at the current time (or GIT_ANNEX_VECTOR_CLOCK
when set), and comparing it to the newest vector clock already in use in
that file. If a newer time stamp was already in use, advance it forward by
a second instead.

When the clock is set to a time in the past, this avoids logging with
an old timestamp, which would risk that log line later being ignored in favor
of "newer" line that is really not newer.

When a log entry has been made with a clock that was set far ahead in the
future, this avoids newer information being logged with an older timestamp
and so being ignored in favor of that future-timestamped information.
Once all clocks get fixed, this will result in the vector clocks being
incremented, until finally enough time has passed that time gets back ahead
of the vector clock value, and then it will return to usual operation.

(This latter situation is not ideal, but it seems the best that can be done.
The issue with it is, since all writers will be incrementing the last
vector clock they saw, there's no way to tell when one writer made a write
significantly later in time than another, so the earlier write might
arbitrarily be picked when merging. This problem is why git-annex uses
timestamps in the first place, rather than pure vector clocks.)

Advancing forward by 1 second is somewhat arbitrary. setDead
advances a timestamp by just 1 picosecond, and the vector clock could
too. But then it would interfere with setDead, which wants to be
overrulled by any change. So it could use 2 picoseconds or something,
but that seems weird. It could just as well advance it forward by a
minute or whatever, but then it would be harder for real time to catch
up with the vector clock when forward clock slew had happened.

A complication is that many log files contain several different peices of
information, and it may be best to only use vector clocks for the same peice
of information. For example, a key's location log file contains
InfoPresent/InfoMissing for each UUID, and it only looks at the vector
clocks for the UUID that is being changed, and not other UUIDs.

Although exactly where the dividing line is can be hard to determine.
Consider metadata logs, where a field "tag" can have multiple values set
at different times. Should it advance forward past the last tag?
Probably. What about when a different field is set, should it look at
the clocks of other fields? Perhaps not, but currently it does, and
this does not seems like it will cause any problems.

Another one I'm not entirely sure about is the export log, which is
keyed by (fromuuid, touuid). So if multiple repos are exporting to the
same remote, different vector clocks can be used for that remote.
It looks like that's probably ok, because it does not try to determine
what order things occurred when there was an export conflict.

Sponsored-by: Jochen Bartl on Patreon
2021-08-04 12:33:46 -04:00
Ilya_Shlyakhter
af0fcf81a1 Added a comment: downloading torrent files to annex 2021-08-04 15:47:12 +00:00
Joey Hess
1a48d51e5b
devblog 2021-08-03 17:14:06 -04:00
Joey Hess
5c23489859
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-03 17:06:27 -04:00
Joey Hess
c67b1e31a6
branch 2021-08-03 17:05:50 -04:00
spwhitton
b28fb4305b Added a comment 2021-08-03 19:10:10 +00:00
Joey Hess
629e95fd8e
update 2021-08-03 14:03:25 -04:00
Joey Hess
bb56186daa
new todo.. I seem to have cracked a longstanding problem
Sponsored-by: Jochen Bartl on Patreon
2021-08-03 13:51:23 -04:00
jwrauch
aba263450a Added a comment 2021-08-03 16:36:21 +00:00
Joey Hess
899983058f
add: When adding a dotfile, avoid treating its name as an extension. 2021-08-03 12:22:58 -04:00
Joey Hess
2572950ca7
add news item for git-annex 8.20210803 2021-08-03 12:21:10 -04:00
Joey Hess
9cae7c5bbf
releasing package git-annex version 8.20210803 2021-08-03 12:20:45 -04:00
Joey Hess
9b85d95333
whitespace 2021-08-03 12:18:10 -04:00
Ilya_Shlyakhter
6e67ea5d7c Added a comment 2021-08-03 15:14:53 +00:00
Ilya_Shlyakhter
4ce24173e8 Added a comment: don't give up ;) 2021-08-03 15:06:35 +00:00
Lukey
2bd3be5430 Added a comment 2021-08-03 07:54:13 +00:00
Joey Hess
7334893d42
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-02 14:11:36 -04:00
Joey Hess
6111958440
fix test suite
14683da9eb caused a test suite failure.
When the content of a key is not present, a LinkAnnexFailed is returned,
but replaceFile then tried to move the file into place, and since it was
not written, that crashed.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2021-08-02 13:59:23 -04:00
Joey Hess
86bd9ac186
fix missing new lines in processTranscript 2021-08-02 13:42:27 -04:00
jwrauch
e732bc914f 2021-08-02 17:03:23 +00:00
yarikoptic
20ffa01087 initial observation of .dot filename to consider having .dot extension 2021-08-02 16:56:19 +00:00
jgsuess@732b8c62c50d8595d7b1d58eea11e5019c2308b1
70fe0862b4 Added a comment: Also seeing this behaviour 2021-08-02 06:14:41 +00:00
mattplasmastrike@59cb7d099c8665f6c7668d78d9f17db87cabc2fe
2ad890f3df Added a comment 2021-07-31 23:59:12 +00:00
Joey Hess
b3c4579c79
work around strange auto-init bug
git-annex get when run as the first git-annex command in a new repo did not
populate unlocked files. (Reversion in version 8.20210621)

I am not entirely happy with this, because I don't understand how
428c91606b caused the problem in the first
place, and I don't fully understand how skipping calling scanAnnexedFiles
during autoinit avoids the problem.

Kept the explicit call to scanAnnexedFiles during git-annex init,
so that when reconcileStaged is expensive, it can be made to run then,
rather than at some later point when the information is needed.

Sponsored-by: Brock Spratlen on Patreon
2021-07-30 18:36:03 -04:00
Joey Hess
9f94d2894e
remove unused code 2021-07-30 18:01:36 -04:00
Joey Hess
748addbe05
remove second pass in scanAnnexedFiles
The pass was needed to populate files when annex.thin was set,
but in commit 73e0cbbb19,
reconcileStaged started to do that. So, this second pass is not needed
any longer.
2021-07-30 17:46:11 -04:00
Joey Hess
c912e7c4fd
bug 2021-07-30 17:13:15 -04:00
Joey Hess
461035c6ec
close
I'm now reasonably sure I've identified both cases where this can
happen. v8 upgrades and certian filesystems eg NFS. Both are handled as
well as can be, though it may involve some extra checksumming work.
2021-07-30 15:22:22 -04:00
Joey Hess
66089e97de
Fix a rounding bug in display of data sizes
Eg, showImprecise 1 1.99 returned "1.1" rather than "2". The 9 rounded
upward to 10, and that was wrongly used as the decimal, rather than
carrying the 1.

Sponsored-by: Jack Hill on Patreon
2021-07-30 09:56:04 -04:00
uli@8484a70fbfd489faef5f72c230d340b01e2676ca
1c4d6dee90 bugreport: git-annex info . formats 2 TB as 1.1 TB 2021-07-30 07:24:53 +00:00
Joey Hess
d2aead67bd
fsck: Detect and correct stale or missing inode caches for object files
An easy way to see this in action is to have an unlocked file, and touch the
object file.

While all code that compares inode caches for object files needs to be
prepared for this kind of problem and fall back to verification, having
fsck notice it and correct it is cheap (as long as fsck is being run
anyway) and ensures that if it happens for some unusual reason, there's a
way for the user to notice that it's happening.

Not that, when annex.thin is in use, the earlier call to isUnmodified
(and also potentially earlier calls to inAnnex in eg, verifyLocationLog)
will fix up the same problem silently. That might prevent the warning
being displayed, although probably it still will be, because the
Database.Keys write of the InodeCache will be queued but will not have
happened yet. I can't see a way to improve this, but it's not great.

Sponsored-by: Dartmouth College's Datalad project
2021-07-29 14:06:42 -04:00
Joey Hess
817ccbbc47
split verifyKeyContent
This avoids it calling enteringStage VerifyStage when it's used in
places that only fall back to verification rarely, and which might be
called while in TransferStage and be going to perform a transfer after
the verification.
2021-07-29 13:58:40 -04:00
Joey Hess
d4fc506f27
comment 2021-07-29 13:33:11 -04:00
Joey Hess
3c5280b1cf
improve comment wording 2021-07-29 13:21:23 -04:00
Joey Hess
897fd5c104
add note 2021-07-29 13:14:03 -04:00
Joey Hess
72a13d2a5f
remove unused parameter 2021-07-29 13:12:11 -04:00
Joey Hess
3af0e0b4de
Merge branch 'master' of ssh://git-annex.branchable.com 2021-07-29 12:30:39 -04:00
Joey Hess
067a9c70c7
simplify code 2021-07-29 12:28:13 -04:00
Joey Hess
3e0b210039
remove unncessary debugs
Keeping the ones in Annex.InodeSentinal
2021-07-29 12:19:37 -04:00
mih
1e5c60132a Added a comment: Cause cannot (only) be a v7->v8 upgrade 2021-07-29 05:40:36 +00:00
Joey Hess
a306560374
use SQL.addInodeCaches
This avoids deadlock when opening the database handle calls
reconcileStaged.
2021-07-27 17:34:56 -04:00
Joey Hess
73e0cbbb19
fix problem populating pointer files
This is a result of an audit of every use of getInodeCaches,
to find places that misbehave when the annex object is not in the inode
cache, despite pointer files for the same key being in the inode cache.

Unfortunately, that is the case for objects that were in v7 repos that
upgraded to v8. Added a note about this gotcha to getInodeCaches.

Database.Keys.reconcileStaged, then annex.thin is set, would fail to
populate pointer files in this situation. Changed it to check if the
annex object is unmodified the same way inAnnex does, falling back to a
checksum if the inode cache is not recorded.

Sponsored-by: Dartmouth College's Datalad project
2021-07-27 14:26:49 -04:00
Joey Hess
de482c7eeb
move verifyKeyContent to Annex.Verify
The goal is that Database.Keys be able to use it; it can't use
Annex.Content.Presence due to an import loop.

Several other things also needed to be moved to Annex.Verify as a
conseqence.
2021-07-27 14:07:23 -04:00
Joey Hess
0ec5919bbe
comment 2021-07-27 13:45:33 -04:00
Joey Hess
ac59507809
Merge branch 'master' of ssh://git-annex.branchable.com 2021-07-27 13:16:22 -04:00