fsck: Detect situations where annex.thin has caused data loss to the content of locked files.

In particular, when two files have the same content and one is unlocked
and modified, annex.thin can corrupt the content of the shared annex
object, so fsck on the other file should detect that.

getKeyStatus was relying on Database.Keys.getAssociatedFiles to tell
when a file is unlocked, but that can give a false positive, because the
database can still list old associated files.

Instead, separate out the case of an unlocked object that has multiple
hardlinks when annex.thin is in use.
Joey Hess 2019-03-18 15:53:54 -04:00
parent 60ca3ce043
commit d5ee5fef65
GPG key ID: DB12DB0FF05F8F38
5 changed files with 73 additions and 26 deletions


@@ -110,3 +110,5 @@ fsck ffoo ok
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
I'm hopeful! But just getting started :(
> git-annex fsck has been made to detect and clean up after this, which I
> consider sufficient, so [[done]] --[[Joey]]


@@ -0,0 +1,30 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2019-03-18T18:16:53Z"
content="""
annex.thin does allow data loss to happen, and the fact that another annex
link in the repository points at the same object file does not make that
data loss unacceptable.

While it might seem that, when adding the file with duplicate content,
git-annex could notice that the object file has been modified and replace
it with the new copy from ffoo, that would leave the same problem in this
equivalent situation:

    # echo hi > 1
    # echo hi > 2
    # git annex add 1 2
    # git annex unlock 1
    # echo bye > 1
    # cat 2
    bye

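
The underlying mechanism can be reproduced without git-annex. Assume, as annex.thin arranges, that the unlocked file is a hardlink to the object file. Both names then refer to the same inode, so writing to one changes what the other shows. This is a minimal sketch. `object`, `unlocked`, and `locked` are stand-in names for the annex object, file 1, and file 2.

```shell
#!/bin/sh
# Plain-file model of the annex.thin hazard; git-annex is not involved.
set -e
dir=$(mktemp -d)

echo hi > "$dir/object"            # stands in for the annex object file
ln "$dir/object" "$dir/unlocked"   # annex.thin: unlocked file hardlinked to it
ln -s object "$dir/locked"         # locked file: a relative symlink to it

echo bye > "$dir/unlocked"         # modify the unlocked file in place
cat "$dir/locked"                  # prints "bye": the shared content changed
```

Because `>` truncates and rewrites the existing inode, the object itself ends up holding "bye".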

The surprising thing to me is that `git annex fsck 2` (or ffoo in your
example) doesn't find any problem with it, even though it points at a
modified object file whose content no longer has the right hash.
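
The check that should catch this is whether the object still matches its key. Below is a rough sketch, not git-annex's implementation. It assumes a SHA256E key of the form `SHA256E-s<size>--<sha256 hex><extension>` and GNU coreutils `sha256sum`. `object_matches_key` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: does an object file still match its SHA256E key?
# Assumed key layout: SHA256E-s<size>--<64 hex digits><optional extension>
object_matches_key() {
    key=$1
    object=$2

    # Recorded size: the text between "-s" and "--".
    size=${key#*-s}
    size=${size%%--*}

    # Recorded hash: the first 64 characters after "--" (drops any extension).
    rest=${key#*--}
    hash=$(printf '%s' "$rest" | cut -c1-64)

    # tr strips the padding some wc implementations add.
    actual_size=$(wc -c < "$object" | tr -d ' ')
    actual_hash=$(sha256sum "$object" | cut -d' ' -f1)

    [ "$actual_size" = "$size" ] && [ "$actual_hash" = "$hash" ]
}

# Usage: build a key for "hi", then overwrite the object as in the example.
d=$(mktemp -d)
echo hi > "$d/object"
key="SHA256E-s3--$(sha256sum "$d/object" | cut -d' ' -f1)"
object_matches_key "$key" "$d/object" && echo "ok"
echo bye > "$d/object"
object_matches_key "$key" "$d/object" || echo "bad content"
```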

Fsck doesn't want to complain about *expected* data loss when an unlocked
file has been modified and annex.thin caused the old version to be lost.
But when fscking an annex symlink (a locked file), that exemption shouldn't
apply.
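
The policy above amounts to a small decision, sketched here in shell purely as an illustration. The actual check lives in git-annex's Haskell code. The function `thin_loss_expected` and its arguments are hypothetical. The sketch assumes only two things: annex.thin leaves an unlocked file hardlinked to its object, so the object's link count exceeds 1, and GNU `stat -c %h` is available.

```shell
#!/bin/sh
# Hypothetical sketch of the fsck policy; not git-annex code.
# Succeeds (exit 0) when a content mismatch is expected annex.thin loss,
# fails when fsck should report the mismatch.
# $1: work-tree file, $2: annex object file, $3: "true" if annex.thin is set.
# Assumes GNU stat (-c %h prints the hardlink count).
thin_loss_expected() {
    file=$1
    object=$2
    thin=$3

    # A locked file is a symlink to the object: a bad object is always reported.
    if [ -L "$file" ]; then
        return 1
    fi

    # Without annex.thin an unlocked file is a separate copy of the object.
    if [ "$thin" != "true" ]; then
        return 1
    fi

    # With annex.thin the unlocked file is a hardlink to the object,
    # so the object's link count is greater than 1.
    links=$(stat -c %h "$object")
    [ "$links" -gt 1 ]
}

# Usage: model "unlock 1, keep 2 locked" with plain files.
tmp=$(mktemp -d)
echo hi > "$tmp/object"
ln "$tmp/object" "$tmp/unlocked"
ln -s "$tmp/object" "$tmp/locked"

thin_loss_expected "$tmp/unlocked" "$tmp/object" true && echo "1: expected loss"
thin_loss_expected "$tmp/locked" "$tmp/object" true || echo "2: report problem"
```

The symlink test comes first, so a locked file is never excused, even when the object it points at is also hardlinked by an unlocked file.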
"""]]