analysis, plan

This commit was supported by the NSF-funded DataLad project.
This commit is contained in:
Joey Hess 2018-08-27 13:14:25 -04:00
parent d7f386a81d
commit 8478544b58
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38

View file

@ -0,0 +1,53 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2018-08-27T16:27:47Z"
content="""
I was able to reproduce what I described in comment #2 with current
git-annex.
Also, I was able to reproduce the fsck failure, by touching the
.git/annex/objects file. Even though the file is not modifiable, touching
it updates its mtime, and so the inode cache is considered stale.
This makes inAnnex think it's not present.
And another way is, when annex.thin is enabled, to touch the unlocked file,
which also touched the .git/annex/objects it's hardlinked to.
git-annex lock then checksums it (because the inode cache is stale),
concludes it's still good, and so proceeds to lock it, but the stale inode
cache again causes inAnnex to think it's not present.
I still don't know how to reproduce git-annex get redownloading a file that
git-annex find lists.
> Should inAnnex even be checking the inode cache for locked content? This seems unncessary, and note that it's done for v4 mode as well as v6.
Unnecessary for v4 certianly, but in v6 with annex.thin,
unlocked content is hard linked and could be modified,
so it does need to check that the inode cache is valid.
Could remove the inode cache when locking a file, as long as there
are no other associated files that are unlocked. That solves the problem
for that case, and makes inAnnex avoid some unncessary work too.
When the same content has a mix of locked and unlocked associated files,
the inode cache needs to remain populated (to support annex.thin and so
git-annex drop will drop the copies of the content from the locked files).
But then if the inode cache for the locked content becomes stale, and
the unlocked files get modified, inAnnex will again be wrong.
So, it seems that inAnnex also needs to check if the annex object
has no hard links, and if so, treat it as present even when the inode cache
does not match. That's cheaper than checking the inode cache, so it ought
to be done first.
Conclusion: Only check the inode cache for the annex object
when annex.thin has made a hard link to it. If its link count is 1,
inAnnex knows it's present and locked, so it can assume it's good.
(Also, it's possible there could be a race when locking a file
where the file gets modified after the inode cache is checked,
so it's treated as unmodified but is really modified and as a hard link to
the object file, the object file has the wrong content. Need to make sure
this race can't occur.)
"""]]