analysis, plan
This commit was supported by the NSF-funded DataLad project.
This commit is contained in:
parent
d7f386a81d
commit
8478544b58
1 changed files with 53 additions and 0 deletions
|
@ -0,0 +1,53 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="joey"
|
||||||
|
subject="""comment 7"""
|
||||||
|
date="2018-08-27T16:27:47Z"
|
||||||
|
content="""
|
||||||
|
I was able to reproduce what I described in comment #2 with current
|
||||||
|
git-annex.
|
||||||
|
|
||||||
|
Also, I was able to reproduce the fsck failure, by touching the
|
||||||
|
.git/annex/objects file. Even though the file is not modifiable, touching
|
||||||
|
it updates its mtime, and so the inode cache is considered stale.
|
||||||
|
This makes inAnnex think it's not present.
|
||||||
|
|
||||||
|
And another way is, when annex.thin is enabled, to touch the unlocked file,
|
||||||
|
which also touched the .git/annex/objects it's hardlinked to.
|
||||||
|
git-annex lock then checksums it (because the inode cache is stale),
|
||||||
|
concludes it's still good, and so proceeds to lock it, but the stale inode
|
||||||
|
cache again causes inAnnex to think it's not present.
|
||||||
|
|
||||||
|
I still don't know how to reproduce git-annex get redownloading a file that
|
||||||
|
git-annex find lists.
|
||||||
|
|
||||||
|
> Should inAnnex even be checking the inode cache for locked content? This seems unncessary, and note that it's done for v4 mode as well as v6.
|
||||||
|
|
||||||
|
Unnecessary for v4 certianly, but in v6 with annex.thin,
|
||||||
|
unlocked content is hard linked and could be modified,
|
||||||
|
so it does need to check that the inode cache is valid.
|
||||||
|
|
||||||
|
Could remove the inode cache when locking a file, as long as there
|
||||||
|
are no other associated files that are unlocked. That solves the problem
|
||||||
|
for that case, and makes inAnnex avoid some unncessary work too.
|
||||||
|
|
||||||
|
When the same content has a mix of locked and unlocked associated files,
|
||||||
|
the inode cache needs to remain populated (to support annex.thin and so
|
||||||
|
git-annex drop will drop the copies of the content from the locked files).
|
||||||
|
But then if the inode cache for the locked content becomes stale, and
|
||||||
|
the unlocked files get modified, inAnnex will again be wrong.
|
||||||
|
|
||||||
|
So, it seems that inAnnex also needs to check if the annex object
|
||||||
|
has no hard links, and if so, treat it as present even when the inode cache
|
||||||
|
does not match. That's cheaper than checking the inode cache, so it ought
|
||||||
|
to be done first.
|
||||||
|
|
||||||
|
Conclusion: Only check the inode cache for the annex object
|
||||||
|
when annex.thin has made a hard link to it. If its link count is 1,
|
||||||
|
inAnnex knows it's present and locked, so it can assume it's good.
|
||||||
|
|
||||||
|
(Also, it's possible there could be a race when locking a file
|
||||||
|
where the file gets modified after the inode cache is checked,
|
||||||
|
so it's treated as unmodified but is really modified and as a hard link to
|
||||||
|
the object file, the object file has the wrong content. Need to make sure
|
||||||
|
this race can't occur.)
|
||||||
|
"""]]
|
Loading…
Add table
Add a link
Reference in a new issue