narrow the race where a file gets modified before update-index

Check just before running update-index whether the worktree file's content is
still the same, and don't update the index for it when it's been modified. This narrows
the race window a lot, from possibly minutes or hours, to seconds or
less.
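
The check can be illustrated with a rough Python sketch. This is not the git-annex code (which is Haskell); `InodeCache`, `gather_inode_cache` and `filter_unmodified` are hypothetical names, and the fingerprint fields (inode, size, mtime) are an assumption about what is cheap to compare with a stat call.

```python
import os
from typing import NamedTuple, Optional


class InodeCache(NamedTuple):
    """Cheap fingerprint of a file (hypothetical stand-in, not git-annex's type)."""
    inode: int
    size: int
    mtime_ns: int


def gather_inode_cache(path: str) -> Optional[InodeCache]:
    """Stat path and return its fingerprint, or None if it cannot be statted."""
    try:
        st = os.stat(path)
    except OSError:
        return None
    return InodeCache(st.st_ino, st.st_size, st.st_mtime_ns)


def filter_unmodified(queued: dict) -> list:
    """Keep only queued paths whose current fingerprint matches the recorded one.

    queued maps each path to the InodeCache recorded when the worktree
    file was written. Modified or deleted files are dropped, so they are
    never handed to update-index.
    """
    return [path for path, recorded in queued.items()
            if gather_inode_cache(path) == recorded]
```

Only the paths returned by `filter_unmodified` would then be passed to git update-index. A file modified after its stat but before update-index runs still slips through, which is the remaining window.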

(Use replaceFile so that the worktree update happens atomically,
allowing the InodeCache of the new worktree file to itself be gathered
w/o any other race.)
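
As a minimal sketch of why an atomic replace helps, assuming a POSIX-style rename within one filesystem: the new content is written to a temporary file, the temporary file is statted, and only then is it renamed over the worktree file. The function name `replace_file_atomically` is hypothetical, and this is a Python illustration rather than the Haskell replaceFile itself.

```python
import os
import tempfile


def replace_file_atomically(path: str, content: bytes) -> tuple:
    """Write content beside path, stat the temp file, then rename it over path.

    Returns (inode, size, mtime_ns) of the temp file. A rename within one
    filesystem keeps those values, so the result describes exactly the
    content written here, even if someone modifies path afterwards.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        st = os.stat(tmp)      # fingerprint gathered before the file is visible
        os.replace(tmp, path)  # atomic swap into place
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return (st.st_ino, st.st_size, st.st_mtime_ns)
```

Because the fingerprint is taken before the rename, no concurrent writer to the worktree path can influence it.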

This doesn't eliminate the race; it can still occur in the window before
update-index runs. When annex.queuesize is large, a lot of files will be
statted by the checks, and so the window may still be large enough to be a
problem.

When only a few files are being processed, the window is as small as it
is in the race where a modification gets overwritten by git-annex when
it updates the worktree. Or maybe as small as whatever race git
checkout/pull/merge may have when the worktree gets modified during it.
Still, I've kept a todo about this race.

This commit was supported by the NSF-funded DataLad project.
Joey Hess 2018-08-16 15:15:20 -04:00
parent 6a445dc086
commit 82a239675f
GPG key ID: DB12DB0FF05F8F38
5 changed files with 52 additions and 29 deletions

@@ -12,21 +12,6 @@ git-annex should use smudge/clean filters.
# because it doesn't know it has that name
# git commit clears up this mess
* If the user is getting a file that was not present, and at the same
time overwrites the file with new content, the new content can be staged
accidentally when git-annex runs git update-index on the file.
This race's window is wide because git-annex will process annex.queuesize
files before updating the index. It could be narrowed by running
update-index more frequently. Or, could check for modified files before
running it and throw those out, which would narrow the window a lot,
but not eliminate the race entirely.
(Of course there's also a race where the modification gets overwritten
by git-annex when it updates the worktree. Which is much like the race
that git checkout/pull/merge can overwrite a modification, which is small
and unlikely but afaik unclosable.)
* Potentially: Use git's new `filter.<driver>.process` interface, which will
let only 1 git-annex process be started by git when processing
multiple files, and so should be faster.
@@ -65,6 +50,19 @@ git-annex should use smudge/clean filters.
Note that the long-running filter process interface has the same problem.
* If the user is getting a file that was not present, and at the same
time overwrites the file with new content, the new content can be staged
accidentally when git-annex runs git update-index on the file.
The race window size has been made fairly small, but still
varies with annex.queuesize, since it filters out modified files
before running git update-index on all queued files. A modification
that occurs after the filter checks the file triggers the race.
To fully close this race would need a way to manually update the index
with the information git-annex knows, including the inode etc of the
worktree file.
* Eventually (but not yet), make v6 the default for new repositories.
Note that the assistant forces repos into direct mode; that will need to
be changed then, and it should enable annex.thin instead.