better index file refresh method

Use git update-index --refresh, since it's a little bit more
efficient and the user can be told to run it if a locked index prevents
git-annex from running it.

This also fixes the problem where an annexed file was deleted in the index
and a get of another file that uses the same key caused the index update to
add back the deleted file. update-index will not add back the deleted file.

Documented in tips/unlocked_files.mdwn the gotcha that the index update
may conflict with other operations. I can't see any way to possibly avoid
that conflict.

One new todo about a race that causes a modification to be accidentially
staged.

Note that the assistant only flushes the git command queue when it
commits a modification. I have not tested the assistant with v6 unlocked
files, but assume most users of the assistant won't care if the index
shows a file as modified for a while.

This commit was supported by the NSF-funded DataLad project.
This commit is contained in:
Joey Hess 2018-08-16 13:51:32 -04:00
parent 5e87389f40
commit 82cfcfc838
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
4 changed files with 69 additions and 46 deletions

View file

@ -24,7 +24,6 @@ module Annex.Content (
checkDiskSpace,
needMoreDiskSpace,
moveAnnex,
Restage(..),
populatePointerFile,
linkToAnnex,
linkFromAnnex,
@ -597,26 +596,11 @@ populatePointerFile restage k obj f = go =<< liftIO (isPointerFile f)
ifM (linkOrCopy k obj f destmode)
( do
thawContent f
restagePointerFile restage k f destmode
restagePointerFile restage f
, liftIO $ writePointerFile f k destmode
)
go _ = return ()
newtype Restage = Restage Bool
{- Re-stages a pointer file. This is used after updating a worktree file
- when content is added/removed, to prevent git from treating the worktree
- file as modified.
-
- If the index is known to be locked (eg, git add has run git-annex),
- the staging would fail, and Restage False will prevent it.
-}
restagePointerFile :: Restage -> Key -> FilePath -> Maybe FileMode -> Annex ()
restagePointerFile (Restage False) _ _ _ = return ()
restagePointerFile (Restage True) k f mode = do
pointersha <- hashPointerFile k
stagePointerFile f mode pointersha
data LinkAnnexResult = LinkAnnexOk | LinkAnnexFailed | LinkAnnexNoop
{- Populates the annex object file by hard linking or copying a source
@ -856,7 +840,7 @@ removeAnnex (ContentRemovalLock key) = withObjectLoc key remove removedirect
secureErase file
liftIO $ nukeFile file
liftIO $ writePointerFile file key mode
restagePointerFile (Restage True) key file mode
restagePointerFile (Restage True) file
-- Modified file, so leave it alone.
-- If it was a hard link to the annex object,
-- that object might have been frozen as part of the

View file

@ -7,7 +7,7 @@
-
- Pointer files are used instead of symlinks for unlocked files.
-
- Copyright 2013-2015 Joey Hess <id@joeyh.name>
- Copyright 2013-2018 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU GPL version 3 or higher.
-}
@ -131,6 +131,41 @@ writePointerFile file k mode = do
writeFile file (formatPointer k)
maybe noop (setFileMode file) mode
newtype Restage = Restage Bool
{- Restage pointer file. This is used after updating a worktree file
- when content is added/removed, to prevent git status from showing
- it as modified.
-
- Asks git to refresh its index information for the file.
- That in turn runs the clean filter on the file; when the clean
- filter produces the same pointer that was in the index before, git
- realizes that the file has not actually been modified.
-
- Note that, if the pointer file is staged for deletion, or has different
- content than the current worktree content staged, this won't change
- that. So it's safe to call at any time and any situation.
-
- If the index is known to be locked (eg, git add has run git-annex),
- that would fail. Restage False will prevent the index being updated.
- Will display a message to help the user understand why
- the file will appear to be modified.
-
- This uses the git queue, so the update is not performed immediately,
- and this can be run multiple times cheaply.
-}
restagePointerFile :: Restage -> FilePath -> Annex ()
restagePointerFile (Restage False) f = toplevelWarning True $ unwords
[ "git status will show " ++ f
, "to be modified, since its content availability has changed."
, "This is only a cosmetic problem affecting git status; git add,"
, "git commit, etc won't be affected."
, "To fix the git status display, you can run:"
, "git update-index -q --refresh " ++ f
]
restagePointerFile (Restage True) f =
Annex.Queue.addCommand "update-index" [Param "-q", Param "--refresh"] [f]
{- Parses a symlink target or a pointer file to a Key.
- Only looks at the first line, as pointer files can have subsequent
- lines. -}

View file

@ -110,6 +110,23 @@ to having them all unlocked, you can do so using `git annex adjust
useful when using filesystems like FAT, and OS's like Windows that don't
support symlinks.
## index gotchas
When git-annex gets or drops the content of an unlocked file, it updates
the file in git's worktree accordingly. Then it needs to update the index
file to reflect the change. Otherwise, `git status` would show the file
as modified, even though there are no changes to commit.
This means that when git-annex is running a command that gets or drops the
content of an unlocked file, the index will sometimes be locked. This might
prevent you from `git commit` at the same time. Or, if you have a git
commit in progress, or are running multiple git-annex processes, git-annex
may complain that the index is locked.
To manually update the index when git-annex was not able to, you can run:
git update-index -q --refresh $file
## using less disk space
Unlocked files are handy, but they have one significant disadvantage
@ -154,7 +171,7 @@ So, using `git checkout` to check out a different branch, or even
working tree, and using more disk space. A warning will be printed out in
this situation. You can always run `git annex fix` to re-thin such files.
## tradeoffs
## annex.thin tradeoffs
[[!template id=note text="""
When a [[direct_mode]] repository is upgraded, annex.thin is automatically

View file

@ -12,33 +12,20 @@ git-annex should use smudge/clean filters.
# because it doesn't know it has that name
# git commit clears up this mess
* If an unlocked file's content is not present, and a new file with
identical content is added with `git add`, the unlocked file is
populated, but git-annex is unable to update the index, so git status
will say that it has been modified.
* If an annexed file is deleted in the index, and another annexed file
uses the same key, and git annex get/drop is run, the index update
that's done to prevent status showing the file as modified adds
the deleted file back to the index.
* Also, if the user is getting files, and modifying files at the same
time, and they stage their modifications, the modification may get
unstaged in a race when a file is got and the updated worktree file
staged in the index.
* If the user is getting a file that was not present, and at the same
time overwrites the file with new content, the new content can be staged
accidentially when git-annex runs git update-index on the file.
I don't know if this is worth worrying about,
because there's also of course a race where the modification to the
worktree file may get reverted when git-annex updates the content. Those
races are much smaller, but do exist.
* get/drop operations on unlocked files lead to an update of the index.
Only one process can update the index at one time, so eg, git annex get
at the same time as a git commit may display a ugly warning
(or the git commit could fail to start if run at just the right time).
Two git-annex get processes can also try to update the index at the
same time and encounter this problem (git annex get -J is ok).
This race's window is wide because git-annex will process annex.queuesize
files before updating the index. It could be narrowed by running
update-index more frequently. Or, could check for modified files before
running it and throw those out, which would narrow the window a lot,
but not eliminate the race entirely.
(Of course there's also a race where the modification gets overwritten
by git-annex when it updates the worktree. Which is much like the race
that git checkout/pull/merge can overwite a modification, which is small
and unlikely but afaik unclosable.)
* Potentially: Use git's new `filter.<driver>.process` interface, which will
let only 1 git-annex process be started by git when processing