From 82cfcfc838c3a5421144bc59cfbf1d18bcd48390 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Thu, 16 Aug 2018 13:51:32 -0400 Subject: [PATCH] better index file refresh method Use git update-index --refresh, since it's a little bit more efficient and the user can be told to run it if a locked index prevents git-annex from running it. This also fixes the problem where an annexed file was deleted in the index and a get of another file that uses the same key caused the index update to add back the deleted file. update-index will not add back the deleted file. Documented in tips/unlocked_files.mdwn the gotcha that the index update may conflict with other operations. I can't see any way to possibly avoid that conflict. One new todo about a race that causes a modification to be accidentially staged. Note that the assistant only flushes the git command queue when it commits a modification. I have not tested the assistant with v6 unlocked files, but assume most users of the assistant won't care if the index shows a file as modified for a while. This commit was supported by the NSF-funded DataLad project. --- Annex/Content.hs | 20 ++---------------- Annex/Link.hs | 37 +++++++++++++++++++++++++++++++++- doc/tips/unlocked_files.mdwn | 19 +++++++++++++++++- doc/todo/smudge.mdwn | 39 ++++++++++++------------------------ 4 files changed, 69 insertions(+), 46 deletions(-) diff --git a/Annex/Content.hs b/Annex/Content.hs index 75182e9650..86f0a04de7 100644 --- a/Annex/Content.hs +++ b/Annex/Content.hs @@ -24,7 +24,6 @@ module Annex.Content ( checkDiskSpace, needMoreDiskSpace, moveAnnex, - Restage(..), populatePointerFile, linkToAnnex, linkFromAnnex, @@ -597,26 +596,11 @@ populatePointerFile restage k obj f = go =<< liftIO (isPointerFile f) ifM (linkOrCopy k obj f destmode) ( do thawContent f - restagePointerFile restage k f destmode + restagePointerFile restage f , liftIO $ writePointerFile f k destmode ) go _ = return () -newtype Restage = Restage Bool - -{- Re-stages a pointer file. This is used after updating a worktree file - - when content is added/removed, to prevent git from treating the worktree - - file as modified. - - - - If the index is known to be locked (eg, git add has run git-annex), - - the staging would fail, and Restage False will prevent it. - -} -restagePointerFile :: Restage -> Key -> FilePath -> Maybe FileMode -> Annex () -restagePointerFile (Restage False) _ _ _ = return () -restagePointerFile (Restage True) k f mode = do - pointersha <- hashPointerFile k - stagePointerFile f mode pointersha - data LinkAnnexResult = LinkAnnexOk | LinkAnnexFailed | LinkAnnexNoop {- Populates the annex object file by hard linking or copying a source @@ -856,7 +840,7 @@ removeAnnex (ContentRemovalLock key) = withObjectLoc key remove removedirect secureErase file liftIO $ nukeFile file liftIO $ writePointerFile file key mode - restagePointerFile (Restage True) key file mode + restagePointerFile (Restage True) file -- Modified file, so leave it alone. -- If it was a hard link to the annex object, -- that object might have been frozen as part of the diff --git a/Annex/Link.hs b/Annex/Link.hs index a75ed051ff..2dcb70d8bb 100644 --- a/Annex/Link.hs +++ b/Annex/Link.hs @@ -7,7 +7,7 @@ - - Pointer files are used instead of symlinks for unlocked files. - - - Copyright 2013-2015 Joey Hess + - Copyright 2013-2018 Joey Hess - - Licensed under the GNU GPL version 3 or higher. -} @@ -131,6 +131,41 @@ writePointerFile file k mode = do writeFile file (formatPointer k) maybe noop (setFileMode file) mode +newtype Restage = Restage Bool + +{- Restage pointer file. This is used after updating a worktree file + - when content is added/removed, to prevent git status from showing + - it as modified. + - + - Asks git to refresh its index information for the file. + - That in turn runs the clean filter on the file; when the clean + - filter produces the same pointer that was in the index before, git + - realizes that the file has not actually been modified. + - + - Note that, if the pointer file is staged for deletion, or has different + - content than the current worktree content staged, this won't change + - that. So it's safe to call at any time and any situation. + - + - If the index is known to be locked (eg, git add has run git-annex), + - that would fail. Restage False will prevent the index being updated. + - Will display a message to help the user understand why + - the file will appear to be modified. + - + - This uses the git queue, so the update is not performed immediately, + - and this can be run multiple times cheaply. + -} +restagePointerFile :: Restage -> FilePath -> Annex () +restagePointerFile (Restage False) f = toplevelWarning True $ unwords + [ "git status will show " ++ f + , "to be modified, since its content availability has changed." + , "This is only a cosmetic problem affecting git status; git add," + , "git commit, etc won't be affected." + , "To fix the git status display, you can run:" + , "git update-index -q --refresh " ++ f + ] +restagePointerFile (Restage True) f = + Annex.Queue.addCommand "update-index" [Param "-q", Param "--refresh"] [f] + {- Parses a symlink target or a pointer file to a Key. - Only looks at the first line, as pointer files can have subsequent - lines. -} diff --git a/doc/tips/unlocked_files.mdwn b/doc/tips/unlocked_files.mdwn index 68c7f20aab..4572e62edb 100644 --- a/doc/tips/unlocked_files.mdwn +++ b/doc/tips/unlocked_files.mdwn @@ -110,6 +110,23 @@ to having them all unlocked, you can do so using `git annex adjust useful when using filesystems like FAT, and OS's like Windows that don't support symlinks. +## index gotchas + +When git-annex gets or drops the content of an unlocked file, it updates +the file in git's worktree accordingly. Then it needs to update the index +file to reflect the change. Otherwise, `git status` would show the file +as modified, even though there are no changes to commit. + +This means that when git-annex is running a command that gets or drops the +content of an unlocked file, the index will sometimes be locked. This might +prevent you from `git commit` at the same time. Or, if you have a git +commit in progress, or are running multiple git-annex processes, git-annex +may complain that the index is locked. + +To manually update the index when git-annex was not able to, you can run: + + git update-index -q --refresh $file + ## using less disk space Unlocked files are handy, but they have one significant disadvantage @@ -154,7 +171,7 @@ So, using `git checkout` to check out a different branch, or even working tree, and using more disk space. A warning will be printed out in this situation. You can always run `git annex fix` to re-thin such files. -## tradeoffs +## annex.thin tradeoffs [[!template id=note text=""" When a [[direct_mode]] repository is upgraded, annex.thin is automatically diff --git a/doc/todo/smudge.mdwn b/doc/todo/smudge.mdwn index 6bdafd76ea..0215f9dc9a 100644 --- a/doc/todo/smudge.mdwn +++ b/doc/todo/smudge.mdwn @@ -12,33 +12,20 @@ git-annex should use smudge/clean filters. # because it doesn't know it has that name # git commit clears up this mess -* If an unlocked file's content is not present, and a new file with - identical content is added with `git add`, the unlocked file is - populated, but git-annex is unable to update the index, so git status - will say that it has been modified. - -* If an annexed file is deleted in the index, and another annexed file - uses the same key, and git annex get/drop is run, the index update - that's done to prevent status showing the file as modified adds - the deleted file back to the index. - -* Also, if the user is getting files, and modifying files at the same - time, and they stage their modifications, the modification may get - unstaged in a race when a file is got and the updated worktree file - staged in the index. +* If the user is getting a file that was not present, and at the same + time overwrites the file with new content, the new content can be staged + accidentially when git-annex runs git update-index on the file. - I don't know if this is worth worrying about, - because there's also of course a race where the modification to the - worktree file may get reverted when git-annex updates the content. Those - races are much smaller, but do exist. - -* get/drop operations on unlocked files lead to an update of the index. - Only one process can update the index at one time, so eg, git annex get - at the same time as a git commit may display a ugly warning - (or the git commit could fail to start if run at just the right time). - - Two git-annex get processes can also try to update the index at the - same time and encounter this problem (git annex get -J is ok). + This race's window is wide because git-annex will process annex.queuesize + files before updating the index. It could be narrowed by running + update-index more frequently. Or, could check for modified files before + running it and throw those out, which would narrow the window a lot, + but not eliminate the race entirely. + + (Of course there's also a race where the modification gets overwritten + by git-annex when it updates the worktree. Which is much like the race + that git checkout/pull/merge can overwite a modification, which is small + and unlikely but afaik unclosable.) * Potentially: Use git's new `filter..process` interface, which will let only 1 git-annex process be started by git when processing