v6: recover from race between git mv and git-annex get/drop

Update pointer file next time reconcileStaged is run to recover from the
race.

Note that restagePointerFile causes git to run the clean filter,
and that will run reconcileStaged. So normally, by the time the git
annex get/drop command finishes, the race has already been dealt with.
In some cases that may not happen, and the race will instead be
dealt with at a later point. git-annex could run reconcileStaged at
shutdown if that becomes a problem.

This does not handle the situation where the git mv is committed before
git-annex gets a chance to run again. git commit does run the clean
filter, and that happens to re-inject the content if it was supposed to
be dropped but is still populated. But, the case where the file was
supposed to be gotten but is not populated is not handled yet.
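
As a rough illustration of the recovery described above, the decision that reconcileStaged makes for each staged file can be sketched as a small pure function. This is only a sketch: the `Action` type and the `decide` function are hypothetical names invented for this example and are not part of git-annex. The two booleans stand in for the `keypopulated` and `filepopulated` checks in the diff.

```haskell
-- Hypothetical names: Action and decide are illustrative only and are
-- not part of git-annex.
data Action
    = PopulateWorktreeFile   -- a get missed the moved file
    | DepopulateWorktreeFile -- a drop missed the moved file
    | NoAction               -- worktree file and annex already agree
    deriving (Eq, Show)

-- First argument: is the content for the key present in the annex?
-- Second argument: does the worktree file currently hold that content?
decide :: Bool -> Bool -> Action
decide True False = PopulateWorktreeFile
decide False True = DepopulateWorktreeFile
decide _ _ = NoAction
```

Only the two mismatched combinations need any work; when both checks agree, nothing is done.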

This commit was supported by the NSF-funded DataLad project.
Joey Hess 2018-08-22 15:28:57 -04:00
parent 9ff1c62a4d
commit 50fa17aee6
3 changed files with 49 additions and 15 deletions


@@ -13,6 +13,7 @@ git-annex (6.20180808) UNRELEASED; urgency=medium
     so git status will not show the files as modified.
   * v6: Update associated files database when git has staged changes
     to pointer files.
+  * v6: Fix some race conditions.
   * linux standalone: When LOCPATH is already set, use it instead of the
     bundled locales. It can be set to an empty string to use the system
     locales too.


@@ -15,6 +15,7 @@ module Database.Keys (
 	getAssociatedKey,
 	removeAssociatedFile,
 	storeInodeCaches,
+	storeInodeCaches',
 	addInodeCaches,
 	getInodeCaches,
 	removeInodeCaches,
@@ -32,6 +33,8 @@ import Annex.Version (versionUsesKeysDatabase)
 import qualified Annex
 import Annex.LockFile
 import Annex.CatFile
+import Annex.Content.PointerFile
+import Annex.Link
 import Utility.InodeCache
 import Annex.InodeSentinal
 import Git
@@ -136,9 +139,9 @@ openDb createdb _ = catchPermissionDenied permerr $ withExclusiveLock gitAnnexKe
 			True -> throwM e

 	open db = do
-		h <- liftIO $ H.openDbQueue H.MultiWriter db SQL.containedTable
-		reconcileStaged (SQL.WriteHandle h)
-		return $ DbOpen h
+		qh <- liftIO $ H.openDbQueue H.MultiWriter db SQL.containedTable
+		reconcileStaged qh
+		return $ DbOpen qh

 {- Closes the database if it was open. Any writes will be flushed to it.
  -
@@ -168,8 +171,12 @@ removeAssociatedFile k = runWriterIO . SQL.removeAssociatedFile (toIKey k)

 {- Stats the files, and stores their InodeCaches. -}
 storeInodeCaches :: Key -> [FilePath] -> Annex ()
-storeInodeCaches k fs = withTSDelta $ \d ->
-	addInodeCaches k . catMaybes =<< liftIO (mapM (`genInodeCache` d) fs)
+storeInodeCaches k fs = storeInodeCaches' k fs []
+
+storeInodeCaches' :: Key -> [FilePath] -> [InodeCache] -> Annex ()
+storeInodeCaches' k fs ics = withTSDelta $ \d ->
+	addInodeCaches k . (++ ics) . catMaybes
+		=<< liftIO (mapM (`genInodeCache` d) fs)

 addInodeCaches :: Key -> [InodeCache] -> Annex ()
 addInodeCaches k is = runWriterIO $ SQL.addInodeCaches (toIKey k) is
@@ -196,9 +203,22 @@ removeInodeCaches = runWriterIO . SQL.removeInodeCaches . toIKey
  -
  - Note that this is run with a lock held, so only one process can be
  - running this at a time.
+ -
+ - This also cleans up after a race between eg a git mv and git-annex
+ - get/drop/similar. If git moves the file between this being run and the
+ - get/drop, the moved file won't be updated for the get/drop.
+ - The next time this runs, it will see the staged change. It then checks
+ - if the worktree file's content availability does not match the git-annex
+ - content availability, and makes changes as necessary to reconcile them.
+ -
+ - Note that if a commit happens before this runs again, it won't see
+ - the staged change. Instead, during the commit, git will run the clean
+ - filter. If a drop missed the file then the file is added back into the
+ - annex. If a get missed the file then the clean filter populates the
+ - file.
  -}
-reconcileStaged :: SQL.WriteHandle -> Annex ()
-reconcileStaged h@(SQL.WriteHandle qh) = whenM versionUsesKeysDatabase $ do
+reconcileStaged :: H.DbQueue -> Annex ()
+reconcileStaged qh = whenM versionUsesKeysDatabase $ do
 	gitindex <- inRepo currentIndexFile
 	indexcache <- fromRepo gitAnnexKeysDbIndexCache
 	withTSDelta (liftIO . genInodeCache gitindex) >>= \case
@@ -250,15 +270,28 @@ reconcileStaged h@(SQL.WriteHandle qh) = whenM versionUsesKeysDatabase $ do
 		((':':_srcmode):dstmode:_srcsha:dstsha:_change:[])
 			-- Only want files, not symlinks
 			| dstmode /= fmtTreeItemType TreeSymlink -> do
-				catKey (Ref dstsha) >>= \case
-					Nothing -> noop
-					Just k -> liftIO $
-						SQL.addAssociatedFileFast
-							(toIKey k)
-							(asTopFilePath file)
-							h
+				maybe noop (reconcile (asTopFilePath file))
+					=<< catKey (Ref dstsha)
 				procdiff rest True
 			| otherwise -> procdiff rest changed
 		_ -> return changed -- parse failed
 	procdiff _ changed = return changed

+	-- Note that database writes done in here will not necessarily
+	-- be visible to database reads also done in here.
+	reconcile file key = do
+		let ikey = toIKey key
+		liftIO $ SQL.addAssociatedFileFast ikey file (SQL.WriteHandle qh)
+		caches <- liftIO $ SQL.getInodeCaches ikey (SQL.ReadHandle qh)
+		keyloc <- calcRepo (gitAnnexLocation key)
+		keypopulated <- sameInodeCache keyloc caches
+		p <- fromRepo $ fromTopFilePath file
+		filepopulated <- sameInodeCache p caches
+		case (keypopulated, filepopulated) of
+			(True, False) ->
+				populatePointerFile (Restage True) key keyloc p >>= \case
+					Nothing -> return ()
+					Just ic -> liftIO $
+						SQL.addInodeCaches ikey [ic] (SQL.WriteHandle qh)
+			(False, True) -> depopulatePointerFile key p
+			_ -> return ()
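
The pattern match in procdiff above can be tried out in isolation with a small standalone sketch. It assumes the input is one metadata line of git's raw diff output, with the fields `:srcmode dstmode srcsha dstsha status`, and that symlinks carry the mode `120000`. The names `stagedFileSha` and `symlinkMode` are hypothetical and do not exist in git-annex.

```haskell
-- Hypothetical helper, not part of git-annex: extracts the destination
-- sha from one metadata line of git's raw diff output, skipping symlinks.

-- Assumption: git reports symlinks with this mode in raw diff output.
symlinkMode :: String
symlinkMode = "120000"

stagedFileSha :: String -> Maybe String
stagedFileSha line = case words line of
    [(':':_srcmode), dstmode, _srcsha, dstsha, _change]
        -- Only want files, not symlinks
        | dstmode /= symlinkMode -> Just dstsha
    -- Symlinks and lines that fail to parse are both skipped.
    _ -> Nothing
```

In the real code the resulting sha is passed to catKey to find out whether the staged blob is a pointer file; this sketch stops at extracting the sha.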


@@ -13,7 +13,7 @@ git-annex should use smudge/clean filters. v6 mode

 This could be partially dealt with in reconcileStaged. The next time
 git-annex runs it, it will notice the staged change, and it could update
-the worktree file that was not gotten/dropped before.
+the worktree file that was not gotten/dropped before. -- this is done now

 But, if a git mv is run, and then a git commit, reconcileStaged won't
 get a chance to notice the changes. git commit does run the clean filter.