fsck: Detect situations where annex.thin has caused data loss to the content of locked files.

In particular, when two files had the same content, and one was unlocked
and modified, with annex.thin that can corrupt the content of the
annex object, and so fsck on the other file should detect that.

getKeyStatus was relying on Database.Keys.getAssociatedFiles to tell
when a file is unlocked, but that can false positive because the
database can list old associated files.

Instead, separate out the case of unlocked object which has multiple
hardlinks when annex.thin is in use.
This commit is contained in:
Joey Hess 2019-03-18 15:53:54 -04:00
parent 60ca3ce043
commit d5ee5fef65
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
5 changed files with 73 additions and 26 deletions

View file

@ -29,6 +29,8 @@ git-annex (7.20190220) UNRELEASED; urgency=medium
to support filenames starting with dashes. to support filenames starting with dashes.
(To update the config of existing repositories, you can (To update the config of existing repositories, you can
re-run git-annex init.) re-run git-annex init.)
* fsck: Detect situations where annex.thin has caused data loss
to the content of locked files.
-- Joey Hess <id@joeyh.name> Wed, 20 Feb 2019 14:20:59 -0400 -- Joey Hess <id@joeyh.name> Wed, 20 Feb 2019 14:20:59 -0400

View file

@ -1,6 +1,6 @@
{- git-annex command {- git-annex command
- -
- Copyright 2010-2018 Joey Hess <id@joeyh.name> - Copyright 2010-2019 Joey Hess <id@joeyh.name>
- -
- Licensed under the GNU AGPL version 3 or higher. - Licensed under the GNU AGPL version 3 or higher.
-} -}
@ -227,7 +227,7 @@ verifyLocationLog :: Key -> KeyStatus -> ActionItem -> Annex Bool
verifyLocationLog key keystatus ai = do verifyLocationLog key keystatus ai = do
direct <- isDirect direct <- isDirect
obj <- calcRepo $ gitAnnexLocation key obj <- calcRepo $ gitAnnexLocation key
present <- if not direct && isKeyUnlocked keystatus present <- if not direct && isKeyUnlockedThin keystatus
then liftIO (doesFileExist obj) then liftIO (doesFileExist obj)
else inAnnex key else inAnnex key
u <- getUUID u <- getUUID
@ -235,9 +235,10 @@ verifyLocationLog key keystatus ai = do
{- Since we're checking that a key's object file is present, throw {- Since we're checking that a key's object file is present, throw
- in a permission fixup here too. -} - in a permission fixup here too. -}
when (present && not direct) $ do when (present && not direct) $ do
void $ tryIO $ if isKeyUnlocked keystatus void $ tryIO $ case keystatus of
then thawContent obj KeyUnlockedThin -> thawContent obj
else freezeContent obj KeyLockedThin -> thawContent obj
_ -> freezeContent obj
unlessM (isContentWritePermOk obj) $ unlessM (isContentWritePermOk obj) $
warning $ "** Unable to set correct write mode for " ++ obj ++ " ; perhaps you don't own that file" warning $ "** Unable to set correct write mode for " ++ obj ++ " ; perhaps you don't own that file"
whenM (liftIO $ doesDirectoryExist $ parentDir obj) $ whenM (liftIO $ doesDirectoryExist $ parentDir obj) $
@ -326,13 +327,11 @@ verifyAssociatedFiles key keystatus file = do
forM_ fs $ \f -> forM_ fs $ \f ->
unlessM (liftIO $ doesFileExist f) $ unlessM (liftIO $ doesFileExist f) $
void $ Direct.removeAssociatedFile key f void $ Direct.removeAssociatedFile key f
goindirect = case keystatus of goindirect = when (isKeyUnlockedThin keystatus) $ do
KeyUnlocked -> do
f <- inRepo $ toTopFilePath file f <- inRepo $ toTopFilePath file
afs <- Database.Keys.getAssociatedFiles key afs <- Database.Keys.getAssociatedFiles key
unless (getTopFilePath f `elem` map getTopFilePath afs) $ unless (getTopFilePath f `elem` map getTopFilePath afs) $
Database.Keys.addAssociatedFile key f Database.Keys.addAssociatedFile key f
_ -> return ()
verifyWorkTree :: Key -> FilePath -> Annex Bool verifyWorkTree :: Key -> FilePath -> Annex Bool
verifyWorkTree key file = do verifyWorkTree key file = do
@ -372,7 +371,7 @@ verifyWorkTree key file = do
- Not checked when a file is unlocked, or in direct mode. - Not checked when a file is unlocked, or in direct mode.
-} -}
checkKeySize :: Key -> KeyStatus -> ActionItem -> Annex Bool checkKeySize :: Key -> KeyStatus -> ActionItem -> Annex Bool
checkKeySize _ KeyUnlocked _ = return True checkKeySize _ KeyUnlockedThin _ = return True
checkKeySize key _ ai = do checkKeySize key _ ai = do
file <- calcRepo $ gitAnnexLocation key file <- calcRepo $ gitAnnexLocation key
ifM (liftIO $ doesFileExist file) ifM (liftIO $ doesFileExist file)
@ -450,7 +449,7 @@ checkBackend backend key keystatus afile = go =<< isDirect
where where
go False = do go False = do
content <- calcRepo $ gitAnnexLocation key content <- calcRepo $ gitAnnexLocation key
ifM (pure (isKeyUnlocked keystatus) <&&> (not <$> isUnmodified key content)) ifM (pure (isKeyUnlockedThin keystatus) <&&> (not <$> isUnmodified key content))
( nocheck ( nocheck
, checkBackendOr badContent backend key content (mkActionItem afile) , checkBackendOr badContent backend key content (mkActionItem afile)
) )
@ -700,28 +699,42 @@ withFsckDb (StartIncremental h) a = a h
withFsckDb NonIncremental _ = noop withFsckDb NonIncremental _ = noop
withFsckDb (ScheduleIncremental _ _ i) a = withFsckDb i a withFsckDb (ScheduleIncremental _ _ i) a = withFsckDb i a
data KeyStatus = KeyLocked | KeyUnlocked | KeyMissing data KeyStatus
= KeyMissing
| KeyPresent
| KeyUnlockedThin
-- ^ An annex.thin worktree file is hard linked to the object.
| KeyLockedThin
-- ^ The object has hard links, but the file being fscked
-- is not the one that hard links to it.
deriving (Show)
isKeyUnlocked :: KeyStatus -> Bool isKeyUnlockedThin :: KeyStatus -> Bool
isKeyUnlocked KeyUnlocked = True isKeyUnlockedThin KeyUnlockedThin = True
isKeyUnlocked KeyLocked = False isKeyUnlockedThin KeyLockedThin = False
isKeyUnlocked KeyMissing = False isKeyUnlockedThin KeyPresent = False
isKeyUnlockedThin KeyMissing = False
getKeyStatus :: Key -> Annex KeyStatus getKeyStatus :: Key -> Annex KeyStatus
getKeyStatus key = ifM isDirect getKeyStatus key = ifM isDirect
( return KeyUnlocked ( return KeyUnlockedThin
, catchDefaultIO KeyMissing $ do , catchDefaultIO KeyMissing $ do
unlocked <- not . null <$> Database.Keys.getAssociatedFiles key afs <- not . null <$> Database.Keys.getAssociatedFiles key
return $ if unlocked then KeyUnlocked else KeyLocked obj <- calcRepo $ gitAnnexLocation key
multilink <- ((> 1) . linkCount <$> liftIO (getFileStatus obj))
return $ if multilink && afs
then KeyUnlockedThin
else KeyPresent
) )
getKeyFileStatus :: Key -> FilePath -> Annex KeyStatus getKeyFileStatus :: Key -> FilePath -> Annex KeyStatus
getKeyFileStatus key file = do getKeyFileStatus key file = do
s <- getKeyStatus key s <- getKeyStatus key
case s of case s of
KeyLocked -> catchDefaultIO KeyLocked $ KeyUnlockedThin -> catchDefaultIO KeyUnlockedThin $
ifM (isJust <$> isAnnexLink file) ifM (isJust <$> isAnnexLink file)
( return KeyLocked ( return KeyLockedThin
, return KeyUnlocked , return KeyUnlockedThin
) )
_ -> return s _ -> return s

View file

@ -70,7 +70,7 @@ perform file oldkey oldbackend newbackend = go =<< genkey (fastMigrate oldbacken
go (Just (newkey, knowngoodcontent)) go (Just (newkey, knowngoodcontent))
| knowngoodcontent = finish newkey | knowngoodcontent = finish newkey
| otherwise = stopUnless checkcontent $ finish newkey | otherwise = stopUnless checkcontent $ finish newkey
checkcontent = Command.Fsck.checkBackend oldbackend oldkey Command.Fsck.KeyLocked afile checkcontent = Command.Fsck.checkBackend oldbackend oldkey Command.Fsck.KeyPresent afile
finish newkey = ifM (Command.ReKey.linkKey file oldkey newkey) finish newkey = ifM (Command.ReKey.linkKey file oldkey newkey)
( do ( do
_ <- copyMetaData oldkey newkey _ <- copyMetaData oldkey newkey

View file

@ -110,3 +110,5 @@ fsck ffoo ok
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
I'm hopeful! But just getting started :( I'm hopeful! But just getting started :(
> git-annex fsck made to detect and clean up after this, which I consider
> sufficient, so [[done]] --[[Joey]]

View file

@ -0,0 +1,30 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2019-03-18T18:16:53Z"
content="""
annex.thin does allow data loss to happen, and just because there's
another annex link in the repository that points at the same object
file does not mean data loss is not acceptable.
While it might seem that, when adding the file with dup content, git-annex
could notice that the object file has been modified, and so replace it with
the new copy from ffoo, that would leave the same problem in
this equivilant situation:
# echo hi > 1
# echo hi > 2
# git annex add 1 2
# git annex unlock 1
# echo bye > 1
# cat 2
bye
The surprising thing to me is that `git annex fsck 2` (or ffoo in your
example) doesn't find any problem with it, despite it pointing at a
changed object file that doesn't have the right hash.
Fsck doesn't want to complain about *expected* data loss when an unlocked
file has been modified and annex.thin caused the old version to be lost.
But, when fscking an annex symlink, that shouldn't apply.
"""]]