git-annex/Command/Lock.hs

121 lines
3.2 KiB
Haskell
Raw Normal View History

2010-11-09 19:59:49 +00:00
{- git-annex command
-
2015-12-11 14:42:18 +00:00
- Copyright 2010,2015 Joey Hess <id@joeyh.name>
2010-11-09 19:59:49 +00:00
-
- Licensed under the GNU GPL version 3 or higher.
-}
module Command.Lock where
import Command
2011-10-04 04:40:47 +00:00
import qualified Annex.Queue
2013-12-05 20:05:07 +00:00
import qualified Annex
2015-12-11 14:42:18 +00:00
import Annex.Version
import Annex.Content
import Annex.Link
import Annex.InodeSentinal
import Annex.Perms
import Annex.ReplaceFile
2015-12-11 14:42:18 +00:00
import Utility.InodeCache
import qualified Database.Keys
2015-12-22 17:23:33 +00:00
import Annex.Ingest
import Logs.Location
import Git.FilePath
cmd :: Command
cmd = notDirect $ withGlobalOptions annexedMatchingOptions $
command "lock" SectionCommon
"undo unlock command"
paramPaths (withParams seek)
2010-11-09 19:59:49 +00:00
seek :: CmdParams -> CommandSeek
seek ps = do
l <- workTreeItems ps
ifM versionSupportsUnlockedPointers
( withFilesInGit (whenAnnexed startNew) l
, do
withFilesOldUnlocked startOld l
withFilesOldUnlockedToBeCommitted startOld l
)
2010-11-11 22:54:52 +00:00
2015-12-11 14:42:18 +00:00
startNew :: FilePath -> Key -> CommandStart
startNew file key = ifM (isJust <$> isAnnexLink file)
( stop
, do
showStart "lock" file
2016-01-01 17:22:38 +00:00
go =<< liftIO (isPointerFile file)
)
2015-12-11 14:42:18 +00:00
where
go (Just key')
lock: Fix edge cases where data loss could occur in v6 mode. In the case where the pointer file is in place, and not the content of the object, lock's performNew was called with filemodified=True, which caused it to try to repopulate the object from an unmodified associated file, of which there were none. So, the content of the object got thrown away incorrectly. This was the cause (although not the root cause) of data loss in https://github.com/datalad/datalad/issues/1020 The same problem could also occur when the work tree file is modified, but the object is not, and lock is called with --force. Added a test case for this, since it's excercising the same code path and is easier to set up than the problem above. Note that this only occurred when the keys database did not have an inode cache recorded for the annex object. Normally, the annex object would be in there, but there are of course circumstances where the inode cache is out of sync with reality, since it's only a cache. Fixed by checking if the object is unmodified; if so we don't need to try to repopulate it. This does add an additional checksum to the unlock path, but it's already checksumming the worktree file in another case, so it doesn't slow it down overall. Further investigation found a similar problem occurred when smudge --clean is called on a file and the inode cache is not populated. cleanOldKeys deleted the unmodified old object file in this case. This was also fixed by checking if the object is unmodified. In general, use of getInodeCaches and sameInodeCache is potentially dangerous if the inode cache has not gotten populated for some reason. Better to use isUnmodified. I breifly auited other places that check the inode cache, and did not see any immediate problems, but it would be easy to miss this kind of problem.
2016-10-17 16:56:26 +00:00
| key' == key = cont
2015-12-11 14:42:18 +00:00
| otherwise = errorModified
go Nothing =
ifM (isUnmodified key file)
lock: Fix edge cases where data loss could occur in v6 mode. In the case where the pointer file is in place, and not the content of the object, lock's performNew was called with filemodified=True, which caused it to try to repopulate the object from an unmodified associated file, of which there were none. So, the content of the object got thrown away incorrectly. This was the cause (although not the root cause) of data loss in https://github.com/datalad/datalad/issues/1020 The same problem could also occur when the work tree file is modified, but the object is not, and lock is called with --force. Added a test case for this, since it's excercising the same code path and is easier to set up than the problem above. Note that this only occurred when the keys database did not have an inode cache recorded for the annex object. Normally, the annex object would be in there, but there are of course circumstances where the inode cache is out of sync with reality, since it's only a cache. Fixed by checking if the object is unmodified; if so we don't need to try to repopulate it. This does add an additional checksum to the unlock path, but it's already checksumming the worktree file in another case, so it doesn't slow it down overall. Further investigation found a similar problem occurred when smudge --clean is called on a file and the inode cache is not populated. cleanOldKeys deleted the unmodified old object file in this case. This was also fixed by checking if the object is unmodified. In general, use of getInodeCaches and sameInodeCache is potentially dangerous if the inode cache has not gotten populated for some reason. Better to use isUnmodified. I breifly auited other places that check the inode cache, and did not see any immediate problems, but it would be easy to miss this kind of problem.
2016-10-17 16:56:26 +00:00
( cont
2015-12-11 14:42:18 +00:00
, ifM (Annex.getState Annex.force)
lock: Fix edge cases where data loss could occur in v6 mode. In the case where the pointer file is in place, and not the content of the object, lock's performNew was called with filemodified=True, which caused it to try to repopulate the object from an unmodified associated file, of which there were none. So, the content of the object got thrown away incorrectly. This was the cause (although not the root cause) of data loss in https://github.com/datalad/datalad/issues/1020 The same problem could also occur when the work tree file is modified, but the object is not, and lock is called with --force. Added a test case for this, since it's excercising the same code path and is easier to set up than the problem above. Note that this only occurred when the keys database did not have an inode cache recorded for the annex object. Normally, the annex object would be in there, but there are of course circumstances where the inode cache is out of sync with reality, since it's only a cache. Fixed by checking if the object is unmodified; if so we don't need to try to repopulate it. This does add an additional checksum to the unlock path, but it's already checksumming the worktree file in another case, so it doesn't slow it down overall. Further investigation found a similar problem occurred when smudge --clean is called on a file and the inode cache is not populated. cleanOldKeys deleted the unmodified old object file in this case. This was also fixed by checking if the object is unmodified. In general, use of getInodeCaches and sameInodeCache is potentially dangerous if the inode cache has not gotten populated for some reason. Better to use isUnmodified. I breifly auited other places that check the inode cache, and did not see any immediate problems, but it would be easy to miss this kind of problem.
2016-10-17 16:56:26 +00:00
( cont
2015-12-11 14:42:18 +00:00
, errorModified
)
)
lock: Fix edge cases where data loss could occur in v6 mode. In the case where the pointer file is in place, and not the content of the object, lock's performNew was called with filemodified=True, which caused it to try to repopulate the object from an unmodified associated file, of which there were none. So, the content of the object got thrown away incorrectly. This was the cause (although not the root cause) of data loss in https://github.com/datalad/datalad/issues/1020 The same problem could also occur when the work tree file is modified, but the object is not, and lock is called with --force. Added a test case for this, since it's excercising the same code path and is easier to set up than the problem above. Note that this only occurred when the keys database did not have an inode cache recorded for the annex object. Normally, the annex object would be in there, but there are of course circumstances where the inode cache is out of sync with reality, since it's only a cache. Fixed by checking if the object is unmodified; if so we don't need to try to repopulate it. This does add an additional checksum to the unlock path, but it's already checksumming the worktree file in another case, so it doesn't slow it down overall. Further investigation found a similar problem occurred when smudge --clean is called on a file and the inode cache is not populated. cleanOldKeys deleted the unmodified old object file in this case. This was also fixed by checking if the object is unmodified. In general, use of getInodeCaches and sameInodeCache is potentially dangerous if the inode cache has not gotten populated for some reason. Better to use isUnmodified. I breifly auited other places that check the inode cache, and did not see any immediate problems, but it would be easy to miss this kind of problem.
2016-10-17 16:56:26 +00:00
cont = next $ performNew file key
2010-11-09 19:59:49 +00:00
lock: Fix edge cases where data loss could occur in v6 mode. In the case where the pointer file is in place, and not the content of the object, lock's performNew was called with filemodified=True, which caused it to try to repopulate the object from an unmodified associated file, of which there were none. So, the content of the object got thrown away incorrectly. This was the cause (although not the root cause) of data loss in https://github.com/datalad/datalad/issues/1020 The same problem could also occur when the work tree file is modified, but the object is not, and lock is called with --force. Added a test case for this, since it's excercising the same code path and is easier to set up than the problem above. Note that this only occurred when the keys database did not have an inode cache recorded for the annex object. Normally, the annex object would be in there, but there are of course circumstances where the inode cache is out of sync with reality, since it's only a cache. Fixed by checking if the object is unmodified; if so we don't need to try to repopulate it. This does add an additional checksum to the unlock path, but it's already checksumming the worktree file in another case, so it doesn't slow it down overall. Further investigation found a similar problem occurred when smudge --clean is called on a file and the inode cache is not populated. cleanOldKeys deleted the unmodified old object file in this case. This was also fixed by checking if the object is unmodified. In general, use of getInodeCaches and sameInodeCache is potentially dangerous if the inode cache has not gotten populated for some reason. Better to use isUnmodified. I breifly auited other places that check the inode cache, and did not see any immediate problems, but it would be easy to miss this kind of problem.
2016-10-17 16:56:26 +00:00
performNew :: FilePath -> Key -> CommandPerform
performNew file key = do
lockdown =<< calcRepo (gitAnnexLocation key)
2015-12-22 17:23:33 +00:00
addLink file key
2015-12-11 14:42:18 +00:00
=<< withTSDelta (liftIO . genInodeCache file)
next $ cleanupNew file key
where
lockdown obj = do
lock: Fix edge cases where data loss could occur in v6 mode. In the case where the pointer file is in place, and not the content of the object, lock's performNew was called with filemodified=True, which caused it to try to repopulate the object from an unmodified associated file, of which there were none. So, the content of the object got thrown away incorrectly. This was the cause (although not the root cause) of data loss in https://github.com/datalad/datalad/issues/1020 The same problem could also occur when the work tree file is modified, but the object is not, and lock is called with --force. Added a test case for this, since it's excercising the same code path and is easier to set up than the problem above. Note that this only occurred when the keys database did not have an inode cache recorded for the annex object. Normally, the annex object would be in there, but there are of course circumstances where the inode cache is out of sync with reality, since it's only a cache. Fixed by checking if the object is unmodified; if so we don't need to try to repopulate it. This does add an additional checksum to the unlock path, but it's already checksumming the worktree file in another case, so it doesn't slow it down overall. Further investigation found a similar problem occurred when smudge --clean is called on a file and the inode cache is not populated. cleanOldKeys deleted the unmodified old object file in this case. This was also fixed by checking if the object is unmodified. In general, use of getInodeCaches and sameInodeCache is potentially dangerous if the inode cache has not gotten populated for some reason. Better to use isUnmodified. I breifly auited other places that check the inode cache, and did not see any immediate problems, but it would be easy to miss this kind of problem.
2016-10-17 16:56:26 +00:00
ifM (isUnmodified key obj)
( breakhardlink obj
, repopulate obj
)
whenM (liftIO $ doesFileExist obj) $
freezeContent obj
-- It's ok if the file is hard linked to obj, but if some other
-- associated file is, we need to break that link to lock down obj.
breakhardlink obj = whenM (catchBoolIO $ (> 1) . linkCount <$> liftIO (getFileStatus obj)) $ do
mfc <- withTSDelta (liftIO . genInodeCache file)
unlessM (sameInodeCache obj (maybeToList mfc)) $ do
modifyContent obj $ replaceFile obj $ \tmp -> do
unlessM (checkedCopyFile key obj tmp Nothing) $
giveup "unable to lock file"
Database.Keys.storeInodeCaches key [obj]
-- Try to repopulate obj from an unmodified associated file.
lock: Fix edge cases where data loss could occur in v6 mode. In the case where the pointer file is in place, and not the content of the object, lock's performNew was called with filemodified=True, which caused it to try to repopulate the object from an unmodified associated file, of which there were none. So, the content of the object got thrown away incorrectly. This was the cause (although not the root cause) of data loss in https://github.com/datalad/datalad/issues/1020 The same problem could also occur when the work tree file is modified, but the object is not, and lock is called with --force. Added a test case for this, since it's excercising the same code path and is easier to set up than the problem above. Note that this only occurred when the keys database did not have an inode cache recorded for the annex object. Normally, the annex object would be in there, but there are of course circumstances where the inode cache is out of sync with reality, since it's only a cache. Fixed by checking if the object is unmodified; if so we don't need to try to repopulate it. This does add an additional checksum to the unlock path, but it's already checksumming the worktree file in another case, so it doesn't slow it down overall. Further investigation found a similar problem occurred when smudge --clean is called on a file and the inode cache is not populated. cleanOldKeys deleted the unmodified old object file in this case. This was also fixed by checking if the object is unmodified. In general, use of getInodeCaches and sameInodeCache is potentially dangerous if the inode cache has not gotten populated for some reason. Better to use isUnmodified. I breifly auited other places that check the inode cache, and did not see any immediate problems, but it would be easy to miss this kind of problem.
2016-10-17 16:56:26 +00:00
repopulate obj = modifyContent obj $ do
g <- Annex.gitRepo
fs <- map (`fromTopFilePath` g)
<$> Database.Keys.getAssociatedFiles key
mfile <- firstM (isUnmodified key) fs
liftIO $ nukeFile obj
case mfile of
Just unmodified ->
unlessM (checkedCopyFile key unmodified obj Nothing)
lostcontent
Nothing -> lostcontent
lostcontent = logStatus key InfoMissing
2015-12-11 14:42:18 +00:00
cleanupNew :: FilePath -> Key -> CommandCleanup
cleanupNew file key = do
Database.Keys.removeAssociatedFile key =<< inRepo (toTopFilePath file)
2015-12-11 14:42:18 +00:00
return True
startOld :: FilePath -> CommandStart
startOld file = do
showStart "lock" file
unlessM (Annex.getState Annex.force)
errorModified
next $ performOld file
performOld :: FilePath -> CommandPerform
performOld file = do
Annex.Queue.addCommand "checkout" [Param "--"] [file]
2015-12-11 14:42:18 +00:00
next $ return True
errorModified :: a
errorModified = giveup "Locking this file would discard any changes you have made to it. Use 'git annex add' to stage your changes. (Or, use --force to override)"