2010-11-09 19:59:49 +00:00
{- git - annex command
-
2015-12-11 14:42:18 +00:00
- Copyright 2010 , 2015 Joey Hess < id @ joeyh . name >
2010-11-09 19:59:49 +00:00
-
2019-03-13 19:48:14 +00:00
- Licensed under the GNU AGPL version 3 or higher .
2010-11-09 19:59:49 +00:00
- }
module Command.Lock where
import Command
2013-12-05 20:05:07 +00:00
import qualified Annex
2015-12-11 14:42:18 +00:00
import Annex.Content
import Annex.Link
import Annex.InodeSentinal
2015-12-11 19:13:36 +00:00
import Annex.Perms
import Annex.ReplaceFile
2015-12-11 14:42:18 +00:00
import Utility.InodeCache
import qualified Database.Keys
2015-12-22 17:23:33 +00:00
import Annex.Ingest
2015-12-11 19:13:36 +00:00
import Logs.Location
2016-01-05 21:22:19 +00:00
import Git.FilePath
2019-12-11 18:12:22 +00:00
import qualified Utility.RawFilePath as R
2010-12-30 19:06:26 +00:00
2015-07-08 16:33:27 +00:00
cmd :: Command
2022-06-29 17:28:08 +00:00
cmd = withAnnexOptions [ jsonOptions , annexedMatchingOptions ] $
2015-07-08 19:08:02 +00:00
command " lock " SectionCommon
" undo unlock command "
paramPaths ( withParams seek )
2010-11-09 19:59:49 +00:00
2015-07-08 19:08:02 +00:00
seek :: CmdParams -> CommandSeek
2020-07-13 21:04:02 +00:00
seek ps = withFilesInGitAnnex ww seeker =<< workTreeItems ww ps
2020-05-28 19:55:17 +00:00
where
ww = WarnUnmatchLsFiles
2020-07-13 21:04:02 +00:00
seeker = AnnexedFileSeeker
2020-07-22 18:23:28 +00:00
{ startAction = start
2020-07-13 21:04:02 +00:00
, checkContentPresent = Nothing
, usesLocationLog = False
}
2010-11-11 22:54:52 +00:00
2020-09-14 20:49:33 +00:00
start :: SeekInput -> RawFilePath -> Key -> CommandStart
start si file key = ifM ( isJust <$> isAnnexLink file )
2015-12-11 19:13:36 +00:00
( stop
2020-09-14 20:49:33 +00:00
, starting " lock " ( mkActionItem ( key , file ) ) si $
2016-01-01 17:22:38 +00:00
go =<< liftIO ( isPointerFile file )
2015-12-11 19:13:36 +00:00
)
2015-12-11 14:42:18 +00:00
where
go ( Just key' )
lock: Fix edge cases where data loss could occur in v6 mode.
In the case where the pointer file is in place, and not the content
of the object, lock's performNew was called with filemodified=True,
which caused it to try to repopulate the object from an unmodified
associated file, of which there were none. So, the content of the object
got thrown away incorrectly. This was the cause (although not the root
cause) of data loss in https://github.com/datalad/datalad/issues/1020
The same problem could also occur when the work tree file is modified,
but the object is not, and lock is called with --force. Added a test case
for this, since it's excercising the same code path and is easier to set up
than the problem above.
Note that this only occurred when the keys database did not have an inode
cache recorded for the annex object. Normally, the annex object would be in
there, but there are of course circumstances where the inode cache is out
of sync with reality, since it's only a cache.
Fixed by checking if the object is unmodified; if so we don't need to
try to repopulate it. This does add an additional checksum to the unlock
path, but it's already checksumming the worktree file in another case,
so it doesn't slow it down overall.
Further investigation found a similar problem occurred when smudge --clean
is called on a file and the inode cache is not populated. cleanOldKeys
deleted the unmodified old object file in this case. This was also
fixed by checking if the object is unmodified.
In general, use of getInodeCaches and sameInodeCache is potentially
dangerous if the inode cache has not gotten populated for some reason.
Better to use isUnmodified. I breifly auited other places that check the
inode cache, and did not see any immediate problems, but it would be easy
to miss this kind of problem.
2016-10-17 16:56:26 +00:00
| key' == key = cont
2015-12-11 14:42:18 +00:00
| otherwise = errorModified
go Nothing =
2019-12-11 18:12:22 +00:00
ifM ( isUnmodified key file )
lock: Fix edge cases where data loss could occur in v6 mode.
In the case where the pointer file is in place, and not the content
of the object, lock's performNew was called with filemodified=True,
which caused it to try to repopulate the object from an unmodified
associated file, of which there were none. So, the content of the object
got thrown away incorrectly. This was the cause (although not the root
cause) of data loss in https://github.com/datalad/datalad/issues/1020
The same problem could also occur when the work tree file is modified,
but the object is not, and lock is called with --force. Added a test case
for this, since it's excercising the same code path and is easier to set up
than the problem above.
Note that this only occurred when the keys database did not have an inode
cache recorded for the annex object. Normally, the annex object would be in
there, but there are of course circumstances where the inode cache is out
of sync with reality, since it's only a cache.
Fixed by checking if the object is unmodified; if so we don't need to
try to repopulate it. This does add an additional checksum to the unlock
path, but it's already checksumming the worktree file in another case,
so it doesn't slow it down overall.
Further investigation found a similar problem occurred when smudge --clean
is called on a file and the inode cache is not populated. cleanOldKeys
deleted the unmodified old object file in this case. This was also
fixed by checking if the object is unmodified.
In general, use of getInodeCaches and sameInodeCache is potentially
dangerous if the inode cache has not gotten populated for some reason.
Better to use isUnmodified. I breifly auited other places that check the
inode cache, and did not see any immediate problems, but it would be easy
to miss this kind of problem.
2016-10-17 16:56:26 +00:00
( cont
2022-06-28 19:28:14 +00:00
, ifM ( Annex . getRead Annex . force )
lock: Fix edge cases where data loss could occur in v6 mode.
In the case where the pointer file is in place, and not the content
of the object, lock's performNew was called with filemodified=True,
which caused it to try to repopulate the object from an unmodified
associated file, of which there were none. So, the content of the object
got thrown away incorrectly. This was the cause (although not the root
cause) of data loss in https://github.com/datalad/datalad/issues/1020
The same problem could also occur when the work tree file is modified,
but the object is not, and lock is called with --force. Added a test case
for this, since it's excercising the same code path and is easier to set up
than the problem above.
Note that this only occurred when the keys database did not have an inode
cache recorded for the annex object. Normally, the annex object would be in
there, but there are of course circumstances where the inode cache is out
of sync with reality, since it's only a cache.
Fixed by checking if the object is unmodified; if so we don't need to
try to repopulate it. This does add an additional checksum to the unlock
path, but it's already checksumming the worktree file in another case,
so it doesn't slow it down overall.
Further investigation found a similar problem occurred when smudge --clean
is called on a file and the inode cache is not populated. cleanOldKeys
deleted the unmodified old object file in this case. This was also
fixed by checking if the object is unmodified.
In general, use of getInodeCaches and sameInodeCache is potentially
dangerous if the inode cache has not gotten populated for some reason.
Better to use isUnmodified. I breifly auited other places that check the
inode cache, and did not see any immediate problems, but it would be easy
to miss this kind of problem.
2016-10-17 16:56:26 +00:00
( cont
2015-12-11 14:42:18 +00:00
, errorModified
)
)
2020-07-10 19:40:06 +00:00
cont = perform file key
2010-11-09 19:59:49 +00:00
2020-07-10 19:40:06 +00:00
perform :: RawFilePath -> Key -> CommandPerform
perform file key = do
2015-12-11 19:13:36 +00:00
lockdown =<< calcRepo ( gitAnnexLocation key )
2022-06-14 18:40:55 +00:00
addSymlink file key =<< withTSDelta ( liftIO . genInodeCache file )
include locked files in the keys database associated files
Before only unlocked files were included.
The initial scan now scans for locked as well as unlocked files. This
does mean it gets a little bit slower, although I optimised it as well
as I think it can be.
reconcileStaged changed to diff from the current index to the tree of
the previous index. This lets it handle deletions as well, removing
associated files for both locked and unlocked files, which did not
always happen before.
On upgrade, there will be no recorded previous tree, so it will diff
from the empty tree to current index, and so will fully populate the
associated files, as well as removing any stale associated files
that were present due to them not being removed before.
reconcileStaged now does a bit more work. Most of the time, this will
just be due to running more often, after some change is made to the
index, and since there will be few changes since the last time, it will
not be a noticable overhead. What may turn out to be a noticable
slowdown is after changing to a branch, it has to go through the diff
from the previous index to the new one, and if there are lots of
changes, that could take a long time. Also, after adding a lot of files,
or deleting a lot of files, or moving a large subdirectory, etc.
Command.Lock used removeAssociatedFile, but now that's wrong because a
newly locked file still needs to have its associated file tracked.
Command.Rekey used removeAssociatedFile when the file was unlocked.
It could remove it also when it's locked, but it is not really
necessary, because it changes the index, and so the next time git-annex
run and accesses the keys db, reconcileStaged will run and update it.
There are probably several other places that use addAssociatedFile and
don't need to any more for similar reasons. But there's no harm in
keeping them, and it probably is a good idea to, if only to support
mixing this with older versions of git-annex.
However, mixing this and older versions does risk reconcileStaged not
running, if the older version already ran it on a given index state. So
it's not a good idea to mix versions. This problem could be dealt with
by changing the name of the gitAnnexKeysDbIndexCache, but that would
leave the old file dangling, or it would need to keep trying to remove
it.
2021-05-21 19:47:37 +00:00
next $ return True
2015-12-11 19:13:36 +00:00
where
lockdown obj = do
lock: Fix edge cases where data loss could occur in v6 mode.
In the case where the pointer file is in place, and not the content
of the object, lock's performNew was called with filemodified=True,
which caused it to try to repopulate the object from an unmodified
associated file, of which there were none. So, the content of the object
got thrown away incorrectly. This was the cause (although not the root
cause) of data loss in https://github.com/datalad/datalad/issues/1020
The same problem could also occur when the work tree file is modified,
but the object is not, and lock is called with --force. Added a test case
for this, since it's excercising the same code path and is easier to set up
than the problem above.
Note that this only occurred when the keys database did not have an inode
cache recorded for the annex object. Normally, the annex object would be in
there, but there are of course circumstances where the inode cache is out
of sync with reality, since it's only a cache.
Fixed by checking if the object is unmodified; if so we don't need to
try to repopulate it. This does add an additional checksum to the unlock
path, but it's already checksumming the worktree file in another case,
so it doesn't slow it down overall.
Further investigation found a similar problem occurred when smudge --clean
is called on a file and the inode cache is not populated. cleanOldKeys
deleted the unmodified old object file in this case. This was also
fixed by checking if the object is unmodified.
In general, use of getInodeCaches and sameInodeCache is potentially
dangerous if the inode cache has not gotten populated for some reason.
Better to use isUnmodified. I breifly auited other places that check the
inode cache, and did not see any immediate problems, but it would be easy
to miss this kind of problem.
2016-10-17 16:56:26 +00:00
ifM ( isUnmodified key obj )
2015-12-11 19:13:36 +00:00
( breakhardlink obj
2020-11-02 20:31:28 +00:00
, repopulate obj
2015-12-11 19:13:36 +00:00
)
2019-12-11 18:12:22 +00:00
whenM ( liftIO $ R . doesPathExist obj ) $
2020-11-06 18:10:58 +00:00
freezeContent obj
2015-12-11 19:13:36 +00:00
-- It's ok if the file is hard linked to obj, but if some other
-- associated file is, we need to break that link to lock down obj.
2019-12-11 18:12:22 +00:00
breakhardlink obj = whenM ( catchBoolIO $ ( > 1 ) . linkCount <$> liftIO ( R . getFileStatus obj ) ) $ do
mfc <- withTSDelta ( liftIO . genInodeCache file )
2015-12-11 19:13:36 +00:00
unlessM ( sameInodeCache obj ( maybeToList mfc ) ) $ do
2022-05-16 16:34:56 +00:00
modifyContentDir obj $ replaceGitAnnexDirFile ( fromRawFilePath obj ) $ \ tmp -> do
2020-11-06 18:10:58 +00:00
unlessM ( checkedCopyFile key obj ( toRawFilePath tmp ) Nothing ) $
2016-11-16 01:29:54 +00:00
giveup " unable to lock file "
2015-12-11 19:13:36 +00:00
Database . Keys . storeInodeCaches key [ obj ]
-- Try to repopulate obj from an unmodified associated file.
2022-05-16 16:34:56 +00:00
repopulate obj = modifyContentDir obj $ do
lock: Fix edge cases where data loss could occur in v6 mode.
In the case where the pointer file is in place, and not the content
of the object, lock's performNew was called with filemodified=True,
which caused it to try to repopulate the object from an unmodified
associated file, of which there were none. So, the content of the object
got thrown away incorrectly. This was the cause (although not the root
cause) of data loss in https://github.com/datalad/datalad/issues/1020
The same problem could also occur when the work tree file is modified,
but the object is not, and lock is called with --force. Added a test case
for this, since it's excercising the same code path and is easier to set up
than the problem above.
Note that this only occurred when the keys database did not have an inode
cache recorded for the annex object. Normally, the annex object would be in
there, but there are of course circumstances where the inode cache is out
of sync with reality, since it's only a cache.
Fixed by checking if the object is unmodified; if so we don't need to
try to repopulate it. This does add an additional checksum to the unlock
path, but it's already checksumming the worktree file in another case,
so it doesn't slow it down overall.
Further investigation found a similar problem occurred when smudge --clean
is called on a file and the inode cache is not populated. cleanOldKeys
deleted the unmodified old object file in this case. This was also
fixed by checking if the object is unmodified.
In general, use of getInodeCaches and sameInodeCache is potentially
dangerous if the inode cache has not gotten populated for some reason.
Better to use isUnmodified. I breifly auited other places that check the
inode cache, and did not see any immediate problems, but it would be easy
to miss this kind of problem.
2016-10-17 16:56:26 +00:00
g <- Annex . gitRepo
2019-12-11 18:12:22 +00:00
fs <- map ( ` fromTopFilePath ` g )
lock: Fix edge cases where data loss could occur in v6 mode.
In the case where the pointer file is in place, and not the content
of the object, lock's performNew was called with filemodified=True,
which caused it to try to repopulate the object from an unmodified
associated file, of which there were none. So, the content of the object
got thrown away incorrectly. This was the cause (although not the root
cause) of data loss in https://github.com/datalad/datalad/issues/1020
The same problem could also occur when the work tree file is modified,
but the object is not, and lock is called with --force. Added a test case
for this, since it's excercising the same code path and is easier to set up
than the problem above.
Note that this only occurred when the keys database did not have an inode
cache recorded for the annex object. Normally, the annex object would be in
there, but there are of course circumstances where the inode cache is out
of sync with reality, since it's only a cache.
Fixed by checking if the object is unmodified; if so we don't need to
try to repopulate it. This does add an additional checksum to the unlock
path, but it's already checksumming the worktree file in another case,
so it doesn't slow it down overall.
Further investigation found a similar problem occurred when smudge --clean
is called on a file and the inode cache is not populated. cleanOldKeys
deleted the unmodified old object file in this case. This was also
fixed by checking if the object is unmodified.
In general, use of getInodeCaches and sameInodeCache is potentially
dangerous if the inode cache has not gotten populated for some reason.
Better to use isUnmodified. I breifly auited other places that check the
inode cache, and did not see any immediate problems, but it would be easy
to miss this kind of problem.
2016-10-17 16:56:26 +00:00
<$> Database . Keys . getAssociatedFiles key
mfile <- firstM ( isUnmodified key ) fs
2020-11-02 20:31:28 +00:00
liftIO $ removeWhenExistsWith R . removeLink obj
lock: Fix edge cases where data loss could occur in v6 mode.
In the case where the pointer file is in place, and not the content
of the object, lock's performNew was called with filemodified=True,
which caused it to try to repopulate the object from an unmodified
associated file, of which there were none. So, the content of the object
got thrown away incorrectly. This was the cause (although not the root
cause) of data loss in https://github.com/datalad/datalad/issues/1020
The same problem could also occur when the work tree file is modified,
but the object is not, and lock is called with --force. Added a test case
for this, since it's excercising the same code path and is easier to set up
than the problem above.
Note that this only occurred when the keys database did not have an inode
cache recorded for the annex object. Normally, the annex object would be in
there, but there are of course circumstances where the inode cache is out
of sync with reality, since it's only a cache.
Fixed by checking if the object is unmodified; if so we don't need to
try to repopulate it. This does add an additional checksum to the unlock
path, but it's already checksumming the worktree file in another case,
so it doesn't slow it down overall.
Further investigation found a similar problem occurred when smudge --clean
is called on a file and the inode cache is not populated. cleanOldKeys
deleted the unmodified old object file in this case. This was also
fixed by checking if the object is unmodified.
In general, use of getInodeCaches and sameInodeCache is potentially
dangerous if the inode cache has not gotten populated for some reason.
Better to use isUnmodified. I breifly auited other places that check the
inode cache, and did not see any immediate problems, but it would be easy
to miss this kind of problem.
2016-10-17 16:56:26 +00:00
case mfile of
Just unmodified ->
2021-07-26 18:12:58 +00:00
ifM ( checkedCopyFile key unmodified obj Nothing )
( Database . Keys . storeInodeCaches key [ obj ]
, lostcontent
)
lock: Fix edge cases where data loss could occur in v6 mode.
In the case where the pointer file is in place, and not the content
of the object, lock's performNew was called with filemodified=True,
which caused it to try to repopulate the object from an unmodified
associated file, of which there were none. So, the content of the object
got thrown away incorrectly. This was the cause (although not the root
cause) of data loss in https://github.com/datalad/datalad/issues/1020
The same problem could also occur when the work tree file is modified,
but the object is not, and lock is called with --force. Added a test case
for this, since it's excercising the same code path and is easier to set up
than the problem above.
Note that this only occurred when the keys database did not have an inode
cache recorded for the annex object. Normally, the annex object would be in
there, but there are of course circumstances where the inode cache is out
of sync with reality, since it's only a cache.
Fixed by checking if the object is unmodified; if so we don't need to
try to repopulate it. This does add an additional checksum to the unlock
path, but it's already checksumming the worktree file in another case,
so it doesn't slow it down overall.
Further investigation found a similar problem occurred when smudge --clean
is called on a file and the inode cache is not populated. cleanOldKeys
deleted the unmodified old object file in this case. This was also
fixed by checking if the object is unmodified.
In general, use of getInodeCaches and sameInodeCache is potentially
dangerous if the inode cache has not gotten populated for some reason.
Better to use isUnmodified. I breifly auited other places that check the
inode cache, and did not see any immediate problems, but it would be easy
to miss this kind of problem.
2016-10-17 16:56:26 +00:00
Nothing -> lostcontent
2015-12-11 19:13:36 +00:00
lostcontent = logStatus key InfoMissing
2015-12-11 14:42:18 +00:00
errorModified :: a
2016-11-16 01:29:54 +00:00
errorModified = giveup " Locking this file would discard any changes you have made to it. Use 'git annex add' to stage your changes. (Or, use --force to override) "