include locked files in the keys database associated files
Before, only unlocked files were included. The initial scan now scans for locked as well as unlocked files. This does mean it gets a little bit slower, although I optimised it as well as I think it can be.

reconcileStaged changed to diff from the tree of the previous index to a tree of the current index. This lets it handle deletions as well, removing associated files for both locked and unlocked files, which did not always happen before.

On upgrade, there will be no recorded previous tree, so it will diff from the empty tree to the current index. It will therefore fully populate the associated files, and also remove any stale associated files that were present because they were not removed before.

reconcileStaged now does a bit more work. Most of the time, this will just be due to running more often, after some change is made to the index. Since there will be few changes since the last time, it will not be a noticeable overhead. What may turn out to be a noticeable slowdown is switching branches: it has to go through the diff from the previous index to the new one, and if there are lots of changes, that could take a long time. The same goes for adding or deleting a lot of files, moving a large subdirectory, etc.

Command.Lock used removeAssociatedFile, but now that's wrong because a newly locked file still needs to have its associated file tracked.

Command.ReKey used removeAssociatedFile when the file was unlocked. It could also remove it when the file is locked, but that is not really necessary. Rekeying changes the index, so the next time git-annex runs and accesses the keys db, reconcileStaged will run and update it.

There are probably several other places that use addAssociatedFile and no longer need to, for similar reasons. But there's no harm in keeping them, and it is probably a good idea to, if only to support mixing this with older versions of git-annex.
However, mixing this with older versions does risk reconcileStaged not running if the older version has already run it on a given index state. So it's not a good idea to mix versions. This problem could be dealt with by changing the name of the gitAnnexKeysDbIndexCache, but that would leave the old file dangling, or it would need to keep trying to remove it.
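The new reconcileStaged behaviour can be pictured with a small pure model. This is a sketch under stated assumptions, not git-annex's code. `Key`, `AssocDb`, `DiffEntry`, `addAssociated`, `removeAssociated` and `applyDiff` are hypothetical names, and the keys database is modeled as a `Set` of (key, file) pairs. The model also assumes that recording an association replaces any association of the same file with a different key. Each diff entry drops the old key's association and records the new one, so deletions and locked files go through the same path as modified unlocked files.

```haskell
import qualified Data.Set as Set
import Data.List (foldl')

-- Toy model only: keys and paths are plain strings, and the keys
-- database's associated-files table is a set of (key, file) pairs.
type Key = String
type AssocDb = Set.Set (Key, FilePath)

-- One entry of a tree-to-tree diff: the path, the key the old blob
-- pointed to (Nothing if absent or not annexed), and the new key.
data DiffEntry = DiffEntry FilePath (Maybe Key) (Maybe Key)

-- Assumption of this model: recording an association first drops any
-- association of the same file with a different key.
addAssociated :: Key -> FilePath -> AssocDb -> AssocDb
addAssociated k f db =
  Set.insert (k, f) (Set.filter (\(k', f') -> f' /= f || k' == k) db)

removeAssociated :: Key -> FilePath -> AssocDb -> AssocDb
removeAssociated k f = Set.delete (k, f)

-- Forget the old key's association, then record the new one.
-- Locked and unlocked files are treated identically.
applyEntry :: AssocDb -> DiffEntry -> AssocDb
applyEntry db (DiffEntry f old new) =
  let db' = maybe db (\k -> removeAssociated k f db) old
  in maybe db' (\k -> addAssociated k f db') new

applyDiff :: AssocDb -> [DiffEntry] -> AssocDb
applyDiff = foldl' applyEntry

-- A stale association left over from before the upgrade.
staleDb :: AssocDb
staleDb = Set.fromList [("K0", "unlocked.txt")]

-- Upgrade: no previous tree is recorded, so the diff starts from the
-- empty tree and every file in the index appears as an addition.
fromEmptyTree :: [DiffEntry]
fromEmptyTree =
  [ DiffEntry "locked.txt" Nothing (Just "K1")
  , DiffEntry "unlocked.txt" Nothing (Just "K2")
  ]

-- A later index change: locked.txt is deleted, unlocked.txt is modified.
nextChange :: [DiffEntry]
nextChange =
  [ DiffEntry "locked.txt" (Just "K1") Nothing
  , DiffEntry "unlocked.txt" (Just "K2") (Just "K3")
  ]

main :: IO ()
main = do
  let afterUpgrade = applyDiff staleDb fromEmptyTree
  print (Set.toList afterUpgrade)
  print (Set.toList (applyDiff afterUpgrade nextChange))
```

In this model, a stale entry is only replaced when its path appears in the diff. Diffing from the empty tree on upgrade makes every path in the index appear once.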

parent df0b75cdc4
commit 428c91606b
8 changed files with 81 additions and 67 deletions

@@ -134,8 +134,8 @@ initialize' mversion = checkInitializeAllowed $ do
 else deconfigureSmudgeFilter
 unlessM isBareRepo $ do
 when supportunlocked $ do
-showSideAction "scanning for unlocked files"
-scanUnlockedFiles
+showSideAction "scanning for annexed files"
+scanAnnexedFiles
 hookWrite postCheckoutHook
 hookWrite postMergeHook
 AdjustedBranch.checkAdjustedClone >>= \case

@@ -66,19 +66,19 @@ whenAnnexed a file = ifAnnexed file (a file) (return Nothing)
 ifAnnexed :: RawFilePath -> (Key -> Annex a) -> Annex a -> Annex a
 ifAnnexed file yes no = maybe no yes =<< lookupKey file

-{- Find all unlocked files and update the keys database for them.
+{- Find all annexed files and update the keys database for them.
 -
 - This is expensive, and so normally the associated files are updated
 - incrementally when changes are noticed. So, this only needs to be done
-- when initializing/upgrading repository.
+- when initializing/upgrading a repository.
 -
-- Also, the content for the unlocked file may already be present as
+- Also, the content for an unlocked file may already be present as
 - an annex object. If so, populate the pointer file with it.
 - But if worktree file does not have a pointer file's content, it is left
 - as-is.
 -}
-scanUnlockedFiles :: Annex ()
-scanUnlockedFiles = whenM (inRepo Git.Ref.headExists <&&> not <$> isBareRepo) $ do
+scanAnnexedFiles :: Annex ()
+scanAnnexedFiles = whenM (inRepo Git.Ref.headExists <&&> not <$> isBareRepo) $ do
 dropold <- liftIO $ newMVar $
 Database.Keys.runWriter $
 liftIO . Database.Keys.SQL.dropAllAssociatedFiles

@@ -87,9 +87,10 @@ scanUnlockedFiles = whenM (inRepo Git.Ref.headExists <&&> not <$> isBareRepo) $
 (Git.LsTree.LsTreeLong False)
 Git.Ref.headRef
 forM_ l $ \i ->
-when (isregfile i) $
-maybe noop (add dropold i)
-=<< catKey (Git.LsTree.sha i)
+maybe noop (add dropold i)
+=<< catKey'
+(Git.LsTree.sha i)
+(fromMaybe 0 (Git.LsTree.size i))
 liftIO $ void cleanup
 where
 isregfile i = case Git.Types.toTreeItemType (Git.LsTree.mode i) of

@@ -101,7 +102,7 @@ scanUnlockedFiles = whenM (inRepo Git.Ref.headExists <&&> not <$> isBareRepo) $
 let tf = Git.LsTree.file i
 Database.Keys.runWriter $
 liftIO . Database.Keys.SQL.addAssociatedFileFast k tf
-whenM (inAnnex k) $ do
+whenM (pure (isregfile i) <&&> inAnnex k) $ do
 f <- fromRepo $ fromTopFilePath tf
 liftIO (isPointerFile f) >>= \case
 Just k' | k' == k -> do

@@ -62,7 +62,7 @@ perform file key = do
 lockdown =<< calcRepo (gitAnnexLocation key)
 addLink (CheckGitIgnore False) file key
 =<< withTSDelta (liftIO . genInodeCache file)
-next $ cleanup file key
+next $ return True
 where
 lockdown obj = do
 ifM (isUnmodified key obj)

@@ -97,10 +97,5 @@ perform file key = do

 lostcontent = logStatus key InfoMissing

-cleanup :: RawFilePath -> Key -> CommandCleanup
-cleanup file key = do
-Database.Keys.removeAssociatedFile key =<< inRepo (toTopFilePath file)
-return True
-
 errorModified :: a
 errorModified = giveup "Locking this file would discard any changes you have made to it. Use 'git annex add' to stage your changes. (Or, use --force to override)"

@@ -86,7 +86,7 @@ perform file oldkey oldbackend newbackend = go =<< genkey (fastMigrate oldbacken
 urls <- getUrls oldkey
 forM_ urls $ \url ->
 setUrlPresent newkey url
-next $ Command.ReKey.cleanup file oldkey newkey
+next $ Command.ReKey.cleanup file newkey
 , giveup "failed creating link from old to new key"
 )
 genkey Nothing = do

@@ -15,8 +15,6 @@ import Annex.Link
 import Annex.Perms
 import Annex.ReplaceFile
 import Logs.Location
-import Git.FilePath
-import qualified Database.Keys
 import Annex.InodeSentinal
 import Utility.InodeCache
 import qualified Utility.RawFilePath as R

@@ -79,7 +77,7 @@ perform file oldkey newkey = do
 , unlessM (Annex.getState Annex.force) $
 giveup $ fromRawFilePath file ++ " is not available (use --force to override)"
 )
-next $ cleanup file oldkey newkey
+next $ cleanup file newkey

 {- Make a hard link to the old key content (when supported),
 - to avoid wasting disk space. -}

@@ -119,8 +117,8 @@ linkKey file oldkey newkey = ifM (isJust <$> isAnnexLink file)
 LinkAnnexNoop -> True
 )

-cleanup :: RawFilePath -> Key -> Key -> CommandCleanup
-cleanup file oldkey newkey = do
+cleanup :: RawFilePath -> Key -> CommandCleanup
+cleanup file newkey = do
 ifM (isJust <$> isAnnexLink file)
 ( do
 -- Update symlink to use the new key.

@@ -131,8 +129,6 @@ cleanup file oldkey newkey = do
 liftIO $ whenM (isJust <$> isPointerFile file) $
 writePointerFile file newkey mode
 stagePointerFile file mode =<< hashPointerFile newkey
-Database.Keys.removeAssociatedFile oldkey
-=<< inRepo (toTopFilePath file)
 )
 whenM (inAnnex newkey) $
 logStatus newkey InfoPresent

@@ -1,6 +1,6 @@
 {- Sqlite database of information about Keys
 -
-- Copyright 2015-2019 Joey Hess <id@joeyh.name>
+- Copyright 2015-2021 Joey Hess <id@joeyh.name>
 -
 - Licensed under the GNU AGPL version 3 or higher.
 -}

@@ -44,6 +44,9 @@ import Git.FilePath
 import Git.Command
 import Git.Types
 import Git.Index
+import Git.Sha
+import Git.Branch (writeTree, update')
+import qualified Git.Ref
 import Config.Smudge
 import qualified Utility.RawFilePath as R


@@ -191,20 +194,17 @@ removeInodeCache = runWriterIO . SQL.removeInodeCache
 isInodeKnown :: InodeCache -> SentinalStatus -> Annex Bool
 isInodeKnown i s = or <$> runReaderIO ((:[]) <$$> SQL.isInodeKnown i s)

-{- Looks at staged changes to find when unlocked files are copied/moved,
-- and updates associated files in the keys database.
+{- Looks at staged changes to annexed files, and updates the keys database,
+- so that its information is consistent with the state of the repository.
 -
-- Since staged changes can be dropped later, does not remove any
-- associated files; only adds new associated files.
--
-- This needs to be run before querying the keys database so that
-- information is consistent with the state of the repository.
+- This is run with a lock held, so only one process can be running this at
+- a time.
 -
 - To avoid unncessary work, the index file is statted, and if it's not
 - changed since last time this was run, nothing is done.
 -
-- Note that this is run with a lock held, so only one process can be
-- running this at a time.
+- A tree is generated from the index, and the diff between that tree
+- and the last processed tree is examined for changes.
 -
 - This also cleans up after a race between eg a git mv and git-annex
 - get/drop/similar. If git moves the file between this being run and the

@@ -233,17 +233,28 @@ reconcileStaged qh = do
 )
 Nothing -> noop
 where
+lastindexref = Ref "refs/annex/last-index"
+
 go cur indexcache = do
-(l, cleanup) <- inRepo $ pipeNullSplit' diff
-changed <- procdiff l False
-void $ liftIO cleanup
--- Flush database changes immediately
--- so other processes can see them.
-when changed $
-liftIO $ H.flushDbQueue qh
-liftIO $ writeFile indexcache $ showInodeCache cur
+oldtree <- fromMaybe emptyTree
+<$> inRepo (Git.Ref.sha lastindexref)
+newtree <- inRepo writeTree
+when (oldtree /= newtree) $ do
+(l, cleanup) <- inRepo $ pipeNullSplit' $
+diff oldtree newtree
+changed <- procdiff l False
+void $ liftIO cleanup
+-- Flush database changes immediately
+-- so other processes can see them.
+when changed $
+liftIO $ H.flushDbQueue qh
+liftIO $ writeFile indexcache $ showInodeCache cur
+-- Storing the tree in a ref makes sure it does not
+-- get garbage collected, and is available to diff
+-- against next time.
+inRepo $ update' lastindexref newtree

-diff =
+diff oldtree newtree =
 -- Avoid running smudge or clean filters, since we want the
 -- raw output, and they would block trying to access the
 -- locked database. The --raw normally avoids git diff

@@ -253,43 +264,49 @@ reconcileStaged qh = do
 -- (The -G option may make it be used otherwise.)
 [ Param "-c", Param "diff.external="
 , Param "diff"
-, Param "--cached"
 , Param "--raw"
 , Param "-z"
 , Param "--no-abbrev"
--- Optimization: Only find pointer files. This is not
--- perfect. A file could start with this and not be a
--- pointer file. And a pointer file that is replaced with
--- a non-pointer file will match this.
-, Param $ "-G^" ++ fromRawFilePath (toInternalGitPath $
+-- Optimization: Limit to pointer files and annex symlinks.
+-- This is not perfect. A file could contain with this and not
+-- be a pointer file. And a pointer file that is replaced with
+-- a non-pointer file will match this. This is only a
+-- prefilter so that's ok.
+, Param $ "-G" ++ fromRawFilePath (toInternalGitPath $
 P.pathSeparator `S.cons` objectDir')
--- Don't include files that were deleted, because this only
--- wants to update information for files that are present
--- in the index.
-, Param "--diff-filter=AMUT"
 -- Disable rename detection.
 , Param "--no-renames"
 -- Avoid other complications.
 , Param "--ignore-submodules=all"
 , Param "--no-ext-diff"
+, Param (fromRef oldtree)
+, Param (fromRef newtree)
 ]

 procdiff (info:file:rest) changed
 | ":" `S.isPrefixOf` info = case S8.words info of
-(_colonsrcmode:dstmode:_srcsha:dstsha:_change:[])
--- Only want files, not symlinks
-| dstmode /= fmtTreeItemType TreeSymlink -> do
-maybe noop (reconcile (asTopFilePath file))
-=<< catKey (Ref dstsha)
-procdiff rest True
-| otherwise -> procdiff rest changed
+(_colonsrcmode:dstmode:srcsha:dstsha:_change:[]) -> do
+removed <- catKey (Ref srcsha) >>= \case
+Just oldkey -> do
+liftIO $ SQL.removeAssociatedFile oldkey
+(asTopFilePath file)
+(SQL.WriteHandle qh)
+return True
+Nothing -> return False
+added <- catKey (Ref dstsha) >>= \case
+Just key -> do
+liftIO $ SQL.addAssociatedFile key
+(asTopFilePath file)
+(SQL.WriteHandle qh)
+when (dstmode /= fmtTreeItemType TreeSymlink) $
+reconcilerace (asTopFilePath file) key
+return True
+Nothing -> return False
+procdiff rest (changed || removed || added)
 _ -> return changed -- parse failed
 procdiff _ changed = return changed

--- Note that database writes done in here will not necessarily
--- be visible to database reads also done in here.
-reconcile file key = do
-liftIO $ SQL.addAssociatedFileFast key file (SQL.WriteHandle qh)
+reconcilerace file key = do
 caches <- liftIO $ SQL.getInodeCaches key (SQL.ReadHandle qh)
 keyloc <- calcRepo (gitAnnexLocation key)
 keypopulated <- sameInodeCache keyloc caches

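The new procdiff in the hunk above consumes the NUL-separated output of `git diff --raw -z --no-renames`. In that output, tokens alternate between a `:srcmode dstmode srcsha dstsha status` record and a path. A path that is absent on one side of the diff carries mode 000000 and the all-zero sha on that side. The sketch below parses that stream using `String` rather than `ByteString`. `RawEntry`, `splitNul`, `parseRaw`, the predicates and `sample` are hypothetical names, and the sample data is made up.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical record for one `git diff --raw -z --no-renames` entry.
data RawEntry = RawEntry
  { srcMode :: String
  , dstMode :: String
  , srcSha  :: String
  , dstSha  :: String
  , status  :: String
  , path    :: FilePath
  } deriving (Show, Eq)

-- Split NUL-terminated output into tokens, dropping the trailing empty one.
splitNul :: String -> [String]
splitNul s = case break (== '\0') s of
  (tok, [])       -> [tok | not (null tok)]
  (tok, _ : rest) -> tok : splitNul rest

-- Tokens alternate between a ":..." record and a path.
-- Parsing stops at the first record that does not have five fields.
parseRaw :: [String] -> [RawEntry]
parseRaw (info : file : rest)
  | ":" `isPrefixOf` info
  , [sm, dm, ss, ds, st] <- words (drop 1 info)
  = RawEntry sm dm ss ds st file : parseRaw rest
parseRaw _ = []

nullSha :: String
nullSha = replicate 40 '0'

-- A path missing from one side shows the all-zero sha on that side.
isDeletion, isAddition, isSymlinkDst :: RawEntry -> Bool
isDeletion e = dstSha e == nullSha
isAddition e = srcSha e == nullSha
isSymlinkDst e = dstMode e == "120000"

-- Made-up sample: a locked file (symlink) added, a regular file deleted.
sample :: String
sample = concat
  [ ":000000 120000 ", nullSha, " ", replicate 40 'a', " A\0locked.txt\0"
  , ":100644 000000 ", replicate 40 'b', " ", nullSha, " D\0gone.txt\0"
  ]

main :: IO ()
main = mapM_ print (parseRaw (splitNul sample))
```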
@@ -47,7 +47,7 @@ upgrade automatic = flip catchNonAsync onexception $ do
 , do
 checkGitVersionForIndirectUpgrade
 )
-scanUnlockedFiles
+scanAnnexedFiles
 configureSmudgeFilter
 -- Inode sentinal file was only used in direct mode and when
 -- locking down files as they were added. In v6, it's used more

@@ -9,7 +9,12 @@ If most of the files are locked, that would actually make the scan
 somewhere around twice as slow as it currently is. So not a worthwhile
 optimisation.

-And I don't see much else there that could be optimised. Possibly the
+Update: Now that the scan also scans for locked files to make the
+associated files include information about them, the catKey optimisation
+did make sense. Unfortunately, that does mean this scan got a little bit
+slower still, since it has to use git ls-tree --long.
+
+I don't see much else there that could be optimised. Possibly the
 ls-tree parser could be made faster but it's already using attoparsec
 so unlikely to be many gains.
 """]]

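The doc update ties the scan's slowdown to `git ls-tree --long`, whose extra column gives each blob's size, and the scan hunk passes that size to `catKey'`. The body of `catKey'` is not part of this diff. One plausible use of the size, sketched here as an assumption, is to skip catting blobs too large to be an annex pointer file or symlink target. `TreeEntry`, `parseLsTreeLong`, `worthCatting`, `candidates` and `sampleLines` are hypothetical names. `maxCandidateSize` is set to 8192 purely for illustration.

```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical record for one line of `git ls-tree --long` output:
-- "<mode> <type> <object> <size>\t<path>", where size is "-" for trees.
data TreeEntry = TreeEntry
  { teMode :: String
  , teType :: String
  , teSha  :: String
  , teSize :: Maybe Integer
  , tePath :: FilePath
  } deriving Show

parseLsTreeLong :: String -> Maybe TreeEntry
parseLsTreeLong line = case break (== '\t') line of
  (meta, '\t' : p) -> case words meta of
    [m, t, s, sz] -> Just (TreeEntry m t s (readSize sz) p)
    _             -> Nothing
  _ -> Nothing
  where
    -- "-" (or anything non-numeric) has no size.
    readSize str = case reads str of
      [(n, "")] -> Just n
      _         -> Nothing

-- Hypothetical cut-off: blobs larger than this are assumed not to be
-- annex pointer files or symlink targets, so they are not catted.
maxCandidateSize :: Integer
maxCandidateSize = 8192

worthCatting :: TreeEntry -> Bool
worthCatting e =
  teType e == "blob" && maybe False (<= maxCandidateSize) (teSize e)

-- Made-up sample lines with fake object ids.
sampleLines :: [String]
sampleLines =
  [ "100644 blob " ++ replicate 40 '1' ++ " 734003200\tbig.iso"
  , "120000 blob " ++ replicate 40 '2' ++ "      58\tlocked.txt"
  , "100644 blob " ++ replicate 40 '3' ++ "      61\tunlocked.txt"
  , "040000 tree " ++ replicate 40 '4' ++ "       -\tsubdir"
  ]

candidates :: [String] -> [FilePath]
candidates = map tePath . filter worthCatting . mapMaybe parseLsTreeLong

main :: IO ()
main = mapM_ putStrLn (candidates sampleLines)
```

Both symlinks (mode 120000) and regular files are reported with type `blob`, so a size check like this keeps locked and unlocked candidates alike while skipping large content.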