cache negative lookups of global numcopies and mincopies
Speeds up eg git-annex sync --content by up to 50%. When it does not need to
transfer or drop anything, it now noops a lot more quickly.

I didn't see anything else in sync --content noop loop that could really be
sped up. It has to cat git objects to keys, stat object files, etc.

Sponsored-by: unqueued on Patreon
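The nested Maybe this commit introduces can be illustrated in isolation. The following is a minimal sketch, not git-annex code: `NumCopies` here is a hypothetical stand-in type, an `IORef` takes the place of the Annex state, and `getCached` and the simulated `load` action are names invented for the example. The outer Maybe records whether the log has been read at all, and the inner Maybe records whether it held a value, so a log with nothing configured is read only once.

```haskell
import Data.IORef

-- Hypothetical stand-in for git-annex's NumCopies type.
newtype NumCopies = NumCopies Int
  deriving (Show, Eq)

-- Outer Maybe: has the log been read yet?
-- Inner Maybe: did the log contain a value?
type Cache = IORef (Maybe (Maybe NumCopies))

-- Return the cached result if there is one; otherwise run the
-- (expensive) loader once and remember its answer, even when that
-- answer is Nothing.
getCached :: Cache -> IO (Maybe NumCopies) -> IO (Maybe NumCopies)
getCached cache load = readIORef cache >>= maybe loadAndStore return
  where
    loadAndStore = do
      v <- load
      writeIORef cache (Just v)
      return v

main :: IO ()
main = do
  cache <- newIORef Nothing
  loads <- newIORef (0 :: Int)
  -- Simulated log read that finds nothing configured.
  let load = modifyIORef' loads (+ 1) >> return Nothing
  r1 <- getCached cache load
  r2 <- getCached cache load
  n <- readIORef loads
  print (r1, r2, n) -- prints (Nothing,Nothing,1)
```

With a single-level `Maybe NumCopies` cache, a `Nothing` result is indistinguishable from "not loaded yet", so the loader would run on every call; in this sketch that would make the counter 2 instead of 1.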
parent
4437e187e6
commit
3c15e0f7a0
5 changed files with 38 additions and 6 deletions
4
Annex.hs
@@ -183,8 +183,8 @@ data AnnexState = AnnexState
 	, hashobjecthandle :: Maybe (ResourcePool HashObjectHandle)
 	, checkattrhandle :: Maybe (ResourcePool CheckAttrHandle)
 	, checkignorehandle :: Maybe (ResourcePool CheckIgnoreHandle)
-	, globalnumcopies :: Maybe NumCopies
-	, globalmincopies :: Maybe MinCopies
+	, globalnumcopies :: Maybe (Maybe NumCopies)
+	, globalmincopies :: Maybe (Maybe MinCopies)
 	, limit :: ExpandableMatcher Annex
 	, timelimit :: Maybe (Duration, POSIXTime)
 	, sizelimit :: Maybe (TVar Integer)
@@ -79,6 +79,8 @@ git-annex (10.20230408) UNRELEASED; urgency=medium
   * Large speed up to importing trees from special remotes that contain a lot
     of files, by only processing changed files.
   * Some other speedups to importing trees from special remotes.
+  * Cache negative lookups of global numcopies and mincopies.
+    Speeds up eg git-annex sync --content by up to 50%.
 
  -- Joey Hess <id@joeyh.name> Sat, 08 Apr 2023 13:57:18 -0400
 
@@ -45,22 +45,22 @@ setGlobalMinCopies new = do
 
 {- Value configured in the numcopies log. Cached for speed. -}
 getGlobalNumCopies :: Annex (Maybe NumCopies)
-getGlobalNumCopies = maybe globalNumCopiesLoad (return . Just)
+getGlobalNumCopies = maybe globalNumCopiesLoad return
 	=<< Annex.getState Annex.globalnumcopies
 
 {- Value configured in the mincopies log. Cached for speed. -}
 getGlobalMinCopies :: Annex (Maybe MinCopies)
-getGlobalMinCopies = maybe globalMinCopiesLoad (return . Just)
+getGlobalMinCopies = maybe globalMinCopiesLoad return
 	=<< Annex.getState Annex.globalmincopies
 
 globalNumCopiesLoad :: Annex (Maybe NumCopies)
 globalNumCopiesLoad = do
 	v <- getLog numcopiesLog
-	Annex.changeState $ \s -> s { Annex.globalnumcopies = v }
+	Annex.changeState $ \s -> s { Annex.globalnumcopies = Just v }
 	return v
 
 globalMinCopiesLoad :: Annex (Maybe MinCopies)
 globalMinCopiesLoad = do
 	v <- getLog mincopiesLog
-	Annex.changeState $ \s -> s { Annex.globalmincopies = v }
+	Annex.changeState $ \s -> s { Annex.globalmincopies = Just v }
 	return v
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 14"""
+ date="2023-06-06T17:11:35Z"
+ content="""
+There's only one import in the sync, and your output shows it completed
+(with error).
+
+The only other phase of sync that could be run after that and take a lot of
+time is content syncing. You would have to have annex.synccontent set
+somewhere for sync to do that. Do you?
+"""]]
@@ -0,0 +1,18 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 15"""
+ date="2023-06-06T17:31:49Z"
+ content="""
+It would make a lot of sense for --content syncing to be what remains slow.
+That has to scan over all the files and when it decides that it does not
+need to copy the content anywhere, that's a tight loop with no output.
+
+In my repo with 10000 files that was set up by the latest test case,
+`git-annex sync` takes 13 seconds, and with --content it takes 61 seconds.
+
+I optimised a numcopies/mincopies lookup away, and that got it
+down to 28 seconds.
+
+The cidsdb does not get accessed by the --content scan
+in my testing, although there may be other situations where it does.
+"""]]