sped up git-annex smudge --clean by 25%

Disabling git-annex branch update for this command is ok, because it does
not use any information from the branch, but only logs the location when it
adds a key.

Sponsored-by: Dartmouth College's Datalad project
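The idea in the commit message can be illustrated with a small, self-contained model. This is only a sketch and not git-annex's actual implementation: `BranchState`, `updateBranch`, `runClean`, and the "one git call per remote" cost are hypothetical stand-ins. Only the name `disableUpdate` is borrowed from the change itself. The point is that a command which never reads the branch can set a flag once, so the later update step becomes a no-op.

```haskell
import Control.Monad (unless, when)
import Data.IORef

-- Hypothetical stand-in for per-process branch state (not git-annex's type).
data BranchState = BranchState
    { updateDisabled :: Bool -- set by commands that never read the branch
    , gitCalls :: Int        -- counts simulated calls to git
    }

-- Mark the branch update as unnecessary for the rest of this run.
disableUpdate :: IORef BranchState -> IO ()
disableUpdate ref = modifyIORef' ref (\s -> s { updateDisabled = True })

-- Simulated branch update. Assumption: it costs one git call per remote.
updateBranch :: Int -> IORef BranchState -> IO ()
updateBranch remotes ref = do
    s <- readIORef ref
    unless (updateDisabled s) $
        writeIORef ref (s { gitCalls = gitCalls s + remotes })

-- Model of one command run; returns the number of simulated git calls.
runClean :: Bool -> Int -> IO Int
runClean optimised remotes = do
    ref <- newIORef (BranchState False 0)
    when optimised (disableUpdate ref)
    updateBranch remotes ref
    gitCalls <$> readIORef ref

main :: IO ()
main = do
    slow <- runClean False 3
    fast <- runClean True 3
    putStrLn ("git calls without disableUpdate: " ++ show slow)
    putStrLn ("git calls with disableUpdate: " ++ show fast)
```

In this model the saving grows with the number of remotes, which matches the commit's remark that the number of eliminated git calls depends on how many remotes are configured.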
parent e47b4badb3
commit 9ea8106bb0
4 changed files with 17 additions and 2 deletions
@@ -10,6 +10,7 @@ git-annex (8.20210904) UNRELEASED; urgency=medium
     retrieving from a borg repository.
   * Resume where it left off when copying a file to/from a local git remote
     was interrupted.
+  * Sped up git-annex smudge --clean by 25%.
 
  -- Joey Hess <id@joeyh.name> Fri, 03 Sep 2021 12:02:55 -0400
 
@@ -30,6 +30,7 @@ import Annex.InodeSentinal
 import Utility.InodeCache
 import Config.GitConfig
 import qualified Types.Backend
+import qualified Annex.BranchState
 
 import qualified Data.ByteString as S
 import qualified Data.ByteString.Lazy as L
@@ -87,6 +88,7 @@ smudge file = do
 -- injested content if so. Otherwise, the original content.
 clean :: RawFilePath -> CommandStart
 clean file = do
+	Annex.BranchState.disableUpdate -- optimisation
 	b <- liftIO $ L.hGetContents stdin
 	ifM fileoutsiderepo
 		( liftIO $ L.hPut stdout b
@@ -13,8 +13,8 @@ The middle is slightly an outlier, and it would be better to have more data
 points, but what this says to me is it's probably around 38x more expensive
 on windows than on linux for git-annex smudge --clean to run.
 
-git-annex smudge --clean makes on the order of 3000 syscalls, including
-opening 200 files, execing git 30 times, and statting 400 files. That's
+git-annex smudge --clean makes on the order of 4000 syscalls, including
+opening 200 files, execing git 8 times, and statting 500 files. That's
 around 10x as many syscalls as git add makes. And it's run once per file. So
 relatively small differences in syscall performance between windows and
 linux can add up.
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+username="joey"
+subject="""comment 6"""
+date="2021-09-23T17:48:19Z"
+content="""
+I noticed in the strace that smudge --clean ran git cat-file 2
+more times than necessary. Also was able to avoid updating the git-annex
+branch, which eliminates several calls to git (depending on the number of
+remotes). On Linux, this made it 25% faster. Might be more on Windows.
+
+Rest of the strace looks clean, nothing else stands out as unncessary.
+"""]]