sped up git-annex smudge --clean by 25%

Disabling git-annex branch update for this command is ok, because it does not use any information from the branch, but only logs the location when it adds a key.

Sponsored-by: Dartmouth College's Datalad project
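
The reasoning in the commit message can be illustrated with a minimal sketch: a command that only appends to a log can set a flag so that the expensive update step is skipped. Everything here (`BranchState`, `newBranchState`, `updateBranch`, `logLocation`, and this version of `disableUpdate`) is a hypothetical stand-in written for illustration. It is not git-annex's actual `Annex.BranchState` implementation.

```haskell
import Control.Monad (unless)
import Data.IORef

-- Hypothetical stand-in for per-process branch state; not git-annex's real type.
data BranchState = BranchState
    { updateDisabled :: Bool      -- set by commands that never read the branch
    , updatesRun     :: Int       -- how many times the (expensive) update ran
    , logged         :: [String]  -- keys whose location has been logged
    }

newBranchState :: IO (IORef BranchState)
newBranchState = newIORef (BranchState False 0 [])

-- Similar in spirit to the disableUpdate call in the diff (an assumption).
disableUpdate :: IORef BranchState -> IO ()
disableUpdate ref = modifyIORef' ref (\s -> s { updateDisabled = True })

-- Placeholder for merging in remote branches; here it only bumps a counter.
updateBranch :: IORef BranchState -> IO ()
updateBranch ref = do
    s <- readIORef ref
    unless (updateDisabled s) $
        writeIORef ref (s { updatesRun = updatesRun s + 1 })

-- Write-only operation: update first (unless disabled), then record the key.
logLocation :: IORef BranchState -> String -> IO ()
logLocation ref key = do
    updateBranch ref
    modifyIORef' ref (\s -> s { logged = key : logged s })
```

In this sketch, calling `disableUpdate` before `logLocation` still records the key but leaves `updatesRun` at 0, which is safe only because nothing is read back from the branch.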
parent e47b4badb3
commit 9ea8106bb0
4 changed files with 17 additions and 2 deletions
@@ -10,6 +10,7 @@ git-annex (8.20210904) UNRELEASED; urgency=medium
     retrieving from a borg repository.
   * Resume where it left off when copying a file to/from a local git remote
     was interrupted.
+  * Sped up git-annex smudge --clean by 25%.
 
  -- Joey Hess <id@joeyh.name> Fri, 03 Sep 2021 12:02:55 -0400
 
@@ -30,6 +30,7 @@ import Annex.InodeSentinal
 import Utility.InodeCache
 import Config.GitConfig
 import qualified Types.Backend
+import qualified Annex.BranchState
 
 import qualified Data.ByteString as S
 import qualified Data.ByteString.Lazy as L
@@ -87,6 +88,7 @@ smudge file = do
 -- injested content if so. Otherwise, the original content.
 clean :: RawFilePath -> CommandStart
 clean file = do
+	Annex.BranchState.disableUpdate -- optimisation
 	b <- liftIO $ L.hGetContents stdin
 	ifM fileoutsiderepo
 		( liftIO $ L.hPut stdout b
@@ -13,8 +13,8 @@ The middle is slightly an outlier, and it would be better to have more data
 points, but what this says to me is it's probably around 38x more expensive
 on windows than on linux for git-annex smudge --clean to run.
 
-git-annex smudge --clean makes on the order of 3000 syscalls, including
-opening 200 files, execing git 30 times, and statting 400 files. That's
+git-annex smudge --clean makes on the order of 4000 syscalls, including
+opening 200 files, execing git 8 times, and statting 500 files. That's
 around 10x as many syscalls as git add makes. And it's run once per file. So
 relatively small differences in syscall performance between windows and
 linux can add up.
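
To make the "can add up" point concrete, here is a rough back-of-the-envelope sketch. Only the figure of about 4000 syscalls per `smudge --clean` run comes from the text above. The per-syscall cost difference is a made-up parameter, and the function names are hypothetical.

```haskell
-- Approximate syscalls per `git-annex smudge --clean` run (figure from the text).
syscallsPerClean :: Int
syscallsPerClean = 4000

-- smudge --clean runs once per file, so the total scales linearly with files.
totalSyscalls :: Int -> Int
totalSyscalls files = files * syscallsPerClean

-- Extra wall-clock seconds, given an ASSUMED extra cost per syscall in
-- microseconds on the slower platform. The cost value is hypothetical.
extraSeconds :: Int -> Double -> Double
extraSeconds files extraMicros =
    fromIntegral (totalSyscalls files) * extraMicros / 1e6
```

For instance, with 1000 files and a hypothetical 10 microseconds of extra cost per syscall, `extraSeconds 1000 10` works out to 40 seconds. The real per-syscall difference between Windows and Linux is not given in the text.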
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+username="joey"
+subject="""comment 6"""
+date="2021-09-23T17:48:19Z"
+content="""
+I noticed in the strace that smudge --clean ran git cat-file 2
+more times than necessary. Also was able to avoid updating the git-annex
+branch, which eliminates several calls to git (depending on the number of
+remotes). On Linux, this made it 25% faster. Might be more on Windows.
+
+Rest of the strace looks clean, nothing else stands out as unncessary.
+"""]]
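
The kind of strace inspection described in the comment could be approximated with a small helper that counts how often git was exec'd. This is a sketch only: it assumes strace's usual `execve("/path/to/git", ["git", ...], ...)` line format, and the names `isGitExec` and `countGitExecs` are hypothetical.

```haskell
import Data.List (isInfixOf)

-- True when a strace line looks like an exec of git. Assumes the usual
-- rendering: execve("/usr/bin/git", ["git", "cat-file", ...], ...) = 0
isGitExec :: String -> Bool
isGitExec l = "execve(" `isInfixOf` l && "[\"git\"" `isInfixOf` l

-- Count git execs in the full text of a strace log.
countGitExecs :: String -> Int
countGitExecs = length . filter isGitExec . lines
```

Running `countGitExecs` over logs captured before and after a change like this one would show whether the number of git invocations dropped. The log would come from something like `strace -f` on the command. Lines prefixed with a pid are still matched, because the check is an infix match.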