sped up git-annex smudge --clean by 25%

Disabling git-annex branch update for this command is
ok, because it does not use any information from the branch,
but only logs the location when it adds a key.
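
The reasoning above can be sketched in a few lines of Haskell. This is only an illustration under stated assumptions, not git-annex's actual code: `BranchState`, `runUpdate`, `logLocation`, and the `updatesRun` counter are hypothetical stand-ins, and only the name `disableUpdate` is borrowed from the change itself. The assumption is that operations touching the branch normally trigger an update first, and that a flag can turn that step off for a command that never reads from the branch.

```haskell
import Control.Monad (unless)
import Data.IORef

-- Hypothetical stand-in for per-process branch state; not git-annex's real type.
data BranchState = BranchState
    { updateDisabled :: Bool  -- when True, branch operations skip the update step
    , updatesRun     :: Int   -- counts how often the (expensive) update ran
    }

newBranchState :: IO (IORef BranchState)
newBranchState = newIORef (BranchState False 0)

-- Analogue of disabling the branch update for the rest of the command.
disableUpdate :: IORef BranchState -> IO ()
disableUpdate ref = modifyIORef' ref (\s -> s { updateDisabled = True })

-- Stand-in for the costly step (assumed here to be several calls to git).
runUpdate :: IORef BranchState -> IO ()
runUpdate ref = modifyIORef' ref (\s -> s { updatesRun = updatesRun s + 1 })

-- An operation that touches the branch updates it first, unless disabled.
-- Writing a location log does not need the merged branch contents.
logLocation :: IORef BranchState -> String -> IO ()
logLocation ref _key = do
    s <- readIORef ref
    unless (updateDisabled s) (runUpdate ref)
    -- ... the location would be recorded here ...

main :: IO ()
main = do
    slow <- newBranchState
    logLocation slow "KEY1"
    fast <- newBranchState
    disableUpdate fast
    logLocation fast "KEY1"
    n1 <- updatesRun <$> readIORef slow
    n2 <- updatesRun <$> readIORef fast
    print (n1, n2)
```

Running this prints `(1,0)`: the update step runs once by default and not at all once it has been disabled, while the location is still logged in both cases.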

Sponsored-by: Dartmouth College's Datalad project
Joey Hess 2021-09-24 14:15:20 -04:00
parent e47b4badb3
commit 9ea8106bb0
4 changed files with 17 additions and 2 deletions

@@ -10,6 +10,7 @@ git-annex (8.20210904) UNRELEASED; urgency=medium
     retrieving from a borg repository.
   * Resume where it left off when copying a file to/from a local git remote
     was interrupted.
+  * Sped up git-annex smudge --clean by 25%.
 
  -- Joey Hess <id@joeyh.name>  Fri, 03 Sep 2021 12:02:55 -0400

@@ -30,6 +30,7 @@ import Annex.InodeSentinal
 import Utility.InodeCache
 import Config.GitConfig
 import qualified Types.Backend
+import qualified Annex.BranchState
 import qualified Data.ByteString as S
 import qualified Data.ByteString.Lazy as L
@@ -87,6 +88,7 @@ smudge file = do
 -- injested content if so. Otherwise, the original content.
 clean :: RawFilePath -> CommandStart
 clean file = do
+    Annex.BranchState.disableUpdate -- optimisation
     b <- liftIO $ L.hGetContents stdin
     ifM fileoutsiderepo
         ( liftIO $ L.hPut stdout b

@@ -13,8 +13,8 @@ The middle is slightly an outlier, and it would be better to have more data
 points, but what this says to me is it's probably around 38x more expensive
 on windows than on linux for git-annex smudge --clean to run.
 
-git-annex smudge --clean makes on the order of 3000 syscalls, including
-opening 200 files, execing git 30 times, and statting 400 files. That's
+git-annex smudge --clean makes on the order of 4000 syscalls, including
+opening 200 files, execing git 8 times, and statting 500 files. That's
 around 10x as many syscalls as git add makes. And it's run once per file. So
 relatively small differences in syscall performance between windows and
 linux can add up.

@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+username="joey"
+subject="""comment 6"""
+date="2021-09-23T17:48:19Z"
+content="""
+I noticed in the strace that smudge --clean ran git cat-file 2
+more times than necessary. Also was able to avoid updating the git-annex
+branch, which eliminates several calls to git (depending on the number of
+remotes). On Linux, this made it 25% faster. Might be more on Windows.
+Rest of the strace looks clean, nothing else stands out as unncessary.
+"""]]