benchmarking of filter-process vs smudge/clean
No firm conclusions yet, but it's doing better than I would have expected.

Sponsored-by: Graham Spencer on Patreon
parent 099e8fe061
commit 054c803f8d
1 changed file with 63 additions and 7 deletions
@@ -120,6 +120,9 @@ The best fix would be to improve git's smudge/clean interface:
## benchmarking

The goal is both to characterise how slow this interface makes git-annex
and to investigate when enabling filter-process is an improvement and when it is not.

* git add of 1000 small files (adding to the git repository, not the annex)
  - no git-annex: 0.2s
  - git-annex with smudge --clean: 63.3s
@@ -146,10 +149,63 @@ The best fix would be to improve git's smudge/clean interface:
the piping to add more overhead than it seems to have.

* git checkout of a branch with 1000 small annexed files
  - no git-annex (checking out annex pointer files): 0.1s
  - git-annex with smudge: 83.4s
  - git-annex with filter-process: 16.0s ()

  With filter-process, the actual checkout takes under a second;
  the rest of the time is spent in the post-checkout hook, which populates
  the annexed files and restages them in git. The restaging does not
  use filter-process currently. The number in parens is with
  git-annex modified so the restaging does use filter-process.

  - git-annex with smudge: 145s
  - git-annex with filter-process enabled: 13.1s

  Win for filter-process, but small annexed files are somewhat
  unusual. See next benchmark.

* git checkout of a branch with a 1 GB annexed file
  - git-annex with smudge: 5.6s
  - git-annex with filter-process enabled: 11.2s

  Here filter-process slows it down. That is because the post-checkout
  hook runs, which populates the annexed file and restages it in git.
  The restaging uses filter-process, so git feeds the annexed file
  contents through the pipe, even though git-annex does not need to
  see that data. So it makes sense that filter-process is about twice
  as slow as smudge: with smudge, git-annex only has to write the file,
  and does not also read it.

  With more annexed data being checked out, it should continue to
  scale like this, with filter-process being 2x as expensive,
  or perhaps more (if disk cache stops helping).

  Disabling filter-process during the restaging would improve
  this case, but unfortunately that does not seem easy to do
  (see [[!commit 837025b14f523f9180f82d0cced1e53a8a9b94de]]).

* git-annex get of 1000 small annexed files
  - git-annex with smudge: 100.1s
  - git-annex with filter-process enabled: 39.3s

  The difference is due to restaging in git needing to pass each
  annexed file through the filter, which with smudge means running
  a separate git-annex process per file.

  Win for filter-process, but small annexed files are somewhat
  unusual. See next benchmark.

* git-annex get of a 1 GB annexed file
  - git-annex with smudge: 21.5s
  - git-annex with filter-process enabled: 22.8s

  Transfer time was around 12s; the rest is copying the file
  to the work tree and restaging overhead. So filter-process
  is slower because git unnecessarily sends the file content to it
  over a pipe. That is less of a loss for filter-process than I
  expected, though again disk cache probably helped.

* git-annex get of two 1 GB annexed files
  - git-annex with smudge: 42.3s
  - git-annex with filter-process enabled: 46.7s

  This shows that filter-process will get progressively worse
  as the amount of annexed data that git-annex gets goes up.
  It is not a fast increase, but it will add up. Also disk cache
  will stop helping at some point.
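
As a rough aid for reading the numbers above, here is a small Python sketch that derives a few overheads from them: the per-file cost of the 1000-file checkout, the filter-process/smudge ratio for the 1 GB checkout, and how much of each 1 GB `git-annex get` is not transfer time. It only does arithmetic on the timings reported on this page. The constant and function names are hypothetical, and it assumes each figure is a single representative run and that the "around 12s" transfer time applies equally to both configurations.

```python
"""Derive simple overheads from the benchmark timings above (seconds).

Hypothetical helper script: it only does arithmetic on the reported numbers.
"""

# Timings copied from the benchmarks above; each is assumed to be a single run.
CHECKOUT_1000_SMALL = {"no git-annex": 0.1, "smudge": 83.4, "filter-process": 16.0}
CHECKOUT_1GB = {"smudge": 5.6, "filter-process": 11.2}
GET_ONE_1GB = {"smudge": 21.5, "filter-process": 22.8}
GET_TWO_1GB = {"smudge": 42.3, "filter-process": 46.7}
TRANSFER_1GB_S = 12.0  # "Transfer time was around 12s" (approximate)


def per_file_overhead_ms(total_s: float, baseline_s: float, n_files: int) -> float:
    """Extra milliseconds per file, relative to a baseline run."""
    return (total_s - baseline_s) / n_files * 1000


def slowdown(slow_s: float, fast_s: float) -> float:
    """How many times longer the slower configuration took."""
    return slow_s / fast_s


def non_transfer_s(total_s: float, transfer_s: float) -> float:
    """Time not spent transferring: work tree copy plus restaging."""
    return total_s - transfer_s


def main() -> None:
    base = CHECKOUT_1000_SMALL["no git-annex"]
    for mode in ("smudge", "filter-process"):
        ms = per_file_overhead_ms(CHECKOUT_1000_SMALL[mode], base, 1000)
        print(f"checkout of 1000 small files, {mode}: {ms:.1f} ms/file overhead")

    ratio = slowdown(CHECKOUT_1GB["filter-process"], CHECKOUT_1GB["smudge"])
    print(f"checkout of 1 GB file: filter-process takes {ratio:.1f}x as long as smudge")

    for mode in ("smudge", "filter-process"):
        extra = non_transfer_s(GET_ONE_1GB[mode], TRANSFER_1GB_S)
        print(f"get of 1 GB file, {mode}: ~{extra:.1f}s beyond the transfer")

    # Extra time filter-process costs over smudge, for one and two 1 GB files.
    gap_one = GET_ONE_1GB["filter-process"] - GET_ONE_1GB["smudge"]
    gap_two = GET_TWO_1GB["filter-process"] - GET_TWO_1GB["smudge"]
    print(f"filter-process penalty on get: {gap_one:.1f}s (one file), {gap_two:.1f}s (two files)")


if __name__ == "__main__":
    main()
```

Run as-is, it reports about 83 ms per file of overhead for smudge against about 16 ms per file for filter-process, a 2.0x ratio for the 1 GB checkout, and a filter-process penalty on `git-annex get` that grows from about 1.3s for one file to about 4.4s for two.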

Benchmark summary:

* filter-process makes `git add` slightly slower for large
  files that are added to the annex, but not as much as expected (and it can
  be improved), so overall it's a win for `git add`.

* filter-process makes `git checkout`, `merge`, etc. of unlocked annexed files
  at least twice as slow as smudge for large amounts of annexed data, but it
  does avoid very slow checkouts when there are a lot of non-annexed or smaller
  unlocked annexed files. That benefit may be worth the overhead, though it would
  be good to check the overhead with larger annexed data checkouts to see
  how it scales.

* filter-process makes `git-annex get` slower as the size of annexed data
  goes up, although the time spent actually getting the data will typically
  dominate (network being slower than disk), so this may be an acceptable
  tradeoff for many users.
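
The observation that filter-process costs grow with the size of annexed data follows from how git's long-running filter protocol carries content: git writes the file to the filter's stdin as a series of pkt-lines (a 4-digit hex length that includes the 4 length bytes, at most 65520 bytes per packet) and ends it with a `0000` flush packet, so the filter has to read every byte. The Python sketch below only illustrates that framing, based on git's documented pkt-line format. It does not talk to git or git-annex, it leaves out the `command=`/`pathname=` header packets and the filter's response, and the function names are hypothetical.

```python
"""Illustrative sketch of pkt-line framing as used by git's long-running
filter protocol (filter.<driver>.process). Not a working filter."""

from typing import Iterator

MAX_PKT_LEN = 65520              # largest pkt-line, including its 4-byte header
MAX_PKT_DATA = MAX_PKT_LEN - 4   # 65516 bytes of payload per packet
FLUSH_PKT = b"0000"              # marks the end of the content


def pkt_line(data: bytes) -> bytes:
    """Frame one chunk: 4 lowercase hex digits of total length, then the data."""
    if not 0 < len(data) <= MAX_PKT_DATA:
        raise ValueError(f"payload must be 1..{MAX_PKT_DATA} bytes")
    return b"%04x" % (len(data) + 4) + data


def frame_content(content: bytes) -> Iterator[bytes]:
    """Yield the packets for a file's content, ending with a flush packet."""
    for start in range(0, len(content), MAX_PKT_DATA):
        yield pkt_line(content[start:start + MAX_PKT_DATA])
    yield FLUSH_PKT


def wire_bytes(content_size: int) -> int:
    """Bytes the filter must read for content of this size (content only)."""
    packets = -(-content_size // MAX_PKT_DATA)  # ceiling division
    return content_size + 4 * packets + len(FLUSH_PKT)


if __name__ == "__main__":
    sample = b"x" * 200_000
    assert len(b"".join(frame_content(sample))) == wire_bytes(len(sample))
    one_gib = 1 << 30
    print(f"{one_gib:,} bytes of content -> {wire_bytes(one_gib):,} bytes on the pipe")
```

Framing adds only 4 bytes per packet, so the pipe traffic is essentially the file size: restaging 1 GB of annexed data means pushing about 1 GB more through the pipe to git-annex, which is consistent with the roughly 2x cost seen in the 1 GB checkout benchmark.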