new todo
parent 8034f2e9bb
commit 9121154a75
3 changed files with 47 additions and 8 deletions

@@ -90,12 +90,12 @@ And here are the consequences of git-annex's workarounds:
 
 * When `git-annex filter-process` is enabled, it cannot use the trick
   described above that `git-annex smudge --clean` uses to avoid git
-  piping the whole content of large files through it. This mainly slows
-  down `git add` when it is being used with an annex.largefiles
-  configuration to add a large file to the annex. (Making filter-process
-  incrementally hash the content git passes to it will mostly avoid
-  this performance problem, though it may always be a little bit slower
-  than `git-annex smudge --clean` due to the data piping.)
+  piping the whole content of large files through it. The whole file
+  content has to be read, even when git-annex does not need to see it.
+  This mainly slows down `git add` when it is being used with an
+  annex.largefiles configuration to add a large file to the annex,
+  by about 5%. ([[todo/incremental_hashing_for_add]] would improve
+  performance.)
 
 * In a rare situation, git-annex would like to get git to run the clean
   filter, but it cannot because git has the index locked. So, git-annex has

doc/todo/incremental_hashing_for_add.mdwn (new file, 40 lines)

@@ -0,0 +1,40 @@

When `git-annex filter-process` is enabled, `git add` pipes the content of
files into it, but that's thrown away, and the file is read again by git-annex
to generate a hash. It would improve performance to hash the content
provided via the pipe.
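
Something along these lines is the idea, sketched in Haskell since that's
what git-annex is written in, using the cryptonite library: hash each chunk
as it is consumed, while still passing the data through. This is
illustrative only; a real clean filter in filter-process mode speaks git's
pkt-line protocol, and none of these names are git-annex internals.

    -- Sketch: incrementally hash stdin while passing it through,
    -- instead of re-reading the file afterwards just to hash it.
    import Crypto.Hash (Context, SHA256, hashInit, hashUpdate, hashFinalize)
    import qualified Data.ByteString as B
    import System.IO (stdin, stdout, hIsEOF)

    main :: IO ()
    main = go (hashInit :: Context SHA256)
      where
        go ctx = do
            eof <- hIsEOF stdin
            if eof
                then print (hashFinalize ctx)  -- the content hash, for free
                else do
                    chunk <- B.hGetSome stdin 65536
                    B.hPut stdout chunk        -- pass the content through
                    go (hashUpdate ctx chunk)  -- hash it as it goes by
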
When filter-process is not enabled, `git-annex smudge --clean` reads
the file to hash it, then reads it a second time to copy it into
.git/annex/objects. When annex.addunlocked is enabled, `git annex add`
does the same. It would improve performance to read the file once,
copying and hashing it at the same time.
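
Read-once could look something like this sketch; `hashAndCopy` is a
made-up name, not anything in git-annex:

    -- Sketch: a single pass that both writes the copy and feeds the
    -- hash, halving the reads compared to hash-then-copy.
    import Crypto.Hash (Digest, SHA256, hashInit, hashUpdate, hashFinalize)
    import qualified Data.ByteString as B
    import System.IO (IOMode(..), withBinaryFile)

    hashAndCopy :: FilePath -> FilePath -> IO (Digest SHA256)
    hashAndCopy src dest =
        withBinaryFile src ReadMode $ \hin ->
            withBinaryFile dest WriteMode $ \hout ->
                let go ctx = do
                        chunk <- B.hGetSome hin 65536
                        if B.null chunk
                            then return (hashFinalize ctx)
                            else do
                                B.hPut hout chunk
                                go (hashUpdate ctx chunk)
                in go hashInit
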
The `incrementalhash` branch has a start at implementing this.
I lost steam on this branch when I realized that it would need to
re-implement Annex.Ingest.ingest in order to populate
.git/annex/objects/. And it's not as simple as writing an object file
and moving it into place there, because annex.thin means a hard link should
be made, and if the filesystem supports CoW, that should be used rather
than writing the file again.
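
For illustration, the populate step has to do something like the sketch
below. `populateObject` and `copyReflink` are hypothetical names; the real
logic lives in Annex.Ingest and git-annex's CoW support:

    -- Sketch: annex.thin wants a hard link; otherwise prefer a CoW
    -- clone when the filesystem supports one, else an ordinary copy.
    import Control.Exception (SomeException, try)
    import System.Directory (copyFile)
    import System.Posix.Files (createLink)

    populateObject :: Bool -> FilePath -> FilePath -> IO ()
    populateObject thin src obj
        | thin = createLink src obj  -- annex.thin: share the one copy
        | otherwise = do
            r <- try (copyReflink src obj) :: IO (Either SomeException ())
            either (\_ -> copyFile src obj) return r

    -- Hypothetical stand-in: real code would use a reflink ioctl or
    -- cp --reflink; this version always fails over to a plain copy.
    copyReflink :: FilePath -> FilePath -> IO ()
    copyReflink _ _ = ioError (userError "reflink not supported here")
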
A benchmark showed that `git add` of a 1 GB file
is about 5% slower with filter-process enabled than it is
with filter-process disabled. That's due to the piping overhead to
filter-process ([[todo/git_smudge_clean_interface_suboptiomal]]).
`git-annex add` with `annex.addunlocked` performs about the same as
`git add` with filter-process disabled.

`git-annex add` without `annex.addunlocked` is about 25% faster than those,
and only reads the file once. But it also does not copy the file, so of
course it's faster, and always will be.

The disk cache probably helps them a fair amount, unless it's too small
to hold the file. So it's not clear how much implementing this would
really speed them up.

This does not really affect default configurations.
Performance is only impacted when annex.addunlocked or
annex.largefiles is configured, and in a few cases
where an already annexed file is added by `git add` or `git commit -a`.

So is the complication of implementing this worth it? Users who
need maximum speed can use `git-annex add`.

@@ -15,5 +15,4 @@ could change and if it does, these things could be included.
 * Possibly enable `git-annex filter-process` by default. If the tradeoffs
   seem worth it.
 
-  It does not currently incrementally hash, so implementing that first
-  would improve the tradeoffs.
+  May want to implement [[incremental_hashing_for_add]] first.