new todo

parent 8034f2e9bb
commit 9121154a75
3 changed files with 47 additions and 8 deletions

@@ -90,12 +90,12 @@ And here's the consequences of git-annex's workarounds:
 
 * When `git-annex filter-process` is enabled, it cannot use the trick
 described above that `git-annex smudge --clean` uses to avoid git
-piping the whole content of large files through it. This mainly slows
-down `git add` when it is being used with an annex.largefiles
-confguration to add a large file to the annex. (Making filter-process
-incrementally hash the content git passes to it will mostly avoid
-this performance problem though it may always be a little bit slower
-than `git-annex smudge --clean` due to the data piping.)
+piping the whole content of large files through it. The whole file
+content has to be read, even when git-annex does not need to see it.
+This mainly slows down `git add` when it is being used with an
+annex.largefiles confguration to add a large file to the annex,
+by about 5%. ([[todo/incremental_hashing_for_add]] would improve
+performance)
 
 * In a rare situation, git-annex would like to get git to run the clean
 filter, but it cannot because git has the index locked. So, git-annex has
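
The hunk above turns on one cost: with filter-process, git streams the whole file to git-annex. In git's long-running filter protocol, that content arrives as pkt-lines (four hex digits giving a length that includes those four bytes, then the payload), ending with a `0000` flush packet. As a minimal sketch of the incremental hashing the new todo asks for, the Python below hashes each payload as it is read instead of discarding it and reading the file again. It covers only the content part of the protocol, not the handshake, the `command=` and `pathname=` headers, or the status reply. The function names are hypothetical, SHA-256 stands in for whatever hash backend is configured, and since git-annex is written in Haskell this is not its code.

```python
import hashlib
import io
from typing import BinaryIO, Optional, Tuple

FLUSH_PKT = b"0000"      # flush packet: marks the end of the content
MAX_PAYLOAD = 65520 - 4  # largest pkt-line is 65520 bytes including header


def read_pkt_line(stream: BinaryIO) -> Optional[bytes]:
    """Return one pkt-line payload, or None when a flush packet is read."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("truncated pkt-line header")
    if header == FLUSH_PKT:
        return None
    length = int(header, 16)  # the length counts the 4 header bytes too
    if length < 4:
        raise ValueError(f"unexpected pkt-line length {length}")
    payload = stream.read(length - 4)
    if len(payload) != length - 4:
        raise EOFError("truncated pkt-line payload")
    return payload


def hash_piped_content(stream: BinaryIO) -> Tuple[str, int]:
    """Hash content chunk by chunk as it arrives; never re-reads the file."""
    digest = hashlib.sha256()
    size = 0
    while (payload := read_pkt_line(stream)) is not None:
        digest.update(payload)
        size += len(payload)
    return digest.hexdigest(), size


def pkt_encode(data: bytes) -> bytes:
    """Frame data the way git sends it: pkt-lines, then a flush packet."""
    out = bytearray()
    for start in range(0, len(data), MAX_PAYLOAD):
        chunk = data[start:start + MAX_PAYLOAD]
        out += b"%04x" % (len(chunk) + 4) + chunk
    out += FLUSH_PKT
    return bytes(out)


if __name__ == "__main__":
    content = b"large file contents\n" * 10_000
    digest, size = hash_piped_content(io.BytesIO(pkt_encode(content)))
    print(size, digest == hashlib.sha256(content).hexdigest())  # prints: 200000 True
```

The hash is complete once the flush packet arrives, so the filter already has what it needs to build a key without opening the file on disk. The piping itself still costs something, which matches the remaining overhead the todo attributes to filter-process.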

doc/todo/incremental_hashing_for_add.mdwn (new file, 40 additions)
@@ -0,0 +1,40 @@
+When `git-annex filter-process` is enabled, `git add` pipes the content of
+files into it, but that's thrown away, and the file is read again by git-annex
+to generate a hash. It would improve performance to hash the content
+provided via the pipe.
+
+When filter-process is not enabled, `git-annex smudge --clean` reads
+the file to hash it, then reads it a second time to copy it into
+.git/annex/objects. When annex.addunlocked is enabled, `git annex add`
+does the same. It would improve performance to read once and copy and
+hash at the same time.
+
+The `incrementalhash` branch has a start at implementing this.
+I lost steam on this branch when I realized that it would need to
+re-implement Annex.Ingest.ingest in order to populate
+.git/annex/objects/. And it's not as simple as writing an object file
+and moving it into place there, because annex.thin means a hard link should
+be made, and if the filesystem supports CoW, that should be used rather
+than writing the file again.
+
+A benchmark showed that `git add` of a 1 gb file
+is about 5% slower with filter-process enabled than it is
+with filter-process disabled. That's due to the piping overhead to
+filter-process ([[todo/git_smudge_clean_interface_suboptiomal]]).
+`git-annex add` with `annex.addunlocked` has similar performance
+as `git add` with filter-process disabled.
+
+`git-annex add` without `annex.addunlocked` is about 25% faster than those,
+and only reads the file once, but it also does not copy the file, so of
+course it's faster, and always will be.
+
+Probably disk cache helps them a fair amount, unless it's too small.
+So it's not clear how much implementing this would really speed them up.
+
+This does not really affect default configurations.
+Performance is only impacted when annex.addunlocked or
+annex.largefiles is configured, and in a few cases
+where an already annexed file is added by `git add` or `git commit -a`.
+
+So is the complication of implementing this worth it? Users who
+need maximum speed can use `git-annex add`.
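
The read-once idea in the new todo can be sketched as a single loop that feeds each chunk to both the hash and the destination file, instead of one pass to hash and a second pass to copy. A `thin` flag stands in for annex.thin: the file is then only read to hash it and is hard-linked into the object directory rather than copied. Everything here is hypothetical: `ingest_once`, the flat `objdir/<sha256>` layout, and the 1 MiB chunk size are illustrative, not git-annex's Annex.Ingest.ingest or its real object paths. CoW copies, which the todo says should be preferred when the filesystem supports them, are not attempted.

```python
import hashlib
import os
import tempfile
from typing import Tuple

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice


def ingest_once(src: str, objdir: str, thin: bool = False) -> Tuple[str, str]:
    """Hypothetical single-pass ingest. Returns (sha256 hex, object path).

    Not git-annex's Annex.Ingest.ingest: objects are stored flat as
    objdir/<sha256>, and CoW (reflink) copies are not attempted.
    """
    os.makedirs(objdir, exist_ok=True)
    digest = hashlib.sha256()

    if thin:
        # annex.thin-like: hash with a single read, then hard link the
        # original instead of copying it. A real implementation must also
        # check that the file did not change between hashing and linking.
        with open(src, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                digest.update(chunk)
        dest = os.path.join(objdir, digest.hexdigest())
        if not os.path.exists(dest):
            os.link(src, dest)
        return digest.hexdigest(), dest

    # Default: copy and hash in the same pass. The temp file lives in
    # objdir so the final os.replace is a same-filesystem rename.
    fd, tmp = tempfile.mkstemp(dir=objdir)
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                digest.update(chunk)
                out.write(chunk)
        dest = os.path.join(objdir, digest.hexdigest())
        os.replace(tmp, dest)  # move the finished object into place
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return digest.hexdigest(), dest
```

Compared with hashing first and copying second, the copy path reads `src` once instead of twice. Whether that saves much in practice depends on the disk cache, as the todo itself notes.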

@@ -15,5 +15,4 @@ could change and if it does, these things could be included.
 * Possibly enable `git-annex filter-process` by default. If the tradeoffs
 seem worth it.
 
-It does not currently incrementally hash, so implementing that first
-would improve the tradeoffs.
+May want to implement [[incremental_hashing_for_add]] first.