47084b8a1d
This has tradeoffs, but is generally a win, and users who it causes git add to slow down unacceptably for can just disable it again. It needed to happen in an upgrade, since there are git-annex versions that do not support it, and using such an old version with a v8 repository with filter.annex.process set will cause bad behavior. By enabling it in v9, it's guaranteed that any git-annex version that can use the repository does support it. Although, this is not a perfect protection against problems, since an old git-annex version, if it's used with a v9 repository, will cause git add to try to run git-annex filter-process, which will fail. But at least, the user is unlikely to have an old git-annex in path if they are using a v9 repository, since it won't work in that repository. Sponsored-by: Dartmouth College's Datalad project
40 lines
2 KiB
Markdown
40 lines
2 KiB
Markdown
When `git-annex filter-process` is enabled (v9 and above), `git add` pipes
|
|
the content of files into it, but that's thrown away, and the file is read
|
|
again by git-annex to generate a hash. It would improve performance to hash
|
|
the content provided via the pipe.
|
|
|
|
When filter-process is not enabled, `git-annex smudge --clean` reads
|
|
the file to hash it, then reads it a second time to copy it into
|
|
.git/annex/objects. When annex.addunlocked is enabled, `git annex add`
|
|
does the same. It would improve performance to read once and copy and
|
|
hash at the same time.
|
|
|
|
The `incrementalhash` branch has a start at implementing this.
|
|
I lost steam on this branch when I realized that it would need to
|
|
re-implement Annex.Ingest.ingest in order to populate
|
|
.git/annex/objects/. And it's not as simple as writing an object file
|
|
and moving it into place there, because annex.thin means a hard link should
|
|
be made, and if the filesystem supports CoW, that should be used rather
|
|
than writing the file again.
|
|
|
|
A benchmark on Linux showed that `git add` of a 1 gb file
|
|
is about 5% slower with filter-process enabled than it is
|
|
with filter-process disabled. That's due to the piping overhead to
|
|
filter-process ([[todo/git_smudge_clean_interface_suboptiomal]]).
|
|
`git-annex add` with `annex.addunlocked` has similar performance
|
|
as `git add` with filter-process disabled.
|
|
|
|
`git-annex add` without `annex.addunlocked` is about 25% faster than those,
|
|
and only reads the file once, but it also does not copy the file, so of
|
|
course it's faster, and always will be.
|
|
|
|
Probably disk cache helps them a fair amount, unless it's too small.
|
|
So it's not clear how much implementing this would really speed them up.
|
|
|
|
This does not really affect default configurations.
|
|
Performance is only impacted when annex.addunlocked or
|
|
annex.largefiles is configured, and in a few cases
|
|
where an already annexed file is added by `git add` or `git commit -a`.
|
|
|
|
So is the complication of implementing this worth it? Users who
|
|
need maximum speed can use `git-annex add`.
|