massive v6 add speed/memory improvement

v6 add: Take advantage of improved SIGPIPE handler in git 2.5 to speed up
the clean filter by not reading the file content from the pipe. This also
avoids git buffering the whole file content in memory.

When built with an older git, still consumes stdin. If built with a newer
git and used with an older one, it breaks, but that's acceptable --
checking the git version every time would make repeated smudge runs slow.
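
A minimal sketch of the idea in Python (git-annex itself is written in Haskell, and this is not its code): a clean filter that reads the file content from disk and touches stdin only when the git it was built against predates 2.5. The names `clean`, `parse_git_version` and `MIN_GIT`, and the pointer format, are assumptions made for illustration.

```python
import hashlib

# Assumption for this sketch: oldest git that tolerates a filter which
# exits without reading its stdin (2.5, per the commit message).
MIN_GIT = (2, 5)


def parse_git_version(text):
    """Turn 'git version 2.18.0' into (2, 18, 0); stop at non-numeric parts."""
    nums = []
    for part in text.split()[2].split("."):
        if not part.isdigit():
            break
        nums.append(int(part))
    return tuple(nums)


def clean(path, stdin, built_with_git):
    """Return a (hypothetical) pointer line for the file at `path`.

    `built_with_git` stands in for a version detected once at build time,
    so nothing is checked on each run.
    """
    if built_with_git < MIN_GIT:
        # Older git: drain the pipe so git never sees a broken pipe.
        while stdin.read(65536):
            pass
    # Newer git: stdin is never read; the content comes from disk.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return "sketch-pointer sha256:" + digest.hexdigest() + "\n"
```

In a real filter, `path` would come from the `%f` placeholder in the `filter.<driver>.clean` setting and `stdin` would be `sys.stdin.buffer`.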

This commit was supported by the NSF-funded DataLad project.
Joey Hess 2018-08-09 18:17:46 -04:00
parent 74551a430a
commit a96972015d
GPG key ID: DB12DB0FF05F8F38
4 changed files with 35 additions and 12 deletions

@@ -65,16 +65,16 @@ git-annex should use smudge/clean filters.
* When git runs the smudge filter, it buffers all its output in RAM before
writing it to a file. So, checking out a branch with large v6 unlocked files
can cause git to use a lot of memory.
-(This needs to be fixed in git, but my proposed interface in
+This needs to be fixed in git, but my proposed interface in
<http://thread.gmane.org/gmane.comp.version-control.git/294425> would
avoid the problem for git checkout, since it would use the new interface
-and not the smudge filter.)
+and not the smudge filter.
* When `git add` is run with a large file, it allocates memory for
the whole file content, even though it's only going
to stream it to the clean filter. My proposed smudge/clean
interface patch also fixed this problem, since it made git not read
the file at all.
Last verified with git 2.18 in 2018.
To check: Does the long-running filter process interface have the same
problem?
* Eventually (but not yet), make v6 the default for new repositories.
Note that the assistant forces repos into direct mode; that will need to

@@ -0,0 +1,11 @@
[[!comment format=mdwn
username="joey"
subject="""re: Git 2.5 allows smudge filters to not read all of stdin"""
date="2018-08-09T22:11:00Z"
content="""
@torarnv thanks for pointing that out. I finally got around to verifying
that, and was able to speed up the clean filter. This also avoids the
problem that git, for some reason, buffers the whole file content in memory
when it sends it to the clean filter, which is a pretty bad memory problem
in git that no longer affects git-annex.
"""]]
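
To illustrate the writer's side of this, here is a small Python sketch (not git's code) of a parent process that streams content to a filter and treats a broken pipe as harmless when the filter stops reading. `feed_filter` and its parameters are hypothetical names. The sketch relies on Python ignoring SIGPIPE by default, so a write to a closed pipe surfaces as `BrokenPipeError` instead of killing the process.

```python
import os
import subprocess


def feed_filter(cmd, payload, chunk=65536):
    """Stream `payload` to the stdin of `cmd`, tolerating an early close.

    Returns (exit_status, filter_output, bytes_sent). Assumes the filter's
    output is small; a filter that writes a lot before reading its input
    could deadlock this simple loop.
    """
    proc = subprocess.Popen(
        cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=0
    )
    fd = proc.stdin.fileno()
    sent = 0
    try:
        while sent < len(payload):
            sent += os.write(fd, payload[sent:sent + chunk])
    except BrokenPipeError:
        # The filter exited or closed stdin without reading everything.
        # Its exit status, not this error, decides whether it succeeded.
        pass
    finally:
        proc.stdin.close()
    output = proc.stdout.read()
    return proc.wait(), output, sent
```

With a filter that never reads stdin, the loop stops once the pipe breaks, so only a small part of a large file is ever pushed through the pipe.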