fix memory leak when staging the journal

The list of journal files had to be retained until the end so the files could be deleted.
Also, the whole list of update-index lines was generated first and only then fed into update-index.
Now everything streams in constant space.
Joey Hess 2012-02-14 14:35:52 -04:00
parent cdd6cdbb67
commit 7ebd98d8d8
4 changed files with 52 additions and 42 deletions


@@ -12,26 +12,27 @@ A history of the leaks:
* Originally, `git annex add` remembered all the files
it had added, and fed them to git at the end. Of course
that made its memory use grow, so it was fixed to periodically
-flush its buffer. Affected versions: before 0.20110417
+flush its buffer. Fixed in version 0.20110417.
* Something called a "lazy state monad" caused "thunks" to build
up and memory to leak. Also affected other git annex commands
than `add`. Adding files using a SHA* backend was hit the worst.
Fixed in versions after 3.20120123.
* A strange GHC bug seemed to be responsible for another leak.
(In particular, a child process was forked. All the child did
was read filenames from one pipe and shove them reformatted out
another pipe. For some reason, it steadily grew in size.)
Code was rewritten in a way that happens to avoid that leak.
Apparently fixed in versions after 3.20120123, but this one is not
well understood.
* Committing journal files turned out to have another memory leak.
After adding a lot of files ran out of memory, this left the journal
-behind and could affect other git-anne commands. Fixed in versions after
+behind and could affect other git-annex commands. Fixed in versions after
3.20120123.
* Something is still causing a slow leak when adding files.
I tested by adding many copies of the whole linux kernel
tree into the annex using the WORM backend, and once
it had added 1 million files, git-annex used ~100 MB of RAM.
That's 100 bytes leaked per file on average .. roughly the
size of a filename? It's worth noting that `git add` uses more memory
than that in such a large tree.
**not fixed yet**
* (Note that `git ls-files --others`, which is used to find files to add,
also uses surprisingly large amounts
of memory when you have a lot of files. It buffers