git-annex/doc/bugs/git_annex_add_memory_leak.mdwn
Joey Hess 505d6b1a06 fix failure count memory leak
This is the last memory leak that prevents git-annex from running
in constant space, as far as I can see. I can now run git annex find
dummied up to repeatedly find the same file over and over, on millions
olf files, and memory stays entirely constant.
2012-02-15 14:35:49 -04:00

39 lines
1.6 KiB
Markdown

For the record, `git annex add` has had a series of memory leaks.
Mostly these are minor -- until you need to check in a few
million files in a single operation.
If this happens to you, git-annex will run out of memory and stop.
(Generally well before your system runs out of memory, since it has some
built-in ulimits.) You can recover by just re-running the `git annex add`
-- it will automatically pick up where it left off.
A history of the leaks:
* Originally, `git annex add` remembered all the files
it had added, and fed them to git at the end. Of course
that made its memory use grow, so it was fixed to periodically
flush its buffer. Fixed in version 0.20110417.
* Something called a "lazy state monad" caused "thunks" to build
up and memory to leak. Also affected other git annex commands
than `add`. Adding files using a SHA* backend hit the worst.
Fixed in versions afer 3.20120123.
* Committing journal files turned out to have another memory leak.
After adding a lot of files ran out of memory, this left the journal
behind and could affect other git-annex commands. Fixed in versions afer
3.20120123.
* The count of the number of failed commands was updated lazily, which
caused a slow leak when running on a lot of files. Fixed in versions afer
3.20120123.
* (Note that `git ls-files --others`, which is used to find files to add,
also uses surpsisingly large amounts
of memory when you have a lot of files. It buffers
the entire list, so it can compare it with the files in the index,
before outputting anything.
This is Not Our Problem, but I'm sure the git developers
would appreciate a patch that fixes it.)
[[done]]