2012-02-14 05:01:38 +00:00

For the record, `git annex add` has had a series of memory leaks.
Mostly these are minor, until you need to check in a few
million files in a single operation.

If this happens to you, git-annex will run out of memory and stop.
(Generally well before your system runs out of memory, since it has some
built-in ulimits.) You can recover by simply re-running `git annex add`;
it will automatically pick up where it left off.

A history of the leaks:

* Originally, `git annex add` remembered all the files
  it had added, and fed them to git at the end. Of course
  that made its memory use grow, so it was changed to periodically
  flush its buffer. Fixed in version 0.20110417.

* Something called a "lazy state monad" caused "thunks" to build
  up and memory to leak. This also affected git-annex commands other
  than `add`; adding files using a SHA* backend was hit the worst.
  Fixed in versions after 3.20120123.

* Committing journal files turned out to have another memory leak.
  When adding a lot of files ran out of memory, this left the journal
  behind, which could affect other git-annex commands. Fixed in versions
  after 3.20120123.

* Something is still causing a slow leak when adding files.
  I tested this by adding many copies of the whole Linux kernel
  tree into the annex using the WORM backend, and once
  it had added 1 million files, git-annex used about 100 MB of RAM.
  That's 100 bytes leaked per file on average, roughly the
  size of a filename? It's worth noting that `git add` uses more memory
  than that in such a large tree.

    **not fixed yet**

* (Note that `git ls-files --others`, which is used to find files to add,
  also uses surprisingly large amounts of memory when you have a lot of
  files. It buffers the entire list so that it can compare it with the
  files in the index before outputting anything.
  This is Not Our Problem, but I'm sure the git developers
  would appreciate a patch that fixes it.)

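The first fix in the history above replaced "remember every added file and feed them all to git at the end" with periodic flushing, which bounds the memory held for pending filenames by a batch size instead of by the total number of files. Below is a minimal Python sketch of that idea, not git-annex's actual code (git-annex is written in Haskell). `add_files`, `stage`, and `batch_size` are hypothetical names chosen for illustration; `stage` merely stands in for whatever hands a batch of filenames to git.

```python
from typing import Callable, Iterable, List


def add_files(
    paths: Iterable[str],
    stage: Callable[[List[str]], None],
    batch_size: int = 1000,
) -> int:
    """Hand paths to `stage` in batches; return the largest buffer size seen.

    Hypothetical sketch: `stage` is a stand-in for passing a batch of
    filenames to git, not a real git or git-annex API.
    """
    buffer: List[str] = []
    peak = 0
    for path in paths:
        # (the per-file annexing work would happen here)
        buffer.append(path)
        peak = max(peak, len(buffer))
        if len(buffer) >= batch_size:
            stage(buffer)  # flush this batch...
            buffer = []    # ...and start a fresh list, forgetting those names
    if buffer:
        stage(buffer)      # flush the final partial batch
    return peak


if __name__ == "__main__":
    batches: List[List[str]] = []
    files = (f"file{i}" for i in range(2500))
    peak = add_files(files, batches.append, batch_size=1000)
    print(peak, [len(b) for b in batches])  # 1000 [1000, 1000, 500]
```

With 2,500 files and a batch size of 1,000, the pending list never holds more than 1,000 names, whereas the original approach would have held all 2,500 until the end. The batch size trades memory against the number of `stage` calls; 1,000 is an arbitrary illustrative value.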