add
parent
6fa86946d5
commit
4ea0b7c288
1 changed files with 25 additions and 0 deletions
25
doc/todo/git-annex_unused_eats_memory.mdwn
Normal file
@@ -0,0 +1,25 @@
`git-annex unused` has to compare large sets of data
(all keys with content present in the repository,
with all keys used by files in the repository), and so
uses more memory than git-annex typically needs; around
60-80 MB when run in a repository with 80 thousand files.

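To make the comparison concrete, here is a minimal Python sketch of the
straightforward approach. It is only an illustration, not git-annex's actual
code (git-annex is written in Haskell); the function name and the key strings
in the usage example are hypothetical.

```python
def find_unused(present_keys, used_keys):
    """Return keys whose content is present but which no file refers to."""
    # The whole set of used keys is held in memory at once; with tens of
    # thousands of files this set is what dominates memory use.
    used = set(used_keys)
    # Keys with content present can be streamed one at a time.
    return [key for key in present_keys if key not in used]


# Hypothetical keys, for illustration only.
print(find_unused(["key-a", "key-b", "key-c"], ["key-b"]))
```

Memory use in this sketch grows linearly with the number of used keys, since
every key string is stored in full.
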
I would like to reduce this. One idea is to use a bloom filter.
For example, construct a bloom filter of all keys used by files in
the repository. Then for each key with content present, check if it's
in the bloom filter. Since there can be false positives (an unused key
that appears to be in the filter), this might miss finding some unused
keys, but it would never report a used key as unused. The false
positive probability, and so the size of the filter, could be tunable.

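The bloom filter idea could look roughly like the following Python sketch.
Everything here is an assumption made for illustration: the `BloomFilter`
class, the `probably_unused` function, and the choice of SHA-256 with double
hashing are not taken from git-annex. The sizing uses the standard formulas
m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.

```python
import hashlib
import math


class BloomFilter:
    """A small Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, expected_items, false_positive_rate):
        n = max(1, expected_items)
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        bits_needed = -n * math.log(false_positive_rate) / math.log(2) ** 2
        self.num_bits = max(8, math.ceil(bits_needed))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def probably_unused(present_keys, used_keys, expected_used,
                    false_positive_rate=0.001):
    """Return present keys that are definitely not used by any file.

    A false positive makes an unused key look used, so some unused keys
    may be missed, but a used key is never reported as unused.
    """
    used = BloomFilter(expected_used, false_positive_rate)
    for key in used_keys:
        used.add(key)
    return [key for key in present_keys if key not in used]
```

With these formulas, 80 thousand keys at a 0.1% false positive rate need about
1.15 million bits, which is roughly 140 KB for the filter itself. Both
`used_keys` and `present_keys` can be generators, so neither full key list has
to be held in memory.
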
Another way might be to scan the git log for files that got removed
or changed what key they pointed to. Correlate with keys with content
currently present in the repository (possibly using a bloom filter again),
and that would yield a shortlist of keys that are probably not used.
Then scan through all files in the repo to make sure that none point to keys
on the shortlist.

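The log-scanning approach can be sketched in Python as below. This is a rough
illustration under several assumptions: that annexed files are symlinks whose
target contains `.git/annex/objects/` and ends in the key, that the input is
patch text such as the output of `git log -p`, and that the function names and
key strings are hypothetical. It does not run git itself.

```python
import posixpath

ANNEX_OBJECTS = ".git/annex/objects/"


def keys_dropped_in_history(diff_lines):
    """Yield keys from removed symlink targets in patch-style output."""
    for line in diff_lines:
        # A removed line starts with a single "-"; "---" is a file header.
        if line.startswith("-") and not line.startswith("---"):
            target = line[1:].strip()
            if ANNEX_OBJECTS in target:
                yield posixpath.basename(target)


def shortlist_unused(diff_lines, present_keys, current_keys):
    """Return keys that history suggests are unused and no file points to."""
    # Keys that some file stopped pointing to at some point in history.
    candidates = set(keys_dropped_in_history(diff_lines))
    # Correlate with keys whose content is present (streamed, not stored).
    shortlist = {key for key in present_keys if key in candidates}
    # Final pass over the current files drops keys that are still in use,
    # for example because a file was only renamed.
    for key in current_keys:
        shortlist.discard(key)
    return shortlist
```

Only the candidate set and the shortlist are held in memory here; the present
keys and the keys of current files are consumed as streams. If the candidate
set is itself large, it could be replaced by a bloom filter as suggested above.
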
----

`git annex unused --from remote` is much worse, using hundreds of MB of
memory. It has not been profiled at all yet, and can probably be improved
somewhat by fixing whatever memory leak it (probably) has.