`git-annex unused` has to compare large sets of data
(all keys with content present in the repository,
with all keys used by files in the repository), and so
uses more memory than git-annex typically needs; around
60-80 MB when run in a repository with 80 thousand files.

I would like to reduce this. One idea is to use a bloom filter.
For example, construct a bloom filter of all keys used by files in
the repository. Then for each key with content present, check if it's
in the bloom filter. Since there can be false positives (an unused key
can appear to be in the filter), this might miss finding some unused
keys, but it would never report a used key as unused. The false
positive probability and size of the filter could be tunable.
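
A minimal sketch of the bloom filter idea, in Python rather than git-annex's Haskell, just to make the shape concrete. The names `BloomFilter` and `probably_unused` are hypothetical, keys are assumed to be plain strings, and the sizing uses the standard bloom filter formulas; none of this is existing git-annex code.

```python
import hashlib
import math


class BloomFilter:
    """Minimal bloom filter, sized from an expected item count and a
    target false positive rate (both tunable)."""

    def __init__(self, expected_items, false_positive_rate):
        n = max(1, expected_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Derive all bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def probably_unused(present_keys, used_keys, expected_used, false_positive_rate=0.001):
    """Yield keys with content present that are definitely not used.

    Both key arguments may be iterators, so neither full set has to be
    held in memory. Some unused keys can be missed (false positives).
    """
    used = BloomFilter(expected_used, false_positive_rate)
    for key in used_keys:
        used.add(key)
    for key in present_keys:
        if key not in used:
            yield key
```

With these formulas, 80 thousand keys at a 0.1% false positive rate need about 144 kB of filter, and lowering the rate grows the filter only logarithmically.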
Another way might be to scan the git log for files that got removed
or changed what key they pointed to. Correlate with keys with content
currently present in the repository (possibly using a bloom filter again),
and that would yield a shortlist of keys that are probably not used.
Then scan through all files in the repo to make sure that none point to keys
on the shortlist.
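
The two passes of the git log approach could look roughly like the sketch below. It assumes the log has already been reduced to `(old_key, new_key)` pairs, one per file change, with `None` standing for "no key" (file added or removed); that input format and the function names `shortlist_unused` and `confirm_unused` are hypothetical, and how the pairs are extracted from git is left out.

```python
def shortlist_unused(changes, present_keys):
    """Keys that some file stopped pointing to and that still have content present.

    changes: iterable of (old_key, new_key) pairs from the git log.
    present_keys: iterable of keys with content in the repository.
    """
    # Only the keys dropped by some change are kept in memory.
    dropped = {old for old, new in changes if old is not None and old != new}
    # Stream the present keys past the (small) set of dropped keys.
    return {key for key in present_keys if key in dropped}


def confirm_unused(shortlist, current_file_keys):
    """Remove any shortlisted key that a file in the repo still points to."""
    remaining = set(shortlist)
    for key in current_file_keys:
        remaining.discard(key)
        if not remaining:
            break  # nothing left to confirm, stop scanning early
    return remaining
```

For example, with changes `[("K1", "K2"), ("K2", None), ("K3", None)]`, all of `K1` to `K3` present, and a current file still pointing at `K3`, the shortlist is `{K1, K2, K3}` and the confirmed result is `{K1, K2}`. Memory use scales with the number of dropped keys rather than with the size of the repository.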