36 lines
1.8 KiB
Markdown
36 lines
1.8 KiB
Markdown
Most of git-annex is designed to be fast no matter how many other files are
|
|
in the annex. Things like add/get/drop/move/fsck have good locality;
|
|
they will only operate on as many files as you need them to.
|
|
|
|
(git commit can get a little slow with a great deal of files,
|
|
but that's out of scope -- and recent git-annex versions use queuing
|
|
to save git add from piling up too much in the index.)
|
|
|
|
But currently two git-annex commands are quite slow when annexes become large
|
|
in quantity of files. These are unused and stats
|
|
(Both have --fast versions that don't do as much).
|
|
|
|
Both are slow because both need two pieces of information that are not
|
|
quick to look up, and require examining the whole repo, very seekily:
|
|
|
|
1. The keys present in the annex. Found by looking thru .git/annex/objects.
|
|
2. The keys referenced by files in git. Found by finding every file
|
|
in git, and looking at its symlink.
|
|
|
|
Of these, the first is less expensive (typically, an annex does not have every
|
|
key in it). It could be optimized fairly simply, by adding a database
|
|
of keys present in the annex that is optimised to list them all. The
|
|
database would be updated by the few functions that move content in and
|
|
out.
|
|
|
|
The second is harder to optimise, because the user can delete, revert,
|
|
copy, add, etc files in git at will, and git-annex does not have a good way
|
|
to watch that and maintain a database of what keys are being referenced.
|
|
|
|
It could use a post-commit hook and examine files changed by commits, etc.
|
|
But then staged files would be left out. It might be sufficient to
|
|
make --fast trust the database... except unused will suggest *deleting*
|
|
data if nothing references it. Or maybe it could be required to have a
|
|
clean tree with nothing staged before running git-annex unused.
|
|
|
|
Anyway, this is a semi-longterm item for me. --[[Joey]]
|