All commands that often have to read a lot of information from
the git-annex branch should now be nearly as fast as before
the branch was introduced.
Before fsck was taking approximatly 3 hours, now it's running in 8 minutes.
The code is very nasty. It should be rewritten to read the header line
from git cat-file, and then read the specified number of bytes of content.
git is slow when the index file is large and has to be rewritten each time
a file is changed. To speed this up, added a journal where changes are
recorded before being fed into the index file and committed to the
git-annex branch. The entire journal can be fed into git with just 2
commands, and only one write of the index file.
This will speed up typical cases like git-annex get, which currently
has to read the location log once, then read it a second time in order to
add a line to it. Since these reads now involve more than just reading
in a file, it seemed good to add a cache layer.
Only the most recent thing needs to be cached, because git-annex has
good locality; it operates on one file at a time, and only cares
about one item from the branch per file.