git is slow when the index file is large and has to be rewritten each time
a file is changed. To speed this up, added a journal where changes are
recorded before being fed into the index file and committed to the
git-annex branch. The entire journal can be fed into git with just 2
commands, and only one write of the index file.
That sucking sound is a whole page of code vanishing to be replaced with
return . catMaybes . map (logFileKey . takeFileName) =<< Branch.files
What can I say, git is my database, and haskell my copilot.
This fixes precommit, since in that hook, git sets the env var to write
to the lock file, which avoids git add failing due to the presence of the
lock file. (Took me a good hour and a half of confusion to figure this out.)
Test suite now passes 100%! Only the upgrade code still remains to be
written.
Otherwise, the location log changes are only staged in its index,
and this can confuse matters if pulling or cloning from the remote.
The test suite was failing because this wasn't done.
Do not set annex.version whenever any command is run. Just do it in init.
This ensures that, if a repo has annex.version=3, it has a git-annex
branch, so we don't have to run a command every time to check for the
branch.
Remove the old ad-hoc logic for v0 and v1, to simplify version checking.
stop changing gitattributes on init
create git-annex branch on init
ugly special case for init in a bare repository goes away, yay!
git annex init is also faster, at least in a large existing repo, as
it does not need to run the slow 'git add'
This will speed up typical cases like git-annex get, which currently
has to read the location log once, then read it a second time in order to
add a line to it. Since these reads now involve more than just reading
in a file, it seemed good to add a cache layer.
Only the most recent thing needs to be cached, because git-annex has
good locality; it operates on one file at a time, and only cares
about one item from the branch per file.
This is needed for robust handling of the git-annex branch. Since changes
are staged to its index as git-annex runs, and committed at the end,
it's possible that git-annex is interrupted, and leaves a dirty index.
When it next runs, it needs to be able to merge the git-annex branch
as necessary, without losing the existing changes in the index.
Note that this assumes that the git-annex branch is only modified by
git-annex. Any changes to it will be lost when git-annex updates the
branch. I don't see a good, inexpensive way to find changes in
the git-annex branch that arn't in the index, and union merging the
git-annex branch into the index every time would likewise be expensive.
There is no suitable git hook to run code when pulling changes that
might need to be merged into the git-annex branch. The post-merge hook
is only run when changes are merged into HEAD, and it's possible,
and indeed likely that many pulls will only have changes in git-annex,
but not in HEAD, and not trigger it.
So, git-annex will have to take care to update the branch before reading
from it, to make sure it has merged in current info from remotes. Happily,
this can be done quite inexpensively, just a git-show-ref to list
branches, and a minimalized git-log to see if there are unmerged changes
on the branches. To further speed up, it will be done only once per
git-annex run, max.