Rebenchmarked v2 vs v3, and v3 is now actually faster. Yes, storing data
in git, using git as a filesystem is actually faster than just using the
filesystem. If you do it just right. :)
Now it reads the size specified, rather than using the sentinal hack to
determine EOF.
It still depends on error messages to handle files that are not present.
All commands that often have to read a lot of information from
the git-annex branch should now be nearly as fast as before
the branch was introduced.
Before fsck was taking approximatly 3 hours, now it's running in 8 minutes.
The code is very nasty. It should be rewritten to read the header line
from git cat-file, and then read the specified number of bytes of content.
Since the logs have just been moved into the git-annex branch, don't need
to worry about backwards compatability with old versions of git-annex that
would fail to parse location logs with extra fields tacked on.
Only "partially" because the journal is not locked during the merge, so
there's a small window where a different git-annex process could write info
to the journal that overwrites info taken from the merge.
That could be dealt with by locking, but the lock would really need to be
around the whole git-annex, to only let one run at a time. Otherwise, even
with the journal locked during the merge, another git-annex could already
be running, generate an overwriting change, and only store it in the journal
after the merge was complete. And similarly, two git-annex processes could
fight and overwrite each other's information independant of any merging.
So, a toplevel lock for git-annex may get added; it's something I've
considered before, as these potential, unlikely problems are not new.
(OTOH, fsck will deal with such problems.)
git is slow when the index file is large and has to be rewritten each time
a file is changed. To speed this up, added a journal where changes are
recorded before being fed into the index file and committed to the
git-annex branch. The entire journal can be fed into git with just 2
commands, and only one write of the index file.
That sucking sound is a whole page of code vanishing to be replaced with
return . catMaybes . map (logFileKey . takeFileName) =<< Branch.files
What can I say, git is my database, and haskell my copilot.
This fixes precommit, since in that hook, git sets the env var to write
to the lock file, which avoids git add failing due to the presence of the
lock file. (Took me a good hour and a half of confusion to figure this out.)
Test suite now passes 100%! Only the upgrade code still remains to be
written.