I hate hard-coded 40 kilobyte long file lists, and just once would like to
see a build system that does not assume it's a good idea to have a file
list, or a hardcoded file list, or a file list that can only be generated
with a crippled form of globs. But not today, thank you cabal.
This is substantially slower than using make, does not build or install
documentation, does not run the test suite, and is not particularly
recommended, but could be useful to some.
Rebenchmarked v2 vs v3, and v3 is now actually faster. Yes, storing data
in git, using git as a filesystem, is actually faster than just using the
filesystem. If you do it just right. :)
Now it reads the size specified, rather than using the sentinel hack to
determine EOF.
It still depends on error messages to handle files that are not present.
All commands that often have to read a lot of information from
the git-annex branch should now be nearly as fast as before
the branch was introduced.
Before, fsck was taking approximately 3 hours; now it runs in 8 minutes.
The code is very nasty. It should be rewritten to read the header line
from git cat-file, and then read the specified number of bytes of content.
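A minimal sketch of that header-then-size approach, assuming the replies
follow the documented `git cat-file --batch` layout ("<sha> <type> <size>",
a newline, the content, and a trailing newline, or "<name> missing" for an
object that does not exist). The function name read_batch_entry is made up
for illustration, and this is Python rather than git-annex's own code.

```python
import io


def read_batch_entry(stream):
    """Read one reply from a `git cat-file --batch` style byte stream.

    Returns (sha, type, content) for an object, (name, None, None) when
    the object is reported missing, or None at end of stream.
    """
    header = stream.readline()
    if not header:
        return None
    fields = header.rstrip(b"\n").split(b" ")
    if fields[-1] == b"missing":
        # Reply for an object that does not exist: "<name> missing"
        return (b" ".join(fields[:-1]), None, None)
    sha, objtype, size = fields
    size = int(size)
    # Read exactly the advertised number of bytes; no sentinel needed,
    # so content that itself contains newlines is handled correctly.
    content = stream.read(size)
    if len(content) != size:
        raise EOFError("stream ended before the full object was read")
    stream.read(1)  # consume the newline that follows the content
    return (sha, objtype, content)


if __name__ == "__main__":
    # Canned replies standing in for the output pipe of the real command.
    replies = io.BytesIO(
        b"ce013625030ba8dba906f756967f9e9ca394464a blob 6\nhello\n\n"
        b"git-annex:nope.log missing\n"
    )
    while True:
        entry = read_batch_entry(replies)
        if entry is None:
            break
        print(entry)
```

Reading by size also means a missing file shows up as an ordinary reply
line rather than something to be scraped from an error message.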
Since the logs have just been moved into the git-annex branch, don't need
to worry about backwards compatibility with old versions of git-annex that
would fail to parse location logs with extra fields tacked on.
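For illustration, a parser that tolerates extra trailing fields might look
like the sketch below. The line layout used here ("<timestamp> <status>
<uuid>", whitespace separated) is an assumption for the example, not a
statement of the real log format, and parse_log_line is a hypothetical name.

```python
def parse_log_line(line):
    """Parse one log line, ignoring any extra trailing fields.

    Assumed (hypothetical) layout: "<timestamp> <status> <uuid> [extra ...]"
    Returns (timestamp, present, uuid), or None if the line is malformed.
    """
    fields = line.split()
    if len(fields) < 3:
        return None
    timestamp, status, uuid = fields[:3]  # anything after these is ignored
    try:
        return (float(timestamp), status == "1", uuid)
    except ValueError:
        return None


if __name__ == "__main__":
    print(parse_log_line("1300000000.5 1 u1"))
    print(parse_log_line("1300000000.5 0 u1 extra=field"))
```

A stricter parser that demanded exactly three fields is the kind that would
break as soon as a field was tacked on.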
Only "partially" because the journal is not locked during the merge, so
there's a small window where a different git-annex process could write info
to the journal that overwrites info taken from the merge.
That could be dealt with by locking, but the lock would really need to be
around the whole git-annex, to only let one run at a time. Otherwise, even
with the journal locked during the merge, another git-annex could already
be running, generate an overwriting change, and only store it in the journal
after the merge was complete. And similarly, two git-annex processes could
fight and overwrite each other's information independent of any merging.
So, a toplevel lock for git-annex may get added; it's something I've
considered before, as these potential, unlikely problems are not new.
(OTOH, fsck will deal with such problems.)
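A toplevel lock of the kind described could be sketched roughly as follows,
assuming a Unix system and an advisory flock() on a dedicated lock file.
The name toplevel_lock and the lock file path are hypothetical; nothing
here is taken from git-annex itself.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def toplevel_lock(path, blocking=True):
    """Hold an exclusive advisory lock on `path` for the whole run.

    With blocking=False, raises BlockingIOError if another holder exists.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, flags)
        yield fd
    finally:
        os.close(fd)  # closing the descriptor releases the lock


if __name__ == "__main__":
    # Hypothetical lock file location, for the example only.
    with toplevel_lock("/tmp/example-toplevel.lock"):
        print("only one process at a time gets here")
```

Taking the lock around the entire run, rather than only around the merge,
is what closes the window where a process that was already running stores
an overwriting change after the merge completes.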
git is slow when the index file is large and has to be rewritten each time
a file is changed. To speed this up, added a journal where changes are
recorded before being fed into the index file and committed to the
git-annex branch. The entire journal can be fed into git with just 2
commands, and only one write of the index file.
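As a rough sketch of how a whole journal can be fed to git in one batch:
the two commands are not named above, so this assumes something like
`git hash-object -w --stdin-paths` to write the blobs and
`git update-index --index-info` to stage them all in a single index write.
The code below only computes blob ids and builds the `--index-info` input;
it does not run git or write any objects, and the function names and the
in-memory journal dict are made up for the example.

```python
import hashlib


def blob_sha1(content):
    """Object id git assigns to a blob with this content (SHA-1 repos)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def index_info(journal):
    """Build input for `git update-index --index-info`.

    journal maps a path in the branch to the new file content (bytes).
    Each output line has the form "<mode> <sha1>\t<path>\n".
    """
    lines = []
    for path, content in sorted(journal.items()):
        lines.append("100644 %s\t%s\n" % (blob_sha1(content), path))
    return "".join(lines)


if __name__ == "__main__":
    # Hypothetical journal contents: two changed log files.
    journal = {"b.log": b"", "a.log": b"hello\n"}
    print(index_info(journal), end="")
```

However many files the journal holds, the index is rewritten once when the
batch is applied, instead of once per changed file.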