All changes to files in the branch are now made via pure functions that
transform the old file into the new. This will allow adding locking
to prevent read/write races. It also makes the code nicer, and purer.
I noticed a behavior change, really a sort of bug fix. Before,
'git annex untrust foo --trust bar' would change both trust levels
permanantly, now the --trust doesn't get stored.
Could result in bad location log data for keys that contain [&:%] in their
names. (A workaround for this problem is to run git annex fsck.)
`git annex unused --from remote` could also run into the broken code.
The only remaining vestiage of backends is different types of keys. These
are still called "backends", mostly to avoid needing to change user interface
and configuration. But everything to do with storing keys in different
backends was gone; instead different types of remotes are used.
In the refactoring, lots of code was moved out of odd corners like
Backend.File, to closer to where it's used, like Command.Drop and
Command.Fsck. Quite a lot of dead code was removed. Several data structures
became simpler, which may result in better runtime efficiency. There should
be no user-visible changes.
Since the logs have just been moved into the git-annex branch, don't need
to worry about backwards compatability with old versions of git-annex that
would fail to parse location logs with extra fields tacked on.
That sucking sound is a whole page of code vanishing to be replaced with
return . catMaybes . map (logFileKey . takeFileName) =<< Branch.files
What can I say, git is my database, and haskell my copilot.
to avoid some issues with git on OSX with the mixed-case directories. No
migration is needed; the old mixed case hash directories are still read;
new information is written to the new directories.
This means there can be 1024 subdirs, each with up to 1024 sub-subdirs.
So with hundreds of millions of annexed objects, each leaf directory will
have only a few files on average.
Writing to a tmp file means no locking is needed, and it fixes a bug
introduced by the last commit, which made log files be read lazily, so
they could still be open when written, which breaks due to haskell's
internal locking.
Actions that need to read all the location logs, like "git annex get .",
were still using a lot of memory, and profiling pointed at the location log
reading as the problem. Not locking them for read, and thus avoiding the
strict reading fixes the problem, although I don't quite understand why.
(Oddly, -sstderr profiling did not show the memory as used, though top
showed dozens of MB being used.)
Anyway, it's fine to not lock location logs for read, since the log format
and parser should be safe if a partial read of a file being written happens.
Note that that could easily happen anyway, if doing a git pull, etc,
especially if git needs to union merge in changes from elsewhere. The worst
that will happen is git-annex could get a bad or out of date idea about
locations and refuse to eg, --drop something.