The tricky part about this is that to generate a key, the file must be
present already. Worked around by adding (back) an URL key type, which
is used for addurl --fast.
This was more complex than would be expected. unannex has to use git commit -a
since it's removing files from git; git commit filelist won't do.
Allow commands to be added to the Git queue that have no associated files,
and run such commands once.
This allows eg, `git-annex -c annex.rsync-options=-6 get file`
The overridden git configs are not passed on to git plumbing commands
that are run. Perhaps someone will find a need to do that, but I don't yet
and it would require storing more state to know what config settings
have been overridden and need to be passed on.
Could result in bad location log data for keys that contain [&:%] in their
names. (A workaround for this problem is to run git annex fsck.)
`git annex unused --from remote` could also run into the broken code.
Giulio Eulisse reported that on OSX, bad free space numbers were being
shown. It thought he had negative free space.
While the documentation is not clear, especially across OS's, it seems
likely that statfs uses unsigned long. It doesn't make sense for any
numbers to be negative.
This is substantially slower than using make, does not build or install
documentation, does not run the test suite, and is not particularly
recommended, but could be useful to some.
Rebenchmarked v2 vs v3, and v3 is now actually faster. Yes, storing data
in git, using git as a filesystem is actually faster than just using the
filesystem. If you do it just right. :)
All commands that often have to read a lot of information from
the git-annex branch should now be nearly as fast as before
the branch was introduced.
Before fsck was taking approximatly 3 hours, now it's running in 8 minutes.
The code is very nasty. It should be rewritten to read the header line
from git cat-file, and then read the specified number of bytes of content.
Since the logs have just been moved into the git-annex branch, don't need
to worry about backwards compatability with old versions of git-annex that
would fail to parse location logs with extra fields tacked on.
This is a new git subcommand, that does a generic union merge operation
between two refs, storing the result in a branch. It operates efficiently
without touching the working tree. It does need to write out a temporary
index file, and may need to write out some other temp files as well.
This could be useful for anything that stores data in a branch,
and needs to merge changes into that branch without actually checking the
branch out. Since conflict handling can't be done without a working copy,
the merge type is always a union merge, which is fine for data stored in
log format (as git-annex does), or in non-conflicting files
(as pristine-tar does).
This probably belongs in git proper, but it will live in git-annex for now.
---
Plan is to move .git-annex/ to a git-annex branch, and use git-union-merge
to handle merging changes when pulling from remotes.
Some preliminary benchmarking using real .git-annex/ data indicates
that it's quite fast, except for the "git add" call, which is as slow
as "git add" tends to be with a big index.
cp is still used when copying file from repos on the same filesystem, since
--reflink=auto can make it significantly faster on filesystems such as
btrfs.
Directory special remotes still use cp, not rsync. It's not clear what
tmp file should be used when rsyncing to such a remote.
get not honoring --from has surprised me a few times, so least surprise
suggests it should just behave like copy --from. This leaves the difference
between get and copy being that copy always requires the remote to copy
from, while get will decide whether to get a file from a key/value store or
a remote.