The only remaining vestiage of backends is different types of keys. These
are still called "backends", mostly to avoid needing to change user interface
and configuration. But everything to do with storing keys in different
backends was gone; instead different types of remotes are used.
In the refactoring, lots of code was moved out of odd corners like
Backend.File, to closer to where it's used, like Command.Drop and
Command.Fsck. Quite a lot of dead code was removed. Several data structures
became simpler, which may result in better runtime efficiency. There should
be no user-visible changes.
Otherwise, the location log changes are only staged in its index,
and this can confuse matters if pulling or cloning from the remote.
The test suite was failing because this wasn't done.
cp is still used when copying file from repos on the same filesystem, since
--reflink=auto can make it significantly faster on filesystems such as
btrfs.
Directory special remotes still use cp, not rsync. It's not clear what
tmp file should be used when rsyncing to such a remote.
This takes advantage of the debug logging done by missingh, and I added
my own debug messages for executeFile calls. There are still some other
low-level ways git-annex runs stuff that are not shown by debugging,
but this gets most of it easily.
These are defined in ifelse, but it's not currently available and I don't
want to pull in a library for 6 lines of code anyhow.
Also, ifelse sets the fixity to 1, which does not allow >>? error $ ...
In particular, munge key filenames to comply with the IA's filename limits,
disable encryption, support their nonstandard way of creating buckets, and
allow x-amz-* headers to be specified in initremote to set item metadata.
Still TODO: initremote does not handle multiword metadata headers right.
Fully tested and working, including resuming and encryption. (Though not
resuming when sending *with* encryption; gpg doesn't produce identical
output each time.)
Uses same layout as the directory special remote and the .git/annex/objects/
directory.
This was a most surprising leak. It occurred in the process that is forked
off to feed data to gpg. That process was passed a lazy ByteString of
input, and ghc seemed to not GC the ByteString as it was lazily read
and consumed, so memory slowly leaked as the file was read and passed
through gpg to bup.
To fix it, I simply changed the feeder to take an IO action that returns
the lazy bytestring, and fed the result directly to hPut.
AFAICS, this should change nothing WRT buffering. But somehow it makes
ghc's GC do the right thing. Probably I triggered some weakness in ghc's
GC (version 6.12.1).
(Note that S3 still has this leak, and others too. Fixing it will involve
another dance with the type system.)
Update: One theory I have is that this has something to do with
the forking of the feeder process. Perhaps, when the ByteString
is produced before the fork, ghc decides it need to hold a pointer
to the start of it, for some reason -- maybe it doesn't realize that
it is only used in the forked process.
Stalls were caused by code that did approximatly:
content' <- liftIO $ withEncryptedContent cipher content return
store content'
The return evaluated without actually reading content from S3,
and so the cleanup code began waiting on gpg to exit before
gpg could send all its data.
Fixing it involved moving the `store` type action into the IO monad:
liftIO $ withEncryptedContent cipher content store
Which was a bit of a pain to do, thank you type system, but
avoids the problem as now the whole content is consumed, and
stored, before cleanup.
Untested, I will need to dust off my S3 keys, and plug the modem back in
that was unplugged last night due to very low battery bank power. But it
compiles, so it's probably perfect. :)
Forking a new process rather than relying on a thread to feed gpg.
The feeder thread was stalling, probably when the main thread got
to the point it was wait()ing on the gpg to exit.