This option avoids gpg key distribution, at the expense of flexibility, and
with the requirement that all clones of the git repository be equally
trusted.
Avoid ever using read to parse a non-Haskell formatted input string.
show :: Key is arguably still show abuse, but displaying Keys as filenames
is just too useful to give up.
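
A minimal sketch of the point about read, not taken from the actual code.
The "size=N" input format and the name parseSize are hypothetical. The
input is split by hand, and only the numeric part goes through readMaybe,
which returns Nothing on bad input instead of throwing like read does.

```haskell
import Text.Read (readMaybe)

-- Hypothetical input format "size=1024"; not from the original code.
-- The non-Haskell part of the format is parsed explicitly; only the
-- digits are handed to the Read machinery, via the total readMaybe.
parseSize :: String -> Maybe Integer
parseSize s = case break (== '=') s of
    ("size", '=':rest) -> readMaybe rest
    _ -> Nothing
```

With this, malformed input such as "size=abc" yields Nothing rather than
a runtime exception.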
Actually, let's do a targeted fix of the actual forkProcess that was not
waited on. The global reap is moved back to the end, after the long-running
git processes actually exit.
This was a most surprising leak. It occurred in the process that is forked
off to feed data to gpg. That process was passed a lazy ByteString of
input, and ghc seemed to not GC the ByteString as it was lazily read
and consumed, so memory slowly leaked as the file was read and passed
through gpg to bup.
To fix it, I simply changed the feeder to take an IO action that returns
the lazy bytestring, and fed the result directly to hPut.
AFAICS, this should change nothing WRT buffering. But somehow it makes
ghc's GC do the right thing. Probably I triggered some weakness in ghc's
GC (version 6.12.1).
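
Roughly what the changed feeder looks like, as a sketch only. The names
feedAction and copyVia are made up for illustration, and a plain file
handle stands in for the pipe to gpg. The point is that the feeder gets an
IO action and produces the lazy ByteString itself, so nothing outside the
feeder holds a reference to its head.

```haskell
import qualified Data.ByteString.Lazy as L
import System.IO (Handle, IOMode (WriteMode), hClose, openFile)

-- Hypothetical feeder: takes an action that produces the lazy
-- ByteString, rather than an already-constructed ByteString value.
feedAction :: IO L.ByteString -> Handle -> IO ()
feedAction getContent h = do
    content <- getContent
    -- The result is passed straight to hPut and not used again.
    L.hPut h content
    hClose h

-- Hypothetical usage: a file handle stands in for gpg's stdin.
copyVia :: FilePath -> FilePath -> IO ()
copyVia src dst = do
    h <- openFile dst WriteMode
    feedAction (L.readFile src) h
```

Whether this shape avoids the leak on other ghc versions is not something
this sketch can show; it only illustrates the change in the signature.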
(Note that S3 still has this leak, and others too. Fixing it will involve
another dance with the type system.)
Update: One theory I have is that this has something to do with
the forking of the feeder process. Perhaps, when the ByteString
is produced before the fork, ghc decides it needs to hold a pointer
to the start of it, for some reason -- maybe it doesn't realize that
it is only used in the forked process.
Stalls were caused by code that did approximately:
    content' <- liftIO $ withEncryptedContent cipher content return
    store content'
The return evaluated without actually reading content from S3,
and so the cleanup code began waiting on gpg to exit before
gpg could send all its data.
Fixing it involved moving the `store` type action into the IO monad:
    liftIO $ withEncryptedContent cipher content store
Which was a bit of a pain to do, thank you type system, but
avoids the problem as now the whole content is consumed, and
stored, before cleanup.
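
The same pitfall can be shown with plain lazy file IO, as an analogy only.
Here withContent is a hypothetical stand-in for withEncryptedContent, and
closing a file handle stands in for waiting on gpg. Passing return as the
consumer lets an unread lazy value escape before cleanup runs; passing an
action that consumes the content inside the callback does not.

```haskell
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Hypothetical stand-in for withEncryptedContent: provides lazily read
-- content to a consumer, then cleans up (here, closes the handle).
withContent :: FilePath -> (String -> IO a) -> IO a
withContent path consumer = withFile path ReadMode $ \h ->
    hGetContents h >>= consumer

-- Problematic shape, analogous to the stalling code:
--   content <- withContent path return
-- The lazy String comes back unread, after the handle is closed.

-- Hypothetical stand-in for store: forces the whole content inside the
-- callback, so it is fully consumed before cleanup.
storeLength :: String -> IO Int
storeLength content = return $! length content
```

Used as withContent path storeLength, all of the input is read while the
handle is still open.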
I was offline last night and going by function signatures, and unable to
tell which was which. Not sure it matters to HMAC which comes first;
better safe than sorry.
Per bugs/S3_bucket_uses_the_same_key_for_encryption_and_hashing
It may be paranoid to worry about the cipher being recovered
from hmac keys, but yes.. let's be paranoid.
Forking a new process rather than relying on a thread to feed gpg.
The feeder thread was stalling, probably when the main thread got
to the point it was wait()ing on gpg to exit.
For HMAC, using the Data.Digest.Pure.SHA library. I have been avoiding
this library for checksumming generally, since it's (probably) not
as fast as external utilities, but it's fine to use it for HMAC.
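
A small sketch of HMAC with that library, assuming the SHA package is
installed. The choice of hmacSha1 and the helper name macHex are
assumptions for illustration, not necessarily what the code uses. As far
as I can tell from the package documentation, the secret key is the first
argument and the message the second, and both are lazy ByteStrings.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Digest.Pure.SHA (hmacSha1, showDigest)

-- Hypothetical helper: hex-encoded HMAC-SHA1 of a message.
-- Argument order assumed from the SHA package docs: key, then message.
macHex :: String -> String -> String
macHex secret msg = showDigest (hmacSha1 (L.pack secret) (L.pack msg))
```

Since HMAC is not symmetric in its two inputs, swapping the arguments
gives a different digest, so the order is worth checking against a known
test vector.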