It would be nice if command-specific options were supported. The first
difficulty is that which command is being called is not known until after
getopt; but that could be worked around by finding the first non-dashed
parameter. Storing the settings without putting them in the annex monad is
the next difficulty; it could perhaps be handled by making the seek stage
pass applicable settings into the start stage (and from there on to perform
as needed). But that still leaves a problem, what data type to use to
represent the options between getopt and seek?
Left out the backend usage graph for now, and bad/temp directory sizes
are only displayed when present. Also, disk usage is returned as a string
with units, which I can see changing later.
This is actually tricky, 45bbf210a1 added
the escaping because it's needed for rsync that does go over ssh.
So I had to detect whether the remote's rsync url will use ssh or not,
and vary the escaping.
The branch may not exist, if .git/annex has been copied over from another
repo (or a corrupted repo). I suppose it could also have gotten deleted
somehow. Without this, there is a confusing failure.
In git, a Ref can be a Sha, or a Branch, or a Tag. I added type aliases for
those. Note that this does not prevent mixing up of eg, refs and branches
at the type level. Since git really doesn't care, except rare cases like
git update-ref, or git tag -d, that seems ok for now.
There's also a tree-ish, but let's just use Ref for it. A given Sha or Ref
may or may not be a tree-ish, depending on the object type, so there seems
no point in trying to represent it at the type level.
semitrusted uuids rarely are listed in trust.log, so a special case
is needed to get a list of them. Take the difference of all known uuids
with non-semitrusted uuids.
Before, a merge was first calculated, by running various actions that
called git and built up a list of lines, which were at the end sent
to git update-index. This necessarily used space proportional to the size
of the diff between the trees being merged.
Now, lines are streamed into git update-index from each of the actions in
turn.
Runtime size of git-annex merge when merging 50000 location log files
drops from around 100 mb to a constant 4 mb.
Presumably it runs quite a lot faster, too.
This reduces the memory use of a merge by 1/3rd. The space leak was
apparently because the whole update-index input was generated strictly, not
lazily.
I wondered if the change to ByteStrings contributed to this, due to the
need to convert with L.pack here. But going back to the old code, I still
see a much similar leak, and worse performance besides due to it not using
ByteStrings.
The fix is to just hPutStr the lines repeatedly. (Note the \0 is written
separately, to avoid allocation overheads in adding it to the string.)
The Git.pipeWrite interface is probably just wrong for any large inputs to
git. This was the only place using it for input of any size.
There is still at least one other space leak in the merge code.