The bug was that with --json, output lines were sometimes doubled. For
example, git annex init --json would output two lines, despite only running
one thing. Adding to the weirdness, this only occurred when the output
was redirected to a pipe or a file.
Strace showed two processes outputting the same buffered output.
The second process was this writer process (only needed to work around
bug #624389):
_ <- forkProcess $ do
hPutStr toh $ unlines paths
hClose toh
exitSuccess
The doubled output occurs when this process exits, and ghc flushes the
inherited stdout buffer. Why only when piping? I don't know, but ghc may
be behaving differently when stdout is not a terminal.
While this is quite possibly a ghc bug, there is a nice fix in git-annex.
Explicitly flushing after each chunk of json is output works around the
problem, and as a side effect, json is streamed rather than being output
all at the end when performing an expensive operaition.
However, note that this means all uses of putStr in git-annex must be
explicitly flushed. The others were, already.
It would be nice if command-specific options were supported. The first
difficulty is that which command is being called is not known until after
getopt; but that could be worked around by finding the first non-dashed
parameter. Storing the settings without putting them in the annex monad is
the next difficulty; it could perhaps be handled by making the seek stage
pass applicable settings into the start stage (and from there on to perform
as needed). But that still leaves a problem, what data type to use to
represent the options between getopt and seek?
Left out the backend usage graph for now, and bad/temp directory sizes
are only displayed when present. Also, disk usage is returned as a string
with units, which I can see changing later.
This is actually tricky, 45bbf210a1 added
the escaping because it's needed for rsync that does go over ssh.
So I had to detect whether the remote's rsync url will use ssh or not,
and vary the escaping.