Came up with a generic way to filter out progress messages while keeping
errors, for commands that use stderr for both.
--json mode will disable command outputs too.
More aggressive rsync params fixup for windows. Param may contain a url, or
a file path, so check if it looks like a local file path and if so, fix it
up.
On windows only, rsyncUrlIsPath will treat c:foo as a path, rather than as
a rsyncurl starting with a host "c".
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.
Done as a background task while listening to episode 2 of the Type Theory
podcast.
This reaping of any processes came to cause me problems when redoing the
rsync special remote -- a gpg process that was running gets waited on and
the place that then checks its return code fails.
I cannot reproduce any zombies when using the rsync special remote.
But I still can when using a normal git remote, accessed over ssh.
There is 1 zombie per file downloaded without this horrible hack enabled.
So, move the hack to only be used in that case.
There was confusion in different parts of the progress bar code about
whether an update contained the total number of bytes transferred, or the
number of bytes transferred since the last update. One way this bug
showed up was progress bars that seemed to stick at zero for a long time.
In order to fix it comprehensively, I add a new BytesProcessed data type,
that is explicitly a total quantity of bytes, not a delta.
Note that this doesn't necessarily fix every problem with progress bars.
Particularly, buffering can now cause progress bars to seem to run ahead
of transfers, reaching 100% when data is still being uploaded.
In general, git-annex does not try to preserve file permissions. For
example, they don't round trip through special remotes. So it's ok to not
preserve them for git remotes either.
On crippled filesystems, rsync has been observed failing after the file
was transferred because it couldn't set some permission or other.
When rsyncProgress pipes rsync's stdout, this turns out to cause a ssh
process started by rsync to be left behind as a zombie. I don't know why,
but my recent zombie reaping cleanup was correct, it's just that this other
zombie, that's not directly started by git-annex, was no longer reaped
due to changes in the cleanup. Make rsyncProgress reap the zombie started
by rsync, as a workaround.
FWIW, the process tree looks like this. It seems like the rsync child
is for some reason starting but not waiting on this extra ssh process.
Ssh connection caching may be involved -- disabling it seemed to change
the shape of the tree, but did not eliminate the zombie.
9378 pts/14 S+ 0:00 | \_ rsync -p --progress --inplace -4 -e 'ssh' '-S' ...
9379 pts/14 S+ 0:00 | | \_ ssh ...
9380 pts/14 S+ 0:00 | | \_ rsync -p --progress --inplace -4 -e 'ssh' '-S' ...
9381 pts/14 Z+ 0:00 | \_ [ssh] <defunct>
This seems to fix a problem I've recently seen where ctrl-c during rsync
leads to `git annex get` moving on to the next thing rather than exiting.
Seems likely that started happening with the switch to System.Process
(d1da9cf221), as the old code took care
to install a default SIGINT handler.
Note that since the bug was only occurring sometimes, I am not 100% sure
I've squashed it, although I seem to have.
Current implementation parses rsync's output a character a time, which
is hardly efficient. It could be sped up a lot by using hGetBufSome,
but that would require going really lowlevel, down to raw C style buffers
(good example of that here: http://users.aber.ac.uk/afc/stricthaskell.html)
But rsync doesn't output very much, so currently it seems ok.