Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.
As well as being an optimisation, this helps move toward eliminating use of
missingh.
(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)
I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
This removes a bit of complexity, and should make things faster
(avoids tokenizing Params string), and probably involve less garbage
collection.
In a few places, it was useful to use Params to avoid needing a list,
but that is easily avoided.
Problems noticed while doing this conversion:
* Some uses of Params "oneword" which was entirely unnecessary
overhead.
* A few places that built up a list of parameters with ++
and then used Params to split it!
Test suite passes.
This is certianly a cabal bug for not passing the build options in the
cabal file when building Setup.hs.
And, why oh why did ghc enable this warning by default? So unhappy with
this choice.
A git pathspec is a filename, except when it starts with ':', it's taken
to refer to a branch, etc. Rather than special case ':', any filename
starting with anything unusual is prefixed with "./"
This could have been a real mess to deal with, but luckily SafeCommand
is already extensively used and so we know at the type level the difference
between parameters that are files, and parameters that are command options.
Testing did show that Git.Queue was not using SafeCommand on
filenames fed to xargs. (Filenames starting with '-' worked before only
because -- was used to separate filenames from options when calling eg git
add.)
The test suite now passes with filenames starting with ':'. However, I did
not keep that change to it, because such filenames are probably not legal
on windows, and I have enough ugly windows ifdefs in there as it is.
This commit was sponsored by Otavio Salvador. Thanks!
This seems to fix a problem I've recently seen where ctrl-c during rsync
leads to `git annex get` moving on to the next thing rather than exiting.
Seems likely that started happening with the switch to System.Process
(d1da9cf221), as the old code took care
to install a default SIGINT handler.
Note that since the bug was only occurring sometimes, I am not 100% sure
I've squashed it, although I seem to have.
Make Utility.Process wrap the parts of System.Process that I use,
and add debug logging to them.
Also wrote some higher-level code that allows running an action
with handles to a processes stdin or stdout (or both), and checking
its exit status, all in a single function call.
As a bonus, the debug logging now indicates whether the process
is being run to read from it, feed it data, chat with it (writing and
reading), or just call it for its side effect.
Test suite now passes with -threaded!
I traced back all the hangs with -threaded to System.Cmd.Utils. It seems
it's just crappy/unsafe/outdated, and should not be used. System.Process
seems to be the cool new thing, so converted all the code to use it
instead.
In the process, --debug stopped printing commands it runs. I may try to
bring that back later.
Note that even SafeSystem was switched to use System.Process. Since that
was a modified version of code from System.Cmd.Utils, it needed to be
converted too. I also got rid of nearly all calls to forkProcess,
and all calls to executeFile, which I'm also doubtful about working
well with -threaded.