As long as all code imports Utility.Aeson rather than Data.Aeson,
and no Strings that may contain utf-8 characters are used for eg, object
keys via T.pack, this is guaranteed to fix the problem everywhere that
git-annex generates json.
It's kind of annoying to need to wrap ToJSON with a ToJSON', especially
since every data type that has a ToJSON instance has to be ported over.
However, that only took 50 lines of code, which is worth it to ensure full
coverage. I initially tried an alternative approach of a newtype FileEncoded,
which had to be used everywhere a String was fed into aeson, and chasing
down all the sites would have been far too hard. Did consider creating an
intentionally overlapping instance ToJSON String, and letting ghc fail
to build anything that passed in a String, but am not sure that wouldn't
pollute some library that git-annex depends on that happens to use ToJSON
String internally.
This commit was supported by the NSF-funded DataLad project.
0.1s seemed a bit too jumpy. But kept 0.1s for --json-progress in case
some consumer depends on it.
BTW, the way rsync handles it is one progress update after every chunk
the rolling checksum algo identifies. So it's not a fixed delay. Weird.
Noticed that getting a key whose size is not known resulted in a
progress display that didn't include the percent complete.
Fixed for P2P by making the size sent with DATA be used to update the
meter's total size.
In order for rateLimitMeterUpdate to also learn the total size,
had to make it be passed the Meter, and some other reorg in
Utility.Metered was also done so that --json-progress can construct a
Meter to pass to rateLimitMeterUpdate.
When the fallback rsync is done, the progress display still doesn't
include the percent complete. Only way to fix that seems to be to let rsync
display its output again, but that would conflict with git-annex's
own progress meter, which is also being displayed.
This commit was sponsored by Henrik Riomar on Patreon.
Note that, due to not using rsync to transfer files to ssh remotes
any longer, permissions and other file metadata of annexed files
will no longer be preserved when copying them to ssh remotes.
Other remotes never supported preserving that information, so
this is not considered a regression. Added NEWS item about this.
Another significant side effect of this is that, even when rsync is run to
retrieve a file, its progress display will no longer be shown, and
instead the native git-annex progress display will appear. It would be
possible to use the rsync process display when rsync is used (old
git-annex-shell and also retrieval from a local repository), but it
would have complicated the code unncessarily, and been inconsistent
behavior.
(I'd been thinking for a while about eliminating the rsync progress
display, since it's got some annoying verbosities, including display of
the key and the "(xfr#1, to-chk=0/1)" bit and was already somewhat
inconsistent.)
retrieveKeyFileCheap still uses rsync, since that ensures that it gets
the actual file content from the remote. Using the P2P protocol would
use the local content, as long as the local and remote size are the
same.
This commit was sponsored by John Pellman on Patreon.
msg is what would be output to stderr, so has some layout and
formatting, which is perhaps not ideal, but let's at least avoid it
containing line breaks.
Always include error-messages field, even if empty,
to make the json be self-documenting.
This was a design requirement for --json-error-messages.
This commit was supported by the NSF-funded DataLad project.
Fix behavior of --json-progress followed by --json, in which
the latter option disabled the former.
This commit was supported by the NSF-funded DataLad project.
--json: When there are multiple lines of notes about a file, make the note
field multiline, rather than the old behavior of only including the last
line.
Using newlines in the note is perhaps not ideal, but upgrading it to an
array in this case would be an annoying inconsistency to need to deal with.
This commit was sponsored by Ole-Morten Duesund on Patreon.
It's left up to the special remote to detect when git-annex is new enough
to support the message; an old git-annex will blow up.
This commit was supported by the NSF-funded DataLad project.
When built with concurrent-output 1.9, ssh password prompts will no longer
interfere with the -J display.
To avoid flicker, only done when ssh actually does need to prompt;
ssh is first run in batch mode and if that succeeds the connection is up
and no need to clear regions.
This commit was supported by the NSF-funded DataLad project.
Removed dependency on MissingH, instead depending on the split
library.
After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.
Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.
This commit was sponsored by Shane-o on Patreon.
When ssh connection caching is enabled (and when GIT_ANNEX_USE_GIT_SSH is
not set), only one ssh password prompt will be made per host, and only one
ssh password prompt will be made at a time.
This also fixes a race in prepSocket's stale ssh connection stopping
when run with -J. It was possible for one thread to start a cached ssh
connection, and another thread to immediately stop it, resulting in excess
connections being made.
This commit was supported by the NSF-funded DataLad project.
This gets rid of quite a lot of ugly hacks around json generation.
I doubt that any real-world json parsers can parse incomplete objects, so
while it's not as nice to need to wait for the complete object, especially
for commands like `git annex info` that take a while, it doesn't seem worth
the added complexity.
This also causes the order of fields within the json objects to be
reordered. Since any real json parser shouldn't care, the only possible
problem would be with ad-hoc parsers of the old json output.
Avoid threads emitting json at the same time and scrambling, which was
still possible even with the buffering, just less likely.
Converted json IO actions to JSONChunk data too.
This makes -Jn work with --json and --quiet, where before
setting -Jn disabled those options.
Concurrent json output is currently a mess though since threads output
chunks over top of one-another.
Keeping Text.JSON use for now, because it seems a better fit for most of
the commands, which don't use very structured JSON objects, but just output
whatever fields suites them. But this lets Aeson be used when a more
structured data type is available to serialize to JSON.
Instead -J will behave as if it was built without concurrent-output support
in this situation. Ie, it will be mostly quiet, except when there's an
error.
Note that it's not a problem for a filename to contain invalid utf-8 when
in a utf-8 locale. That is handled ok by concurrent-output. It's only
displaying unicode characters in a non-unicode locale that doesn't work.