convert encode_c to ByteString

This turns out to be possible after all, because the old encode_c
decomposed a Unicode Char into multiple Word8s and encoded those. The
ByteString version should be faster in some places, particularly in
Git.Filename.encodeAlways.

The old version encoded all Unicode by default, as well as ASCII control
characters and '"'. The new one encodes only ASCII control characters
by default.
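
A minimal sketch of that byte-wise, predicate-driven encoding, not the
code from this commit. The names encodeC and isHighByte are hypothetical,
and the sketch makes simplifying assumptions: control bytes are taken to
be those below 0x20 plus DEL, and everything except backslash is written
as a three-digit octal escape, whereas the real encode_c may use named
escapes such as \n.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.Word (Word8)
import Numeric (showOct)

-- Hypothetical stand-in for encode_c: escapes ASCII control bytes by
-- default, plus any byte selected by the caller's predicate.
encodeC :: (Word8 -> Bool) -> B.ByteString -> B.ByteString
encodeC extra = B.concatMap enc
  where
    enc w
      | w == 0x5C = B8.pack "\\\\"  -- backslash is always doubled
      | w < 0x20 || w == 0x7F || extra w = octal w
      | otherwise = B.singleton w
    -- Backslash followed by exactly three octal digits.
    octal w = B8.pack ('\\' : pad (showOct w ""))
    pad s = replicate (3 - length s) '0' ++ s

-- Assumed to play the role of isUtf8Byte in the diff: selects bytes with
-- the high bit set, which are the bytes of multi-byte UTF-8 sequences.
isHighByte :: Word8 -> Bool
isHighByte = (>= 0x80)
```

With these definitions, encodeC (const False) leaves the UTF-8 bytes
0xC3 0xA9 untouched, while encodeC isHighByte turns them into the ASCII
text \303\251.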

That old behavior was visible in Utility.Format.format, which did escape
'"' when used in e.g. git-annex find --format='${escaped_file}\n'.
So I made sure to keep that working the same, although the man page only
says it will escape "unusual" characters, so it might be possible to
change it.

Git.Filename.encodeAlways also needs to escape '"'; that was the
original reason it was escaped.

I judge it is ok for Types.Transferrer not to escape '"', because the
escaped value is sent in a line-based protocol, which is decoded at the
other end by decode_c. So old git-annex and new will be fine whether or
not it is escaped; the result will be the same.
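
The round-trip argument can be illustrated with a simplified decoder.
This is a sketch under stated assumptions, not the real decode_c:
decodeC is a hypothetical name, and it handles only three-digit octal
escapes and "backslash followed by a literal byte" (named escapes such
as \n are left out).

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.Char (digitToInt, isOctDigit)

-- Hypothetical, simplified stand-in for decode_c.
decodeC :: B.ByteString -> B.ByteString
decodeC = B8.pack . go . B8.unpack
  where
    -- \ooo with a first digit of at most 3 decodes to a single byte.
    go ('\\' : a : b : c : rest)
      | all isOctDigit [a, b, c] && a <= '3' =
          toEnum (digitToInt a * 64 + digitToInt b * 8 + digitToInt c) : go rest
    -- Any other backslash escape (\" or \\) yields the escaped byte itself.
    go ('\\' : c : rest) = c : go rest
    -- Unescaped bytes, including a bare '"', pass through unchanged.
    go (c : rest) = c : go rest
    go [] = []
```

Under this sketch, a message containing \" and the same message
containing a bare " decode to identical bytes, which is why the sender
is free to escape the quote or not.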

Note that when asked to escape a double quote, it is escaped to \"
rather than to \042. That's the same behavior as git has. It's
perhaps somehow more of a special case than it needs to be.
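
The special case might look roughly like the following. The names
escapeSelected and encodeWith are hypothetical, and the sketch shows
only the choice between \" and a generic octal escape for bytes the
caller's predicate selects; it is not the code from this commit.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.Word (Word8)
import Numeric (showOct)

-- Hypothetical helper: escape one byte that the predicate selected.
escapeSelected :: Word8 -> B.ByteString
escapeSelected 0x22 = B8.pack "\\\""  -- '"' becomes \" rather than \042
escapeSelected w = B8.pack ('\\' : pad (showOct w ""))
  where
    -- Left-pad the octal digits to a width of three.
    pad s = replicate (3 - length s) '0' ++ s

-- Apply the escape to exactly the bytes the predicate selects.
encodeWith :: (Word8 -> Bool) -> B.ByteString -> B.ByteString
encodeWith p = B.concatMap (\w -> if p w then escapeSelected w else B.singleton w)
```

Without the first equation, a selected double quote would fall through
to the octal case and come out as \042.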

Sponsored-by: k0ld on Patreon
Joey Hess 2023-04-07 16:47:26 -04:00
parent 371d4f8183
commit d9b6be7782
3 changed files with 66 additions and 45 deletions

@@ -83,9 +83,9 @@ instance Proto.Receivable TransferRequest where
 instance Proto.Sendable TransferResponse where
     formatMessage (TransferOutput (OutputMessage m)) =
-        ["om", Proto.serialize (encode_c (decodeBS m))]
+        ["om", Proto.serialize (decodeBS (encode_c isUtf8Byte m))]
     formatMessage (TransferOutput (OutputError e)) =
-        ["oe", Proto.serialize (encode_c e)]
+        ["oe", Proto.serialize (decodeBS (encode_c isUtf8Byte (encodeBS e)))]
     formatMessage (TransferOutput BeginProgressMeter) =
         ["opb"]
     formatMessage (TransferOutput (UpdateProgressMeterTotalSize (TotalSize sz))) =
@@ -99,7 +99,7 @@ instance Proto.Sendable TransferResponse where
     formatMessage (TransferOutput EndPrompt) =
         ["opre"]
     formatMessage (TransferOutput (JSONObject b)) =
-        ["oj", Proto.serialize (encode_c (decodeBL b))]
+        ["oj", Proto.serialize (decodeBS (encode_c isUtf8Byte (L.toStrict b)))]
     formatMessage (TransferResult True) =
         ["t"]
     formatMessage (TransferResult False) =
@@ -141,7 +141,9 @@ instance Proto.Serializable TransferRemote where
     serialize (TransferRemoteUUID u) = 'u':fromUUID u
     -- A remote name could contain whitespace or newlines, which needs
     -- to be escaped for the protocol. Use C-style encoding.
-    serialize (TransferRemoteName r) = 'r':encode_c' isSpace r
+    serialize (TransferRemoteName r) = 'r':decodeBS (encode_c is_space_or_unicode (encodeBS r))
+      where
+        is_space_or_unicode c = isUtf8Byte c || isSpace (chr (fromIntegral c))
     deserialize ('u':u) = Just (TransferRemoteUUID (toUUID u))
     deserialize ('r':r) = Just (TransferRemoteName (decodeBS (decode_c (encodeBS r))))
@@ -151,7 +153,7 @@ instance Proto.Serializable TransferAssociatedFile where
     -- Comes last, so whitespace is ok. But, in case the filename
     -- contains eg a newline, escape it. Use C-style encoding.
     serialize (TransferAssociatedFile (AssociatedFile (Just f))) =
-        encode_c (fromRawFilePath f)
+        decodeBS (encode_c isUtf8Byte f)
     serialize (TransferAssociatedFile (AssociatedFile Nothing)) = ""
     deserialize "" = Just $ TransferAssociatedFile $