Commit graph

33715 commits

Author SHA1 Message Date
Joey Hess
cb375977a6
follow-on changes from MetaData type changes
Including writing and parsing the metadata log files with
bytestring-builder and attoparsec.
2019-01-07 15:51:05 -04:00
Joey Hess
16c798b5ef
switch MetaValue to ByteString and MetaField to Text
MetaField was already limited to alphanumerics, so it makes sense to use
Text for it.

Note that technically a UUID can contain invalid UTF-8, and so
remoteMetaDataPrefix's use of T.pack . fromUUID could replace non-UTF8
values with '?' or whatever. In practice, a UUID is usually also text;
I only kept open the possibility of it containing invalid UTF-8 to avoid
breaking parsing of strange UUIDs in git-annex branch files. So, I
decided to let this edge case slip by.

Have not updated the rest of the code base yet for this change, as the
change took 2.5 hours longer than I expected to get working properly.
2019-01-07 14:18:24 -04:00
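A note on 16c798b5ef: the replacement behaviour described there can be reproduced with Data.Text alone. This is a minimal sketch, assuming the invalid bytes reach `T.pack` as lone surrogate code points in a String; `sample` and `packLossy` are illustrative names, not git-annex code. `Data.Text.pack` substitutes U+FFFD for such code points rather than '?'.

```haskell
import qualified Data.Text as T

-- | Illustrative stand-in for a String carrying one undecodable byte as a
-- lone surrogate (U+DC80). The input shape is an assumption.
sample :: String
sample = "uuid\xDC80"

-- | Text cannot store surrogate code points, so T.pack replaces each one
-- with the Unicode replacement character U+FFFD.
packLossy :: String -> T.Text
packLossy = T.pack
```

The original byte cannot be recovered from the resulting Text, which is the edge case the commit message decides to accept.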
Joey Hess
a80922a594
support for ByteStrings 2019-01-07 12:29:25 -04:00
Joey Hess
ccd75c60d2
correct ghc version number 2019-01-05 16:07:53 -04:00
Joey Hess
2e0e557e75
Support being built with ghc 8.0.1 (MonadFail)
Tested on an older ghc by enabling MonadFailDesugaring globally.

In TransferQueue, the lack of a MonadFail for STM exposed what would
normally be a bug in the pattern matching, although in this case an
earlier check that the queue was not empty avoided a pattern match
failure.
2019-01-05 11:55:15 -04:00
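A note on 2e0e557e75: a failable pattern on the left of `<-` in a do block needs a MonadFail instance under MonadFailDesugaring, and STM has none. The sketch below shows one way to write such a queue pop with an explicit case instead. `popFirst` is a hypothetical helper, not the TransferQueue code, and it assumes the `stm` package.

```haskell
import Control.Concurrent.STM

-- | Remove and return the first element of a list held in a TVar.
-- An explicit case covers the empty list, so no MonadFail instance
-- is required and no pattern match can fail at run time.
popFirst :: TVar [a] -> STM (Maybe a)
popFirst queue = do
    items <- readTVar queue
    case items of
        [] -> return Nothing
        (x:rest) -> do
            writeTVar queue rest
            return (Just x)
```

Returning `Maybe` makes the empty case visible to the caller, so correctness no longer depends on an earlier emptiness check.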
Joey Hess
6ec993252e
Merge branch 'master' of ssh://git-annex.branchable.com 2019-01-05 11:27:49 -04:00
Joey Hess
7a74f2497b
update 2019-01-05 11:27:04 -04:00
Joey Hess
5ba14b5095
build cleanly when benchmark flag is not enabled 2019-01-05 08:09:28 -04:00
Joey Hess
fc3fd0cfe0
do union merge on bytestrings
My concern with using bytestring for this is the file needs to be split
into lines, and the encoding is not known. It's safe to split a utf-8
encoded file on the \n byte; only newlines get encoded to that byte in utf-8.
And this code already assumes utf-8 or ascii encoding, because it used
the filesystem encoding.
2019-01-05 08:06:47 -04:00
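A note on fc3fd0cfe0: the claim that splitting on the `\n` byte is safe for UTF-8 can be checked with a small sketch. Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so 0x0A only ever encodes a newline. `splitLines` and `sample` are illustrative names, not the union merge code.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- | The byte value of '\n'.
newline :: Word8
newline = 0x0A

-- | Split raw bytes into lines on 0x0A without decoding them.
-- A trailing newline does not produce an extra empty line.
splitLines :: B.ByteString -> [B.ByteString]
splitLines bytes
    | B.null bytes = []
    | otherwise = case B.elemIndex newline bytes of
        Nothing -> [bytes]
        Just i -> B.take i bytes : splitLines (B.drop (i + 1) bytes)

-- | Two lines, each a single two-byte UTF-8 character followed by '\n'.
sample :: B.ByteString
sample = B.pack [0xC3, 0xA9, 0x0A, 0xC3, 0xBC, 0x0A]
```

The two-byte sequences stay intact in the output because neither of their bytes equals 0x0A.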
Chymera
ff668bba28 Added a comment 2019-01-04 23:05:43 +00:00
Chymera
7b933afadb Added a comment 2019-01-04 21:08:54 +00:00
Chymera
efe755d4ae 2019-01-04 21:02:16 +00:00
Joey Hess
9cf9ef5077
Merge branch 'master' of ssh://git-annex.branchable.com 2019-01-04 15:12:02 -04:00
Joey Hess
c73698fb33
devblog 2019-01-04 15:11:16 -04:00
pthomasdelaney@9b04608ad7e837fde64ab60a285a7b7254b5bb26
d5c4100da3 2019-01-04 18:23:45 +00:00
geoffrey.jost
fbad26da1e Updated Densho description, formerly "Japanese American Legacy Project", and total collection size. 2019-01-04 17:55:54 +00:00
Joey Hess
11d6e2e260
new improved benchmark command that can benchmark anything git-annex does 2019-01-04 13:46:36 -04:00
chocolate.camera@ec2ecab153906be21ac5f36652c33786ad0e0b60
f2379492a0 Added a comment 2019-01-04 16:11:12 +00:00
Joey Hess
3b3d31583b
explicitly default benchmark build flag to false 2019-01-04 11:24:16 -04:00
andrew
7b4cfab08b Added a comment 2019-01-04 12:49:51 +00:00
Joey Hess
40db44da19
format 2019-01-04 00:38:03 -04:00
Chymera
aef47a57d7 Added a comment 2019-01-04 03:52:22 +00:00
andrew
78cf95cd86 Added a comment 2019-01-03 23:48:03 +00:00
Chymera
6c1cf18a27 Added a comment 2019-01-03 22:56:25 +00:00
Joey Hess
2b27717c20
Merge branch 'master' of ssh://git-annex.branchable.com 2019-01-03 16:12:32 -04:00
Joey Hess
56e2712376
update 2019-01-03 16:12:02 -04:00
Joey Hess
e3c410cc77
devblog 2019-01-03 16:11:57 -04:00
Chymera
55bd348dd5 removed 2019-01-03 19:43:53 +00:00
Chymera
9220561ce5 Added a comment 2019-01-03 19:43:24 +00:00
Chymera
7d0054b174 Added a comment 2019-01-03 19:40:22 +00:00
Chymera
35ef2e5c60 Added a comment 2019-01-03 19:33:02 +00:00
Joey Hess
ef8ddaa713
attoparsec parser for presence logs 2019-01-03 15:27:29 -04:00
colin.brosseau@d444b2b3412af38b85f7b4b340b9c44a412b5698
938ca1c698 Added a comment: NTFS Make it clear that it'll not work with annex.thin 2019-01-03 18:04:58 +00:00
Joey Hess
bfc9039ead
convert git-annex branch access to ByteStrings and Builders
Most of the individual logs are not converted yet, only presence logs
have an efficient ByteString Builder implemented so far. The rest
convert to and from String.
2019-01-03 13:21:48 -04:00
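A note on bfc9039ead: the sketch below shows what writing a log with a ByteString Builder can look like. The line layout (timestamp, status digit, identifier) and the names `Status`, `buildLine` and `buildLog` are assumptions made for illustration, not the format or API used by git-annex.

```haskell
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy.Char8 as L8

-- | Hypothetical presence status for one log entry.
data Status = Present | Missing

-- | Build one log line: "<seconds>s <1|0> <identifier>\n".
-- The pieces are appended as Builders; no intermediate String is made.
buildLine :: Integer -> Status -> String -> BB.Builder
buildLine seconds status ident = mconcat
    [ BB.integerDec seconds
    , BB.char7 's'
    , BB.char7 ' '
    , BB.char7 (statusChar status)
    , BB.char7 ' '
    , BB.string7 ident
    , BB.char7 '\n'
    ]
  where
    statusChar Present = '1'
    statusChar Missing = '0'

-- | Concatenate all lines and run the Builder once, at the end.
buildLog :: [(Integer, Status, String)] -> L8.ByteString
buildLog entries =
    BB.toLazyByteString (mconcat [buildLine t s i | (t, s, i) <- entries])
```

Running the Builder once over the whole log lets the output be produced in large chunks instead of being assembled character by character.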
Joey Hess
53905490df
convert Git.HashObject to use ByteStrings
Both lazy and strict, because sometimes it's more efficient to build a
small strict bytestring, and other times better to lazily stream.
2019-01-03 13:21:01 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
ka7
328773f807 Added a comment: (better formatting..) 2019-01-03 14:15:21 +00:00
ka7
7092fe9c1a Added a comment: got it 2019-01-03 14:10:19 +00:00
CandyAngel
a9d85d2993 Added a comment 2019-01-03 12:29:41 +00:00
ka7
934a1176cd 2019-01-03 11:17:31 +00:00
Joey Hess
f574d8af10
comment typo 2019-01-03 00:22:05 -04:00
andrew
d2062ed057 Added a comment 2019-01-03 01:32:28 +00:00
Ilya_Shlyakhter
6835cd3957 asked about per-branch git-annex branches 2019-01-02 21:12:13 +00:00
Joey Hess
1aebc356e4
Merge branch 'master' of ssh://git-annex.branchable.com 2019-01-02 16:18:30 -04:00
Joey Hess
384eda5af7
devblog 2019-01-02 16:18:06 -04:00
insec
3c21b0fef7 2019-01-02 20:03:42 +00:00
chocolate.camera@ec2ecab153906be21ac5f36652c33786ad0e0b60
79820eb072 2019-01-02 19:36:21 +00:00
kirelagin@6d93475882c55a329fedae6be1971868a775ec7e
2122de1ad6 2019-01-02 17:37:57 +00:00
Joey Hess
3ba6e9bb96
use attoparsec parser for String parsing, 10x speedup
This is not as efficient as using ByteStrings throughout, but converting
the String to a ByteString and parsing that is still significantly faster
than the old parser.

    benchmarking parse/old
    time                 9.657 μs   (9.600 μs .. 9.732 μs)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 9.703 μs   (9.645 μs .. 9.785 μs)
    std dev              231.6 ns   (161.5 ns .. 323.7 ns)
    variance introduced by outliers: 25% (moderately inflated)

    benchmarking parse/new
    time                 834.6 ns   (797.1 ns .. 886.9 ns)
                         0.987 R²   (0.976 R² .. 0.999 R²)
    mean                 816.4 ns   (802.7 ns .. 845.1 ns)
    std dev              62.39 ns   (37.66 ns .. 108.4 ns)
    variance introduced by outliers: 82% (severely inflated)

There is a small behavior change from the old parsePOSIXTime,
which accepted any amount of trailing whitespace after the timestamp.
That behavior was not documented, and it doesn't seem anything relied on it.
2019-01-02 13:28:44 -04:00
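A note on 3ba6e9bb96: running a ByteString parser over a String means encoding the String first. The sketch below assumes UTF-8 encoding via Data.Text, which may differ from the conversion git-annex uses; `toBytes`, `parseString` and `parseCount` are hypothetical names. Requiring `endOfInput` also shows why trailing whitespace is no longer accepted.

```haskell
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Encode a String as UTF-8 bytes (the choice of encoding is an assumption).
toBytes :: String -> B.ByteString
toBytes = TE.encodeUtf8 . T.pack

-- | Run a ByteString parser over a String, requiring that the parser
-- consumes the whole input.
parseString :: A.Parser a -> String -> Either String a
parseString parser = A.parseOnly (parser <* A.endOfInput) . toBytes

-- | Example use: parse an unsigned decimal number.
parseCount :: String -> Either String Int
parseCount = parseString A.decimal
```

With this wrapper, `parseCount "42"` succeeds, while `parseCount "42 "` is rejected because the trailing space is left unconsumed.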
Joey Hess
3c74dcd4e1
attoparsec parser for POSIXTime
(Not yet used anywhere.)

Benchmarking

{-# LANGUAGE OverloadedStrings #-}

import Criterion.Main
import Utility.TimeStamp
import Data.Attoparsec.ByteString

main = defaultMain
	[ bgroup "parse"
		[ bench "new" $ whnf (parseOnly (parserPOSIXTime <* endOfInput)) "1431286201.113452s"
		, bench "old" $ whnf parsePOSIXTime "1431286201.113452s"
		]
	]

benchmarking parse/new
time                 643.6 ns   (640.2 ns .. 646.7 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 645.3 ns   (642.1 ns .. 650.9 ns)
std dev              14.59 ns   (9.194 ns .. 22.07 ns)
variance introduced by outliers: 29% (moderately inflated)

benchmarking parse/old
time                 9.657 μs   (9.600 μs .. 9.732 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 9.703 μs   (9.645 μs .. 9.785 μs)
std dev              231.6 ns   (161.5 ns .. 323.7 ns)
variance introduced by outliers: 25% (moderately inflated)

So the old parser took a mean of 9703 ns per parse, and the new one 645 ns.
2019-01-02 12:48:53 -04:00
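A note on 3c74dcd4e1: for readers unfamiliar with attoparsec, here is a minimal sketch of a parser for timestamps shaped like the benchmark input "1431286201.113452s". It is not the `parserPOSIXTime` from Utility.TimeStamp; `sketchPOSIXTime`, `fractionOf` and `parseTimestamp` are hypothetical names, and the accepted grammar (optional sign, optional fraction, optional trailing 's') is an assumption.

```haskell
import Control.Applicative (optional)
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as B8
import Data.Char (digitToInt)
import Data.Ratio ((%))
import Data.Time.Clock.POSIX (POSIXTime)

-- | Parse an optionally signed decimal number with an optional fractional
-- part and an optional trailing 's'.
sketchPOSIXTime :: A.Parser POSIXTime
sketchPOSIXTime = do
    negative <- A.option False (True <$ A.char '-')
    whole <- A.decimal :: A.Parser Integer
    digits <- A.option B8.empty (A.char '.' *> A.takeWhile1 A.isDigit)
    _ <- optional (A.char 's')
    let magnitude = fromInteger whole + fractionOf digits
    return (if negative then negate magnitude else magnitude)

-- | Turn the digits after the decimal point into an exact fraction,
-- avoiding a round trip through Double.
fractionOf :: B8.ByteString -> POSIXTime
fractionOf digits = fromRational (num % (10 ^ B8.length digits))
  where
    num = B8.foldl' (\n c -> n * 10 + toInteger (digitToInt c)) 0 digits

-- | Parse a complete timestamp, rejecting any leftover input.
parseTimestamp :: B8.ByteString -> Either String POSIXTime
parseTimestamp = A.parseOnly (sketchPOSIXTime <* A.endOfInput)
```

The sign is handled separately from the digits so that a negative value with a fraction, such as "-2.5", does not end up with the fraction added in the wrong direction.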