Commit graph

33792 commits

Author SHA1 Message Date
CandyAngel
699f23ee2c Added a comment 2019-01-15 08:30:11 +00:00
Joey Hess
901fba3173
fix validKeyName to account for unicode again
It used to, but that was lost in the bytestring conversion recently.

20 * 4 = 80, but I only increased it to 64, which would be up to 16
4-byte unicode characters.
2019-01-14 19:03:25 -04:00
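Once a key name is a ByteString, a length limit meant to admit a certain number of characters has to be counted in bytes, because one UTF-8 character can take up to 4 bytes. The Python sketch below shows that arithmetic. The function name `fits_byte_limit` and its 64-byte default are hypothetical illustrations, not git-annex's validKeyName.

```python
def fits_byte_limit(name: str, max_bytes: int = 64) -> bool:
    """Hypothetical check: bound the UTF-8 encoded length, not the character count."""
    return len(name.encode("utf-8")) <= max_bytes

emoji = "\U0001F600"  # one code point, four bytes in UTF-8
assert len(emoji.encode("utf-8")) == 4

print(fits_byte_limit(emoji * 16))  # True: 16 * 4 = 64 bytes
print(fits_byte_limit(emoji * 17))  # False: 17 * 4 = 68 bytes
print(len((emoji * 20).encode("utf-8")))  # 80, i.e. 20 * 4
```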
Joey Hess
745ecccd0e
Merge branch 'master' of ssh://git-annex.branchable.com 2019-01-14 19:00:56 -04:00
Joey Hess
d79ac08532
devblog 2019-01-14 19:00:38 -04:00
Joey Hess
f289663611
correct benchmark
I think I ran the original benchmark in some subdir of my big repo,
which is not a good test case. Updated with value from newly created
repo of 1000 files.
2019-01-14 18:48:03 -04:00
Joey Hess
d9a33d98cf
remove unused import 2019-01-14 18:29:10 -04:00
Joey Hess
0e44985210
remove duplicate import 2019-01-14 18:26:38 -04:00
Joey Hess
43ec130c03
new comment (and rename for consistency) 2019-01-14 18:01:02 -04:00
Joey Hess
d5bbf123fd
bugfix
The first item in the list from split '&' did not start with a '&'
2019-01-14 17:42:18 -04:00
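Assume the split is over a string whose escapes are introduced by '&'. Then every piece after the first begins with an escape code, and the first piece is plain text that must not be decoded as an escape. The Python sketch below uses a hypothetical escape table (& to &a, % to &s, : to &c, / to %). The table is assumed for illustration only.

```python
# Hypothetical escape table, for illustration only.
ESCAPES = {"&": "&a", "%": "&s", ":": "&c", "/": "%"}
CODES = {"a": "&", "s": "%", "c": ":"}

def escape(s: str) -> str:
    return "".join(ESCAPES.get(ch, ch) for ch in s)

def unescape(s: str) -> str:
    first, *rest = s.split("&")
    # The first piece comes before any '&', so it carries no escape code;
    # only '%' needs turning back into '/'.
    out = [first.replace("%", "/")]
    for piece in rest:
        # Every later piece starts with the code character that followed '&'.
        # Malformed input (e.g. a trailing '&') raises KeyError here.
        code, tail = piece[:1], piece[1:]
        out.append(CODES[code] + tail.replace("%", "/"))
    return "".join(out)
```

Treating the first piece like the others would strip its first character and look it up as an escape code, which corrupts the result.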
Joey Hess
e0c4ac99b5
convert serializeKey' to strict ByteString
The builder produces a lazy ByteString, and L.toStrict has to copy it,
but needing to use the builder is no longer the common case; the
serialization will normally be cached already as a strict ByteString,
and this avoids keyFile' needing to use L.toStrict . serializeKey'
2019-01-14 17:03:46 -04:00
Joey Hess
4536c93bb2
cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build, e.g., the path to the annex object.

It means that every place a Key has any of its fields changed, the cache
has to be dropped. I've grepped and found them all. But, it would be
better to avoid that gotcha somehow.
2019-01-14 16:37:28 -04:00
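One way to narrow the "drop the cache on every field change" gotcha is to make the record immutable and route updates through a single function that resets the cache. The Python sketch below assumes a simplified Key with only variety, size and name, serialized as VARIETY[-sSIZE]--NAME. Both the fields and the layout are simplifications, not git-annex's actual Key type.

```python
from dataclasses import dataclass, field, replace
from typing import Optional

@dataclass(frozen=True)
class Key:
    variety: str
    name: str
    size: Optional[int] = None
    # Cached serialization, excluded from equality and repr.
    _cache: Optional[bytes] = field(default=None, compare=False, repr=False)

    def serialize(self) -> bytes:
        if self._cache is not None:
            return self._cache  # common case: key came from disk
        size = f"-s{self.size}" if self.size is not None else ""
        return f"{self.variety}{size}--{self.name}".encode("utf-8")

    def with_fields(self, **changes) -> "Key":
        # Sanctioned way to change a field: build a new Key with no cache.
        return replace(self, _cache=None, **changes)

def deserialize(raw: bytes) -> Key:
    # Simplified parse of "VARIETY[-sSIZE]--NAME"; keeps raw as the cache.
    head, name = raw.decode("utf-8").split("--", 1)
    variety, *fields = head.split("-")
    size = None
    for f in fields:
        if f.startswith("s"):
            size = int(f[1:])
    return Key(variety, name, size, _cache=raw)
```

`frozen=True` blocks direct attribute assignment. A caller could still call `dataclasses.replace` directly and carry a stale cache along, so this narrows the gotcha rather than removing it.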
Ilya_Shlyakhter
27cf71e7e4 added suggestion for RecentChanges page 2019-01-14 20:18:37 +00:00
Joey Hess
918868915c
rename page 2019-01-14 15:57:04 -04:00
Joey Hess
5d98cba923
use ByteStrings when reading annex symlinks and pointers
Now there's a ByteString used all the way from disk to Key.

The main complication in this conversion was the use of fromInternalGitPath
in several places to munge things on Windows. The things that used that
were changed to parse the ByteString using either path separator.

Also some code that had read from files to a String lazily was changed
to read a minimal strict ByteString.
2019-01-14 15:37:08 -04:00
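To parse a link target that may use either path separator, you can look for the last occurrence of either one instead of first rewriting the path into a single form. The Python sketch below works on raw bytes. The function name `last_component` and the example targets are hypothetical.

```python
def last_component(target: bytes) -> bytes:
    """Return the final path component, accepting '/' or '\\' as separators."""
    # rfind returns -1 when a separator is absent, so with neither present
    # the slice starts at 0 and the whole target is returned.
    cut = max(target.rfind(b"/"), target.rfind(b"\\"))
    return target[cut + 1:]

print(last_component(b"../../.git/annex/objects/ab/cd/KEY/KEY"))  # b'KEY'
print(last_component(b"..\\..\\.git\\annex\\objects\\KEY"))       # b'KEY'
```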
Joey Hess
0a8d93cb8a
convert to ByteString 2019-01-14 14:02:47 -04:00
Joey Hess
0acbbf208f
use fileKey here
This doesn't change behavior in any way worth mentioning, but it's the
right thing to do.
2019-01-14 13:22:33 -04:00
Joey Hess
303e828b7c
rest of the deserializeKey renaming 2019-01-14 13:17:47 -04:00
Joey Hess
1791447cc8
avoid creating work tree files in subdirectories in an edge case
A keyName could contain "/", though this is unlikely and certainly could
only ever happen with WORM keys.

The change to addunused to escape that is no problem at all.

The change to VariantFile to escape it means that different versions of
git-annex could resolve a merge conflict differently in this case, which
is unfortunate. There would be different .variant files used, so the two
resolutions would themselves merge together without additional
conflicts, but the user would have to clean up the extra .variant
files.
2019-01-14 13:14:25 -04:00
Joey Hess
d3ab5e626b
rename key2file and file2key
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.

Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
2019-01-14 13:03:35 -04:00
Joey Hess
ff0a2bee2d
avoid unnecessary conversion from and back to ByteString 2019-01-14 12:40:13 -04:00
Ilya_Shlyakhter
9953e3d353 Added a comment 2019-01-14 16:11:39 +00:00
CandyAngel
3681195e5a Added a comment 2019-01-14 12:36:04 +00:00
qiang.fang@ddaed0de00c2925f8036e6c61ce6e12654263ada
0edfc46668 2019-01-14 12:32:09 +00:00
jonas
9ee7c65d34 Added a comment 2019-01-13 20:40:09 +00:00
jonas
fd70e71b9a Added a comment 2019-01-13 20:21:30 +00:00
andrew
f605156e42 2019-01-13 18:11:07 +00:00
Joey Hess
52695e5925
Merge branch 'master' of ssh://git-annex.branchable.com 2019-01-13 14:03:12 -04:00
guzik.sergey@9391b6c15e4938a539e36fbe5bab71df07111d2e
ac9267a9a3 Added a comment 2019-01-13 14:50:44 +00:00
jonas
2a6f8fa74a 2019-01-13 12:34:15 +00:00
jonas
4de1aeacc2 2019-01-13 12:28:21 +00:00
reed
e8839a4357 2019-01-12 21:34:36 +00:00
Joey Hess
dc9087edc6
Merge branch 'master' of ssh://git-annex.branchable.com 2019-01-12 13:38:52 -04:00
Joey Hess
fc21cccf1c
slight optimisation more 2019-01-11 19:56:31 -04:00
Ilya_Shlyakhter
15dd1a17a1 Added a comment 2019-01-11 22:23:06 +00:00
Joey Hess
c1c976d1fa
Merge branch 'master' of ssh://git-annex.branchable.com 2019-01-11 17:25:56 -04:00
Joey Hess
4148548084
roadmap update 2019-01-11 17:25:37 -04:00
Joey Hess
e0567e4e55
devblog 2019-01-11 17:25:28 -04:00
Joey Hess
d12f4db54d
comment 2019-01-11 16:54:07 -04:00
Joey Hess
05de519d2c
update re field ordering 2019-01-11 16:51:54 -04:00
Joey Hess
727767e1e2
make everything build again after ByteString Key changes 2019-01-11 16:39:46 -04:00
Joey Hess
151562b537
convert key2file and file2key to use builder and attoparsec
The new parser is significantly stricter than the old one:

The old file2key allowed the fields to come in any order,
but the new one requires the fixed order that git-annex has always used.
Hopefully this will not cause any breakage.

And the old file2key allowed eg SHA1-m1-m2-m3-m4-m5-m6--xxxx
while the new one does not allow duplicated fields. This could improve
security, because allowing lots of extra junk like that in a key could
potentially be used in a SHA1 collision attack, although current attacks
need binary data, not this kind of structured numeric data.

Speed improved of course, and fairly substantially, in microbenchmarks:

benchmarking old/key2file
time                 2.264 μs   (2.257 μs .. 2.273 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.265 μs   (2.260 μs .. 2.275 μs)
std dev              21.17 ns   (13.06 ns .. 39.26 ns)

benchmarking new/key2file'
time                 1.744 μs   (1.741 μs .. 1.747 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.745 μs   (1.742 μs .. 1.751 μs)
std dev              13.55 ns   (9.099 ns .. 21.89 ns)

benchmarking old/file2key
time                 6.114 μs   (6.102 μs .. 6.129 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.118 μs   (6.106 μs .. 6.143 μs)
std dev              55.00 ns   (30.08 ns .. 100.2 ns)

benchmarking new/file2key'
time                 1.791 μs   (1.782 μs .. 1.801 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.792 μs   (1.785 μs .. 1.804 μs)
std dev              32.46 ns   (20.59 ns .. 50.82 ns)
variance introduced by outliers: 19% (moderately inflated)
2019-01-11 16:33:42 -04:00
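A parser that accepts fields only in one fixed order also rejects duplicates, because each field gets exactly one chance to appear. The Python sketch below works over a simplified VARIETY[-fields]--NAME layout. The field letters and their order (s size, m mtime, S chunk size, C chunk number) are assumptions for illustration. git-annex's real parser uses attoparsec.

```python
from typing import Dict, Optional, Tuple

# Assumed fixed order of the optional numeric fields.
FIELD_ORDER = "smSC"

def parse_key(raw: str) -> Optional[Tuple[str, Dict[str, int], str]]:
    """Parse VARIETY[-<letter><digits>]...--NAME with fields in FIELD_ORDER.

    Returns None for an unknown, duplicated, out-of-order, or malformed field.
    """
    head, sep, name = raw.partition("--")
    if not sep:
        return None
    variety, *fields = head.split("-")
    if not variety:
        return None
    parsed: Dict[str, int] = {}
    pos = 0  # position in FIELD_ORDER of the earliest field still allowed
    for f in fields:
        letter, digits = f[:1], f[1:]
        if not digits or not all("0" <= c <= "9" for c in digits):
            return None
        i = FIELD_ORDER.find(letter, pos)
        if i < 0:
            # Unknown letter, or one already consumed: no going back.
            return None
        parsed[letter] = int(digits)
        pos = i + 1
    return variety, parsed, name
```

With this parser, an input like "SHA1-m1-m2-m3-m4-m5-m6--xxxx" fails at the second m field.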
Joey Hess
b552551b33
use ByteString in Key for speed
This is an easy win for parseKeyVariety:

benchmarking old/parseKeyVariety
time                 1.515 μs   (1.512 μs .. 1.517 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.515 μs   (1.513 μs .. 1.517 μs)
std dev              6.417 ns   (4.992 ns .. 8.113 ns)

benchmarking new/parseKeyVariety
time                 54.97 ns   (54.70 ns .. 55.40 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 55.42 ns   (55.05 ns .. 56.03 ns)
std dev              1.562 ns   (969.5 ps .. 2.442 ns)
variance introduced by outliers: 44% (moderately inflated)

For formatKeyVariety, using a Builder is marginally worse than building a
String... (This is with criterion evaluating fully to nf, not whnf.)

benchmarking old/formatKeyVariety
time                 434.3 ns   (428.0 ns .. 440.4 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 430.6 ns   (428.2 ns .. 433.9 ns)
std dev              9.166 ns   (6.932 ns .. 11.94 ns)
variance introduced by outliers: 27% (moderately inflated)

benchmarking Builder/formatKeyVariety
time                 526.5 ns   (524.7 ns .. 528.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 526.1 ns   (524.9 ns .. 528.5 ns)
std dev              5.687 ns   (3.762 ns .. 8.000 ns)

Manually building the ByteString was better, but still slightly slower than String,
due to the inefficient need to S.pack . show the HashSize:

benchmarking formatKeyVariety
time                 459.5 ns   (455.8 ns .. 463.2 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 459.9 ns   (457.4 ns .. 466.6 ns)
std dev              11.65 ns   (6.860 ns .. 21.41 ns)
variance introduced by outliers: 35% (moderately inflated)

So I cheated and made parseKeyVariety cache the original ByteString,
for formatKeyVariety to use instead of re-building it. Final benchmark:

benchmarking new/formatKeyVariety
time                 50.64 ns   (50.57 ns .. 50.73 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 51.05 ns   (50.60 ns .. 52.71 ns)
std dev              2.790 ns   (259.6 ps .. 5.916 ns)
variance introduced by outliers: 75% (severely inflated)

benchmarking new/parseKeyVariety
time                 71.88 ns   (71.54 ns .. 72.24 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 71.97 ns   (71.69 ns .. 72.47 ns)
std dev              1.249 ns   (910.7 ps .. 1.791 ns)
variance introduced by outliers: 22% (moderately inflated)
2019-01-11 16:32:51 -04:00
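The final trick can be sketched with a parser that keeps the bytes it was given. Formatting a parsed value then returns those bytes unchanged. Only values constructed in code pay for rebuilding, including converting the hash size to decimal text. The Python sketch below uses a `Variety` class and a FAMILY+SIZE[E] spelling (e.g. b"SHA256E"), both simplified assumptions rather than git-annex's KeyVariety.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Variety:
    family: str    # e.g. "SHA"
    size: int      # hash size in bits, e.g. 256
    has_ext: bool  # trailing "E" marks an extension-preserving variety
    original: Optional[bytes] = field(default=None, compare=False)

_VARIETY = re.compile(rb"([A-Z]+)([0-9]+)(E?)")

def parse_variety(raw: bytes) -> Optional[Variety]:
    m = _VARIETY.fullmatch(raw)
    if m is None:
        return None
    return Variety(m.group(1).decode(), int(m.group(2)),
                   m.group(3) == b"E", original=raw)

def format_variety(v: Variety) -> bytes:
    if v.original is not None:
        return v.original  # cached input bytes: nothing to rebuild
    # Slow path, the analogue of S.pack . show on the hash size.
    return v.family.encode() + str(v.size).encode() + (b"E" if v.has_ext else b"")
```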
git-annex.branchable.com@1c3a8a83c15a19620a0a1a2e653d7c662fc8fe50
c30045bf75 Added a comment: get dry-run-ish option 2019-01-11 16:18:13 +00:00
guzik.sergey@9391b6c15e4938a539e36fbe5bab71df07111d2e
92013d86f6 2019-01-11 10:40:22 +00:00
6yearold@36d59212c29d2959d6532d6db7928f01541bcf83
a472cd3488 Added a comment 2019-01-11 06:28:32 +00:00
alogic0@d8fbd9f5b547237a650aa1d5605c2d3592496916
d2dd0a2b52 Added a comment: it's fixed 2019-01-11 00:54:02 +00:00
Joey Hess
c7333be02d
devblog 2019-01-10 17:24:51 -04:00
Joey Hess
ed8d9a29fe
add missing case 2019-01-10 17:17:37 -04:00
Joey Hess
2eadb6cd68
convert transitions.log to attoparsec and bytestring-builder
Not likely to be any speed gain here, but this completes porting every
log file over.

And it let me get rid of some modified code copied from ghc, which
simplifies the licensing.
2019-01-10 17:13:30 -04:00
Joey Hess
591e4b145f
convert old uuid-based log parsers to attoparsec
This preserves the workaround for the old bug that caused NoUUID items
to be stored in the log, prefixing log lines with " ". It's now handled
implicitly, by using takeWhile1 (/= ' ') to get the uuid.

There is a behavior change from the old parser, which split the value
into words and then recombined it. That meant that "foo  bar" and "foo\tbar"
came out as "foo bar". That behavior was not documented, and seems
surprising; it meant that after a git-annex describe here "foo  bar",
you wouldn't get that same string back out when git-annex displayed repo
descriptions.

On the other hand, some other parsers relied on the old behavior, and the
attoparsec rewrites had to deal with the issue themselves...

For group.log, there are some edge cases around the user providing a
group name with a leading or trailing space. The old parser would ignore
such excess whitespace. The new parser does too, because the alternative
is to refuse to parse something like " group1  group2 " due to excess
whitespace, which would be even more confusing behavior.

The only git-annex branch log file not yet converted to attoparsec
and bytestring-builder is now transitions.log.
2019-01-10 16:34:20 -04:00
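You can see the behavior change by comparing a parser that splits the value into words and rejoins them with one that keeps everything after the UUID verbatim. The Python sketch below assumes lines of the simplified form "UUID VALUE"; real log lines carry more, such as timestamps. The function names are hypothetical. `parse_new` is modeled on takeWhile1 (/= ' '), which requires at least one non-space character. `parse_groups` mirrors the group.log choice to ignore excess whitespace.

```python
from typing import Optional, Set, Tuple

def parse_old(line: str) -> Optional[Tuple[str, str]]:
    # Old style: split into words and rejoin, so any run of spaces or
    # tabs inside the value collapses to a single space.
    words = line.split()
    if not words:
        return None
    return words[0], " ".join(words[1:])

def parse_new(line: str) -> Optional[Tuple[str, str]]:
    # New style: the UUID must be a non-empty run of non-space characters,
    # so a line starting with " " does not parse. The value after the
    # first space is kept verbatim.
    uuid, _, value = line.partition(" ")
    if not uuid:
        return None
    return uuid, value

def parse_groups(value: str) -> Set[str]:
    # Whitespace-separated group names; leading, trailing, or repeated
    # whitespace is ignored rather than rejected.
    return set(value.split())

print(parse_old("u1 foo  bar"))  # ('u1', 'foo bar')
print(parse_new("u1 foo  bar"))  # ('u1', 'foo  bar')
```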