Commit graph

33775 commits

Author SHA1 Message Date
Joey Hess
4536c93bb2
cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

It means that every place a Key has any of its fields changed, the cache
has to be dropped. I've grepped and found them all. But, it would be
better to avoid that gotcha somehow..
2019-01-14 16:37:28 -04:00
Joey Hess
918868915c
rename page 2019-01-14 15:57:04 -04:00
Joey Hess
5d98cba923
use ByteStrings when reading annex symlinks and pointers
Now there's a ByteString used all the way from disk to Key.

The main complication in this conversion was the use of fromInternalGitPath
in several places to munge things on Windows. The things that used that
were changed to parse the ByteString using either path separator.

Also some code that had read from files to a String lazily was changed
to read a minimal strict ByteString.
2019-01-14 15:37:08 -04:00
Joey Hess
0a8d93cb8a
convert to ByteString 2019-01-14 14:02:47 -04:00
Joey Hess
0acbbf208f
use fileKey here
This doesn't change behavior in any way worth mentioning, but it's the
right thing to do.
2019-01-14 13:22:33 -04:00
Joey Hess
303e828b7c
rest of the deserializeKey renameing 2019-01-14 13:17:47 -04:00
Joey Hess
1791447cc8
avoid creating work tree files in subdirectories in an edge case
A keyName could contain "/", though this is unlikely and certianly only
ever could happen with WORM keys.

The change to addunused to escape that is no problem at all.

The change to VariantFile to escape it means that different versions of
git-annex could resolve a merge conflict differently in this case, which
is unfortunate. There would be different .variant files used, so the two
resolutions would themselves merge together without additional
conflicts, but the user would have to clean up the extra .variant
files.
2019-01-14 13:14:25 -04:00
Joey Hess
d3ab5e626b
rename key2file and file2key
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.

Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
2019-01-14 13:03:35 -04:00
Joey Hess
ff0a2bee2d
avoid unnecessary conversion from and back to ByteString 2019-01-14 12:40:13 -04:00
Joey Hess
52695e5925
Merge branch 'master' of ssh://git-annex.branchable.com 2019-01-13 14:03:12 -04:00
guzik.sergey@9391b6c15e4938a539e36fbe5bab71df07111d2e
ac9267a9a3 Added a comment 2019-01-13 14:50:44 +00:00
jonas
2a6f8fa74a 2019-01-13 12:34:15 +00:00
jonas
4de1aeacc2 2019-01-13 12:28:21 +00:00
reed
e8839a4357 2019-01-12 21:34:36 +00:00
Joey Hess
dc9087edc6
Merge branch 'master' of ssh://git-annex.branchable.com 2019-01-12 13:38:52 -04:00
Joey Hess
fc21cccf1c
slight optimisation more 2019-01-11 19:56:31 -04:00
Ilya_Shlyakhter
15dd1a17a1 Added a comment 2019-01-11 22:23:06 +00:00
Joey Hess
c1c976d1fa
Merge branch 'master' of ssh://git-annex.branchable.com 2019-01-11 17:25:56 -04:00
Joey Hess
4148548084
roadmap update 2019-01-11 17:25:37 -04:00
Joey Hess
e0567e4e55
devblog 2019-01-11 17:25:28 -04:00
Joey Hess
d12f4db54d
comment 2019-01-11 16:54:07 -04:00
Joey Hess
05de519d2c
update re field ordering 2019-01-11 16:51:54 -04:00
Joey Hess
727767e1e2
make everything build again after ByteString Key changes 2019-01-11 16:39:46 -04:00
Joey Hess
151562b537
convert key2file and file2key to use builder and attoparsec
The new parser is significantly stricter than the old one:

The old file2key allowed the fields to come in any order,
but the new one requires the fixed order that git-annex has always used.
Hopefully this will not cause any breakage.

And the old file2key allowed eg SHA1-m1-m2-m3-m4-m5-m6--xxxx
while the new does not allow duplication of fields. This could potentially
improve security, because allowing lots of extra junk like that in a key
could potentially be used in a SHA1 collision attack, although the current
attacks need binary data and not this kind of structured numeric data.

Speed improved of course, and fairly substantially, in microbenchmarks:

benchmarking old/key2file
time                 2.264 μs   (2.257 μs .. 2.273 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.265 μs   (2.260 μs .. 2.275 μs)
std dev              21.17 ns   (13.06 ns .. 39.26 ns)

benchmarking new/key2file'
time                 1.744 μs   (1.741 μs .. 1.747 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.745 μs   (1.742 μs .. 1.751 μs)
std dev              13.55 ns   (9.099 ns .. 21.89 ns)

benchmarking old/file2key
time                 6.114 μs   (6.102 μs .. 6.129 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.118 μs   (6.106 μs .. 6.143 μs)
std dev              55.00 ns   (30.08 ns .. 100.2 ns)

benchmarking new/file2key'
time                 1.791 μs   (1.782 μs .. 1.801 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.792 μs   (1.785 μs .. 1.804 μs)
std dev              32.46 ns   (20.59 ns .. 50.82 ns)
variance introduced by outliers: 19% (moderately inflated)
2019-01-11 16:33:42 -04:00
Joey Hess
b552551b33
use ByteString in Key for speed
This is an easy win for parseKeyVariety:

benchmarking old/parseKeyVariety
time                 1.515 μs   (1.512 μs .. 1.517 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.515 μs   (1.513 μs .. 1.517 μs)
std dev              6.417 ns   (4.992 ns .. 8.113 ns)

benchmarking new/parseKeyVariety
time                 54.97 ns   (54.70 ns .. 55.40 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 55.42 ns   (55.05 ns .. 56.03 ns)
std dev              1.562 ns   (969.5 ps .. 2.442 ns)
variance introduced by outliers: 44% (moderately inflated)

For formatKeyVariety, using a Builder is marginally worse than building a
String... (This is with criterion evaluating fully to nf not whnf)

benchmarking old/formatKeyVariety
time                 434.3 ns   (428.0 ns .. 440.4 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 430.6 ns   (428.2 ns .. 433.9 ns)
std dev              9.166 ns   (6.932 ns .. 11.94 ns)
variance introduced by outliers: 27% (moderately inflated)

benchmarking Builder/formatKeyVariety
time                 526.5 ns   (524.7 ns .. 528.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 526.1 ns   (524.9 ns .. 528.5 ns)
std dev              5.687 ns   (3.762 ns .. 8.000 ns)

Manually building the ByteString was better, but still slightly slower than String,
due to innefficient need to S.pack . show the HashSize:

benchmarking formatKeyVariety
time                 459.5 ns   (455.8 ns .. 463.2 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 459.9 ns   (457.4 ns .. 466.6 ns)
std dev              11.65 ns   (6.860 ns .. 21.41 ns)
variance introduced by outliers: 35% (moderately inflated)

So I cheated and made parseKeyVariety cache the original ByteString,
for formatKeyVariety to use instead of re-building it. Final benchmark:

benchmarking new/formatKeyVariety
time                 50.64 ns   (50.57 ns .. 50.73 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 51.05 ns   (50.60 ns .. 52.71 ns)
std dev              2.790 ns   (259.6 ps .. 5.916 ns)
variance introduced by outliers: 75% (severely inflated)

benchmarking new/parseKeyVariety
time                 71.88 ns   (71.54 ns .. 72.24 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 71.97 ns   (71.69 ns .. 72.47 ns)
std dev              1.249 ns   (910.7 ps .. 1.791 ns)
variance introduced by outliers: 22% (moderately inflated)
2019-01-11 16:32:51 -04:00
git-annex.branchable.com@1c3a8a83c15a19620a0a1a2e653d7c662fc8fe50
c30045bf75 Added a comment: get dry-run-ish option 2019-01-11 16:18:13 +00:00
guzik.sergey@9391b6c15e4938a539e36fbe5bab71df07111d2e
92013d86f6 2019-01-11 10:40:22 +00:00
6yearold@36d59212c29d2959d6532d6db7928f01541bcf83
a472cd3488 Added a comment 2019-01-11 06:28:32 +00:00
alogic0@d8fbd9f5b547237a650aa1d5605c2d3592496916
d2dd0a2b52 Added a comment: it's fixed 2019-01-11 00:54:02 +00:00
Joey Hess
c7333be02d
devblog 2019-01-10 17:24:51 -04:00
Joey Hess
ed8d9a29fe
add missing case 2019-01-10 17:17:37 -04:00
Joey Hess
2eadb6cd68
convert transitions.log to attoparsec and bytestring-builder
Not likely to be any speed gain here, but this completes porting every
log file over.

And, it let me get rid of code copied from ghc and modified, so
simplifying the licensing.
2019-01-10 17:13:30 -04:00
Joey Hess
591e4b145f
convert old uuid-based log parsers to attoparsec
This preserves the workaround for the old bug that caused NoUUID items
to be stored in the log, prefixing log lines with " ". It's now handled
implicitly, by using takeWhile1 (/= ' ') to get the uuid.

There is a behavior change from the old parser, which split the value
into words and then recombined it. That meant that "foo  bar" and "foo\tbar"
came out as "foo bar". That behavior was not documented, and seems
surprising; it meant that after a git-annex describe here "foo  bar",
you wouldn't get that same string back out when git-annex displayed repo
descriptions.

Otoh, some other parsers relied on the old behavior, and the attoparsec
rewrites had to deal with the issue themselves...

For group.log, there are some edge cases around the user providing a
group name with a leading or trailing space. The old parser would ignore
such excess whitespace. The new parser does too, because the alternative
is to refuse to parse something like " group1  group2 " due to excess
whitespace, which would be even more confusing behavior.

The only git-annex branch log file that is not converted to attoparsec
and bytestring-builder now is transitions.log.
2019-01-10 16:34:20 -04:00
Joey Hess
66603d6f75
attoparsec parsers for all new-format uuid-based logs
There should be some speed gains here, especially for chunk and remote
state logs, which are queried once per key.

Now only old-format uuid-based logs still need to be converted to attoparsec.
2019-01-10 13:30:36 -04:00
Joey Hess
7e54c215b4
Remove redundant endOfInput
parseLogLines consumes till endOfInput
2019-01-10 12:42:59 -04:00
Joey Hess
6f66b53a30
newtype Group to ByteString
This may speed up queries for things in groups, due to Eq and Ord being faster.
2019-01-09 15:05:49 -04:00
Joey Hess
3f7fe1d325
Merge branch 'master' of ssh://git-annex.branchable.com 2019-01-09 14:22:35 -04:00
Joey Hess
1928b82867
marginally faster VectorClock Builder
show of a POSIXTime is 7-bit ascii, so no need to use the filesystem
encoding on it
2019-01-09 14:17:00 -04:00
Joey Hess
232b1a08f3
simplification now that all logs use Builder 2019-01-09 14:10:05 -04:00
Joey Hess
2fef43dd71
convert all per-uuid log files to use Builder
Mostly didn't push the ByteStrings down very deep, but all of these log
files are not written to frequently at all, so slight remaining
innefficiency doesn't matter.

In Logs.UUID, removed the fixBadUUID code that cleaned up after a bug in
git-annex versions 3.20111105-3.20111110. In the unlikely event that a repo was
last touched by that ancient git-annex version, the descriptions of remotes
would appear missing when used with this version of git-annex. That is such minor
breakage, and so unlikely to still be a problem for any repos, that it was not
worth forward-porting that code to ByteString.
2019-01-09 14:00:35 -04:00
Joey Hess
de4980ef85
simplify Show instance by deriving 2019-01-09 13:13:31 -04:00
Joey Hess
11f4d993b5
fix format of show instance
It's only used to show test suite failures..
2019-01-09 13:12:25 -04:00
Joey Hess
2d46038754
converting more log files to use Builder
Probably not any particular speedup in this, since most of these logs
are not written to often. Possibly chunk log writing is sped up, but
writes to chunk logs are interleaved with expensive data transfers to
remotes, so unlikely to be a noticiable speedup.
2019-01-09 13:06:37 -04:00
duncan_bayne
74682ffaad Added a comment: Known issue 2019-01-09 09:17:20 +00:00
duncan_bayne
3126dfd84d Added a comment: Bug raised 2019-01-08 21:13:38 +00:00
Ilya_Shlyakhter
15fddad749 added suggestion to support MD5E keys that omit file size, and/or support attaching checksums to URL/WORM keys in metadata 2019-01-08 20:12:38 +00:00
Ilya_Shlyakhter
921c9f4e72 posted a question about the "here" remote 2019-01-08 19:47:05 +00:00
aaron@784fd85903b38afc8a44d117ba95e3dcce0bd230
42b8a23be4 Format post with code blocks for readability 2019-01-07 21:54:10 +00:00
Joey Hess
5500cbbc30
Merge branch 'master' of ssh://git-annex.branchable.com 2019-01-07 16:36:53 -04:00
Joey Hess
95a966666a
devblog 2019-01-07 16:36:34 -04:00