The new parser is significantly stricter than the old one:
The old file2key allowed the fields to come in any order,
but the new one requires the fixed order that git-annex has always used.
Hopefully this will not cause any breakage.
And the old file2key allowed e.g. SHA1-m1-m2-m3-m4-m5-m6--xxxx,
while the new one does not allow duplication of fields. This could
improve security, because allowing lots of extra junk like that in a key
could potentially be used in a SHA1 collision attack, although the current
attacks need binary data and not this kind of structured numeric data.
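
To make the two restrictions concrete, here is a minimal sketch of a
strict key parser. It is not the git-annex implementation (which is
written in Haskell); parse_key and Key are hypothetical names, and the
field letters (s, m, S, C, in that order) and the uppercase backend name
are assumptions about the key format rather than something stated above.

```python
import re
from typing import NamedTuple, Optional


class Key(NamedTuple):
    """Hypothetical container for the parsed pieces of a key."""
    backend: str
    size: Optional[int]        # -s field
    mtime: Optional[int]       # -m field
    chunk_size: Optional[int]  # -S field
    chunk_num: Optional[int]   # -C field
    name: str


# Each optional field may appear at most once, and only in this order.
# Assumption: backend names consist of uppercase letters and digits.
_KEY_RE = re.compile(
    r"(?P<backend>[A-Z0-9]+)"
    r"(?:-s(?P<size>[0-9]+))?"
    r"(?:-m(?P<mtime>[0-9]+))?"
    r"(?:-S(?P<chunk_size>[0-9]+))?"
    r"(?:-C(?P<chunk_num>[0-9]+))?"
    r"--(?P<name>.+)"
)


def parse_key(text: str) -> Optional[Key]:
    """Return the parsed Key, or None if fields are reordered or repeated."""
    m = _KEY_RE.fullmatch(text)
    if m is None:
        return None

    def num(group: str) -> Optional[int]:
        # Convert a matched numeric field to int; absent fields stay None.
        value = m.group(group)
        return int(value) if value is not None else None

    return Key(m.group("backend"), num("size"), num("mtime"),
               num("chunk_size"), num("chunk_num"), m.group("name"))
```

With this sketch, parse_key("SHA1-s10-m20--xxxx") yields a Key, while
both SHA1-m20-s10--xxxx (fields out of order) and
SHA1-m1-m2-m3-m4-m5-m6--xxxx (duplicated field) are rejected with None.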
Speed improved of course, and fairly substantially, in microbenchmarks
(file2key is roughly 3.4x faster, key2file roughly 1.3x faster):

benchmarking old/key2file
time                 2.264 μs   (2.257 μs .. 2.273 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.265 μs   (2.260 μs .. 2.275 μs)
std dev              21.17 ns   (13.06 ns .. 39.26 ns)

benchmarking new/key2file'
time                 1.744 μs   (1.741 μs .. 1.747 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.745 μs   (1.742 μs .. 1.751 μs)
std dev              13.55 ns   (9.099 ns .. 21.89 ns)

benchmarking old/file2key
time                 6.114 μs   (6.102 μs .. 6.129 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.118 μs   (6.106 μs .. 6.143 μs)
std dev              55.00 ns   (30.08 ns .. 100.2 ns)

benchmarking new/file2key'
time                 1.791 μs   (1.782 μs .. 1.801 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.792 μs   (1.785 μs .. 1.804 μs)
std dev              32.46 ns   (20.59 ns .. 50.82 ns)
variance introduced by outliers: 19% (moderately inflated)

git-annex allows managing files with git, without checking the file
contents into git. While that may seem paradoxical, it is useful when
dealing with files larger than git can currently easily handle, whether due
to limitations in memory, checksumming time, or disk space.
For documentation, see doc/ or <https://git-annex.branchable.com/>