sped up the --all option by 2x to 16x by using git cat-file --buffer

This assumes that no location log files will have a newline or carriage
return in their name. catObjectStream skips any such files due to
cat-file not supporting them.
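A minimal Python sketch of the batching idea follows. It relies only on the documented `git cat-file --batch` output format: `<oid> <type> <size>`, then the content and a newline, or `<name> missing` for an unknown object. The names `batchable`, `parse_batch_output` and `cat_objects` are hypothetical; this is not git-annex's catObjectStream. Because `--buffer` stops cat-file from flushing after every object, requests are fed from a separate thread while the main loop reads. Writing all requests before reading could deadlock once the output pipe fills. Names containing a newline or carriage return are dropped up front, since they can't be sent as single lines on cat-file's line-based stdin.

```python
import io
import subprocess
import threading
from typing import Iterable, Iterator, Optional, Tuple


def batchable(names: Iterable[str]) -> Iterator[str]:
    """Drop names that cannot be sent as one line to cat-file --batch."""
    for name in names:
        if "\n" in name or "\r" in name:
            continue  # skipped, as described in the commit message
        yield name


def parse_batch_output(stream) -> Iterator[Tuple[str, Optional[bytes]]]:
    """Parse `git cat-file --batch` output from a binary stream."""
    while True:
        header = stream.readline()
        if not header:
            return
        header = header.rstrip(b"\n")
        if header.endswith(b" missing"):
            # "<requested name> missing"; the name itself may contain spaces
            yield header[: -len(b" missing")].decode(), None
            continue
        oid, _type, size = header.split(b" ")
        content = stream.read(int(size))
        stream.read(1)  # consume the LF that follows the content
        yield oid.decode(), content


def cat_objects(names: Iterable[str], repo: str = ".") -> Iterator[Tuple[str, Optional[bytes]]]:
    """Hypothetical driver: stream many lookups through one cat-file process."""
    requests = list(batchable(names))
    proc = subprocess.Popen(
        ["git", "-C", repo, "cat-file", "--batch", "--buffer"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )

    def feed() -> None:
        for name in requests:
            proc.stdin.write(name.encode() + b"\n")
        proc.stdin.close()  # flush; cat-file drains its buffer and exits

    writer = threading.Thread(target=feed)
    writer.start()
    # cat-file answers in request order, so zip pairs each name with its result.
    for name, (_oid, content) in zip(requests, parse_batch_output(proc.stdout)):
        yield name, content
    writer.join()
    proc.stdout.close()
    proc.wait()


if __name__ == "__main__":
    # Offline demo: parse canned output without running git.
    sample = io.BytesIO(b"abc123 blob 5\nhello\ngit-annex:x.log missing\n")
    for oid, content in parse_batch_output(sample):
        print(oid, content)
```

A caller would iterate `cat_objects(["git-annex:some/path.log", ...])`. Any name skipped by `batchable` simply never appears in the results. That matches the trade-off accepted above.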

Keys have been prevented from containing newlines since 2011, in commit
480495beb4. If some old repo had a key with a newline in it, --all will
just skip processing that key. Other things, like .git/annex/unused
files, certainly assume no newlines in keys too, and AFAICR, such keys
never actually worked.

Carriage return has been escaped by preSanitizeKeyName since 2013. WORM
keys generated before then could perhaps contain a CR. (URL keys probably
not, since HTTP probably doesn't support a URL with a raw CR in it.) So,
added a warning in fsck about such keys. However, fsck --all will
naturally skip them, so it won't be able to warn about them. Not entirely
satisfactory, but I'll bet there are not really any such keys in
existence.
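The fsck warning can be pictured as a per-key check like the one below. This is a sketch only: `unbatchable_key_warning` and its message text are hypothetical, not git-annex's actual code or wording.

```python
from typing import Optional


def unbatchable_key_warning(key: str) -> Optional[str]:
    """Hypothetical fsck-style check for keys that --all cannot stream.

    Returns a warning string, or None when the key is fine.
    """
    if "\n" in key:
        return f"key {key!r} contains a newline; commands using --all will skip it"
    if "\r" in key:
        # Presumably only old WORM keys, made before preSanitizeKeyName
        # escaped CR, could reach this branch.
        return f"key {key!r} contains a carriage return; commands using --all will skip it"
    return None
```

The `--all` stream filters such keys out before any per-key check runs. A check like this can therefore presumably fire only when fsck reaches a key some other way, such as from a file in the working tree. That is the limitation noted above.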

Thanks to Lukey for finding this optimisation.
Joey Hess 2020-07-07 13:46:45 -04:00
parent 98e2e3cb9c
commit d010ab04be
GPG key ID: DB12DB0FF05F8F38
5 changed files with 162 additions and 24 deletions

@@ -0,0 +1,19 @@
[[!comment format=mdwn
username="joey"
subject="""comment 10"""
date="2020-07-07T17:13:46Z"
content="""
Wow, I implemented your --buffer trick, and `get --all`
is over 2x faster. `sync --content --all` improved somewhat less, but it's
still another decent improvement. (Cold cache timings.)
And with a warm cache, some of the speedups are *much* bigger than in my
cold cache benchmarks.
`get --all` is 17x faster in a 10k file repo, which makes it only 3x slower
than `get` without --all.
I think I'll leave this open, because it's still worth considering
sqlite caching, or finding a way to speed up the second `sync --all` pass.
But I'd be interested to know how much your use case has improved now.
Please feel free to find optimisations anytime, I really appreciate it.
"""]]