cache negative lookups of global numcopies and mincopies

Speeds up e.g. git-annex sync --content by up to 50%. When it does not need
to transfer or drop anything, it now noops a lot more quickly.

I didn't see anything else in the sync --content noop loop that could really
be sped up. It has to cat git objects to keys, stat object files, etc.

Sponsored-by: unqueued on Patreon
Joey Hess 2023-06-06 14:15:47 -04:00
parent 4437e187e6
commit 3c15e0f7a0
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
5 changed files with 38 additions and 6 deletions


@@ -0,0 +1,12 @@
[[!comment format=mdwn
username="joey"
subject="""comment 14"""
date="2023-06-06T17:11:35Z"
content="""
There's only one import in the sync, and your output shows it completed
(with error).

The only other phase of sync that could be run after that and take a lot of
time is content syncing. You would have to have annex.synccontent set
somewhere for sync to do that. Do you?
"""]]


@@ -0,0 +1,18 @@
[[!comment format=mdwn
username="joey"
subject="""comment 15"""
date="2023-06-06T17:31:49Z"
content="""
It would make a lot of sense for --content syncing to be what remains slow.
That has to scan over all the files, and when it decides that it does not
need to copy the content anywhere, that's a tight loop with no output.

In my repo with 10000 files that was set up by the latest test case,
`git-annex sync` takes 13 seconds, and with --content it takes 61 seconds.
I optimised a numcopies/mincopies lookup away, and that got it
down to 28 seconds.

The cidsdb does not get accessed by the --content scan
in my testing, although there may be other situations where it does.
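
As a rough illustration of the numcopies/mincopies optimisation, here is a
minimal Python sketch of caching a negative lookup. It is not git-annex's
actual code (which is Haskell); `CachedSetting`, `read_global_numcopies` and
the `calls` counter are hypothetical names made up for this example. The point
is that "nothing is configured" gets remembered too, so a loop over 10000
files pays for the lookup once rather than once per file.

```python
from typing import Callable, Optional

# Sentinel meaning "not looked up yet", distinct from a cached None result.
_UNSET = object()


class CachedSetting:
    # Caches the result of an expensive lookup, including the negative
    # result (None, meaning the setting is not configured).
    def __init__(self, lookup: Callable[[], Optional[int]]) -> None:
        self._lookup = lookup
        self._value = _UNSET

    def get(self) -> Optional[int]:
        if self._value is _UNSET:
            # First call only: run the real lookup and remember the result,
            # even when that result is None.
            self._value = self._lookup()
        return self._value


calls = 0  # counts how often the expensive lookup actually runs


def read_global_numcopies() -> Optional[int]:
    # Hypothetical stand-in for an expensive read of the global setting.
    # Returns None because nothing is configured in this example.
    global calls
    calls += 1
    return None


numcopies = CachedSetting(read_global_numcopies)
# Simulates the per-file loop; every iteration asks for the setting.
results = [numcopies.get() for _ in range(10000)]
```

A cache that only stored positive results would treat None as "not cached"
and repeat the lookup for every file, which is the cost being avoided here.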
"""]]