update RepoSize database from git-annex branch incrementally

The use of catObjectStream is optimally fast, although it might be
possible to combine this with git-annex branch merge to avoid some
redundant work.

Benchmarking showed that a git-annex branch with 100000 changed files
took less than 1.88 seconds to run through this.
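The speed comes from streaming: rather than asking `git cat-file` for one
object at a time and waiting for each round trip, catObjectStream keeps a
feeder thread writing requests while a consumer thread reads responses as
they arrive. A minimal, self-contained sketch of that feeder/consumer shape
(using a plain `Chan` from base and made-up placeholder shas, not the actual
git-annex `Git.CatFile` API):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan

main :: IO ()
main = do
	chan <- newChan
	-- Feeder thread: streams requests without waiting for replies.
	-- The shas here are hypothetical placeholders.
	_ <- forkIO $ do
		mapM_ (writeChan chan . Just) ["0ab1", "2cd3", "4ef5"]
		writeChan chan Nothing
	-- Consumer loop: processes each response as it arrives;
	-- Nothing signals end of stream (cf. dbwriter in the diff below).
	let consume n = readChan chan
		>>= maybe (return n) (const (consume (n + 1)))
	total <- consume (0 :: Int)
	print total
```

The `dbwriter` loop in the diff below follows the same shape: pull the next
item from the reader, process it, recurse, and return when the stream ends.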
Joey Hess 2024-08-17 13:30:24 -04:00
parent 8239824d92
commit d09a005f2b
GPG key ID: DB12DB0FF05F8F38
9 changed files with 115 additions and 33 deletions


@@ -476,19 +476,11 @@ reconcileStaged dbisnew qh = ifM isBareRepo
 	dbwriter dbchanged n catreader = liftIO catreader >>= \case
 		Just (ka, content) -> do
 			changed <- ka (parseLinkTargetOrPointerLazy =<< content)
-			!n' <- countdownToMessage n
+			n' <- countdownToMessage n $
+				showSideAction "scanning for annexed files"
 			dbwriter (dbchanged || changed) n' catreader
 		Nothing -> return dbchanged
-	-- When the diff is large, the scan can take a while,
-	-- so let the user know what's going on.
-	countdownToMessage n
-		| n < 1 = return 0
-		| n == 1 = do
-			showSideAction "scanning for annexed files"
-			return 0
-		| otherwise = return (pred n)
-	-- How large is large? Too large and there will be a long
-	-- delay before the message is shown; too short and the message
-	-- will clutter things up unnecessarily. It's uncommon for 1000
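The hunk above removes the local countdownToMessage, whose message was
hard-coded, and the new call site passes the message action in as a
parameter instead, so the helper can be shared by other scans. Presumably
the factored-out version looks something like the following sketch (monad
generalized to IO here for a runnable illustration; the real helper would
run in Annex):

```haskell
-- Count down toward showing a progress message: once n reaches 1,
-- run the supplied action and stay at 0 thereafter.
countdownToMessage :: Int -> IO () -> IO Int
countdownToMessage n showmsg
	| n < 1 = return 0
	| n == 1 = showmsg >> return 0
	| otherwise = return (pred n)

main :: IO ()
main = do
	-- Still counting down: no message, returns 2.
	n1 <- countdownToMessage 3 (putStrLn "msg")
	-- Countdown expired: runs the action, returns 0.
	n2 <- countdownToMessage 1 (putStrLn "scanning for annexed files")
	print (n1, n2)
```

Threading the resulting counter back through the loop (as `dbwriter` does
with `n'`) ensures the message is shown at most once per scan.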