update RepoSize database from git-annex branch incrementally

The use of catObjectStream is optimally fast. Although it might be
possible to combine this with git-annex branch merge to avoid some
redundant work.

Benchmarking, a git-annex branch that had 100000 files changed
took less than 1.88 seconds to run through this.
This commit is contained in:
Joey Hess 2024-08-17 13:30:24 -04:00
parent 8239824d92
commit d09a005f2b
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
9 changed files with 115 additions and 33 deletions

View file

@ -16,6 +16,7 @@ import Logs.Presence.Pure
import Control.Concurrent
import qualified Data.Map.Strict as M
import qualified Data.Set as S
updateRepoSize :: UUID -> Key -> LogStatus -> Annex ()
updateRepoSize u k s = do
@ -46,3 +47,8 @@ removeKeyRepoSize k mrs = case mrs of
Nothing -> Nothing
where
ksz = fromMaybe 0 $ fromKey keySize k
accumRepoSizes :: Key -> (S.Set UUID, S.Set UUID) -> M.Map UUID RepoSize -> M.Map UUID RepoSize
accumRepoSizes k (newlocs, removedlocs) sizemap =
let !sizemap' = foldl' (flip $ M.alter $ addKeyRepoSize k) sizemap newlocs
in foldl' (flip $ M.alter $ removeKeyRepoSize k) sizemap' removedlocs