make --rebalance of balanced use fullysizebalanced when useful

When the specified number of copies is > 1, and some repositories are
too full, it can be better to move content from them to other less full
repositories, in order to make space for new content.

annex.fullybalancedthreshhold is documented, but not implemented yet

This is not tested very well yet, and is known to sometimes take several
runs to stabilize.
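
The rule can be illustrated with a small standalone sketch. This is not the code from this commit: `Repo`, `tooFull`, `needSizeBalance` and the `example*` values are hypothetical stand-ins, sizes are plain Integers rather than git-annex's MaxSize and RepoSize types, and the check that `--rebalance` is in effect is left out. The numbers follow the example in the documentation change (five repositories, two 50% full and three 99% full).

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- Hypothetical stand-in for git-annex's UUID type.
type Repo = String

-- A repository is too full once its used size reaches the threshold
-- fraction of its maximum size. Repositories without both a known
-- maximum size and a known used size are never counted as too full.
tooFull :: Double -> M.Map Repo Integer -> M.Map Repo Integer -> Repo -> Bool
tooFull threshold maxsizes sizes r =
    case (M.lookup r maxsizes, M.lookup r sizes) of
        (Just maxsize, Just used) ->
            fromIntegral used >= fromIntegral maxsize * threshold
        _ -> False

-- Size balancing is wanted when more than one copy is requested and
-- fewer than n candidates are still below the threshold.
needSizeBalance
    :: Double -> M.Map Repo Integer -> M.Map Repo Integer
    -> Int -> S.Set Repo -> Bool
needSizeBalance threshold maxsizes sizes n candidates =
    n > 1 && n > S.size candidates - S.size (S.filter full candidates)
  where
    full = tooFull threshold maxsizes sizes

-- Example data: five repositories of size 100, two at 50 and three at 99.
exampleRepos :: [Repo]
exampleRepos = ["a", "b", "c", "d", "e"]

exampleMax, exampleSizes :: M.Map Repo Integer
exampleMax = M.fromList [(r, 100) | r <- exampleRepos]
exampleSizes = M.fromList (zip exampleRepos [50, 50, 99, 99, 99])

exampleCandidates :: S.Set Repo
exampleCandidates = S.fromList exampleRepos

main :: IO ()
main = do
    -- 3 copies wanted, only 2 repositories below 90%: prints True
    print (needSizeBalance 0.9 exampleMax exampleSizes 3 exampleCandidates)
    -- 2 copies wanted, 2 repositories below 90%: prints False
    print (needSizeBalance 0.9 exampleMax exampleSizes 2 exampleCandidates)
```

With three copies wanted, only two repositories are under the threshold, so the sketch reports that size balancing is needed. With two copies wanted it does not.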
Joey Hess 2024-08-21 17:56:06 -04:00
parent 9e87061de2
commit 76ece2a699
GPG key ID: DB12DB0FF05F8F38
4 changed files with 70 additions and 28 deletions


@@ -599,18 +599,31 @@ limitFullyBalanced :: Maybe UUID -> Annex GroupMap -> MkLimit Annex
 limitFullyBalanced = limitFullyBalanced' "fullybalanced"
 
 limitFullyBalanced' :: String -> Maybe UUID -> Annex GroupMap -> MkLimit Annex
-limitFullyBalanced' = limitFullyBalanced'' filtercandidates
-  where
-	filtercandidates _ key candidates = do
-		maxsizes <- getMaxSizes
-		sizemap <- getRepoSizes False
-		currentlocs <- S.fromList <$> loggedLocations key
-		let keysize = fromMaybe 0 (fromKey keySize key)
-		let hasspace u = case (M.lookup u maxsizes, M.lookup u sizemap) of
-			(Just maxsize, Just reposize) ->
-				repoHasSpace keysize (u `S.member` currentlocs) reposize maxsize
-			_ -> True
-		return $ S.filter hasspace candidates
+limitFullyBalanced' = limitFullyBalanced'' $ \n key candidates -> do
+	maxsizes <- getMaxSizes
+	sizemap <- getRepoSizes False
+	let threshhold = 0.9 :: Double
+	let toofull u =
+		case (M.lookup u maxsizes, M.lookup u sizemap) of
+			(Just (MaxSize maxsize), Just (RepoSize reposize)) ->
+				fromIntegral reposize >= fromIntegral maxsize * threshhold
+			_ -> False
+	needsizebalance <- ifM (Annex.getRead Annex.rebalance)
+		( return $ n > 1 &&
+			n > S.size candidates
+				- S.size (S.filter toofull candidates)
+		, return False
+		)
+	if needsizebalance
+		then filterCandidatesFullySizeBalanced maxsizes sizemap n key candidates
+		else do
+			currentlocs <- S.fromList <$> loggedLocations key
+			let keysize = fromMaybe 0 (fromKey keySize key)
+			let hasspace u = case (M.lookup u maxsizes, M.lookup u sizemap) of
+				(Just maxsize, Just reposize) ->
+					repoHasSpace keysize (u `S.member` currentlocs) reposize maxsize
+				_ -> True
+			return $ S.filter hasspace candidates
 
 repoHasSpace :: Integer -> Bool -> RepoSize -> MaxSize -> Bool
 repoHasSpace keysize inrepo (RepoSize reposize) (MaxSize maxsize)
@@ -673,23 +686,31 @@ limitFullySizeBalanced :: Maybe UUID -> Annex GroupMap -> MkLimit Annex
 limitFullySizeBalanced = limitFullySizeBalanced' "fullysizebalanced"
 
 limitFullySizeBalanced' :: String -> Maybe UUID -> Annex GroupMap -> MkLimit Annex
-limitFullySizeBalanced' = limitFullyBalanced'' filtercandidates
+limitFullySizeBalanced' = limitFullyBalanced'' $ \n key candidates -> do
+	maxsizes <- getMaxSizes
+	sizemap <- getRepoSizes False
+	filterCandidatesFullySizeBalanced maxsizes sizemap n key candidates
+
+filterCandidatesFullySizeBalanced
+	:: M.Map UUID MaxSize
+	-> M.Map UUID RepoSize
+	-> Int
+	-> Key
+	-> S.Set UUID
+	-> Annex (S.Set UUID)
+filterCandidatesFullySizeBalanced maxsizes sizemap n key candidates = do
+	currentlocs <- S.fromList <$> loggedLocations key
+	let keysize = fromMaybe 0 (fromKey keySize key)
+	let go u = case (M.lookup u maxsizes, M.lookup u sizemap, u `S.member` currentlocs) of
+		(Just maxsize, Just reposize, inrepo)
+			| repoHasSpace keysize inrepo reposize maxsize ->
+				proportionfree keysize inrepo u reposize maxsize
+			| otherwise -> Nothing
+		_ -> Nothing
+	return $ S.fromList $
+		map fst $ take n $ reverse $ sortOn snd $
+		mapMaybe go $ S.toList candidates
   where
-	filtercandidates n key candidates = do
-		maxsizes <- getMaxSizes
-		sizemap <- getRepoSizes False
-		currentlocs <- S.fromList <$> loggedLocations key
-		let keysize = fromMaybe 0 (fromKey keySize key)
-		let go u = case (M.lookup u maxsizes, M.lookup u sizemap, u `S.member` currentlocs) of
-			(Just maxsize, Just reposize, inrepo)
-				| repoHasSpace keysize inrepo reposize maxsize ->
-					proportionfree keysize inrepo u reposize maxsize
-				| otherwise -> Nothing
-			_ -> Nothing
-		return $ S.fromList $
-			map fst $ take n $ reverse $ sortOn snd $
-			mapMaybe go $ S.toList candidates
-
 	proportionfree keysize inrepo u (RepoSize reposize) (MaxSize maxsize)
 		| maxsize > 0 = Just
 			( u
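
The selection step of filterCandidatesFullySizeBalanced (rank candidates by how much free space they would have, keep the top n) can be tried out in isolation with a sketch like the following. It is a simplified model, not the committed code: `Repo`, `freeFraction` and `pickEmptiest` are hypothetical names, sizes are plain Integers, and the free-space measure is an assumption, since the bodies of repoHasSpace and proportionfree are not shown in full here.

```haskell
import Data.List (sortOn)
import Data.Maybe (mapMaybe)
import qualified Data.Map as M
import qualified Data.Set as S

-- Hypothetical stand-in for git-annex's UUID type.
type Repo = String

-- Assumed measure: the fraction of the maximum size left free once the
-- key is stored. A key already present is not counted a second time.
-- Nothing means the repository is unusable or has no room for the key.
freeFraction :: Integer -> Bool -> Integer -> Integer -> Maybe Double
freeFraction keysize inrepo used maxsize
    | maxsize <= 0 = Nothing
    | after > maxsize = Nothing
    | otherwise = Just (fromIntegral (maxsize - after) / fromIntegral maxsize)
  where
    after = if inrepo then used else used + keysize

-- Keep the n candidates with the largest free fraction. Candidates with
-- unknown sizes or no room are dropped, as in the diff above.
pickEmptiest
    :: M.Map Repo Integer -> M.Map Repo Integer -> S.Set Repo
    -> Int -> Integer -> S.Set Repo -> S.Set Repo
pickEmptiest maxsizes sizes currentlocs n keysize candidates =
    S.fromList $ map fst $ take n $ reverse $ sortOn snd $
        mapMaybe go $ S.toList candidates
  where
    go r = do
        maxsize <- M.lookup r maxsizes
        used <- M.lookup r sizes
        frac <- freeFraction keysize (r `S.member` currentlocs) used maxsize
        return (r, frac)

-- Example data: two repositories at 50/100, three at 99/100. The key
-- (size 1) is currently stored on the three full repositories.
exampleRepos :: [Repo]
exampleRepos = ["a", "b", "c", "d", "e"]

exampleMax, exampleSizes :: M.Map Repo Integer
exampleMax = M.fromList [(r, 100) | r <- exampleRepos]
exampleSizes = M.fromList (zip exampleRepos [50, 50, 99, 99, 99])

exampleLocs :: S.Set Repo
exampleLocs = S.fromList ["c", "d", "e"]

main :: IO ()
main =
    -- prints fromList ["a","b","e"]
    print (pickEmptiest exampleMax exampleSizes exampleLocs 3 1 (S.fromList exampleRepos))
```

In this example the key stays wanted on only one of the three full repositories and becomes wanted on the two emptier ones, which is the kind of move the commit message describes.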


@@ -318,6 +318,16 @@ elsewhere to allow removing it).
   When the `--rebalance` option is used, `balanced` is the same as
   `fullybalanced`.
 
+  When the specified number is greater than 1, and too many repositories
+  in the group are more than 90% full (as configured by
+  annex.fullybalancedthreshhold), this behaves like `fullysizebalanced`.
+
+  For example, `fullybalanced=foo:3`, when group foo has 5 repositories,
+  two 50% full and three 99% full, will make some content move from the
+  full repositories to the others. Moving content like that is expensive,
+  but it allows new files to continue to be stored on the specified number
+  of repositories.
+
 * `sizebalanced=groupname:number`
 
   Distributes content amoung repositories in the group, keeping


@@ -928,6 +928,12 @@ repository, using [[git-annex-config]]. See its man page for a list.)
 
   The default reserve is 100 megabytes.
 
+* `annex.fullybalancedthreshhold`
+
+  Configures the percent full a repository must be in order for
+  the "fullybalanced" preferred content expression to consider it
+  to be full. The default is 90.
+
 * `annex.skipunknown`
 
   Set to true to make commands like "git-annex get" silently skip over


@@ -30,6 +30,11 @@ Planned schedule of work:
 
 ## work notes
 
+* Implement annex.fullybalancedthreshhold
+
+* `git-annex assist --rebalance` of `balanced=foo:2`
+  sometimes needs several runs to stabalize.
+
 * Bug:
 
 		git init foo