implement getRepoSizes
At this point the RepoSize database is getting populated, and it all seems to be working correctly. Incremental updates still need to be done to make it performant.
This commit is contained in:
parent
bba23e7cc9
commit
c200523bac
3 changed files with 60 additions and 49 deletions
|
@ -5,17 +5,52 @@
|
|||
- Licensed under the GNU AGPL version 3 or higher.
|
||||
-}
|
||||
|
||||
module Annex.RepoSize where
|
||||
module Annex.RepoSize (
|
||||
getRepoSizes,
|
||||
) where
|
||||
|
||||
import Annex.Common
|
||||
import Annex.Branch (UnmergedBranches(..))
|
||||
import qualified Annex
|
||||
import Annex.Branch (UnmergedBranches(..), getBranch)
|
||||
import Types.RepoSize
|
||||
import qualified Database.RepoSize as Db
|
||||
import Logs.Location
|
||||
import Logs.UUID
|
||||
import Git.Types (Sha)
|
||||
|
||||
import qualified Data.Map.Strict as M
|
||||
|
||||
{- Gets the repo size map. Cached for speed. -}
|
||||
getRepoSizes :: Annex (M.Map UUID RepoSize)
|
||||
getRepoSizes = maybe updateRepoSizes return =<< Annex.getState Annex.reposizes
|
||||
|
||||
{- Updates Annex.reposizes with current information from the git-annex
|
||||
- branch, supplimented with journalled but not yet committed information.
|
||||
-}
|
||||
updateRepoSizes :: Annex (M.Map UUID RepoSize)
|
||||
updateRepoSizes = bracket Db.openDb Db.closeDb $ \h -> do
|
||||
(oldsizemap, moldbranchsha) <- liftIO $ Db.getRepoSizes h
|
||||
case moldbranchsha of
|
||||
Nothing -> calculatefromscratch h >>= set
|
||||
Just oldbranchsha -> do
|
||||
currbranchsha <- getBranch
|
||||
if oldbranchsha == currbranchsha
|
||||
then journalledRepoSizes oldsizemap oldbranchsha
|
||||
>>= set
|
||||
else do
|
||||
-- XXX todo incremental update by diffing
|
||||
-- from old to new branch.
|
||||
calculatefromscratch h >>= set
|
||||
where
|
||||
calculatefromscratch h = do
|
||||
(sizemap, branchsha) <- calcBranchRepoSizes
|
||||
liftIO $ Db.setRepoSizes h sizemap branchsha
|
||||
journalledRepoSizes sizemap branchsha
|
||||
set sizemap = do
|
||||
Annex.changeState $ \st -> st
|
||||
{ Annex.reposizes = Just sizemap }
|
||||
return sizemap
|
||||
|
||||
{- Sum up the sizes of all keys in all repositories, from the information
|
||||
- in the git-annex branch, but not the journal. Retuns the sha of the
|
||||
- branch commit that was used.
|
||||
|
|
6
Limit.hs
6
Limit.hs
|
@ -598,12 +598,10 @@ limitFullyBalanced mu getgroupmap groupname = Right $ MatchFiles
|
|||
let groupmembers = fromMaybe S.empty $
|
||||
M.lookup g (uuidsByGroup gm)
|
||||
maxsizes <- getMaxSizes
|
||||
-- XXX do not calc this every time!
|
||||
(sizemap, sha) <- calcBranchRepoSizes
|
||||
sizemap' <- journalledRepoSizes sizemap sha
|
||||
sizemap <- getRepoSizes
|
||||
let keysize = fromMaybe 0 (fromKey keySize key)
|
||||
currentlocs <- S.fromList <$> loggedLocations key
|
||||
let hasspace u = case (M.lookup u maxsizes, M.lookup u sizemap') of
|
||||
let hasspace u = case (M.lookup u maxsizes, M.lookup u sizemap) of
|
||||
(Just (MaxSize maxsize), Just (RepoSize reposize)) ->
|
||||
if u `S.member` currentlocs
|
||||
then reposize <= maxsize
|
||||
|
|
|
@ -30,6 +30,23 @@ Planned schedule of work:
|
|||
|
||||
## work notes
|
||||
|
||||
* Implement [[track_free_space_in_repos_via_git-annex_branch]]:
|
||||
|
||||
* Update Annex.reposizes in Logs.Location.logChange,
|
||||
when it makes a change and when Annex.reposizes has a size
|
||||
for the UUID. So Annex.reposizes is kept up-to-date
|
||||
for each transfer and drop.
|
||||
|
||||
* When calling journalledRepoSizes make sure that the current
|
||||
process is prevented from making changes to the journal in another
|
||||
thread. Probably lock the journal? (No need to worry about changes made
|
||||
by other processes; Annex.reposizes does not need to be kept current
|
||||
with what other processes might be doing.)
|
||||
|
||||
* updateRepoSizes incrementally when the git-annex branch sha in the
|
||||
database is older than the current git-annex branch. Diff from old to
|
||||
new branch to efficiently update.
|
||||
|
||||
* `fullybalanced=foo:2` can get stuck in suboptimal situations. Eg,
|
||||
when 2 out of 3 repositories are full, and the 3rd is mostly empty,
|
||||
it is no longer possible to add new files to 2 repositories.
|
||||
|
@ -45,52 +62,13 @@ Planned schedule of work:
|
|||
Also note that "fullybalanced=foo:2" is not currently actually
|
||||
implemented!
|
||||
|
||||
* implement size-based balancing, so all balanced repositories are around
|
||||
the same percent full, either as the default or as another preferred
|
||||
content expression.
|
||||
* Make `git-annex info` use Annex.reposizes.
|
||||
|
||||
* `git-annex info` can use maxsize to display how full repositories are
|
||||
|
||||
* Implement [[track_free_space_in_repos_via_git-annex_branch]]:
|
||||
|
||||
* Goal is for limitFullyBalanced not to need to calcRepoSizes.
|
||||
|
||||
* When Annex.reposizes does not list the size of a UUID, and
|
||||
that UUID's size is needed eg for balanced preferred
|
||||
content, use calcRepoSizes and store in
|
||||
Database.RepoSizes.
|
||||
|
||||
* Load Annex.reposizes from Database.RepoSizes on demand,
|
||||
supplimenting with journalledRepoSizes.
|
||||
|
||||
* Update Annex.reposizes in Logs.Location.logChange,
|
||||
when it makes a change and when Annex.reposizes has a size
|
||||
for the UUID. So Annex.reposizes is kept up-to-date
|
||||
for each transfer and drop.
|
||||
|
||||
* When calling journalledRepoSizes make sure that the current
|
||||
process is prevented from making changes to the journal in another
|
||||
thread. Probably lock the journal? (No need to worry about changes made
|
||||
by other processes; Annex.reposizes does not need to be kept current
|
||||
with what other processes might be doing.)
|
||||
|
||||
* Update Database.RepoSizes incrementally during merge of
|
||||
git-annex branch, and after commit of git-annex branch.
|
||||
(Also update Annex.reposizes)
|
||||
|
||||
(Annex.reposizes can be updated to the resulting values as well.)
|
||||
|
||||
* Perhaps: setRepoSize to 0 when initializing a new repo or a
|
||||
new special remote (but not when reinitializing),
|
||||
and also update Annex.reposizes for that uuid.
|
||||
|
||||
Whether it makes sense to do this will depend on how expensive
|
||||
it is to update Database.RepoSize on git-annex branch merge and commit.
|
||||
If it is not expensive, will want to track reposizes from the beginning
|
||||
whenever possible, to avoid a later expensive read of the git-annex
|
||||
branch to calculate the reposizes.
|
||||
|
||||
* Make `git-annex info` use Annex.reposizes.
|
||||
* implement size-based balancing, so all balanced repositories are around
|
||||
the same percent full, either as the default or as another preferred
|
||||
content expression.
|
||||
|
||||
## completed items for August's work on balanced preferred content
|
||||
|
||||
|
|
Loading…
Add table
Reference in a new issue