implement journalledRepoSizes

Plan is to run this when populating Annex.reposizes on demand.
So Annex.reposizes will be up-to-date with the journal, including
crucially journal entries for private repositories. But also
anything that has been written to the journal by another process,
especially if the process was run with annex.alwayscommit=false.

From there, Annex.reposizes can be kept up to date with changes made
by the running process.
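
A minimal sketch of the idea, not the actual implementation: sizes known
as of the last git-annex branch commit are supplemented with changes that
exist only in the journal. The types and the name applyJournal below are
hypothetical stand-ins, and git-annex's real journalledRepoSizes, UUID,
and RepoSize are assumed to differ.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical stand-ins; git-annex's real types are assumed to differ.
type UUID = String
type RepoSize = Integer

-- One not-yet-committed location log change found in a journal:
-- a key of the given size was added to, or removed from, a repository.
data JournalledChange
    = Added UUID Integer
    | Removed UUID Integer

-- Start from the sizes recorded as of the last branch commit and
-- apply each journalled change on top of them.
applyJournal :: M.Map UUID RepoSize -> [JournalledChange] -> M.Map UUID RepoSize
applyJournal = foldl' step
  where
    step m (Added u sz) = M.insertWith (+) u sz m
    step m (Removed u sz) = M.insertWith (+) u (negate sz) m
```

In this sketch a UUID that appears only in the journal is inserted
with just its journalled delta; whether the real code should do that, or
skip UUIDs whose size is not yet known, is left open here.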
Joey Hess 2024-08-14 13:46:44 -04:00
parent 8ac2685b33
commit 3e6eb2a58d
GPG key ID: DB12DB0FF05F8F38
7 changed files with 148 additions and 66 deletions

@@ -51,48 +51,36 @@ Planned schedule of work:
* `git-annex info` can use maxsize to display how full repositories are
* overBranchFileContents can improve its handling of journalled files
by first going over the branch, and then at the end, feeding
the journalled filenames into catObjectStream (run on the same branch
sha) to check if the file was in the branch. Only pass the journalled
file to the callback when it was not. This will avoid inaccuracies
in calcRepoSizes and git-annex info.
calcRepoSizes currently skips log files in private journals,
when they are for a key that does not appear in the git-annex branch.
It needs to include those.
* Implement [[track_free_space_in_repos_via_git-annex_branch]]:
* Goal is for limitFullyBalanced not to need to calcRepoSizes.
* Load Annex.reposizes from Database.RepoSizes on demand.
* Add git-annex branch sha to Database.RepoSizes.
* When Annex.reposizes does not list the size of a UUID, and
that UUID's size is needed, e.g. for balanced preferred
content, use calcRepoSizes and store in
Database.RepoSizes.
* Load Annex.reposizes from Database.RepoSizes on demand,
supplementing with journalledRepoSizes.
* Update Annex.reposizes in Logs.Location.logChange,
when it makes a change and when Annex.reposizes has a size
for the UUID. So Annex.reposizes is kept up-to-date
for each transfer and drop.
* Update Database.RepoSizes during merge of git-annex branch.
* When calling journalledRepoSizes make sure that the current
process is prevented from making changes to the journal in another
thread. Probably lock the journal? (No need to worry about changes made
by other processes; Annex.reposizes does not need to be kept current
with what other processes might be doing.)
* Update Database.RepoSizes incrementally during merge of
git-annex branch, and after commit of git-annex branch.
(Also update Annex.reposizes)
* On commit of git-annex branch, update Database.RepoSizes to reflect
the size changes in the commit.
Probably cannot use Annex.reposizes for the values, since they must
match the sizes in the location log files being committed. Note
that other processes may journal location log changes, which will be
part of the commit. So need to read all the changed location logs,
and update Database.RepoSizes accordingly.
Also private journals complicate this.
(Annex.reposizes can be updated to the resulting values.)
(Annex.reposizes can be updated to the resulting values as well.)
* Perhaps: setRepoSize to 0 when initializing a new repo or a
new special remote (but not when reinitializing),