Commit graph

45560 commits

Author SHA1 Message Date
Joey Hess
76ece2a699
make --rebalance of balanced use fullysizebalanced when useful
When the specified number of copies is > 1, and some repositories are
too full, it can be better to move content from them to other less full
repositories, in order to make space for new content.

annex.fullybalancedthreshhold is documented, but not implemented yet

This is not tested very well yet, and is known to sometimes take several
runs to stabalize.
2024-08-21 17:59:08 -04:00
Joey Hess
9e87061de2
Support "sizebalanced=" and "fullysizebalanced=" too
Might want to make --rebalance turn balanced=group:N where N > 1
to fullysizebalanced=group:N. Have not yet determined if that will
improve situations enough to be worth the extra work.
2024-08-21 15:01:54 -04:00
Joey Hess
4e1dcc0372
bug 2024-08-21 12:18:31 -04:00
Joey Hess
2ec4602e36
fix column width 2024-08-21 12:18:16 -04:00
Joey Hess
de7ac1bb70
fix 2024-08-20 13:52:46 -04:00
Joey Hess
476d223bce
implement fullbalanced=group:N
Rebalancing this when it gets into a suboptimal situation will need
further work.
2024-08-20 13:51:02 -04:00
Matthew
4a9e637d36 Added a comment: Help with .nfsXXXX files 2024-08-19 21:20:59 +00:00
Joey Hess
d4b2f8201d
add %full field to table 2024-08-19 11:41:48 -04:00
matrss
9cfdae4c3b Added a comment 2024-08-19 10:25:13 +00:00
Joey Hess
68a99a8f48
size based rebalancing design 2024-08-18 16:25:12 -04:00
Joey Hess
99514f9d18
maxsize overview display and --json support 2024-08-18 12:08:13 -04:00
xentac
74b953cded Added a comment 2024-08-18 03:17:12 +00:00
Joey Hess
016edcf437
adjust countdown number for RepoSize update message
Benchmarking a git-annex branch with half a million files changed,
it takes about 1 minute to update the RepoSizes. So this will display
the message after a few seconds.
2024-08-17 15:59:07 -04:00
Joey Hess
f985c58d8e
consistently don't show sizes of empty repositories
This used to be the case, and when matching options are used, that code
path still omits them, so also omit them in the getRepoSize code path.
2024-08-17 15:09:16 -04:00
Joey Hess
b62b58b50b
git-annex info speed up using getRepoSizes 2024-08-17 14:54:31 -04:00
Joey Hess
d09a005f2b
update RepoSize database from git-annex branch incrementally
The use of catObjectStream is optimally fast. Although it might be
possible to combine this with git-annex branch merge to avoid some
redundant work.

Benchmarking, a git-annex branch that had 100000 files changed
took less than 1.88 seconds to run through this.
2024-08-17 13:35:00 -04:00
Joey Hess
8239824d92
consistently omit clusters when calculating RepoSizes
updateRepoSize is only called on the UUID of a repository, not any
cluster it might be a node of. But overLocationLogs and overLocationLogsJournal
were inclusing cluster UUIDs. So it was inconsistent.

Currently I don't see any reason to calculate RepoSize for a cluster.
It's not even clear what it should mean, the total size of all nodes, or
the amount of information stored in the cluster in total?
2024-08-17 11:24:14 -04:00
Spencer
40b49e2ddd Added a comment: Remote Helper? 2024-08-17 05:33:01 +00:00
matrss
bcf876e3a0 2024-08-16 15:52:32 +00:00
matrss
f057010086 Added a comment 2024-08-16 15:45:45 +00:00
Joey Hess
61d95627f3
fix Annex.repoSize sharing between threads 2024-08-16 10:56:51 -04:00
Joey Hess
e361b9ea3c
todo 2024-08-15 16:15:48 -04:00
Joey Hess
63ccf6ffa7
todo 2024-08-15 13:50:50 -04:00
Joey Hess
4a0c7e2b2c
update 2024-08-15 13:41:47 -04:00
Joey Hess
a2da9c526b
RepoSize concurrency fix
When loading the journalled repo sizes, make sure that the current
process is prevented from making changes to the journal in another
thread.
2024-08-15 13:37:41 -04:00
Joey Hess
06064f897c
update Annex.reposizes when changing location logs
The live update is only needed when Annex.reposizes has already been
populated.
2024-08-15 13:27:14 -04:00
Joey Hess
c376b1bd7e
show message when doing possibly expensive from scratch reposize calculation 2024-08-15 12:42:36 -04:00
Joey Hess
c200523bac
implement getRepoSizes
At this point the RepoSize database is getting populated, and it
all seems to be working correctly. Incremental updates still need to be
done to make it performant.
2024-08-15 12:31:56 -04:00
Joey Hess
bba23e7cc9
do not need a db queue
This database is read once and written at most once per run.
2024-08-15 12:31:27 -04:00
Joey Hess
eac4e9391b
finalize RepoSize database
Including locking on creation, handling of permissions errors, and
setting repo sizes.

I'm confident that locking is not needed while using this database.
Since writes happen in a single transaction. When there are two writers
that are recording sizes based on different git-annex branch commits,
one will overwrite what the other one recorded. Which is fine, it's only
necessary that the database stays consistent with the content of a
git-annex branch commit.
2024-08-15 12:29:34 -04:00
Atemu
e8997d8899 Added a comment 2024-08-15 15:40:20 +00:00
Joey Hess
63a3cedc45
slightly improve hairy types 2024-08-14 16:04:18 -04:00
Joey Hess
3e6eb2a58d
implement journalledRepoSizes
Plan is to run this when populating Annex.reposizes on demand.
So Annex.reposizes will be up-to-date with the journal, including
crucially journal entries for private repositories. But also
anything that has been written to the journal by another process,
especially if the process was ran with annex.alwayscommit=false.

From there, Annex.reposizes can be kept up to date with changes made
by the running process.
2024-08-14 13:53:24 -04:00
pedro-lopes-de-azevedo
c75ecc5350 Added a comment: parameter --from not accepted 2024-08-14 14:27:54 +00:00
Joey Hess
8ac2685b33
calcBranchRepoSizes without journal files
This will be used to prime the RepoSizes database, which will always
contain values that correpond to information in the git-annex branch, so
without anything from journal files.

Factored out overJournalFileContents which will later be used to
update Annex.reposizes to include information from journal files.
This will be partitcularly important to support private UUIDs which only
ever get to journal files and not to the branch.
2024-08-14 03:19:30 -04:00
bvaa
11eb2ae6ec Added a comment 2024-08-14 07:18:26 +00:00
Joey Hess
90a79a6c1e
plan 2024-08-13 15:13:30 -04:00
Joey Hess
343c87db45
improve haddocks 2024-08-13 15:05:49 -04:00
Joey Hess
f612ebb934
avoid changing git-annex info behavior
5afbea25e7 changed it to ignore journal
files that did not correspond to a key in the git-annex branch. However,
when there is a private journal, that can happen.

Neither behavior is fully correct, so keep the old incorrect behavior
rather than introducing a new differently incorrect behavior.

I plan to eventually make git-annex info use Annex.reposizes instead of
calculating it itself, and once Annex.reposizes handles this all
correctly, this will be a moot problem.
2024-08-13 14:17:20 -04:00
Joey Hess
a979d8da41
update 2024-08-13 14:14:47 -04:00
Joey Hess
08f55948e9
take all repository locations into account for balancing
This fully fixes --rebalance stability, and also deals with an issue
where a file is present in each balanced repository and it didn't want
to remove it from any.
2024-08-13 13:46:47 -04:00
Joey Hess
10d8b3cc63
fixed --rebalance stability on drop
Was checking the wrong uuid, oops
2024-08-13 13:32:11 -04:00
Joey Hess
5afbea25e7
avoid counting size of keys that are in the journal twice
In calcRepoSizes and also git-annex info, when a key was in the journal,
it was passed to the callback twice, so the calculated size was wrong.
2024-08-13 13:23:39 -04:00
Joey Hess
467d80101a
improve handling of unmerged git-annex branches in readonly repo
git-annex info was displaying a message that didn't make sense in
context.

In calcRepoSizes, it seems better to return the information from the
git-annex branch, rather than giving up. Especially since balanced
preferred content uses it, and we can't just give up evaluating a
preferred content expression if git-annex is to be usable in such a
readonly repo.

Commit 6d7ecd9e5d nobly wanted git-annex
to behave the same with such unmerged branches as it does when it can
merge them. But for the purposes of preferred content, it seems to me
there's a sense that such an unmerged branch is the same as a remote we
have not pulled from. The balanced preferred content will either way
operate under outdated information, and so make not the best choices.
2024-08-13 13:13:12 -04:00
Joey Hess
5c35b3d579
fix typo 2024-08-13 11:47:37 -04:00
Joey Hess
745bc5c547
take maxsize into account for balanced preferred content
This is very innefficient, it will need to be optimised not to
calculate the sizes of repos every time.

Also, fixed a bug in balancedPicker that caused it to pick a too high
index when some repos were excluded due to being full.
2024-08-13 11:00:20 -04:00
Spencer
05a62e4e5f Added a comment: Workaround: --force-small 2024-08-13 07:05:57 +00:00
Spencer
3d252da06c Added a comment: Exact Moment Things Go Wrong 2024-08-13 06:22:11 +00:00
Spencer
ab5f920d77 .md linting 2024-08-13 04:46:53 +00:00
Spencer
8a91a8c208 2024-08-13 04:46:10 +00:00