Commit graph

18 commits

Author SHA1 Message Date
Joey Hess
6cf9a101b8
sim: Fix size tracking for balanced preferred content 2024-09-23 12:42:32 -04:00
Joey Hess
133584a83a
avoid locking the journal in readonly repository
The test suite flagged that git-annex info in a readonly repository was
no longer working.

.git/annex/journal.lck: openFd: permission denied

This fixes it, however, in a case where .git/annex/reposize/ is
writable, but .git/annex/journal/ is not, there will still be a
permission denied error. The solution would just be to use consistent
permissions I suppose.
2024-08-30 11:58:10 -04:00
Joey Hess
d876e06e35
err on the side of larger repository size
When a live update is removing a key, it might fail. So only count those
once they have succeeded. When a live update is adding a key, count it
immediately to avoid over-filling a repo.

This also makes the 1 minute delay between stale live changes checks
more defensible, because a stale live change can only cause us to err
more on the side of caution.
2024-08-28 14:13:12 -04:00
Joey Hess
f89a1b8216
remove stale live changes from reposize database
Reorganized the reposize database directory, and split up a column.

checkStaleSizeChanges needs to run before needLiveUpdate,
otherwise the process won't be holding a lock on its pid file, and
another process could go in and expire the live update it records. It
just so happens that they do get called in the correct order, since
checking balanced preferred content calls getLiveRepoSizes before
needLiveUpdate.

The 1 minute delay between checks is arbitrary, but will avoid excess
work. The downside of it is that, if a process is dropping a file and
gets interrupted, for 1 minute another process can expect a repository
will soon be smaller than it is. And so a process might send data to a
repository when a file is not really going to be dropped from it. But
note that can already happen if a drop takes some time in eg locking and
then fails. So it seems possible that live updates should only be
allowed to increase, rather than decrease the size of a repository.
2024-08-28 13:57:25 -04:00
Joey Hess
278adbb726
combine 2 queries 2024-08-28 11:00:59 -04:00
Joey Hess
b01a63ef62
avoid nub
There will not usually be many live changes, but usually does not mean
ever, and O(N^2) is best avoided.
2024-08-27 15:00:10 -04:00
Joey Hess
8555fb88ef
locking in checkLiveUpdate
This makes sure that two threads don't check balanced preferred content at the
same time, so each thread always sees a consistent picture of what is
happening.

This does add a fairly expensive file level lock to every check of
preferred content, in commands that use prepareLiveUpdate. It would
be good to only do that when live updates are actually needed, eg when
the preferred content expression uses balanced preferred content.
2024-08-27 13:12:43 -04:00
Joey Hess
4d2f95853d
closing in on finishing live reposizes
Fixed successfullyFinishedLiveSizeChange to not update the rolling total
when a redundant change is in RecentChanges.

Made setRepoSizes clear RecentChanges that are no longer needed.
It might be possible to clear those earlier, this is only a convenient
point to do it.

The reason it's safe to clear RecentChanges here is that, in order for a
live update to call successfullyFinishedLiveSizeChange, a change must be
made to a location log. If a RecentChange gets cleared, and just after
that a new live update is started, making the same change, the location
log has already been changed (since the RecentChange exists), and
so when the live update succeeds, it won't call
successfullyFinishedLiveSizeChange. The reason it doesn't
clear RecentChanges when there is a reduntant live update is because
I didn't want to think through whether or not all races are avoided in
that case.

The rolling total in SizeChanges is never cleared. Instead,
calcJournalledRepoSizes gets the initial value of it, and then
getLiveRepoSizes subtracts that initial value from the current value.
Since the rolling total can only be updated by updateRepoSize,
which is called with the journal locked, locking the journal in
calcJournalledRepoSizes ensures that the database does not change while
reading the journal.
2024-08-27 12:54:46 -04:00
Joey Hess
23d44aa4aa
use live reposizes in balanced preferred content 2024-08-27 10:17:43 -04:00
Joey Hess
d7813876a0
fixed the build
Manually tested getLiveRepoSizes and it is working correctly.
2024-08-27 09:41:35 -04:00
Joey Hess
21608716bd
started work on getLiveRepoSizes
Doesn't quite compile
2024-08-26 14:50:09 -04:00
Joey Hess
18f8d61f55
rolling total of size changes in RepoSize database
When a live size change completes successfully, the same transaction
that removes it from the database updates the rolling total for its
repository.

The idea is that when RepoSizes is read, SizeChanges will be as
well, and cached locally. Any time a change is made, the local cache
will be updated. So by comparing the local cache with the current
SizeChanges, it can learn about size changes that were made by other
processes. Then read the LiveSizeChanges, and add that in to get a live
picture of the current sizes.

Also added a SizeChangeId. This allows 2 different threads, or
processes, to both record a live size change for the same repo and key,
and update their own information without stepping on one-another's toes.
2024-08-25 10:34:47 -04:00
Joey Hess
9188825a4d
use FileSize
It's just an alias, so this doesn't change the db schema, but it makes
explicit that it's not stored as an int64
2024-08-25 08:22:40 -04:00
Joey Hess
c3d40b9ec3
plumb in LiveUpdate (WIP)
Each command that first checks preferred content (and/or required
content) and then does something that can change the sizes of
repositories needs to call prepareLiveUpdate, and plumb it through the
preferred content check and the location log update.

So far, only Command.Drop is done. Many other commands that don't need
to do this have been updated to keep working.

There may be some calls to NoLiveUpdate in places where that should be
done. All will need to be double checked.

Not currently in a compilable state.
2024-08-23 16:35:12 -04:00
Joey Hess
4885073377
add live size changes to RepoSize database
Not yet used.
2024-08-23 12:51:00 -04:00
Joey Hess
bba23e7cc9
do not need a db queue
This database is read once and written at most once per run.
2024-08-15 12:31:27 -04:00
Joey Hess
eac4e9391b
finalize RepoSize database
Including locking on creation, handling of permissions errors, and
setting repo sizes.

I'm confident that locking is not needed while using this database.
Since writes happen in a single transaction. When there are two writers
that are recording sizes based on different git-annex branch commits,
one will overwrite what the other one recorded. Which is fine, it's only
necessary that the database stays consistent with the content of a
git-annex branch commit.
2024-08-15 12:29:34 -04:00
Joey Hess
99a126bebb
added reposize database
The idea is that upon a merge of the git-annex branch, or a commit to
the git-annex branch, the reposize database will be updated. So it
should always accurately reflect the location log sizes, but it will
often be behind the actual current sizes.

Annex.reposizes will start with the value from the database, and get
updated with each transfer, so it will reflect a process's best
understanding of the current sizes.

When there are multiple processes all transferring to the same repo,
Annex.reposize will not reflect transfers made by the other processes
since the current process started. So when using balanced preferred
content, it may make suboptimal choices, including trying to transfer
content to the repo when another process has already filled it up.
But this is the same as if there are multiple processes running on
ifferent machines, so is acceptable. The reposize will eventually
get an accurate value reflecting changes made by other processes or in
other repos.
2024-08-12 11:19:58 -04:00