Commit graph

45585 commits

Author SHA1 Message Date
Joey Hess
65dd018850
prevent overwriting a repo in the simulation 2024-09-05 16:22:08 -04:00
Joey Hess
8f8e35ac3b
almost finished with applySimCommand
Added checks that repo names are ones that have been added to the sim.

Implemented preferred content etc setting. It does not need to parse the
expression in applySimCommand, instead that can be done when running the
sim. This keeps it pure.

But, it can't be entirely pure because of CommandAddTree. So made it
return an Annex action when necessary.

Moved makeMatcher into Annex.FileMatcher in preparation for using it,
but it's not used yet. Also moved checkPreferredContentExpression.
2024-09-05 15:22:41 -04:00
Joey Hess
710a199ce9
implement CommandUse in Annex.Sim
Refactored Remote to keep it pure.
2024-09-05 10:50:04 -04:00
Joey Hess
b932acf4ad
started Annex.Sim
Have most of the sim command handler, but to keep it pure while implementing
the rest will need some refactoring.

It seems likely that running the simulation itself will not be able to be
entirely pure. Preferred content evaluation runs in Annex after all.

Note that the somewhat awkward randomWords is because the i386ancient
build depends on a version of random too old to support generating a
random ByteString on its own.
2024-09-04 15:15:36 -04:00
Joey Hess
84c781d924
documentation for git-annex sim
command not implemented yet
2024-09-04 15:03:17 -04:00
Joey Hess
1b6c33a38e
update 2024-09-03 14:24:32 -04:00
Joey Hess
3398514c38
sim design 2024-09-03 14:23:48 -04:00
Joey Hess
5807e1480c
correct comment
This is not related to v5 versus newer versions.
2024-09-03 14:23:32 -04:00
Joey Hess
fe71400e37
fix typo 2024-09-03 14:23:14 -04:00
Joey Hess
340bdd0dac
treat "not present" in preferred content as invalid
Detect when a preferred content expression contains "not present", which
would lead to repeatedly getting and then dropping files, and make it never
match. This also applies to "not balanced" and "not sizebalanced".

--explain will tell the user when this happens

Note that getMatcher calls matchMrun' and does not check for unstable
negated limits. While there is no --present anyway, if there was,
it would not make sense for --not --present to complain about
instability and fail to match.
2024-09-03 13:50:06 -04:00
Joey Hess
8b2bd42540
Fix --debug display of onlyingroup preferred content expression. 2024-09-03 12:38:59 -04:00
Joey Hess
03864a2c3b
update 2024-09-03 11:52:54 -04:00
Joey Hess
b800ea6826
2 level toc 2024-09-02 16:32:28 -04:00
Joey Hess
ab0c82114b
Merge branch 'master' of ssh://git-annex.branchable.com 2024-09-02 16:31:31 -04:00
Joey Hess
1e1c13dd38
fix number of headers 2024-09-02 16:31:03 -04:00
lucas.gautheron@f2b5c93a64b028c1ec8698b9c2412ed51ff22040
850ea3a9b8 2024-09-02 15:12:02 +00:00
lucas.gautheron@f2b5c93a64b028c1ec8698b9c2412ed51ff22040
925c203c09 2024-09-02 15:08:25 +00:00
Joey Hess
9d29b99ac4
add news item for git-annex 10.20240831 2024-08-31 19:50:36 -04:00
Joey Hess
b3dc656153
releasing package git-annex version 10.20240831 2024-08-31 19:50:26 -04:00
Joey Hess
35ff8c8c00
use Utility.PID
fixes build on i386ancient
2024-08-30 14:56:38 -04:00
Joey Hess
3b74386ed4
fix liveupdate locking
This fixes the build on windows.

Changed it to use lock pools, which will behave better if two threads
call getLiveRepoSizes at the same time.

Also this should make it work when annex.pidlock is set. In that case,
once the current process locks this file, or anything, any other process
will have to wait on the pid lock. So checkStaleSizeChanges will
correctly identify any other live changes in the database as stale,
since there can only be one git-annex process running.
2024-08-30 14:49:18 -04:00
Joey Hess
c10d11959a
fix paste oops
Wow, I pasted a big thing into entirely the wrong file, but it was in a
comment so it compiled anyway.
2024-08-30 14:35:05 -04:00
Joey Hess
698d9252a5
mention sizebalanced as well as balanced 2024-08-30 12:06:45 -04:00
Joey Hess
133584a83a
avoid locking the journal in readonly repository
The test suite flagged that git-annex info in a readonly repository was
no longer working.

.git/annex/journal.lck: openFd: permission denied

This fixes it, however, in a case where .git/annex/reposize/ is
writable, but .git/annex/journal/ is not, there will still be a
permission denied error. The solution would just be to use consistent
permissions I suppose.
2024-08-30 11:58:10 -04:00
Joey Hess
53b7375cc6
update 2024-08-30 11:14:45 -04:00
Joey Hess
54b6151412
document using balanced preferred content in a cluster 2024-08-30 11:08:32 -04:00
Joey Hess
d0938d730b
Merge branch 'master' into balanced 2024-08-30 11:01:39 -04:00
Joey Hess
242c525659
lookupkey: Allow using --ref in a bare repository. 2024-08-30 10:55:48 -04:00
yarikoptic
e2b7895cbc Added a comment 2024-08-29 18:35:47 +00:00
Joey Hess
d876e06e35
err on the side of larger repository size
When a live update is removing a key, it might fail. So only count those
once they have succeeded. When a live update is adding a key, count it
immediately to avoid over-filling a repo.

This also makes the 1 minute delay between stale live changes checks
more defensible, because a stale live change can only cause us to err
more on the side of caution.
2024-08-28 14:13:12 -04:00
Joey Hess
f89a1b8216
remove stale live changes from reposize database
Reorganized the reposize database directory, and split up a column.

checkStaleSizeChanges needs to run before needLiveUpdate,
otherwise the process won't be holding a lock on its pid file, and
another process could go in and expire the live update it records. It
just so happens that they do get called in the correct order, since
checking balanced preferred content calls getLiveRepoSizes before
needLiveUpdate.

The 1 minute delay between checks is arbitrary, but will avoid excess
work. The downside of it is that, if a process is dropping a file and
gets interrupted, for 1 minute another process can expect a repository
will soon be smaller than it is. And so a process might send data to a
repository when a file is not really going to be dropped from it. But
note that can already happen if a drop takes some time in eg locking and
then fails. So it seems possible that live updates should only be
allowed to increase, rather than decrease the size of a repository.
2024-08-28 13:57:25 -04:00
Joey Hess
278adbb726
combine 2 queries 2024-08-28 11:00:59 -04:00
Joey Hess
e006acef22
avoid reposize database locking overhead when not needed
Only when the preferred content expression being matched uses balanced
preferred content is this overhead needed.

It might be possible to eliminate the locking entirely. Eg, check the
live changes before and after the action and re-run if they are not
stable. For now, this is good enough, it avoids existing preferred
content getting slow. If balanced preferred content turns out to be too
slow to check, that could be tried later.
2024-08-28 10:52:34 -04:00
matrss
833150fd25 Added a comment 2024-08-28 14:11:36 +00:00
mih
16f9042046 Added a comment: Needed to retrieve single file metadata from bare repo 2024-08-28 13:58:30 +00:00
matrss
3f62116d64 Added a comment 2024-08-28 08:47:33 +00:00
Joey Hess
09955deebe
fix a deadlock when not using --auto
Live update never gets started, but then it still waited for it to
finish.

This only deadlocked with -J4 or so, not without -J. Unsure why.
2024-08-27 15:47:57 -04:00
Joey Hess
b01a63ef62
avoid nub
There will not usually be many live changes, but usually does not mean
ever, and O(N^2) is best avoided.
2024-08-27 15:00:10 -04:00
Joey Hess
0a119184e6
thoughts 2024-08-27 14:59:13 -04:00
Joey Hess
8555fb88ef
locking in checkLiveUpdate
This makes sure that two threads don't check balanced preferred content at the
same time, so each thread always sees a consistent picture of what is
happening.

This does add a fairly expensive file level lock to every check of
preferred content, in commands that use prepareLiveUpdate. It would
be good to only do that when live updates are actually needed, eg when
the preferred content expression uses balanced preferred content.
2024-08-27 13:12:43 -04:00
Joey Hess
4d2f95853d
closing in on finishing live reposizes
Fixed successfullyFinishedLiveSizeChange to not update the rolling total
when a redundant change is in RecentChanges.

Made setRepoSizes clear RecentChanges that are no longer needed.
It might be possible to clear those earlier, this is only a convenient
point to do it.

The reason it's safe to clear RecentChanges here is that, in order for a
live update to call successfullyFinishedLiveSizeChange, a change must be
made to a location log. If a RecentChange gets cleared, and just after
that a new live update is started, making the same change, the location
log has already been changed (since the RecentChange exists), and
so when the live update succeeds, it won't call
successfullyFinishedLiveSizeChange. The reason it doesn't
clear RecentChanges when there is a reduntant live update is because
I didn't want to think through whether or not all races are avoided in
that case.

The rolling total in SizeChanges is never cleared. Instead,
calcJournalledRepoSizes gets the initial value of it, and then
getLiveRepoSizes subtracts that initial value from the current value.
Since the rolling total can only be updated by updateRepoSize,
which is called with the journal locked, locking the journal in
calcJournalledRepoSizes ensures that the database does not change while
reading the journal.
2024-08-27 12:54:46 -04:00
Joey Hess
23d44aa4aa
use live reposizes in balanced preferred content 2024-08-27 10:17:43 -04:00
Joey Hess
d7813876a0
fixed the build
Manually tested getLiveRepoSizes and it is working correctly.
2024-08-27 09:41:35 -04:00
Joey Hess
521e0a7062
fix a deadlock
When finishedLiveUpdate was run on a different key than expected, it
blocked forever waiting for an indication the database had been updated.

Since the journal is locked when finishedLiveUpdate runs, this could
also have caused other git-annex commands to hang.
2024-08-27 00:13:54 -04:00
Spencer
949be665c0 Added contributions section to track my bugs and inquiries 2024-08-26 20:02:03 +00:00
Joey Hess
21608716bd
started work on getLiveRepoSizes
Doesn't quite compile
2024-08-26 14:50:09 -04:00
Joey Hess
db89e39df6
partially fix concurrency issue in updating the rollingtotal
It's possible for two processes or threads to both be doing the same
operation at the same time. Eg, both dropping the same key. If one
finishes and updates the rollingtotal, then the other one needs to be
prevented from later updating the rollingtotal as well. And they could
finish at the same time, or with some time in between.

Addressed this by making updateRepoSize be called with the journal
locked, and only once it's been determined that there is an actual
location change to record in the log. updateRepoSize waits for the
database to be updated.

When there is a redundant operation, updateRepoSize won't be called,
and the redundant LiveUpdate will be removed from the database on
garbage collection.

But: There will be a window where the redundant LiveUpdate is still
visible in the db, and processes can see it, combine it with the
rollingtotal, and arrive at the wrong size. This is a small window, but
it still ought to be addressed. Unsure if it would always be safe to
remove the redundant LiveUpdate? Consider the case where two drops and a
get are all running concurrently somehow, and the order they finish is
[drop, get, drop]. The second drop seems redundant to the first, but
it would not be safe to remove it. While this seems unlikely, it's hard
to rule out that a get and drop at different stages can both be running
at the same time.
2024-08-26 09:43:32 -04:00
Joey Hess
03c7f99957
todo 2024-08-25 10:48:42 -04:00
Joey Hess
18f8d61f55
rolling total of size changes in RepoSize database
When a live size change completes successfully, the same transaction
that removes it from the database updates the rolling total for its
repository.

The idea is that when RepoSizes is read, SizeChanges will be as
well, and cached locally. Any time a change is made, the local cache
will be updated. So by comparing the local cache with the current
SizeChanges, it can learn about size changes that were made by other
processes. Then read the LiveSizeChanges, and add that in to get a live
picture of the current sizes.

Also added a SizeChangeId. This allows 2 different threads, or
processes, to both record a live size change for the same repo and key,
and update their own information without stepping on one-another's toes.
2024-08-25 10:34:47 -04:00
Joey Hess
9188825a4d
use FileSize
It's just an alias, so this doesn't change the db schema, but it makes
explicit that it's not stored as an int64
2024-08-25 08:22:40 -04:00