Commit graph

2224 commits

Author SHA1 Message Date
Joey Hess
52c6434b87
fix removing simulated file 2024-09-09 14:07:52 -04:00
Joey Hess
bbd5390fa3
create simulated files 2024-09-09 11:28:30 -04:00
Joey Hess
5f3a2f4c6b
set descriptions for all simulated repos 2024-09-09 11:06:42 -04:00
Joey Hess
ec7f1f2736
simulated repository construction working 2024-09-09 10:59:01 -04:00
Joey Hess
a2c0d5e4a9
finish updateSimRepoState
Converted maps to use UUID as key.

Also added mincopies to the sim.
2024-09-09 09:37:59 -04:00
Joey Hess
4e11cb19ef
implemented cloneSimRepo
Started on updateSimRepoState
2024-09-06 14:23:29 -04:00
Joey Hess
8d707c4821
more work on applySimCommand
When using an existing repo, copy over all of its config into the sim.

Added CommandTrustLevel.

Start at creating a git clone for a simulated repo, but it's not done
yet.
2024-09-06 12:53:51 -04:00
Joey Hess
65dd018850
prevent overwriting a repo in the simulation 2024-09-05 16:22:08 -04:00
Joey Hess
8f8e35ac3b
almost finished with applySimCommand
Added checks that repo names are ones that have been added to the sim.

Implemented preferred content etc setting. It does not need to parse the
expression in applySimCommand, instead that can be done when running the
sim. This keeps it pure.

But, it can't be entirely pure because of CommandAddTree. So made it
return an Annex action when necessary.

Moved makeMatcher into Annex.FileMatcher in preparation for using it,
but it's not used yet. Also moved checkPreferredContentExpression.
2024-09-05 15:22:41 -04:00
Joey Hess
710a199ce9
implement CommandUse in Annex.Sim
Refactored Remote to keep it pure.
2024-09-05 10:50:04 -04:00
Joey Hess
b932acf4ad
started Annex.Sim
Have most of the sim command handler, but to keep it pure while implementing
the rest will need some refactoring.

It seems likely that running the simulation itself will not be able to be
entirely pure. Preferred content evaluation runs in Annex after all.

Note that the somewhat awkward randomWords is because the i386ancient
build depends on a version of random too old to support generating a
random ByteString on its own.
2024-09-04 15:15:36 -04:00
Joey Hess
340bdd0dac
treat "not present" in preferred content as invalid
Detect when a preferred content expression contains "not present", which
would lead to repeatedly getting and then dropping files, and make it never
match. This also applies to "not balanced" and "not sizebalanced".

--explain will tell the user when this happens

Note that getMatcher calls matchMrun' and does not check for unstable
negated limits. While there is no --present anyway, if there was,
it would not make sense for --not --present to complain about
instability and fail to match.
2024-09-03 13:50:06 -04:00
Joey Hess
35ff8c8c00
use Utility.PID
fixes build on i386ancient
2024-08-30 14:56:38 -04:00
Joey Hess
3b74386ed4
fix liveupdate locking
This fixes the build on windows.

Changed it to use lock pools, which will behave better if two threads
call getLiveRepoSizes at the same time.

Also this should make it work when annex.pidlock is set. In that case,
once the current process locks this file, or anything, any other process
will have to wait on the pid lock. So checkStaleSizeChanges will
correctly identify any other live changes in the database as stale,
since there can only be one git-annex process running.
2024-08-30 14:49:18 -04:00
Joey Hess
c10d11959a
fix paste oops
Wow, I pasted a big thing into entirely the wrong file, but it was in a
comment so it compiled anyway.
2024-08-30 14:35:05 -04:00
Joey Hess
133584a83a
avoid locking the journal in readonly repository
The test suite flagged that git-annex info in a readonly repository was
no longer working.

.git/annex/journal.lck: openFd: permission denied

This fixes it, however, in a case where .git/annex/reposize/ is
writable, but .git/annex/journal/ is not, there will still be a
permission denied error. The solution would just be to use consistent
permissions I suppose.
2024-08-30 11:58:10 -04:00
Joey Hess
d876e06e35
err on the side of larger repository size
When a live update is removing a key, it might fail. So only count those
once they have succeeded. When a live update is adding a key, count it
immediately to avoid over-filling a repo.

This also makes the 1 minute delay between stale live changes checks
more defensible, because a stale live change can only cause us to err
more on the side of caution.
2024-08-28 14:13:12 -04:00
Joey Hess
f89a1b8216
remove stale live changes from reposize database
Reorganized the reposize database directory, and split up a column.

checkStaleSizeChanges needs to run before needLiveUpdate,
otherwise the process won't be holding a lock on its pid file, and
another process could go in and expire the live update it records. It
just so happens that they do get called in the correct order, since
checking balanced preferred content calls getLiveRepoSizes before
needLiveUpdate.

The 1 minute delay between checks is arbitrary, but will avoid excess
work. The downside of it is that, if a process is dropping a file and
gets interrupted, for 1 minute another process can expect a repository
will soon be smaller than it is. And so a process might send data to a
repository when a file is not really going to be dropped from it. But
note that can already happen if a drop takes some time in eg locking and
then fails. So it seems possible that live updates should only be
allowed to increase, rather than decrease the size of a repository.
2024-08-28 13:57:25 -04:00
Joey Hess
e006acef22
avoid reposize database locking overhead when not needed
Only when the preferred content expression being matched uses balanced
preferred content is this overhead needed.

It might be possible to eliminate the locking entirely. Eg, check the
live changes before and after the action and re-run if they are not
stable. For now, this is good enough, it avoids existing preferred
content getting slow. If balanced preferred content turns out to be too
slow to check, that could be tried later.
2024-08-28 10:52:34 -04:00
Joey Hess
09955deebe
fix a deadlock when not using --auto
Live update never gets started, but then it still waited for it to
finish.

This only deadlocked with -J4 or so, not without -J. Unsure why.
2024-08-27 15:47:57 -04:00
Joey Hess
8555fb88ef
locking in checkLiveUpdate
This makes sure that two threads don't check balanced preferred content at the
same time, so each thread always sees a consistent picture of what is
happening.

This does add a fairly expensive file level lock to every check of
preferred content, in commands that use prepareLiveUpdate. It would
be good to only do that when live updates are actually needed, eg when
the preferred content expression uses balanced preferred content.
2024-08-27 13:12:43 -04:00
Joey Hess
4d2f95853d
closing in on finishing live reposizes
Fixed successfullyFinishedLiveSizeChange to not update the rolling total
when a redundant change is in RecentChanges.

Made setRepoSizes clear RecentChanges that are no longer needed.
It might be possible to clear those earlier, this is only a convenient
point to do it.

The reason it's safe to clear RecentChanges here is that, in order for a
live update to call successfullyFinishedLiveSizeChange, a change must be
made to a location log. If a RecentChange gets cleared, and just after
that a new live update is started, making the same change, the location
log has already been changed (since the RecentChange exists), and
so when the live update succeeds, it won't call
successfullyFinishedLiveSizeChange. The reason it doesn't
clear RecentChanges when there is a reduntant live update is because
I didn't want to think through whether or not all races are avoided in
that case.

The rolling total in SizeChanges is never cleared. Instead,
calcJournalledRepoSizes gets the initial value of it, and then
getLiveRepoSizes subtracts that initial value from the current value.
Since the rolling total can only be updated by updateRepoSize,
which is called with the journal locked, locking the journal in
calcJournalledRepoSizes ensures that the database does not change while
reading the journal.
2024-08-27 12:54:46 -04:00
Joey Hess
23d44aa4aa
use live reposizes in balanced preferred content 2024-08-27 10:17:43 -04:00
Joey Hess
d7813876a0
fixed the build
Manually tested getLiveRepoSizes and it is working correctly.
2024-08-27 09:41:35 -04:00
Joey Hess
521e0a7062
fix a deadlock
When finishedLiveUpdate was run on a different key than expected, it
blocked forever waiting for an indication the database had been updated.

Since the journal is locked when finishedLiveUpdate runs, this could
also have caused other git-annex commands to hang.
2024-08-27 00:13:54 -04:00
Joey Hess
21608716bd
started work on getLiveRepoSizes
Doesn't quite compile
2024-08-26 14:50:09 -04:00
Joey Hess
db89e39df6
partially fix concurrency issue in updating the rollingtotal
It's possible for two processes or threads to both be doing the same
operation at the same time. Eg, both dropping the same key. If one
finishes and updates the rollingtotal, then the other one needs to be
prevented from later updating the rollingtotal as well. And they could
finish at the same time, or with some time in between.

Addressed this by making updateRepoSize be called with the journal
locked, and only once it's been determined that there is an actual
location change to record in the log. updateRepoSize waits for the
database to be updated.

When there is a redundant operation, updateRepoSize won't be called,
and the redundant LiveUpdate will be removed from the database on
garbage collection.

But: There will be a window where the redundant LiveUpdate is still
visible in the db, and processes can see it, combine it with the
rollingtotal, and arrive at the wrong size. This is a small window, but
it still ought to be addressed. Unsure if it would always be safe to
remove the redundant LiveUpdate? Consider the case where two drops and a
get are all running concurrently somehow, and the order they finish is
[drop, get, drop]. The second drop seems redundant to the first, but
it would not be safe to remove it. While this seems unlikely, it's hard
to rule out that a get and drop at different stages can both be running
at the same time.
2024-08-26 09:43:32 -04:00
Joey Hess
18f8d61f55
rolling total of size changes in RepoSize database
When a live size change completes successfully, the same transaction
that removes it from the database updates the rolling total for its
repository.

The idea is that when RepoSizes is read, SizeChanges will be as
well, and cached locally. Any time a change is made, the local cache
will be updated. So by comparing the local cache with the current
SizeChanges, it can learn about size changes that were made by other
processes. Then read the LiveSizeChanges, and add that in to get a live
picture of the current sizes.

Also added a SizeChangeId. This allows 2 different threads, or
processes, to both record a live size change for the same repo and key,
and update their own information without stepping on one-another's toes.
2024-08-25 10:34:47 -04:00
Joey Hess
d60a33fd13
improve live update starting
In an expression like "balanced=foo and exclude=bar", avoid it starting
a live update when the overall expression doesn't match.
2024-08-24 13:07:05 -04:00
Joey Hess
2f20b939b7
LiveUpdate db updates working
I've tested the behavior of the thread that waits for the LiveUpdate to
be finished, and it does get signaled and exit cleanly when the
LiveUpdate is GCed instead.

Made finishedLiveUpdate wait for the thread to finish updating the
database.

There is a case where GC doesn't happen in time and the database is left
with a live update recorded in it. This should not be a problem as such
stale data can also happen when interrupted and will need to be detected
when loading the database.

Balanced preferred content expressions now call startLiveUpdate.
2024-08-24 11:49:58 -04:00
Joey Hess
84d1bb746b
LiveUpdate for clusters 2024-08-24 10:20:12 -04:00
Joey Hess
3f8675f339
more LiveUpdate plumbing 2024-08-24 09:28:41 -04:00
Joey Hess
c3d40b9ec3
plumb in LiveUpdate (WIP)
Each command that first checks preferred content (and/or required
content) and then does something that can change the sizes of
repositories needs to call prepareLiveUpdate, and plumb it through the
preferred content check and the location log update.

So far, only Command.Drop is done. Many other commands that don't need
to do this have been updated to keep working.

There may be some calls to NoLiveUpdate in places where that should be
done. All will need to be double checked.

Not currently in a compilable state.
2024-08-23 16:35:12 -04:00
Joey Hess
3fe67744b1
display new empty repos in maxsize table
A new repo that has no location log info yet, but has an entry in
uuid.log has 0 size, so make RepoSize aware of that.

Note that a new repo that does not yet appear in uuid.log will still not
be displayed.

When a remote is added but not synced with yet, it has no uuid.log
entry. If git-annex maxsize is used to configure that remote, it needs
to appear in the maxsize table, and the change to Command.MaxSize takes
care of that.
2024-08-22 07:03:22 -04:00
Joey Hess
9e87061de2
Support "sizebalanced=" and "fullysizebalanced=" too
Might want to make --rebalance turn balanced=group:N where N > 1
to fullysizebalanced=group:N. Have not yet determined if that will
improve situations enough to be worth the extra work.
2024-08-21 15:01:54 -04:00
Joey Hess
476d223bce
implement fullbalanced=group:N
Rebalancing this when it gets into a suboptimal situation will need
further work.
2024-08-20 13:51:02 -04:00
Joey Hess
016edcf437
adjust countdown number for RepoSize update message
Benchmarking a git-annex branch with half a million files changed,
it takes about 1 minute to update the RepoSizes. So this will display
the message after a few seconds.
2024-08-17 15:59:07 -04:00
Joey Hess
b62b58b50b
git-annex info speed up using getRepoSizes 2024-08-17 14:54:31 -04:00
Joey Hess
d09a005f2b
update RepoSize database from git-annex branch incrementally
The use of catObjectStream is optimally fast. Although it might be
possible to combine this with git-annex branch merge to avoid some
redundant work.

Benchmarking, a git-annex branch that had 100000 files changed
took less than 1.88 seconds to run through this.
2024-08-17 13:35:00 -04:00
Joey Hess
8239824d92
consistently omit clusters when calculating RepoSizes
updateRepoSize is only called on the UUID of a repository, not any
cluster it might be a node of. But overLocationLogs and overLocationLogsJournal
were inclusing cluster UUIDs. So it was inconsistent.

Currently I don't see any reason to calculate RepoSize for a cluster.
It's not even clear what it should mean, the total size of all nodes, or
the amount of information stored in the cluster in total?
2024-08-17 11:24:14 -04:00
Joey Hess
61d95627f3
fix Annex.repoSize sharing between threads 2024-08-16 10:56:51 -04:00
Joey Hess
a2da9c526b
RepoSize concurrency fix
When loading the journalled repo sizes, make sure that the current
process is prevented from making changes to the journal in another
thread.
2024-08-15 13:37:41 -04:00
Joey Hess
06064f897c
update Annex.reposizes when changing location logs
The live update is only needed when Annex.reposizes has already been
populated.
2024-08-15 13:27:14 -04:00
Joey Hess
c376b1bd7e
show message when doing possibly expensive from scratch reposize calculation 2024-08-15 12:42:36 -04:00
Joey Hess
c200523bac
implement getRepoSizes
At this point the RepoSize database is getting populated, and it
all seems to be working correctly. Incremental updates still need to be
done to make it performant.
2024-08-15 12:31:56 -04:00
Joey Hess
eac4e9391b
finalize RepoSize database
Including locking on creation, handling of permissions errors, and
setting repo sizes.

I'm confident that locking is not needed while using this database.
Since writes happen in a single transaction. When there are two writers
that are recording sizes based on different git-annex branch commits,
one will overwrite what the other one recorded. Which is fine, it's only
necessary that the database stays consistent with the content of a
git-annex branch commit.
2024-08-15 12:29:34 -04:00
Joey Hess
63a3cedc45
slightly improve hairy types 2024-08-14 16:04:18 -04:00
Joey Hess
3e6eb2a58d
implement journalledRepoSizes
Plan is to run this when populating Annex.reposizes on demand.
So Annex.reposizes will be up-to-date with the journal, including
crucially journal entries for private repositories. But also
anything that has been written to the journal by another process,
especially if the process was ran with annex.alwayscommit=false.

From there, Annex.reposizes can be kept up to date with changes made
by the running process.
2024-08-14 13:53:24 -04:00
Joey Hess
8ac2685b33
calcBranchRepoSizes without journal files
This will be used to prime the RepoSizes database, which will always
contain values that correpond to information in the git-annex branch, so
without anything from journal files.

Factored out overJournalFileContents which will later be used to
update Annex.reposizes to include information from journal files.
This will be partitcularly important to support private UUIDs which only
ever get to journal files and not to the branch.
2024-08-14 03:19:30 -04:00
Joey Hess
5afbea25e7
avoid counting size of keys that are in the journal twice
In calcRepoSizes and also git-annex info, when a key was in the journal,
it was passed to the callback twice, so the calculated size was wrong.
2024-08-13 13:23:39 -04:00
Joey Hess
467d80101a
improve handling of unmerged git-annex branches in readonly repo
git-annex info was displaying a message that didn't make sense in
context.

In calcRepoSizes, it seems better to return the information from the
git-annex branch, rather than giving up. Especially since balanced
preferred content uses it, and we can't just give up evaluating a
preferred content expression if git-annex is to be usable in such a
readonly repo.

Commit 6d7ecd9e5d nobly wanted git-annex
to behave the same with such unmerged branches as it does when it can
merge them. But for the purposes of preferred content, it seems to me
there's a sense that such an unmerged branch is the same as a remote we
have not pulled from. The balanced preferred content will either way
operate under outdated information, and so make not the best choices.
2024-08-13 13:13:12 -04:00
Joey Hess
745bc5c547
take maxsize into account for balanced preferred content
This is very innefficient, it will need to be optimised not to
calculate the sizes of repos every time.

Also, fixed a bug in balancedPicker that caused it to pick a too high
index when some repos were excluded due to being full.
2024-08-13 11:00:20 -04:00
Joey Hess
99a126bebb
added reposize database
The idea is that upon a merge of the git-annex branch, or a commit to
the git-annex branch, the reposize database will be updated. So it
should always accurately reflect the location log sizes, but it will
often be behind the actual current sizes.

Annex.reposizes will start with the value from the database, and get
updated with each transfer, so it will reflect a process's best
understanding of the current sizes.

When there are multiple processes all transferring to the same repo,
Annex.reposize will not reflect transfers made by the other processes
since the current process started. So when using balanced preferred
content, it may make suboptimal choices, including trying to transfer
content to the repo when another process has already filled it up.
But this is the same as if there are multiple processes running on
ifferent machines, so is acceptable. The reposize will eventually
get an accurate value reflecting changes made by other processes or in
other repos.
2024-08-12 11:19:58 -04:00
Joey Hess
bd5affa362
use hmac in balanced preferred content
This deals with the possible security problem that someone could make an
unusually low UUID and generate keys that are all constructed to hash to
a number that, mod the number of repositories in the group, == 0.
So balanced preferred content would always put those keys in the
repository with the low UUID as long as the group contains the
number of repositories that the attacker anticipated.
Presumably the attacker than holds the data for ransom? Dunno.

Anyway, the partial solution is to use HMAC (sha256) with all the UUIDs
combined together as the "secret", and the key as the "message". Now any
change in the set of UUIDs in a group will invalidate the attacker's
constructed keys from hashing to anything in particular.

Given that there are plenty of other things someone can do if they can
write to the repository -- including modifying preferred content so only
their repository wants files, and numcopies so other repositories drom
them -- this seems like safeguard enough.

Note that, in balancedPicker, combineduuids is memoized.
2024-08-10 16:32:54 -04:00
Joey Hess
3ce2e95a5f
balanced preferred content and --rebalance
This all works fine. But it doesn't check repository sizes yet, and
without repository size checking, once a repository gets full, there
will be no other repository that will want its files.

Use of sha2 seems unncessary, probably alder2 or md5 or crc would have
been enough. Possibly just summing up the bytes of the key mod the number
of repositories would have sufficed. But sha2 is there, and probably
hardware accellerated. I doubt very much there is any security benefit
to using it though. If someone wants to construct a key that will be
balanced onto a given repository, sha2 is certianly not going to stop
them.
2024-08-09 14:16:09 -04:00
Joey Hess
3ea835c7e8
proxied exporttree=yes versionedexport=yes remotes are not untrusted
This removes versionedExport, which was only used by the S3 special
remote. Instead, versionedexport=yes is a common way for remotes to
indicate that they are versioned.
2024-08-08 15:24:19 -04:00
Joey Hess
1038567881
proxy stores received keys to known export locations
This handles the workflow where the branch is first pushed to the proxy,
and then files in the exported tree are later are copied to the proxied remote.

Turns out that the way the export log is structured, nothing needs
to be done to finalize the export once the last key is sent to it. Which
is great because that would have been a lot of complication. On
receiving the push, Command.Export runs and calls recordExportBeginning,
does as much as it can to update the export with the files currently
on it, and then calls recordExportUnderway. At that point, the
export.log records the export as "complete", but it's not really. And
that's fine. The same happens when using `git-annex export` when some
files are not available to send. Other repositories that have
access to the special remote can already retrieve files from it. As
the missing files get copied to the exported remote, all that needs
to be done is record each in the export db.

At this point, proxying to exporttree=yes annexobjects=yes special remotes
is fully working. Except for in the case where multiple files in the
tree use the same key, and the files are sent to the proxied remote
before pushing the tree.

It seems that even special remotes without annexobjects=yes will work if
used with the workflow where the git-annex branch is pushed before
copying files. But not with the `git-annex push` workflow.
2024-08-07 09:47:34 -04:00
Joey Hess
c53f61e93f
Merge branch 'master' into exportreeplus 2024-08-06 14:46:33 -04:00
Joey Hess
3cc03b4c96
fix file corruption when proxying an upload to a special remote
The file corruption consists of each chunk of the file being duplicated.
Since chunks are typically a fixed size, it would certianly be possible
to get from a corrupted file back to the original file. But this is still
bad data loss.

Reversion was in commit fcc052bed8.
Luckily that did not make the most recent release.
2024-08-06 14:41:19 -04:00
Joey Hess
3289b1ad02
proxying to exporttree=yes annexobjects=yes basically working
It works when using git-annex sync/push/assist, or when manually sending
all content to the proxied remote before pushing to the proxy remote.
But when the push comes before the content is sent, sending content does
not update the exported tree.
2024-08-06 14:21:23 -04:00
Joey Hess
a3d96474f2
rename to annexobjects location on unexport
This avoids needing to re-upload the file again to get it to the
annexobjects location, which git-annex sync was doing when it was
preferred content.

If the file is not preferred content, sync will drop it from the
annexobjects location.

If the file has been deleted from the tree, it will remain in the
annexobjects location until an unused/dropunused pass is done.
2024-08-04 11:58:07 -04:00
Joey Hess
28b29f63dc
initial support for annexobjects=yes
Works but some commands may need changes to support special remotes
configured this way.
2024-08-02 14:07:45 -04:00
Joey Hess
3a1f39fbdf
Avoid loading cluster log at startup
This fixes a problem with datalad's test suite, where loading the cluster
log happened to cause the git-annex branch commits to take a different
shape, with an additional commit.

It's also faster though, since many commands don't need the cluster log.

Just fill Annex.clusters with a thunk.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-07-31 15:54:14 -04:00
Joey Hess
dc11c5f493
final fix to windows build 2024-07-29 16:32:24 -04:00
Joey Hess
be8e1ab512
more fixes to windows build for content retention files
Will probably build successfully now. Still untested.
2024-07-29 15:14:12 -04:00
Joey Hess
647bff9770
more fixes to windows build for content retention files 2024-07-29 13:58:40 -04:00
Joey Hess
fcc052bed8
When proxying an upload to a special remote, verify the hash.
While usually uploading to a special remote does not verify the content,
the content in a repository is assumed to be valid, and there is no trust
boundary. But with a proxied special remote, there may be users who are
allowed to store objects, but are not really trusted.

Another way to look at this is it's the equivilant of git-annex-shell
checking the hash of received data, which it does (see StoreContent
implementation).
2024-07-29 13:40:51 -04:00
Joey Hess
f397296739
add missing do on windows 2024-07-29 12:54:52 -04:00
Joey Hess
0dc064a9ad
When proxying for a special remote, avoid unncessary hashing
Like the comment says, the client will do its own verification. But it was
calling verifyKeyContentPostRetrieval, which was hashing the file.
2024-07-29 11:18:03 -04:00
Joey Hess
dfe65b92c8
avoid repeatedly parsing the proxy log 2024-07-28 16:04:20 -04:00
Joey Hess
cdc4bd7443
fix hang in PUT of large file to a special remote node of a cluster over http 2024-07-28 15:34:59 -04:00
Joey Hess
18ed4e5b20
use closedv rather than separate endv
Doesn't fix any known problem, but this way if the connection does get
closed, it will notice.
2024-07-28 15:11:31 -04:00
Joey Hess
66679c9bb4
remove temp file after upload to special remote 2024-07-28 14:36:45 -04:00
Joey Hess
6722a61a21
clusters need enableInteractiveBranchAccess
As seen in commit 770aac97a7, a cluster
relies accurate location logs. If long-running processes are serving a
cluster, and one process puts a file, the other process needs to see
what nodes it was stored on when checking if the file is present.
2024-07-28 12:39:42 -04:00
Joey Hess
bd3d327d8a
smarter BranchState cache invalidation
Only invalidate a just-written file in the cache, not the whole cache.

This will avoid the possibly performance impact of cache invalidation
mentioned in commit 770aac97a7
2024-07-28 12:33:32 -04:00
Joey Hess
770aac97a7
share single BranchState amoung all threads
This fixes a problem when git-annex testremote is run against a cluster
accessed via the http server. Annex.Cluster uses the location log
to find nodes that contain a key when checking if the key is present or getting
it. Just after a key was stored to a cluster node, reading the location log
was not getting the UUID of that node.

Apparently the Annex action that wrote to the location log, and the one
that read from it were run with two different Annex states. The http server
does use several different Annex threads.

BranchState was part of the AnnexState, and so two threads could have
different BranchStates.

Moved BranchState to the AnnexRead, so all threads will see the common state.

This might possibly impact performance. If one thread is writing changes to the
branch, and another thread is reading from the branch, the writing thread will
now invalidate the BranchState's cache, which will cause the reading thread to
need to do extra work. But correctness is surely more important. If did is
found to have impacted performance, it could probably be dealt with by doing
smarter BranchState cache invalidation.

Another way this might impact performance is that the BranchState has a small
cache. If several threads were reading from the branch and relying on the value
they just read still being in the case, now a cache miss will be more likely.
Increasing the BranchState cache to the number of jobs might be a good
idea to amelorate that. But the cache is currently an innefficient list,
so making it large would need changes to the data types.

(Commit 4304f1b6ae dealt with a follow-on
effect of the bug fixed here.)
2024-07-28 12:30:27 -04:00
Joey Hess
fbbedae497
add --clusterjobs option and default to 1
The default of 1 is not ideal at all, but it avoids an accidental M*N
causing so much concurrency it becomes unusable.
2024-07-28 10:36:22 -04:00
Joey Hess
1259ad89b6
cluster support in http API server
Wired it up and it seems to basically work, although the test suite is
not fully passing.

Note that --jobs currently gets multiplied by the number of nodes in the
cluster, which is probably not good.
2024-07-28 10:17:29 -04:00
Joey Hess
8ec174408e
remove duplicate code 2024-07-28 09:35:09 -04:00
Joey Hess
ef8f24f28c
fix PUT to http proxied special remote
It was hanging because it never sent FAILURE in the INVALID case.
And putoffset always triggers the INVALID case.
2024-07-28 09:14:42 -04:00
Joey Hess
0fb86d2916
UNLOCKCONTENT is not a top-level request
proxyRequest was treating UNLOCKCONTENT as a separate request.
That made it possible for there to be two different connections to the
proxied remote, with LOCKCONTENT being sent to one, and UNLOCKCONTENT
to the other one. A protocol error.

git-annex testremote now passes against a http proxied remote.
2024-07-26 20:39:06 -04:00
Joey Hess
267a202e72
clean up after http p2p proxy GET is interrupted
There was an annex worker thread that did not get stopped.

It was stuck in ReceiveMessage from the P2PHandleTMVar.

Fixed by making P2PHandleTMVar closeable.

In serveGet, releaseP2PConnection has to come first, else the
annexworker may not shut down, if it's waiting to read from it.

In proxyConnection, call closeRemoteSide in order to wait for the ssh
process (for example).
2024-07-26 15:33:20 -04:00
Joey Hess
ad025b8e5e
clean up protocol version for proxying
The proxy always checks the protocol version of a remote before talking
to it in a version-specific way, so the protocol version in the ProxyParams
is the client's protocol version. The remote will always be at the same or
an older protocol version than the client.

Note that in relayDATAFinish, when the client is at protocol version 0,
the remote must thus be as well, and that's why its version is not
checked in the case for that.

With that clarified, it's evident that, in P2P.Http.State, there's no
need to look at the proxied remote's protocol version at all.
2024-07-26 13:49:05 -04:00
Joey Hess
cc1da2d516
http p2p proxy is now largely working 2024-07-26 10:44:10 -04:00
Joey Hess
b391756b32
remove some debugging 2024-07-25 21:36:10 -04:00
Joey Hess
6ef6ad808f
use a record to reduce the huge number of parameters 2024-07-25 15:23:18 -04:00
Joey Hess
3d14e2cf58
http server support for proxies, incomplete
Refactored git-annex-shell code so this can use checkCanProxy'.

At this point all that remains is opening a proxy connection,
and using a proxy connection.
2024-07-25 13:19:24 -04:00
Joey Hess
10f2c23fd7
fix slowloris timeout in hashing resume of download of large file
Hash the data that is already present in the file before connecting to
the http server.
2024-07-24 11:03:59 -04:00
Joey Hess
0594338a78
fix meter display on resume
It still needs to be offset, otherwise on resume from 80% it will
display 1%..20%.

Seems that this bug must have affected P2P.Annex as well where it runs
this code, but apparently it didn't affect it in a very user-visible
way. Maybe the transfer log file was updated incorrectly?
2024-07-24 10:28:48 -04:00
Joey Hess
7bd616e169
Remote.Git retrieveKeyFile works with annex+http urls
This includes a bugfix to serveGet, it hung at the end.
2024-07-24 10:28:44 -04:00
Joey Hess
a2d1844292
factor out resumeVerifyFromOffset 2024-07-24 09:29:44 -04:00
Joey Hess
b4d749cc91
Merge branch 'master' into httpproto 2024-07-23 21:17:06 -04:00
Joey Hess
f7404a64c0
Propagate --force to git-annex transferrer
And other child processes.
2024-07-23 21:16:56 -04:00
Joey Hess
4826a3745d
servePut and clientPut implementation
Made the data-length header required even for v0. This simplifies the
implementation, and doesn't preclude extra verification being done for
v0.

The connectionWaitVar is an ugly hack. In servePut, nothing waits
on the waitvar, and I could not find a good way to make anything wait on
it.
2024-07-22 10:27:44 -04:00
Joey Hess
74c6175795
fix serveGet early handle close
Needed that waitv after all..
2024-07-11 09:55:17 -04:00
Joey Hess
3b37b9e53f
fix serveGet hang
This came down to SendBytes waiting on the waitv. Nothing ever filled
it.

Only Annex.Proxy needs the waitv, and it handles filling it. So make it
optional.
2024-07-11 07:46:52 -04:00
Joey Hess
f9b7ce7224
add Annex worker pool to P2PHttp
This will be needed for get and store, since those need to run Annex
actions.

withLocalP2PConnections will also probably use it.
2024-07-10 12:19:47 -04:00
Joey Hess
f452bd448a
REMOVE-BEFORE and GETTIMESTAMP proxying
For clusters, the timestamps have to be translated, since each node can
have its own idea about what time it is. To translate a timestamp, the
proxy remembers what time it asked the node for a timestamp in
GETTIMESTAMP, and applies the delta as an offset in REMOVE-BEFORE.

This does mean that a remove from a cluster has to call GETTIMESTAMP on
every node before dropping from nodes. Not very efficient. Although
currently it tries to drop from every single node anyway, which is also
not very efficient.

I thought about caching the GETTIMESTAMP from the nodes on the first
call. That would improve efficiency. But, since monotonic clocks on
!Linux don't advance when the computer is suspended, consider what might
happen if one node was suspended for a while, then came back. Its
monotonic timestamp would end up behind where the proxying expects it to
be. Would that result in removing when it shouldn't, or refusing to
remove when it should? Have not thought it through. Either way, a
cluster behaving strangly for an extended period of time because one
of its nodes was briefly asleep doesn't seem like good behavior.
2024-07-04 15:09:34 -04:00
Joey Hess
99b7a0cfe9
use REMOVE-BEFORE in P2P protocol
Only clusters still need to be fixed to close this todo.
2024-07-04 13:47:38 -04:00
Joey Hess
1243af4a18
toward SafeDropProof expiry checking
Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is
based on a LockedCopy. If there are several LockedCopies, it uses the
closest expiry time. That is not optimal, it may be that the proof
expires based on one LockedCopy but another one has not expired. But
that seems unlikely to really happen, and anyway the user can just
re-run a drop if it fails due to expiry.

Pass the SafeDropProof to removeKey, which is responsible for checking
it for expiry in situations where that could be a problem. Which really
only means in Remote.Git.

Made Remote.Git check expiry when dropping from a local remote.

Checking expiry when dropping from a P2P remote is not yet implemented.
P2P.Protocol.remove has SafeDropProof plumbed through to it for that
purpose.

Fixing the remaining 2 build warnings should complete this work.

Note that the use of a POSIXTime here means that if the clock gets set
forward while git-annex is in the middle of a drop, it may say that
dropping took too long. That seems ok. Less ok is that if the clock gets
turned back a sufficient amount (eg 5 minutes), proof expiry won't be
noticed. It might be better to use the Monotonic clock, but that doesn't
advance when a laptop is suspended, and while there is the linux
Boottime clock, that is not available on other systems. Perhaps a
combination of POSIXTime and the Monotonic clock could detect laptop
suspension and also detect clock being turned back?

There is a potential future flag day where
p2pDefaultLockContentRetentionDuration is not assumed, but is probed
using the P2P protocol, and peers that don't support it can no longer
produce a LockedCopy. Until that happens, when git-annex is
communicating with older peers there is a risk of data loss when
a ssh connection closes during LOCKCONTENT.
2024-07-04 12:39:06 -04:00