Commit graph

27 commits

Author SHA1 Message Date
Joey Hess
84d1bb746b
LiveUpdate for clusters 2024-08-24 10:20:12 -04:00
Joey Hess
c3d40b9ec3
plumb in LiveUpdate (WIP)
Each command that first checks preferred content (and/or required
content) and then does something that can change the sizes of
repositories needs to call prepareLiveUpdate, and plumb it through the
preferred content check and the location log update.

So far, only Command.Drop is done. Many other commands that don't need
to do this have been updated to keep working.

There may be some calls to NoLiveUpdate in places where that should be
done. All will need to be double checked.

Not currently in a compilable state.
2024-08-23 16:35:12 -04:00
Joey Hess
6722a61a21
clusters need enableInteractiveBranchAccess
As seen in commit 770aac97a7, a cluster
relies accurate location logs. If long-running processes are serving a
cluster, and one process puts a file, the other process needs to see
what nodes it was stored on when checking if the file is present.
2024-07-28 12:39:42 -04:00
Joey Hess
fbbedae497
add --clusterjobs option and default to 1
The default of 1 is not ideal at all, but it avoids an accidental M*N
causing so much concurrency it becomes unusable.
2024-07-28 10:36:22 -04:00
Joey Hess
1259ad89b6
cluster support in http API server
Wired it up and it seems to basically work, although the test suite is
not fully passing.

Note that --jobs currently gets multiplied by the number of nodes in the
cluster, which is probably not good.
2024-07-28 10:17:29 -04:00
Joey Hess
8ec174408e
remove duplicate code 2024-07-28 09:35:09 -04:00
Joey Hess
0fb86d2916
UNLOCKCONTENT is not a top-level request
proxyRequest was treating UNLOCKCONTENT as a separate request.
That made it possible for there to be two different connections to the
proxied remote, with LOCKCONTENT being sent to one, and UNLOCKCONTENT
to the other one. A protocol error.

git-annex testremote now passes against a http proxied remote.
2024-07-26 20:39:06 -04:00
Joey Hess
ad025b8e5e
clean up protocol version for proxying
The proxy always checks the protocol version of a remote before talking
to it in a version-specific way, so the protocol version in the ProxyParams
is the client's protocol version. The remote will always be at the same or
an older protocol version than the client.

Note that in relayDATAFinish, when the client is at protocol version 0,
the remote must thus be as well, and that's why its version is not
checked in the case for that.

With that clarified, it's evident that, in P2P.Http.State, there's no
need to look at the proxied remote's protocol version at all.
2024-07-26 13:49:05 -04:00
Joey Hess
6ef6ad808f
use a record to reduce the huge number of parameters 2024-07-25 15:23:18 -04:00
Joey Hess
f452bd448a
REMOVE-BEFORE and GETTIMESTAMP proxying
For clusters, the timestamps have to be translated, since each node can
have its own idea about what time it is. To translate a timestamp, the
proxy remembers what time it asked the node for a timestamp in
GETTIMESTAMP, and applies the delta as an offset in REMOVE-BEFORE.

This does mean that a remove from a cluster has to call GETTIMESTAMP on
every node before dropping from nodes. Not very efficient. Although
currently it tries to drop from every single node anyway, which is also
not very efficient.

I thought about caching the GETTIMESTAMP from the nodes on the first
call. That would improve efficiency. But, since monotonic clocks on
!Linux don't advance when the computer is suspended, consider what might
happen if one node was suspended for a while, then came back. Its
monotonic timestamp would end up behind where the proxying expects it to
be. Would that result in removing when it shouldn't, or refusing to
remove when it should? Have not thought it through. Either way, a
cluster behaving strangly for an extended period of time because one
of its nodes was briefly asleep doesn't seem like good behavior.
2024-07-04 15:09:34 -04:00
Joey Hess
2f2cc38c28
fix build on old ghc
getStdRandom used to be an IO action
2024-07-02 12:27:14 -04:00
Joey Hess
62750f0102
shut down RemoteSides cleanly
Before it just exited without actually shutting down the RemoteSides,
when the client hung up.
2024-06-28 13:19:57 -04:00
Joey Hess
cf59d7f92c
GET and CHECKPRESENT amoung lowest cost cluster nodes
Before it was using a node that might have had a higher cost.

Also threw in a random selection from amoung the low cost nodes. Of
course this is a poor excuse for load balancing, but it's better than
nothing. Most of the time...
2024-06-27 14:36:55 -04:00
Joey Hess
dabd05e547
remove a TODO marker
I have a todo item for this outside the code
2024-06-27 13:36:04 -04:00
Joey Hess
3dad9446ce
distributed cluster cycle prevention
Added BYPASS to P2P protocol, and use it to avoid cycling between
cluster gateways.

Distributed clusters are working well now!
2024-06-27 12:20:22 -04:00
Joey Hess
effaf51b1f
avoid loop between cluster gateways
The VIA extension is still needed to avoid some extra work and ugly
messages, but this is enough that it actually works.

This filters out the RemoteSides that are a proxied connection via a
remote gateway to the cluster.

The VIA extension will not filter those out, but will send VIA to them
on connect, which will cause the ones that are accessed via the listed
gateways to be filtered out.
2024-06-26 15:29:59 -04:00
Joey Hess
4172109c8d
support multi-gateway clusters
VIA extension still needed otherwise a copy to a cluster can loop
forever.
2024-06-26 15:07:03 -04:00
Joey Hess
cec2848e8a
support annex.jobs for clusters 2024-06-25 14:54:20 -04:00
Joey Hess
1bfe7f8a53
honor preferred content settings of cluster nodes
Except when no nodes want a file, it has to be stored somewhere, so
store it on all. Which is not really desirable, but neither is having to
pick one.

ProtoAssociatedFile deserialization is rather broken, and this could
possibly affect preferred content expressions that match on filenames.

The inability to roundtrip whitespace like tabs and newlines through is
not a problem because preferred content expressions can't be written
that match on whitespace such as a tab. For example:

joey@darkstar:~/tmp/bench/z>git-annex wanted  origin-node2 'exclude=*CTRL-VTab*'
wanted origin-node2
git-annex: Parse error: Parse failure: near "*"

But, the filtering of control characters could perhaps be a problem. I think
that filtering is now obsolete, git-annex has comprehensive filtering of
control characters when displaying filenames, that happens at a higher level.
However, I don't want to risk a security hole so am leaving in that filtering
in ProtoAssociatedFile deserialization for now.
2024-06-25 11:43:09 -04:00
Joey Hess
a23b0abf28
PUT to cluster send to all nodes rather than none
If the location log says all nodes contain content, pass in all nodes,
rather than none.

The location log can be wrong. While it's good to avoid unncessessary
connections to nodes that already contain a key, it would be bad to
refuse to accept an upload at all when the location log is wrong.

Also, passing in no nodes leaves the proxy in an untenable state. It
can't proxy to no nodes. So it closes the connection. Passing in all
nodes means it has to do the work to connect to all of them, and see
that they say they already have the content, and then it can tell the
client that.
2024-06-25 10:32:34 -04:00
Joey Hess
202ea3ff2a
don't sync with cluster nodes by default
Avoid `git-annex sync --content` etc from operating on cluster nodes by default
since syncing with a cluster implicitly syncs with its nodes. This avoids a
lot of unncessary work when a cluster has a lot of nodes just in checking
if each node's preferred content is satisfied. And it avoids content
being sent to nodes individually, so instead syncing with clusters always
fanout uploads to nodes.

The downside is that there are situations where a cluster's preferred content
settings can be met, but those of its nodes are not. Or where a node does not
contain a key, but the cluster does, and there are not enough copies of the key
yet, so it would be desirable the send it there. I think that's an acceptable
tradeoff. These kind of situations are ones where the cluster itself should
probably be responsible for copying content to the node. Which it can do much
less expensively than a client can. Part of the balanced preferred content
design that I will be working on in a couple of months involves rebalancing
clusters, so I expect to revisit this.

The use of annex-sync config does allow running git-annex sync with a specific
node, or nodes, and it will sync with it. And it's also possible to set
annex-sync git configs to make it sync with a node by default. (Although that
will require setting up an explicit git remote for the node rather than relying
on the proxied remote.)

Logs.Cluster.Basic is needed because Remote.Git cannot import Logs.Cluster
due to a cycle. And the Annex.Startup load of clusters happens
too late for Remote.Git to use that. This does mean one redundant load
of the cluster log, though only when there is a proxy.
2024-06-25 10:24:38 -04:00
Joey Hess
5b332a87be
dropping from clusters
Dropping from a cluster drops from every node of the cluster.
Including nodes that the cluster does not think have the content.
This is different from GET and CHECKPRESENT, which do trust the
cluster's location log. The difference is that removing from a cluster
should make 100% the content is gone from every node. So doing extra
work is ok. Compare with CHECKPRESENT where checking every node could
make it very expensive, and the worst that can happen in a false
negative is extra work being done.

Extended the P2P protocol with FAILURE-PLUS to handle the case where a
drop from one node succeeds, but a drop from another node fails. In that
case the entire cluster drop has failed.

Note that SUCCESS-PLUS is returned when dropping from a proxied remote
that is not a cluster, when the protocol version supports it. This is
because P2P.Proxy does not know when it's proxying for a single node
cluster vs for a remote that is not a cluster.
2024-06-23 09:43:40 -04:00
Joey Hess
f18740699e
P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS
Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS
is complete, when a PUT stores to additional repositories
than the expected on, the location log is updated with the
additional UUIDs that contain the content.

Started implementing PUT fanout to multiple remotes for clusters.
It is untested, and I fear fencepost errors in the relative
offset calculations. And it is missing proxying for the protocol
after DATA.
2024-06-18 16:21:40 -04:00
Joey Hess
fb0fd78485
only use a remote as a node when git configuration is set
Avoids someone writing to cluster.log and nominating remotes
of someone else's repository as a cluster.
2024-06-18 11:37:38 -04:00
Joey Hess
f049156a03
checkpresent support for clusters
This assumes that the proxy for a cluster has up-to-date location
logs. If it didn't, it might proxy the checkpresent to a node that no
longer has the content, while some other node still does, and so
it would incorrectly appear that the cluster no longer contains the
content.

Since cluster UUIDs are not stored to location logs,
git-annex fsck --fast when claiming to fix a location log when
that occurred would not cause any problems. And presumably the location
tracking would later get sorted out.

At least usually, changes to the content of nodes goes via the proxy,
and it will update its location logs, so they will be accurate. However,
if there were multiple proxies to the same cluster, or nodes were
accessed directly (or via proxy to the node and not the cluster),
the proxy's location log could certainly be wrong.

(The location log access for GET has the same issues.)
2024-06-18 11:16:16 -04:00
Joey Hess
88d9a02f7c
initial, working support for getting from clusters
Currently tends to put all the load on a single node, which will need to
be improved.
2024-06-18 11:01:10 -04:00
Joey Hess
f0d6114286
refactor cluster code into own module 2024-06-18 10:36:04 -04:00