Commit graph

31 commits

Author SHA1 Message Date
Joey Hess
f452bd448a
REMOVE-BEFORE and GETTIMESTAMP proxying
For clusters, the timestamps have to be translated, since each node can
have its own idea about what time it is. To translate a timestamp, the
proxy remembers what time it asked the node for a timestamp in
GETTIMESTAMP, and applies the delta as an offset in REMOVE-BEFORE.

This does mean that a remove from a cluster has to call GETTIMESTAMP on
every node before dropping from nodes. Not very efficient. Although
currently it tries to drop from every single node anyway, which is also
not very efficient.

I thought about caching the GETTIMESTAMP from the nodes on the first
call. That would improve efficiency. But, since monotonic clocks on
!Linux don't advance when the computer is suspended, consider what might
happen if one node was suspended for a while, then came back. Its
monotonic timestamp would end up behind where the proxying expects it to
be. Would that result in removing when it shouldn't, or refusing to
remove when it should? Have not thought it through. Either way, a
cluster behaving strangely for an extended period of time because one
of its nodes was briefly asleep doesn't seem like good behavior.
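
As a rough illustration of the translation described above (a hedged sketch with hypothetical Haskell types, not git-annex's actual P2P.Proxy code):

import Data.Int (Int64)

-- Monotonic timestamps, in whatever unit the protocol uses.
newtype Timestamp = Timestamp Int64
    deriving (Show, Eq, Ord)

-- What the proxy remembers per node: the node's reply to GETTIMESTAMP
-- together with the proxy's own clock reading at the moment it asked.
data NodeClock = NodeClock
    { nodeTime  :: Timestamp
    , proxyTime :: Timestamp
    }

-- Translate a REMOVE-BEFORE timestamp from the proxy's clock to the
-- node's clock by applying the remembered delta as an offset.
toNodeClock :: NodeClock -> Timestamp -> Timestamp
toNodeClock nc (Timestamp t) = Timestamp (t + delta)
  where
    Timestamp tn = nodeTime nc
    Timestamp tp = proxyTime nc
    delta = tn - tp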
2024-07-04 15:09:34 -04:00
Joey Hess
fa5e7463eb
fix display when proxied GET yields ERROR
The error message is not displayed to the user, but this mirrors the
behavior when a regular get from a special remote fails. At least now
there is not a protocol error.
2024-07-01 11:19:02 -04:00
Joey Hess
158d7bc933
fix handling of ERROR in response to REMOVE
This allows an error message from a proxied special remote to be
displayed to the client.

In the case where removal from several nodes of a cluster fails,
there can be several errors. What to do? I decided to only show
the first error to the user. Probably in this case the user is not in a
position to do anything about an error message, so best keep it simple.
If the problem with the first node is fixed, they'll see the error from
the next node.
2024-06-28 14:10:25 -04:00
Joey Hess
62750f0102
shut down RemoteSides cleanly
Before, when the client hung up, it just exited without actually
shutting down the RemoteSides.
2024-06-28 13:19:57 -04:00
Joey Hess
cf59d7f92c
GET and CHECKPRESENT among lowest cost cluster nodes
Before it was using a node that might have had a higher cost.

Also threw in a random selection from among the low cost nodes. Of
course this is a poor excuse for load balancing, but it's better than
nothing. Most of the time...
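
A rough sketch of the selection (hypothetical types; randomRIO from the random package is assumed here):

import System.Random (randomRIO)

data Node = Node
    { nodeName :: String
    , nodeCost :: Double
    } deriving Show

-- All nodes that share the lowest cost.
lowestCostNodes :: [Node] -> [Node]
lowestCostNodes [] = []
lowestCostNodes ns = filter ((== lowest) . nodeCost) ns
  where
    lowest = minimum (map nodeCost ns)

-- Pick one of the lowest cost nodes at random; a poor excuse for load
-- balancing, but it avoids always hitting the same node.
pickNode :: [Node] -> IO (Maybe Node)
pickNode ns = case lowestCostNodes ns of
    [] -> return Nothing
    candidates -> do
        i <- randomRIO (0, length candidates - 1)
        return (Just (candidates !! i))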
2024-06-27 14:36:55 -04:00
Joey Hess
3dad9446ce
distributed cluster cycle prevention
Added BYPASS to P2P protocol, and use it to avoid cycling between
cluster gateways.

Distributed clusters are working well now!
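
A rough model of the cycle prevention (a hedged sketch, not the actual protocol code): a gateway adds itself to the BYPASS set before forwarding, and never forwards to a gateway already in the set.

import qualified Data.Set as S

newtype UUID = UUID String
    deriving (Eq, Ord, Show)

-- When a gateway forwards a request for a distributed cluster, it adds
-- itself to the BYPASS set and skips any gateway already listed there,
-- so a request cannot bounce back and forth between gateways.
forwardPlan :: UUID -> S.Set UUID -> [UUID] -> (S.Set UUID, [UUID])
forwardPlan self bypass gateways =
    (bypass', filter (`S.notMember` bypass') gateways)
  where
    bypass' = S.insert self bypass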
2024-06-27 12:20:22 -04:00
Joey Hess
cec2848e8a
support annex.jobs for clusters 2024-06-25 14:54:20 -04:00
Joey Hess
818030e4d3
improve handling of cluster nodes disconnecting 2024-06-25 14:10:06 -04:00
Joey Hess
1bfe7f8a53
honor preferred content settings of cluster nodes
Except when no nodes want a file, it has to be stored somewhere, so
store it on all. Which is not really desirable, but neither is having to
pick one.

ProtoAssociatedFile deserialization is rather broken, and this could
possibly affect preferred content expressions that match on filenames.

The inability to roundtrip whitespace like tabs and newlines through it is
not a problem because preferred content expressions can't be written
that match on whitespace such as a tab. For example:

joey@darkstar:~/tmp/bench/z>git-annex wanted  origin-node2 'exclude=*CTRL-VTab*'
wanted origin-node2
git-annex: Parse error: Parse failure: near "*"

But, the filtering of control characters could perhaps be a problem. I think
that filtering is now obsolete; git-annex has comprehensive filtering of
control characters when displaying filenames, which happens at a higher level.
However, I don't want to risk a security hole so am leaving in that filtering
in ProtoAssociatedFile deserialization for now.
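
The fallback described at the top of this commit message (store on all nodes when none want the file), as a tiny hedged sketch with a hypothetical wants predicate standing in for the preferred content check:

-- Nodes whose preferred content expression wants the file; when none
-- of them want it, it still has to be stored somewhere, so fall back
-- to storing it on all nodes.
nodesToStoreOn :: (node -> Bool) -> [node] -> [node]
nodesToStoreOn wants nodes = case filter wants nodes of
    [] -> nodes
    wanted -> wanted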
2024-06-25 11:43:09 -04:00
Joey Hess
8a341cd195
fix comparison
With this a PUT to two remotes that have different partial amounts
transferred works reliably. I'm not sure though that it doesn't have
fencepost errors.
2024-06-23 16:01:58 -04:00
Joey Hess
466c972913
don't use SUCCESS-PLUS unnecessarily
When dropping from a proxied remote that is not a cluster,
SUCCESS-PLUS is not needed, so don't use it.
2024-06-23 10:07:26 -04:00
Joey Hess
2762f9c4ce
fix location log update for copy to 1-node cluster 2024-06-23 09:53:33 -04:00
Joey Hess
5b332a87be
dropping from clusters
Dropping from a cluster drops from every node of the cluster.
Including nodes that the cluster does not think have the content.
This is different from GET and CHECKPRESENT, which do trust the
cluster's location log. The difference is that removing from a cluster
should make 100% sure the content is gone from every node. So doing extra
work is ok. Compare with CHECKPRESENT where checking every node could
make it very expensive, and the worst that can happen in a false
negative is extra work being done.

Extended the P2P protocol with FAILURE-PLUS to handle the case where a
drop from one node succeeds, but a drop from another node fails. In that
case the entire cluster drop has failed.

Note that SUCCESS-PLUS is returned when dropping from a proxied remote
that is not a cluster, when the protocol version supports it. This is
because P2P.Proxy does not know when it's proxying for a single node
cluster vs for a remote that is not a cluster.
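
A hedged sketch of combining the per-node results into a single reply (made-up types; the real message semantics are defined by the P2P protocol docs):

newtype UUID = UUID String
    deriving (Eq, Show)

data NodeDrop = NodeDrop
    { dropNode :: UUID
    , dropOk   :: Bool
    }

data DropReply
    = SuccessPlus [UUID]  -- every node dropped; lists the nodes that did
    | FailurePlus [UUID]  -- some node failed; still lists the nodes that dropped

combineDrops :: [NodeDrop] -> DropReply
combineDrops results
    | all dropOk results = SuccessPlus dropped
    | otherwise          = FailurePlus dropped
  where
    dropped = [dropNode r | r <- results, dropOk r]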
2024-06-23 09:43:40 -04:00
Joey Hess
ecab2e03b9
working PUT fanout to multiple remotes for clusters
Still need to check for fencepost errors on resume when
different nodes have different amounts of data.
2024-06-20 10:04:26 -04:00
Joey Hess
f18740699e
P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS
Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS
is complete: when a PUT stores to additional repositories
beyond the expected one, the location log is updated with the
additional UUIDs that contain the content.

Started implementing PUT fanout to multiple remotes for clusters.
It is untested, and I fear fencepost errors in the relative
offset calculations. And it is missing proxying for the protocol
after DATA.
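
A hedged sketch of the client-side bookkeeping described above (hypothetical types, not the actual location log code): the reply determines which UUIDs get the content recorded as present, beyond the expected repository.

newtype UUID = UUID String
    deriving (Eq, Show)

data PutReply
    = Success
    | SuccessPlus [UUID]       -- protocol version 2
    | AlreadyHave
    | AlreadyHavePlus [UUID]   -- protocol version 2
    | PutFailure

-- UUIDs whose location logs should record the content as present
-- after a PUT, given the expected destination and the reply.
uuidsToRecord :: UUID -> PutReply -> [UUID]
uuidsToRecord expected reply = case reply of
    Success               -> [expected]
    SuccessPlus extra     -> expected : extra
    AlreadyHave           -> [expected]
    AlreadyHavePlus extra -> expected : extra
    PutFailure            -> []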
2024-06-18 16:21:40 -04:00
Joey Hess
f049156a03
checkpresent support for clusters
This assumes that the proxy for a cluster has up-to-date location
logs. If it didn't, it might proxy the checkpresent to a node that no
longer has the content, while some other node still does, and so
it would incorrectly appear that the cluster no longer contains the
content.

Since cluster UUIDs are not stored in location logs,
git-annex fsck --fast claiming to fix a location log when
that occurred would not cause any problems. And presumably the location
tracking would later get sorted out.

At least usually, changes to the content of nodes go via the proxy,
and it will update its location logs, so they will be accurate. However,
if there were multiple proxies to the same cluster, or nodes were
accessed directly (or via proxy to the node and not the cluster),
the proxy's location log could certainly be wrong.

(The location log access for GET has the same issues.)
2024-06-18 11:16:16 -04:00
Joey Hess
88d9a02f7c
initial, working support for getting from clusters
Currently tends to put all the load on a single node, which will need to
be improved.
2024-06-18 11:01:10 -04:00
Joey Hess
ef26470810
ProxySelector data type 2024-06-17 19:19:15 -04:00
Joey Hess
7a839a983a
preparing for cluster node selection
Support selecting what remote to proxy for each top-level P2P protocol
message.

This only needs to be extended now to support fanout to multiple
nodes for PUT and REMOVE, and with a remote that fails for
LOCKCONTENT and UNLOCKCONTENT.

But a good first step would be to implement CHECKPRESENT and GET for
clusters. Both should select a node that actually does have the content.
That will allow a cluster to work for GET even when location tracking is
out of date.
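
Roughly the shape of the idea (a hedged sketch, not the actual ProxySelector definition): each top-level protocol message gets its own selection function, so GET and CHECKPRESENT can pick a single node while PUT and REMOVE can fan out.

newtype Key = Key String
    deriving (Eq, Show)

-- One selection function per top-level protocol message.
data ProxySelector remote = ProxySelector
    { selectGet          :: Key -> IO (Maybe remote)
    , selectCheckPresent :: Key -> IO (Maybe remote)
    , selectPut          :: Key -> IO [remote]  -- fanout for clusters
    , selectRemove       :: Key -> IO [remote]  -- fanout for clusters
    }

-- Proxying for a single remote just always picks it.
singleRemote :: remote -> ProxySelector remote
singleRemote r = ProxySelector
    { selectGet          = \_ -> return (Just r)
    , selectCheckPresent = \_ -> return (Just r)
    , selectPut          = \_ -> return [r]
    , selectRemove       = \_ -> return [r]
    }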
2024-06-17 15:51:10 -04:00
Joey Hess
291280ced2
started on git-annex-shell cluster support
Works down to P2P protocol.

The question now is, how to handle protocol version negotiation for
clusters? Connecting to each node to find their protocol versions and
using the lowest would be too expensive with a lot of nodes. So it seems
that the cluster needs to pick its own protocol version to use with the
client.

Then it can either negotiate that same version with the nodes when
it comes time to use them, or it can translate between multiple protocol
versions. That seems complicated. Thinking it would be ok to refuse to
use a node if it is not able to negotiate the same protocol version with
it as with the client. That will mean that sometimes nodes will need to be
upgraded when upgrading the cluster's proxy. But protocol versions
rarely change.
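
A hedged sketch of that rule (hypothetical types): the cluster commits to one version with the client, and a node is only used if the same version can be negotiated with it.

newtype ProtocolVersion = ProtocolVersion Int
    deriving (Eq, Ord, Show)

-- Ordinary negotiation: the highest version both sides support.
negotiate :: [ProtocolVersion] -> [ProtocolVersion] -> Maybe ProtocolVersion
negotiate ours theirs = case filter (`elem` theirs) ours of
    [] -> Nothing
    common -> Just (maximum common)

-- Refuse to use a node unless it negotiates the very version the
-- cluster already picked for the client.
nodeUsable :: ProtocolVersion -> Maybe ProtocolVersion -> Bool
nodeUsable clusterVersion negotiated = negotiated == Just clusterVersion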
2024-06-17 15:10:04 -04:00
Joey Hess
c7ad44e4d1
work toward supporting proxying to multiple remotes at once
For eg, upload fanout.

Delay connecting to a remote until it's needed. When there are many
proxied remotes, it would not do for the proxy to connect to each of
them on startup; that could take a long time.
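
A minimal sketch of the delayed connect (hypothetical cache, ignoring concurrency; not the actual remote handling code):

import Data.IORef
import qualified Data.Map.Strict as M

-- Open a connection to a proxied remote only the first time it is
-- needed, and reuse it on later requests. A real implementation would
-- also have to guard against concurrent opens.
getConnection
    :: IORef (M.Map String conn)
    -> (String -> IO conn)   -- how to open a connection
    -> String                -- remote name
    -> IO conn
getConnection cache open name = do
    m <- readIORef cache
    case M.lookup name m of
        Just c  -> return c
        Nothing -> do
            c <- open name
            modifyIORef' cache (M.insert name c)
            return c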
2024-06-17 14:16:44 -04:00
Joey Hess
83a1db8d17
more specific type 2024-06-17 13:04:40 -04:00
Joey Hess
b72ccc6f0c
improve types 2024-06-17 12:44:08 -04:00
Joey Hess
dfdda95053
proxy updates location tracking information
This does mean a redundant write to the git-annex branch. But,
it means that two clients can be using the same proxy, and after
one sends a file to a proxied remote, the other only has to pull from
the proxy to learn about that. It does not need to pull from every
remote behind the proxy (which it couldn't do anyway as git repo
access is not currently proxied).

Anyway, the overhead of this in git-annex branch writes is no worse
than eg, sending a file to a repository where git-annex assistant
is running, which then sends the file on to a remote, and updates
the git-annex branch then. Indeed, when the assistant also drops
the local copy, that results in more writes to the git-annex branch.
2024-06-12 11:37:14 -04:00
Joey Hess
96853cd833
finish P2P protocol proxying
CONNECT is not supported by git-annex-shell p2pstdio, but it will be
supported for proxying to tor-annex remotes. That will make a git
pull/push to a proxied remote work the same over tor as it does over
ssh, eg it accesses the proxy's git repo, not the proxied remote's git repo.

The p2p protocol docs say that NOTIFYCHANGES is not always supported,
and it looked annoying to implement it for this, and it also seems
pretty useless, so make it a protocol error. git-annex remotedaemon
will already be getting change notifications from the proxy's git repo,
so there's no need to get additional redundant change notifications for
proxied remotes that would be for changes to the same git repo.
2024-06-12 10:40:51 -04:00
Joey Hess
6e1df33960
minimized code duplication due to type checker limitations 2024-06-11 17:16:49 -04:00
Joey Hess
5beaffb412
proxying PUT now working
The almost identical code duplication between relayDATA and relayDATA'
is very annoying. I tried quite a few things to parameterize them, but
the type checker is having fits when I try it.
2024-06-11 16:56:52 -04:00
Joey Hess
a2f4a8eddf
proxying GET now working
Memory use is small and constant; receiveBytes returns a lazy bytestring
and it does stream.

Comparing speed of a get of a 500 mb file over proxy from origin-origin,
vs from the same remote over a direct ssh:

joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from origin-origin
get bigfile (from origin-origin...)
ok
(recording state in git...)
1.89user 0.67system 0:10.79elapsed 23%CPU (0avgtext+0avgdata 68716maxresident)k
0inputs+984320outputs (0major+10779minor)pagefaults 0swaps

joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from direct-ssh
get bigfile (from direct-ssh...)
ok
1.79user 0.63system 0:10.49elapsed 23%CPU (0avgtext+0avgdata 65776maxresident)k
0inputs+1024312outputs (0major+9773minor)pagefaults 0swaps

So the proxy doesn't add much overhead even when run on the same machine as
the client and remote.

Still, piping receiveBytes into sendBytes like this does suggest that the proxy
could be made to use less CPU resources by using `sendfile()`.
2024-06-11 15:09:43 -04:00
Joey Hess
58d8ba5a4f
implement simple proxy actions (untested)
Still need to implement GET and PUT, and will implement CONNECT and
NOTIFYCHANGE for completeness.

All ServerMode checking is implemented for the proxy.

There are two possible approaches for how the proxy sends back messages
from the remote to the client. One would be to have a background thread
that reads messages and sends them back as they come in. The other,
which is being implemented so far, is to read messages from the remote
at points where it is expected to send them, and relay back to the
client before reading the next message from the client. At this point,
I'm unsure which approach would be better.

The need for proxynoresponse to be used by UNLOCKCONTENT, for example,
builds protocol knowledge into the proxy which it would not need with
the other method.
2024-06-11 12:56:20 -04:00
Joey Hess
92c83a417f
refactoring 2024-06-11 10:22:05 -04:00
Joey Hess
501d65eeab
started implementing git-annex-shell proxy
So far, it negotiates VERSION with both parties. This is a tricky dance.

Untested.
2024-06-10 18:01:36 -04:00