Commit graph

101 commits

Author SHA1 Message Date
Joey Hess
a01426b713
avoid padding in servePut
This means that when the client sends a truncated data to indicate
invalidity, DATA is not passed the full expected data. That leaves the
P2P connection in a state where it cannot be reused. While so far, they
are not reused, they will be later when proxies are supported. So, have
to close the P2P connection in this situation.
2024-07-22 12:30:30 -04:00
Joey Hess
efa0efdc44
avoid padding in clientPut
Instead truncate when necessary to indicate invalid content was sent.
Very similar to how serveGet handles it.
2024-07-22 11:47:24 -04:00
Joey Hess
72d0769ca5
avoid padding content in serveGet
Always truncate instead. The padding risked something not noticing the
content was bad and getting a file that was corrupted in a novel way
with the padding "X" at the end. A truncated file is better.
2024-07-22 11:19:52 -04:00
Joey Hess
4826a3745d
servePut and clientPut implementation
Made the data-length header required even for v0. This simplifies the
implementation, and doesn't preclude extra verification being done for
v0.

The connectionWaitVar is an ugly hack. In servePut, nothing waits
on the waitvar, and I could not find a good way to make anything wait on
it.
2024-07-22 10:27:44 -04:00
Joey Hess
97a2d0e4fb
use worker pool in withLocalP2PConnections
This allows multiple clients to be handled at the same time.
2024-07-11 14:37:52 -04:00
Joey Hess
14e0f778b7
simplify 2024-07-11 11:50:44 -04:00
Joey Hess
2228d56db3
serveGet invalidation 2024-07-11 11:42:32 -04:00
Joey Hess
3b37b9e53f
fix serveGet hang
This came down to SendBytes waiting on the waitv. Nothing ever filled
it.

Only Annex.Proxy needs the waitv, and it handles filling it. So make it
optional.
2024-07-11 07:46:52 -04:00
Joey Hess
8cb1332407
update 2024-07-10 16:10:08 -04:00
Joey Hess
b5b3d8cde2
update 2024-07-09 14:30:50 -04:00
Joey Hess
751b8e0baf
implemented serveCheckPresent
Still need a way to run Proto though
2024-07-09 09:08:42 -04:00
Joey Hess
3f402a20a8
implement Locker 2024-07-08 21:00:10 -04:00
Joey Hess
838169ee86
status 2024-07-07 16:16:11 -04:00
Joey Hess
40306d3fcf
finalizing HTTP P2p protocol some more
Added v2-v0 endpoints. These are tedious, but will be needed in order to
use the HTTP protocol to proxy to repositories with older git-annex,
where git-annex-shell will be speaking an older version of the protocol.

Changed GET to use 422 when the content is not present. 404 is needed to
detect when a protocol version is not supported.
2024-07-05 15:34:58 -04:00
Joey Hess
0bfdc57d25
update 2024-07-04 15:18:06 -04:00
Joey Hess
f69661ab65
status 2024-07-03 17:04:12 -04:00
Joey Hess
6a95eb08ce
status 2024-07-03 15:01:34 -04:00
Joey Hess
badcb502a4
todo 2024-07-03 13:15:09 -04:00
Joey Hess
24d63e8c8e
update 2024-07-02 18:04:29 -04:00
Joey Hess
b2a24a1669
update 2024-07-02 16:16:37 -04:00
Joey Hess
fbc4d549f3
reorder 2024-07-01 11:44:54 -04:00
Joey Hess
8db30323b0
update 2024-07-01 11:38:29 -04:00
Joey Hess
1e1584d34b
toc 2024-07-01 11:37:12 -04:00
Joey Hess
d9e66f7754
update 2024-07-01 11:33:07 -04:00
Joey Hess
f58a5f577d
update 2024-07-01 11:29:04 -04:00
Joey Hess
fa5e7463eb
fix display when proxied GET yields ERROR
The error message is not displayed to the use, but this mirrors the
behavior when a regular get from a special remote fails. At least now
there is not a protocol error.
2024-07-01 11:19:02 -04:00
Joey Hess
dce3848ad8
avoid populating proxy's object file when storing on special remote
Now that storeKey can have a different object file passed to it, this
complication is not needed. This avoids a lot of strange situations,
and will also be needed if streaming is eventually supported.
2024-07-01 10:53:49 -04:00
Joey Hess
0dfdc9f951
dup stdio handles for P2P proxy
Special remotes might output to stdout, or read from stdin, which would
mess up the P2P protocol. So dup the handles to avoid any such problem.
2024-07-01 10:06:29 -04:00
Joey Hess
0e19c1c9fa
todo 2024-06-28 17:14:18 -04:00
Joey Hess
711a5166e2
PUT to proxied special remote working
Still needs some work.

The reason that the waitv is necessary is because without it,
runNet loops back around and reads the next protocol message. But it's
not finished reading the whole bytestring yet, and so it reads some part
of it.
2024-06-28 17:10:58 -04:00
Joey Hess
2e5af38f86
GET from proxied special remote
Working, but lots of room for improvement...

Without streaming, so there is a delay before download begins as the
file is retreived from the special remote.

And when resuming it retrieves the whole file from the special remote
*again*.

Also, if the special remote throws an exception, currently it
shows as "protocol error".
2024-06-28 15:44:48 -04:00
Joey Hess
5b1971e2f8
merged the proxy branch into master! 2024-06-27 15:44:11 -04:00
Joey Hess
85f4527d74
update 2024-06-27 15:28:10 -04:00
Joey Hess
20ef1262df
give proxied cluster nodes a higher cost than the cluster gateway
This makes eg git-annex get default to using the cluster rather than an
arbitrary node, which is better UI.

The actual cost of accessing a proxied node vs using the cluster is
basically the same. But using the cluster allows smarter load-balancing
to be done on the cluster.
2024-06-27 15:21:03 -04:00
Joey Hess
cf59d7f92c
GET and CHECKPRESENT amoung lowest cost cluster nodes
Before it was using a node that might have had a higher cost.

Also threw in a random selection from amoung the low cost nodes. Of
course this is a poor excuse for load balancing, but it's better than
nothing. Most of the time...
2024-06-27 14:36:55 -04:00
Joey Hess
dceb8dc776
update 2024-06-27 13:40:09 -04:00
Joey Hess
c9d63d74d8
remove viconfig item
it works when run on a client that has the cluster gateway as a remote,
just not when on the cluster gateway
2024-06-27 13:34:24 -04:00
Joey Hess
87a7eeac33
document various multi-gateway cluster considerations
Perhaps this will avoid me needing to eg, implement spanning tree
protocol. ;-)
2024-06-27 13:33:19 -04:00
Joey Hess
8e322f76bc
updates 2024-06-27 12:57:08 -04:00
Joey Hess
3dad9446ce
distributed cluster cycle prevention
Added BYPASS to P2P protocol, and use it to avoid cycling between
cluster gateways.

Distributed clusters are working well now!
2024-06-27 12:20:22 -04:00
Joey Hess
effaf51b1f
avoid loop between cluster gateways
The VIA extension is still needed to avoid some extra work and ugly
messages, but this is enough that it actually works.

This filters out the RemoteSides that are a proxied connection via a
remote gateway to the cluster.

The VIA extension will not filter those out, but will send VIA to them
on connect, which will cause the ones that are accessed via the listed
gateways to be filtered out.
2024-06-26 15:29:59 -04:00
Joey Hess
4172109c8d
support multi-gateway clusters
VIA extension still needed otherwise a copy to a cluster can loop
forever.
2024-06-26 15:07:03 -04:00
Joey Hess
07e899c9d3
git-annex-shell: proxy nodes located beyond remote cluster gateways
Walking a tightrope between security and convenience here, because
git-annex-shell needs to only proxy for things when there has been
an explicit, local action to configure them.

In this case, the user has to have run `git-annex extendcluster`,
which now sets annex-cluster-gateway on the remote.

Note that any repositories that the gateway is recorded to
proxy for will be proxied onward. This is not limited to cluster nodes,
because checking the node log would not add any security; someone could
add any uuid to it. The gateway of course then does its own
checking to determine if it will allow proxying for the remote.
2024-06-26 12:56:16 -04:00
Joey Hess
798d6f6a46
todo 2024-06-25 17:58:45 -04:00
Joey Hess
e3dd29409b
improve docs 2024-06-25 17:50:22 -04:00
Joey Hess
0a1001dbfb
update 2024-06-25 17:26:26 -04:00
Joey Hess
9a8dcb58cd
design for distributed clusters 2024-06-25 17:20:49 -04:00
Joey Hess
b9889917a3
thoughts on cycles
Rejected the idea of automatically instantiating remotes for proxies-of-proxies.
That needs cycle protection, while the current behavior, which happened
for free, is that running git-annex updateproxy on the proxy can be used
to configure it, but only for topologies that actually exist.
2024-06-25 15:32:11 -04:00
Joey Hess
cec2848e8a
support annex.jobs for clusters 2024-06-25 14:54:20 -04:00
Joey Hess
5ede109ae5
gave up on upload fanout to cluster's proxy
The problem with that idea is that the cluster's proxy is necessarily a
remote, and necessarily one that we'll want to sync with, since the git
repository is stored there. So when its preferred content wants a file,
and the cluster does too, the file will get uploaded to it as well as to
the cluster. With fanout, the upload to the cluster will populate the
proxy as well, avoiding a second upload. But only if the file is sent to
the cluster first. If it's sent to the proxy first, there will be two
uploads.

Another, lesser problem is that a repository can proxy for more than one
cluster. So when does it make sense to drop content from the repository?
It could be done when dropping from one cluster, but what of the other
one?

This complication was not necessary anyway. Instead, if it's desirable
to have some content accessed from close to the proxy, one of the
cluster nodes can just be put on the same filesystem as it. That will be
just as fast as storing the content on the proxy.
2024-06-25 13:35:12 -04:00