Commit graph

34502 commits

Author SHA1 Message Date
Joey Hess
74f81ebd04
Merge remote-tracking branch 'origin/httpproto' 2024-07-29 11:25:27 -04:00
stv0g
6352cebb92 Added a comment: importtree=yes Support 2024-07-29 06:50:01 +00:00
Joey Hess
cd89f91aa5
remove uuid from annex+http urls
Not needed it turns out.
2024-07-28 20:29:42 -04:00
Joey Hess
bc9cc79e85
set remote's annexUrl automatically
When the remote repository's git config file
has annex.url set to an annex+http url.
2024-07-28 20:13:41 -04:00
Joey Hess
c87cfe1e00
todo 2024-07-28 17:29:32 -04:00
Joey Hess
ccbdaf0448
documentation for p2phttp 2024-07-28 17:19:27 -04:00
Joey Hess
dfe65b92c8
avoid repeatedly parsing the proxy log 2024-07-28 16:04:20 -04:00
Joey Hess
2fdec6b4e1
update 2024-07-28 15:55:24 -04:00
Joey Hess
ddabc138ec
todo 2024-07-28 15:41:31 -04:00
Joey Hess
cdc4bd7443
fix hang in PUT of large file to a special remote node of a cluster over http 2024-07-28 15:34:59 -04:00
Joey Hess
66679c9bb4
remove temp file after upload to special remote 2024-07-28 14:36:45 -04:00
Joey Hess
9461793ffc
Merge remote-tracking branch 'origin/master' into httpproto 2024-07-28 14:24:15 -04:00
Joey Hess
ccd102cd19
update 2024-07-28 14:22:44 -04:00
Joey Hess
5e205f215d
clean shut down of cluster connection when PUT is interrupted
An interrupted `git-annex copy --to` a cluster via the http server,
when repeated, failed. The http server output "transfer already in
progress, or unable to take transfer lock". Apparently a second
connection was opened to the cluster, because the first connection
never got shut down.

Turned out the problem was that when proxying to a cluster, it would read a
short ByteString from the client, and send that to the nodes. But that left the
nodes warning more. Meanwhile, the proxy was expecting a SUCCESS/FAILURE
message from the nodes. So it didn't return, and so the cluster connection
stayed open.
2024-07-28 14:20:11 -04:00
Joey Hess
bdde6d829c
fix http proxying for a local git remote with a relative path
git-annex-shell expects an absolute path
2024-07-28 13:35:51 -04:00
Joey Hess
41667ad36b
found some bugs with clusters 2024-07-28 13:00:05 -04:00
Joey Hess
770aac97a7
share single BranchState amoung all threads
This fixes a problem when git-annex testremote is run against a cluster
accessed via the http server. Annex.Cluster uses the location log
to find nodes that contain a key when checking if the key is present or getting
it. Just after a key was stored to a cluster node, reading the location log
was not getting the UUID of that node.

Apparently the Annex action that wrote to the location log, and the one
that read from it were run with two different Annex states. The http server
does use several different Annex threads.

BranchState was part of the AnnexState, and so two threads could have
different BranchStates.

Moved BranchState to the AnnexRead, so all threads will see the common state.

This might possibly impact performance. If one thread is writing changes to the
branch, and another thread is reading from the branch, the writing thread will
now invalidate the BranchState's cache, which will cause the reading thread to
need to do extra work. But correctness is surely more important. If did is
found to have impacted performance, it could probably be dealt with by doing
smarter BranchState cache invalidation.

Another way this might impact performance is that the BranchState has a small
cache. If several threads were reading from the branch and relying on the value
they just read still being in the case, now a cache miss will be more likely.
Increasing the BranchState cache to the number of jobs might be a good
idea to amelorate that. But the cache is currently an innefficient list,
so making it large would need changes to the data types.

(Commit 4304f1b6ae dealt with a follow-on
effect of the bug fixed here.)
2024-07-28 12:30:27 -04:00
Joey Hess
fbbedae497
add --clusterjobs option and default to 1
The default of 1 is not ideal at all, but it avoids an accidental M*N
causing so much concurrency it becomes unusable.
2024-07-28 10:36:22 -04:00
Joey Hess
1259ad89b6
cluster support in http API server
Wired it up and it seems to basically work, although the test suite is
not fully passing.

Note that --jobs currently gets multiplied by the number of nodes in the
cluster, which is probably not good.
2024-07-28 10:17:29 -04:00
Joey Hess
0cdd418407
tested shutdown of connection to http proxied special remote
I had worried it might not work properly, but it does, the endv works.
2024-07-28 09:17:47 -04:00
Joey Hess
ef8f24f28c
fix PUT to http proxied special remote
It was hanging because it never sent FAILURE in the INVALID case.
And putoffset always triggers the INVALID case.
2024-07-28 09:14:42 -04:00
Joey Hess
0ea645944e
thoughts on exporttree 2024-07-27 19:59:54 -04:00
Joey Hess
1c0448e33c
update 2024-07-26 20:44:01 -04:00
Joey Hess
0fb86d2916
UNLOCKCONTENT is not a top-level request
proxyRequest was treating UNLOCKCONTENT as a separate request.
That made it possible for there to be two different connections to the
proxied remote, with LOCKCONTENT being sent to one, and UNLOCKCONTENT
to the other one. A protocol error.

git-annex testremote now passes against a http proxied remote.
2024-07-26 20:39:06 -04:00
Joey Hess
a3dab58be2
fix hang at end of PUT to proxied p2p http remote
sendExactly will now be sure to evaluate the whole lazy ByteString.

In this case, the lazy ByteString was exactly the right lenth.
But, it seems that L.take caused it to not actually be fully evaluated.

In servePut, this manifested as gather never being fully evaluated,
which caused the hang.

Very, very subtle, and horrible bug. Clearly the use of lazy ByteString
(or really just laziness) is at fault, and it would be very worth moving
to conduit or whatever to avoid this.
2024-07-26 19:50:15 -04:00
Joey Hess
b431201e1f
update 2024-07-26 17:15:09 -04:00
Joey Hess
d1faa13d6a
implement proxy connection pool
removeOldestProxyConnectionPool will be innefficient the larger the pool
is. A better data structure could be more efficient. Eg, make each value
in the pool include the timestamp of its oldest element, then the oldest
value can be found and modified, rather than rebuilding the whole Map.

But, for pools of a few hundred items, this should be fine. It's O(n*n log n)
or so.

Also, when more than 1 connection with the same pool key exists,
it's efficient even for larger pools, since removeOldestProxyConnectionPool
is not needed.

The default of 1 idle connection could perhaps be larger.. like the
number of jobs? Otoh, it seems good to ramp up and down the number of
connections, which does happen. With 1, there is at most one stale
connection, which might cause a request to fail.
2024-07-26 17:03:31 -04:00
yarikoptic
de90a2c5de initial report on keeping association with the remote 2024-07-26 20:01:23 +00:00
Joey Hess
ad025b8e5e
clean up protocol version for proxying
The proxy always checks the protocol version of a remote before talking
to it in a version-specific way, so the protocol version in the ProxyParams
is the client's protocol version. The remote will always be at the same or
an older protocol version than the client.

Note that in relayDATAFinish, when the client is at protocol version 0,
the remote must thus be as well, and that's why its version is not
checked in the case for that.

With that clarified, it's evident that, in P2P.Http.State, there's no
need to look at the proxied remote's protocol version at all.
2024-07-26 13:49:05 -04:00
Joey Hess
576ec6ed71
fix hang in GET from http p2p proxy
serverP2PConnection = proxyfromclientconn causes serveGet to
signalFullyConsumedByteString to it, which is what it's waiting for
2024-07-26 12:51:00 -04:00
Joey Hess
f052091558
update 2024-07-26 11:01:45 -04:00
Joey Hess
cc1da2d516
http p2p proxy is now largely working 2024-07-26 10:44:10 -04:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92
17f7912dd5 Added a comment 2024-07-26 11:42:49 +00:00
Joey Hess
96ad0ccc5b
wip 2024-07-25 15:39:57 -04:00
Joey Hess
b13c2407af
p2phttp drop supports checking proof timestamps
At this point the p2phttp implementation is fully complete!
2024-07-25 10:11:09 -04:00
Joey Hess
6a3f755bfa
add common parameters to generic get API
Honestly this was just done to make the documentation correct. There's
no point in using these parameters. And they're optional.
2024-07-24 20:55:58 -04:00
Joey Hess
f5624a69e3
expire lock after 10 minutes initially
Once keeplocked is called, the lock will expire at the end
of that call. But if keeplocked never gets called, this avoids
the lock persisting forever.
2024-07-24 14:25:40 -04:00
Joey Hess
97836aafba
Remote.Git lockContent works with annex+http urls 2024-07-24 13:42:57 -04:00
Joey Hess
9fa9678585
Remote.Git removeKey works with annex+http urls
Does not yet handle drop proof lock timestamp checking.
2024-07-24 12:33:26 -04:00
Joey Hess
fd3bdb2300
update 2024-07-24 12:19:53 -04:00
Joey Hess
0d81d1ee2f
update 2024-07-24 12:18:51 -04:00
Joey Hess
cfdb80cd05
progress meter for p2phttp storeKey 2024-07-24 12:14:56 -04:00
Joey Hess
0280e2dd5e
update 2024-07-24 11:13:37 -04:00
Joey Hess
10f2c23fd7
fix slowloris timeout in hashing resume of download of large file
Hash the data that is already present in the file before connecting to
the http server.
2024-07-24 11:03:59 -04:00
Joey Hess
7bd616e169
Remote.Git retrieveKeyFile works with annex+http urls
This includes a bugfix to serveGet, it hung at the end.
2024-07-24 10:28:44 -04:00
Joey Hess
b4d749cc91
Merge branch 'master' into httpproto 2024-07-23 21:17:06 -04:00
Joey Hess
f7404a64c0
Propagate --force to git-annex transferrer
And other child processes.
2024-07-23 21:16:56 -04:00
Joey Hess
7d4045277a
bug 2024-07-23 21:02:31 -04:00
Joey Hess
48657405c6
cache credentials for p2phttp in memory 2024-07-23 18:45:02 -04:00
Joey Hess
b89c784a9b
use git credential when p2phttp needs auth 2024-07-23 18:11:15 -04:00