Commit graph

34527 commits

Author SHA1 Message Date
Joey Hess
ffba57c9fc
cleanup comments on removed news post 2024-07-31 14:05:48 -04:00
Joey Hess
0403603483
add news item for git-annex 10.20240731 2024-07-31 14:05:11 -04:00
Joey Hess
f914ee61e3
analysis 2024-07-31 12:19:12 -04:00
Joey Hess
1e8208457f
pinged 2024-07-31 10:06:43 -04:00
Joey Hess
9e901d326d
comment 2024-07-31 10:04:08 -04:00
Spencer
6af48665e0 [Bug] Trust but Verify: RClone 2024-07-31 00:34:01 +00:00
Joey Hess
d52fd3cf83
update 2024-07-30 12:17:05 -04:00
Joey Hess
1500a9525d
todo 2024-07-30 11:58:44 -04:00
Joey Hess
1632beaf70
fix negative DATA when 1 node of a cluster has a partial transfer 2024-07-30 11:42:17 -04:00
Joey Hess
1560e0eee9
comment 2024-07-30 10:50:13 -04:00
Joey Hess
73703d1bef
close 2024-07-29 15:15:40 -04:00
Joey Hess
fcc052bed8
When proxying an upload to a special remote, verify the hash.
While usually uploading to a special remote does not verify the content,
the content in a repository is assumed to be valid, and there is no trust
boundary. But with a proxied special remote, there may be users who are
allowed to store objects, but are not really trusted.

Another way to look at this is it's the equivilant of git-annex-shell
checking the hash of received data, which it does (see StoreContent
implementation).
2024-07-29 13:40:51 -04:00
Joey Hess
380af6ac5f
update github badges
Seems the urls changed and the old ones will be falsely green forever.

Found new ones in readme at https://github.com/datalad/git-annex
2024-07-29 13:00:00 -04:00
Joey Hess
b4eb6e3ced
comment 2024-07-29 11:59:33 -04:00
Joey Hess
321e2adf66
don't think I ever implementned the 422 idea, it will 404 2024-07-29 11:49:40 -04:00
Joey Hess
d3f584fcdb
wording 2024-07-29 11:44:44 -04:00
Joey Hess
5f5c29fbe7
link 2024-07-29 11:43:30 -04:00
Joey Hess
f3b207a4b9
wording 2024-07-29 11:37:13 -04:00
Joey Hess
6068379e80
typo 2024-07-29 11:34:46 -04:00
Joey Hess
db66612b8f
Merge branch 'httpproto' 2024-07-29 11:33:39 -04:00
Joey Hess
74f81ebd04
Merge remote-tracking branch 'origin/httpproto' 2024-07-29 11:25:27 -04:00
Joey Hess
6f20085a60
update 2024-07-29 11:25:07 -04:00
Joey Hess
60b1c53df5
preparing to merge 2024-07-29 11:22:27 -04:00
Joey Hess
4f3ae96666
cleanly close proxy connection on interrupted PUT
An interrupted PUT to cluster that has a node that is a special remote
over http left open the connection to the cluster, so the next request
opens another one. So did an interrupted PUT directly to the proxied
special remote over http.

proxySpecialRemote was stuck waiting for all the DATA. Its connection
remained open so it kept waiting.

In servePut, checktooshort handles closing the P2P connection
when too short a data is received from PUT. But, checktooshort was only
called after the protoaction, which is what runs the proxy, which is
what was getting stuck. Modified it to run as a background thread,
which waits for the tooshortv to be written to, which gather always does
once it gets to the end of the data received from the http client.

That makes proxyConnection's releaseconn run once all data is received
from the http client. Made it close the connection handles before
waiting on the asyncworker thread. This lets proxySpecialRemote finish
processing any data from the handle, and then it will give up,
more or less cleanly, if it didn't receive enough data.

I say "more or less cleanly" because with both sides of the P2P
connection taken down, some protocol unhappyness results. Which can lead
to some ugly debug messages. But also can cause the asyncworker thread
to throw an exception. So made withP2PConnections not crash when it
receives an exception from releaseconn.

This did have a small change to the behavior of an interrupted PUT when
proxying to a regular remote. proxyConnection has a protoerrorhandler
that closes the proxy connection on a protocol error. But the proxy
connection is also closed by checktooshort when it closes the P2P
connection. Closing the same proxy connection twice is not a problem,
it just results in duplicated debug messages about it.
2024-07-29 10:37:19 -04:00
Joey Hess
c8e7231f48
add debugging of opening and closing connections to proxies 2024-07-29 09:52:26 -04:00
Joey Hess
7ac8d36f38
idea 2024-07-29 09:11:27 -04:00
stv0g
6352cebb92 Added a comment: importtree=yes Support 2024-07-29 06:50:01 +00:00
Joey Hess
cd89f91aa5
remove uuid from annex+http urls
Not needed it turns out.
2024-07-28 20:29:42 -04:00
Joey Hess
bc9cc79e85
set remote's annexUrl automatically
When the remote repository's git config file
has annex.url set to an annex+http url.
2024-07-28 20:13:41 -04:00
Joey Hess
c87cfe1e00
todo 2024-07-28 17:29:32 -04:00
Joey Hess
ccbdaf0448
documentation for p2phttp 2024-07-28 17:19:27 -04:00
Joey Hess
dfe65b92c8
avoid repeatedly parsing the proxy log 2024-07-28 16:04:20 -04:00
Joey Hess
2fdec6b4e1
update 2024-07-28 15:55:24 -04:00
Joey Hess
ddabc138ec
todo 2024-07-28 15:41:31 -04:00
Joey Hess
cdc4bd7443
fix hang in PUT of large file to a special remote node of a cluster over http 2024-07-28 15:34:59 -04:00
Joey Hess
66679c9bb4
remove temp file after upload to special remote 2024-07-28 14:36:45 -04:00
Joey Hess
9461793ffc
Merge remote-tracking branch 'origin/master' into httpproto 2024-07-28 14:24:15 -04:00
Joey Hess
ccd102cd19
update 2024-07-28 14:22:44 -04:00
Joey Hess
5e205f215d
clean shut down of cluster connection when PUT is interrupted
An interrupted `git-annex copy --to` a cluster via the http server,
when repeated, failed. The http server output "transfer already in
progress, or unable to take transfer lock". Apparently a second
connection was opened to the cluster, because the first connection
never got shut down.

Turned out the problem was that when proxying to a cluster, it would read a
short ByteString from the client, and send that to the nodes. But that left the
nodes warning more. Meanwhile, the proxy was expecting a SUCCESS/FAILURE
message from the nodes. So it didn't return, and so the cluster connection
stayed open.
2024-07-28 14:20:11 -04:00
Joey Hess
bdde6d829c
fix http proxying for a local git remote with a relative path
git-annex-shell expects an absolute path
2024-07-28 13:35:51 -04:00
Joey Hess
41667ad36b
found some bugs with clusters 2024-07-28 13:00:05 -04:00
Joey Hess
770aac97a7
share single BranchState amoung all threads
This fixes a problem when git-annex testremote is run against a cluster
accessed via the http server. Annex.Cluster uses the location log
to find nodes that contain a key when checking if the key is present or getting
it. Just after a key was stored to a cluster node, reading the location log
was not getting the UUID of that node.

Apparently the Annex action that wrote to the location log, and the one
that read from it were run with two different Annex states. The http server
does use several different Annex threads.

BranchState was part of the AnnexState, and so two threads could have
different BranchStates.

Moved BranchState to the AnnexRead, so all threads will see the common state.

This might possibly impact performance. If one thread is writing changes to the
branch, and another thread is reading from the branch, the writing thread will
now invalidate the BranchState's cache, which will cause the reading thread to
need to do extra work. But correctness is surely more important. If did is
found to have impacted performance, it could probably be dealt with by doing
smarter BranchState cache invalidation.

Another way this might impact performance is that the BranchState has a small
cache. If several threads were reading from the branch and relying on the value
they just read still being in the case, now a cache miss will be more likely.
Increasing the BranchState cache to the number of jobs might be a good
idea to amelorate that. But the cache is currently an innefficient list,
so making it large would need changes to the data types.

(Commit 4304f1b6ae dealt with a follow-on
effect of the bug fixed here.)
2024-07-28 12:30:27 -04:00
Joey Hess
fbbedae497
add --clusterjobs option and default to 1
The default of 1 is not ideal at all, but it avoids an accidental M*N
causing so much concurrency it becomes unusable.
2024-07-28 10:36:22 -04:00
Joey Hess
1259ad89b6
cluster support in http API server
Wired it up and it seems to basically work, although the test suite is
not fully passing.

Note that --jobs currently gets multiplied by the number of nodes in the
cluster, which is probably not good.
2024-07-28 10:17:29 -04:00
Joey Hess
0cdd418407
tested shutdown of connection to http proxied special remote
I had worried it might not work properly, but it does, the endv works.
2024-07-28 09:17:47 -04:00
Joey Hess
ef8f24f28c
fix PUT to http proxied special remote
It was hanging because it never sent FAILURE in the INVALID case.
And putoffset always triggers the INVALID case.
2024-07-28 09:14:42 -04:00
Joey Hess
0ea645944e
thoughts on exporttree 2024-07-27 19:59:54 -04:00
Joey Hess
1c0448e33c
update 2024-07-26 20:44:01 -04:00
Joey Hess
0fb86d2916
UNLOCKCONTENT is not a top-level request
proxyRequest was treating UNLOCKCONTENT as a separate request.
That made it possible for there to be two different connections to the
proxied remote, with LOCKCONTENT being sent to one, and UNLOCKCONTENT
to the other one. A protocol error.

git-annex testremote now passes against a http proxied remote.
2024-07-26 20:39:06 -04:00
Joey Hess
a3dab58be2
fix hang at end of PUT to proxied p2p http remote
sendExactly will now be sure to evaluate the whole lazy ByteString.

In this case, the lazy ByteString was exactly the right lenth.
But, it seems that L.take caused it to not actually be fully evaluated.

In servePut, this manifested as gather never being fully evaluated,
which caused the hang.

Very, very subtle, and horrible bug. Clearly the use of lazy ByteString
(or really just laziness) is at fault, and it would be very worth moving
to conduit or whatever to avoid this.
2024-07-26 19:50:15 -04:00