Commit graph

45348 commits

Author SHA1 Message Date
Joey Hess
e73bbdd95c
enable servant in windows build 2024-07-29 17:24:31 -04:00
Joey Hess
1467fed572
fix build with old text
Don't need decodeUtf8Lenient here because B64.encode surely always
generates utf8. So decodeUtf8 is safe, it will never throw an exception.
2024-07-29 17:21:41 -04:00
Joey Hess
dc11c5f493
final fix to windows build 2024-07-29 16:32:24 -04:00
Joey Hess
73703d1bef
close 2024-07-29 15:15:40 -04:00
Joey Hess
be8e1ab512
more fixes to windows build for content retention files
Will probably build successfully now. Still untested.
2024-07-29 15:14:12 -04:00
Joey Hess
647bff9770
more fixes to windows build for content retention files 2024-07-29 13:58:40 -04:00
Joey Hess
fcc052bed8
When proxying an upload to a special remote, verify the hash.
While usually uploading to a special remote does not verify the content,
the content in a repository is assumed to be valid, and there is no trust
boundary. But with a proxied special remote, there may be users who are
allowed to store objects, but are not really trusted.

Another way to look at this is it's the equivilant of git-annex-shell
checking the hash of received data, which it does (see StoreContent
implementation).
2024-07-29 13:40:51 -04:00
Joey Hess
960daf210b
run with noMessages
This avoids extraneous output from p2phttp, including eg, progress
displays when transferring to proxied special remotes.
2024-07-29 13:35:08 -04:00
Joey Hess
65da672ae6
add libghc-servant-client-core-dev dep 2024-07-29 13:10:40 -04:00
Joey Hess
074fad819d
changelog 2024-07-29 13:09:19 -04:00
Joey Hess
380af6ac5f
update github badges
Seems the urls changed and the old ones will be falsely green forever.

Found new ones in readme at https://github.com/datalad/git-annex
2024-07-29 13:00:00 -04:00
Joey Hess
f397296739
add missing do on windows 2024-07-29 12:54:52 -04:00
Joey Hess
b4eb6e3ced
comment 2024-07-29 11:59:33 -04:00
Joey Hess
321e2adf66
don't think I ever implementned the 422 idea, it will 404 2024-07-29 11:49:40 -04:00
Joey Hess
d3f584fcdb
wording 2024-07-29 11:44:44 -04:00
Joey Hess
5f5c29fbe7
link 2024-07-29 11:43:30 -04:00
Joey Hess
f3b207a4b9
wording 2024-07-29 11:37:13 -04:00
Joey Hess
6068379e80
typo 2024-07-29 11:34:46 -04:00
Joey Hess
db66612b8f
Merge branch 'httpproto' 2024-07-29 11:33:39 -04:00
Joey Hess
74f81ebd04
Merge remote-tracking branch 'origin/httpproto' 2024-07-29 11:25:27 -04:00
Joey Hess
6f20085a60
update 2024-07-29 11:25:07 -04:00
Joey Hess
60b1c53df5
preparing to merge 2024-07-29 11:22:27 -04:00
Joey Hess
0dc064a9ad
When proxying for a special remote, avoid unncessary hashing
Like the comment says, the client will do its own verification. But it was
calling verifyKeyContentPostRetrieval, which was hashing the file.
2024-07-29 11:18:03 -04:00
Joey Hess
7402ae61d9
fix reversion in GET from proxy over http
4f3ae96666 caused a hang in GET,
which git-annex testremote could reliably cause.

The problem is that closing both P2P handles before waiting on the
asyncworker prevents all the DATA from getting sent.

The solution is to only close the P2P handles early when the
P2PConnection is being closed. When it's being released, let the
asyncworker finish. closeP2PConnection is called in GET when it was
unable to send all data, and in PUT when it did not receive all the
data, and in both cases closing the P2P handles early is ok.
2024-07-29 11:07:09 -04:00
Joey Hess
6af44b9de6
p2phttp remotes are not readonly
That prevented testremote from working when remote.name.url = http://..
2024-07-29 10:54:14 -04:00
Joey Hess
4f3ae96666
cleanly close proxy connection on interrupted PUT
An interrupted PUT to cluster that has a node that is a special remote
over http left open the connection to the cluster, so the next request
opens another one. So did an interrupted PUT directly to the proxied
special remote over http.

proxySpecialRemote was stuck waiting for all the DATA. Its connection
remained open so it kept waiting.

In servePut, checktooshort handles closing the P2P connection
when too short a data is received from PUT. But, checktooshort was only
called after the protoaction, which is what runs the proxy, which is
what was getting stuck. Modified it to run as a background thread,
which waits for the tooshortv to be written to, which gather always does
once it gets to the end of the data received from the http client.

That makes proxyConnection's releaseconn run once all data is received
from the http client. Made it close the connection handles before
waiting on the asyncworker thread. This lets proxySpecialRemote finish
processing any data from the handle, and then it will give up,
more or less cleanly, if it didn't receive enough data.

I say "more or less cleanly" because with both sides of the P2P
connection taken down, some protocol unhappyness results. Which can lead
to some ugly debug messages. But also can cause the asyncworker thread
to throw an exception. So made withP2PConnections not crash when it
receives an exception from releaseconn.

This did have a small change to the behavior of an interrupted PUT when
proxying to a regular remote. proxyConnection has a protoerrorhandler
that closes the proxy connection on a protocol error. But the proxy
connection is also closed by checktooshort when it closes the P2P
connection. Closing the same proxy connection twice is not a problem,
it just results in duplicated debug messages about it.
2024-07-29 10:37:19 -04:00
Joey Hess
c8e7231f48
add debugging of opening and closing connections to proxies 2024-07-29 09:52:26 -04:00
Joey Hess
7ac8d36f38
idea 2024-07-29 09:11:27 -04:00
stv0g
6352cebb92 Added a comment: importtree=yes Support 2024-07-29 06:50:01 +00:00
Joey Hess
5ef3f1e703
remove unused imports 2024-07-28 21:11:23 -04:00
Joey Hess
cd89f91aa5
remove uuid from annex+http urls
Not needed it turns out.
2024-07-28 20:29:42 -04:00
Joey Hess
bc9cc79e85
set remote's annexUrl automatically
When the remote repository's git config file
has annex.url set to an annex+http url.
2024-07-28 20:13:41 -04:00
Joey Hess
c87cfe1e00
todo 2024-07-28 17:29:32 -04:00
Joey Hess
ccbdaf0448
documentation for p2phttp 2024-07-28 17:19:27 -04:00
Joey Hess
dfe65b92c8
avoid repeatedly parsing the proxy log 2024-07-28 16:04:20 -04:00
Joey Hess
2fdec6b4e1
update 2024-07-28 15:55:24 -04:00
Joey Hess
ddabc138ec
todo 2024-07-28 15:41:31 -04:00
Joey Hess
cdc4bd7443
fix hang in PUT of large file to a special remote node of a cluster over http 2024-07-28 15:34:59 -04:00
Joey Hess
18ed4e5b20
use closedv rather than separate endv
Doesn't fix any known problem, but this way if the connection does get
closed, it will notice.
2024-07-28 15:11:31 -04:00
Joey Hess
66679c9bb4
remove temp file after upload to special remote 2024-07-28 14:36:45 -04:00
Joey Hess
9461793ffc
Merge remote-tracking branch 'origin/master' into httpproto 2024-07-28 14:24:15 -04:00
Joey Hess
ccd102cd19
update 2024-07-28 14:22:44 -04:00
Joey Hess
5e205f215d
clean shut down of cluster connection when PUT is interrupted
An interrupted `git-annex copy --to` a cluster via the http server,
when repeated, failed. The http server output "transfer already in
progress, or unable to take transfer lock". Apparently a second
connection was opened to the cluster, because the first connection
never got shut down.

Turned out the problem was that when proxying to a cluster, it would read a
short ByteString from the client, and send that to the nodes. But that left the
nodes warning more. Meanwhile, the proxy was expecting a SUCCESS/FAILURE
message from the nodes. So it didn't return, and so the cluster connection
stayed open.
2024-07-28 14:20:11 -04:00
Joey Hess
bdde6d829c
fix http proxying for a local git remote with a relative path
git-annex-shell expects an absolute path
2024-07-28 13:35:51 -04:00
Joey Hess
41667ad36b
found some bugs with clusters 2024-07-28 13:00:05 -04:00
Joey Hess
6722a61a21
clusters need enableInteractiveBranchAccess
As seen in commit 770aac97a7, a cluster
relies accurate location logs. If long-running processes are serving a
cluster, and one process puts a file, the other process needs to see
what nodes it was stored on when checking if the file is present.
2024-07-28 12:39:42 -04:00
Joey Hess
bd3d327d8a
smarter BranchState cache invalidation
Only invalidate a just-written file in the cache, not the whole cache.

This will avoid the possibly performance impact of cache invalidation
mentioned in commit 770aac97a7
2024-07-28 12:33:32 -04:00
Joey Hess
770aac97a7
share single BranchState amoung all threads
This fixes a problem when git-annex testremote is run against a cluster
accessed via the http server. Annex.Cluster uses the location log
to find nodes that contain a key when checking if the key is present or getting
it. Just after a key was stored to a cluster node, reading the location log
was not getting the UUID of that node.

Apparently the Annex action that wrote to the location log, and the one
that read from it were run with two different Annex states. The http server
does use several different Annex threads.

BranchState was part of the AnnexState, and so two threads could have
different BranchStates.

Moved BranchState to the AnnexRead, so all threads will see the common state.

This might possibly impact performance. If one thread is writing changes to the
branch, and another thread is reading from the branch, the writing thread will
now invalidate the BranchState's cache, which will cause the reading thread to
need to do extra work. But correctness is surely more important. If did is
found to have impacted performance, it could probably be dealt with by doing
smarter BranchState cache invalidation.

Another way this might impact performance is that the BranchState has a small
cache. If several threads were reading from the branch and relying on the value
they just read still being in the case, now a cache miss will be more likely.
Increasing the BranchState cache to the number of jobs might be a good
idea to amelorate that. But the cache is currently an innefficient list,
so making it large would need changes to the data types.

(Commit 4304f1b6ae dealt with a follow-on
effect of the bug fixed here.)
2024-07-28 12:30:27 -04:00
Joey Hess
4304f1b6ae
better handling of content not available from cluster
Sending ERROR caused the client to get confused and protocol to freeze.
Better to send empty DATA and indicate it's not valid.

This fixes a hang in git-annex testremote of a cluster accessed via the
http server. That testremote is still failing, for some reason after
storing a test key, the cluster reports it as not present.
2024-07-28 11:09:07 -04:00
Joey Hess
fbbedae497
add --clusterjobs option and default to 1
The default of 1 is not ideal at all, but it avoids an accidental M*N
causing so much concurrency it becomes unusable.
2024-07-28 10:36:22 -04:00