Commit graph

2111 commits

Author SHA1 Message Date
Joey Hess
dc11c5f493
final fix to windows build 2024-07-29 16:32:24 -04:00
Joey Hess
be8e1ab512
more fixes to windows build for content retention files
Will probably build successfully now. Still untested.
2024-07-29 15:14:12 -04:00
Joey Hess
647bff9770
more fixes to windows build for content retention files 2024-07-29 13:58:40 -04:00
Joey Hess
fcc052bed8
When proxying an upload to a special remote, verify the hash.
While usually uploading to a special remote does not verify the content,
the content in a repository is assumed to be valid, and there is no trust
boundary. But with a proxied special remote, there may be users who are
allowed to store objects, but are not really trusted.

Another way to look at this is it's the equivilant of git-annex-shell
checking the hash of received data, which it does (see StoreContent
implementation).
2024-07-29 13:40:51 -04:00
Joey Hess
f397296739
add missing do on windows 2024-07-29 12:54:52 -04:00
Joey Hess
0dc064a9ad
When proxying for a special remote, avoid unncessary hashing
Like the comment says, the client will do its own verification. But it was
calling verifyKeyContentPostRetrieval, which was hashing the file.
2024-07-29 11:18:03 -04:00
Joey Hess
dfe65b92c8
avoid repeatedly parsing the proxy log 2024-07-28 16:04:20 -04:00
Joey Hess
cdc4bd7443
fix hang in PUT of large file to a special remote node of a cluster over http 2024-07-28 15:34:59 -04:00
Joey Hess
18ed4e5b20
use closedv rather than separate endv
Doesn't fix any known problem, but this way if the connection does get
closed, it will notice.
2024-07-28 15:11:31 -04:00
Joey Hess
66679c9bb4
remove temp file after upload to special remote 2024-07-28 14:36:45 -04:00
Joey Hess
6722a61a21
clusters need enableInteractiveBranchAccess
As seen in commit 770aac97a7, a cluster
relies accurate location logs. If long-running processes are serving a
cluster, and one process puts a file, the other process needs to see
what nodes it was stored on when checking if the file is present.
2024-07-28 12:39:42 -04:00
Joey Hess
bd3d327d8a
smarter BranchState cache invalidation
Only invalidate a just-written file in the cache, not the whole cache.

This will avoid the possibly performance impact of cache invalidation
mentioned in commit 770aac97a7
2024-07-28 12:33:32 -04:00
Joey Hess
770aac97a7
share single BranchState amoung all threads
This fixes a problem when git-annex testremote is run against a cluster
accessed via the http server. Annex.Cluster uses the location log
to find nodes that contain a key when checking if the key is present or getting
it. Just after a key was stored to a cluster node, reading the location log
was not getting the UUID of that node.

Apparently the Annex action that wrote to the location log, and the one
that read from it were run with two different Annex states. The http server
does use several different Annex threads.

BranchState was part of the AnnexState, and so two threads could have
different BranchStates.

Moved BranchState to the AnnexRead, so all threads will see the common state.

This might possibly impact performance. If one thread is writing changes to the
branch, and another thread is reading from the branch, the writing thread will
now invalidate the BranchState's cache, which will cause the reading thread to
need to do extra work. But correctness is surely more important. If did is
found to have impacted performance, it could probably be dealt with by doing
smarter BranchState cache invalidation.

Another way this might impact performance is that the BranchState has a small
cache. If several threads were reading from the branch and relying on the value
they just read still being in the case, now a cache miss will be more likely.
Increasing the BranchState cache to the number of jobs might be a good
idea to amelorate that. But the cache is currently an innefficient list,
so making it large would need changes to the data types.

(Commit 4304f1b6ae dealt with a follow-on
effect of the bug fixed here.)
2024-07-28 12:30:27 -04:00
Joey Hess
fbbedae497
add --clusterjobs option and default to 1
The default of 1 is not ideal at all, but it avoids an accidental M*N
causing so much concurrency it becomes unusable.
2024-07-28 10:36:22 -04:00
Joey Hess
1259ad89b6
cluster support in http API server
Wired it up and it seems to basically work, although the test suite is
not fully passing.

Note that --jobs currently gets multiplied by the number of nodes in the
cluster, which is probably not good.
2024-07-28 10:17:29 -04:00
Joey Hess
8ec174408e
remove duplicate code 2024-07-28 09:35:09 -04:00
Joey Hess
ef8f24f28c
fix PUT to http proxied special remote
It was hanging because it never sent FAILURE in the INVALID case.
And putoffset always triggers the INVALID case.
2024-07-28 09:14:42 -04:00
Joey Hess
0fb86d2916
UNLOCKCONTENT is not a top-level request
proxyRequest was treating UNLOCKCONTENT as a separate request.
That made it possible for there to be two different connections to the
proxied remote, with LOCKCONTENT being sent to one, and UNLOCKCONTENT
to the other one. A protocol error.

git-annex testremote now passes against a http proxied remote.
2024-07-26 20:39:06 -04:00
Joey Hess
267a202e72
clean up after http p2p proxy GET is interrupted
There was an annex worker thread that did not get stopped.

It was stuck in ReceiveMessage from the P2PHandleTMVar.

Fixed by making P2PHandleTMVar closeable.

In serveGet, releaseP2PConnection has to come first, else the
annexworker may not shut down, if it's waiting to read from it.

In proxyConnection, call closeRemoteSide in order to wait for the ssh
process (for example).
2024-07-26 15:33:20 -04:00
Joey Hess
ad025b8e5e
clean up protocol version for proxying
The proxy always checks the protocol version of a remote before talking
to it in a version-specific way, so the protocol version in the ProxyParams
is the client's protocol version. The remote will always be at the same or
an older protocol version than the client.

Note that in relayDATAFinish, when the client is at protocol version 0,
the remote must thus be as well, and that's why its version is not
checked in the case for that.

With that clarified, it's evident that, in P2P.Http.State, there's no
need to look at the proxied remote's protocol version at all.
2024-07-26 13:49:05 -04:00
Joey Hess
cc1da2d516
http p2p proxy is now largely working 2024-07-26 10:44:10 -04:00
Joey Hess
b391756b32
remove some debugging 2024-07-25 21:36:10 -04:00
Joey Hess
6ef6ad808f
use a record to reduce the huge number of parameters 2024-07-25 15:23:18 -04:00
Joey Hess
3d14e2cf58
http server support for proxies, incomplete
Refactored git-annex-shell code so this can use checkCanProxy'.

At this point all that remains is opening a proxy connection,
and using a proxy connection.
2024-07-25 13:19:24 -04:00
Joey Hess
10f2c23fd7
fix slowloris timeout in hashing resume of download of large file
Hash the data that is already present in the file before connecting to
the http server.
2024-07-24 11:03:59 -04:00
Joey Hess
0594338a78
fix meter display on resume
It still needs to be offset, otherwise on resume from 80% it will
display 1%..20%.

Seems that this bug must have affected P2P.Annex as well where it runs
this code, but apparently it didn't affect it in a very user-visible
way. Maybe the transfer log file was updated incorrectly?
2024-07-24 10:28:48 -04:00
Joey Hess
7bd616e169
Remote.Git retrieveKeyFile works with annex+http urls
This includes a bugfix to serveGet, it hung at the end.
2024-07-24 10:28:44 -04:00
Joey Hess
a2d1844292
factor out resumeVerifyFromOffset 2024-07-24 09:29:44 -04:00
Joey Hess
b4d749cc91
Merge branch 'master' into httpproto 2024-07-23 21:17:06 -04:00
Joey Hess
f7404a64c0
Propagate --force to git-annex transferrer
And other child processes.
2024-07-23 21:16:56 -04:00
Joey Hess
4826a3745d
servePut and clientPut implementation
Made the data-length header required even for v0. This simplifies the
implementation, and doesn't preclude extra verification being done for
v0.

The connectionWaitVar is an ugly hack. In servePut, nothing waits
on the waitvar, and I could not find a good way to make anything wait on
it.
2024-07-22 10:27:44 -04:00
Joey Hess
74c6175795
fix serveGet early handle close
Needed that waitv after all..
2024-07-11 09:55:17 -04:00
Joey Hess
3b37b9e53f
fix serveGet hang
This came down to SendBytes waiting on the waitv. Nothing ever filled
it.

Only Annex.Proxy needs the waitv, and it handles filling it. So make it
optional.
2024-07-11 07:46:52 -04:00
Joey Hess
f9b7ce7224
add Annex worker pool to P2PHttp
This will be needed for get and store, since those need to run Annex
actions.

withLocalP2PConnections will also probably use it.
2024-07-10 12:19:47 -04:00
Joey Hess
f452bd448a
REMOVE-BEFORE and GETTIMESTAMP proxying
For clusters, the timestamps have to be translated, since each node can
have its own idea about what time it is. To translate a timestamp, the
proxy remembers what time it asked the node for a timestamp in
GETTIMESTAMP, and applies the delta as an offset in REMOVE-BEFORE.

This does mean that a remove from a cluster has to call GETTIMESTAMP on
every node before dropping from nodes. Not very efficient. Although
currently it tries to drop from every single node anyway, which is also
not very efficient.

I thought about caching the GETTIMESTAMP from the nodes on the first
call. That would improve efficiency. But, since monotonic clocks on
!Linux don't advance when the computer is suspended, consider what might
happen if one node was suspended for a while, then came back. Its
monotonic timestamp would end up behind where the proxying expects it to
be. Would that result in removing when it shouldn't, or refusing to
remove when it should? Have not thought it through. Either way, a
cluster behaving strangly for an extended period of time because one
of its nodes was briefly asleep doesn't seem like good behavior.
2024-07-04 15:09:34 -04:00
Joey Hess
99b7a0cfe9
use REMOVE-BEFORE in P2P protocol
Only clusters still need to be fixed to close this todo.
2024-07-04 13:47:38 -04:00
Joey Hess
1243af4a18
toward SafeDropProof expiry checking
Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is
based on a LockedCopy. If there are several LockedCopies, it uses the
closest expiry time. That is not optimal, it may be that the proof
expires based on one LockedCopy but another one has not expired. But
that seems unlikely to really happen, and anyway the user can just
re-run a drop if it fails due to expiry.

Pass the SafeDropProof to removeKey, which is responsible for checking
it for expiry in situations where that could be a problem. Which really
only means in Remote.Git.

Made Remote.Git check expiry when dropping from a local remote.

Checking expiry when dropping from a P2P remote is not yet implemented.
P2P.Protocol.remove has SafeDropProof plumbed through to it for that
purpose.

Fixing the remaining 2 build warnings should complete this work.

Note that the use of a POSIXTime here means that if the clock gets set
forward while git-annex is in the middle of a drop, it may say that
dropping took too long. That seems ok. Less ok is that if the clock gets
turned back a sufficient amount (eg 5 minutes), proof expiry won't be
noticed. It might be better to use the Monotonic clock, but that doesn't
advance when a laptop is suspended, and while there is the linux
Boottime clock, that is not available on other systems. Perhaps a
combination of POSIXTime and the Monotonic clock could detect laptop
suspension and also detect clock being turned back?

There is a potential future flag day where
p2pDefaultLockContentRetentionDuration is not assumed, but is probed
using the P2P protocol, and peers that don't support it can no longer
produce a LockedCopy. Until that happens, when git-annex is
communicating with older peers there is a risk of data loss when
a ssh connection closes during LOCKCONTENT.
2024-07-04 12:39:06 -04:00
Joey Hess
d2b27ca136
add content retention files
This allows lockContentShared to lock content for eg, 10 minutes and
if the process then gets terminated before it can unlock, the content
will remain locked for that amount of time.

The Windows implementation is not yet tested.

In P2P.Annex, a duration of 10 minutes is used. This way, when p2pstdio
or remotedaemon is serving the P2P protocol, and is asked to
LOCKCONTENT, and that process gets killed, the content will not be
subject to deletion. This is not a perfect solution to
doc/todo/P2P_locking_connection_drop_safety.mdwn yet, but it gets most
of the way there, without needing any P2P protocol changes.

This is only done in v10 and higher repositories (or on Windows). It
might be possible to backport it to v8 or earlier, but it would
complicate locking even further, and without a separate lock file, might
be hard. I think that by the time this fix reaches a given user, they
will probably have been running git-annex 10.x long enough that their v8
repositories will have upgraded to v10 after the 1 year wait. And it's
not as if git-annex hasn't already been subject to this problem (though
I have not heard of any data loss caused by it) for 6 years already, so
waiting another fraction of a year on top of however long it takes this
fix to reach users is unlikely to be a problem.
2024-07-03 14:58:39 -04:00
Joey Hess
2f2cc38c28
fix build on old ghc
getStdRandom used to be an IO action
2024-07-02 12:27:14 -04:00
Joey Hess
fa5e7463eb
fix display when proxied GET yields ERROR
The error message is not displayed to the use, but this mirrors the
behavior when a regular get from a special remote fails. At least now
there is not a protocol error.
2024-07-01 11:19:02 -04:00
Joey Hess
dce3848ad8
avoid populating proxy's object file when storing on special remote
Now that storeKey can have a different object file passed to it, this
complication is not needed. This avoids a lot of strange situations,
and will also be needed if streaming is eventually supported.
2024-07-01 10:53:49 -04:00
Joey Hess
8b5fc94d50
add optional object file location to storeKey
This will be used by the next commit to simplify the proxy.
2024-07-01 10:42:27 -04:00
Joey Hess
711a5166e2
PUT to proxied special remote working
Still needs some work.

The reason that the waitv is necessary is because without it,
runNet loops back around and reads the next protocol message. But it's
not finished reading the whole bytestring yet, and so it reads some part
of it.
2024-06-28 17:10:58 -04:00
Joey Hess
2e5af38f86
GET from proxied special remote
Working, but lots of room for improvement...

Without streaming, so there is a delay before download begins as the
file is retreived from the special remote.

And when resuming it retrieves the whole file from the special remote
*again*.

Also, if the special remote throws an exception, currently it
shows as "protocol error".
2024-06-28 15:44:48 -04:00
Joey Hess
158d7bc933
fix handling of ERROR in response to REMOVE
This allows an error message from a proxied special remote to be
displayed to the client.

In the case where removal from several nodes of a cluster fails,
there can be several errors. What to do? I decided to only show
the first error to the user. Probably in this case the user is not in a
position to do anything about an error message, so best keep it simple.
If the problem with the first node is fixed, they'll see the error from
the next node.
2024-06-28 14:10:25 -04:00
Joey Hess
a6ea057f6b
fix handling of ERROR in response to CHECKPRESENT
That error is now rethrown on the client, so it will be displayed.

For example:

$ git-annex fsck x --fast --from AMS-dir
fsck x (special remote reports: directory /home/joey/tmp/bench2/dir is not accessible) failed

No protocol version check is needed. Because in order to talk to a
proxied special remote, the client has to be running the upcoming
git-annex release. Which has this fix in it.
2024-06-28 13:46:27 -04:00
Joey Hess
d3c75c003a
proxying special remotes
This is early, but already working for CHECKPRESENT.

However, when the special remote throws an exception on checkPresent,
this happens:

[2024-06-28 13:22:18.520884287] (P2P.IO) [ThreadId 4] P2P > ERROR directory /home/joey/tmp/bench2/dir is not accessible
[2024-06-28 13:22:18.521053135] (P2P.IO) [ThreadId 4] P2P < ERROR expected SUCCESS or FAILURE
git-annex: client error: expected SUCCESS or FAILURE
(fixing location log) p2pstdio: 1 failed

  ** Based on the location log, x
  ** was expected to be present, but its content is missing.
failed
2024-06-28 13:31:19 -04:00
Joey Hess
62750f0102
shut down RemoteSides cleanly
Before it just exited without actually shutting down the RemoteSides,
when the client hung up.
2024-06-28 13:19:57 -04:00
Joey Hess
cf59d7f92c
GET and CHECKPRESENT amoung lowest cost cluster nodes
Before it was using a node that might have had a higher cost.

Also threw in a random selection from amoung the low cost nodes. Of
course this is a poor excuse for load balancing, but it's better than
nothing. Most of the time...
2024-06-27 14:36:55 -04:00
Joey Hess
dabd05e547
remove a TODO marker
I have a todo item for this outside the code
2024-06-27 13:36:04 -04:00