When a uuid is not known, rescan for new repositories. Easy.
When a repository is removed, it will also get removed from the server
state on the next scan. But until a new uuid is seen, there will not be
a scan. This leaves the server trying to serve a uuid whose repository
is gone. That seems buggy. While getting just fails, dropping fails the
first time, but seems to leave the server in an unusable state, so the
next drop attempt hangs. The server is still able to serve other uuids,
only the one whose repository was removed has that problem.
--jobs is usually an Annex option setter, but --directory runs in IO, so
would not have that available. So instead moved the option parser into
the command's Options.
Untested, but it compiles, so.
Known problems:
* --jobs is not available to startIO
* Does not notice when new repositories are added to a directory.
* Does not notice when repositories are removed from a directory.
This fixes support for proxying after last commit broke it.
Note that withP2PConnections is called at server startup, and so only
proxies seen at that point will appear in the map and be used. It was
already the case that a proxy added after p2phttp was running would not
be served.
I think that is possibly a bug, but at least this commit doesn't
introduce the problem, though it might make it harder to fix it.
As bugs go, it's probably not a big deal, because after all,
git configs needs to be set in the local repository, followed by
git-annex updateproxy being run, to set up proxying. If someone is doing
that, they can restart their http server I suppose.
This is early groundwork for making p2phttp support serving multiple
repositories from a single daemon.
So far only 1 repository is served still. And this commit breaks support
for proxying!
When remote.name.annexUrl is an annex+http(s) url, that uses the same
hostname as remote.name.url, which is itself a http(s) url, they are
assumed to share a username and password.
This avoids unnecessary duplicate password prompts.
Apparently the protoerrhandler parameter never runs. Also the const
typo prevented the type checker from complaining that relayPUTRecord was
being called with 1 less parameter than needed.
So move the relayPUTRecord out of it.
But not yet when proxying to special remotes.
When proxying for a cluster, the client can store the object on any node
or nodes of the cluster, and send DATA-PRESENT. That gets proxied to each
node, and if any of them agree that they have the data, the proxy will
respond with SUCCESS or SUCCESS-PLUS.
I think it's ok to not check for the proxied remotes supporting protocol
version 4. When there are multiple remotes in a cluster, it behaves as
described above, and if they all respond with ERROR, the result will be
FAILURE. And when not proxying for a cluster, the proxy negotiates the
p2p protocol to be the same version or lower than the proxied remote,
which will prevent sending DATA-PRESENT when it's too old.
Changed the protocol docs because servant parses "true" and "false" for
booleans in query parameters, not "1" and "0".
clientPut with datapresent=True is not used by git-annex, and I don't
anticipate it being used in git-annex, except for testing.
I've tested this by making clientPut be called with datapresent=True and
git-annex copy to a remote succeeds once the object file is first
manually copied to the remote. That would be a good test for the test
suite, but running the http client means exposing it to at least
localhost, and would fail if a real http client was already running on
that port.
* p2phttp: Allow unauthenticated users to lock content by default.
* p2phttp: Added --unauth-nolocking option to prevent unauthenticated
users from locking content.
The rationalle for this is that locking is not really a write operation, so
makes sense to allow in a repository that only allows read-only access. Not
supporting locking in that situation will prevent the user from dropping
content from a special remote they control in cases where the other copy of
the content is on the p2phttp server.
Also, when p2phttp is configured to also allow authenticated access,
lockcontent was resulting in a password prompt for users who had no way to
authenticate. And there is no good way to distinguish between the two types
of users client side.
--unauth-nolocking anticipates that this might be abused, and seems better
than disabling unauthenticated access entirely if a server is being
attacked. It may be that rate limiting locking by IP address or similar
would be an effective measure in such a situation. Or just limiting the
number of locks by anonymous users that can be live at any one time. Since
the impact of such an DOS attempt is limited to preventing dropping content
from the server, it seems not a very appealing target anyway.
p2phttp: Support serving unauthenticated users while requesting
authentication for operations that need it. Eg, --unauth-readonly can be
combined with --authenv.
Drop locking currently needs authentication so it will prompt for that.
That still needs to be addressed somehow.
Each command that first checks preferred content (and/or required
content) and then does something that can change the sizes of
repositories needs to call prepareLiveUpdate, and plumb it through the
preferred content check and the location log update.
So far, only Command.Drop is done. Many other commands that don't need
to do this have been updated to keep working.
There may be some calls to NoLiveUpdate in places where that should be
done. All will need to be double checked.
Not currently in a compilable state.
When getting from a P2P HTTP remote, prompt for credentials when required,
instead of failing.
This feels like it might be a bug in servant-client. withClientM's type
suggests it would not throw a ClientError. But it does in this case.
4f3ae96666 caused a hang in GET,
which git-annex testremote could reliably cause.
The problem is that closing both P2P handles before waiting on the
asyncworker prevents all the DATA from getting sent.
The solution is to only close the P2P handles early when the
P2PConnection is being closed. When it's being released, let the
asyncworker finish. closeP2PConnection is called in GET when it was
unable to send all data, and in PUT when it did not receive all the
data, and in both cases closing the P2P handles early is ok.
An interrupted PUT to cluster that has a node that is a special remote
over http left open the connection to the cluster, so the next request
opens another one. So did an interrupted PUT directly to the proxied
special remote over http.
proxySpecialRemote was stuck waiting for all the DATA. Its connection
remained open so it kept waiting.
In servePut, checktooshort handles closing the P2P connection
when too short a data is received from PUT. But, checktooshort was only
called after the protoaction, which is what runs the proxy, which is
what was getting stuck. Modified it to run as a background thread,
which waits for the tooshortv to be written to, which gather always does
once it gets to the end of the data received from the http client.
That makes proxyConnection's releaseconn run once all data is received
from the http client. Made it close the connection handles before
waiting on the asyncworker thread. This lets proxySpecialRemote finish
processing any data from the handle, and then it will give up,
more or less cleanly, if it didn't receive enough data.
I say "more or less cleanly" because with both sides of the P2P
connection taken down, some protocol unhappyness results. Which can lead
to some ugly debug messages. But also can cause the asyncworker thread
to throw an exception. So made withP2PConnections not crash when it
receives an exception from releaseconn.
This did have a small change to the behavior of an interrupted PUT when
proxying to a regular remote. proxyConnection has a protoerrorhandler
that closes the proxy connection on a protocol error. But the proxy
connection is also closed by checktooshort when it closes the P2P
connection. Closing the same proxy connection twice is not a problem,
it just results in duplicated debug messages about it.
An interrupted `git-annex copy --to` a cluster via the http server,
when repeated, failed. The http server output "transfer already in
progress, or unable to take transfer lock". Apparently a second
connection was opened to the cluster, because the first connection
never got shut down.
Turned out the problem was that when proxying to a cluster, it would read a
short ByteString from the client, and send that to the nodes. But that left the
nodes warning more. Meanwhile, the proxy was expecting a SUCCESS/FAILURE
message from the nodes. So it didn't return, and so the cluster connection
stayed open.
As seen in commit 770aac97a7, a cluster
relies accurate location logs. If long-running processes are serving a
cluster, and one process puts a file, the other process needs to see
what nodes it was stored on when checking if the file is present.
Sending ERROR caused the client to get confused and protocol to freeze.
Better to send empty DATA and indicate it's not valid.
This fixes a hang in git-annex testremote of a cluster accessed via the
http server. That testremote is still failing, for some reason after
storing a test key, the cluster reports it as not present.
Wired it up and it seems to basically work, although the test suite is
not fully passing.
Note that --jobs currently gets multiplied by the number of nodes in the
cluster, which is probably not good.
proxyRequest was treating UNLOCKCONTENT as a separate request.
That made it possible for there to be two different connections to the
proxied remote, with LOCKCONTENT being sent to one, and UNLOCKCONTENT
to the other one. A protocol error.
git-annex testremote now passes against a http proxied remote.
sendExactly will now be sure to evaluate the whole lazy ByteString.
In this case, the lazy ByteString was exactly the right lenth.
But, it seems that L.take caused it to not actually be fully evaluated.
In servePut, this manifested as gather never being fully evaluated,
which caused the hang.
Very, very subtle, and horrible bug. Clearly the use of lazy ByteString
(or really just laziness) is at fault, and it would be very worth moving
to conduit or whatever to avoid this.
removeOldestProxyConnectionPool will be innefficient the larger the pool
is. A better data structure could be more efficient. Eg, make each value
in the pool include the timestamp of its oldest element, then the oldest
value can be found and modified, rather than rebuilding the whole Map.
But, for pools of a few hundred items, this should be fine. It's O(n*n log n)
or so.
Also, when more than 1 connection with the same pool key exists,
it's efficient even for larger pools, since removeOldestProxyConnectionPool
is not needed.
The default of 1 idle connection could perhaps be larger.. like the
number of jobs? Otoh, it seems good to ramp up and down the number of
connections, which does happen. With 1, there is at most one stale
connection, which might cause a request to fail.
There was an annex worker thread that did not get stopped.
It was stuck in ReceiveMessage from the P2PHandleTMVar.
Fixed by making P2PHandleTMVar closeable.
In serveGet, releaseP2PConnection has to come first, else the
annexworker may not shut down, if it's waiting to read from it.
In proxyConnection, call closeRemoteSide in order to wait for the ssh
process (for example).