git-annex

Author	SHA1	Message	Date
Joey Hess	3a55f0eec3	use DataLength	2024-07-11 11:57:55 -04:00
Joey Hess	14e0f778b7	simplify	2024-07-11 11:50:44 -04:00
Joey Hess	2228d56db3	serveGet invalidation	2024-07-11 11:42:32 -04:00
Joey Hess	80d2ffc79a	implement generic get server	2024-07-11 11:26:03 -04:00
Joey Hess	a7383b5c59	move serveruuid into routes In particular the generic get route needs it, so that when a single http server is serving multiple repositories, it knows what repository to use.	2024-07-11 11:19:20 -04:00
Joey Hess	74c6175795	fix serveGet early handle close Needed that waitv after all..	2024-07-11 09:55:17 -04:00
Joey Hess	2c13e6c165	fix annexworker shutdown on early client disconnect	2024-07-11 09:15:52 -04:00
Joey Hess	6e3d35744d	cleanup	2024-07-11 07:56:55 -04:00
Joey Hess	3b37b9e53f	fix serveGet hang This came down to SendBytes waiting on the waitv. Nothing ever filled it. Only Annex.Proxy needs the waitv, and it handles filling it. So make it optional.	2024-07-11 07:46:52 -04:00
Joey Hess	80fb5445b5	a little progress on serveGet hang Now it gets to the validity checker, but it seems it never runs it.	2024-07-10 17:48:48 -04:00
Joey Hess	1e0f92a5a1	implemented serveGet and clientGet Both are only at bare proof of concept stage. Still need to deal with signaling validity and invalidity, and checking it. And there's a bad bug: After -JN*2 requests, another request hangs! So, I think it's failing to free up the Annex worker and end of request lifetime. Perhaps I need to use this: https://docs.servant.dev/en/stable/cookbook/managed-resource/ManagedResource.html	2024-07-10 16:06:39 -04:00
Joey Hess	f9b7ce7224	add Annex worker pool to P2PHttp This will be needed for get and store, since those need to run Annex actions. withLocalP2PConnections will also probably use it.	2024-07-10 12:19:47 -04:00
Joey Hess	d4b9aea87b	implement gettimestamp	2024-07-10 10:23:10 -04:00
Joey Hess	7c588a5791	implement remove-before The reason to use removeBeforeRemoteEndTime is twofold. First, removeBefore sends two protocol commands. Currently, the HTTP protocol runner only supports sending a single command per invocation. Secondly, the http server gets a monotonic timestamp from the client. So translating back to a POSIXTime would be annoying. The timestamp flow with a proxy will be: - client gets timestamp, which gets the monotonic timestamp from the proxied remote via the proxy. The timestamp is currently not proxied when there is a single proxy. - client calls remove-before - http server calls removeBeforeRemoteEndTime which sends REMOVE-BEFORE to the proxied remote.	2024-07-10 10:03:26 -04:00
Joey Hess	e9cba0a580	Revert "proxy local timestamps in single proxy case as well as cluster case" Turns out not to be necessary. I think. This reverts commit `81e11efda1`.	2024-07-10 09:45:23 -04:00
Joey Hess	81e11efda1	proxy local timestamps in single proxy case as well as cluster case This is less efficient, but it guarantees that the timestamps that the client is sending are local timestamps, which turns out to be necessary for the HTTP PTP protocol server.	2024-07-10 09:40:13 -04:00
Joey Hess	518248f009	remove debug print	2024-07-10 09:37:43 -04:00
Joey Hess	b8a26712c6	implement clientRemove Tested removal.	2024-07-10 09:20:13 -04:00
Joey Hess	48f76cb3e8	implement serveRemove and send WWW-Authenticate header on auth failure	2024-07-10 09:13:01 -04:00
Joey Hess	97d0fc9b65	git-annex p2phttp options	2024-07-10 00:01:55 -04:00
Joey Hess	6a8a4d1775	authentication is implemented just need to make Command.P2PHttp generate a GetServerMode from options	2024-07-09 20:54:47 -04:00
Joey Hess	e5bf49b879	http basic authorization header parsing Sadly servant does not expose this though it also implements it.	2024-07-09 20:07:53 -04:00
Joey Hess	08371c3745	started on auth	2024-07-09 17:30:55 -04:00
Joey Hess	dcd77ee555	fix p2phttp server to not get stuck Process 1 command, then stop. Hopefully each of the Handlers will only need 1 command.	2024-07-09 14:26:30 -04:00
Joey Hess	3d13521479	set up handles for p2phttp Now it fully works.. for the first request. But then it gets stuck waiting for the P2P protocol runner to shut down.	2024-07-09 13:50:42 -04:00
Joey Hess	edf8a3df2d	p2phttp is almost working for checkpresent The server is fully running annex actions, only the P2PConnection is wrong, currently using stdio.	2024-07-09 13:37:55 -04:00
Joey Hess	a3dd8b4bcb	capture API version in routes Needed so the client can send it.	2024-07-09 12:04:29 -04:00
Joey Hess	751b8e0baf	implemented serveCheckPresent Still need a way to run Proto though	2024-07-09 09:08:42 -04:00
Joey Hess	9a592f946f	split module	2024-07-08 21:12:23 -04:00
Joey Hess	3f402a20a8	implement Locker	2024-07-08 21:00:10 -04:00
Joey Hess	b758b01692	add lockids to http p2p protocol	2024-07-08 20:18:55 -04:00
Joey Hess	58031455dc	add lock map	2024-07-08 14:20:30 -04:00
Joey Hess	0bdee626ad	thread in a state	2024-07-08 14:00:23 -04:00
Joey Hess	69c4f07ab0	finish get API	2024-07-08 13:27:50 -04:00
Joey Hess	82d66ede5e	convert lockcontent api to http long polling Websockets would work, but the problem with using them for this is that each lockcontent call is a separate websocket connection. And that's an actual TCP connection. One TCP connection per file dropped would be too expensive. With http long polling, regular http pipelining can be used, so it will reuse a TCP connection. Unfortunately, at least with servant, bi-directional streams with long polling don't result in true bidirectional full duplex communication. Servant processes the whole client body stream before generating the server body stream. I think it's entirely possible to do full bi-directional communication over http, but it would need changes to servant. And, there's no way for the client to tell if the server successfully locked the content, since the server will keep processing the client stream no matter what.: So, added a new api endpoint, keeplocked. lockcontent will lock the key for 10 minutes with retention lock, and then a call to keeplocked will keep it locked for as long as needed. This does mean that there will need to be a Map of locks by key, and I will probably want to add some kind of lock identifier that lockcontent returns.	2024-07-08 12:57:46 -04:00
Joey Hess	522700d1c4	implemented servant-client support for websockets	2024-07-08 07:44:59 -04:00
Joey Hess	392b15d5c3	may have found a way to make a request for a websocket?! dunno, it compiles anyway	2024-07-07 21:51:30 -04:00
Joey Hess	9ee005e49a	dummy HasClient ClientM WebSocket Enough to let lockcontent routes be included and servant-client be used. But not enough to use servant-client with those routes. May need to implement a separate runner for that part of the protocol? Also some misc other stuff needed to use servant-client. And fix exposing of UUID in the JSON types. UUID does actually have aeson instances, but they're used elsewhere (metadata --batch, although only included to get it to compile, not actually used in there) and not suitable for use here since this must work with every possible UUID.	2024-07-07 21:21:45 -04:00
Joey Hess	bfa8c39adb	servant client mostly implemented lockcontent had to be disabled until I can implement HasClient ClientM WebSocket and in clientGet, it's not clear how to use the v1 and v0 versions, which don't have a DataLengthHeader	2024-07-07 16:08:05 -04:00
Joey Hess	9a726cedf6	servant server now compiling Just need to fill in some undefined	2024-07-07 14:48:20 -04:00
Joey Hess	1dbb5ec70d	servant API type is complete	2024-07-07 12:59:12 -04:00
Joey Hess	f452bd448a	REMOVE-BEFORE and GETTIMESTAMP proxying For clusters, the timestamps have to be translated, since each node can have its own idea about what time it is. To translate a timestamp, the proxy remembers what time it asked the node for a timestamp in GETTIMESTAMP, and applies the delta as an offset in REMOVE-BEFORE. This does mean that a remove from a cluster has to call GETTIMESTAMP on every node before dropping from nodes. Not very efficient. Although currently it tries to drop from every single node anyway, which is also not very efficient. I thought about caching the GETTIMESTAMP from the nodes on the first call. That would improve efficiency. But, since monotonic clocks on !Linux don't advance when the computer is suspended, consider what might happen if one node was suspended for a while, then came back. Its monotonic timestamp would end up behind where the proxying expects it to be. Would that result in removing when it shouldn't, or refusing to remove when it should? Have not thought it through. Either way, a cluster behaving strangly for an extended period of time because one of its nodes was briefly asleep doesn't seem like good behavior.	2024-07-04 15:09:34 -04:00
Joey Hess	99b7a0cfe9	use REMOVE-BEFORE in P2P protocol Only clusters still need to be fixed to close this todo.	2024-07-04 13:47:38 -04:00
Joey Hess	1243af4a18	toward SafeDropProof expiry checking Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is based on a LockedCopy. If there are several LockedCopies, it uses the closest expiry time. That is not optimal, it may be that the proof expires based on one LockedCopy but another one has not expired. But that seems unlikely to really happen, and anyway the user can just re-run a drop if it fails due to expiry. Pass the SafeDropProof to removeKey, which is responsible for checking it for expiry in situations where that could be a problem. Which really only means in Remote.Git. Made Remote.Git check expiry when dropping from a local remote. Checking expiry when dropping from a P2P remote is not yet implemented. P2P.Protocol.remove has SafeDropProof plumbed through to it for that purpose. Fixing the remaining 2 build warnings should complete this work. Note that the use of a POSIXTime here means that if the clock gets set forward while git-annex is in the middle of a drop, it may say that dropping took too long. That seems ok. Less ok is that if the clock gets turned back a sufficient amount (eg 5 minutes), proof expiry won't be noticed. It might be better to use the Monotonic clock, but that doesn't advance when a laptop is suspended, and while there is the linux Boottime clock, that is not available on other systems. Perhaps a combination of POSIXTime and the Monotonic clock could detect laptop suspension and also detect clock being turned back? There is a potential future flag day where p2pDefaultLockContentRetentionDuration is not assumed, but is probed using the P2P protocol, and peers that don't support it can no longer produce a LockedCopy. Until that happens, when git-annex is communicating with older peers there is a risk of data loss when a ssh connection closes during LOCKCONTENT.	2024-07-04 12:39:06 -04:00
Joey Hess	5b6150e5d5	factor out Utility.MonotonicClock	2024-07-03 17:54:01 -04:00
Joey Hess	543c610a31	REMOVE-BEFORE and GETTIMESTAMP Only implemented server side, not used client side yet. And not yet implemented for proxies/clusters, for which there's a build warning about unhandled cases. This is P2P protocol version 3. Probably will be the only change in that version.. Added a dependency on clock to access a monotonic clock. On i386-ancient, that is at version 0.2.0.0.	2024-07-03 17:01:58 -04:00
Joey Hess	d2b27ca136	add content retention files This allows lockContentShared to lock content for eg, 10 minutes and if the process then gets terminated before it can unlock, the content will remain locked for that amount of time. The Windows implementation is not yet tested. In P2P.Annex, a duration of 10 minutes is used. This way, when p2pstdio or remotedaemon is serving the P2P protocol, and is asked to LOCKCONTENT, and that process gets killed, the content will not be subject to deletion. This is not a perfect solution to doc/todo/P2P_locking_connection_drop_safety.mdwn yet, but it gets most of the way there, without needing any P2P protocol changes. This is only done in v10 and higher repositories (or on Windows). It might be possible to backport it to v8 or earlier, but it would complicate locking even further, and without a separate lock file, might be hard. I think that by the time this fix reaches a given user, they will probably have been running git-annex 10.x long enough that their v8 repositories will have upgraded to v10 after the 1 year wait. And it's not as if git-annex hasn't already been subject to this problem (though I have not heard of any data loss caused by it) for 6 years already, so waiting another fraction of a year on top of however long it takes this fix to reach users is unlikely to be a problem.	2024-07-03 14:58:39 -04:00
Joey Hess	fa5e7463eb	fix display when proxied GET yields ERROR The error message is not displayed to the use, but this mirrors the behavior when a regular get from a special remote fails. At least now there is not a protocol error.	2024-07-01 11:19:02 -04:00
Joey Hess	8b5fc94d50	add optional object file location to storeKey This will be used by the next commit to simplify the proxy.	2024-07-01 10:42:27 -04:00
Joey Hess	0dfdc9f951	dup stdio handles for P2P proxy Special remotes might output to stdout, or read from stdin, which would mess up the P2P protocol. So dup the handles to avoid any such problem.	2024-07-01 10:06:29 -04:00
Joey Hess	711a5166e2	PUT to proxied special remote working Still needs some work. The reason that the waitv is necessary is because without it, runNet loops back around and reads the next protocol message. But it's not finished reading the whole bytestring yet, and so it reads some part of it.	2024-06-28 17:10:58 -04:00
Joey Hess	2e5af38f86	GET from proxied special remote Working, but lots of room for improvement... Without streaming, so there is a delay before download begins as the file is retreived from the special remote. And when resuming it retrieves the whole file from the special remote again. Also, if the special remote throws an exception, currently it shows as "protocol error".	2024-06-28 15:44:48 -04:00
Joey Hess	158d7bc933	fix handling of ERROR in response to REMOVE This allows an error message from a proxied special remote to be displayed to the client. In the case where removal from several nodes of a cluster fails, there can be several errors. What to do? I decided to only show the first error to the user. Probably in this case the user is not in a position to do anything about an error message, so best keep it simple. If the problem with the first node is fixed, they'll see the error from the next node.	2024-06-28 14:10:25 -04:00
Joey Hess	a6ea057f6b	fix handling of ERROR in response to CHECKPRESENT That error is now rethrown on the client, so it will be displayed. For example: $ git-annex fsck x --fast --from AMS-dir fsck x (special remote reports: directory /home/joey/tmp/bench2/dir is not accessible) failed No protocol version check is needed. Because in order to talk to a proxied special remote, the client has to be running the upcoming git-annex release. Which has this fix in it.	2024-06-28 13:46:27 -04:00
Joey Hess	62750f0102	shut down RemoteSides cleanly Before it just exited without actually shutting down the RemoteSides, when the client hung up.	2024-06-28 13:19:57 -04:00
Joey Hess	c3a785204e	support a P2PConnection that uses TMVars rather than Handles This will allow having an internal thread speaking P2P protocol, which will be needed to support proxying to external special remotes. No serialization is done on the internal P2P protocol of course. When a ByteString is being exchanged, it may or may not be exactly the length indicated by DATA. While that has to be carefully managed for the serialized P2P protocol, here it would require buffering the whole lazy bytestring in memory to check its length when sending, so it's better to do length checks on the receiving side.	2024-06-28 11:22:29 -04:00
Joey Hess	cf59d7f92c	GET and CHECKPRESENT amoung lowest cost cluster nodes Before it was using a node that might have had a higher cost. Also threw in a random selection from amoung the low cost nodes. Of course this is a poor excuse for load balancing, but it's better than nothing. Most of the time...	2024-06-27 14:36:55 -04:00
Joey Hess	3dad9446ce	distributed cluster cycle prevention Added BYPASS to P2P protocol, and use it to avoid cycling between cluster gateways. Distributed clusters are working well now!	2024-06-27 12:20:22 -04:00
Joey Hess	cec2848e8a	support annex.jobs for clusters	2024-06-25 14:54:20 -04:00
Joey Hess	818030e4d3	improve handling of cluster nodes disconnecting	2024-06-25 14:10:06 -04:00
Joey Hess	1bfe7f8a53	honor preferred content settings of cluster nodes Except when no nodes want a file, it has to be stored somewhere, so store it on all. Which is not really desirable, but neither is having to pick one. ProtoAssociatedFile deserialization is rather broken, and this could possibly affect preferred content expressions that match on filenames. The inability to roundtrip whitespace like tabs and newlines through is not a problem because preferred content expressions can't be written that match on whitespace such as a tab. For example: joey@darkstar:~/tmp/bench/z>git-annex wanted origin-node2 'exclude=CTRL-VTab' wanted origin-node2 git-annex: Parse error: Parse failure: near "*" But, the filtering of control characters could perhaps be a problem. I think that filtering is now obsolete, git-annex has comprehensive filtering of control characters when displaying filenames, that happens at a higher level. However, I don't want to risk a security hole so am leaving in that filtering in ProtoAssociatedFile deserialization for now.	2024-06-25 11:43:09 -04:00
Joey Hess	8a341cd195	fix comparison With this a PUT to two remotes that have different partial amounts transferred works reliably. I'm not sure though that it doesn't have fencepost errors.	2024-06-23 16:01:58 -04:00
Joey Hess	466c972913	don't use SUCCESS-PLUS unncessarily When dropping from a proxied remote that is not a cluster, SUCCESS-PLUS is not needed, so don't use it.	2024-06-23 10:07:26 -04:00
Joey Hess	2762f9c4ce	fix location log update for copy to 1-node cluster	2024-06-23 09:53:33 -04:00
Joey Hess	5b332a87be	dropping from clusters Dropping from a cluster drops from every node of the cluster. Including nodes that the cluster does not think have the content. This is different from GET and CHECKPRESENT, which do trust the cluster's location log. The difference is that removing from a cluster should make 100% the content is gone from every node. So doing extra work is ok. Compare with CHECKPRESENT where checking every node could make it very expensive, and the worst that can happen in a false negative is extra work being done. Extended the P2P protocol with FAILURE-PLUS to handle the case where a drop from one node succeeds, but a drop from another node fails. In that case the entire cluster drop has failed. Note that SUCCESS-PLUS is returned when dropping from a proxied remote that is not a cluster, when the protocol version supports it. This is because P2P.Proxy does not know when it's proxying for a single node cluster vs for a remote that is not a cluster.	2024-06-23 09:43:40 -04:00
Joey Hess	ecab2e03b9	working PUT fanout to multiple remotes for clusters Still need to check for fencepost errors on resume when different nodes have different amounts of data.	2024-06-20 10:04:26 -04:00
Joey Hess	f18740699e	P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS is complete, when a PUT stores to additional repositories than the expected on, the location log is updated with the additional UUIDs that contain the content. Started implementing PUT fanout to multiple remotes for clusters. It is untested, and I fear fencepost errors in the relative offset calculations. And it is missing proxying for the protocol after DATA.	2024-06-18 16:21:40 -04:00
Joey Hess	f049156a03	checkpresent support for clusters This assumes that the proxy for a cluster has up-to-date location logs. If it didn't, it might proxy the checkpresent to a node that no longer has the content, while some other node still does, and so it would incorrectly appear that the cluster no longer contains the content. Since cluster UUIDs are not stored to location logs, git-annex fsck --fast when claiming to fix a location log when that occurred would not cause any problems. And presumably the location tracking would later get sorted out. At least usually, changes to the content of nodes goes via the proxy, and it will update its location logs, so they will be accurate. However, if there were multiple proxies to the same cluster, or nodes were accessed directly (or via proxy to the node and not the cluster), the proxy's location log could certainly be wrong. (The location log access for GET has the same issues.)	2024-06-18 11:16:16 -04:00
Joey Hess	88d9a02f7c	initial, working support for getting from clusters Currently tends to put all the load on a single node, which will need to be improved.	2024-06-18 11:01:10 -04:00
Joey Hess	ef26470810	ProxySelector data type	2024-06-17 19:19:15 -04:00
Joey Hess	7a839a983a	preparing for cluster node selection Support selecting what remote to proxy for each top-level P2P protocol message. This only needs to be extended now to support fanout to multiple nodes for PUT and REMOVE, and with a remote that fails for LOCKCONTENT and UNLOCKCONTENT. But a good first step would be to implement CHECKPRESENT and GET for clusters. Both should select a node that actually does have the content. That will allow a cluster to work for GET even when location tracking is out of date.	2024-06-17 15:51:10 -04:00
Joey Hess	291280ced2	started on git-annex-shell cluster support Works down to P2P protocol. The question now is, how to handle protocol version negotiation for clusters? Connecting to each node to find their protocol versions and using the lowest would be too expensive with a lot of nodes. So it seems that the cluster needs to pick its own protocol version to use with the client. Then it can either negotiate that same version with the nodes when it comes time to use them, or it can translate between multiple protocol versions. That seems complicated. Thinking it would be ok to refuse to use a node if it is not able to negotiate the same protocol version with it as with the client. That will mean that sometimes need nodes to be upgraded when upgrading the cluster's proxy. But protocol versions rarely change.	2024-06-17 15:10:04 -04:00
Joey Hess	c7ad44e4d1	work toward supporting proxying to multiple remotes at once For eg, upload fanout. Delay connecting to a remote until it's needed. When there are many proxied remotes, it would not do for the proxy to connect to each of them on startup; that could take a long time.	2024-06-17 14:16:44 -04:00
Joey Hess	83a1db8d17	more specific type	2024-06-17 13:04:40 -04:00
Joey Hess	b72ccc6f0c	improve types	2024-06-17 12:44:08 -04:00
Joey Hess	dfdda95053	proxy updates location tracking information This does mean a redundant write to the git-annex branch. But, it means that two clients can be using the same proxy, and after one sends a file to a proxied remote, the other only has to pull from the proxy to learn about that. It does not need to pull from every remote behind the proxy (which it couldn't do anyway as git repo access is not currently proxied). Anyway, the overhead of this in git-annex branch writes is no worse than eg, sending a file to a repository where git-annex assistant is running, which then sends the file on to a remote, and updates the git-annex branch then. Indeed, when the assistant also drops the local copy, that results in more writes to the git-annex branch.	2024-06-12 11:37:14 -04:00
Joey Hess	96853cd833	finish P2P protocol proxying CONNECT is not supported by git-annex-shell p2pstdio, but for proxying to tor-annex remotes, it will be supported, and will make a git pull/push to a proxied remote work the same with that as it does over ssh, eg it accesses the proxy's git repo not the proxied remote's git repo. The p2p protocol docs say that NOTIFYCHANGES is not always supported, and it looked annoying to implement it for this, and it also seems pretty useless, so make it be a protocol error. git-annex remotedaemon will already be getting change notifications from the proxy's git repo, so there's no need to get additional redundant change notifications for proxied remotes that would be for changes to the same git repo.	2024-06-12 10:40:51 -04:00
Joey Hess	6e1df33960	minimized code duplication due to type checker limitations	2024-06-11 17:16:49 -04:00
Joey Hess	5beaffb412	proxying PUT now working The almost identical code duplication between relayDATA and relayDATA' is very annoying. I tried quite a few things to parameterize them, but the type checker is having fits when I try it.	2024-06-11 16:56:52 -04:00
Joey Hess	a2f4a8eddf	proxying GET now working Memory use is small and constant; receiveBytes returns a lazy bytestring and it does stream. Comparing speed of a get of a 500 mb file over proxy from origin-origin, vs from the same remote over a direct ssh: joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from origin-origin get bigfile (from origin-origin...) ok (recording state in git...) 1.89user 0.67system 0:10.79elapsed 23%CPU (0avgtext+0avgdata 68716maxresident)k 0inputs+984320outputs (0major+10779minor)pagefaults 0swaps joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from direct-ssh get bigfile (from direct-ssh...) ok 1.79user 0.63system 0:10.49elapsed 23%CPU (0avgtext+0avgdata 65776maxresident)k 0inputs+1024312outputs (0major+9773minor)pagefaults 0swaps So the proxy doesn't add much overhead even when run on the same machine as the client and remote. Still, piping receiveBytes into sendBytes like this does suggest that the proxy could be made to use less CPU resouces by using `sendfile()`.	2024-06-11 15:09:43 -04:00
Joey Hess	58d8ba5a4f	implement simple proxy actions (untested) Still need to implement GET and PUT, and will implement CONNECT and NOTIFYCHANGE for completeness. All ServerMode checking is implemented for the proxy. There are two possible approaches for how the proxy sends back messages from the remote to the client. One would be to have a background thread that reads messages and sends them back as they come in. The other, which is being implemented so far, is to read messages from the remote at points where it is expected to send them, and relay back to the client before reading the next message from the client. At this point, I'm unsure which approach would be better. The need for proxynoresponse to be used by UNLOCKCONTENT, for example, builds protocol knowledge into the proxy which it would not need with the other method.	2024-06-11 12:56:20 -04:00
Joey Hess	373ae49c87	factor out helper functions These will be used by the proxy, which needs to check the ServerMode in the same way.	2024-06-11 12:04:58 -04:00
Joey Hess	92c83a417f	refactoring	2024-06-11 10:22:05 -04:00
Joey Hess	501d65eeab	started implementing git-annex-shell proxy So far, it negotiates VERSION with both parties. This is a tricky dance. Untested.	2024-06-10 18:01:36 -04:00
Joey Hess	9a8391078a	git-annex-shell: block relay requests connRepo is only used when relaying git upload-pack and receive-pack. That's only supposed to be used when git-annex-remotedaemon is serving git-remote-tor-annex connections over tor. But, it was always set, and so could be used in other places possibly. Fixed by making connRepo optional in the P2P protocol interface. In Command.EnableTor, it's not needed, because it only speaks the protocol in order to check that it's able to connect back to itself via the hidden service. So changed that to pass Nothing rather than the git repo. In Remote.Helper.Ssh, it's connecting to git-annex-shell p2pstdio, so is making the requests, so will never need connRepo. In git-annex-shell p2pstdio, it was accepting git upload-pack and receive-pack requests over the P2P protocol, even though nothing sent them. This is arguably a security hole, particularly if the user has set environment variables like GIT_ANNEX_SHELL_LIMITED to prevent git push/pull via git-annex-shell.	2024-06-10 14:16:27 -04:00
Joey Hess	f6cf2dec4c	disk free checking for unsized keys Improve disk free space checking when transferring unsized keys to local git remotes. Since the size of the object file is known, can check that instead. Getting unsized keys from local git remotes does not check the actual object size. It would be harder to handle that direction because the size check is run locally, before anything involving the remote is done. So it doesn't know the size of the file on the remote. Also, transferring unsized keys to other remotes, including ssh remotes and p2p remotes don't do disk size checking for unsized keys. This would need a change in protocol. (It does seem like it would be possible to implement the same thing for directory special remotes though.) In some sense, it might be better to not ever do disk free checking for unsized keys, than to do it only sometimes. A user might notice this direction working and consider it a bug that the other direction does not. On the other hand, disk reserve checking is not implemented for most special remotes at all, and yet it is implemented for a few, which is also inconsistent, but best effort. And so doing this best effort seems to make some sense. Fundamentally, if the user wants the size to always be checked, they should not use unsized keys. Sponsored-by: Brock Spratlen on Patreon	2024-01-16 14:29:10 -04:00
Joey Hess	cd544e548b	filter out control characters in error messages giveup changed to filter out control characters. (It is too low level to make it use StringContainingQuotedPath.) error still does not, but it should only be used for internal errors, where the message is not attacker-controlled. Changed a lot of existing error to giveup when it is not strictly an internal error. Of course, other exceptions can still be thrown, either by code in git-annex, or a library, that include some attacker-controlled value. This does not guard against those. Sponsored-by: Noam Kremen on Patreon	2023-04-10 13:50:51 -04:00
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Joey Hess	33a05a1a91	small RawFilePath optimisation	2023-03-02 10:53:12 -04:00
Joey Hess	54ad1b4cfb	Windows: Support long filenames in more (possibly all) of the code Works around this bug in unix-compat: https://github.com/jacobstanley/unix-compat/issues/56 getFileStatus and other FilePath using functions in unix-compat do not do UNC conversion on Windows. Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the necessary conversion on windows to support long filenames. Audited all imports of System.PosixCompat.Files to make sure that no functions that operate on FilePath were imported from it. Instead, use the equvilants from Utility.RawFilePath. In particular the re-export of that module in Common had to be removed, which led to lots of other changes throughout the code. The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig make Utility.Directory not be needed to build setup. And so let it use Utility.RawFilePath, which depends on unix, which cannot be in setup-depends. Sponsored-by: Dartmouth College's Datalad project	2023-03-01 15:55:58 -04:00
Joey Hess	e95747a149	fix handling of corrupted data received from git remote Recover from corrupted content being received from a git remote due eg to a wire error, by deleting the temporary file when it fails to verify. This prevents a retry from failing again. Reversion introduced in version 8.20210903, when incremental verification was added. Only the git remote seems to be affected, although it is certianly possible that other remotes could later have the same issue. This only affects things passed to getViaTmp that return (False, UnVerified) due to verification failing. As far as getViaTmp can tell, that could just as well mean that the transfer failed in a way that would resume, so it cannot delete the temp file itself. Remote.Git and P2P.Annex use getViaTmp internally, while other remotes do not, which is why only it seems affected. A better fix perhaps would be to improve the types of the callback passed to getViaTmp, so that some other value could be used to indicate the state where the transfer succeeded but verification failed. Sponsored-by: Boyd Stephen Smith Jr.	2022-01-07 13:25:33 -04:00
Joey Hess	8034f2e9bb	factor out IncrementalHasher from IncrementalVerifier	2021-11-09 12:33:22 -04:00
Joey Hess	88b63a43fa	distinguish between incremental verification failing and not being done Sponsored-by: Dartmouth College's DANDI project	2021-08-18 14:38:02 -04:00
Joey Hess	449851225a	refactor IncrementalVerifier moved to Utility.Hash, which will let Utility.Url use it later. It's perhaps not really specific to hashing, but making a separate module just for the data type seemed unncessary. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 13:19:02 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	de482c7eeb	move verifyKeyContent to Annex.Verify The goal is that Database.Keys be able to use it; it can't use Annex.Content.Presence due to an import loop. Several other things also needed to be moved to Annex.Verify as a conseqence.	2021-07-27 14:07:23 -04:00
Joey Hess	aaba83795b	switch from hslogger to purpose-built Utility.Debug This uses a DebugSelector, rather than debug levels, which will allow for a later option like --debug-from=Process to only see debuging about running processes. The module name that contains the thing being debugged is used as the DebugSelector (in most cases; does not need to be a hard and fast rule). Debug calls were changed to add that. hslogger did not display that first parameter to debugM, but the DebugSelector does get displayed. Also fastDebug will allow doing debugging in places that are used in tight loops, with the DebugSelector coming from the Annex Reader essentially for free. Not done yet.	2021-04-05 13:40:31 -04:00
Joey Hess	6940d4ad40	chance case of "transfer failed" to match "Transfer stalled"	2021-03-06 17:47:05 -04:00
Joey Hess	4b63e932f3	incremental checksum on upload to ssh or p2p	2021-02-10 12:41:05 -04:00
Joey Hess	94f6210b68	deal with possibility of short read by S.hGet It may read less than requested, and may yield an empty string if the file was somehow shorter than expected.	2021-02-09 22:20:05 -04:00
Joey Hess	62e152f210	incremental checksum on download from ssh or p2p Checksum as content is received from a remote git-annex repository, rather than doing it in a second pass. Not tested at all yet, but I imagine it will work! Not implemented for any special remotes, and also not implemented for copies from local remotes. It may be that, for local remotes, it will suffice to use rsync, rely on its checksumming, and simply return Verified. (It would still make a checksumming pass when cp is used for COW, I guess.)	2021-02-09 17:03:27 -04:00
Joey Hess	dd39e9e255	suggest when user may want annex.stalldetection When annex.stalldetection is not enabled, and a likely stall is detected, display a suggestion to enable it. Note that the progress meter display is not taken down when displaying the message, so it will display like this: 0% 8 B 0 B/s Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection 0% 10 B 0 B/s Although of course if it's really stalled, it will never update again after the message. Taking down the progress meter and starting a new one doesn't seem too necessary given how unusual this is, also this does help show the state it was at when it stalled. Use of uninterruptibleCancel here is ok, the thread it's canceling only does STM transactions and sleeps. The annex thread that gets forked off is separate to avoid it being canceled, so that it can be joined back at the end. A module cycle required moving from dupState the precaching of the remote list. Doing it at startConcurrency should cover all the cases where the remote list is used in concurrent actions. This commit was sponsored by Kevin Mueller on Patreon.	2021-02-03 15:57:19 -04:00
Joey Hess	94b323a8e8	use TotalSize more extensively	2020-12-11 12:10:43 -04:00
Joey Hess	a422a056f2	make getViaTmpFrom no longer update location log All callers adjusted to update it themselves. In Command.ReKey, and Command.SetKey, the cleanup action already did, so it was updating the log twice before. This fixes a bug when annex.stalldetection is set, as now Command.Transferrer can skip updating the location log, and let it be updated by the calling process.	2020-12-11 11:50:13 -04:00
Joey Hess	7e25c643cf	avoid P2P.Protocol defining instance Proto.Serializable AssociatedFile It's got its own specific hacks, so it needs it own specific data type.	2020-12-09 13:46:42 -04:00
Joey Hess	4c47568876	refactoring This is groundwork for using git-annex transferkeys to run transfers, in order to allow stalled transfers to be interrupted and retried. The new upload and download are closer to what git-annex transferkeys does, so the plan is to make them use it. Then things that were left using upload' and download' won't recover from stalls. Notably, that includes import and export. But at least get/move/copy will be able to. (Also the assistant hopefully, but not yet.) This commit was sponsored by Jake Vosloo on Patreon.	2020-12-07 14:49:17 -04:00
Joey Hess	0540e987b3	improve p2p protocol handling of requested object not available Avoid spurious "verification of content failed" message when downloading content from a ssh or tor remote fails due to the remote no longer having a copy of the content. The P2P protocol already handled this case by sending DATA 0, followed by VALID. But VALID was not really right, because the data is not the requested data. So, send DATA 0, followed by INVALID. Old versions of git-annex handle INVALID the same as VALID in this case. Now new versions avoid displaying an incorrect message. It would be better for the P2P protocol to have a different way to indicate this, like perhaps sending INVALID without DATA. But that would be a breaking change and need a new protocol verison. Since INVALID already is part of the protocol and already needs to be handled, using it for this special case too seems ok, and avoids the complication of another protocol version. This commit was sponsored by Jochen Bartl on Patreon.	2020-12-01 16:05:55 -04:00
Joey Hess	a3b714ddd9	finish fixing removeLink on windows `9cb250f7be` got the ones in RawFilePath, but there were others that used the one from unix-compat, which fails at runtime on windows. To avoid this, import System.PosixCompat.Files hiding removeLink This commit was sponsored by Ethan Aubin.	2020-11-24 13:20:44 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	2c8cf06e75	more RawFilePath conversion Converted file mode setting to it, and follow-on changes. Compiles up through 369/646. This commit was sponsored by Ethan Aubin.	2020-11-05 18:45:37 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unncessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	ca80c3154c	more RawFilePath conversion removeFile changed to removeLink, because AFAICS it should be fine to remove non-file things here. In particular, it's fine to remove a symlink, since we're about to write a symlink. (removeLink does not remove directories, so file, symlink, and unix socket are the only possibilities.)	2020-10-30 13:07:41 -04:00
Joey Hess	e505c03bcc	more RawFilePath conversion nukeFile replaced with removeWhenExistsWith removeLink, which allows using RawFilePath. Utility.Directory cannot use RawFilePath since setup does not depend on posix. This commit was sponsored by Graham Spencer on Patreon.	2020-10-29 10:50:29 -04:00
Joey Hess	2a45b5ae9a	avoid failure to lock content of removed file causing drop etc to fail This was already prevented in other ways, but as seen in commit `c30fd24d91`, those were a bit fragile. And I'm not sure races were avoided in every case before. At least a race between two separate git-annex processes, dropping the same content, seemed possible. This way, if locking fails, and the content is not present, it will always do the right thing. Also, it avoids the overhead of an unncessary inAnnex check for every file. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-07-25 11:59:33 -04:00
Joey Hess	aa1ad0b7ca	remove redundant imports Clean build under ghc 8.8.3, which seems to do better at finding cases where two imports both provide the same symbol, and warns about one of them. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:05:34 -04:00
Joey Hess	a477f7253c	async exception safety	2020-06-05 14:42:11 -04:00
Joey Hess	e683207123	make runRelayService async exception safe Use withCreateProcess so the helper process will be shut down if the thread is killed. Use withAsync to ensure the helper threads get shut down too.	2020-06-03 13:51:56 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipleline.	2019-12-09 15:07:21 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agoncy of making it build), but still very unmergable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically: num files old new speedup 48500 4.77 3.73 28% 12500 1.36 1.02 66% 20 0.075 0.074 0% (so startup time is unchanged) That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certianly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	e035bc5324	minor typos	2019-03-27 11:15:20 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	e89bb4361b	distinguish between cached and uncached creds p2p and multicast creds are not cached the same way that s3 and webdav creds are. The difference is that p2p and multicast obtain the creds themselves, as part of a process like pairing. So they're storing the only extant copy of the creds. In s3 and webdav etc the creds are provided by the cloud storage provider. This is a fine difference, but I do think it's a reasonable difference. If the user wants to prevent s3 and webdav etc creds from being stored unencrypted on disk, they won't feel the same about p2p auth tokens used for tor, or a multicast encryption key, or for that matter their local ssh private key. This commit was sponsored by Fernando Jimenez on Patreon.	2018-12-04 14:09:18 -04:00
Joey Hess	a0e573a3cc	comment typo	2018-11-12 11:53:44 -04:00
Joey Hess	6ecd55a9fa	Fixed some other potential hangs in the P2P protocol Finishes the start made in `983c9d5a53`, by handling the case where `transfer` fails for some other reason, and so the ReadContent callback does not get run. I don't know of a case where `transfer` does fail other than the locking dealt with in that commit, but it's good to have a guarantee. StoreContent and StoreContentTo had a similar problem. Things like `getViaTmp` may decide not to run the transfer action. And `transfer` could certianly fail, if another transfer of the same object was in progress. (Or a different object when annex.pidlock is set.) If the transfer action was not run, the content of the object would not all get consumed, and so would get interpreted as protocol commands, which would not go well. My approach to fixing all of these things is to set a TVar only once all the data in the transfer is known to have been read/written. This way the internals of `transfer`, `getViaTmp` etc don't matter. So in ReadContent, it checks if the transfer completed. If not, as long as it didn't throw an exception, send empty and Invalid data to the callback. On an exception the state of the protocol is unknown so it has to raise ProtoFailureException and close the connection, same as before. In StoreContent, if the transfer did not complete some portion of the DATA has been read, so the protocol is in an unknown state and it has to close the conection as well. (The ProtoFailureMessage used here matches the one in Annex.Transfer, which is the most likely reason. Not ideal to duplicate it..) StoreContent did not ever close the protocol connection before. So this is a protocol change, but only in an exceptional circumstance, and it's not going to break anything, because clients already need to deal with the connection breaking at any point. The way this new behavior looks (here origin has annex.pidlock = true so will only accept one upload to it at a time): git annex copy --to origin -J2 copy x (to origin...) ok copy y (to origin...) Lost connection (fd:25: hGetChar: end of file) This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-06 14:52:32 -04:00
Joey Hess	983c9d5a53	git-annex-shell: fix transfer hang Fix hang when transferring the same objects to two different clients at the same time. (Or when annex.pidlock is used, two different objects to the same or different clients.) Could also potentially occur if a client was downloading an object and somehow lost connection but that git-annex-shell was still running and holding the transfer lock. This does not guarantee that, if `transfer` fails for some other reason, a DATA response will be made. This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-06 13:00:37 -04:00
Joey Hess	0b053b9611	Fix a P2P protocol hang When readContent got Nothing from prepSendAnnex, it did not run its callback, and the callback is what sends the DATA reply. sendContent checks with contentSize that the object file is present, but that doesn't really guarantee that prepSendAnnex won't return Nothing. So, it was possible for a P2P protocol GET to not receive a response, and appear to hang. When what it's really doing is waiting for the next protocol command. This seems most likely to happen when the annex is in direct mode, and the file being requested has been modified. It could also happen in an indirect mode repository if genInodeCache somehow failed. Perhaps due to a race with a drop of the content file. Fixed by making readContent behave the way its spec said it should, and run the callback with L.empty in this case. Note that, it's finee for readContent to send any amount of data to the callback, including L.empty. sendBytes deals with that by making sure it sends exactly the specified number of bytes, aborting the protocol if it's too short. So, when L.empty is sent, the protocol will end up aborting. This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-02 13:41:50 -04:00
Joey Hess	7b33e6c9f3	simplify	2018-10-22 15:54:12 -04:00
Joey Hess	fcca7adaff	instrument P2P --debug with connection and thread info For debugging http://git-annex.branchable.com/bugs/annex_get_-J_16_via_ssh_stalls_/ This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-10-22 15:52:11 -04:00
Joey Hess	6134431254	clean P2P protocol shutdown on EOF try 2 Same goal as `b18fb1e343` but without breaking backwards compatability. Just return IO exceptions when running the P2P protocol, so that git-annex-shell can detect eof and avoid the ugly message. This commit was sponsored by Ethan Aubin.	2018-09-25 16:49:59 -04:00
Joey Hess	16cbecbd09	Revert "clean P2P protocol shutdown on EOF" This reverts commit `b18fb1e343`. That broke support for old git-annex-shell before p2pstdio was added. The immediate problem is that postAuth had a fallthrough case that sent an error back to the peer, but sending an error back when the connection is closed is surely not going to work. But thinking about it some more, making every function that uses receiveMessage need to handle ProtocolEOF adds a lot of complication, so I don't want to do that. The commit only cleaned up the test suite output a tiny bit, so I'm just gonna revert it for now.	2018-09-25 14:04:12 -04:00
Joey Hess	b18fb1e343	clean P2P protocol shutdown on EOF Avoids "git-annex-shell: <stdin>: hGetChar: end of file" being displayed by the test suite, due to the way it runs git-annex-shell without using ssh. git-annex-shell over ssh was not affected because git-annex hangs up the ssh connection and so never sees the error message that git-annnex-shell probably did emit. This commit was sponsored by Ryan Newton on Patreon.	2018-09-13 10:46:37 -04:00
Joey Hess	b657242f5d	enforce retrievalSecurityPolicy Leveraged the existing verification code by making it also check the retrievalSecurityPolicy. Also, prevented getViaTmp from running the download action at all when the retrievalSecurityPolicy is going to prevent verifying and so storing it. Added annex.security.allow-unverified-downloads. A per-remote version would be nice to have too, but would need more plumbing, so KISS. (Bill the Cat reference not too over the top I hope. The point is to make this something the user reads the documentation for before using.) A few calls to verifyKeyContent and getViaTmp, that don't involve downloads from remotes, have RetrievalAllKeysSecure hard-coded. It was also hard-coded for P2P.Annex and Command.RecvKey, to match the values of the corresponding remotes. A few things use retrieveKeyFile/retrieveKeyFileCheap without going through getViaTmp. * Command.Fsck when downloading content from a remote to verify it. That content does not get into the annex, so this is ok. * Command.AddUrl when using a remote to download an url; this is new content being added, so this is ok. This commit was sponsored by Fernando Jimenez on Patreon.	2018-06-21 13:37:01 -04:00
Joey Hess	4a3f1a15c5	improve indent	2018-06-14 11:40:23 -04:00
Joey Hess	85f9360d9b	GIT_ANNEX_SHELL_APPENDONLY Makes it allow writes, but not deletion of annexed content. Note that securing pushes to the git repository is left up to the user. This commit was sponsored by Jack Hill on Patreon.	2018-05-25 13:17:56 -04:00
Joey Hess	891d6d97f7	squash -Wsimplifiable-class-constraints warnings I have not tested this with older ghc than 8.2.2.	2018-04-22 13:42:21 -04:00
Joey Hess	46d4316954	implement annex.retry et al Added annex.retry, annex.retry-delay, and per-remote versions to configure transfer retries. This commit was supported by the NSF-funded DataLad project.	2018-03-29 13:04:07 -04:00
Joey Hess	31e1adc005	deal with unlocked files P2P protocol version 1 adds VALID\|INVALID after DATA; INVALID means the file was detected to change content while it was being sent and so we may not have received the valid content of the file. Added new MustVerify constructor for Verification, which forces verification even when annex.verify=false etc. This is used when INVALID and in protocol version 0. As well as changing git-annex-shell p2psdio, this makes git-annex tor remotes always force verification, since they don't yet use protocol version 1. Previously, annex.verify=false could skip verification when using tor remotes, and let bad data into the repository. This commit was sponsored by Jack Hill on Patreon.	2018-03-13 14:27:14 -04:00
Joey Hess	e16b069331	use total size from DATA Noticed that getting a key whose size is not known resulted in a progress display that didn't include the percent complete. Fixed for P2P by making the size sent with DATA be used to update the meter's total size. In order for rateLimitMeterUpdate to also learn the total size, had to make it be passed the Meter, and some other reorg in Utility.Metered was also done so that --json-progress can construct a Meter to pass to rateLimitMeterUpdate. When the fallback rsync is done, the progress display still doesn't include the percent complete. Only way to fix that seems to be to let rsync display its output again, but that would conflict with git-annex's own progress meter, which is also being displayed. This commit was sponsored by Henrik Riomar on Patreon.	2018-03-12 21:46:58 -04:00
Joey Hess	28589c92d2	no protocol 1 yet	2018-03-12 15:42:15 -04:00
Joey Hess	596af7cbc4	move protocol version stuff to the Net free monad Needs to be in Net not Local, so that Net actions can take the protocol version into account. This commit was sponsored by an anonymous bitcoin donor.	2018-03-12 15:20:51 -04:00
Joey Hess	c81768d425	version the P2P protocol Unfortunately ReceiveMessage didn't handle unknown messages the way it was documented to; client sending VERSION would cause the server to return an ERROR and hang up. Fixed that, but old releases of git-annex use the P2P protocol for tor and will still have that behavior. So, version is not negotiated for Remote.P2P connections, only for Remote.Git connections, which will support VERSION from their first release. There will need to be a later flag day to change Remote.P2P; left a commented out line that is the only thing that will need to be changed then. Version 1 of the P2P protocol is not implemented yet, but updated the docs for the DATA change that will be allowed by that version. This commit was sponsored by Jeff Goeke-Smith on Patreon.	2018-03-12 14:36:35 -04:00
Joey Hess	c036a380b2	p2p ssh connection pools Much like Remote.P2P, there's a pool of connections to a peer, in order to support concurrent operations. Deals with old git-annex-ssh on the remote that does not support p2pstdio, by only trying once to use it, and remembering if it's not supported. Made p2pstdio send an AUTH_SUCCESS with its uuid, which serves the dual purposes of something to detect to see that the connection is working, and a way to verify that it's connected to the right uuid. (There's a redundant uuid check since the uuid field is sent by git_annex_shell, but I anticipate that being removed later when the legacy git-annex-shell stuff gets removed.) Not entirely happy with Remote.Git.runSsh's behavior when the proto action fails. Running the fallback will work ok, but what will we do when the fallbacks later get removed? It might be better to try to reconnect, in case the connection got closed. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-03-08 15:11:31 -04:00
Joey Hess	6ddfa9807b	implemented git-annex-shell p2pstdio Not yet used by git-annex, but this will allow faster transfers etc than using individual ssh connections and rsync. Not called git-annex-shell p2p, because git-annex p2p does something else and I don't want two subcommands with the same name between the two for sanity reasons. This commit was sponsored by Øyvind Andersen Holm.	2018-03-07 15:38:01 -04:00
Joey Hess	f4103744c3	make sure that lockContentShared is always paired with an inAnnex check lockContentShared had a screwy caveat that it didn't verify that the content was present when locking it, but in the most common case, eg indirect mode, it failed to lock when the content is not present. That led to a few callers forgetting to check inAnnex when using it, but the potential data loss was unlikely to be noticed because it only affected direct mode I think. Fix data loss bug when the local repository uses direct mode, and a locally modified file is dropped from a remote repsitory. The bug caused the modified file to be counted as a copy of the original file. (This is not a severe bug because in such a situation, dropping from the remote and then modifying the file is allowed and has the same end result.) And, in content locking over tor, when the remote repository is in direct mode, it neglected to check that the content was actually present when locking it. This could cause git annex drop to remove the only copy of a file when it thought the tor remote had a copy. So, make lockContentShared do its own inAnnex check. This could perhaps be optimised for direct mode, to avoid the check then, since locking the content necessarily verifies it exists there, but I have not bothered with that. This commit was sponsored by Jeff Goeke-Smith on Patreon.	2018-03-07 14:23:52 -04:00
Joey Hess	572a45ae00	add readonly mode to serve P2P protocol This will be used by git-annex-shell when configured to be readonly. This commit was sponsored by Nick Daly on Patreon.	2018-03-07 13:15:55 -04:00
Joey Hess	ba53f60801	refactor	2018-03-06 15:14:53 -04:00
Joey Hess	73704b22a9	comment typo	2018-03-06 14:58:24 -04:00
Joey Hess	c8e1e3dada	AssociatedFile newtype To prevent any further mistakes like `301aff34c4` This commit was sponsored by Francois Marier on Patreon.	2017-03-10 13:35:31 -04:00
Joey Hess	00be07070c	fix build on windows	2016-12-30 12:31:51 -04:00

1 2 3 4 5 ...

304 commits