git-annex

Author	SHA1	Message	Date
Joey Hess	84d1bb746b	LiveUpdate for clusters	2024-08-24 10:20:12 -04:00
Joey Hess	c3d40b9ec3	plumb in LiveUpdate (WIP) Each command that first checks preferred content (and/or required content) and then does something that can change the sizes of repositories needs to call prepareLiveUpdate, and plumb it through the preferred content check and the location log update. So far, only Command.Drop is done. Many other commands that don't need to do this have been updated to keep working. There may be some calls to NoLiveUpdate in places where that should be done. All will need to be double checked. Not currently in a compilable state.	2024-08-23 16:35:12 -04:00
Joey Hess	1038567881	proxy stores received keys to known export locations This handles the workflow where the branch is first pushed to the proxy, and then files in the exported tree are later copied to the proxied remote. Turns out that the way the export log is structured, nothing needs to be done to finalize the export once the last key is sent to it. Which is great because that would have been a lot of complication. On receiving the push, Command.Export runs and calls recordExportBeginning, does as much as it can to update the export with the files currently on it, and then calls recordExportUnderway. At that point, the export.log records the export as "complete", but it's not really. And that's fine. The same happens when using `git-annex export` when some files are not available to send. Other repositories that have access to the special remote can already retrieve files from it. As the missing files get copied to the exported remote, all that needs to be done is record each in the export db. At this point, proxying to exporttree=yes annexobjects=yes special remotes is fully working. Except for in the case where multiple files in the tree use the same key, and the files are sent to the proxied remote before pushing the tree. It seems that even special remotes without annexobjects=yes will work if used with the workflow where the git-annex branch is pushed before copying files. But not with the `git-annex push` workflow.	2024-08-07 09:47:34 -04:00
Joey Hess	c53f61e93f	Merge branch 'master' into exportreeplus	2024-08-06 14:46:33 -04:00
Joey Hess	3cc03b4c96	fix file corruption when proxying an upload to a special remote The file corruption consists of each chunk of the file being duplicated. Since chunks are typically a fixed size, it would certainly be possible to get from a corrupted file back to the original file. But this is still bad data loss. Reversion was in commit `fcc052bed8`. Luckily that did not make the most recent release.	2024-08-06 14:41:19 -04:00
Joey Hess	3289b1ad02	proxying to exporttree=yes annexobjects=yes basically working It works when using git-annex sync/push/assist, or when manually sending all content to the proxied remote before pushing to the proxy remote. But when the push comes before the content is sent, sending content does not update the exported tree.	2024-08-06 14:21:23 -04:00
Joey Hess	fcc052bed8	When proxying an upload to a special remote, verify the hash. While usually uploading to a special remote does not verify the content, the content in a repository is assumed to be valid, and there is no trust boundary. But with a proxied special remote, there may be users who are allowed to store objects, but are not really trusted. Another way to look at this is it's the equivalent of git-annex-shell checking the hash of received data, which it does (see StoreContent implementation).	2024-07-29 13:40:51 -04:00
Joey Hess	0dc064a9ad	When proxying for a special remote, avoid unnecessary hashing Like the comment says, the client will do its own verification. But it was calling verifyKeyContentPostRetrieval, which was hashing the file.	2024-07-29 11:18:03 -04:00
Joey Hess	dfe65b92c8	avoid repeatedly parsing the proxy log	2024-07-28 16:04:20 -04:00
Joey Hess	cdc4bd7443	fix hang in PUT of large file to a special remote node of a cluster over http	2024-07-28 15:34:59 -04:00
Joey Hess	18ed4e5b20	use closedv rather than separate endv Doesn't fix any known problem, but this way if the connection does get closed, it will notice.	2024-07-28 15:11:31 -04:00
Joey Hess	66679c9bb4	remove temp file after upload to special remote	2024-07-28 14:36:45 -04:00
Joey Hess	ef8f24f28c	fix PUT to http proxied special remote It was hanging because it never sent FAILURE in the INVALID case. And putoffset always triggers the INVALID case.	2024-07-28 09:14:42 -04:00
Joey Hess	267a202e72	clean up after http p2p proxy GET is interrupted There was an annex worker thread that did not get stopped. It was stuck in ReceiveMessage from the P2PHandleTMVar. Fixed by making P2PHandleTMVar closeable. In serveGet, releaseP2PConnection has to come first, else the annexworker may not shut down, if it's waiting to read from it. In proxyConnection, call closeRemoteSide in order to wait for the ssh process (for example).	2024-07-26 15:33:20 -04:00
Joey Hess	cc1da2d516	http p2p proxy is now largely working	2024-07-26 10:44:10 -04:00
Joey Hess	b391756b32	remove some debugging	2024-07-25 21:36:10 -04:00
Joey Hess	3d14e2cf58	http server support for proxies, incomplete Refactored git-annex-shell code so this can use checkCanProxy'. At this point all that remains is opening a proxy connection, and using a proxy connection.	2024-07-25 13:19:24 -04:00
Joey Hess	4826a3745d	servePut and clientPut implementation Made the data-length header required even for v0. This simplifies the implementation, and doesn't preclude extra verification being done for v0. The connectionWaitVar is an ugly hack. In servePut, nothing waits on the waitvar, and I could not find a good way to make anything wait on it.	2024-07-22 10:27:44 -04:00
Joey Hess	74c6175795	fix serveGet early handle close Needed that waitv after all..	2024-07-11 09:55:17 -04:00
Joey Hess	3b37b9e53f	fix serveGet hang This came down to SendBytes waiting on the waitv. Nothing ever filled it. Only Annex.Proxy needs the waitv, and it handles filling it. So make it optional.	2024-07-11 07:46:52 -04:00
Joey Hess	1243af4a18	toward SafeDropProof expiry checking Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is based on a LockedCopy. If there are several LockedCopies, it uses the closest expiry time. That is not optimal, it may be that the proof expires based on one LockedCopy but another one has not expired. But that seems unlikely to really happen, and anyway the user can just re-run a drop if it fails due to expiry. Pass the SafeDropProof to removeKey, which is responsible for checking it for expiry in situations where that could be a problem. Which really only means in Remote.Git. Made Remote.Git check expiry when dropping from a local remote. Checking expiry when dropping from a P2P remote is not yet implemented. P2P.Protocol.remove has SafeDropProof plumbed through to it for that purpose. Fixing the remaining 2 build warnings should complete this work. Note that the use of a POSIXTime here means that if the clock gets set forward while git-annex is in the middle of a drop, it may say that dropping took too long. That seems ok. Less ok is that if the clock gets turned back a sufficient amount (eg 5 minutes), proof expiry won't be noticed. It might be better to use the Monotonic clock, but that doesn't advance when a laptop is suspended, and while there is the linux Boottime clock, that is not available on other systems. Perhaps a combination of POSIXTime and the Monotonic clock could detect laptop suspension and also detect clock being turned back? There is a potential future flag day where p2pDefaultLockContentRetentionDuration is not assumed, but is probed using the P2P protocol, and peers that don't support it can no longer produce a LockedCopy. Until that happens, when git-annex is communicating with older peers there is a risk of data loss when a ssh connection closes during LOCKCONTENT.	2024-07-04 12:39:06 -04:00
Joey Hess	fa5e7463eb	fix display when proxied GET yields ERROR The error message is not displayed to the user, but this mirrors the behavior when a regular get from a special remote fails. At least now there is not a protocol error.	2024-07-01 11:19:02 -04:00
Joey Hess	dce3848ad8	avoid populating proxy's object file when storing on special remote Now that storeKey can have a different object file passed to it, this complication is not needed. This avoids a lot of strange situations, and will also be needed if streaming is eventually supported.	2024-07-01 10:53:49 -04:00
Joey Hess	8b5fc94d50	add optional object file location to storeKey This will be used by the next commit to simplify the proxy.	2024-07-01 10:42:27 -04:00
Joey Hess	711a5166e2	PUT to proxied special remote working Still needs some work. The reason that the waitv is necessary is because without it, runNet loops back around and reads the next protocol message. But it's not finished reading the whole bytestring yet, and so it reads some part of it.	2024-06-28 17:10:58 -04:00
Joey Hess	2e5af38f86	GET from proxied special remote Working, but lots of room for improvement... Without streaming, so there is a delay before download begins as the file is retrieved from the special remote. And when resuming it retrieves the whole file from the special remote again. Also, if the special remote throws an exception, currently it shows as "protocol error".	2024-06-28 15:44:48 -04:00
Joey Hess	158d7bc933	fix handling of ERROR in response to REMOVE This allows an error message from a proxied special remote to be displayed to the client. In the case where removal from several nodes of a cluster fails, there can be several errors. What to do? I decided to only show the first error to the user. Probably in this case the user is not in a position to do anything about an error message, so best keep it simple. If the problem with the first node is fixed, they'll see the error from the next node.	2024-06-28 14:10:25 -04:00
Joey Hess	a6ea057f6b	fix handling of ERROR in response to CHECKPRESENT That error is now rethrown on the client, so it will be displayed. For example: $ git-annex fsck x --fast --from AMS-dir fsck x (special remote reports: directory /home/joey/tmp/bench2/dir is not accessible) failed No protocol version check is needed. Because in order to talk to a proxied special remote, the client has to be running the upcoming git-annex release. Which has this fix in it.	2024-06-28 13:46:27 -04:00
Joey Hess	d3c75c003a	proxying special remotes This is early, but already working for CHECKPRESENT. However, when the special remote throws an exception on checkPresent, this happens: [2024-06-28 13:22:18.520884287] (P2P.IO) [ThreadId 4] P2P > ERROR directory /home/joey/tmp/bench2/dir is not accessible [2024-06-28 13:22:18.521053135] (P2P.IO) [ThreadId 4] P2P < ERROR expected SUCCESS or FAILURE git-annex: client error: expected SUCCESS or FAILURE (fixing location log) p2pstdio: 1 failed Based on the location log, x was expected to be present, but its content is missing. failed	2024-06-28 13:31:19 -04:00
Joey Hess	cf59d7f92c	GET and CHECKPRESENT among lowest cost cluster nodes Before it was using a node that might have had a higher cost. Also threw in a random selection from among the low cost nodes. Of course this is a poor excuse for load balancing, but it's better than nothing. Most of the time...	2024-06-27 14:36:55 -04:00
Joey Hess	3dad9446ce	distributed cluster cycle prevention Added BYPASS to P2P protocol, and use it to avoid cycling between cluster gateways. Distributed clusters are working well now!	2024-06-27 12:20:22 -04:00
Joey Hess	d34326ab76	factor out Annex.Proxy	2024-06-18 10:51:37 -04:00
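
Commit cf59d7f92c describes serving GET and CHECKPRESENT from the lowest cost cluster nodes, with a random pick among nodes that tie on cost. The following is a minimal Python sketch of that idea only. git-annex itself is written in Haskell, and the names `Node` and `pick_lowest_cost_node` are hypothetical, introduced here for illustration rather than taken from its source.

```python
import random
from typing import NamedTuple, Optional, Sequence


class Node(NamedTuple):
    """Hypothetical stand-in for a cluster node with a configured cost."""
    name: str
    cost: float


def pick_lowest_cost_node(
    nodes: Sequence[Node],
    rng: Optional[random.Random] = None,
) -> Optional[Node]:
    """Return a random node from among those sharing the lowest cost.

    Returns None when there are no nodes to choose from.
    """
    if not nodes:
        return None
    rng = rng or random.Random()
    lowest = min(node.cost for node in nodes)
    # Keep only the nodes tied for the lowest cost, then choose one at random.
    candidates = [node for node in nodes if node.cost == lowest]
    return rng.choice(candidates)
```

As the commit message itself notes, a random choice among equal-cost nodes is only a rough form of load balancing.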

32 commits
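
Commit 1243af4a18 describes a SafeDropProof that carries an optional expiry time, set to the closest expiry among the LockedCopies it is based on and checked before a key is removed. A minimal Python sketch of that rule follows, under stated assumptions: the function names are hypothetical, times are plain POSIX timestamps in seconds, and treating a proof as expired when the current time equals the expiry is a choice made here, not something the commit message specifies.

```python
import time
from typing import Optional, Sequence


def proof_expiry(locked_copy_expiries: Sequence[float]) -> Optional[float]:
    """Return the closest (earliest) expiry among the locked copies.

    Returns None when the proof is not based on any locked copy,
    mirroring the optional expiry described in the commit message.
    """
    if not locked_copy_expiries:
        return None
    return min(locked_copy_expiries)


def proof_expired(expiry: Optional[float], now: Optional[float] = None) -> bool:
    """Report whether a proof with the given expiry can no longer be relied on.

    A proof without an expiry never expires. The wall clock is used by
    default, so clock changes during a drop affect the result, as the
    commit message discusses.
    """
    if expiry is None:
        return False
    if now is None:
        now = time.time()
    # Assumption: reaching the expiry time counts as expired.
    return now >= expiry
```

Taking the earliest expiry is conservative: as the commit message points out, the proof may be reported as expired while another locked copy is still valid, in which case the drop can simply be re-run.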