git-annex

Author	SHA1	Message	Date
Joey Hess	e73bbdd95c	enable servant in windows build	2024-07-29 17:24:31 -04:00
Joey Hess	1467fed572	fix build with old text Don't need decodeUtf8Lenient here because B64.encode surely always generates utf8. So decodeUtf8 is safe, it will never throw an exception.	2024-07-29 17:21:41 -04:00
Joey Hess	dc11c5f493	final fix to windows build	2024-07-29 16:32:24 -04:00
Joey Hess	73703d1bef	close	2024-07-29 15:15:40 -04:00
Joey Hess	be8e1ab512	more fixes to windows build for content retention files Will probably build successfully now. Still untested.	2024-07-29 15:14:12 -04:00
Joey Hess	647bff9770	more fixes to windows build for content retention files	2024-07-29 13:58:40 -04:00
Joey Hess	fcc052bed8	When proxying an upload to a special remote, verify the hash. While usually uploading to a special remote does not verify the content, the content in a repository is assumed to be valid, and there is no trust boundary. But with a proxied special remote, there may be users who are allowed to store objects, but are not really trusted. Another way to look at this is it's the equivalent of git-annex-shell checking the hash of received data, which it does (see StoreContent implementation).	2024-07-29 13:40:51 -04:00
Joey Hess	960daf210b	run with noMessages This avoids extraneous output from p2phttp, including, e.g., progress displays when transferring to proxied special remotes.	2024-07-29 13:35:08 -04:00
Joey Hess	65da672ae6	add libghc-servant-client-core-dev dep	2024-07-29 13:10:40 -04:00
Joey Hess	074fad819d	changelog	2024-07-29 13:09:19 -04:00
Joey Hess	380af6ac5f	update github badges Seems the urls changed and the old ones will be falsely green forever. Found new ones in readme at https://github.com/datalad/git-annex	2024-07-29 13:00:00 -04:00
Joey Hess	f397296739	add missing do on windows	2024-07-29 12:54:52 -04:00
Joey Hess	b4eb6e3ced	comment	2024-07-29 11:59:33 -04:00
Joey Hess	321e2adf66	don't think I ever implemented the 422 idea, it will 404	2024-07-29 11:49:40 -04:00
Joey Hess	d3f584fcdb	wording	2024-07-29 11:44:44 -04:00
Joey Hess	5f5c29fbe7	link	2024-07-29 11:43:30 -04:00
Joey Hess	f3b207a4b9	wording	2024-07-29 11:37:13 -04:00
Joey Hess	6068379e80	typo	2024-07-29 11:34:46 -04:00
Joey Hess	db66612b8f	Merge branch 'httpproto'	2024-07-29 11:33:39 -04:00
Joey Hess	74f81ebd04	Merge remote-tracking branch 'origin/httpproto'	2024-07-29 11:25:27 -04:00
Joey Hess	6f20085a60	update	2024-07-29 11:25:07 -04:00
Joey Hess	60b1c53df5	preparing to merge	2024-07-29 11:22:27 -04:00
Joey Hess	0dc064a9ad	When proxying for a special remote, avoid unnecessary hashing Like the comment says, the client will do its own verification. But it was calling verifyKeyContentPostRetrieval, which was hashing the file.	2024-07-29 11:18:03 -04:00
Joey Hess	7402ae61d9	fix reversion in GET from proxy over http `4f3ae96666` caused a hang in GET, which git-annex testremote could reliably cause. The problem is that closing both P2P handles before waiting on the asyncworker prevents all the DATA from getting sent. The solution is to only close the P2P handles early when the P2PConnection is being closed. When it's being released, let the asyncworker finish. closeP2PConnection is called in GET when it was unable to send all data, and in PUT when it did not receive all the data, and in both cases closing the P2P handles early is ok.	2024-07-29 11:07:09 -04:00
Joey Hess	6af44b9de6	p2phttp remotes are not readonly That prevented testremote from working when remote.name.url = http://..	2024-07-29 10:54:14 -04:00
Joey Hess	4f3ae96666	cleanly close proxy connection on interrupted PUT An interrupted PUT to a cluster that has a node that is a special remote over http left open the connection to the cluster, so the next request opens another one. So did an interrupted PUT directly to the proxied special remote over http. proxySpecialRemote was stuck waiting for all the DATA. Its connection remained open so it kept waiting. In servePut, checktooshort handles closing the P2P connection when too little data is received from PUT. But, checktooshort was only called after the protoaction, which is what runs the proxy, which is what was getting stuck. Modified it to run as a background thread, which waits for the tooshortv to be written to, which gather always does once it gets to the end of the data received from the http client. That makes proxyConnection's releaseconn run once all data is received from the http client. Made it close the connection handles before waiting on the asyncworker thread. This lets proxySpecialRemote finish processing any data from the handle, and then it will give up, more or less cleanly, if it didn't receive enough data. I say "more or less cleanly" because with both sides of the P2P connection taken down, some protocol unhappiness results. Which can lead to some ugly debug messages. But also can cause the asyncworker thread to throw an exception. So made withP2PConnections not crash when it receives an exception from releaseconn. This did have a small change to the behavior of an interrupted PUT when proxying to a regular remote. proxyConnection has a protoerrorhandler that closes the proxy connection on a protocol error. But the proxy connection is also closed by checktooshort when it closes the P2P connection. Closing the same proxy connection twice is not a problem, it just results in duplicated debug messages about it.	2024-07-29 10:37:19 -04:00
Joey Hess	c8e7231f48	add debugging of opening and closing connections to proxies	2024-07-29 09:52:26 -04:00
Joey Hess	7ac8d36f38	idea	2024-07-29 09:11:27 -04:00
stv0g	6352cebb92	Added a comment: importtree=yes Support	2024-07-29 06:50:01 +00:00
Joey Hess	5ef3f1e703	remove unused imports	2024-07-28 21:11:23 -04:00
Joey Hess	cd89f91aa5	remove uuid from annex+http urls Not needed it turns out.	2024-07-28 20:29:42 -04:00
Joey Hess	bc9cc79e85	set remote's annexUrl automatically When the remote repository's git config file has annex.url set to an annex+http url.	2024-07-28 20:13:41 -04:00
Joey Hess	c87cfe1e00	todo	2024-07-28 17:29:32 -04:00
Joey Hess	ccbdaf0448	documentation for p2phttp	2024-07-28 17:19:27 -04:00
Joey Hess	dfe65b92c8	avoid repeatedly parsing the proxy log	2024-07-28 16:04:20 -04:00
Joey Hess	2fdec6b4e1	update	2024-07-28 15:55:24 -04:00
Joey Hess	ddabc138ec	todo	2024-07-28 15:41:31 -04:00
Joey Hess	cdc4bd7443	fix hang in PUT of large file to a special remote node of a cluster over http	2024-07-28 15:34:59 -04:00
Joey Hess	18ed4e5b20	use closedv rather than separate endv Doesn't fix any known problem, but this way if the connection does get closed, it will notice.	2024-07-28 15:11:31 -04:00
Joey Hess	66679c9bb4	remove temp file after upload to special remote	2024-07-28 14:36:45 -04:00
Joey Hess	9461793ffc	Merge remote-tracking branch 'origin/master' into httpproto	2024-07-28 14:24:15 -04:00
Joey Hess	ccd102cd19	update	2024-07-28 14:22:44 -04:00
Joey Hess	5e205f215d	clean shut down of cluster connection when PUT is interrupted An interrupted `git-annex copy --to` a cluster via the http server, when repeated, failed. The http server output "transfer already in progress, or unable to take transfer lock". Apparently a second connection was opened to the cluster, because the first connection never got shut down. Turned out the problem was that when proxying to a cluster, it would read a short ByteString from the client, and send that to the nodes. But that left the nodes wanting more. Meanwhile, the proxy was expecting a SUCCESS/FAILURE message from the nodes. So it didn't return, and so the cluster connection stayed open.	2024-07-28 14:20:11 -04:00
Joey Hess	bdde6d829c	fix http proxying for a local git remote with a relative path git-annex-shell expects an absolute path	2024-07-28 13:35:51 -04:00
Joey Hess	41667ad36b	found some bugs with clusters	2024-07-28 13:00:05 -04:00
Joey Hess	6722a61a21	clusters need enableInteractiveBranchAccess As seen in commit `770aac97a7`, a cluster relies on accurate location logs. If long-running processes are serving a cluster, and one process puts a file, the other process needs to see what nodes it was stored on when checking if the file is present.	2024-07-28 12:39:42 -04:00
Joey Hess	bd3d327d8a	smarter BranchState cache invalidation Only invalidate a just-written file in the cache, not the whole cache. This will avoid the possible performance impact of cache invalidation mentioned in commit `770aac97a7`	2024-07-28 12:33:32 -04:00
Joey Hess	770aac97a7	share single BranchState among all threads This fixes a problem when git-annex testremote is run against a cluster accessed via the http server. Annex.Cluster uses the location log to find nodes that contain a key when checking if the key is present or getting it. Just after a key was stored to a cluster node, reading the location log was not getting the UUID of that node. Apparently the Annex action that wrote to the location log, and the one that read from it were run with two different Annex states. The http server does use several different Annex threads. BranchState was part of the AnnexState, and so two threads could have different BranchStates. Moved BranchState to the AnnexRead, so all threads will see the common state. This might possibly impact performance. If one thread is writing changes to the branch, and another thread is reading from the branch, the writing thread will now invalidate the BranchState's cache, which will cause the reading thread to need to do extra work. But correctness is surely more important. If this is found to have impacted performance, it could probably be dealt with by doing smarter BranchState cache invalidation. Another way this might impact performance is that the BranchState has a small cache. If several threads were reading from the branch and relying on the value they just read still being in the cache, now a cache miss will be more likely. Increasing the BranchState cache to the number of jobs might be a good idea to ameliorate that. But the cache is currently an inefficient list, so making it large would need changes to the data types. (Commit `4304f1b6ae` dealt with a follow-on effect of the bug fixed here.)	2024-07-28 12:30:27 -04:00
Joey Hess	4304f1b6ae	better handling of content not available from cluster Sending ERROR caused the client to get confused and protocol to freeze. Better to send empty DATA and indicate it's not valid. This fixes a hang in git-annex testremote of a cluster accessed via the http server. That testremote is still failing; for some reason, after storing a test key, the cluster reports it as not present.	2024-07-28 11:09:07 -04:00
Joey Hess	fbbedae497	add --clusterjobs option and default to 1 The default of 1 is not ideal at all, but it avoids an accidental M*N causing so much concurrency it becomes unusable.	2024-07-28 10:36:22 -04:00

45348 commits