git-annex

Author	SHA1	Message	Date
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agoncy of making it build), but still very unmergable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically: num files old new speedup 48500 4.77 3.73 28% 12500 1.36 1.02 66% 20 0.075 0.074 0% (so startup time is unchanged) That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certianly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	e035bc5324	minor typos	2019-03-27 11:15:20 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	e89bb4361b	distinguish between cached and uncached creds p2p and multicast creds are not cached the same way that s3 and webdav creds are. The difference is that p2p and multicast obtain the creds themselves, as part of a process like pairing. So they're storing the only extant copy of the creds. In s3 and webdav etc the creds are provided by the cloud storage provider. This is a fine difference, but I do think it's a reasonable difference. If the user wants to prevent s3 and webdav etc creds from being stored unencrypted on disk, they won't feel the same about p2p auth tokens used for tor, or a multicast encryption key, or for that matter their local ssh private key. This commit was sponsored by Fernando Jimenez on Patreon.	2018-12-04 14:09:18 -04:00
Joey Hess	a0e573a3cc	comment typo	2018-11-12 11:53:44 -04:00
Joey Hess	6ecd55a9fa	Fixed some other potential hangs in the P2P protocol Finishes the start made in `983c9d5a53`, by handling the case where `transfer` fails for some other reason, and so the ReadContent callback does not get run. I don't know of a case where `transfer` does fail other than the locking dealt with in that commit, but it's good to have a guarantee. StoreContent and StoreContentTo had a similar problem. Things like `getViaTmp` may decide not to run the transfer action. And `transfer` could certianly fail, if another transfer of the same object was in progress. (Or a different object when annex.pidlock is set.) If the transfer action was not run, the content of the object would not all get consumed, and so would get interpreted as protocol commands, which would not go well. My approach to fixing all of these things is to set a TVar only once all the data in the transfer is known to have been read/written. This way the internals of `transfer`, `getViaTmp` etc don't matter. So in ReadContent, it checks if the transfer completed. If not, as long as it didn't throw an exception, send empty and Invalid data to the callback. On an exception the state of the protocol is unknown so it has to raise ProtoFailureException and close the connection, same as before. In StoreContent, if the transfer did not complete some portion of the DATA has been read, so the protocol is in an unknown state and it has to close the conection as well. (The ProtoFailureMessage used here matches the one in Annex.Transfer, which is the most likely reason. Not ideal to duplicate it..) StoreContent did not ever close the protocol connection before. So this is a protocol change, but only in an exceptional circumstance, and it's not going to break anything, because clients already need to deal with the connection breaking at any point. The way this new behavior looks (here origin has annex.pidlock = true so will only accept one upload to it at a time): git annex copy --to origin -J2 copy x (to origin...) ok copy y (to origin...) Lost connection (fd:25: hGetChar: end of file) This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-06 14:52:32 -04:00
Joey Hess	983c9d5a53	git-annex-shell: fix transfer hang Fix hang when transferring the same objects to two different clients at the same time. (Or when annex.pidlock is used, two different objects to the same or different clients.) Could also potentially occur if a client was downloading an object and somehow lost connection but that git-annex-shell was still running and holding the transfer lock. This does not guarantee that, if `transfer` fails for some other reason, a DATA response will be made. This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-06 13:00:37 -04:00
Joey Hess	0b053b9611	Fix a P2P protocol hang When readContent got Nothing from prepSendAnnex, it did not run its callback, and the callback is what sends the DATA reply. sendContent checks with contentSize that the object file is present, but that doesn't really guarantee that prepSendAnnex won't return Nothing. So, it was possible for a P2P protocol GET to not receive a response, and appear to hang. When what it's really doing is waiting for the next protocol command. This seems most likely to happen when the annex is in direct mode, and the file being requested has been modified. It could also happen in an indirect mode repository if genInodeCache somehow failed. Perhaps due to a race with a drop of the content file. Fixed by making readContent behave the way its spec said it should, and run the callback with L.empty in this case. Note that, it's finee for readContent to send any amount of data to the callback, including L.empty. sendBytes deals with that by making sure it sends exactly the specified number of bytes, aborting the protocol if it's too short. So, when L.empty is sent, the protocol will end up aborting. This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-02 13:41:50 -04:00
Joey Hess	7b33e6c9f3	simplify	2018-10-22 15:54:12 -04:00
Joey Hess	fcca7adaff	instrument P2P --debug with connection and thread info For debugging http://git-annex.branchable.com/bugs/annex_get_-J_16_via_ssh_stalls_/ This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-10-22 15:52:11 -04:00
Joey Hess	6134431254	clean P2P protocol shutdown on EOF try 2 Same goal as `b18fb1e343` but without breaking backwards compatability. Just return IO exceptions when running the P2P protocol, so that git-annex-shell can detect eof and avoid the ugly message. This commit was sponsored by Ethan Aubin.	2018-09-25 16:49:59 -04:00
Joey Hess	16cbecbd09	Revert "clean P2P protocol shutdown on EOF" This reverts commit `b18fb1e343`. That broke support for old git-annex-shell before p2pstdio was added. The immediate problem is that postAuth had a fallthrough case that sent an error back to the peer, but sending an error back when the connection is closed is surely not going to work. But thinking about it some more, making every function that uses receiveMessage need to handle ProtocolEOF adds a lot of complication, so I don't want to do that. The commit only cleaned up the test suite output a tiny bit, so I'm just gonna revert it for now.	2018-09-25 14:04:12 -04:00
Joey Hess	b18fb1e343	clean P2P protocol shutdown on EOF Avoids "git-annex-shell: <stdin>: hGetChar: end of file" being displayed by the test suite, due to the way it runs git-annex-shell without using ssh. git-annex-shell over ssh was not affected because git-annex hangs up the ssh connection and so never sees the error message that git-annnex-shell probably did emit. This commit was sponsored by Ryan Newton on Patreon.	2018-09-13 10:46:37 -04:00
Joey Hess	b657242f5d	enforce retrievalSecurityPolicy Leveraged the existing verification code by making it also check the retrievalSecurityPolicy. Also, prevented getViaTmp from running the download action at all when the retrievalSecurityPolicy is going to prevent verifying and so storing it. Added annex.security.allow-unverified-downloads. A per-remote version would be nice to have too, but would need more plumbing, so KISS. (Bill the Cat reference not too over the top I hope. The point is to make this something the user reads the documentation for before using.) A few calls to verifyKeyContent and getViaTmp, that don't involve downloads from remotes, have RetrievalAllKeysSecure hard-coded. It was also hard-coded for P2P.Annex and Command.RecvKey, to match the values of the corresponding remotes. A few things use retrieveKeyFile/retrieveKeyFileCheap without going through getViaTmp. * Command.Fsck when downloading content from a remote to verify it. That content does not get into the annex, so this is ok. * Command.AddUrl when using a remote to download an url; this is new content being added, so this is ok. This commit was sponsored by Fernando Jimenez on Patreon.	2018-06-21 13:37:01 -04:00
Joey Hess	4a3f1a15c5	improve indent	2018-06-14 11:40:23 -04:00
Joey Hess	85f9360d9b	GIT_ANNEX_SHELL_APPENDONLY Makes it allow writes, but not deletion of annexed content. Note that securing pushes to the git repository is left up to the user. This commit was sponsored by Jack Hill on Patreon.	2018-05-25 13:17:56 -04:00
Joey Hess	891d6d97f7	squash -Wsimplifiable-class-constraints warnings I have not tested this with older ghc than 8.2.2.	2018-04-22 13:42:21 -04:00
Joey Hess	46d4316954	implement annex.retry et al Added annex.retry, annex.retry-delay, and per-remote versions to configure transfer retries. This commit was supported by the NSF-funded DataLad project.	2018-03-29 13:04:07 -04:00
Joey Hess	31e1adc005	deal with unlocked files P2P protocol version 1 adds VALID\|INVALID after DATA; INVALID means the file was detected to change content while it was being sent and so we may not have received the valid content of the file. Added new MustVerify constructor for Verification, which forces verification even when annex.verify=false etc. This is used when INVALID and in protocol version 0. As well as changing git-annex-shell p2psdio, this makes git-annex tor remotes always force verification, since they don't yet use protocol version 1. Previously, annex.verify=false could skip verification when using tor remotes, and let bad data into the repository. This commit was sponsored by Jack Hill on Patreon.	2018-03-13 14:27:14 -04:00
Joey Hess	e16b069331	use total size from DATA Noticed that getting a key whose size is not known resulted in a progress display that didn't include the percent complete. Fixed for P2P by making the size sent with DATA be used to update the meter's total size. In order for rateLimitMeterUpdate to also learn the total size, had to make it be passed the Meter, and some other reorg in Utility.Metered was also done so that --json-progress can construct a Meter to pass to rateLimitMeterUpdate. When the fallback rsync is done, the progress display still doesn't include the percent complete. Only way to fix that seems to be to let rsync display its output again, but that would conflict with git-annex's own progress meter, which is also being displayed. This commit was sponsored by Henrik Riomar on Patreon.	2018-03-12 21:46:58 -04:00
Joey Hess	28589c92d2	no protocol 1 yet	2018-03-12 15:42:15 -04:00
Joey Hess	596af7cbc4	move protocol version stuff to the Net free monad Needs to be in Net not Local, so that Net actions can take the protocol version into account. This commit was sponsored by an anonymous bitcoin donor.	2018-03-12 15:20:51 -04:00
Joey Hess	c81768d425	version the P2P protocol Unfortunately ReceiveMessage didn't handle unknown messages the way it was documented to; client sending VERSION would cause the server to return an ERROR and hang up. Fixed that, but old releases of git-annex use the P2P protocol for tor and will still have that behavior. So, version is not negotiated for Remote.P2P connections, only for Remote.Git connections, which will support VERSION from their first release. There will need to be a later flag day to change Remote.P2P; left a commented out line that is the only thing that will need to be changed then. Version 1 of the P2P protocol is not implemented yet, but updated the docs for the DATA change that will be allowed by that version. This commit was sponsored by Jeff Goeke-Smith on Patreon.	2018-03-12 14:36:35 -04:00
Joey Hess	c036a380b2	p2p ssh connection pools Much like Remote.P2P, there's a pool of connections to a peer, in order to support concurrent operations. Deals with old git-annex-ssh on the remote that does not support p2pstdio, by only trying once to use it, and remembering if it's not supported. Made p2pstdio send an AUTH_SUCCESS with its uuid, which serves the dual purposes of something to detect to see that the connection is working, and a way to verify that it's connected to the right uuid. (There's a redundant uuid check since the uuid field is sent by git_annex_shell, but I anticipate that being removed later when the legacy git-annex-shell stuff gets removed.) Not entirely happy with Remote.Git.runSsh's behavior when the proto action fails. Running the fallback will work ok, but what will we do when the fallbacks later get removed? It might be better to try to reconnect, in case the connection got closed. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-03-08 15:11:31 -04:00
Joey Hess	6ddfa9807b	implemented git-annex-shell p2pstdio Not yet used by git-annex, but this will allow faster transfers etc than using individual ssh connections and rsync. Not called git-annex-shell p2p, because git-annex p2p does something else and I don't want two subcommands with the same name between the two for sanity reasons. This commit was sponsored by Øyvind Andersen Holm.	2018-03-07 15:38:01 -04:00
Joey Hess	f4103744c3	make sure that lockContentShared is always paired with an inAnnex check lockContentShared had a screwy caveat that it didn't verify that the content was present when locking it, but in the most common case, eg indirect mode, it failed to lock when the content is not present. That led to a few callers forgetting to check inAnnex when using it, but the potential data loss was unlikely to be noticed because it only affected direct mode I think. Fix data loss bug when the local repository uses direct mode, and a locally modified file is dropped from a remote repsitory. The bug caused the modified file to be counted as a copy of the original file. (This is not a severe bug because in such a situation, dropping from the remote and then modifying the file is allowed and has the same end result.) And, in content locking over tor, when the remote repository is in direct mode, it neglected to check that the content was actually present when locking it. This could cause git annex drop to remove the only copy of a file when it thought the tor remote had a copy. So, make lockContentShared do its own inAnnex check. This could perhaps be optimised for direct mode, to avoid the check then, since locking the content necessarily verifies it exists there, but I have not bothered with that. This commit was sponsored by Jeff Goeke-Smith on Patreon.	2018-03-07 14:23:52 -04:00
Joey Hess	572a45ae00	add readonly mode to serve P2P protocol This will be used by git-annex-shell when configured to be readonly. This commit was sponsored by Nick Daly on Patreon.	2018-03-07 13:15:55 -04:00
Joey Hess	ba53f60801	refactor	2018-03-06 15:14:53 -04:00
Joey Hess	73704b22a9	comment typo	2018-03-06 14:58:24 -04:00
Joey Hess	c8e1e3dada	AssociatedFile newtype To prevent any further mistakes like `301aff34c4` This commit was sponsored by Francois Marier on Patreon.	2017-03-10 13:35:31 -04:00
Joey Hess	00be07070c	fix build on windows	2016-12-30 12:31:51 -04:00
Joey Hess	b219be5100	refactor	2016-12-30 12:31:17 -04:00
Joey Hess	8484c0c197	Always use filesystem encoding for all file and handle reads and writes. This is a big scary change. I have convinced myself it should be safe. I hope!	2016-12-24 14:46:31 -04:00
Joey Hess	e08691b393	enable-tor: When run as a regular user, test a connection back to the hidden service over tor. This way we know that after enable-tor, the tor hidden service is fully published and working, and so there should be no problems with it at pairing time. It has to start up its own temporary listener on the hidden service. It would be nice to have it start the remotedaemon running, so that extra step is not needed afterwards. But, there may already be a remotedaemon running, in communication with the assistant and we don't want to start another one. I thought about trying to HUP any running remotedaemon, but Windows does not make it easy to do that. In any case, having the user start the remotedaemon themselves lets them know it needs to be running to serve the hidden service. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2016-12-24 12:50:23 -04:00
Joey Hess	f3a4b9191c	refactor	2016-12-24 12:14:14 -04:00
Joey Hess	22252e8e4c	Revert "close" This reverts commit `3aaabc906b`. Commit contained incomplete work.	2016-12-24 12:07:15 -04:00
Joey Hess	3aaabc906b	close	2016-12-22 13:59:21 -04:00
Joey Hess	405fbd25e1	include tor-annex in hidden service directory names To make it easier to manage/delete them etc. Backwards compatablity is preserved for existing tor configs.	2016-12-21 14:39:32 -04:00
Joey Hess	38f9337e16	Revert "p2p --link now defaults to setting up a bi-directional link" This reverts commit `3037feb1bf`. On second thought, this was an overcomplication of what should be the lowest-level primitive. Let's build bi-directional links at the pairing level with eg magic wormhole.	2016-12-16 18:26:07 -04:00
Joey Hess	3037feb1bf	p2p --link now defaults to setting up a bi-directional link Both the local and remote git repositories get remotes added pointing at one-another. Makes pairing twice as easy! Security: The new LINK command in the protocol can be sent repeatedly, but only by a peer who has authenticated with us. So, it's entirely safe to add a link back to that peer, or to some other peer it knows about. Anything we receive over such a link, the peer could send us over the current connection. There is some risk of being flooded with LINKs, and adding too many remotes. To guard against that, there's a hard cap on the number of remotes that can be set up this way. This will only be a problem if setting up large p2p networks that have exceptional interconnectedness. A new, dedicated authtoken is created when sending LINK. This also allows, in theory, using a p2p network like tor, to learn about links on other networks, like telehash. This commit was sponsored by Bruno BEAUFILS on Patreon.	2016-12-16 16:38:06 -04:00
Joey Hess	16c6333f09	fix build with old ghc	2016-12-10 11:12:18 -04:00
Joey Hess	fa1b3a19f9	hang up connection after relaying Seems that git upload-pack outputs a "ONCDN " that is not read by the remote git receive-pack. This fixes: [2016-12-09 17:08:32.77159731] P2P > ERROR protocol parse error: "ONCDN "	2016-12-09 17:11:16 -04:00
Joey Hess	52ccd44812	avoid exposing auth tokens in debug	2016-12-09 16:55:48 -04:00
Joey Hess	217c3b0a21	debug dump P2P messages	2016-12-09 16:45:36 -04:00
Joey Hess	9dd510bf29	make tor hidden service work when directory watching is not available Avoid crashing when built w/o inotify..	2016-12-09 16:40:47 -04:00
Joey Hess	2c907fff51	remotedaemon: git change detection over tor hidden service	2016-12-09 16:02:43 -04:00
Joey Hess	f7687e0876	only start ref change watcher thread once per P2P connection This is more efficient. Note that the peer will get CHANGED messages for all refs changed since the connection opened, even if those changes happened before it sent NOTIFYCHANGE.	2016-12-09 15:08:54 -04:00
Joey Hess	e152c322f8	refactor ref change watching Added to change notification to P2P protocol. Switched to a TBChan so that a single long-running thread can be started, and serve perhaps intermittent requests for change notifications, without buffering all changes in memory. The P2P runner currently starts up a new thread each times it waits for a change, but that should allow later reusing a thread. Although each connection from a peer will still need a new watcher thread to run. The dependency on stm-chans is more or less free; some stuff in yesod uses it, so it was already indirectly pulled in when building with the webapp. This commit was sponsored by Francois Marier on Patreon.	2016-12-09 15:01:09 -04:00
Joey Hess	15be5c04a6	git-annex-shell, remotedaemon, git remote: Fix some memory DOS attacks. The attacker could just send a very lot of data, with no \n and it would all be buffered in memory until the kernel killed git-annex or perhaps OOM killed some other more valuable process. This is a low impact security hole, only affecting communication between local git-annex and git-annex-shell on the remote system. (With either able to be the attacker). Only those with the right ssh key can do it. And, there are probably lots of ways to construct git repositories that make git use a lot of memory in various ways, which would have similar impact as this attack. The fix in P2P/IO.hs would have been higher impact, if it had made it to a released version, since it would have allowed DOSing the tor hidden service without needing to authenticate. (The LockContent and NotifyChanges instances may not be really exploitable; since the line is read and ignored, it probably gets read lazily and does not end up staying buffered in memory.)	2016-12-09 13:34:32 -04:00
Joey Hess	bdf2a31424	typo	2016-12-09 12:54:12 -04:00

1 2

85 commits