git-annex/CmdLine/GitRemoteTorAnnex.hs

{- git-remote-tor-annex program
 -
 - Copyright 2016 Joey Hess <id@joeyh.name>
 -
 - Licensed under the GNU AGPL version 3 or higher.
 -}

module CmdLine.GitRemoteTorAnnex where

import Common
import qualified Annex
import qualified Git.CurrentRepo
import P2P.Protocol
import P2P.IO
import Utility.Tor
import Utility.AuthToken
import Annex.UUID
import P2P.Address
import P2P.Auth
import Annex.Action

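{- Speaks the git remote helper protocol on stdin and stdout. Only the
 - "connect" capability is advertised; the requested git service is
 - then relayed to the peer over the tor hidden service. -}
run :: [String] -> IO ()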
run (_remotename:address:[]) = forever $
	getLine >>= \case
		"capabilities" -> putStrLn "connect" >> ready
		"connect git-upload-pack" -> go UploadPack
		"connect git-receive-pack" -> go ReceivePack
		l -> error $ "git-remote-helpers protocol error at " ++ show l
  where
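	{- Anything up to and including the last '/' in the address is
	 - ignored; only the remainder is parsed as address:port. -}
	(onionaddress, onionport)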
		| '/' `elem` address = parseAddressPort $
			reverse $ takeWhile (/= '/') $ reverse address
		| otherwise = parseAddressPort address
	go service = do
		ready
		connectService onionaddress onionport service >>= \case
			Right exitcode -> exitWith exitcode
			Left e -> giveup $ describeProtoFailure e
	ready = do
		putStrLn ""
		hFlush stdout

run (_remotename:[]) = giveup "remote address not configured"
run _ = giveup "expected remote name and address parameters"

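{- Splits "address:port" at the first ':'. Gives up when no port number
 - can be read from the part after it. -}
parseAddressPort :: String -> (OnionAddress, OnionPort)
parseAddressPort s =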
	let (a, sp) = separate (== ':') s
	in case readish sp of
		Nothing -> giveup "onion address must include port number"
		Just p -> (OnionAddress a, p)

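{- Connects to the peer over tor, authenticates with the stored auth
 - token (or the null token when none is stored), and relays the git
 - service over stdin and stdout. The Annex state is quiesced before
 - returning. -}
connectService :: OnionAddress -> OnionPort -> Service -> IO (Either ProtoFailure ExitCode)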
connectService address port service = do
	state <- Annex.new =<< Git.CurrentRepo.get
	Annex.eval state $ do
		authtoken <- fromMaybe nullAuthToken
			<$> loadP2PRemoteAuthToken (TorAnnex address port)
		myuuid <- getUUID
		g <- Annex.gitRepo
		conn <- liftIO $ connectPeer g (TorAnnex address port)
		runst <- liftIO $ mkRunState Client
		r <- liftIO $ runNetProto runst conn $ auth myuuid authtoken noop >>= \case
			Just _theiruuid -> connect service stdin stdout
			Nothing -> giveup $ "authentication failed, perhaps you need to set " ++ p2pAuthTokenEnv
		quiesce False
		return r
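
The dispatch that `run` performs on each line read from git can be looked at in isolation. What follows is a minimal standalone sketch, not part of the module: `Reply` and `respond` are hypothetical names, and `Service` is a local stand-in for the type from P2P.Protocol. It only models which reply each request line maps to. In the real program the "capabilities" reply is the line "connect" followed by a blank line, and a "connect" request is answered with a blank line before the service is relayed.

```haskell
-- Local stand-in for the Service type; an assumption for this sketch.
data Service = UploadPack | ReceivePack
    deriving (Show, Eq)

-- Hypothetical type describing what the helper does with a request line.
data Reply
    = Capabilities [String]  -- capability lines, then a blank line
    | Connect Service        -- blank line, then relay the service
    | ProtocolError String   -- any other input is treated as an error
    deriving (Show, Eq)

-- Pure version of the case expression in run.
respond :: String -> Reply
respond "capabilities" = Capabilities ["connect"]
respond "connect git-upload-pack" = Connect UploadPack
respond "connect git-receive-pack" = Connect ReceivePack
respond l = ProtocolError l

main :: IO ()
main = mapM_ (print . respond)
    [ "capabilities"
    , "connect git-upload-pack"
    , "list"
    ]
```

Keeping the mapping pure like this makes it easy to check without a running tor hidden service; the IO parts (flushing stdout, exiting with the relayed exit code) stay in `run`.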
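
The address handling in `run` and `parseAddressPort` can be sketched with only base. This is an illustration under stated assumptions, not the module's code: `splitAddressPort` is a hypothetical name, `break` and `readMaybe` stand in for the `separate` and `readish` helpers, plain `String` and `Int` stand in for `OnionAddress` and `OnionPort`, and failure is returned as `Nothing` rather than reported with `giveup`. The stand-ins may behave differently from the real helpers in edge cases.

```haskell
import Text.Read (readMaybe)

-- Hypothetical helper: drop everything up to the last '/', then split
-- the remainder at the first ':' into an address and a port.
splitAddressPort :: String -> Maybe (String, Int)
splitAddressPort s =
    case break (== ':') afterSlash of
        (a, ':' : sp) -> (,) a <$> readMaybe sp
        _ -> Nothing
  where
    -- When s contains no '/', this is the whole string.
    afterSlash = reverse (takeWhile (/= '/') (reverse s))

main :: IO ()
main = do
    print (splitAddressPort "example.onion:4321")
    print (splitAddressPort "a/b/example.onion:4321")
    print (splitAddressPort "example.onion")
```

The first two calls yield the same address and port, and the last yields `Nothing` because no port is present.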