2016-11-21 21:27:38 +00:00
{- git-remote-tor-annex program
 -
 - Copyright 2016 Joey Hess <id@joeyh.name>
 -
2019-03-13 19:48:14 +00:00
 - Licensed under the GNU AGPL version 3 or higher.
2016-11-21 21:27:38 +00:00
 -}

module CmdLine.GitRemoteTorAnnex where

import Common
import qualified Annex
import qualified Git.CurrentRepo
2016-11-24 20:36:16 +00:00
import P2P.Protocol
import P2P.IO
2016-11-21 21:27:38 +00:00
import Utility.Tor
2016-11-22 18:18:34 +00:00
import Utility.AuthToken
2016-11-21 21:27:38 +00:00
import Annex.UUID
2016-11-30 19:26:16 +00:00
import P2P.Address
import P2P.Auth
avoid flushing keys db queue after each Annex action

The flush was only done in Annex.run' to make sure that the queue was flushed
before git-annex exits. But, doing it there means that as soon as one
change gets queued, it gets flushed soon after, which contributes to
excessive writes to the database, slowing git-annex down.
(This does not yet speed git-annex up, but it is a stepping stone to
doing so.)

Database queues do not autoflush when garbage collected, so have to
be flushed explicitly. I don't think it's possible to make them
autoflush (except perhaps if git-annex switched to using ResourceT..).

The comment in Database.Keys.closeDb used to be accurate, since the
automatic flushing did mean that all writes reached the database even
when closeDb was not called. But now, closeDb or flushDb needs to be
called before stopping using an Annex state. So, removed that comment.

In Remote.Git, change to using quiesce everywhere that it used to use
stopCoProcesses. This means that uses of onLocal in there are just as
slow as before. I considered only calling closeDb on the local git remotes
when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses
in each onLocal is so as not to leave git processes running that have files
open on the remote repo, when it's on removable media. So, it seemed to make
sense to also closeDb after each one, since sqlite may also keep files
open. Although that has not seemed to cause problems with removable
media so far. It was also just easier to quiesce in each onLocal than
once at the end. This does likely leave performance on the floor, so
could be revisited.

In Annex.Content.saveState, there was no reason to close the db,
flushing it is enough.

The rest of the changes are from auditing for Annex.new, and making
sure that quiesce is called, after any action that might possibly need
it.

After that audit, I'm pretty sure that the change to Annex.run' is
safe. The only concern might be that this does let more changes get
queued for write to the db, and if git-annex is interrupted, those will be
lost. But interrupting git-annex can obviously already prevent it from
writing the most recent change to the db, so it must recover from such
lost data... right?

Sponsored-by: Dartmouth College's Datalad project
2022-10-12 17:50:46 +00:00
import Annex.Action
2016-11-21 21:27:38 +00:00

run :: [String] -> IO ()
2017-12-05 19:00:50 +00:00
run (_remotename:address:[]) = forever $
    getLine >>= \case
2016-11-21 23:24:55 +00:00
        "capabilities" -> putStrLn "connect" >> ready
2016-11-21 21:27:38 +00:00
        "connect git-upload-pack" -> go UploadPack
        "connect git-receive-pack" -> go ReceivePack
2024-05-06 16:07:05 +00:00
        l -> giveup $ "gitremote-helpers protocol error at " ++ show l
2016-11-21 21:27:38 +00:00
  where
    (onionaddress, onionport)
        | '/' `elem` address = parseAddressPort $
            reverse $ takeWhile (/= '/') $ reverse address
        | otherwise = parseAddressPort address
    go service = do
2016-11-21 23:24:55 +00:00
        ready
2018-09-25 20:49:59 +00:00
        connectService onionaddress onionport service >>= \case
            Right exitcode -> exitWith exitcode
            Left e -> giveup $ describeProtoFailure e
2016-11-21 23:24:55 +00:00
    ready = do
2016-11-21 21:27:38 +00:00
        putStrLn ""
        hFlush stdout
2016-11-21 23:24:55 +00:00
2016-11-21 21:27:38 +00:00
run (_remotename:[]) = giveup "remote address not configured"
run _ = giveup "expected remote name and address parameters"

parseAddressPort :: String -> (OnionAddress, OnionPort)
parseAddressPort s =
    let (a, sp) = separate (== ':') s
    in case readish sp of
        Nothing -> giveup "onion address must include port number"
        Just p -> (OnionAddress a, p)
2018-09-25 20:49:59 +00:00

connectService :: OnionAddress -> OnionPort -> Service -> IO (Either ProtoFailure ExitCode)
2016-11-21 21:27:38 +00:00
connectService address port service = do
    state <- Annex.new =<< Git.CurrentRepo.get
    Annex.eval state $ do
        authtoken <- fromMaybe nullAuthToken
2016-11-30 19:26:16 +00:00
            <$> loadP2PRemoteAuthToken (TorAnnex address port)
2016-11-21 21:27:38 +00:00
        myuuid <- getUUID
        g <- Annex.gitRepo
git-annex-shell: block relay requests

connRepo is only used when relaying git upload-pack and receive-pack.
That's only supposed to be used when git-annex-remotedaemon is serving
git-remote-tor-annex connections over tor. But, it was always set, and
so could be used in other places possibly.

Fixed by making connRepo optional in the P2P protocol interface.

In Command.EnableTor, it's not needed, because it only speaks the
protocol in order to check that it's able to connect back to itself via
the hidden service. So changed that to pass Nothing rather than the git
repo.

In Remote.Helper.Ssh, it's connecting to git-annex-shell p2pstdio,
so is making the requests, so will never need connRepo.

In git-annex-shell p2pstdio, it was accepting git upload-pack and
receive-pack requests over the P2P protocol, even though nothing sent
them. This is arguably a security hole, particularly if the user has
set environment variables like GIT_ANNEX_SHELL_LIMITED to prevent
git push/pull via git-annex-shell.
2024-06-10 17:53:28 +00:00
        conn <- liftIO $ connectPeer (Just g) (TorAnnex address port)
2018-03-12 19:19:40 +00:00
        runst <- liftIO $ mkRunState Client
2022-10-12 17:50:46 +00:00
        r <- liftIO $ runNetProto runst conn $ auth myuuid authtoken noop >>= \case
2017-12-05 19:00:50 +00:00
            Just _theiruuid -> connect service stdin stdout
            Nothing -> giveup $ "authentication failed, perhaps you need to set " ++ p2pAuthTokenEnv
2022-10-12 17:50:46 +00:00
        quiesce False
        return r
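
The address handling in run and parseAddressPort above can be tried out in isolation. The following is only a rough sketch under stated assumptions, not the program's own code: it depends on base alone, the names lastSegment and parseHostPort are hypothetical, the OnionAddress and OnionPort types are simplified to String and Int, and git-annex's separate and readish helpers are approximated with break and readMaybe (readMaybe is stricter about trailing characters than readish may be). The sample inputs are made up for illustration.

```haskell
-- Standalone approximation of the address parsing done by
-- run and parseAddressPort. All names here are hypothetical.
import Text.Read (readMaybe)

-- Keep only the text after the last '/', mirroring the
-- reverse / takeWhile / reverse expression in run.
lastSegment :: String -> String
lastSegment = reverse . takeWhile (/= '/') . reverse

-- Split "host:port" at the first ':' and parse the port.
-- Returns Left with an error message instead of calling giveup,
-- so the function stays pure and easy to test.
parseHostPort :: String -> Either String (String, Int)
parseHostPort s =
    let (host, rest) = break (== ':') s
    in case readMaybe (drop 1 rest) of
        Nothing -> Left "onion address must include port number"
        Just p -> Right (host, p)

main :: IO ()
main = do
    -- An address with a path-like prefix: only the last segment is parsed.
    print (parseHostPort (lastSegment "some/prefix/example.onion:4321"))
    -- An address with no port is rejected.
    print (parseHostPort (lastSegment "example.onion"))
```

Returning Either rather than aborting is a choice made for this sketch only; the program itself reports the missing port through giveup.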