git-annex/CmdLine/GitRemoteTorAnnex.hs

68 lines
2.1 KiB
Haskell
Raw Normal View History

{- git-remote-tor-annex program
-
- Copyright 2016 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
module CmdLine.GitRemoteTorAnnex where
import Common
import qualified Annex
import qualified Git.CurrentRepo
2016-11-24 20:36:16 +00:00
import P2P.Protocol
import P2P.IO
import Utility.Tor
import Utility.AuthToken
import Annex.UUID
import P2P.Address
import P2P.Auth
avoid flushing keys db queue after each Annex action The flush was only done Annex.run' to make sure that the queue was flushed before git-annex exits. But, doing it there means that as soon as one change gets queued, it gets flushed soon after, which contributes to excessive writes to the database, slowing git-annex down. (This does not yet speed git-annex up, but it is a stepping stone to doing so.) Database queues do not autoflush when garbage collected, so have to be flushed explicitly. I don't think it's possible to make them autoflush (except perhaps if git-annex sqitched to using ResourceT..). The comment in Database.Keys.closeDb used to be accurate, since the automatic flushing did mean that all writes reached the database even when closeDb was not called. But now, closeDb or flushDb needs to be called before stopping using an Annex state. So, removed that comment. In Remote.Git, change to using quiesce everywhere that it used to use stopCoProcesses. This means that uses on onLocal in there are just as slow as before. I considered only calling closeDb on the local git remotes when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses in each onLocal is so as not to leave git processes running that have files open on the remote repo, when it's on removable media. So, it seemed to make sense to also closeDb after each one, since sqlite may also keep files open. Although that has not seemed to cause problems with removable media so far. It was also just easier to quiesce in each onLocal than once at the end. This does likely leave performance on the floor, so could be revisited. In Annex.Content.saveState, there was no reason to close the db, flushing it is enough. The rest of the changes are from auditing for Annex.new, and making sure that quiesce is called, after any action that might possibly need it. After that audit, I'm pretty sure that the change to Annex.run' is safe. The only concern might be that this does let more changes get queued for write to the db, and if git-annex is interrupted, those will be lost. But interrupting git-annex can obviously already prevent it from writing the most recent change to the db, so it must recover from such lost data... right? Sponsored-by: Dartmouth College's Datalad project
2022-10-12 17:50:46 +00:00
import Annex.Action
run :: [String] -> IO ()
2017-12-05 19:00:50 +00:00
run (_remotename:address:[]) = forever $
getLine >>= \case
"capabilities" -> putStrLn "connect" >> ready
"connect git-upload-pack" -> go UploadPack
"connect git-receive-pack" -> go ReceivePack
l -> giveup $ "gitremote-helpers protocol error at " ++ show l
where
(onionaddress, onionport)
| '/' `elem` address = parseAddressPort $
reverse $ takeWhile (/= '/') $ reverse address
| otherwise = parseAddressPort address
go service = do
ready
connectService onionaddress onionport service >>= \case
Right exitcode -> exitWith exitcode
Left e -> giveup $ describeProtoFailure e
ready = do
putStrLn ""
hFlush stdout
run (_remotename:[]) = giveup "remote address not configured"
run _ = giveup "expected remote name and address parameters"
parseAddressPort :: String -> (OnionAddress, OnionPort)
parseAddressPort s =
let (a, sp) = separate (== ':') s
in case readish sp of
Nothing -> giveup "onion address must include port number"
Just p -> (OnionAddress a, p)
connectService :: OnionAddress -> OnionPort -> Service -> IO (Either ProtoFailure ExitCode)
connectService address port service = do
state <- Annex.new =<< Git.CurrentRepo.get
Annex.eval state $ do
authtoken <- fromMaybe nullAuthToken
<$> loadP2PRemoteAuthToken (TorAnnex address port)
myuuid <- getUUID
g <- Annex.gitRepo
conn <- liftIO $ connectPeer (Just g) (TorAnnex address port)
runst <- liftIO $ mkRunState Client
avoid flushing keys db queue after each Annex action The flush was only done Annex.run' to make sure that the queue was flushed before git-annex exits. But, doing it there means that as soon as one change gets queued, it gets flushed soon after, which contributes to excessive writes to the database, slowing git-annex down. (This does not yet speed git-annex up, but it is a stepping stone to doing so.) Database queues do not autoflush when garbage collected, so have to be flushed explicitly. I don't think it's possible to make them autoflush (except perhaps if git-annex sqitched to using ResourceT..). The comment in Database.Keys.closeDb used to be accurate, since the automatic flushing did mean that all writes reached the database even when closeDb was not called. But now, closeDb or flushDb needs to be called before stopping using an Annex state. So, removed that comment. In Remote.Git, change to using quiesce everywhere that it used to use stopCoProcesses. This means that uses on onLocal in there are just as slow as before. I considered only calling closeDb on the local git remotes when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses in each onLocal is so as not to leave git processes running that have files open on the remote repo, when it's on removable media. So, it seemed to make sense to also closeDb after each one, since sqlite may also keep files open. Although that has not seemed to cause problems with removable media so far. It was also just easier to quiesce in each onLocal than once at the end. This does likely leave performance on the floor, so could be revisited. In Annex.Content.saveState, there was no reason to close the db, flushing it is enough. The rest of the changes are from auditing for Annex.new, and making sure that quiesce is called, after any action that might possibly need it. After that audit, I'm pretty sure that the change to Annex.run' is safe. The only concern might be that this does let more changes get queued for write to the db, and if git-annex is interrupted, those will be lost. But interrupting git-annex can obviously already prevent it from writing the most recent change to the db, so it must recover from such lost data... right? Sponsored-by: Dartmouth College's Datalad project
2022-10-12 17:50:46 +00:00
r <- liftIO $ runNetProto runst conn $ auth myuuid authtoken noop >>= \case
2017-12-05 19:00:50 +00:00
Just _theiruuid -> connect service stdin stdout
Nothing -> giveup $ "authentication failed, perhaps you need to set " ++ p2pAuthTokenEnv
avoid flushing keys db queue after each Annex action The flush was only done Annex.run' to make sure that the queue was flushed before git-annex exits. But, doing it there means that as soon as one change gets queued, it gets flushed soon after, which contributes to excessive writes to the database, slowing git-annex down. (This does not yet speed git-annex up, but it is a stepping stone to doing so.) Database queues do not autoflush when garbage collected, so have to be flushed explicitly. I don't think it's possible to make them autoflush (except perhaps if git-annex sqitched to using ResourceT..). The comment in Database.Keys.closeDb used to be accurate, since the automatic flushing did mean that all writes reached the database even when closeDb was not called. But now, closeDb or flushDb needs to be called before stopping using an Annex state. So, removed that comment. In Remote.Git, change to using quiesce everywhere that it used to use stopCoProcesses. This means that uses on onLocal in there are just as slow as before. I considered only calling closeDb on the local git remotes when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses in each onLocal is so as not to leave git processes running that have files open on the remote repo, when it's on removable media. So, it seemed to make sense to also closeDb after each one, since sqlite may also keep files open. Although that has not seemed to cause problems with removable media so far. It was also just easier to quiesce in each onLocal than once at the end. This does likely leave performance on the floor, so could be revisited. In Annex.Content.saveState, there was no reason to close the db, flushing it is enough. The rest of the changes are from auditing for Annex.new, and making sure that quiesce is called, after any action that might possibly need it. After that audit, I'm pretty sure that the change to Annex.run' is safe. The only concern might be that this does let more changes get queued for write to the db, and if git-annex is interrupted, those will be lost. But interrupting git-annex can obviously already prevent it from writing the most recent change to the db, so it must recover from such lost data... right? Sponsored-by: Dartmouth College's Datalad project
2022-10-12 17:50:46 +00:00
quiesce False
return r