2016-12-06 16:19:47 +00:00
{- git remotes using the git-annex P2P protocol
 -
2018-03-08 20:11:00 +00:00
 - Copyright 2016-2018 Joey Hess <id@joeyh.name>
2016-12-06 16:19:47 +00:00
 -
2019-03-13 19:48:14 +00:00
 - Licensed under the GNU AGPL version 3 or higher.
2016-12-06 16:19:47 +00:00
 -}

filter out control characters in warning messages
Converted warning and similar to use StringContainingQuotedPath. Most
warnings are static strings, some do refer to filepaths that need to be
quoted, and others don't need quoting.
Note that, since quote filters out control characters of even
UnquotedString, this makes all warnings safe, even when an attacker
sneaks in a control character in some other way.
When json is being output, no quoting is done, since json gets its own
quoting.
This does, as a side effect, make warning messages in json output not
be indented. The indentation is only needed to offset warning messages
underneath the display of the file they apply to, so that's ok.
Sponsored-by: Brett Eisenberg on Patreon
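
This commit describes two output paths. Text shown on a terminal passes through a filter that removes control characters. JSON output is left to the JSON encoder's own escaping. That split can be sketched as a small standalone model. The `Message` type, `render`, and the octal escaping in `quotePath` are hypothetical stand-ins, not git-annex's `StringContainingQuotedPath` or `quote`. The sketch carries over one fact from the commit: unquoted text loses its control characters. It adds one assumption of its own: quoted paths escape those characters instead of dropping them.

```haskell
import Data.Char (isControl)
import Numeric (showOct)

-- Hypothetical model of a warning message: plain text mixed with paths.
-- (git-annex's real type is StringContainingQuotedPath; this is not it.)
data Message
  = Unquoted String
  | QuotedPath FilePath
  | Message :+: Message

data OutputMode = Terminal | JSON

-- Drop every control character, so text that reached the message some
-- other way cannot smuggle in terminal escape sequences.
stripControl :: String -> String
stripControl = filter (not . isControl)

-- Assumed quoting scheme: wrap the path in double quotes, escape control
-- characters as three-digit octal codes, and backslash-escape quotes
-- and backslashes.
quotePath :: FilePath -> String
quotePath p = "\"" ++ concatMap esc p ++ "\""
  where
    esc c
      | isControl c = '\\' : pad3 (showOct (fromEnum c) "")
      | c == '"' || c == '\\' = ['\\', c]
      | otherwise = [c]
    pad3 s = replicate (3 - length s) '0' ++ s

-- Terminal output is sanitized; JSON output is passed through raw,
-- because the JSON encoder applies its own escaping.
render :: OutputMode -> Message -> String
render JSON m = raw m
render Terminal m = go m
  where
    go (Unquoted s) = stripControl s
    go (QuotedPath p) = quotePath p
    go (a :+: b) = go a ++ go b

raw :: Message -> String
raw (Unquoted s) = s
raw (QuotedPath p) = p
raw (a :+: b) = raw a ++ raw b
```

Loaded into GHCi, `render Terminal (Unquoted "bad\ESC[2J")` gives `"bad[2J"`. `render JSON` returns the same message unchanged.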
2023-04-10 18:47:32 +00:00
{-# LANGUAGE OverloadedStrings #-}

2016-12-06 16:19:47 +00:00
module Remote.P2P (
	remote,
	chainGen
) where

import Annex.Common
2016-12-06 19:08:00 +00:00
import qualified Annex
import qualified P2P.Protocol as P2P
2016-12-06 16:19:47 +00:00
import P2P.Address
2016-12-06 19:08:00 +00:00
import P2P.Annex
2016-12-06 19:40:31 +00:00
import P2P.IO
2016-12-06 19:49:39 +00:00
import P2P.Auth
2016-12-06 16:19:47 +00:00
import Types.Remote
import qualified Git
2016-12-06 19:49:39 +00:00
import Annex.UUID
2016-12-06 16:19:47 +00:00
import Config
import Config.Cost
import Remote.Helper.Git
2019-02-20 19:55:01 +00:00
import Remote.Helper.ExportImport
2018-03-08 20:11:00 +00:00
import Remote.Helper.P2P
2016-12-06 19:49:39 +00:00
import Utility.AuthToken

fix encryption of content to gcrypt and git-lfs
Fix serious regression in gcrypt and encrypted git-lfs remotes.
Since version 7.20200202.7, git-annex incorrectly stored content
on those remotes without encrypting it.
Problem was, Remote.Git enumerates all git remotes, including git-lfs
and gcrypt. It then dispatches to those. So, Remote.List used the
RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt,
and that parser does not know about encryption fields, so did not
include them in the ParsedRemoteConfig. (Also didn't include other
fields specific to those remotes, perhaps chunking etc also didn't
get through.)
To fix, had to move RemoteConfig parsing down into the generate methods
of each remote, rather than doing it in Remote.List.
And a consequence of that was that ParsedRemoteConfig had to change to
include the RemoteConfig that got parsed, so that testremote can
generate a new remote based on an existing remote.
(I would have rather fixed this just inside Remote.Git, but that was not
practical, at least not w/o re-doing work that Remote.List already did.
Big ugly mostly mechanical patch seemed preferable to making git-annex
slower.)
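
The failure mode in this commit can be reproduced with a toy model: a parser that does not know a remote's fields silently drops them. Several names are assumptions made for illustration: `RemoteConfig` as a `Map`, `mkParser`, and the field lists. git-annex's real `RemoteConfigParser` and `ParsedRemoteConfig` are richer. The model keeps the raw config next to the parsed fields. That mirrors the change that lets testremote generate a new remote from an existing one.

```haskell
import qualified Data.Map.Strict as M

-- Raw key/value settings, as stored for a remote.
type RemoteConfig = M.Map String String

-- Parsed view plus the raw config it came from (hypothetical names).
data ParsedConfig = ParsedConfig
  { parsedFields :: M.Map String String
  , rawConfig :: RemoteConfig
  } deriving (Eq, Show)

-- A parser keeps only the fields it has been told about.
mkParser :: [String] -> RemoteConfig -> ParsedConfig
mkParser known rc = ParsedConfig (M.filterWithKey (\k _ -> k `elem` known) rc) rc

-- A generic git parser knows nothing about encryption...
gitParser :: RemoteConfig -> ParsedConfig
gitParser = mkParser ["type"]

-- ...while a remote-specific parser does (field names illustrative).
gcryptParser :: RemoteConfig -> ParsedConfig
gcryptParser = mkParser ["type", "encryption", "keyid"]

-- What the remote would see for its encryption setting after parsing.
encryptionSetting :: (RemoteConfig -> ParsedConfig) -> RemoteConfig -> Maybe String
encryptionSetting parser rc = M.lookup "encryption" (parsedFields (parser rc))
```

With the generic parser, the `encryption` setting never reaches the remote, which is the regression the commit fixes. Once parsing happens inside each remote's own generate method, the parser that knows the field is the one applied.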
2020-02-26 21:20:56 +00:00
import Annex.SpecialRemote.Config
2016-12-06 19:08:00 +00:00

import Control.Concurrent.STM
2016-12-06 16:19:47 +00:00

remote :: RemoteType
2017-09-07 17:45:31 +00:00
remote = RemoteType
	{ typename = "p2p"
2016-12-06 16:19:47 +00:00
	-- Remote.Git takes care of enumerating P2P remotes,
	-- and will call chainGen on them.
2017-09-07 17:45:31 +00:00
	, enumerate = const (return [])

add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
pieces of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generated in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
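
A toy state log shows why per-remote state should not be keyed only by the repository UUID. Two sameas remotes are modeled here as sharing one UUID while each has its own handle. `StateLog`, `setState` and `getState` are hypothetical. The real `RemoteStateHandle` is an opaque git-annex type, not a string.

```haskell
import qualified Data.Map.Strict as M

newtype UUID = UUID String deriving (Eq, Ord, Show)

-- Hypothetical per-configuration handle; distinct for each sameas remote.
newtype StateHandle = StateHandle String deriving (Eq, Ord, Show)

-- Per-remote state for keys, indexed by whatever identifies the remote.
type StateLog ident = M.Map (ident, String) String

setState :: Ord ident => ident -> String -> String -> StateLog ident -> StateLog ident
setState ident key v = M.insert (ident, key) v

getState :: Ord ident => ident -> String -> StateLog ident -> Maybe String
getState ident key = M.lookup (ident, key)
```

Under a UUID key, the second remote's write replaces the first one's, which is the trampling the commit describes. Under per-configuration handles, both values survive.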
2019-10-14 16:33:27 +00:00
	, generate = \_ _ _ _ _ -> return Nothing
2020-01-14 17:18:15 +00:00
	, configParser = mkRemoteConfigParser []
2023-04-10 17:38:14 +00:00
	, setup = giveup "P2P remotes are set up using git-annex p2p"
2017-09-07 17:45:31 +00:00
	, exportSupported = exportUnsupported
2019-02-20 19:55:01 +00:00
	, importSupported = importUnsupported

add thirdPartyPopulated interface
This is to support, eg a borg repo as a special remote, which is
populated not by running git-annex commands, but by using borg. Then
git-annex sync lists the content of the remote, learns which files are
annex objects, and treats those as present in the remote.
So, most of the import machinery is reused, to a new purpose. While
normally importtree maintains a remote tracking branch, this does not,
because the files stored in the remote are annex object files, not
user-visible filenames. But, internally, a git tree is still generated,
of the files on the remote that are annex objects. This tree is used
by retrieveExportWithContentIdentifier, etc. As with other import/export
remotes, the tree is recorded in the export log, and gets grafted
into the git-annex branch.
importKey changed to be able to return Nothing, to indicate when an
ImportLocation is not an annex object and so should be skipped from
being included in the tree.
It did not seem to make sense to have git-annex import do this, since
from the user's perspective, it's not like other imports. So only
git-annex sync does it.
Note that git-annex sync does not yet download objects from such
remotes that are preferred content. importKeys is run with
content downloading disabled, to avoid getting the content of all
objects. Perhaps what's needed is for seekSyncContent to be run with these
remotes, but I don't know if it will just work (in particular, it needs
to avoid trying to transfer objects to them), so I skipped that for now.
(Untested and unused as of yet.)
This commit was sponsored by Jochen Bartl on Patreon.
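
After this change, `importKey` returns `Nothing` for files that are not annex objects, so they are left out of the generated tree. In effect this is a `mapMaybe` over the remote's listing. The key-recognition rule below is a deliberately simplified assumption and is not git-annex's key parser. It accepts a known backend name followed by a `--` separator, and the backend list is illustrative.

```haskell
import Data.List (isInfixOf)
import Data.Maybe (mapMaybe)

newtype Key = Key String deriving (Eq, Show)

-- Illustrative subset of backend names; not exhaustive.
knownBackends :: [String]
knownBackends = ["SHA256E", "SHA256", "MD5E", "WORM"]

-- Last path component, without relying on the filepath package.
baseName :: FilePath -> String
baseName = reverse . takeWhile (/= '/') . reverse

-- Simplified recognizer: Just a key for names shaped like
-- BACKEND-...--..., Nothing for anything else (README, temp files, ...).
importKey :: FilePath -> Maybe Key
importKey path
  | backend `elem` knownBackends && "--" `isInfixOf` rest = Just (Key name)
  | otherwise = Nothing
  where
    name = baseName path
    (backend, rest) = break (== '-') name

-- Files for which importKey returns Nothing are skipped entirely.
annexObjects :: [FilePath] -> [(FilePath, Key)]
annexObjects = mapMaybe (\p -> fmap (\k -> (p, k)) (importKey p))
```

Because `importKey` returns a `Maybe`, skipping a file needs no separate filter pass: `mapMaybe` drops the `Nothing` results while building the list.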
2020-12-18 18:52:57 +00:00
	, thirdPartyPopulated = False
2017-09-07 17:45:31 +00:00
	}
2016-12-06 16:19:47 +00:00

2020-02-26 21:20:56 +00:00
chainGen :: P2PAddress -> Git.Repo -> UUID -> RemoteConfig -> RemoteGitConfig -> RemoteStateHandle -> Annex (Maybe Remote)
chainGen addr r u rc gc rs = do
	c <- parsedRemoteConfig remote rc
2016-12-06 19:08:00 +00:00
	connpool <- mkConnectionPool
2023-01-12 17:42:28 +00:00
	cst <- remoteCost gc c veryExpensiveRemoteCost
2018-03-08 20:11:00 +00:00
	let protorunner = runProto u addr connpool
	let withconn = withConnection u addr connpool
2016-12-06 16:19:47 +00:00
	let this = Remote
		{ uuid = u
		, cost = cst
		, name = Git.repoDescribe r
2024-06-23 12:21:11 +00:00
		, storeKey = store u gc protorunner

remove git-annex-shell compat code
* Removed support for accessing git remotes that use versions of
git-annex older than 6.20180312.
* git-annex-shell: Removed several commands that were only needed to
support git-annex versions older than 6.20180312.
(lockcontent, recvkey, sendkey, transferinfo, commit)
The P2P protocol was added in that version, and used ever since, so
this code was only needed for interop with older versions.
"git-annex-shell commit" is used by newer git-annex versions, though
unnecessarily so, because the p2pstdio command makes a single commit at
shutdown. Luckily, it was run with stderr and stdout sent to /dev/null,
and non-zero exit status or other exceptions are caught and ignored. So,
that was able to be removed from git-annex-shell too.
git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by
gcrypt special remotes accessed over ssh, so those had to be kept.
It would probably be possible to convert that to using the P2P protocol,
but it would be another multi-year transition.
Some git-annex-shell fields were able to be removed. I hoped to remove
all of them, and the very concept of them, but unfortunately autoinit
is used by git-annex sync, and gcrypt uses remoteuuid.
The main win here is really in Remote.Git, removing piles of hairy fallback
code.
Sponsored-by: Luke Shumaker
2021-10-11 19:35:54 +00:00
		, retrieveKeyFile = retrieve gc protorunner
2020-05-13 21:05:56 +00:00
		, retrieveKeyFileCheap = Nothing
2018-06-21 15:35:27 +00:00
		, retrievalSecurityPolicy = RetrievalAllKeysSecure
2024-06-23 13:28:18 +00:00
		, removeKey = remove u protorunner
2018-03-08 20:11:00 +00:00
		, lockContent = Just $ lock withconn runProtoConn u
		, checkPresent = checkpresent protorunner
2016-12-06 16:19:47 +00:00
		, checkPresentCheap = False
2017-09-01 17:02:07 +00:00
		, exportActions = exportUnsupported
2019-02-20 19:55:01 +00:00
		, importActions = importUnsupported
2016-12-06 16:19:47 +00:00
		, whereisKey = Nothing
		, remoteFsck = Nothing
		, repairRepo = Nothing
		, config = c
		, localpath = Nothing
2018-06-04 18:31:55 +00:00
		, getRepo = return r

removal of the rest of remoteGitConfig
In keyUrls, the GitConfig is used only by annexLocations
to support configured Differences. Since such configurations affect all
clones of a repository, the local repo's GitConfig must have the same
information as the remote's GitConfig would have. So, used getGitConfig
to get the local GitConfig, which is cached and so available cheaply.
That actually fixed a bug no one had ever noticed: keyUrls is
used for remotes accessed over http. The full git config of such a
remote is normally not available, so the remoteGitConfig that keyUrls
used would not have the necessary information in it.
In copyFromRemoteCheap', it uses gitAnnexLocation,
which does need the GitConfig of the remote repo itself in order to
check if it's crippled, supports symlinks, etc. So, made the
State include that GitConfig, cached. The use of gitAnnexLocation is
within a (not $ Git.repoIsUrl repo) guard, so it's local, and so
its git config will always be read and available.
(Note that gitAnnexLocation in turn calls annexLocations, so the
Differences config it uses in this case comes from the remote repo's
GitConfig and not from the local repo's GitConfig. As explained above
this is ok since they must have the same value.)
Not very happy with this mess of different GitConfigs, not type-safe and
some read only sometimes, etc. Very hairy. Think I got this change
right. Test suite passes.
This commit was sponsored by Ethan Aubin.
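
This commit caches the remote repository's GitConfig in the remote's State, so it is read once and then reused. That read-once idea can be sketched with an `IORef`. `cachedRead` is a hypothetical helper, not a git-annex function. It makes no attempt to stop two threads from filling the cache at the same time.

```haskell
import Data.IORef

-- Hypothetical helper: return the cached value if present; otherwise run
-- the (possibly expensive) read, remember its result, and return it.
-- Not atomic: two concurrent callers could both perform the read.
cachedRead :: IORef (Maybe a) -> IO a -> IO a
cachedRead ref readit = do
  v <- readIORef ref
  case v of
    Just x -> return x
    Nothing -> do
      x <- readit
      writeIORef ref (Just x)
      return x
```

In the commit's case, the cached value is the remote repository's own GitConfig. It is always available because that code path only runs for local remotes.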
2018-06-05 18:23:34 +00:00
		, gitconfig = gc
2016-12-06 16:19:47 +00:00
		, readonly = False
2018-08-30 15:12:18 +00:00
		, appendonly = False
2020-12-28 19:08:53 +00:00
		, untrustworthy = False
2023-08-16 18:31:31 +00:00
		, availability = pure GloballyAvailable
2016-12-06 16:19:47 +00:00
		, remotetype = remote
		, mkUnavailable = return Nothing
		, getInfo = gitRepoInfo this
		, claimUrl = Nothing
		, checkUrl = Nothing
2019-10-14 16:33:27 +00:00
		, remoteStateHandle = rs
2016-12-06 16:19:47 +00:00
		}
2016-12-06 19:08:00 +00:00
	return (Just this)

2018-03-08 18:02:18 +00:00
-- | A connection to the peer, which can be closed.
2018-03-12 17:43:19 +00:00
type Connection = ClosableConnection (RunState, P2PConnection)
2016-12-06 19:08:00 +00:00

type ConnectionPool = TVar [Connection]
2016-12-06 16:19:47 +00:00

2016-12-06 19:08:00 +00:00
mkConnectionPool :: Annex ConnectionPool
mkConnectionPool = liftIO $ newTVarIO []
2016-12-06 16:19:47 +00:00

2016-12-06 19:08:00 +00:00
-- Runs the Proto action.
2016-12-07 16:39:28 +00:00
runProto :: UUID -> P2PAddress -> ConnectionPool -> P2P.Proto a -> Annex (Maybe a)
2018-03-08 20:11:00 +00:00
runProto u addr connpool a = withConnection u addr connpool (runProtoConn a)
2016-12-06 16:19:47 +00:00

2018-03-08 20:11:00 +00:00
runProtoConn :: P2P.Proto a -> Connection -> Annex (Connection, Maybe a)
runProtoConn _ ClosedConnection = return (ClosedConnection, Nothing)
2018-03-12 17:43:19 +00:00
runProtoConn a c@(OpenConnection (runst, conn)) = do
	v <- runFullProto runst conn a
2016-12-06 19:08:00 +00:00
	-- When runFullProto fails, the connection is no longer usable,
	-- so close it.
2016-12-08 19:47:49 +00:00
	case v of
		Left e -> do
2023-04-10 18:47:32 +00:00
			warning $ UnquotedString $ "Lost connection to peer (" ++ describeProtoFailure e ++ ")"
2016-12-06 19:49:39 +00:00
			liftIO $ closeConnection conn
2016-12-08 19:47:49 +00:00
			return (ClosedConnection, Nothing)
2018-03-12 17:43:19 +00:00
		Right r -> return (c, Just r)
2016-12-06 16:19:47 +00:00

2016-12-06 19:08:00 +00:00
-- Uses an open connection if one is available in the ConnectionPool;
-- otherwise opens a new connection.
--
-- Once the action is done, the connection is added back to the
-- ConnectionPool, unless it's no longer open.
2016-12-07 16:39:28 +00:00
withConnection :: UUID -> P2PAddress -> ConnectionPool -> (Connection -> Annex (Connection, a)) -> Annex a
withConnection u addr connpool a = bracketOnError get cache go
2016-12-06 19:08:00 +00:00
  where
	get = do
		mc <- liftIO $ atomically $ do
			l <- readTVar connpool
			case l of
				[] -> do
					writeTVar connpool []
					return Nothing
				(c:cs) -> do
					writeTVar connpool cs
					return (Just c)
2016-12-07 16:39:28 +00:00
		maybe (openConnection u addr) return mc
2016-12-06 19:08:00 +00:00

	cache ClosedConnection = return ()
	cache conn = liftIO $ atomically $ modifyTVar' connpool (conn:)
2016-12-06 16:19:47 +00:00

2016-12-06 19:08:00 +00:00
	go conn = do
		(conn', r) <- a conn
		cache conn'
		return r
2016-12-06 16:19:47 +00:00

2016-12-07 16:39:28 +00:00
openConnection :: UUID -> P2PAddress -> Annex Connection
openConnection u addr = do
2016-12-06 19:40:31 +00:00
	g <- Annex.gitRepo

git-annex-shell: block relay requests
connRepo is only used when relaying git upload-pack and receive-pack.
That's only supposed to be used when git-annex-remotedaemon is serving
git-remote-tor-annex connections over tor. But, it was always set, and
so could be used in other places possibly.
Fixed by making connRepo optional in the P2P protocol interface.
In Command.EnableTor, it's not needed, because it only speaks the
protocol in order to check that it's able to connect back to itself via
the hidden service. So changed that to pass Nothing rather than the git
repo.
In Remote.Helper.Ssh, it's connecting to git-annex-shell p2pstdio,
so is making the requests, so will never need connRepo.
In git-annex-shell p2pstdio, it was accepting git upload-pack and
receive-pack requests over the P2P protocol, even though nothing sent
them. This is arguably a security hole, particularly if the user has
set environment variables like GIT_ANNEX_SHELL_LIMITED to prevent
git push/pull via git-annex-shell.
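
Making `connRepo` optional turns the relay check into a pattern match. A connection built without a repository cannot serve git upload-pack or receive-pack. The `Request`, `Conn` and `Response` types below are a hypothetical model of that decision, not the P2P protocol's actual message types.

```haskell
data Service = UploadPack | ReceivePack deriving (Eq, Show)

-- Hypothetical request type; only the relay case matters here.
data Request = CheckPresent String | Relay Service

-- Like connRepo after this change: Nothing means relaying is not allowed.
newtype Conn = Conn { connRepo :: Maybe FilePath }

data Response = Accepted String | Refused String deriving (Eq, Show)

handleRequest :: Conn -> Request -> Response
handleRequest _ (CheckPresent key) = Accepted ("checking " ++ key)
handleRequest conn (Relay svc) = case connRepo conn of
  Nothing -> Refused ("relaying " ++ show svc ++ " is not enabled on this connection")
  Just repo -> Accepted ("relaying " ++ show svc ++ " for " ++ repo)
```

With the field a `Maybe`, a caller that only makes requests simply passes `Nothing`, as Command.EnableTor now does. Such a connection cannot end up accepting relay requests.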
2024-06-10 17:53:28 +00:00
	v <- liftIO $ tryNonAsync $ connectPeer (Just g) addr
2016-12-06 19:08:00 +00:00
	case v of
2016-12-06 19:49:39 +00:00
		Right conn -> do
			myuuid <- getUUID
			authtoken <- fromMaybe nullAuthToken
				<$> loadP2PRemoteAuthToken addr
2018-03-12 17:43:19 +00:00
			let proto = P2P.auth myuuid authtoken $
2021-10-11 19:53:49 +00:00
				P2P.negotiateProtocolVersion P2P.maxProtocolVersion
2018-03-12 17:43:19 +00:00
			runst <- liftIO $ mkRunState Client
2018-03-12 19:19:40 +00:00
			res <- liftIO $ runNetProto runst conn proto
2016-12-06 19:49:39 +00:00
			case res of
2016-12-08 19:47:49 +00:00
				Right (Just theiruuid)
2018-03-12 17:43:19 +00:00
					| u == theiruuid -> return (OpenConnection (runst, conn))
2016-12-07 16:39:28 +00:00
					| otherwise -> do
						liftIO $ closeConnection conn
						warning "Remote peer uuid seems to have changed."
						return ClosedConnection
2016-12-08 19:47:49 +00:00
				Right Nothing -> do
2016-12-07 16:39:28 +00:00
					warning "Unable to authenticate with peer."
2016-12-08 19:47:49 +00:00
					liftIO $ closeConnection conn
					return ClosedConnection
				Left e -> do
2023-04-10 18:47:32 +00:00
					warning $ UnquotedString $ "Problem communicating with peer. (" ++ describeProtoFailure e ++ ")"
2016-12-08 19:47:49 +00:00
					liftIO $ closeConnection conn
2016-12-06 19:49:39 +00:00
					return ClosedConnection
2016-12-08 19:47:49 +00:00
		Left e -> do
2023-04-10 18:47:32 +00:00
			warning $ UnquotedString $ "Unable to connect to peer. (" ++ show e ++ ")"
2016-12-07 16:39:28 +00:00
			return ClosedConnection
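
The comment on `withConnection` describes a pooling rule that can be exercised outside the `Annex` monad. It reuses an idle connection when there is one and otherwise opens a new one. Afterwards it puts the connection back, unless the connection was closed. This sketch swaps in plain `IO` and a simplified `Conn` type in place of `ClosableConnection`. It also uses hypothetical names (`takeConn`, `giveBack`, `withConn`). Like the code above, it uses `bracketOnError`, so if the action throws, the connection it was handed is returned to the pool.

```haskell
import Control.Concurrent.STM
import Control.Exception (bracketOnError)

-- Simplified stand-in for ClosableConnection.
data Conn c = Open c | Closed deriving (Eq, Show)

type Pool c = TVar [Conn c]

-- Take an idle connection from the pool, or open a new one if it is empty.
takeConn :: Pool c -> IO (Conn c) -> IO (Conn c)
takeConn pool open = do
  mc <- atomically $ do
    l <- readTVar pool
    case l of
      [] -> return Nothing
      (c:cs) -> do
        writeTVar pool cs
        return (Just c)
  maybe open return mc

-- Return a connection to the pool; closed connections are dropped.
giveBack :: Pool c -> Conn c -> IO ()
giveBack _ Closed = return ()
giveBack pool c = atomically $ modifyTVar' pool (c:)

-- The action hands back the (possibly now closed) connection with its result.
withConn :: Pool c -> IO (Conn c) -> (Conn c -> IO (Conn c, a)) -> IO a
withConn pool open act = bracketOnError (takeConn pool open) (giveBack pool) $ \c -> do
  (c', r) <- act c
  giveBack pool c'
  return r
```

A list used as a stack seems sufficient here, since any idle connection to the same peer is interchangeable. The LIFO order itself is incidental.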