When proxying an upload to a special remote, verify the hash.

While usually uploading to a special remote does not verify the content,
the content in a repository is assumed to be valid, and there is no trust
boundary. But with a proxied special remote, there may be users who are
allowed to store objects, but are not really trusted.

Another way to look at this is it's the equivilant of git-annex-shell
checking the hash of received data, which it does (see StoreContent
implementation).
This commit is contained in:
Joey Hess 2024-07-29 13:39:28 -04:00
parent 960daf210b
commit fcc052bed8
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
4 changed files with 41 additions and 35 deletions

View file

@ -18,6 +18,7 @@ import P2P.IO
import Remote.Helper.Ssh (openP2PShellConnection', closeP2PShellConnection) import Remote.Helper.Ssh (openP2PShellConnection', closeP2PShellConnection)
import Annex.Concurrent import Annex.Concurrent
import Annex.Tmp import Annex.Tmp
import Annex.Verify
import Logs.Proxy import Logs.Proxy
import Logs.Cluster import Logs.Cluster
import Logs.UUID import Logs.UUID
@ -136,9 +137,9 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv = go
(Left <$> readTMVar oclosedv) (Left <$> readTMVar oclosedv)
receivebytestring = atomically recv >>= \case receivebytestring = atomically recv >>= \case
Right (Left b) -> return b Right (Left b) -> return (Just b)
Right (Right _m) -> giveup "did not receive ByteString from P2P MVar" Right (Right _m) -> giveup "did not receive ByteString from P2P MVar"
Left () -> giveup "connection closed" Left () -> return Nothing
where where
recv = recv =
(Right <$> takeTMVar ohdl) (Right <$> takeTMVar ohdl)
@ -161,6 +162,8 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv = go
withTmpDirIn (fromRawFilePath othertmpdir) "proxy" $ \tmpdir -> withTmpDirIn (fromRawFilePath othertmpdir) "proxy" $ \tmpdir ->
a (toRawFilePath tmpdir P.</> keyFile k) a (toRawFilePath tmpdir P.</> keyFile k)
-- Verify the content received from the client, to avoid bad content
-- being stored in the special remote.
proxyput af k = do proxyput af k = do
liftIO $ sendmessage $ PUT_FROM (Offset 0) liftIO $ sendmessage $ PUT_FROM (Offset 0)
withproxytmpfile k $ \tmpfile -> do withproxytmpfile k $ \tmpfile -> do
@ -169,13 +172,18 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv = go
Left err -> liftIO $ propagateerror err Left err -> liftIO $ propagateerror err
liftIO receivemessage >>= \case liftIO receivemessage >>= \case
Just (DATA (Len len)) -> do Just (DATA (Len len)) -> do
iv <- startVerifyKeyContentIncrementally Remote.AlwaysVerify k
h <- liftIO $ openFile (fromRawFilePath tmpfile) WriteMode h <- liftIO $ openFile (fromRawFilePath tmpfile) WriteMode
liftIO $ receivetofile h len gotall <- liftIO $ receivetofile iv h len
liftIO $ hClose h liftIO $ hClose h
verified <- if gotall
then fst <$> finishVerifyKeyContentIncrementally' True iv
else pure False
if protoversion > ProtocolVersion 1 if protoversion > ProtocolVersion 1
then liftIO receivemessage >>= \case then liftIO receivemessage >>= \case
Just (VALIDITY Valid) -> Just (VALIDITY Valid)
store | verified -> store
| otherwise -> liftIO $ sendmessage FAILURE
Just (VALIDITY Invalid) -> Just (VALIDITY Invalid) ->
liftIO $ sendmessage FAILURE liftIO $ sendmessage FAILURE
_ -> giveup "protocol error" _ -> giveup "protocol error"
@ -183,25 +191,27 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv = go
_ -> giveup "protocol error" _ -> giveup "protocol error"
liftIO $ removeWhenExistsWith removeFile (fromRawFilePath tmpfile) liftIO $ removeWhenExistsWith removeFile (fromRawFilePath tmpfile)
receivetofile h n = do receivetofile iv h n = liftIO receivebytestring >>= \case
b <- liftIO receivebytestring Just b -> do
liftIO $ atomically $ liftIO $ atomically $
putTMVar owaitv () putTMVar owaitv ()
`orElse` `orElse`
readTMVar oclosedv readTMVar oclosedv
n' <- storetofile h n (L.toChunks b) n' <- storetofile iv h n (L.toChunks b)
-- Normally all the data is sent in a single -- Normally all the data is sent in a single
-- lazy bytestring. However, when the special -- lazy bytestring. However, when the special
-- remote is a node in a cluster, a PUT is -- remote is a node in a cluster, a PUT is
-- streamed to it in multiple chunks. -- streamed to it in multiple chunks.
if n' == 0 if n' == 0
then return () then return True
else receivetofile h n' else receivetofile iv h n'
Nothing -> return False
storetofile _ n [] = pure n storetofile _ _ n [] = pure n
storetofile h n (b:bs) = do storetofile iv h n (b:bs) = do
writeVerifyChunk iv h b
B.hPut h b B.hPut h b
storetofile h (n - fromIntegral (B.length b)) bs storetofile iv h (n - fromIntegral (B.length b)) bs
proxyget offset af k = withproxytmpfile k $ \tmpfile -> do proxyget offset af k = withproxytmpfile k $ \tmpfile -> do
-- Don't verify the content from the remote, -- Don't verify the content from the remote,

View file

@ -19,6 +19,7 @@ module Annex.Verify (
isVerifiable, isVerifiable,
startVerifyKeyContentIncrementally, startVerifyKeyContentIncrementally,
finishVerifyKeyContentIncrementally, finishVerifyKeyContentIncrementally,
finishVerifyKeyContentIncrementally',
verifyKeyContentIncrementally, verifyKeyContentIncrementally,
IncrementalVerifier(..), IncrementalVerifier(..),
writeVerifyChunk, writeVerifyChunk,
@ -199,13 +200,17 @@ startVerifyKeyContentIncrementally verifyconfig k =
) )
finishVerifyKeyContentIncrementally :: Maybe IncrementalVerifier -> Annex (Bool, Verification) finishVerifyKeyContentIncrementally :: Maybe IncrementalVerifier -> Annex (Bool, Verification)
finishVerifyKeyContentIncrementally Nothing = finishVerifyKeyContentIncrementally = finishVerifyKeyContentIncrementally' False
finishVerifyKeyContentIncrementally' :: Bool -> Maybe IncrementalVerifier -> Annex (Bool, Verification)
finishVerifyKeyContentIncrementally' _ Nothing =
return (True, UnVerified) return (True, UnVerified)
finishVerifyKeyContentIncrementally (Just iv) = finishVerifyKeyContentIncrementally' quiet (Just iv) =
liftIO (finalizeIncrementalVerifier iv) >>= \case liftIO (finalizeIncrementalVerifier iv) >>= \case
Just True -> return (True, Verified) Just True -> return (True, Verified)
Just False -> do Just False -> do
warning "verification of content failed" unless quiet $
warning "verification of content failed"
return (False, UnVerified) return (False, UnVerified)
-- Incremental verification was not able to be done. -- Incremental verification was not able to be done.
Nothing -> return (True, UnVerified) Nothing -> return (True, UnVerified)

View file

@ -15,6 +15,7 @@ git-annex (10.20240731) UNRELEASED; urgency=medium
git-annex remotedaemon is killed while locking a key to prevent its git-annex remotedaemon is killed while locking a key to prevent its
removal. removal.
* When proxying a download from a special remote, avoid unncessary hashing. * When proxying a download from a special remote, avoid unncessary hashing.
* When proxying an upload to a special remote, verify the hash.
* Propagate --force to git-annex transferrer. * Propagate --force to git-annex transferrer.
* Added a build flag for servant, enabling annex+http urls and * Added a build flag for servant, enabling annex+http urls and
git-annex p2phttp. git-annex p2phttp.

View file

@ -40,16 +40,6 @@ Planned schedule of work:
When using ssh and not the http server, the node that had the incomplete When using ssh and not the http server, the node that had the incomplete
copy also doesn't get the file, altough no error is displayed. copy also doesn't get the file, altough no error is displayed.
* When proxying a PUT to a special remote, no verification of the received
content is done, it's just written to a file and that is sent to the
special remote. This violates a usual invariant that any data being
received into a repository gets verified in passing. Although on the
other hand, when sending data to a special remote normally, there is also
no verification. On the third hand, a p2p http proxy (or for that matter
a ssh server) may have users who are allowed to store objects, but are
not really trusted, and if they can upload garbage without verification,
that could be bad.
## items deferred until later for p2p protocol over http ## items deferred until later for p2p protocol over http
* `git-annex p2phttp` should support serving several repositories at the same * `git-annex p2phttp` should support serving several repositories at the same