When proxying an upload to a special remote, verify the hash.

Uploading to a special remote does not usually verify the content, since
the content in a repository is assumed to be valid and there is no trust
boundary. But with a proxied special remote, there may be users who are
allowed to store objects, but who are not really trusted.

Another way to look at this: it's the equivalent of git-annex-shell
checking the hash of received data, which it does (see the StoreContent
implementation).
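The approach can be sketched outside git-annex as an incremental fold over
the received chunks: update a digest as each chunk is written, and only
accept the object when the final digest matches the one the key names. This
is an illustrative sketch, not git-annex code; `simpleDigest` is a
hypothetical stand-in for the key's real cryptographic hash.

```haskell
import Data.Char (ord)
import Data.List (foldl')

-- Hypothetical stand-in for the cryptographic hash a key names
-- (git-annex would use e.g. SHA256 via an IncrementalVerifier).
simpleDigest :: String -> Int
simpleDigest = foldl' step 0
  where
    step acc c = acc * 31 + ord c

-- Feed chunks through the digest as they arrive, the way
-- writeVerifyChunk updates the verifier while writing to the
-- temp file, then compare against the expected digest at the end.
verifyChunks :: Int -> [String] -> Bool
verifyChunks expected chunks =
    foldl' (foldl' (\acc c -> acc * 31 + ord c)) 0 chunks == expected

main :: IO ()
main = do
    print (verifyChunks (simpleDigest "hello") ["hel", "lo"])  -- True
    print (verifyChunks (simpleDigest "hello") ["hel", "p!"])  -- False
```

Because the fold threads one accumulator across chunk boundaries, hashing
the chunks as they stream in gives the same digest as hashing the whole
object at once, so no second pass over the temp file is needed.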
commit fcc052bed8
parent 960daf210b
Author: Joey Hess
Date: 2024-07-29 13:39:28 -04:00
GPG key ID: DB12DB0FF05F8F38 (no known key found for this signature in database)
4 changed files with 41 additions and 35 deletions


@@ -18,6 +18,7 @@ import P2P.IO
 import Remote.Helper.Ssh (openP2PShellConnection', closeP2PShellConnection)
 import Annex.Concurrent
 import Annex.Tmp
+import Annex.Verify
 import Logs.Proxy
 import Logs.Cluster
 import Logs.UUID
@@ -136,9 +137,9 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv = go
 (Left <$> readTMVar oclosedv)
 receivebytestring = atomically recv >>= \case
-Right (Left b) -> return b
+Right (Left b) -> return (Just b)
 Right (Right _m) -> giveup "did not receive ByteString from P2P MVar"
-Left () -> giveup "connection closed"
+Left () -> return Nothing
 where
 recv =
 (Right <$> takeTMVar ohdl)
@@ -161,6 +162,8 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv = go
 withTmpDirIn (fromRawFilePath othertmpdir) "proxy" $ \tmpdir ->
 a (toRawFilePath tmpdir P.</> keyFile k)
+-- Verify the content received from the client, to avoid bad content
+-- being stored in the special remote.
 proxyput af k = do
 liftIO $ sendmessage $ PUT_FROM (Offset 0)
 withproxytmpfile k $ \tmpfile -> do
@@ -169,13 +172,18 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv = go
 Left err -> liftIO $ propagateerror err
 liftIO receivemessage >>= \case
 Just (DATA (Len len)) -> do
+iv <- startVerifyKeyContentIncrementally Remote.AlwaysVerify k
 h <- liftIO $ openFile (fromRawFilePath tmpfile) WriteMode
-liftIO $ receivetofile h len
+gotall <- liftIO $ receivetofile iv h len
 liftIO $ hClose h
+verified <- if gotall
+then fst <$> finishVerifyKeyContentIncrementally' True iv
+else pure False
 if protoversion > ProtocolVersion 1
 then liftIO receivemessage >>= \case
-Just (VALIDITY Valid) ->
-store
+Just (VALIDITY Valid)
+| verified -> store
+| otherwise -> liftIO $ sendmessage FAILURE
 Just (VALIDITY Invalid) ->
 liftIO $ sendmessage FAILURE
 _ -> giveup "protocol error"
@@ -183,25 +191,27 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv = go
 _ -> giveup "protocol error"
 liftIO $ removeWhenExistsWith removeFile (fromRawFilePath tmpfile)
-receivetofile h n = do
-b <- liftIO receivebytestring
-liftIO $ atomically $
-putTMVar owaitv ()
-`orElse`
-readTMVar oclosedv
-n' <- storetofile h n (L.toChunks b)
--- Normally all the data is sent in a single
--- lazy bytestring. However, when the special
--- remote is a node in a cluster, a PUT is
--- streamed to it in multiple chunks.
-if n' == 0
-then return ()
-else receivetofile h n'
+receivetofile iv h n = liftIO receivebytestring >>= \case
+Just b -> do
+liftIO $ atomically $
+putTMVar owaitv ()
+`orElse`
+readTMVar oclosedv
+n' <- storetofile iv h n (L.toChunks b)
+-- Normally all the data is sent in a single
+-- lazy bytestring. However, when the special
+-- remote is a node in a cluster, a PUT is
+-- streamed to it in multiple chunks.
+if n' == 0
+then return True
+else receivetofile iv h n'
+Nothing -> return False
-storetofile _ n [] = pure n
-storetofile h n (b:bs) = do
+storetofile _ _ n [] = pure n
+storetofile iv h n (b:bs) = do
+writeVerifyChunk iv h b
 B.hPut h b
-storetofile h (n - fromIntegral (B.length b)) bs
+storetofile iv h (n - fromIntegral (B.length b)) bs
 proxyget offset af k = withproxytmpfile k $ \tmpfile -> do
 -- Don't verify the content from the remote,

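The new receive loop can be modeled without the P2P machinery: pull chunks
from a finite source, subtract each chunk's length from the expected byte
count, and report whether everything arrived before the source closed. A
minimal sketch, assuming (as in the patched `receivebytestring`) that a
closed connection yields `Nothing` rather than an error:

```haskell
-- Mirror of the receivetofile loop's control flow: True when the
-- expected number of bytes has arrived, False when the source ran
-- dry first (connection closed mid-transfer).
receiveAll :: Int -> [String] -> Bool
receiveAll 0 _ = True        -- got everything that was promised
receiveAll _ [] = False      -- closed before all data arrived
receiveAll n (c:cs) = receiveAll (n - length c) cs

main :: IO ()
main = do
    -- a cluster node may stream a PUT in several chunks
    print (receiveAll 5 ["hel", "lo"])  -- True
    print (receiveAll 5 ["hel"])        -- False
```

Returning a Bool instead of () is what lets the caller skip hash
finalization (and report FAILURE) when the transfer was truncated.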

@@ -19,6 +19,7 @@ module Annex.Verify (
 isVerifiable,
 startVerifyKeyContentIncrementally,
 finishVerifyKeyContentIncrementally,
+finishVerifyKeyContentIncrementally',
 verifyKeyContentIncrementally,
 IncrementalVerifier(..),
 writeVerifyChunk,
@@ -199,13 +200,17 @@ startVerifyKeyContentIncrementally verifyconfig k =
 )
 finishVerifyKeyContentIncrementally :: Maybe IncrementalVerifier -> Annex (Bool, Verification)
-finishVerifyKeyContentIncrementally Nothing =
+finishVerifyKeyContentIncrementally = finishVerifyKeyContentIncrementally' False
+finishVerifyKeyContentIncrementally' :: Bool -> Maybe IncrementalVerifier -> Annex (Bool, Verification)
+finishVerifyKeyContentIncrementally' _ Nothing =
 return (True, UnVerified)
-finishVerifyKeyContentIncrementally (Just iv) =
+finishVerifyKeyContentIncrementally' quiet (Just iv) =
 liftIO (finalizeIncrementalVerifier iv) >>= \case
 Just True -> return (True, Verified)
 Just False -> do
-warning "verification of content failed"
+unless quiet $
+warning "verification of content failed"
 return (False, UnVerified)
 -- Incremental verification was not able to be done.
 Nothing -> return (True, UnVerified)

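The refactoring follows a common Haskell idiom: the old name becomes a
partial application of a primed variant that takes the new flag, so
existing callers are untouched. A simplified sketch of the shape (types
reduced for illustration; the real functions run in the Annex monad, and
`Maybe Bool` stands in for finalizing the `IncrementalVerifier`, where
`Nothing` means incremental verification was not possible):

```haskell
-- Warnings are returned as data here so the effect of the quiet
-- flag is visible; the real code calls warning instead.
finish' :: Bool -> Maybe Bool -> ((Bool, String), [String])
finish' _ Nothing = ((True, "UnVerified"), [])
finish' _ (Just True) = ((True, "Verified"), [])
finish' quiet (Just False) =
    ( (False, "UnVerified")
    , if quiet then [] else ["verification of content failed"]
    )

-- The unprimed name keeps its old, non-quiet behavior.
finish :: Maybe Bool -> ((Bool, String), [String])
finish = finish' False

main :: IO ()
main = do
    print (finish (Just False))        -- warns on mismatch
    print (finish' True (Just False))  -- quiet: no warning
```

The proxy passes True for quiet because it reports the mismatch to the
client as a protocol FAILURE rather than as a local warning.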

@@ -15,6 +15,7 @@ git-annex (10.20240731) UNRELEASED; urgency=medium
 git-annex remotedaemon is killed while locking a key to prevent its
 removal.
 * When proxying a download from a special remote, avoid unnecessary hashing.
+* When proxying an upload to a special remote, verify the hash.
 * Propagate --force to git-annex transferrer.
 * Added a build flag for servant, enabling annex+http urls and
 git-annex p2phttp.


@@ -40,16 +40,6 @@ Planned schedule of work:
 When using ssh and not the http server, the node that had the incomplete
 copy also doesn't get the file, although no error is displayed.
-* When proxying a PUT to a special remote, no verification of the received
-content is done, it's just written to a file and that is sent to the
-special remote. This violates a usual invariant that any data being
-received into a repository gets verified in passing. Although on the
-other hand, when sending data to a special remote normally, there is also
-no verification. On the third hand, a p2p http proxy (or for that matter
-a ssh server) may have users who are allowed to store objects, but are
-not really trusted, and if they can upload garbage without verification,
-that could be bad.
 ## items deferred until later for p2p protocol over http
 * `git-annex p2phttp` should support serving several repositories at the same