deal with unlocked files

P2P protocol version 1 adds VALID|INVALID after DATA; INVALID means the
file was detected to change content while it was being sent and so we
may not have received the valid content of the file.

Added new MustVerify constructor for Verification, which forces
verification even when annex.verify=false etc. This is used when INVALID
and in protocol version 0.

As well as changing git-annex-shell p2psdio, this makes git-annex tor
remotes always force verification, since they don't yet use protocol
version 1. Previously, annex.verify=false could skip verification when
using tor remotes, and let bad data into the repository.

This commit was sponsored by Jack Hill on Patreon.
This commit is contained in:
Joey Hess 2018-03-13 14:18:30 -04:00
parent 9930b1f140
commit 31e1adc005
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
10 changed files with 141 additions and 78 deletions

View file

@ -16,7 +16,7 @@ module Annex.Content (
lockContentForRemoval,
ContentRemovalLock,
getViaTmp,
getViaTmp',
getViaTmpFromDisk,
checkDiskSpaceToGet,
prepTmp,
withTmp,
@ -295,13 +295,13 @@ lockContentUsing locker key a = do
- the key and moves the file into the annex as a key's content. -}
getViaTmp :: VerifyConfig -> Key -> (FilePath -> Annex (Bool, Verification)) -> Annex Bool
getViaTmp v key action = checkDiskSpaceToGet key False $
getViaTmp' v key action
getViaTmpFromDisk v key action
{- Like getViaTmp, but does not check that there is enough disk space
- for the incoming key. For use when the key content is already on disk
- and not being copied into place. -}
getViaTmp' :: VerifyConfig -> Key -> (FilePath -> Annex (Bool, Verification)) -> Annex Bool
getViaTmp' v key action = do
getViaTmpFromDisk :: VerifyConfig -> Key -> (FilePath -> Annex (Bool, Verification)) -> Annex Bool
getViaTmpFromDisk v key action = do
tmpfile <- prepTmp key
(ok, verification) <- action tmpfile
if ok
@ -331,12 +331,15 @@ getViaTmp' v key action = do
- it is checked.
-}
verifyKeyContent :: VerifyConfig -> Verification -> Key -> FilePath -> Annex Bool
verifyKeyContent _ Verified _ _ = return True
verifyKeyContent v UnVerified k f = ifM (shouldVerify v)
( verifysize <&&> verifycontent
, return True
)
verifyKeyContent v verification k f = case verification of
Verified -> return True
UnVerified -> ifM (shouldVerify v)
( verify
, return True
)
MustVerify -> verify
where
verify = verifysize <&&> verifycontent
verifysize = case keySize k of
Nothing -> return True
Just size -> do

View file

@ -213,7 +213,7 @@ storeReceived f = do
warning $ "Received a file " ++ f ++ " that is not a git-annex key. Deleting this file."
liftIO $ nukeFile f
Just k -> void $
getViaTmp' AlwaysVerify k $ \dest -> unVerified $
getViaTmpFromDisk AlwaysVerify k $ \dest -> unVerified $
liftIO $ catchBoolIO $ do
rename f dest
return True

View file

@ -79,11 +79,11 @@ perform file oldkey newkey = do
linkKey :: FilePath -> Key -> Key -> Annex Bool
linkKey file oldkey newkey = ifM (isJust <$> isAnnexLink file)
{- If the object file is already hardlinked to elsewhere, a hard
- link won't be made by getViaTmp', but a copy instead.
- link won't be made by getViaTmpFromDisk, but a copy instead.
- This avoids hard linking to content linked to an
- unlocked file, which would leave the new key unlocked
- and vulnerable to corruption. -}
( getViaTmp' DefaultVerify newkey $ \tmp -> unVerified $ do
( getViaTmpFromDisk DefaultVerify newkey $ \tmp -> unVerified $ do
oldobj <- calcRepo (gitAnnexLocation oldkey)
linkOrCopy' (return True) newkey oldobj tmp Nothing
, do

View file

@ -51,14 +51,10 @@ runLocal runst runner a = case a of
ReadContent k af o sender next -> do
v <- tryNonAsync $ prepSendAnnex k
case v of
-- The check can detect if the file
-- changed while it was transferred, but we don't
-- use it. Instead, the receiving peer must
-- AlwaysVerify the content it receives.
Right (Just (f, _check)) -> do
Right (Just (f, checkchanged)) -> do
v' <- tryNonAsync $
transfer upload k af $
sinkfile f o sender
sinkfile f o checkchanged sender
case v' of
Left e -> return (Left (show e))
Right (Left e) -> return (Left (show e))
@ -66,16 +62,16 @@ runLocal runst runner a = case a of
-- content not available
Right Nothing -> runner (next False)
Left e -> return (Left (show e))
StoreContent k af o l getb next -> do
StoreContent k af o l getb validitycheck next -> do
ok <- flip catchNonAsync (const $ return False) $
transfer download k af $ \p ->
getViaTmp AlwaysVerify k $ \tmp ->
unVerified $ storefile tmp o l getb p
runner (next ok)
StoreContentTo dest o l getb next -> do
ok <- flip catchNonAsync (const $ return False) $
storefile dest o l getb nullMeterUpdate
getViaTmp DefaultVerify k $ \tmp -> do
storefile tmp o l getb validitycheck p
runner (next ok)
StoreContentTo dest o l getb validitycheck next -> do
res <- flip catchNonAsync (const $ return (False, UnVerified)) $
storefile dest o l getb validitycheck nullMeterUpdate
runner (next res)
SetPresent k u next -> do
v <- tryNonAsync $ logChange k u InfoPresent
case v of
@ -120,6 +116,7 @@ runLocal runst runner a = case a of
UpdateMeterTotalSize m sz next -> do
liftIO $ setMeterTotalSize m sz
runner next
RunValidityCheck check next -> runner . next =<< check
where
transfer mk k af ta = case runst of
-- Update transfer logs when serving.
@ -129,20 +126,31 @@ runLocal runst runner a = case a of
-- a client.
Client _ -> ta nullMeterUpdate
storefile dest (Offset o) (Len l) getb p = do
storefile dest (Offset o) (Len l) getb validitycheck p = do
let p' = offsetMeterUpdate p (toBytesProcessed o)
v <- runner getb
case v of
Right b -> liftIO $ do
withBinaryFile dest ReadWriteMode $ \h -> do
Right b -> do
liftIO $ withBinaryFile dest ReadWriteMode $ \h -> do
when (o /= 0) $
hSeek h AbsoluteSeek o
meteredWrite p' h b
sz <- getFileSize dest
return (toInteger sz == l + o)
rightsize <- do
sz <- liftIO $ getFileSize dest
return (toInteger sz == l + o)
runner validitycheck >>= \case
Right (Just Valid) ->
return (rightsize, UnVerified)
_ -> do
-- Invalid, or old protocol
-- version. Validity is not
-- known. Force content
-- verification.
return (rightsize, MustVerify)
Left e -> error e
sinkfile f (Offset o) sender p = bracket setup cleanup go
sinkfile f (Offset o) checkchanged sender p = bracket setup cleanup go
where
setup = liftIO $ openBinaryFile f ReadMode
cleanup = liftIO . hClose
@ -151,4 +159,8 @@ runLocal runst runner a = case a of
when (o /= 0) $
liftIO $ hSeek h AbsoluteSeek o
b <- liftIO $ hGetContentsMetered h p'
runner (sender b)
let validitycheck = local $ runValidityCheck $
checkchanged >>= return . \case
False -> Invalid
True -> Valid
runner (sender b validitycheck)

View file

@ -14,8 +14,10 @@
module P2P.Protocol where
import qualified Utility.SimpleProtocol as Proto
import Types (Annex)
import Types.Key
import Types.UUID
import Types.Remote (Verification(..), unVerified)
import Utility.AuthToken
import Utility.Applicative
import Utility.PartialPrelude
@ -48,12 +50,15 @@ defaultProtocolVersion :: ProtocolVersion
defaultProtocolVersion = ProtocolVersion 0
maxProtocolVersion :: ProtocolVersion
maxProtocolVersion = ProtocolVersion 0
maxProtocolVersion = ProtocolVersion 1
-- | Service as used by the connect message in gitremote-helpers(1)
data Service = UploadPack | ReceivePack
deriving (Show)
data Validity = Valid | Invalid
deriving (Show)
-- | Messages in the protocol. The peer that makes the connection
-- always initiates requests, and the other peer makes responses to them.
data Message
@ -76,6 +81,7 @@ data Message
| SUCCESS
| FAILURE
| DATA Len -- followed by bytes of data
| VALIDITY Validity
| ERROR String
deriving (Show)
@ -98,6 +104,8 @@ instance Proto.Sendable Message where
formatMessage ALREADY_HAVE = ["ALREADY-HAVE"]
formatMessage SUCCESS = ["SUCCESS"]
formatMessage FAILURE = ["FAILURE"]
formatMessage (VALIDITY Valid) = ["VALID"]
formatMessage (VALIDITY Invalid) = ["INVALID"]
formatMessage (DATA len) = ["DATA", Proto.serialize len]
formatMessage (ERROR err) = ["ERROR", Proto.serialize err]
@ -122,6 +130,8 @@ instance Proto.Receivable Message where
parseCommand "FAILURE" = Proto.parse0 FAILURE
parseCommand "DATA" = Proto.parse1 DATA
parseCommand "ERROR" = Proto.parse1 ERROR
parseCommand "VALID" = Proto.parse0 (VALIDITY Valid)
parseCommand "INVALID" = Proto.parse0 (VALIDITY Invalid)
parseCommand _ = Proto.parseFail
instance Proto.Serializable ProtocolVersion where
@ -226,25 +236,26 @@ data LocalF c
| ContentSize Key (Maybe Len -> c)
-- ^ Gets size of the content of a key, when the full content is
-- present.
| ReadContent Key AssociatedFile Offset (L.ByteString -> Proto Bool) (Bool -> c)
| ReadContent Key AssociatedFile Offset (L.ByteString -> Proto Validity -> Proto Bool) (Bool -> c)
-- ^ Reads the content of a key and sends it to the callback.
-- Note that the content may change while it's being sent.
-- If the content is not available, sends L.empty to the callback.
| StoreContent Key AssociatedFile Offset Len (Proto L.ByteString) (Bool -> c)
-- Note that the content may change while it's being sent.
-- The callback is passed a validity check that it can run after
-- sending the content to detect when this happened.
| StoreContent Key AssociatedFile Offset Len (Proto L.ByteString) (Proto (Maybe Validity)) (Bool -> c)
-- ^ Stores content to the key's temp file starting at an offset.
-- Once the whole content of the key has been stored, moves the
-- temp file into place as the content of the key, and returns True.
--
-- Note: The ByteString may not contain the entire remaining content
-- of the key. Only once the temp file size == Len has the whole
-- content been transferred.
| StoreContentTo FilePath Offset Len (Proto L.ByteString) (Bool -> c)
-- ^ Stores the content to a temp file starting at an offset.
-- Once the whole content of the key has been stored, returns True.
-- If the validity check is provided and fails, the content was
-- changed while it was being sent, so verificiation of the
-- received content should be forced.
--
-- Note: The ByteString may not contain the entire remaining content
-- of the key. Only once the temp file size == Len has the whole
-- content been transferred.
| StoreContentTo FilePath Offset Len (Proto L.ByteString) (Proto (Maybe Validity)) ((Bool, Verification) -> c)
-- ^ Like StoreContent, but stores the content to a temp file.
| SetPresent Key UUID c
| CheckContentPresent Key (Bool -> c)
-- ^ Checks if the whole content of the key is locally present.
@ -261,6 +272,8 @@ data LocalF c
| UpdateMeterTotalSize Meter Integer c
-- ^ Updates the total size of a Meter, for cases where the size is
-- not known until the data is being received.
| RunValidityCheck (Annex Validity) (Validity -> c)
-- ^ Runs a deferred validity check.
deriving (Functor)
type Local = Free LocalF
@ -326,7 +339,7 @@ remove key = do
net $ sendMessage (REMOVE key)
checkSuccess
get :: FilePath -> Key -> AssociatedFile -> Meter -> MeterUpdate -> Proto Bool
get :: FilePath -> Key -> AssociatedFile -> Meter -> MeterUpdate -> Proto (Bool, Verification)
get dest key af m p =
receiveContent (Just m) p sizer storer (\offset -> GET offset af key)
where
@ -436,8 +449,9 @@ serveAuthed servermode myuuid = void $ serverLoop handler
then net $ sendMessage ALREADY_HAVE
else do
let sizer = tmpContentSize key
let storer = storeContent key af
ok <- receiveContent Nothing nullMeterUpdate sizer storer PUT_FROM
let storer = \o l b v -> unVerified $
storeContent key af o l b v
(ok, _v) <- receiveContent Nothing nullMeterUpdate sizer storer PUT_FROM
when ok $
local $ setPresent key myuuid
return ServerContinue
@ -468,20 +482,29 @@ serveAuthed servermode myuuid = void $ serverLoop handler
sendContent :: Key -> AssociatedFile -> Offset -> MeterUpdate -> Proto Bool
sendContent key af offset@(Offset n) p = go =<< local (contentSize key)
where
go Nothing = sender (Len 0) L.empty
go Nothing = sender (Len 0) L.empty (return Valid)
go (Just (Len totallen)) = do
let len = totallen - n
if len <= 0
then sender (Len 0) L.empty
then sender (Len 0) L.empty (return Valid)
else local $ readContent key af offset $
sender (Len len)
sender len content = do
sender len content validitycheck = do
let p' = offsetMeterUpdate p (toBytesProcessed n)
net $ sendMessage (DATA len)
net $ sendBytes len content p'
ver <- net getProtocolVersion
when (ver >= ProtocolVersion 1) $
net . sendMessage . VALIDITY =<< validitycheck
checkSuccess
receiveContent :: Maybe Meter -> MeterUpdate -> Local Len -> (Offset -> Len -> Proto L.ByteString -> Local Bool) -> (Offset -> Message) -> Proto Bool
receiveContent
:: Maybe Meter
-> MeterUpdate
-> Local Len
-> (Offset -> Len -> Proto L.ByteString -> Proto (Maybe Validity) -> Local (Bool, Verification))
-> (Offset -> Message)
-> Proto (Bool, Verification)
receiveContent mm p sizer storer mkmsg = do
Len n <- local sizer
let p' = offsetMeterUpdate p (toBytesProcessed n)
@ -493,13 +516,22 @@ receiveContent mm p sizer storer mkmsg = do
local $ case mm of
Nothing -> return ()
Just m -> updateMeterTotalSize m (n+l)
ok <- local $ storer offset len
ver <- net getProtocolVersion
let validitycheck = if ver >= ProtocolVersion 1
then net receiveMessage >>= \case
Just (VALIDITY v) -> return (Just v)
_ -> do
net $ sendMessage (ERROR "expected VALID or INVALID")
return Nothing
else return Nothing
(ok, v) <- local $ storer offset len
(net (receiveBytes len p'))
validitycheck
sendSuccess ok
return ok
return (ok, v)
_ -> do
net $ sendMessage (ERROR "expected DATA")
return False
return (False, UnVerified)
checkSuccess :: Proto Bool
checkSuccess = do

View file

@ -470,13 +470,13 @@ copyFromRemote' forcersync r (State connpool _) key file dest meterupdate
file forwardRetry
(\p -> copier object dest (combineMeterUpdate p meterupdate) checksuccess)
| Git.repoIsSsh (repo r) = if forcersync
then unVerified $ fallback meterupdate
then fallback meterupdate
else P2PHelper.retrieve
(\p -> Ssh.runProto r connpool False (fallback p))
(\p -> Ssh.runProto r connpool (False, UnVerified) (fallback p))
key file dest meterupdate
| otherwise = giveup "copying from non-ssh, non-http remote not supported"
where
fallback p = feedprogressback $ \p' -> do
fallback p = unVerified $ feedprogressback $ \p' -> do
oh <- mkOutputHandlerQuiet
Ssh.rsyncHelper oh (Just (combineMeterUpdate p' p))
=<< Ssh.rsyncParamsRemote False r Download key dest file

View file

@ -37,10 +37,11 @@ store runner k af p = do
fromMaybe False
<$> runner p' (P2P.put k af p')
retrieve :: (MeterUpdate -> ProtoRunner Bool) -> Key -> AssociatedFile -> FilePath -> MeterUpdate -> Annex (Bool, Verification)
retrieve runner k af dest p = unVerified $
metered (Just p) k (return Nothing) $ \m p' -> fromMaybe False
<$> runner p' (P2P.get dest k af m p')
retrieve :: (MeterUpdate -> ProtoRunner (Bool, Verification)) -> Key -> AssociatedFile -> FilePath -> MeterUpdate -> Annex (Bool, Verification)
retrieve runner k af dest p =
metered (Just p) k (return Nothing) $ \m p' ->
fromMaybe (False, UnVerified)
<$> runner p' (P2P.get dest k af m p')
remove :: ProtoRunner Bool -> Key -> Annex Bool
remove runner k = fromMaybe False <$> runner (P2P.remove k)

View file

@ -149,9 +149,15 @@ instance Ord (RemoteA a) where
instance ToUUID (RemoteA a) where
toUUID = uuid
-- Use Verified when the content of a key is verified as part of a
-- transfer, and so a separate verification step is not needed.
data Verification = UnVerified | Verified
data Verification
= UnVerified
-- ^ content was not verified during transfer, but is probably
-- ok, so if verification is disabled, don't verify it
| Verified
-- ^ content of key was verified during transfer
| MustVerify
-- ^ content likely to have been altered during transfer,
-- verify even if verification is normally disabled
unVerified :: Monad m => m Bool -> m (Bool, Verification)
unVerified a = do

View file

@ -138,6 +138,13 @@ the client to start. This allows resuming transfers.
The client then sends a DATA message with content of the file from
the offset to the end of file.
In protocol version 1, after the data, the client sends an additional
message, to indicate if the content of the file has changed while it
was being sent.
INVALID
VALID
If the server successfully receives the data and stores the content,
it replies with SUCCESS. Otherwise, FAILURE.
@ -154,6 +161,13 @@ See description of AssociatedFile above.
The server then sends a DATA message with the content of the file
from the offset to end of file.
In protocol version 1, after the data, the server sends an additional
message, to indicate if the content of the file has changed while it
was being sent.
INVALID
VALID
The client replies with SUCCESS or FAILURE.
## Connection to services

View file

@ -3,19 +3,15 @@ git-annex-shell recvkey has a speed optimisation, when it's told the file
being sent is locked, it can avoid an expensive verification, when
annex.verify=false. (Similar for transfers in the other direction.)
The P2P protocol does not have a way to communicate when that happens,
and forces AlwaysVerify.
The P2P protocol does not have a way to communicate when that happens.
File content can be modified while it's sent, and if annex.verify=false
is allowed to take effect, bad data will get into the repository.
It would be nice to support that, but if it added an extra round trip
It would be nice to support annex.verify=false when it's safe but not
when the file got modified, but if it added an extra round trip
to the P2P protocol, that could lose some of the speed gains.
The best way seems to be to add a new protocol version, where DATA
has an extra byte at the end that is "1" when the file didn't change
as it was transferred, and "0" when it did.
My first attempt to implement this failed miserably due to a Free monad
type check problem I could not see a way around.
Also, resumes make this difficult. What if a file starts to be transferred,
Resumes make this difficult. What if a file starts to be transferred,
gets changed while it's transferred so some bad bytes are sent, then the
transfer is interrupted, and later is resumed from a different remote
that has the correct content. How can it tell that the bad data was sent
@ -33,9 +29,10 @@ repository was unlocked, and the second is locked, it's safe for recvkey to
treat it locked and skip verification.
Seems the best we could do with the P2P protocol, barring adding
rsync-style rolling hashing to it, is to allow skipping verification
when the sender is locked.. But not when resuming, since we don't know
where that resumed data comes from.
rsync-style rolling hashing to it, is to detect when a file got modified
as it was being sent, and inform the peer that the data it got is bad.
It can then throw it away rather than putting the bad data into the
repository.
This is not really unique to the P2P protocol -- special remotes
can be written to support resuming. The web special remote does; there may
@ -48,9 +45,7 @@ the repository.
So, let's solve this broadly. Whenever a download is resumed, force
AlwaysVerify, unless the remote returns Verified. This can be done in
Annex.Content.getViaTmp, so it will affect all downloads involving the tmp
key for a file. (The P2P protocol still needs to prevent skipping
verification when a download is not being resumed, if the sender is
locked.)
key for a file.
This would change handling of resumes of downloads using rsync too.
But those are always safe to skip verification of, although they don't