use P2P protocol for checkpresent, retrieve, and store

Note that, due to not using rsync to transfer files to ssh remotes
any longer, permissions and other file metadata of annexed files
will no longer be preserved when copying them to ssh remotes.
Other remotes never supported preserving that information, so
this is not considered a regression. Added NEWS item about this.

Another significant side effect of this is that, even when rsync is run to
retrieve a file, its progress display will no longer be shown, and
instead the native git-annex progress display will appear. It would be
possible to use the rsync process display when rsync is used (old
git-annex-shell and also retrieval from a local repository), but it
would have complicated the code unncessarily, and been inconsistent
behavior.

(I'd been thinking for a while about eliminating the rsync progress
display, since it's got some annoying verbosities, including display of
the key and the "(xfr#1, to-chk=0/1)" bit and was already somewhat
inconsistent.)

retrieveKeyFileCheap still uses rsync, since that ensures that it gets
the actual file content from the remote. Using the P2P protocol would
use the local content, as long as the local and remote size are the
same.

This commit was sponsored by John Pellman on Patreon.
This commit is contained in:
Joey Hess 2018-03-09 12:57:32 -04:00
parent 26febb4e58
commit 08814327ff
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
6 changed files with 88 additions and 65 deletions

View file

@ -1,5 +1,15 @@
git-annex (6.20180228) UNRELEASED; urgency=medium git-annex (6.20180228) UNRELEASED; urgency=medium
* New protocol for communicating with git-annex-shell increases speed
of operations involving ssh remotes. When not transferring large files,
git-annex is between 200% and 400% faster using the new protocol.
(When the remote has an old git-annex-shell, git-annex falls back
to the old slower code.)
* Note that, due to not using rsync to transfer files to ssh remotes
any longer, permissions and other file metadata of annexed files
will no longer be preserved when copying them to and from ssh remotes.
Other remotes never supported preserving that information, so
this is not considered a regression.
* Support exporttree=yes for rsync special remotes. * Support exporttree=yes for rsync special remotes.
* Dial back optimisation when building on arm, which prevents * Dial back optimisation when building on arm, which prevents
ghc and llc from running out of memory when optimising some files. ghc and llc from running out of memory when optimising some files.
@ -16,7 +26,6 @@ git-annex (6.20180228) UNRELEASED; urgency=medium
in direct mode, it neglected to check that the content was actually in direct mode, it neglected to check that the content was actually
present when locking it. This could cause git annex drop to remove present when locking it. This could cause git annex drop to remove
the only copy of a file when it thought the tor remote had a copy. the only copy of a file when it thought the tor remote had a copy.
* git-annex-shell: Added p2pstdio mode.
* Better ssh connection warmup when using -J for concurrency. * Better ssh connection warmup when using -J for concurrency.
Avoids ugly messages when forced ssh command is not git-annex-shell. Avoids ugly messages when forced ssh command is not git-annex-shell.
* Note that Remote/Git.hs now contains AGPL licensed code, * Note that Remote/Git.hs now contains AGPL licensed code,

View file

@ -77,15 +77,6 @@ metered othermeter key getsrcfile a = withMessageState $ \st ->
Nothing -> return Nothing Nothing -> return Nothing
Just f -> catchMaybeIO $ liftIO $ getFileSize f Just f -> catchMaybeIO $ liftIO $ getFileSize f
{- Use when the command's own progress output is preferred.
- The command's output will be suppressed and git-annex's progress meter
- used for concurrent output, and json progress. -}
commandMetered :: Maybe MeterUpdate -> Key -> Annex (Maybe FilePath) -> (MeterUpdate -> Annex a) -> Annex a
commandMetered combinemeterupdate key getsrcfile a =
withMessageState $ \s -> if needOutputMeter s
then metered combinemeterupdate key getsrcfile a
else a (fromMaybe nullMeterUpdate combinemeterupdate)
{- Poll file size to display meter, but only when concurrent output or {- Poll file size to display meter, but only when concurrent output or
- json progress needs the information. -} - json progress needs the information. -}
meteredFile :: FilePath -> Maybe MeterUpdate -> Key -> Annex a -> Annex a meteredFile :: FilePath -> Maybe MeterUpdate -> Key -> Annex a -> Annex a

22
NEWS
View file

@ -1,4 +1,14 @@
git-annex (6.20170228) unstable; urgency=medium git-annex (6.20180309) upstream; urgency=medium
Note that, due to not using rsync to transfer files to ssh remotes
any longer, permissions and other file metadata of annexed files
will no longer be preserved when copying them to and from ssh remotes.
Other remotes never supported preserving that information, so
this is not considered a regression.
-- Joey Hess <id@joeyh.name> Fri, 09 Mar 2018 13:22:47 -0400
git-annex (6.20170228) upstream; urgency=medium
This version of git-annex has mitigations for SHA1 hash collision This version of git-annex has mitigations for SHA1 hash collision
problems. problems.
@ -10,7 +20,7 @@ git-annex (6.20170228) unstable; urgency=medium
-- Joey Hess <id@joeyh.name> Tue, 28 Feb 2017 13:28:50 -0400 -- Joey Hess <id@joeyh.name> Tue, 28 Feb 2017 13:28:50 -0400
git-annex (6.20170101) unstable; urgency=medium git-annex (6.20170101) upstream; urgency=medium
XMPP support has been removed from the assistant in this release. XMPP support has been removed from the assistant in this release.
@ -20,7 +30,7 @@ git-annex (6.20170101) unstable; urgency=medium
-- Joey Hess <id@joeyh.name> Tue, 27 Dec 2016 16:37:46 -0400 -- Joey Hess <id@joeyh.name> Tue, 27 Dec 2016 16:37:46 -0400
git-annex (4.20131002) unstable; urgency=low git-annex (4.20131002) upstream; urgency=low
The layout of gcrypt repositories has changed, and The layout of gcrypt repositories has changed, and
if you created one you must manually upgrade it. if you created one you must manually upgrade it.
@ -28,7 +38,7 @@ git-annex (4.20131002) unstable; urgency=low
-- Joey Hess <joeyh@debian.org> Tue, 24 Sep 2013 13:55:23 -0400 -- Joey Hess <joeyh@debian.org> Tue, 24 Sep 2013 13:55:23 -0400
git-annex (3.20120123) unstable; urgency=low git-annex (3.20120123) upstream; urgency=low
There was a bug in the handling of directory special remotes that There was a bug in the handling of directory special remotes that
could cause partial file contents to be stored in them. If you use could cause partial file contents to be stored in them. If you use
@ -39,7 +49,7 @@ git-annex (3.20120123) unstable; urgency=low
-- Joey Hess <joeyh@debian.org> Thu, 19 Jan 2012 15:24:23 -0400 -- Joey Hess <joeyh@debian.org> Thu, 19 Jan 2012 15:24:23 -0400
git-annex (3.20110624) experimental; urgency=low git-annex (3.20110624) upstream; urgency=low
There has been another change to the git-annex data store. There has been another change to the git-annex data store.
Use `git annex upgrade` to migrate your repositories to the new Use `git annex upgrade` to migrate your repositories to the new
@ -56,7 +66,7 @@ git-annex (3.20110624) experimental; urgency=low
-- Joey Hess <joeyh@debian.org> Tue, 21 Jun 2011 20:18:00 -0400 -- Joey Hess <joeyh@debian.org> Tue, 21 Jun 2011 20:18:00 -0400
git-annex (0.20110316) experimental; urgency=low git-annex (0.20110316) upstream; urgency=low
This version reorganises the layout of git-annex's files in your repository. This version reorganises the layout of git-annex's files in your repository.
There is an upgrade process to convert a repository from the old git-annex There is an upgrade process to convert a repository from the old git-annex

View file

@ -61,7 +61,6 @@ import qualified P2P.IO as P2P
import P2P.Address import P2P.Address
import Annex.Path import Annex.Path
import Creds import Creds
import Messages.Progress
import Types.NumCopies import Types.NumCopies
import Annex.Action import Annex.Action
@ -150,24 +149,23 @@ gen r u c gc
| Git.GCrypt.isEncrypted r = Remote.GCrypt.chainGen r u c gc | Git.GCrypt.isEncrypted r = Remote.GCrypt.chainGen r u c gc
| otherwise = case repoP2PAddress r of | otherwise = case repoP2PAddress r of
Nothing -> do Nothing -> do
duc <- mkDeferredUUIDCheck r u gc st <- mkState r u gc
connpool <- Ssh.mkP2PSshConnectionPool go st <$> remoteCost gc defcst
go duc connpool <$> remoteCost gc defcst
Just addr -> Remote.P2P.chainGen addr r u c gc Just addr -> Remote.P2P.chainGen addr r u c gc
where where
defcst = if repoCheap r then cheapRemoteCost else expensiveRemoteCost defcst = if repoCheap r then cheapRemoteCost else expensiveRemoteCost
go duc connpool cst = Just new go st cst = Just new
where where
new = Remote new = Remote
{ uuid = u { uuid = u
, cost = cst , cost = cst
, name = Git.repoDescribe r , name = Git.repoDescribe r
, storeKey = copyToRemote new duc , storeKey = copyToRemote new st
, retrieveKeyFile = copyFromRemote new , retrieveKeyFile = copyFromRemote new st
, retrieveKeyFileCheap = copyFromRemoteCheap new , retrieveKeyFileCheap = copyFromRemoteCheap new st
, removeKey = dropKey new duc connpool , removeKey = dropKey new st
, lockContent = Just (lockKey new duc) , lockContent = Just (lockKey new st)
, checkPresent = inAnnex new duc , checkPresent = inAnnex new st
, checkPresentCheap = repoCheap r , checkPresentCheap = repoCheap r
, exportActions = exportUnsupported , exportActions = exportUnsupported
, whereisKey = Nothing , whereisKey = Nothing
@ -332,8 +330,8 @@ tryGitConfigRead autoinit r
else [] else []
{- Checks if a given remote has the content for a key in its annex. -} {- Checks if a given remote has the content for a key in its annex. -}
inAnnex :: Remote -> DeferredUUIDCheck -> Key -> Annex Bool inAnnex :: Remote -> State -> Key -> Annex Bool
inAnnex rmt duc key inAnnex rmt (State connpool duc) key
| Git.repoIsHttp r = checkhttp | Git.repoIsHttp r = checkhttp
| Git.repoIsUrl r = checkremote | Git.repoIsUrl r = checkremote
| otherwise = checklocal | otherwise = checklocal
@ -345,7 +343,9 @@ inAnnex rmt duc key
( return True ( return True
, giveup "not found" , giveup "not found"
) )
checkremote = Ssh.inAnnex r key checkremote =
let fallback = Ssh.inAnnex r key
in P2PHelper.checkpresent (runProto rmt connpool fallback) key
checklocal = ifM duc checklocal = ifM duc
( guardUsable r (cantCheck r) $ ( guardUsable r (cantCheck r) $
maybe (cantCheck r) return maybe (cantCheck r) return
@ -370,8 +370,8 @@ keyUrls r key = map tourl locs'
remoteconfig = gitconfig r remoteconfig = gitconfig r
cfg = remoteGitConfig remoteconfig cfg = remoteGitConfig remoteconfig
dropKey :: Remote -> DeferredUUIDCheck -> Ssh.P2PSshConnectionPool -> Key -> Annex Bool dropKey :: Remote -> State -> Key -> Annex Bool
dropKey r duc connpool key dropKey r (State connpool duc) key
| not $ Git.repoIsUrl (repo r) = ifM duc | not $ Git.repoIsUrl (repo r) = ifM duc
( guardUsable (repo r) (return False) $ ( guardUsable (repo r) (return False) $
commitOnCleanup r $ onLocalFast r $ do commitOnCleanup r $ onLocalFast r $ do
@ -389,8 +389,8 @@ dropKey r duc connpool key
let fallback = Ssh.dropKey (repo r) key let fallback = Ssh.dropKey (repo r) key
P2PHelper.remove (runProto r connpool fallback) key P2PHelper.remove (runProto r connpool fallback) key
lockKey :: Remote -> DeferredUUIDCheck -> Key -> (VerifiedCopy -> Annex r) -> Annex r lockKey :: Remote -> State -> Key -> (VerifiedCopy -> Annex r) -> Annex r
lockKey r duc key callback lockKey r (State _ duc) key callback
| not $ Git.repoIsUrl (repo r) = ifM duc | not $ Git.repoIsUrl (repo r) = ifM duc
( guardUsable (repo r) failedlock $ do ( guardUsable (repo r) failedlock $ do
inorigrepo <- Annex.makeRunner inorigrepo <- Annex.makeRunner
@ -446,15 +446,13 @@ lockKey r duc key callback
failedlock = giveup "can't lock content" failedlock = giveup "can't lock content"
{- Tries to copy a key's content from a remote's annex to a file. -} {- Tries to copy a key's content from a remote's annex to a file. -}
copyFromRemote :: Remote -> Key -> AssociatedFile -> FilePath -> MeterUpdate -> Annex (Bool, Verification) copyFromRemote :: Remote -> State -> Key -> AssociatedFile -> FilePath -> MeterUpdate -> Annex (Bool, Verification)
copyFromRemote r key file dest p copyFromRemote = copyFromRemote' False
| Git.repoIsHttp (repo r) = unVerified $
Annex.Content.downloadUrl key p (keyUrls r key) dest
| otherwise = commandMetered (Just p) key (return Nothing) $
copyFromRemote' r key file dest
copyFromRemote' :: Remote -> Key -> AssociatedFile -> FilePath -> MeterUpdate -> Annex (Bool, Verification) copyFromRemote' :: Bool -> Remote -> State -> Key -> AssociatedFile -> FilePath -> MeterUpdate -> Annex (Bool, Verification)
copyFromRemote' r key file dest meterupdate copyFromRemote' forcersync r (State connpool _) key file dest meterupdate
| Git.repoIsHttp (repo r) = unVerified $
Annex.Content.downloadUrl key meterupdate (keyUrls r key) dest
| not $ Git.repoIsUrl (repo r) = guardUsable (repo r) (unVerified (return False)) $ do | not $ Git.repoIsUrl (repo r) = guardUsable (repo r) (unVerified (return False)) $ do
params <- Ssh.rsyncParams r Download params <- Ssh.rsyncParams r Download
u <- getUUID u <- getUUID
@ -470,11 +468,16 @@ copyFromRemote' r key file dest meterupdate
runTransfer (Transfer Download u key) runTransfer (Transfer Download u key)
file forwardRetry file forwardRetry
(\p -> copier object dest (combineMeterUpdate p meterupdate) checksuccess) (\p -> copier object dest (combineMeterUpdate p meterupdate) checksuccess)
| Git.repoIsSsh (repo r) = unVerified $ feedprogressback $ \p -> do | Git.repoIsSsh (repo r) = if forcersync
Ssh.rsyncHelper (Just (combineMeterUpdate meterupdate p)) then unVerified fallback
=<< Ssh.rsyncParamsRemote False r Download key dest file else P2PHelper.retrieve
(runProto r connpool fallback)
key file dest meterupdate
| otherwise = giveup "copying from non-ssh, non-http remote not supported" | otherwise = giveup "copying from non-ssh, non-http remote not supported"
where where
fallback = feedprogressback $ \p -> do
Ssh.rsyncHelper (Just (combineMeterUpdate meterupdate p))
=<< Ssh.rsyncParamsRemote False r Download key dest file
{- Feed local rsync's progress info back to the remote, {- Feed local rsync's progress info back to the remote,
- by forking a feeder thread that runs - by forking a feeder thread that runs
- git-annex-shell transferinfo at the same time - git-annex-shell transferinfo at the same time
@ -535,9 +538,9 @@ copyFromRemote' r key file dest meterupdate
=<< tryTakeMVar pidv =<< tryTakeMVar pidv
bracketIO noop (const cleanup) (const $ a feeder) bracketIO noop (const cleanup) (const $ a feeder)
copyFromRemoteCheap :: Remote -> Key -> AssociatedFile -> FilePath -> Annex Bool copyFromRemoteCheap :: Remote -> State -> Key -> AssociatedFile -> FilePath -> Annex Bool
#ifndef mingw32_HOST_OS #ifndef mingw32_HOST_OS
copyFromRemoteCheap r key af file copyFromRemoteCheap r st key af file
| not $ Git.repoIsUrl (repo r) = guardUsable (repo r) (return False) $ liftIO $ do | not $ Git.repoIsUrl (repo r) = guardUsable (repo r) (return False) $ liftIO $ do
loc <- gitAnnexLocation key (repo r) $ loc <- gitAnnexLocation key (repo r) $
remoteGitConfig $ gitconfig r remoteGitConfig $ gitconfig r
@ -551,36 +554,31 @@ copyFromRemoteCheap r key af file
) )
| Git.repoIsSsh (repo r) = | Git.repoIsSsh (repo r) =
ifM (Annex.Content.preseedTmp key file) ifM (Annex.Content.preseedTmp key file)
( fst <$> copyFromRemote r key af file nullMeterUpdate ( fst <$> copyFromRemote' True r st key af file nullMeterUpdate
, return False , return False
) )
| otherwise = return False | otherwise = return False
#else #else
copyFromRemoteCheap _ _ _ _ = return False copyFromRemoteCheap _ _ _ _ _ = return False
#endif #endif
{- Tries to copy a key's content to a remote's annex. -} {- Tries to copy a key's content to a remote's annex. -}
copyToRemote :: Remote -> DeferredUUIDCheck -> Key -> AssociatedFile -> MeterUpdate -> Annex Bool copyToRemote :: Remote -> State -> Key -> AssociatedFile -> MeterUpdate -> Annex Bool
copyToRemote r duc key file meterupdate copyToRemote r (State connpool duc) key file meterupdate
| not $ Git.repoIsUrl (repo r) = ifM duc | not $ Git.repoIsUrl (repo r) = ifM duc
( guardUsable (repo r) (return False) $ commitOnCleanup r $ ( guardUsable (repo r) (return False) $ commitOnCleanup r $
copylocal =<< Annex.Content.prepSendAnnex key copylocal =<< Annex.Content.prepSendAnnex key
, return False , return False
) )
| Git.repoIsSsh (repo r) = commitOnCleanup r $ | Git.repoIsSsh (repo r) = commitOnCleanup r $
Annex.Content.sendAnnex key noop $ \object -> P2PHelper.store
withmeter object $ \p -> do (runProto r connpool copyremotefallback)
-- This is too broad really, but recvkey normally key file meterupdate
-- verifies content anyway, so avoid complicating
-- it with a local sendAnnex check and rollback.
unlocked <- isDirect <||> versionSupportsUnlockedPointers
Ssh.rsyncHelper (Just p)
=<< Ssh.rsyncParamsRemote unlocked r Upload key object file
| otherwise = giveup "copying to non-ssh repo not supported" | otherwise = giveup "copying to non-ssh repo not supported"
where where
withmeter object = commandMetered (Just meterupdate) key (return $ Just object)
copylocal Nothing = return False copylocal Nothing = return False
copylocal (Just (object, checksuccess)) = withmeter object $ \p -> do copylocal (Just (object, checksuccess)) = do
-- The checksuccess action is going to be run in -- The checksuccess action is going to be run in
-- the remote's Annex, but it needs access to the local -- the remote's Annex, but it needs access to the local
-- Annex monad's state. -- Annex monad's state.
@ -595,12 +593,19 @@ copyToRemote r duc key file meterupdate
ensureInitialized ensureInitialized
copier <- mkCopier hardlink params copier <- mkCopier hardlink params
let verify = Annex.Content.RemoteVerify r let verify = Annex.Content.RemoteVerify r
runTransfer (Transfer Download u key) file forwardRetry $ \p' -> runTransfer (Transfer Download u key) file forwardRetry $ \p ->
let p'' = combineMeterUpdate p p' let p' = combineMeterUpdate meterupdate p
in Annex.Content.saveState True `after` in Annex.Content.saveState True `after`
Annex.Content.getViaTmp verify key Annex.Content.getViaTmp verify key
(\dest -> copier object dest p'' (liftIO checksuccessio)) (\dest -> copier object dest p' (liftIO checksuccessio))
) )
copyremotefallback = Annex.Content.sendAnnex key noop $ \object -> do
-- This is too broad really, but recvkey normally
-- verifies content anyway, so avoid complicating
-- it with a local sendAnnex check and rollback.
unlocked <- isDirect <||> versionSupportsUnlockedPointers
Ssh.rsyncHelper (Just meterupdate)
=<< Ssh.rsyncParamsRemote unlocked r Upload key object file
fsckOnRemote :: Git.Repo -> [CommandParam] -> Annex (IO Bool) fsckOnRemote :: Git.Repo -> [CommandParam] -> Annex (IO Bool)
fsckOnRemote r params fsckOnRemote r params
@ -736,6 +741,13 @@ mkCopier remotewanthardlink rsyncparams = do
, return copier , return copier
) )
data State = State Ssh.P2PSshConnectionPool DeferredUUIDCheck
mkState :: Git.Repo -> UUID -> RemoteGitConfig -> Annex State
mkState r u gc = State
<$> Ssh.mkP2PSshConnectionPool
<*> mkDeferredUUIDCheck r u gc
{- Normally the UUID of a local repository is checked at startup, {- Normally the UUID of a local repository is checked at startup,
- but annex-checkuuid config can prevent that. To avoid getting - but annex-checkuuid config can prevent that. To avoid getting
- confused, a deferred check is done just before the repository - confused, a deferred check is done just before the repository

View file

@ -14,7 +14,6 @@ import qualified P2P.Protocol as P2P
import P2P.IO import P2P.IO
import Types.Remote import Types.Remote
import Annex.Content import Annex.Content
import Config.Cost
import Messages.Progress import Messages.Progress
import Utility.Metered import Utility.Metered
import Types.NumCopies import Types.NumCopies

View file

@ -61,3 +61,5 @@ took 12s, so remote access over localhost seems faster now! Possibly
there's a little bit more concurrency when git-annex and git-annex-shell there's a little bit more concurrency when git-annex and git-annex-shell
are both running?) are both running?)
Transferring a 30 mb file over ssh to localhost, speed increased from
3.288s to 3.031s. Due to rsync's several levels of checksumming?