enable forwardRetry for command-line transfers
If a transfer fails for some reason, but some data managed to be sent, the transfer will be retried. (The assistant already did this.) Possible impacts: * More ssh prompts if ssh needs to prompt for a password to connect to a host, or is prompting about some other problem like a ssh key mismatch. * More data transfer due to retrying, epecially when a remote does not support resuming a transfer. In the worst case, a lot of data will be transferred but it fails before the end, and then all that data gets transferred again plus one byte more; repeat until it manages to get the whole file.
This commit is contained in:
parent
ca2435b21b
commit
8dcf79694d
7 changed files with 75 additions and 5 deletions
|
@ -12,6 +12,8 @@ git-annex (6.20161013) UNRELEASED; urgency=medium
|
||||||
* test: Deal with gpg-agent behavior change that broke the test suite.
|
* test: Deal with gpg-agent behavior change that broke the test suite.
|
||||||
* Improve ssh socket cleanup code to skip over the cruft that
|
* Improve ssh socket cleanup code to skip over the cruft that
|
||||||
NFS sometimes puts in a directory when a file is being deleted.
|
NFS sometimes puts in a directory when a file is being deleted.
|
||||||
|
* If a transfer fails for some reason, but some data managed to be sent,
|
||||||
|
the transfer will be retried. (The assistant already did this.)
|
||||||
|
|
||||||
-- Joey Hess <id@joeyh.name> Mon, 17 Oct 2016 12:46:54 -0400
|
-- Joey Hess <id@joeyh.name> Mon, 17 Oct 2016 12:46:54 -0400
|
||||||
|
|
||||||
|
|
|
@ -109,7 +109,7 @@ getKey' key afile = dispatch
|
||||||
either (const False) id <$> Remote.hasKey r key
|
either (const False) id <$> Remote.hasKey r key
|
||||||
| otherwise = return True
|
| otherwise = return True
|
||||||
docopy r witness = getViaTmp (RemoteVerify r) key $ \dest ->
|
docopy r witness = getViaTmp (RemoteVerify r) key $ \dest ->
|
||||||
download (Remote.uuid r) key afile noRetry
|
download (Remote.uuid r) key afile forwardRetry
|
||||||
(\p -> do
|
(\p -> do
|
||||||
showAction $ "from " ++ Remote.name r
|
showAction $ "from " ++ Remote.name r
|
||||||
Remote.retrieveKeyFile r key afile dest p
|
Remote.retrieveKeyFile r key afile dest p
|
||||||
|
|
|
@ -114,7 +114,7 @@ toPerform dest move key afile fastcheck isthere =
|
||||||
Right False -> do
|
Right False -> do
|
||||||
showAction $ "to " ++ Remote.name dest
|
showAction $ "to " ++ Remote.name dest
|
||||||
ok <- notifyTransfer Upload afile $
|
ok <- notifyTransfer Upload afile $
|
||||||
upload (Remote.uuid dest) key afile noRetry $
|
upload (Remote.uuid dest) key afile forwardRetry $
|
||||||
Remote.storeKey dest key afile
|
Remote.storeKey dest key afile
|
||||||
if ok
|
if ok
|
||||||
then finish $
|
then finish $
|
||||||
|
@ -177,7 +177,7 @@ fromPerform src move key afile = ifM (inAnnex key)
|
||||||
)
|
)
|
||||||
where
|
where
|
||||||
go = notifyTransfer Download afile $
|
go = notifyTransfer Download afile $
|
||||||
download (Remote.uuid src) key afile noRetry $ \p -> do
|
download (Remote.uuid src) key afile forwardRetry $ \p -> do
|
||||||
showAction $ "from " ++ Remote.name src
|
showAction $ "from " ++ Remote.name src
|
||||||
getViaTmp (RemoteVerify src) key $ \t ->
|
getViaTmp (RemoteVerify src) key $ \t ->
|
||||||
Remote.retrieveKeyFile src key afile t p
|
Remote.retrieveKeyFile src key afile t p
|
||||||
|
|
|
@ -48,6 +48,7 @@ fieldTransfer direction key a = do
|
||||||
liftIO $ debugM "fieldTransfer" "transfer start"
|
liftIO $ debugM "fieldTransfer" "transfer start"
|
||||||
afile <- Fields.getField Fields.associatedFile
|
afile <- Fields.getField Fields.associatedFile
|
||||||
ok <- maybe (a $ const noop)
|
ok <- maybe (a $ const noop)
|
||||||
|
-- Using noRetry here because we're the sender.
|
||||||
(\u -> runner (Transfer direction (toUUID u) key) afile noRetry a)
|
(\u -> runner (Transfer direction (toUUID u) key) afile noRetry a)
|
||||||
=<< Fields.getField Fields.remoteUUID
|
=<< Fields.getField Fields.remoteUUID
|
||||||
liftIO $ debugM "fieldTransfer" "transfer done"
|
liftIO $ debugM "fieldTransfer" "transfer done"
|
||||||
|
|
|
@ -439,7 +439,7 @@ copyFromRemote' r key file dest meterupdate
|
||||||
Just (object, checksuccess) -> do
|
Just (object, checksuccess) -> do
|
||||||
copier <- mkCopier hardlink params
|
copier <- mkCopier hardlink params
|
||||||
runTransfer (Transfer Download u key)
|
runTransfer (Transfer Download u key)
|
||||||
file noRetry
|
file forwardRetry
|
||||||
(\p -> copier object dest (combineMeterUpdate p meterupdate) checksuccess)
|
(\p -> copier object dest (combineMeterUpdate p meterupdate) checksuccess)
|
||||||
| Git.repoIsSsh (repo r) = unVerified $ feedprogressback $ \p -> do
|
| Git.repoIsSsh (repo r) = unVerified $ feedprogressback $ \p -> do
|
||||||
Ssh.rsyncHelper (Just (combineMeterUpdate meterupdate p))
|
Ssh.rsyncHelper (Just (combineMeterUpdate meterupdate p))
|
||||||
|
@ -565,7 +565,7 @@ copyToRemote' r key file meterupdate
|
||||||
ensureInitialized
|
ensureInitialized
|
||||||
copier <- mkCopier hardlink params
|
copier <- mkCopier hardlink params
|
||||||
let verify = Annex.Content.RemoteVerify r
|
let verify = Annex.Content.RemoteVerify r
|
||||||
runTransfer (Transfer Download u key) file noRetry $ \p ->
|
runTransfer (Transfer Download u key) file forwardRetry $ \p ->
|
||||||
let p' = combineMeterUpdate meterupdate p
|
let p' = combineMeterUpdate meterupdate p
|
||||||
in Annex.Content.saveState True `after`
|
in Annex.Content.saveState True `after`
|
||||||
Annex.Content.getViaTmp verify key
|
Annex.Content.getViaTmp verify key
|
||||||
|
|
|
@ -58,3 +58,4 @@ SHA256E-s41311329--69c3b054a3fe9676133605730d85b7fcef8696f6782d402a524e41b836253
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
[[!meta title="Detect stalled transfer and retry or abort it"]]
|
||||||
|
|
|
@ -0,0 +1,66 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="joey"
|
||||||
|
subject="""comment 2"""
|
||||||
|
date="2016-10-26T18:26:35Z"
|
||||||
|
content="""
|
||||||
|
The most common way a network connection can stall like this is when
|
||||||
|
moving to a different wifi network: the connection is open but
|
||||||
|
no more data will be received. I suppose other kinds of network
|
||||||
|
glitches could also lead to this kind of situation.
|
||||||
|
|
||||||
|
ssh has some things, like ServerAliveInterval and TCPKeepAlive,
|
||||||
|
that it can use to detect such problems. You may find them useful.
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
As for the retrying once a stall is detected, some transfers use
|
||||||
|
`forwardRetry` which will automatically retry as long as the failed try
|
||||||
|
managed to send some data. But the get/move/copy commands currently use
|
||||||
|
`noRetry`. I can't find any justification for not always using
|
||||||
|
`forwardRetry`; I think that it was added for the assistant originally and
|
||||||
|
the other stuff just never switched over.
|
||||||
|
|
||||||
|
Only problem I can think of is, if there actually is a ssh password
|
||||||
|
prompt, it would prompt again on retry. But most people using git-annex
|
||||||
|
with ssh have something in place to make ssh not prompt repeatedly for
|
||||||
|
passwords.
|
||||||
|
|
||||||
|
So, I've gone ahead and enabled `forwardRetry` for everything.
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
Occurs to me that git-annex could try to notice when a transfer is not
|
||||||
|
progressing, by reusing the existing progress metering code.
|
||||||
|
|
||||||
|
Since some remotes don't update the progress meter, this could
|
||||||
|
only be used to detect stalls after the progress meter has been updated
|
||||||
|
at least once. If the stall occurs earlier than that, it would not be able
|
||||||
|
to be detected.
|
||||||
|
|
||||||
|
It seems quite hard to come up with a good timeout value to detect a
|
||||||
|
stalled connection. Often progress meters are updated after every small
|
||||||
|
(eg 32kb) chunk transferred. But others might poll periodically, or might
|
||||||
|
use a larger chunk size. It's even possible that some special remotes
|
||||||
|
are looking at a percent output by some program, and only update the meter
|
||||||
|
when the percent transferred changes -- in which case it could be many
|
||||||
|
minutes in between each meter update when a large file is being
|
||||||
|
transferred.
|
||||||
|
|
||||||
|
If the timeout is too short, git-annex will stall in a new way, by
|
||||||
|
constantly killing "stalled" connections before they can send enough data.
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
So it really seems better to fix the ssh connection to not stall, since
|
||||||
|
that is not so heuristic a fix. Seems like git-annex could force
|
||||||
|
ServerAliveInterval to be set, and perhaps lower ServerAliveCountMax from 3
|
||||||
|
to 1. The ssh BatchMode setting sets the former to 300, so a stalled
|
||||||
|
connection will time out after 15 minutes. But BatchMode also disables
|
||||||
|
prompting, and git-annex should not disable that.
|
||||||
|
|
||||||
|
Catch is, what if the user has configured ssh with some
|
||||||
|
other ServerAliveInterval value? We don't want git-annex to override that.
|
||||||
|
|
||||||
|
(git-annex does have a rudimentary .ssh/config parser, but it's not
|
||||||
|
good enough to handle eg, "Host *.example.com ")
|
||||||
|
"""]]
|
Loading…
Reference in a new issue