fix transfer lock file for Download to not include uuid

While redundant concurrent transfers were already prevented in most
cases, it failed to prevent the case where two different repositories were
sending the same content to the same repository. By removing the uuid
from the transfer lock file for Download transfers, one repository
sending content will block the other one from also sending the same
content.

In order to interoperate with old git-annex, the old lock file is still
locked, as well as locking the new one. That added a lot of extra code
and work, and the plan is to eventually stop locking the old lock file,
at some point in time when an old git-annex process is unlikely to be
running at the same time.

Note that in the case of 2 repositories both doing eg
`git-annex copy foo --to origin`
the output is not that great:

copy b (to origin...)
  transfer already in progress, or unable to take transfer lock
git-annex: transfer already in progress, or unable to take transfer lock
97%   966.81 MiB      534 GiB/s 0sp2pstdio: 1 failed

  Lost connection (fd:14: hPutBuf: resource vanished (Broken pipe))

  Transfer failed

Perhaps that output could be cleaned up? Anyway, it's a lot better than letting
the redundant transfer happen and then failing with an obscure error about
a temp file, which is what it did before. And it seems users don't often
try to do this, since nobody ever reported this bug to me before.
(The "97%" there is actually how far along the *other* transfer is.)

Sponsored-by: Joshua Antonishen on Patreon
This commit is contained in:
Joey Hess 2024-03-25 14:47:38 -04:00
parent 62129f0b24
commit f04d9574d6
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
6 changed files with 147 additions and 69 deletions

View file

@ -1,6 +1,6 @@
{- git-annex transfers {- git-annex transfers
- -
- Copyright 2012-2023 Joey Hess <id@joeyh.name> - Copyright 2012-2024 Joey Hess <id@joeyh.name>
- -
- Licensed under the GNU AGPL version 3 or higher. - Licensed under the GNU AGPL version 3 or higher.
-} -}
@ -128,9 +128,10 @@ runTransfer' ignorelock t eventualbackend afile stalldetection retrydecider tran
where where
go = do go = do
info <- liftIO $ startTransferInfo afile info <- liftIO $ startTransferInfo afile
(meter, tfile, createtfile, metervar) <- mkProgressUpdater t info (tfile, lckfile, moldlckfile) <- fromRepo $ transferFileAndLockFile t
(meter, createtfile, metervar) <- mkProgressUpdater t info tfile
mode <- annexFileMode mode <- annexFileMode
(lck, inprogress) <- prep tfile createtfile mode (lck, inprogress) <- prep lckfile moldlckfile createtfile mode
if inprogress && not ignorelock if inprogress && not ignorelock
then do then do
warning "transfer already in progress, or unable to take transfer lock" warning "transfer already in progress, or unable to take transfer lock"
@ -139,51 +140,75 @@ runTransfer' ignorelock t eventualbackend afile stalldetection retrydecider tran
v <- retry 0 info metervar $ v <- retry 0 info metervar $
detectStallsAndSuggestConfig stalldetection metervar $ detectStallsAndSuggestConfig stalldetection metervar $
transferaction meter transferaction meter
liftIO $ cleanup tfile lck liftIO $ cleanup tfile lckfile moldlckfile lck
if observeBool v if observeBool v
then removeFailedTransfer t then removeFailedTransfer t
else recordFailedTransfer t info else recordFailedTransfer t info
return v return v
prep :: RawFilePath -> Annex () -> ModeSetter -> Annex (Maybe LockHandle, Bool) prep :: RawFilePath -> Maybe RawFilePath -> Annex () -> ModeSetter -> Annex (Maybe (LockHandle, Maybe LockHandle), Bool)
#ifndef mingw32_HOST_OS #ifndef mingw32_HOST_OS
prep tfile createtfile mode = catchPermissionDenied (const prepfailed) $ do prep lckfile moldlckfile createtfile mode = catchPermissionDenied (const prepfailed) $ do
let lck = transferLockFile tfile createAnnexDirectory $ P.takeDirectory lckfile
createAnnexDirectory $ P.takeDirectory lck tryLockExclusive (Just mode) lckfile >>= \case
tryLockExclusive (Just mode) lck >>= \case
Nothing -> return (Nothing, True) Nothing -> return (Nothing, True)
-- Since the lock file is removed in cleanup, -- Since the lock file is removed in cleanup,
-- there's a race where different processes -- there's a race where different processes
-- may have a deleted and a new version of the same -- may have a deleted and a new version of the same
-- lock file open. checkSaneLock guards against -- lock file open. checkSaneLock guards against
-- that. -- that.
Just lockhandle -> ifM (checkSaneLock lck lockhandle) Just lockhandle -> ifM (checkSaneLock lckfile lockhandle)
( do ( case moldlckfile of
createtfile Nothing -> do
return (Just lockhandle, False) createtfile
return (Just (lockhandle, Nothing), False)
Just oldlckfile -> do
createAnnexDirectory $ P.takeDirectory oldlckfile
tryLockExclusive (Just mode) oldlckfile >>= \case
Nothing -> do
liftIO $ dropLock lockhandle
return (Nothing, True)
Just oldlockhandle -> ifM (checkSaneLock oldlckfile oldlockhandle)
( do
createtfile
return (Just (lockhandle, Just oldlockhandle), False)
, do
liftIO $ dropLock oldlockhandle
liftIO $ dropLock lockhandle
return (Nothing, True)
)
, do , do
liftIO $ dropLock lockhandle liftIO $ dropLock lockhandle
return (Nothing, True) return (Nothing, True)
) )
#else #else
prep tfile createtfile _mode = catchPermissionDenied (const prepfailed) $ do prep lckfile moldlckfile createtfile _mode = catchPermissionDenied (const prepfailed) $ do
let lck = transferLockFile tfile createAnnexDirectory $ P.takeDirectory lckfile
createAnnexDirectory $ P.takeDirectory lck catchMaybeIO (liftIO $ lockExclusive lckfile) >>= \case
catchMaybeIO (liftIO $ lockExclusive lck) >>= \case Just (Just lockhandle) -> case moldlckfile of
Nothing -> return (Nothing, False) Nothing -> do
Just Nothing -> return (Nothing, False) createtfile
Just (Just lockhandle) -> do return (Just (lockhandle, Nothing), False)
createtfile Just oldlckfile -> do
return (Just lockhandle, False) createAnnexDirectory $ P.takeDirectory oldlckfile
catchMaybeIO (liftIO $ lockExclusive oldlckfile) >>= \case
Just (Just oldlockhandle) -> do
createtfile
return (Just (lockhandle, Just oldlockhandle), False)
_ -> do
liftIO $ dropLock lockhandle
return (Nothing, False)
_ -> return (Nothing, False)
#endif #endif
prepfailed = return (Nothing, False) prepfailed = return (Nothing, False)
cleanup _ Nothing = noop cleanup _ _ _ Nothing = noop
cleanup tfile (Just lockhandle) = do cleanup tfile lckfile moldlckfile (Just (lockhandle, moldlockhandle)) = do
let lck = transferLockFile tfile
void $ tryIO $ R.removeLink tfile void $ tryIO $ R.removeLink tfile
#ifndef mingw32_HOST_OS #ifndef mingw32_HOST_OS
void $ tryIO $ R.removeLink lck void $ tryIO $ R.removeLink lckfile
maybe noop (void . tryIO . R.removeLink) moldlckfile
maybe noop dropLock moldlockhandle
dropLock lockhandle dropLock lockhandle
#else #else
{- Windows cannot delete the lockfile until the lock {- Windows cannot delete the lockfile until the lock
@ -191,8 +216,10 @@ runTransfer' ignorelock t eventualbackend afile stalldetection retrydecider tran
- process that takes the lock before it's removed, - process that takes the lock before it's removed,
- so ignore failure to remove. - so ignore failure to remove.
-} -}
maybe noop dropLock moldlockhandle
dropLock lockhandle dropLock lockhandle
void $ tryIO $ R.removeLink lck void $ tryIO $ R.removeLink lckfile
maybe noop (void . tryIO . R.removeLink) moldlckfile
#endif #endif
retry numretries oldinfo metervar run = retry numretries oldinfo metervar run =

View file

@ -45,7 +45,7 @@ transferPollerThread = namedThread "TransferPoller" $ do
{- Otherwise, this code polls the upload progress {- Otherwise, this code polls the upload progress
- by reading the transfer info file. -} - by reading the transfer info file. -}
| otherwise = do | otherwise = do
let f = transferFile t g let (f, _, _) = transferFileAndLockFile t g
mi <- liftIO $ catchDefaultIO Nothing $ mi <- liftIO $ catchDefaultIO Nothing $
readTransferInfoFile Nothing (fromRawFilePath f) readTransferInfoFile Nothing (fromRawFilePath f)
maybe noop (newsize t info . bytesComplete) mi maybe noop (newsize t info . bytesComplete) mi

View file

@ -7,6 +7,10 @@ git-annex (10.20240228) UNRELEASED; urgency=medium
* Added dependency on unbounded-delays. * Added dependency on unbounded-delays.
* reregisterurl: New command that can change an url from being * reregisterurl: New command that can change an url from being
used by a special remote to being used by the web remote. used by a special remote to being used by the web remote.
* Bugfix: While redundant concurrent transfers were already
prevented in most cases, it failed to prevent the case where
two different repositories were sending the same content to
the same repository.
-- Joey Hess <id@joeyh.name> Tue, 27 Feb 2024 13:07:10 -0400 -- Joey Hess <id@joeyh.name> Tue, 27 Feb 2024 13:07:10 -0400

View file

@ -1,6 +1,6 @@
{- git-annex transfer information files and lock files {- git-annex transfer information files and lock files
- -
- Copyright 2012-2021 Joey Hess <id@joeyh.name> - Copyright 2012-2024 Joey Hess <id@joeyh.name>
- -
- Licensed under the GNU AGPL version 3 or higher. - Licensed under the GNU AGPL version 3 or higher.
-} -}
@ -56,26 +56,24 @@ percentComplete t info =
<*> Just (fromMaybe 0 $ bytesComplete info) <*> Just (fromMaybe 0 $ bytesComplete info)
{- Generates a callback that can be called as transfer progresses to update {- Generates a callback that can be called as transfer progresses to update
- the transfer info file. Also returns the file it'll be updating, - the transfer info file. Also returns an action that sets up the file with
- an action that sets up the file with appropriate permissions, - appropriate permissions, which should be run after locking the transfer
- which should be run after locking the transfer lock file, but - lock file, but before using the callback, and a TVar that can be used to
- before using the callback, and a TVar that can be used to read - read the number of bytes processed so far. -}
- the number of bytes processed so far. -} mkProgressUpdater :: Transfer -> TransferInfo -> RawFilePath -> Annex (MeterUpdate, Annex (), TVar (Maybe BytesProcessed))
mkProgressUpdater :: Transfer -> TransferInfo -> Annex (MeterUpdate, RawFilePath, Annex (), TVar (Maybe BytesProcessed)) mkProgressUpdater t info tfile = do
mkProgressUpdater t info = do
tfile <- fromRepo $ transferFile t
let createtfile = void $ tryNonAsync $ writeTransferInfoFile info tfile let createtfile = void $ tryNonAsync $ writeTransferInfoFile info tfile
tvar <- liftIO $ newTVarIO Nothing tvar <- liftIO $ newTVarIO Nothing
loggedtvar <- liftIO $ newTVarIO 0 loggedtvar <- liftIO $ newTVarIO 0
return (liftIO . updater (fromRawFilePath tfile) tvar loggedtvar, tfile, createtfile, tvar) return (liftIO . updater (fromRawFilePath tfile) tvar loggedtvar, createtfile, tvar)
where where
updater tfile tvar loggedtvar new = do updater tfile' tvar loggedtvar new = do
old <- atomically $ swapTVar tvar (Just new) old <- atomically $ swapTVar tvar (Just new)
let oldbytes = maybe 0 fromBytesProcessed old let oldbytes = maybe 0 fromBytesProcessed old
let newbytes = fromBytesProcessed new let newbytes = fromBytesProcessed new
when (newbytes - oldbytes >= mindelta) $ do when (newbytes - oldbytes >= mindelta) $ do
let info' = info { bytesComplete = Just newbytes } let info' = info { bytesComplete = Just newbytes }
_ <- tryIO $ updateTransferInfoFile info' tfile _ <- tryIO $ updateTransferInfoFile info' tfile'
atomically $ writeTVar loggedtvar newbytes atomically $ writeTVar loggedtvar newbytes
{- The minimum change in bytesComplete that is worth {- The minimum change in bytesComplete that is worth
@ -102,33 +100,40 @@ startTransferInfo afile = TransferInfo
{- If a transfer is still running, returns its TransferInfo. {- If a transfer is still running, returns its TransferInfo.
- -
- If no transfer is running, attempts to clean up the stale - If no transfer is running, attempts to clean up the stale
- lock and info files. This can happen if a transfer process was - lock and info files, which can be left behind when a transfer
- interrupted. - process was interrupted.
-} -}
checkTransfer :: Transfer -> Annex (Maybe TransferInfo) checkTransfer :: Transfer -> Annex (Maybe TransferInfo)
checkTransfer t = debugLocks $ do checkTransfer t = debugLocks $ do
tfile <- fromRepo $ transferFile t (tfile, lck, moldlck) <- fromRepo $ transferFileAndLockFile t
let lck = transferLockFile tfile let deletestale = do
let cleanstale = do
void $ tryIO $ R.removeLink tfile void $ tryIO $ R.removeLink tfile
void $ tryIO $ R.removeLink lck void $ tryIO $ R.removeLink lck
maybe noop (void . tryIO . R.removeLink) moldlck
#ifndef mingw32_HOST_OS #ifndef mingw32_HOST_OS
v <- getLockStatus lck v <- getLockStatus lck
case v of v' <- case (moldlck, v) of
(Nothing, _) -> pure v
(_, StatusLockedBy pid) -> pure (StatusLockedBy pid)
(Just oldlck, _) -> getLockStatus oldlck
case v' of
StatusLockedBy pid -> liftIO $ catchDefaultIO Nothing $ StatusLockedBy pid -> liftIO $ catchDefaultIO Nothing $
readTransferInfoFile (Just pid) (fromRawFilePath tfile) readTransferInfoFile (Just pid) (fromRawFilePath tfile)
_ -> do _ -> do
-- Take a non-blocking lock while deleting mode <- annexFileMode
-- the stale lock file. Ignore failure -- Ignore failure due to permissions, races, etc.
-- due to permissions problems, races, etc. void $ tryIO $ tryLockExclusive (Just mode) lck >>= \case
void $ tryIO $ do Just lockhandle -> case moldlck of
mode <- annexFileMode Nothing -> liftIO $ do
r <- tryLockExclusive (Just mode) lck deletestale
case r of
Just lockhandle -> liftIO $ do
cleanstale
dropLock lockhandle dropLock lockhandle
_ -> noop Just oldlck -> tryLockExclusive (Just mode) oldlck >>= \case
Just oldlockhandle -> liftIO $ do
deletestale
dropLock oldlockhandle
dropLock lockhandle
Nothing -> liftIO $ dropLock lockhandle
Nothing -> noop
return Nothing return Nothing
#else #else
v <- liftIO $ lockShared lck v <- liftIO $ lockShared lck
@ -198,12 +203,39 @@ recordFailedTransfer t info = do
failedtfile <- fromRepo $ failedTransferFile t failedtfile <- fromRepo $ failedTransferFile t
writeTransferInfoFile info failedtfile writeTransferInfoFile info failedtfile
{- The transfer information file to use for a given Transfer. -} {- The transfer information file and transfer lock file
transferFile :: Transfer -> Git.Repo -> RawFilePath - to use for a given Transfer.
transferFile (Transfer direction u kd) r = -
transferDir direction r - The transfer lock file used for an Upload includes the UUID.
P.</> B8.filter (/= '/') (fromUUID u) - This allows multiple transfers of the same key to different remote
P.</> keyFile (mkKey (const kd)) - repositories run at the same time, while preventing multiple
- transfers of the same key to the same remote repository.
-
- The transfer lock file used for a Download does not include the UUID.
- This prevents multiple transfers of the same key into the local
- repository at the same time.
-
- Since old versions of git-annex (10.20240227 and older) used to
- include the UUID in the transfer lock file for a Download, this also
- returns a second lock file for Downloads, which has to be locked
- in order to interoperate with the old git-annex processes.
- Lock order is the same as return value order.
- At some point in the future, when old git-annex processes are no longer
- a concern, this complication can be removed.
-}
transferFileAndLockFile :: Transfer -> Git.Repo -> (RawFilePath, RawFilePath, Maybe RawFilePath)
transferFileAndLockFile (Transfer direction u kd) r =
case direction of
Upload -> (transferfile, uuidlockfile, Nothing)
Download -> (transferfile, nouuidlockfile, Just uuidlockfile)
where
td = transferDir direction r
fu = B8.filter (/= '/') (fromUUID u)
kf = keyFile (mkKey (const kd))
lckkf = "lck." <> kf
transferfile = td P.</> fu P.</> kf
uuidlockfile = td P.</> fu P.</> lckkf
nouuidlockfile = td P.</> "lck" P.</> lckkf
{- The transfer information file to use to record a failed Transfer -} {- The transfer information file to use to record a failed Transfer -}
failedTransferFile :: Transfer -> Git.Repo -> RawFilePath failedTransferFile :: Transfer -> Git.Repo -> RawFilePath
@ -211,12 +243,6 @@ failedTransferFile (Transfer direction u kd) r =
failedTransferDir u direction r failedTransferDir u direction r
P.</> keyFile (mkKey (const kd)) P.</> keyFile (mkKey (const kd))
{- The transfer lock file corresponding to a given transfer info file. -}
transferLockFile :: RawFilePath -> RawFilePath
transferLockFile infofile =
let (d, f) = P.splitFileName infofile
in P.combine d ("lck." <> f)
{- Parses a transfer information filename to a Transfer. -} {- Parses a transfer information filename to a Transfer. -}
parseTransferFile :: FilePath -> Maybe Transfer parseTransferFile :: FilePath -> Maybe Transfer
parseTransferFile file parseTransferFile file

View file

@ -21,6 +21,9 @@ I'm easily able to reproduce this in a bench test with 2 clones of a repo.
(cd b; git-annex copy --to origin) & (cd b; git-annex copy --to origin) &
(cd c; git-annex copy --to origin) & (cd c; git-annex copy --to origin) &
Also same happens when running `git-annex get --from` two different remotes
concurrently in the same repo.
Aha... Looking at the code, this seems like a fundamental oversight. Aha... Looking at the code, this seems like a fundamental oversight.
The `transferFile` depends on the uuid of the remote being transfered The `transferFile` depends on the uuid of the remote being transfered
to/from, so there are two different ones in this case. And the transfer to/from, so there are two different ones in this case. And the transfer
@ -47,5 +50,7 @@ noticing multiple downloads from the same uuid. Which is not a great
behavior to break either, even if it would usually only break transiently. behavior to break either, even if it would usually only break transiently.
Of course that could be avoided by keeping the current lock file, and Of course that could be avoided by keeping the current lock file, and
adding a second level lock file. adding a second lock file. [[done]] this, with plans in
[[todo/v11_changes]] to transition to use only the new lock file in
the future.
--[[Joey]] --[[Joey]]

View file

@ -10,3 +10,19 @@ version.
after upgrading to the repo version that enables this. Depending on the after upgrading to the repo version that enables this. Depending on the
timing of v11, this may need to be put in a v12 upgrade that is delayed timing of v11, this may need to be put in a v12 upgrade that is delayed
some amount of time (eg 1 year) after v11. some amount of time (eg 1 year) after v11.
* Avoid locking old transfer lock file. transferFileAndLockFile
currently returns two lock files for Download transfers,
and locking both of them is unncessary work, which is only needed to
interoperate with old git-annex versions that only lock the old lock file
and not the new one.
(Note that the old lock file should still be deleted when cleaning up the
new lock file, to make sure that all the old lock files get deleted.)
It would not be great if this change were made when a git-annex version
10.20240227 or older was running in the repository. But it wouldn't be
the end of the world either, because the effect would be effectively
the same as the bug that the second transfer lock was added to fix.
Still, it would make sense to put this in a v12 upgrade that is delayed
some amount of time (eg 1 year) after v11.