Fix an unlikely race that could result in two transfers of the same key running at once.

As discussed in the bug report.
Joey Hess 2015-05-12 19:39:28 -04:00
parent e25ecab7dd
commit 8c2dd7d8ee
4 changed files with 56 additions and 1 deletion


Annex/Transfer.hs
@@ -87,9 +87,12 @@ runTransfer' ignorelock t file shouldretry transferobserver transferaction = do
 		r <- tryLockExclusive (Just mode) lck
 		case r of
 			Nothing -> return (Nothing, True)
-			Just lockhandle -> do
+			Just lockhandle -> ifM (checkSaneLock lck lockhandle)
+				( do
 					void $ tryIO $ writeTransferInfoFile info tfile
 					return (Just lockhandle, False)
+				, return (Nothing, True)
+				)
 #else
 	prep tfile _mode info = do
 		let lck = transferLockFile tfile


Utility/LockFile/Posix.hs
@@ -16,6 +16,7 @@ module Utility.LockFile.Posix (
 	checkLocked,
 	getLockStatus,
 	dropLock,
+	checkSaneLock,
 ) where
 
 import Utility.Exception
@@ -97,3 +98,17 @@ getLockStatus' lockfile = go =<< catchMaybeIO open
 
 dropLock :: LockHandle -> IO ()
 dropLock (LockHandle fd) = closeFd fd
+
+-- Checks that the lock file still exists, and is the same file that was
+-- locked to get the LockHandle.
+--
+-- This check is useful if the lock file might get deleted by something
+-- else.
+checkSaneLock :: LockFile -> LockHandle -> IO Bool
+checkSaneLock lockfile (LockHandle fd) =
+	go =<< catchMaybeIO (getFileStatus lockfile)
+  where
+	go Nothing = return False
+	go (Just st) = do
+		fdst <- getFdStatus fd
+		return $ deviceID fdst == deviceID st && fileID fdst == fileID st

debian/changelog

@@ -14,6 +14,8 @@ git-annex (5.20150508.2) UNRELEASED; urgency=medium
     checking annex.diskreserve.
   * Avoid accumulating transfer failure log files unless the assistant is
     being used.
+  * Fix an unlikely race that could result in two transfers of the same key
+    running at once.
 
  -- Joey Hess <id@joeyh.name>  Mon, 11 May 2015 12:45:06 -0400


@@ -0,0 +1,35 @@
There's a race that can result in 2 concurrent downloads
of the same key. This can happen because the transfer lock files get
deleted after a transfer completes.

Scenario (a standalone sketch reproducing it follows the list):

1. first process creates lock file, takes lock, starts download
2. second process opens existing lock file
3. first process finishes/fails download, so deletes lock file, and then
   drops lock
4. second process takes lock (exclusive, non-blocking) of deleted file!
5. third process sees no lock file, so creates a new one, takes lock,
   starts download
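
A minimal sketch of steps 1-5, compressed into one process (the three
fds stand in for the three processes; this is not git-annex code, the
name demo.lck is made up, and it assumes the unix package's pre-2.8
four-argument openFd). It shows why step 4 succeeds: a POSIX fcntl lock
attaches to the open file, not to its name, so a deleted lock file can
still be locked, and the fresh file created in step 5 is a different
inode whose lock cannot conflict with the old one.

    import System.IO (SeekMode(AbsoluteSeek))
    import System.Posix.Files (removeLink)
    import System.Posix.IO

    main :: IO ()
    main = do
        -- step 1: "first process" creates the lock file and locks it
        fd1 <- openFd "demo.lck" ReadWrite (Just 0o644) defaultFileFlags
        setLock fd1 (WriteLock, AbsoluteSeek, 0, 0)
        -- step 2: "second process" opens the still-existing lock file
        fd2 <- openFd "demo.lck" ReadWrite (Just 0o644) defaultFileFlags
        -- step 3: "first process" deletes the lock file, drops its lock
        removeLink "demo.lck"
        closeFd fd1
        -- step 4: this non-blocking lock succeeds, even though the file
        -- it locks no longer has a name on disk
        setLock fd2 (WriteLock, AbsoluteSeek, 0, 0)
        -- step 5: "third process" sees no lock file, so creates a new
        -- one and locks it; that also succeeds, since it is a new inode
        fd3 <- openFd "demo.lck" ReadWrite (Just 0o644) defaultFileFlags
        setLock fd3 (WriteLock, AbsoluteSeek, 0, 0)
        putStrLn "two exclusive locks held for the same lock file name"
        mapM_ closeFd [fd2, fd3]
        removeLink "demo.lck"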
Worst case, this might result in data loss, if the two concurrent
downloaders confuse one another. Perhaps the second one overwrites the file
the first was downloading, and then the first, thinking it has written the
file, moves it into the object directory.

Note that the window between steps 4 and 5 can be as long as it takes for a
file to download. However, the window between steps 2 and 4 is very narrow
indeed, since the second process is not blocking on the lock. So this race
is very unlikely to happen.
But it could. Can it be prevented?

Yes: the second process, after taking the lock, can check that the
lock file still exists, and has the same device and inode numbers as the fd
it has locked. If the lock file doesn't exist, or is a different file, then
the second process can give up.
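
A minimal sketch of that check at the call site, mirroring the
Annex/Transfer.hs hunk above. tryLockExclusive, checkSaneLock, dropLock,
and LockHandle come from Utility.LockFile.Posix as changed by this
commit, ifM is git-annex's Utility.Monad combinator, and takeTransferLock
and the 0o644 mode are hypothetical names chosen for illustration:

    import Utility.LockFile.Posix
    import Utility.Monad (ifM)

    -- Take the lock, then verify that the fd we locked still
    -- corresponds to the lock file's name on disk. If not, another
    -- process deleted (and perhaps recreated) the lock file while we
    -- were acquiring it, so drop the lock and give up.
    takeTransferLock :: FilePath -> IO (Maybe LockHandle)
    takeTransferLock lck = do
        r <- tryLockExclusive (Just 0o644) lck
        case r of
            Nothing -> return Nothing  -- lock held by another process
            Just lockhandle -> ifM (checkSaneLock lck lockhandle)
                ( return (Just lockhandle)
                , do
                    dropLock lockhandle  -- lost the race; give up
                    return Nothing
                )

Comparing device and inode numbers, as checkSaneLock does, detects both
deletion of the lock file and its recreation under the same name.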
By the way, this race cannot happen on Windows, since on Windows an open
lock file cannot be deleted by another process.

> [[fixed|done]]