catch error statting pid lock file if it somehow does not exist

It ought to exist, since linkToLock has just created it. However,
Lustre seems to have a rather probabilistic view of the contents of a
directory, so catching the error if it somehow does not exist and
running the same code path that would be run if linkToLock failed
might avoid this fun Lustre failure.

Sponsored-by: Dartmouth College's Datalad project
Joey Hess 2021-11-29 14:51:28 -04:00
parent 567f63ba47
commit a6699be79d
GPG key ID: DB12DB0FF05F8F38
2 changed files with 32 additions and 13 deletions

@@ -154,14 +154,11 @@ tryLock lockfile = do
 removeWhenExistsWith removeLink tmp'
 return Nothing
 let tooklock st = return $ Just $ LockHandle abslockfile st sidelock
-ifM (linkToLock sidelock tmp' abslockfile)
-( do
+linkToLock sidelock tmp' abslockfile >>= \case
+Just lckst -> do
 removeWhenExistsWith removeLink tmp'
--- May not have made a hard link, so stat
--- the lockfile
-lckst <- getFileStatus abslockfile
 tooklock lckst
-, do
+Nothing -> do
 v <- readPidLock abslockfile
 hn <- getHostName
 tmpst <- getFileStatus tmp'
@@ -175,7 +172,6 @@ tryLock lockfile = do
 rename tmp' abslockfile
 tooklock tmpst
 _ -> failedlock tmpst
-)
 -- Linux's open(2) man page recommends linking a pid lock into place,
 -- as the most portable atomic operation that will fail if
@@ -187,8 +183,8 @@ tryLock lockfile = do
 --
 -- However, not all filesystems support hard links. So, first probe
 -- to see if they are supported. If not, use open with O_EXCL.
-linkToLock :: SideLockHandle -> RawFilePath -> RawFilePath -> IO Bool
-linkToLock Nothing _ _ = return False
+linkToLock :: SideLockHandle -> RawFilePath -> RawFilePath -> IO (Maybe FileStatus)
+linkToLock Nothing _ _ = return Nothing
 linkToLock (Just _) src dest = do
 let probe = src <> ".lnk"
 v <- tryIO $ createLink src probe
@@ -197,10 +193,13 @@ linkToLock (Just _) src dest = do
 Right _ -> do
 _ <- tryIO $ createLink src dest
 ifM (catchBoolIO checklinked)
-( catchBoolIO $ not <$> checkInsaneLustre dest
-, return False
+( ifM (catchBoolIO $ not <$> checkInsaneLustre dest)
+( catchMaybeIO $ getFileStatus dest
+, return Nothing
 )
-Left _ -> catchBoolIO $ do
+, return Nothing
+)
+Left _ -> catchMaybeIO $ do
 let setup = do
 fd <- openFd dest WriteOnly
 (Just $ combineModes readModes)
@@ -209,7 +208,7 @@ linkToLock (Just _) src dest = do
 let cleanup = hClose
 let go h = readFile (fromRawFilePath src) >>= hPutStr h
 bracket setup cleanup go
-return True
+getFileStatus dest
 where
 checklinked = do
 x <- getSymbolicLinkStatus src

@@ -0,0 +1,20 @@
+[[!comment format=mdwn
+username="joey"
+subject="""comment 1"""
+date="2021-11-29T18:11:30Z"
+content="""
+I think that this error comes from Utility.LockFile.PidLock.tryLock,
+which has the only getFileStatus involving the pidlock whose exceptions
+are not caught. The file is assumed to exist since it was just created,
+and normally nothing deletes it.
+
+While looking at where this might come from, I refreshed my memory of
+how Lustre can do insane stuff like having 2 different files with the
+same name in a directory. Which checkInsaneLustre tries to deal with
+by deleting one of them, but since this is all behavior undefined by POSIX,
+maybe that sometimes deletes both of them. Or the file doesn't appear
+after being created for some other POSIX-defying reason.
+
+I've changed it to catch exceptions from that getFileStatus, which will
+test this theory.
+"""]]
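
The core of the change is that a stat of the lock file now yields Nothing when it fails, instead of throwing. Below is a minimal standalone sketch of that pattern, not the git-annex code itself. The name statMaybe is hypothetical and stands in for the combination of catchMaybeIO and getFileStatus used in the diff. It assumes only the base and unix packages, and it assumes that catching IOException is enough to cover the missing-file case.

```haskell
import Control.Exception (IOException, try)
import System.Posix.Files (FileStatus, getFileStatus)

-- Hypothetical helper (not part of git-annex): stat a file, turning
-- any IOException, such as the file not existing, into Nothing.
statMaybe :: FilePath -> IO (Maybe FileStatus)
statMaybe path = either ignore Just <$> try (getFileStatus path)
  where
    -- The type annotation pins down which exception type 'try' catches.
    ignore :: IOException -> Maybe FileStatus
    ignore _ = Nothing

main :: IO ()
main = do
    -- The path is illustrative; it is only assumed not to exist.
    r <- statMaybe "/nonexistent-dir/pidlock"
    putStrLn $ case r of
        -- The caller can take the same path as when the lock was not taken.
        Nothing -> "stat failed; treat the lock as not taken"
        Just _ -> "stat succeeded; lock file is present"
```

Returning Maybe FileStatus rather than Bool means the caller gets the status in the same step that confirms the file is there, so no second, uncaught stat is needed.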