catch error statting pid lock file if it somehow does not exist

It ought to exist, since linkToLock has just created it. However, Lustre seems to have a rather probabilistic view of the contents of a directory, so catching the error if it somehow does not exist and running the same code path that would be run if linkToLock failed might avoid this fun Lustre failure.

Sponsored-by: Dartmouth College's Datalad project
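The approach in the message above can be sketched in isolation. This is a minimal illustration and not the git-annex code: `statMaybe`, `LockOutcome`, and `afterLink` are hypothetical names, and `statMaybe` only approximates a helper such as `catchMaybeIO`, using `try` from base and `getFileStatus` from the unix package.

```haskell
import Control.Exception (IOException, try)
import System.Posix.Files (FileStatus, getFileStatus)

-- | Stat a path, turning any IOException (for example ENOENT) into
-- Nothing instead of letting it propagate to the caller.
-- Hypothetical stand-in for a helper like catchMaybeIO.
statMaybe :: FilePath -> IO (Maybe FileStatus)
statMaybe path = either ignore Just <$> try (getFileStatus path)
  where
    ignore :: IOException -> Maybe FileStatus
    ignore _ = Nothing

-- | Hypothetical outcome type: either the lock was taken, or the caller
-- should run the same fallback it would use if linking had failed.
data LockOutcome = TookLock | FellBack
  deriving (Eq, Show)

-- | After the lock file has supposedly been created, stat it. A missing
-- file is treated exactly like a failed link rather than as a crash.
afterLink :: FilePath -> IO LockOutcome
afterLink lockfile = do
  mst <- statMaybe lockfile
  return $ case mst of
    Just _ -> TookLock
    Nothing -> FellBack
```

The point of the design is that a stat failure stops being an uncaught exception and becomes an ordinary `Nothing`, so the existing fallback path handles it.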
parent 567f63ba47
commit a6699be79d
2 changed files with 32 additions and 13 deletions
@@ -154,14 +154,11 @@ tryLock lockfile = do
  removeWhenExistsWith removeLink tmp'
  return Nothing
  let tooklock st = return $ Just $ LockHandle abslockfile st sidelock
- ifM (linkToLock sidelock tmp' abslockfile)
- ( do
+ linkToLock sidelock tmp' abslockfile >>= \case
+ Just lckst -> do
  removeWhenExistsWith removeLink tmp'
- -- May not have made a hard link, so stat
- -- the lockfile
- lckst <- getFileStatus abslockfile
  tooklock lckst
- , do
+ Nothing -> do
  v <- readPidLock abslockfile
  hn <- getHostName
  tmpst <- getFileStatus tmp'
@@ -175,7 +172,6 @@ tryLock lockfile = do
  rename tmp' abslockfile
  tooklock tmpst
  _ -> failedlock tmpst
- )

  -- Linux's open(2) man page recommends linking a pid lock into place,
  -- as the most portable atomic operation that will fail if
@@ -187,8 +183,8 @@ tryLock lockfile = do
  --
  -- However, not all filesystems support hard links. So, first probe
  -- to see if they are supported. If not, use open with O_EXCL.
- linkToLock :: SideLockHandle -> RawFilePath -> RawFilePath -> IO Bool
- linkToLock Nothing _ _ = return False
+ linkToLock :: SideLockHandle -> RawFilePath -> RawFilePath -> IO (Maybe FileStatus)
+ linkToLock Nothing _ _ = return Nothing
  linkToLock (Just _) src dest = do
  let probe = src <> ".lnk"
  v <- tryIO $ createLink src probe
@@ -197,10 +193,13 @@ linkToLock (Just _) src dest = do
  Right _ -> do
  _ <- tryIO $ createLink src dest
  ifM (catchBoolIO checklinked)
- ( catchBoolIO $ not <$> checkInsaneLustre dest
- , return False
+ ( ifM (catchBoolIO $ not <$> checkInsaneLustre dest)
+ ( catchMaybeIO $ getFileStatus dest
+ , return Nothing
  )
- Left _ -> catchBoolIO $ do
+ , return Nothing
+ )
+ Left _ -> catchMaybeIO $ do
  let setup = do
  fd <- openFd dest WriteOnly
  (Just $ combineModes readModes)
@@ -209,7 +208,7 @@ linkToLock (Just _) src dest = do
  let cleanup = hClose
  let go h = readFile (fromRawFilePath src) >>= hPutStr h
  bracket setup cleanup go
- return True
+ getFileStatus dest
  where
  checklinked = do
  x <- getSymbolicLinkStatus src
@@ -0,0 +1,20 @@
+ [[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2021-11-29T18:11:30Z"
+ content="""
+ I think that this error comes from Utility.LockFile.PidLock.tryLock,
+ which has the only getFileStatus involving the pidlock whose exceptions
+ are not caught. The file is assumed to exist since it was just created,
+ and normally nothing deletes it.
+
+ While looking at where this might come from, I refreshed my memory of
+ how Lustre can do insane stuff like having 2 different files with the
+ same name in a directory. Which checkInsaneLustre tries to deal with
+ by deleting one of them, but since this is all behavior undefined by POSIX,
+ maybe that sometimes deletes both of them. Or the file doesn't appear
+ after being created for some other POSIX-defying reason.
+
+ I've changed it to catch exceptions from that getFileStatus, which will
+ test this theory.
+ """]]
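The theory in the comment above, that a file may be missing right after it was created, suggests doing the link, the verification, and the final stat as one guarded step that yields a `Maybe`. The following is a rough sketch under stated assumptions, not the code from this commit: `linkAndStat` and `sameFile` are hypothetical names, plain `FilePath` is used instead of `RawFilePath`, and comparing device and inode numbers is only an assumption about what a check like `checklinked` might do, since its body is not shown in this diff.

```haskell
import Control.Exception (IOException, try)
import System.Posix.Files
  (FileStatus, createLink, deviceID, fileID, getFileStatus)

-- | Hard link src to dest, then return dest's status only if both names
-- can be statted and refer to the same inode. Any IOException along the
-- way, including a stat failure on a file that "should" exist, becomes
-- Nothing so that the caller can take its normal failure path.
linkAndStat :: FilePath -> FilePath -> IO (Maybe FileStatus)
linkAndStat src dest = do
  r <- try go :: IO (Either IOException (Maybe FileStatus))
  return (either (const Nothing) id r)
  where
    go = do
      createLink src dest
      s <- getFileStatus src
      d <- getFileStatus dest
      return $ if sameFile s d then Just d else Nothing
    -- Assumed verification: same device and same inode number.
    sameFile a b = deviceID a == deviceID b && fileID a == fileID b
```

Returning the `FileStatus` from the same function that made the link means the caller never needs a second, unguarded stat of the lock file.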