catch error statting pid lock file if it somehow does not exist
It ought to exist, since linkToLock has just created it. However, Lustre seems to have a rather probabilistic view of the contents of a directory, so catching the error if it somehow does not exist and running the same code path that would be run if linkToLock failed might avoid this fun Lustre failure.

Sponsored-by: Dartmouth College's Datalad project
parent 567f63ba47
commit a6699be79d
2 changed files with 32 additions and 13 deletions
@@ -154,14 +154,11 @@ tryLock lockfile = do
         removeWhenExistsWith removeLink tmp'
         return Nothing
     let tooklock st = return $ Just $ LockHandle abslockfile st sidelock
-    ifM (linkToLock sidelock tmp' abslockfile)
-        ( do
+    linkToLock sidelock tmp' abslockfile >>= \case
+        Just lckst -> do
             removeWhenExistsWith removeLink tmp'
-            -- May not have made a hard link, so stat
-            -- the lockfile
-            lckst <- getFileStatus abslockfile
             tooklock lckst
-        , do
+        Nothing -> do
             v <- readPidLock abslockfile
             hn <- getHostName
             tmpst <- getFileStatus tmp'
@@ -175,7 +172,6 @@ tryLock lockfile = do
                     rename tmp' abslockfile
                     tooklock tmpst
                 _ -> failedlock tmpst
-        )
 
 -- Linux's open(2) man page recommends linking a pid lock into place,
 -- as the most portable atomic operation that will fail if
@@ -187,8 +183,8 @@ tryLock lockfile = do
 --
 -- However, not all filesystems support hard links. So, first probe
 -- to see if they are supported. If not, use open with O_EXCL.
-linkToLock :: SideLockHandle -> RawFilePath -> RawFilePath -> IO Bool
-linkToLock Nothing _ _ = return False
+linkToLock :: SideLockHandle -> RawFilePath -> RawFilePath -> IO (Maybe FileStatus)
+linkToLock Nothing _ _ = return Nothing
 linkToLock (Just _) src dest = do
     let probe = src <> ".lnk"
     v <- tryIO $ createLink src probe
@@ -197,10 +193,13 @@ linkToLock (Just _) src dest = do
         Right _ -> do
             _ <- tryIO $ createLink src dest
             ifM (catchBoolIO checklinked)
-                ( catchBoolIO $ not <$> checkInsaneLustre dest
-                , return False
+                ( ifM (catchBoolIO $ not <$> checkInsaneLustre dest)
+                    ( catchMaybeIO $ getFileStatus dest
+                    , return Nothing
+                    )
+                , return Nothing
                 )
-        Left _ -> catchBoolIO $ do
+        Left _ -> catchMaybeIO $ do
             let setup = do
                 fd <- openFd dest WriteOnly
                     (Just $ combineModes readModes)
@@ -209,7 +208,7 @@ linkToLock (Just _) src dest = do
             let cleanup = hClose
             let go h = readFile (fromRawFilePath src) >>= hPutStr h
             bracket setup cleanup go
-            return True
+            getFileStatus dest
   where
     checklinked = do
         x <- getSymbolicLinkStatus src
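The link-then-verify idea behind linkToLock, and the reason for returning `Maybe FileStatus` instead of `Bool`, can be sketched in isolation. This is only an illustration under stated assumptions: `linkAndStat` and `tryIO` below are hypothetical stand-ins written for this example (the real code uses git-annex's own utilities, works on `RawFilePath`, and also handles the hard link probe, the O_EXCL fallback, and checkInsaneLustre), and the temporary file names are invented.

```haskell
import Control.Exception (IOException, try)
import Data.Maybe (isJust)
import System.Directory (getTemporaryDirectory, removeFile)
import System.FilePath ((</>))
import System.Posix.Files
    (FileStatus, createLink, deviceID, fileID, getSymbolicLinkStatus)

-- Hypothetical helper, not git-annex's linkToLock: try to hard link src to
-- dest, then check that both names refer to the same inode. On success the
-- status of dest is returned, so the caller never needs a second stat that
-- could fail if the file has vanished in the meantime.
linkAndStat :: FilePath -> FilePath -> IO (Maybe FileStatus)
linkAndStat src dest = do
    _ <- tryIO (createLink src dest)
    either (const Nothing) id <$> tryIO checkLinked
  where
    checkLinked = do
        s <- getSymbolicLinkStatus src
        d <- getSymbolicLinkStatus dest
        return $ if fileID s == fileID d && deviceID s == deviceID d
            then Just d
            else Nothing

-- Run an action, turning any IOException into a Left.
tryIO :: IO a -> IO (Either IOException a)
tryIO = try

main :: IO ()
main = do
    tmpdir <- getTemporaryDirectory
    let a = tmpdir </> "pidlock-sketch-a.tmp"
        b = tmpdir </> "pidlock-sketch-b.tmp"
        lck = tmpdir </> "pidlock-sketch.lck"
    mapM_ (tryIO . removeFile) [a, b, lck]  -- clear leftovers from earlier runs
    writeFile a "pid 1\n"
    writeFile b "pid 2\n"
    won <- linkAndStat a lck   -- lck does not exist yet, so the link is made
    lost <- linkAndStat b lck  -- lck already exists, so the link fails
    putStrLn $ "first contender took lock: " ++ show (isJust won)
    putStrLn $ "second contender took lock: " ++ show (isJust lost)
    mapM_ (tryIO . removeFile) [a, b, lck]
```

Because the stat happens inside the helper and any exception collapses to `Nothing`, a lock file that disappears right after being linked looks the same to the caller as a link that never succeeded.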
@@ -0,0 +1,20 @@
+[[!comment format=mdwn
+username="joey"
+subject="""comment 1"""
+date="2021-11-29T18:11:30Z"
+content="""
+I think that this error comes from Utility.LockFile.PidLock.tryLock,
+which has the only getFileStatus involving the pidlock whose exceptions
+are not caught. The file is assumed to exist since it was just created,
+and normally nothing deletes it.
+
+While looking at where this might come from, I refreshed my memory of
+how Lustre can do insane stuff like having 2 different files with the
+same name in a directory. Which checkInsaneLustre tries to deal with
+by deleting one of them, but since this is all behavior undefined by POSIX,
+maybe that sometimes deletes both of them. Or the file doesn't appear
+after being created for some other POSIX-defying reason.
+
+I've changed it to catch exceptions from that getFileStatus, which will
+test this theory.
+"""]]
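Catching exceptions from a stat, as described in the comment, amounts to turning a thrown `IOException` into a `Nothing` that the caller can branch on. A minimal sketch, assuming a hypothetical helper `statMaybe` (git-annex itself uses its own `catchMaybeIO` utility, whose definition is not shown in this commit) and an invented path that should not exist:

```haskell
{-# LANGUAGE LambdaCase #-}
import Control.Exception (IOException, try)
import System.Posix.Files (FileStatus, fileSize, getFileStatus)

-- Hypothetical helper, not from git-annex: stat a file, returning Nothing
-- instead of throwing when the stat fails with an IOException (for example
-- because the file does not exist).
statMaybe :: FilePath -> IO (Maybe FileStatus)
statMaybe path = either ignore Just <$> try (getFileStatus path)
  where
    ignore :: IOException -> Maybe FileStatus
    ignore _ = Nothing

main :: IO ()
main = statMaybe "/nonexistent-dir-for-sketch/pidlock" >>= \case
    -- The stat worked, so the lock file is really there.
    Just st -> putStrLn ("lock file present, size " ++ show (fileSize st))
    -- The stat failed; take the same path as if the lock was never made.
    Nothing -> putStrLn "lock file missing, treating as a failed lock attempt"
```

The design choice mirrors the commit: a missing file is not treated as a fatal error but as one more way the lock attempt can fail.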