Improve ssh socket cleanup code to skip over the cruft that NFS sometimes puts in a directory when a file is being deleted.
This commit is contained in:
parent
ce5f407b4c
commit
1a8ba7eab4
3 changed files with 64 additions and 2 deletions
13
Annex/Ssh.hs
13
Annex/Ssh.hs
|
@ -136,11 +136,20 @@ prepSocket socketfile = do
|
||||||
liftIO $ createDirectoryIfMissing True $ parentDir socketfile
|
liftIO $ createDirectoryIfMissing True $ parentDir socketfile
|
||||||
lockFileCached $ socket2lock socketfile
|
lockFileCached $ socket2lock socketfile
|
||||||
|
|
||||||
|
{- Find ssh socket files.
|
||||||
|
-
|
||||||
|
- The check that the lock file exists makes only socket files
|
||||||
|
- that were set up by prepSocket be found. On some NFS systems,
|
||||||
|
- a deleted socket file may linger for a while under another filename;
|
||||||
|
- and this check makes such files be skipped since the corresponding lock
|
||||||
|
- file won't exist.
|
||||||
|
-}
|
||||||
enumSocketFiles :: Annex [FilePath]
|
enumSocketFiles :: Annex [FilePath]
|
||||||
enumSocketFiles = go =<< sshCacheDir
|
enumSocketFiles = liftIO . go =<< sshCacheDir
|
||||||
where
|
where
|
||||||
go Nothing = return []
|
go Nothing = return []
|
||||||
go (Just dir) = liftIO $ filter (not . isLock)
|
go (Just dir) = filterM (doesFileExist . socket2lock)
|
||||||
|
=<< filter (not . isLock)
|
||||||
<$> catchDefaultIO [] (dirContents dir)
|
<$> catchDefaultIO [] (dirContents dir)
|
||||||
|
|
||||||
{- Stop any unused ssh connection caching processes. -}
|
{- Stop any unused ssh connection caching processes. -}
|
||||||
|
|
|
@ -10,6 +10,8 @@ git-annex (6.20161013) UNRELEASED; urgency=medium
|
||||||
* Assistant, repair: Improved filtering out of git fsck lines about
|
* Assistant, repair: Improved filtering out of git fsck lines about
|
||||||
duplicate file entries in tree objects.
|
duplicate file entries in tree objects.
|
||||||
* test: Deal with gpg-agent behavior change that broke the test suite.
|
* test: Deal with gpg-agent behavior change that broke the test suite.
|
||||||
|
* Improve ssh socket cleanup code to skip over the cruft that
|
||||||
|
NFS sometimes puts in a directory when a file is being deleted.
|
||||||
|
|
||||||
-- Joey Hess <id@joeyh.name> Mon, 17 Oct 2016 12:46:54 -0400
|
-- Joey Hess <id@joeyh.name> Mon, 17 Oct 2016 12:46:54 -0400
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,51 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="joey"
|
||||||
|
subject="""comment 13"""
|
||||||
|
date="2016-10-26T16:34:38Z"
|
||||||
|
content="""
|
||||||
|
Looking at the strace finally...
|
||||||
|
|
||||||
|
Here's git-annex creating the dummy posix lock file, soon after
|
||||||
|
the ssh socket is removed. (I think that's ssh removing the socket before,
|
||||||
|
but the jumble of strace output is hard to follow here.)
|
||||||
|
|
||||||
|
2456732 unlink(".git/annex/ssh/smaug" <unfinished ...>
|
||||||
|
...
|
||||||
|
2456732 open(".git/annex/ssh/.nfs00000000099d85d000000002.lock", O_RDONLY|O_CREAT, 0666 <unfinished ...>
|
||||||
|
|
||||||
|
And later ssh is told to stop using a related socket:
|
||||||
|
|
||||||
|
2456815 execve("/mnt/btrfs/scrap/datalad/test_nfs/git-annex.linux/bin/ssh", ["ssh", "-O", "stop", "-S", ".nfs00000000099d85d000000002", "-o", "ControlMaster=auto", "-o", "ControlPersist=yes", "localhost"], [/* 98 vars */] <unfinished ...>
|
||||||
|
|
||||||
|
Which it seems does not exist by then:
|
||||||
|
|
||||||
|
2456815 connect(3, {sa_family=AF_LOCAL, sun_path=".nfs00000000099d85d000000002"}, 31) = -1 ENOENT (No such file or directory)
|
||||||
|
2456732 unlink(".git/annex/ssh/.nfs00000000099d85d000000002") = -1 ENOENT (No such file or directory)
|
||||||
|
|
||||||
|
Seems like this would have to be `sshCleanup` running, and `enumSocketFiles`
|
||||||
|
seeing ".nfs00000000099d85d000000002" existing at that point due
|
||||||
|
probably to the ssh/smaug socket being in the process of being
|
||||||
|
removed.
|
||||||
|
|
||||||
|
So, it assumes it's a socket file and tries to clean it up, creating
|
||||||
|
the dummy posix lock file. And posix lock files don't get deleted
|
||||||
|
(it's impossible to do so in a race-free way), so this results in more
|
||||||
|
and more `.nfs*.lock` files.
|
||||||
|
|
||||||
|
Need to find a way to prevent `enumSocketFiles` from listing these
|
||||||
|
files. Seems that ".nfs00000000099d85d000000002" is probably the deleted
|
||||||
|
form of the "smaug" socket, so it really is a socket file, so checking
|
||||||
|
if a file is a socket won't help. Of course it could
|
||||||
|
explicitly filter out ".nfs" prefixed files. That could be a case of
|
||||||
|
kicking the can down the road, since NFS could always choose to use
|
||||||
|
another name for these files.
|
||||||
|
|
||||||
|
Hmm.. Before a ssh socket file gets created, git-annex always locks the
|
||||||
|
associated lock file. And as noted, the posix lock file is never
|
||||||
|
removed. (On Windows the lock file is also created and never deleted.)
|
||||||
|
So, if $file does not have an associated $file.lock
|
||||||
|
then $file must not be a ssh socket, and `enumSocketFiles` can skip it.
|
||||||
|
|
||||||
|
I've added that check. Unable to test it because the NFS mount has
|
||||||
|
been lost in the intervening time. @yoh can you test this fixed it?
|
||||||
|
"""]]
|
Loading…
Reference in a new issue