diff --git a/Annex/Ssh.hs b/Annex/Ssh.hs index 10efa9f9ed..4f879436b5 100644 --- a/Annex/Ssh.hs +++ b/Annex/Ssh.hs @@ -136,11 +136,20 @@ prepSocket socketfile = do liftIO $ createDirectoryIfMissing True $ parentDir socketfile lockFileCached $ socket2lock socketfile +{- Find ssh socket files. + - + - The check that the lock file exists makes only socket files + - that were set up by prepSocket be found. On some NFS systems, + - a deleted socket file may linger for a while under another filename; + - and this check makes such files be skipped since the corresponding lock + - file won't exist. + -} enumSocketFiles :: Annex [FilePath] -enumSocketFiles = go =<< sshCacheDir +enumSocketFiles = liftIO . go =<< sshCacheDir where go Nothing = return [] - go (Just dir) = liftIO $ filter (not . isLock) + go (Just dir) = filterM (doesFileExist . socket2lock) + =<< filter (not . isLock) <$> catchDefaultIO [] (dirContents dir) {- Stop any unused ssh connection caching processes. -} diff --git a/CHANGELOG b/CHANGELOG index 4fe08d537d..d91b79c630 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -10,6 +10,8 @@ git-annex (6.20161013) UNRELEASED; urgency=medium * Assistant, repair: Improved filtering out of git fsck lines about duplicate file entries in tree objects. * test: Deal with gpg-agent behavior change that broke the test suite. + * Improve ssh socket cleanup code to skip over the cruft that + NFS sometimes puts in a directory when a file is being deleted. -- Joey Hess Mon, 17 Oct 2016 12:46:54 -0400 diff --git a/doc/bugs/git-annex__58___content_is_locked__while_trying_to_move_under_NFS_and_pidlock/comment_13_05687dca2462eb79872879bd7695d665._comment b/doc/bugs/git-annex__58___content_is_locked__while_trying_to_move_under_NFS_and_pidlock/comment_13_05687dca2462eb79872879bd7695d665._comment new file mode 100644 index 0000000000..64df7bc578 --- /dev/null +++ b/doc/bugs/git-annex__58___content_is_locked__while_trying_to_move_under_NFS_and_pidlock/comment_13_05687dca2462eb79872879bd7695d665._comment @@ -0,0 +1,51 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 13""" + date="2016-10-26T16:34:38Z" + content=""" +Looking at the strace finally... + +Here's git-annex creating the dummy posix lock file, soon after +the ssh socket is removed. (I think that's ssh removing the socket before, +but the jumble of strace output is hard to follow here.) + + 2456732 unlink(".git/annex/ssh/smaug" + ... + 2456732 open(".git/annex/ssh/.nfs00000000099d85d000000002.lock", O_RDONLY|O_CREAT, 0666 + +And later ssh is told to stop using a related socket: + + 2456815 execve("/mnt/btrfs/scrap/datalad/test_nfs/git-annex.linux/bin/ssh", ["ssh", "-O", "stop", "-S", ".nfs00000000099d85d000000002", "-o", "ControlMaster=auto", "-o", "ControlPersist=yes", "localhost"], [/* 98 vars */] + +Which it seems does not exist by then: + + 2456815 connect(3, {sa_family=AF_LOCAL, sun_path=".nfs00000000099d85d000000002"}, 31) = -1 ENOENT (No such file or directory) + 2456732 unlink(".git/annex/ssh/.nfs00000000099d85d000000002") = -1 ENOENT (No such file or directory) + +Seems like this would have to be `sshCleanup` running, and `enumSocketFiles` +seeing ".nfs00000000099d85d000000002" existing at that point due +probably to the ssh/smaug socket being in the process of being +removed. + +So, it assumes it's a socket file and tries to clean it up, creating +the dummy posix lock file. And posix lock files don't get deleted +(it's impossible to do so in a race-free way), so this results in more +and more `.nfs*.lock` files. + +Need to find a way to prevent `enumSocketFiles` from listing these +files. Seems that ".nfs00000000099d85d000000002" is probably the deleted +form of the "smaug" socket, so it really is a socket file, so checking +if a file is a socket won't help. Of course it could + explicitly filter out ".nfs" prefixed files. That could be a case of +kicking the can down the road, since NFS could always choose to use +another name for these files. + +Hmm.. Before a ssh socket file gets created, git-annex always locks the +associated lock file. And as noted, the posix lock file is never +removed. (On Windows the lock file is also created and never deleted.) +So, if $file does not have an associated $file.lock +then $file must not be a ssh socket, and `enumSocketFiles` can skip it. + +I've added that check. Unable to test it because the NFS mount has +been lost in the intervening time. @yoh can you test this fixed it? +"""]]