always check with ls-files before adding new files

Makes it safe to use git annex unlock with the watcher/assistant.
And also to mix use of the watcher/assistant with regular files stored in git.

Long ago, I had avoided doing this check, except during the startup scan,
because it would be slow to run ls-files repeatedly.

But then I added the lsof check, and to make that fast, got it to detect
batch file adds. So let's move the ls-files check to also occur when it'll
have a batch, and can check them all with one call.

This does slow down adding a single file by just a bit, but really only
a little bit. (The lsof check is probably more expensive.) It also
speeds up the startup scan, especially when there are lots of new files
found by the scan.

Also, fixed the sleep for annex.delayadd to not run while the threadstate
lock is held, so it doesn't unnecessarily freeze everything else.

Also, --force no longer makes it skip the lsof check, which was not
documented, and seems never a good idea.
This commit is contained in:
Joey Hess 2012-10-02 17:34:22 -04:00
parent 717e008390
commit 9aab70de66
6 changed files with 126 additions and 133 deletions

View file

@ -26,6 +26,10 @@ data Change
, changeType :: ChangeType , changeType :: ChangeType
} }
| PendingAddChange | PendingAddChange
{ changeTime ::UTCTime
, changeFile :: FilePath
}
| InProcessAddChange
{ changeTime ::UTCTime { changeTime ::UTCTime
, keySource :: KeySource , keySource :: KeySource
} }
@ -44,17 +48,21 @@ madeChange f t = do
noChange :: Annex (Maybe Change) noChange :: Annex (Maybe Change)
noChange = return Nothing noChange = return Nothing
{- Indicates an add is in progress. -} {- Indicates an add needs to be done, but has not started yet. -}
pendingAddChange :: KeySource -> Annex (Maybe Change) pendingAddChange :: FilePath -> Annex (Maybe Change)
pendingAddChange ks = pendingAddChange f =
liftIO $ Just <$> (PendingAddChange <$> getCurrentTime <*> pure ks) liftIO $ Just <$> (PendingAddChange <$> getCurrentTime <*> pure f)
isPendingAddChange :: Change -> Bool isPendingAddChange :: Change -> Bool
isPendingAddChange (PendingAddChange {}) = True isPendingAddChange (PendingAddChange {}) = True
isPendingAddChange _ = False isPendingAddChange _ = False
isInProcessAddChange :: Change -> Bool
isInProcessAddChange (InProcessAddChange {}) = True
isInProcessAddChange _ = False
finishedChange :: Change -> Change finishedChange :: Change -> Change
finishedChange c@(PendingAddChange { keySource = ks }) = Change finishedChange c@(InProcessAddChange { keySource = ks }) = Change
{ changeTime = changeTime c { changeTime = changeTime c
, changeFile = keyFilename ks , changeFile = keyFilename ks
, changeType = AddChange , changeType = AddChange

View file

@ -5,6 +5,8 @@
- Licensed under the GNU GPL version 3 or higher. - Licensed under the GNU GPL version 3 or higher.
-} -}
{-# LANGUAGE CPP #-}
module Assistant.Threads.Committer where module Assistant.Threads.Committer where
import Assistant.Common import Assistant.Common
@ -16,10 +18,10 @@ import Assistant.Threads.Watcher
import Assistant.TransferQueue import Assistant.TransferQueue
import Assistant.DaemonStatus import Assistant.DaemonStatus
import Logs.Transfer import Logs.Transfer
import qualified Annex
import qualified Annex.Queue import qualified Annex.Queue
import qualified Git.Command import qualified Git.Command
import qualified Git.HashObject import qualified Git.HashObject
import qualified Git.LsFiles
import qualified Git.Version import qualified Git.Version
import Git.Types import Git.Types
import qualified Command.Add import qualified Command.Add
@ -27,6 +29,7 @@ import Utility.ThreadScheduler
import qualified Utility.Lsof as Lsof import qualified Utility.Lsof as Lsof
import qualified Utility.DirWatcher as DirWatcher import qualified Utility.DirWatcher as DirWatcher
import Types.KeySource import Types.KeySource
import Config
import Data.Time.Clock import Data.Time.Clock
import Data.Tuple.Utils import Data.Tuple.Utils
@ -38,28 +41,32 @@ thisThread = "Committer"
{- This thread makes git commits at appropriate times. -} {- This thread makes git commits at appropriate times. -}
commitThread :: ThreadState -> ChangeChan -> CommitChan -> TransferQueue -> DaemonStatusHandle -> NamedThread commitThread :: ThreadState -> ChangeChan -> CommitChan -> TransferQueue -> DaemonStatusHandle -> NamedThread
commitThread st changechan commitchan transferqueue dstatus = thread $ runEvery (Seconds 1) $ do commitThread st changechan commitchan transferqueue dstatus = thread $ do
-- We already waited one second as a simple rate limiter. delayadd <- runThreadState st $
-- Next, wait until at least one change is available for maybe delayaddDefault (Just . Seconds) . readish
-- processing. <$> getConfig (annexConfig "delayadd") ""
changes <- getChanges changechan runEvery (Seconds 1) $ do
-- Now see if now's a good time to commit. -- We already waited one second as a simple rate limiter.
time <- getCurrentTime -- Next, wait until at least one change is available for
if shouldCommit time changes -- processing.
then do changes <- getChanges changechan
readychanges <- handleAdds st changechan transferqueue dstatus changes -- Now see if now's a good time to commit.
if shouldCommit time readychanges time <- getCurrentTime
then do if shouldCommit time changes
debug thisThread then do
[ "committing" readychanges <- handleAdds delayadd st changechan transferqueue dstatus changes
, show (length readychanges) if shouldCommit time readychanges
, "changes" then do
] debug thisThread
void $ alertWhile dstatus commitAlert $ [ "committing"
runThreadState st commitStaged , show (length readychanges)
recordCommit commitchan (Commit time) , "changes"
else refill readychanges ]
else refill changes void $ alertWhile dstatus commitAlert $
runThreadState st commitStaged
recordCommit commitchan (Commit time)
else refill readychanges
else refill changes
where where
thread = NamedThread thisThread thread = NamedThread thisThread
refill [] = noop refill [] = noop
@ -109,13 +116,24 @@ shouldCommit now changes
len = length changes len = length changes
thisSecond c = now `diffUTCTime` changeTime c <= 1 thisSecond c = now `diffUTCTime` changeTime c <= 1
{- If there are PendingAddChanges, the files have not yet actually been {- OSX needs a short delay after a file is added before locking it down,
- added to the annex (probably), and that has to be done now, before - as pasting a file seems to try to set file permissions or otherwise
- committing. - access the file after closing it. -}
delayaddDefault :: Maybe Seconds
#ifdef darwin_HOST_OS
delayaddDefault = Just $ Seconds 1
#else
delayaddDefault = Nothing
#endif
{- If there are PendingAddChanges, or InProcessAddChanges, the files
- have not yet actually been added to the annex, and that has to be done
- now, before committing.
- -
- Deferring the adds to this point causes batches to be bundled together, - Deferring the adds to this point causes batches to be bundled together,
- which allows faster checking with lsof that the files are not still open - which allows faster checking with lsof that the files are not still open
- for write by some other process. - for write by some other process, and faster checking with git-ls-files
- that the files are not already checked into git.
- -
- When a file is added, Inotify will notice the new symlink. So this waits - When a file is added, Inotify will notice the new symlink. So this waits
- for additional Changes to arrive, so that the symlink has hopefully been - for additional Changes to arrive, so that the symlink has hopefully been
@ -128,10 +146,11 @@ shouldCommit now changes
- Any pending adds that are not ready yet are put back into the ChangeChan, - Any pending adds that are not ready yet are put back into the ChangeChan,
- where they will be retried later. - where they will be retried later.
-} -}
handleAdds :: ThreadState -> ChangeChan -> TransferQueue -> DaemonStatusHandle -> [Change] -> IO [Change] handleAdds :: Maybe Seconds -> ThreadState -> ChangeChan -> TransferQueue -> DaemonStatusHandle -> [Change] -> IO [Change]
handleAdds st changechan transferqueue dstatus cs = returnWhen (null pendingadds) $ do handleAdds delayadd st changechan transferqueue dstatus cs = returnWhen (null incomplete) $ do
(postponed, toadd) <- partitionEithers <$> let (pending, inprocess) = partition isPendingAddChange incomplete
safeToAdd st pendingadds pending' <- findnew pending
(postponed, toadd) <- partitionEithers <$> safeToAdd delayadd st pending' inprocess
unless (null postponed) $ unless (null postponed) $
refillChanges changechan postponed refillChanges changechan postponed
@ -141,18 +160,26 @@ handleAdds st changechan transferqueue dstatus cs = returnWhen (null pendingadds
if DirWatcher.eventsCoalesce || null added if DirWatcher.eventsCoalesce || null added
then return $ added ++ otherchanges then return $ added ++ otherchanges
else do else do
r <- handleAdds st changechan transferqueue dstatus r <- handleAdds delayadd st changechan transferqueue dstatus
=<< getChanges changechan =<< getChanges changechan
return $ r ++ added ++ otherchanges return $ r ++ added ++ otherchanges
where where
(pendingadds, otherchanges) = partition isPendingAddChange cs (incomplete, otherchanges) = partition (\c -> isPendingAddChange c || isInProcessAddChange c) cs
findnew [] = return []
findnew pending = do
newfiles <- runThreadState st $
inRepo (Git.LsFiles.notInRepo False $ map changeFile pending)
-- note: timestamp info is lost here
let ts = changeTime (pending !! 0)
return $ map (PendingAddChange ts) newfiles
returnWhen c a returnWhen c a
| c = return otherchanges | c = return otherchanges
| otherwise = a | otherwise = a
add :: Change -> IO (Maybe Change) add :: Change -> IO (Maybe Change)
add change@(PendingAddChange { keySource = ks }) = add change@(InProcessAddChange { keySource = ks }) =
alertWhile' dstatus (addFileAlert $ keyFilename ks) $ alertWhile' dstatus (addFileAlert $ keyFilename ks) $
liftM ret $ catchMaybeIO $ liftM ret $ catchMaybeIO $
sanitycheck ks $ runThreadState st $ do sanitycheck ks $ runThreadState st $ do
@ -190,38 +217,43 @@ handleAdds st changechan transferqueue dstatus cs = returnWhen (null pendingadds
then a then a
else return Nothing else return Nothing
{- PendingAddChanges can Either be Right to be added now, {- Files can Either be Right to be added now,
- or are unsafe, and must be Left for later. - or are unsafe, and must be Left for later.
- -
- Check by running lsof on the temp directory, which - Check by running lsof on the temp directory, which
- the KeySources are locked down in. - the KeySources are locked down in.
-} -}
safeToAdd :: ThreadState -> [Change] -> IO [Either Change Change] safeToAdd :: Maybe Seconds -> ThreadState -> [Change] -> [Change] -> IO [Either Change Change]
safeToAdd st changes = runThreadState st $ safeToAdd _ _ [] [] = return []
ifM (Annex.getState Annex.force) safeToAdd delayadd st pending inprocess = do
( allRight changes -- force bypasses lsof check maybe noop threadDelaySeconds delayadd
, do runThreadState st $ do
tmpdir <- fromRepo gitAnnexTmpDir keysources <- mapM Command.Add.lockDown (map changeFile pending)
openfiles <- S.fromList . map fst3 . filter openwrite <$> let inprocess' = map mkinprocess (zip pending keysources)
liftIO (Lsof.queryDir tmpdir) tmpdir <- fromRepo gitAnnexTmpDir
openfiles <- S.fromList . map fst3 . filter openwrite <$>
liftIO (Lsof.queryDir tmpdir)
let checked = map (check openfiles) $ inprocess ++ inprocess'
let checked = map (check openfiles) changes {- If new events are received when files are closed,
- there's no need to retry any changes that cannot
{- If new events are received when files are closed, - be done now. -}
- there's no need to retry any changes that cannot if DirWatcher.closingTracked
- be done now. -} then do
if DirWatcher.closingTracked mapM_ canceladd $ lefts checked
then do allRight $ rights checked
mapM_ canceladd $ lefts checked else return checked
allRight $ rights checked
else return checked
)
where where
check openfiles change@(PendingAddChange { keySource = ks }) check openfiles change@(InProcessAddChange { keySource = ks })
| S.member (contentLocation ks) openfiles = Left change | S.member (contentLocation ks) openfiles = Left change
check _ change = Right change check _ change = Right change
canceladd (PendingAddChange { keySource = ks }) = do mkinprocess (c, ks) = InProcessAddChange
{ changeTime = changeTime c
, keySource = ks
}
canceladd (InProcessAddChange { keySource = ks }) = do
warning $ keyFilename ks warning $ keyFilename ks
++ " still has writers, not adding" ++ " still has writers, not adding"
-- remove the hard link -- remove the hard link

View file

@ -93,7 +93,7 @@ check st dstatus transferqueue changechan = do
runThreadState st $ warning msg runThreadState st $ warning msg
void $ addAlert dstatus $ sanityCheckFixAlert msg void $ addAlert dstatus $ sanityCheckFixAlert msg
addsymlink file s = do addsymlink file s = do
Watcher.runHandler thisThread Nothing st dstatus Watcher.runHandler thisThread st dstatus
transferqueue changechan transferqueue changechan
Watcher.onAddSymlink file s Watcher.onAddSymlink file s
insanity $ "found unstaged symlink: " ++ file insanity $ "found unstaged symlink: " ++ file

View file

@ -5,8 +5,6 @@
- Licensed under the GNU GPL version 3 or higher. - Licensed under the GNU GPL version 3 or higher.
-} -}
{-# LANGUAGE CPP #-}
module Assistant.Threads.Watcher ( module Assistant.Threads.Watcher (
watchThread, watchThread,
checkCanWatch, checkCanWatch,
@ -30,14 +28,10 @@ import qualified Annex.Queue
import qualified Git.Command import qualified Git.Command
import qualified Git.UpdateIndex import qualified Git.UpdateIndex
import qualified Git.HashObject import qualified Git.HashObject
import qualified Git.LsFiles
import qualified Backend import qualified Backend
import qualified Command.Add
import Annex.Content import Annex.Content
import Annex.CatFile import Annex.CatFile
import Git.Types import Git.Types
import Config
import Utility.ThreadScheduler
import Data.Bits.Utils import Data.Bits.Utils
import qualified Data.ByteString.Lazy as L import qualified Data.ByteString.Lazy as L
@ -60,32 +54,19 @@ needLsof = error $ unlines
, "Be warned: This can corrupt data in the annex, and make fsck complain." , "Be warned: This can corrupt data in the annex, and make fsck complain."
] ]
{- OSX needs a short delay after a file is added before locking it down,
- as pasting a file seems to try to set file permissions or otherwise
- access the file after closing it. -}
delayaddDefault :: Maybe Seconds
#ifdef darwin_HOST_OS
delayaddDefault = Just $ Seconds 1
#else
delayaddDefault = Nothing
#endif
watchThread :: ThreadState -> DaemonStatusHandle -> TransferQueue -> ChangeChan -> NamedThread watchThread :: ThreadState -> DaemonStatusHandle -> TransferQueue -> ChangeChan -> NamedThread
watchThread st dstatus transferqueue changechan = NamedThread thisThread $ do watchThread st dstatus transferqueue changechan = NamedThread thisThread $ do
delayadd <- runThreadState st $ void $ watchDir "." ignored hooks startup
maybe delayaddDefault (Just . Seconds) . readish
<$> getConfig (annexConfig "delayadd") ""
void $ watchDir "." ignored (hooks delayadd) startup
debug thisThread [ "watching", "."] debug thisThread [ "watching", "."]
where where
startup = startupScan st dstatus startup = startupScan st dstatus
hook delay a = Just $ runHandler thisThread delay st dstatus transferqueue changechan a hook a = Just $ runHandler thisThread st dstatus transferqueue changechan a
hooks delayadd = mkWatchHooks hooks = mkWatchHooks
{ addHook = hook delayadd onAdd { addHook = hook onAdd
, delHook = hook Nothing onDel , delHook = hook onDel
, addSymlinkHook = hook Nothing onAddSymlink , addSymlinkHook = hook onAddSymlink
, delDirHook = hook Nothing onDelDir , delDirHook = hook onDelDir
, errHook = hook Nothing onErr , errHook = hook onErr
} }
{- Initial scartup scan. The action should return once the scan is complete. -} {- Initial scartup scan. The action should return once the scan is complete. -}
@ -113,65 +94,35 @@ ignored = ig . takeFileName
ig ".gitattributes" = True ig ".gitattributes" = True
ig _ = False ig _ = False
type Handler = ThreadName -> Maybe Seconds -> FilePath -> Maybe FileStatus -> DaemonStatusHandle -> TransferQueue -> Annex (Maybe Change) type Handler = ThreadName -> FilePath -> Maybe FileStatus -> DaemonStatusHandle -> TransferQueue -> Annex (Maybe Change)
{- Runs an action handler, inside the Annex monad, and if there was a {- Runs an action handler, inside the Annex monad, and if there was a
- change, adds it to the ChangeChan. - change, adds it to the ChangeChan.
- -
- Exceptions are ignored, otherwise a whole watcher thread could be crashed. - Exceptions are ignored, otherwise a whole watcher thread could be crashed.
-} -}
runHandler :: ThreadName -> Maybe Seconds -> ThreadState -> DaemonStatusHandle -> TransferQueue -> ChangeChan -> Handler -> FilePath -> Maybe FileStatus -> IO () runHandler :: ThreadName -> ThreadState -> DaemonStatusHandle -> TransferQueue -> ChangeChan -> Handler -> FilePath -> Maybe FileStatus -> IO ()
runHandler threadname delay st dstatus transferqueue changechan handler file filestatus = void $ do runHandler threadname st dstatus transferqueue changechan handler file filestatus = void $ do
r <- tryIO go r <- tryIO go
case r of case r of
Left e -> print e Left e -> print e
Right Nothing -> noop Right Nothing -> noop
Right (Just change) -> recordChange changechan change Right (Just change) -> recordChange changechan change
where where
go = runThreadState st $ handler threadname delay file filestatus dstatus transferqueue go = runThreadState st $ handler threadname file filestatus dstatus transferqueue
{- During initial directory scan, this will be run for any regular files
- that are already checked into git. We don't want to turn those into
- symlinks, so do a check. This is rather expensive, but only happens
- during startup.
-
- It's possible for the file to still be open for write by some process.
- This can happen in a few ways; one is if two processes had the file open
- and only one has just closed it. We want to avoid adding a file to the
- annex that is open for write, to avoid anything being able to change it.
-
- We could run lsof on the file here to check for other writers.
- But, that's slow, and even if there is currently a writer, we will want
- to add the file *eventually*. Instead, the file is locked down as a hard
- link in a temp directory, with its write bits disabled, for later
- checking with lsof, and a Change is returned containing a KeySource
- using that hard link. The committer handles running lsof and finishing
- the add.
-}
onAdd :: Handler onAdd :: Handler
onAdd threadname delay file filestatus dstatus _ onAdd _ file filestatus _ _
| maybe False isRegularFile filestatus = | maybe False isRegularFile filestatus = pendingAddChange file
ifM (scanComplete <$> liftIO (getDaemonStatus dstatus))
( go
, ifM (null <$> inRepo (Git.LsFiles.notInRepo False [file]))
( noChange
, go
)
)
| otherwise = noChange | otherwise = noChange
where where
go = do
liftIO $ do
debug threadname ["file added", file]
maybe noop threadDelaySeconds delay
pendingAddChange =<< Command.Add.lockDown file
{- A symlink might be an arbitrary symlink, which is just added. {- A symlink might be an arbitrary symlink, which is just added.
- Or, if it is a git-annex symlink, ensure it points to the content - Or, if it is a git-annex symlink, ensure it points to the content
- before adding it. - before adding it.
-} -}
onAddSymlink :: Handler onAddSymlink :: Handler
onAddSymlink threadname _ file filestatus dstatus transferqueue = go =<< Backend.lookupFile file onAddSymlink threadname file filestatus dstatus transferqueue = go =<< Backend.lookupFile file
where where
go (Just (key, _)) = do go (Just (key, _)) = do
link <- calcGitLink file key link <- calcGitLink file key
@ -232,7 +183,7 @@ onAddSymlink threadname _ file filestatus dstatus transferqueue = go =<< Backend
| otherwise = noop | otherwise = noop
onDel :: Handler onDel :: Handler
onDel threadname _ file _ _dstatus _ = do onDel threadname file _ _dstatus _ = do
liftIO $ debug threadname ["file deleted", file] liftIO $ debug threadname ["file deleted", file]
Annex.Queue.addUpdateIndex =<< Annex.Queue.addUpdateIndex =<<
inRepo (Git.UpdateIndex.unstageFile file) inRepo (Git.UpdateIndex.unstageFile file)
@ -246,7 +197,7 @@ onDel threadname _ file _ _dstatus _ = do
- command to get the recursive list of files in the directory, so rm is - command to get the recursive list of files in the directory, so rm is
- just as good. -} - just as good. -}
onDelDir :: Handler onDelDir :: Handler
onDelDir threadname _ dir _ _dstatus _ = do onDelDir threadname dir _ _dstatus _ = do
liftIO $ debug threadname ["directory deleted", dir] liftIO $ debug threadname ["directory deleted", dir]
Annex.Queue.addCommand "rm" Annex.Queue.addCommand "rm"
[Params "--quiet -r --cached --ignore-unmatch --"] [dir] [Params "--quiet -r --cached --ignore-unmatch --"] [dir]
@ -254,7 +205,7 @@ onDelDir threadname _ dir _ _dstatus _ = do
{- Called when there's an error with inotify or kqueue. -} {- Called when there's an error with inotify or kqueue. -}
onErr :: Handler onErr :: Handler
onErr _ _ msg _ dstatus _ = do onErr _ msg _ dstatus _ = do
warning msg warning msg
void $ liftIO $ addAlert dstatus $ warningAlert "watcher" msg void $ liftIO $ addAlert dstatus $ warningAlert "watcher" msg
return Nothing return Nothing

3
debian/changelog vendored
View file

@ -3,6 +3,9 @@ git-annex (3.20121002) UNRELEASED; urgency=low
* group, ungroup: New commands to indicate groups of repositories. * group, ungroup: New commands to indicate groups of repositories.
* --copies=group:number can now be used to match files that are present * --copies=group:number can now be used to match files that are present
in a specified number of repositories in a group. in a specified number of repositories in a group.
* watch, assistant: It's now safe to git annex unlock files while
the watcher is running, as well as modify files checked into git
as normal files.
-- Joey Hess <joeyh@debian.org> Mon, 01 Oct 2012 15:09:49 -0400 -- Joey Hess <joeyh@debian.org> Mon, 01 Oct 2012 15:09:49 -0400

View file

@ -6,10 +6,6 @@ available!
## known bugs ## known bugs
* If a file is checked into git as a normal file and gets modified
(or merged, etc), it will be converted into an annexed file.
See [[blog/day_7__bugfixes]].
* When you `git annex unlock` a file, it will immediately be re-locked. * When you `git annex unlock` a file, it will immediately be re-locked.
See [[bugs/watcher_commits_unlocked_files]]. See [[bugs/watcher_commits_unlocked_files]].
@ -203,3 +199,6 @@ Many races need to be dealt with by this code. Here are some of them.
injected into the annex, where it could be opened for write again. injected into the annex, where it could be opened for write again.
Would need to detect that and undo the annex injection or something. Would need to detect that and undo the annex injection or something.
- If a file is checked into git as a normal file and gets modified
(or merged, etc), it will be converted into an annexed file.
See [[blog/day_7__bugfixes]]. **done**; we always check ls-files now