{-# LANGUAGE CPP #-}
{-# LANGUAGE BangPatterns #-}

{- git-annex watch daemon
 -
 - Overview of threads and MVars, etc:
 -
 - Thread 1: Parent
 -	The initial thread run, double forks to background, starts other
 -	threads, and then stops, waiting for them to terminate.
 - Thread 2: inotify
 -	Notices new files, and calls handlers for events, queuing changes.
 - Thread 3: inotify internal
 -	Used by the haskell inotify library to ensure the inotify event
 -	buffer is kept drained.
 - Thread 4: committer
 -	Waits for changes to occur, and runs the git queue to update its
 -	index, then commits.
 -
 - State MVar:
 -	The Annex state is stored here, which allows resuscitating the
 -	Annex monad in IO actions run by the inotify and committer
 -	threads. Thus, a single state is shared among the threads, and
 -	only one at a time can access it.
 - ChangeChan STM TChan:
 -	Changes are indicated by writing to this channel. The committer
 -	reads from it.
 -
 - Copyright 2012 Joey Hess <joey@kitenet.net>
 -
 - Licensed under the GNU GPL version 3 or higher.
 -}

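{- A minimal sketch of the shared-state pattern described above (purely
 - illustrative; the names and the use of () for the state are hypothetical,
 - not part of this module):
 -
 - > import Control.Concurrent
 - > import Control.Concurrent.STM
 - > import Control.Monad (forever)
 - >
 - > sketch :: IO ()
 - > sketch = do
 - > 	st <- newMVar ()                -- stands in for the Annex state MVar
 - > 	chan <- atomically newTChan     -- stands in for the ChangeChan
 - > 	_ <- forkIO $ forever $ do      -- stands in for the committer thread
 - > 		c <- atomically $ readTChan chan
 - > 		withMVar st $ \_ -> putStrLn ("committing " ++ c)
 - > 	atomically $ writeTChan chan "some change"
 -}
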
module Command.Watch where

import Common.Annex
import Command
import Utility.Daemon
import Utility.LogFile
import Utility.ThreadLock
import qualified Annex
import qualified Annex.Queue
import qualified Command.Add
import qualified Git.Command
import qualified Git.UpdateIndex
import qualified Git.HashObject
import qualified Backend
import Annex.Content
import Annex.CatFile
import Git.Types
import Option

import Control.Concurrent
import Control.Concurrent.STM
import Data.Time.Clock
import Data.Bits.Utils
import qualified Data.ByteString.Lazy as L

#if defined linux_HOST_OS
import Utility.Inotify
import System.INotify
#endif

type ChangeChan = TChan Change

type Handler = FilePath -> Annex (Maybe Change)

data Change = Change
	{ changeTime :: UTCTime
	, changeFile :: FilePath
	, changeDesc :: String
	}
	deriving (Show)

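{- For orientation, a sketch of how a Change flows through the ChangeChan
 - (illustrative only; the real producer and consumer are runHandler and
 - commitThread below, and exampleProducer/exampleConsumer are hypothetical
 - names):
 -
 - > exampleProducer :: ChangeChan -> FilePath -> IO ()
 - > exampleProducer chan file = do
 - > 	now <- getCurrentTime
 - > 	atomically $ writeTChan chan (Change now file "example")
 - >
 - > exampleConsumer :: ChangeChan -> IO Change
 - > exampleConsumer = atomically . readTChan
 -}
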
def :: [Command]
def = [withOptions [foregroundOption, stopOption] $
	command "watch" paramPaths seek "watch for changes"]

seek :: [CommandSeek]
seek = [withFlag stopOption $ \stopdaemon ->
	withFlag foregroundOption $ \foreground ->
	withNothing $ start foreground stopdaemon]

foregroundOption :: Option
foregroundOption = Option.flag [] "foreground" "do not daemonize"

stopOption :: Option
stopOption = Option.flag [] "stop" "stop daemon"

start :: Bool -> Bool -> CommandStart
start foreground stopdaemon = notBareRepo $ do
	if stopdaemon
		then liftIO . stopDaemon =<< fromRepo gitAnnexPidFile
		else withStateMVar $ startDaemon (not foreground)
	stop

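{- Usage sketch (assumed command line, for illustration only):
 -
 - > git annex watch              # daemonize and watch the current repository
 - > git annex watch --foreground # run in the foreground instead
 - > git annex watch --stop       # stop a running daemon
 -}
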
startDaemon :: Bool -> MVar Annex.AnnexState -> Annex ()
startDaemon False st = do
	showStart "watch" "."
	liftIO $ watch st
startDaemon True st = do
	logfd <- liftIO . openLog =<< fromRepo gitAnnexLogFile
	pidfile <- fromRepo gitAnnexPidFile
	liftIO $ daemonize logfd (Just pidfile) False $ watch st

watch :: MVar Annex.AnnexState -> IO ()
#if defined linux_HOST_OS
watch st = withINotify $ \i -> do
	changechan <- runChangeChan newTChan
	let hook a = Just $ runHandler st changechan a
	let hooks = WatchHooks
		{ addHook = hook onAdd
		, delHook = hook onDel
		, addSymlinkHook = hook onAddSymlink
		, delDirHook = hook onDelDir
		, errHook = hook onErr
		}
	-- The commit thread is started early, so that the user
	-- can immediately begin adding files and having them
	-- committed, even while the inotify scan is taking place.
	_ <- forkIO $ commitThread st changechan
	-- This does not return until the inotify scan is done.
	-- That can take some time for large trees.
	watchDir i "." (ignored . takeFileName) hooks
	runStateMVar st $ showAction "scanning"
	-- Notice any files that were deleted before inotify
	-- was started.
	runStateMVar st $ do
		inRepo $ Git.Command.run "add" [Param "--update"]
		showAction "started"
	waitForTermination
#else
watch = error "watch mode is so far only available on Linux"
#endif

ignored :: FilePath -> Bool
ignored ".git" = True
ignored ".gitignore" = True
ignored ".gitattributes" = True
ignored _ = False

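{- For example, as composed with takeFileName in watch above (illustrative):
 -
 - > (ignored . takeFileName) "/repo/.gitignore" == True
 - > (ignored . takeFileName) "/repo/foo.txt"    == False
 -}
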
{- Stores the Annex state in a MVar, so that threaded actions can access
 - it.
 -
 - Once the action is finished, retrieves the state from the MVar.
 -}
withStateMVar :: (MVar Annex.AnnexState -> Annex a) -> Annex a
withStateMVar a = do
	state <- Annex.getState id
	mvar <- liftIO $ newMVar state
	r <- a mvar
	newstate <- liftIO $ takeMVar mvar
	Annex.changeState (const newstate)
	return r

{- Runs an Annex action, using the state from the MVar. -}
runStateMVar :: MVar Annex.AnnexState -> Annex a -> IO a
runStateMVar mvar a = do
	liftIO $ putStrLn "takeMVar" -- debug trace
	startstate <- takeMVar mvar
	!(r, newstate) <- Annex.run startstate a
	liftIO $ putStrLn "putMVar" -- debug trace
	putMVar mvar newstate
	return r

runChangeChan :: STM a -> IO a
runChangeChan = atomically

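{- Illustrative example (exampleWorker is a hypothetical name): any IO
 - thread holding the MVar can run Annex actions through it, one at a time:
 -
 - > exampleWorker :: MVar Annex.AnnexState -> IO ()
 - > exampleWorker st = runStateMVar st $ showAction "working"
 -}
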
{- Runs an action handler, inside the Annex monad, and if there was a
 - change, adds it to the ChangeChan.
 -
 - Exceptions are ignored, otherwise a whole watcher thread could be crashed.
 -}
runHandler :: MVar Annex.AnnexState -> ChangeChan -> Handler -> FilePath -> IO ()
runHandler st changechan handler file = void $ do
	r <- tryIO (runStateMVar st $ handler file)
	case r of
		Left e -> putStrLn $ show e
		Right Nothing -> noop
		Right (Just change) -> void $
			runChangeChan $ writeTChan changechan change

{- Handlers call this when they made a change that needs to get committed. -}
madeChange :: FilePath -> String -> Annex (Maybe Change)
madeChange file desc = do
	-- Just in case the commit thread is not flushing the queue fast enough.
	Annex.Queue.flushWhenFull
	liftIO $ Just <$> (Change <$> getCurrentTime <*> pure file <*> pure desc)

noChange :: Annex (Maybe Change)
noChange = return Nothing

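{- A minimal illustrative Handler showing the madeChange/noChange
 - convention (onExample is hypothetical and not wired into any hooks):
 -
 - > onExample :: Handler
 - > onExample file
 - > 	| takeExtension file == ".tmp" = noChange
 - > 	| otherwise = madeChange file "example"
 -}
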
{- Adding a file is tricky; the file has to be replaced with a symlink,
 - but this is race prone, as the symlink could be changed immediately
 - after creation. To avoid that race, git add is not used to stage the
 - symlink.
 -
 - Inotify will notice the new symlink, so this Handler does not stage it
 - or return a Change, leaving that to onAddSymlink.
 -}
onAdd :: Handler
onAdd file = do
	showStart "add" file
	handle =<< Command.Add.ingest file
	noChange
	where
		handle Nothing = showEndFail
		handle (Just key) = do
			Command.Add.link file key True
			showEndOk

{- A symlink might be an arbitrary symlink, which is just added.
 - Or, if it is a git-annex symlink, ensure it points to the content
 - before adding it.
 -}
onAddSymlink :: Handler
onAddSymlink file = go =<< Backend.lookupFile file
	where
		go Nothing = addlink =<< liftIO (readSymbolicLink file)
		go (Just (key, _)) = do
			link <- calcGitLink file key
			ifM ((==) link <$> liftIO (readSymbolicLink file))
				( addlink link
				, do
					liftIO $ removeFile file
					liftIO $ createSymbolicLink link file
					addlink link
				)
		{- This is often called on symlinks that are already staged
		 - correctly, especially during the startup scan. A symlink
		 - may have been deleted and re-added, or added when
		 - the watcher was not running; so it always stages
		 - even symlinks that already exist.
		 -
		 - So for speed, tries to reuse the existing blob for
		 - the symlink target. -}
		addlink link = do
			-- ":file" is git syntax for the version of file
			-- staged in the index.
			v <- catObjectDetails $ Ref $ ":" ++ file
			case v of
				Just (currlink, sha)
					| s2w8 link == L.unpack currlink ->
						stageSymlink file sha
				_ -> do
					sha <- inRepo $
						Git.HashObject.hashObject BlobObject link
					stageSymlink file sha
			madeChange file "link"

onDel :: Handler
onDel file = do
	Annex.Queue.addUpdateIndex =<<
		inRepo (Git.UpdateIndex.unstageFile file)
	madeChange file "rm"

{- A directory has been deleted, or moved, so tell git to remove anything
 - that was inside it from its cache. Since it could reappear at any time,
 - use --cached to only delete it from the index.
 -
 - Note: This could use unstageFile, but would need to run another git
 - command to get the recursive list of files in the directory, so rm is
 - just as good. -}
onDelDir :: Handler
onDelDir dir = do
	Annex.Queue.addCommand "rm"
		[Params "--quiet -r --cached --ignore-unmatch --"] [dir]
	madeChange dir "rmdir"

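{- The queued command is roughly equivalent to running (illustrative):
 -
 - > git rm --quiet -r --cached --ignore-unmatch -- <dir>
 -}
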
{- Called when there's an error with inotify. -}
onErr :: Handler
onErr msg = do
	warning msg
	return Nothing

{- Adds a symlink to the index, without ever accessing the actual symlink
 - on disk. -}
stageSymlink :: FilePath -> Sha -> Annex ()
stageSymlink file sha =
	Annex.Queue.addUpdateIndex =<<
		inRepo (Git.UpdateIndex.stageSymlink file sha)

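{- Conceptually the same as the git plumbing call (illustrative; not what
 - the code literally runs), where 120000 is git's mode for a symlink:
 -
 - > git update-index --cacheinfo 120000 <sha> <file>
 -}
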
{- Gets all unhandled changes.
 - Blocks until at least one change is made. -}
getChanges :: ChangeChan -> IO [Change]
getChanges chan = runChangeChan $ do
	c <- readTChan chan
	go [c]
	where
		go l = do
			v <- tryReadTChan chan
			case v of
				Nothing -> return l
				Just c -> go (c:l)

{- Puts unhandled changes back into the channel.
 - Note: Original order is not preserved. -}
refillChanges :: ChangeChan -> [Change] -> IO ()
refillChanges chan cs = runChangeChan $ mapM_ (writeTChan chan) cs

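{- Typical (illustrative) use, as in commitThread below: drain every
 - pending change, and put them all back if it is not yet time to commit:
 -
 - > exampleDrain :: ChangeChan -> IO ()
 - > exampleDrain chan = do
 - > 	cs <- getChanges chan
 - > 	refillChanges chan cs
 -}
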
{- This thread makes git commits at appropriate times. -}
commitThread :: MVar Annex.AnnexState -> ChangeChan -> IO ()
commitThread st changechan = forever $ do
	-- First, a simple rate limiter.
	threadDelay oneSecond
	-- Next, wait until at least one change has been made.
	cs <- getChanges changechan
	-- Now see if now's a good time to commit.
	time <- getCurrentTime
	if shouldCommit time cs
		then void $ tryIO $ runStateMVar st $ commitStaged
		else refillChanges changechan cs
	where
		oneSecond = 1000000 -- microseconds

commitStaged :: Annex ()
commitStaged = do
	Annex.Queue.flush
	inRepo $ Git.Command.run "commit"
		[ Param "--allow-empty-message"
		, Param "-m", Param ""
		-- Empty commits may be made if tree changes cancel
		-- each other out, etc
		, Param "--allow-empty"
		-- Avoid running the usual git-annex pre-commit hook;
		-- watch does the same symlink fixing, and we don't want
		-- to deal with unlocked files in these commits.
		, Param "--quiet"
		]

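{- Roughly the following git invocation (illustrative):
 -
 - > git commit --allow-empty-message -m '' --allow-empty --quiet
 -}
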
{- Decide if now is a good time to make a commit.
 - Note that the list of change times has an undefined order.
 -
 - Current strategy: If there have been 10 or more changes within the past
 - second, a batch activity is taking place, so wait for later.
 -}
shouldCommit :: UTCTime -> [Change] -> Bool
shouldCommit now changes
	| len == 0 = False
	| len > 4096 = True -- avoid bloating queue too much
	| length (filter thisSecond changes) < 10 = True
	| otherwise = False -- batch activity
	where
		len = length changes
		thisSecond c = now `diffUTCTime` changeTime c <= 1
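
{- Worked examples of the strategy, assuming now :: UTCTime is the current
 - time (illustrative only):
 -
 - > -- One change made two seconds ago: not batch activity, commit promptly.
 - > shouldCommit now [Change (addUTCTime (-2) now) "f" "add"] == True
 - >
 - > -- Twelve changes within the current second: batch activity, wait.
 - > shouldCommit now (replicate 12 (Change now "f" "add"))    == False
 -}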