The idea, not yet done, is to use this to detect when a file
has an old change time, and avoid expensive restaging of the file.
If git-annex watch keeps track of the last time it finished a full scan,
then any symlink that is older than that time must have been scanned
before, so need not be added. (Relying on moving, copying, etc of a file
all updating its change time.)
Anyway, this info is available for free since inotify already checks it,
so it might as well make it available.
Now really only done in the startup scan.
It turns out to be quite hard for event handlers to know when the startup
scan is complete. I tried to make addWatch pass that info, but found
threading the state very difficult. For now, a quick hack, using the fast
flag.
Note that it's actually possible for inotify events to come in while the
startup scan is still ongoing. Due to my hack, the expensive check will
be done for files added in such inotify events.
This requires a relatively expensive test at file add time to see if it's
in git already. But it can be optimised to only happen during the startup
scan.
When a new file is annexed, a deletion event occurs when it's moved away
to be replaced by a symlink. Most of the time, there is no problimatic
race, because the same thread runs the add event as the deletion event.
So, once the symlink is in place, the deletion code won't run at all,
due to existing checks that a deleted file is really gone.
But there is a race at startup, as then the inotify thread is running
at the same time as the main thread, which does the initial tree walking
and annexing. It would be possible for the deletion inotify to run
in a perfect race with the addition, and remove the newly added symlink
from the git cache.
To solve this race, added event serialization via a MVar. We putMVar
before running each event, which blocks if an event is already running.
And when an event finishes (or crashes!), we takeMVar to free the lock.
Also, make rm -rf not spew warnings by passing --ignore-unmatch when
deleting directories.
This fixes the scenario where:
* directory foo is moved away (and still watched)
* a new directory foo is made
* file (or directory) foo/bar is created
* the old directory's file (or directory) "bar" is deleted
We don't want a deletion event for foo/bar in this case.
There are two reasons for this test. First, there could be a fifo or other
non-regular file that was closed.
Second, this test avoids ugliness when a subdirectory is moved out of the
top of the watch directory to elsewhere, and a file added to it.
Since the subdirectory has moved, the file won't be present under the
old location, and nothing will be done.
I cannot find a way to stop watching such directories, at least not without
a lot of pain. The inotify interface in Haskell doesn't make it easy to
stop watching a given subdirectory when it's moved out; it would require
keeping a map of all watch handles that is shared between threads.
This workaround avoids the problem in most cases; the only remaining case
being deletion of a file from a moved subdirectory.
Improved the inotify code, so it will also notice directory removal
and symlink creation.
In the watch code, optimised away a stat of a file that's being added,
that's done by Command.Add.start. This is the reason symlink creation is
handled separately from file creation, since during initial tree walk
at startup, a stat was already done, and can be reused.
Recursive inotify has beaten me before, with its bad design and races,
but not this time! (I think.) This is able to follow the strongest
filesystem traffic I can throw at it, and robustly notices every file
add and delete. Mostly that's down to Haskell having a quite nice threaded
inotify library (that does its own buffering). A key insight was realizing
that the inotify directory add race could be dealt with by scanning for
files inside newly added directories.
TODO: Add support for freebsd/osx kqueue; see
http://hackage.haskell.org/package/kqueue
Can a git-annex-monitor be far off?