Defer adding files to the annex until commit time, when during a batch
operation, a bundle of files will be available. This will allow for
checking a them all with a single lsof call.
The tricky part is that adding the file causes a symlink change inotify.
So I made it wait for an appropriate number of symlink changes to be
received before continuing with the commit. This avoids any delay
in the commit process. It is possible that some unrelated symlink change is
made; if that happens it'll commit it and delay committing the newly added
symlink for 1 second. This seems ok. I do rely on the expected symlink
change event always being received, but only when the add succeeds.
Another way to do it might be to directly stage the symlink, and then
ignore the redundant symlink change event. That would involve some
redundant work, and perhaps an empty commit, but if this code turns
out to have some bug, that'd be the best way to avoid it.
FWIW, this change seems to, as a bonus, have produced better grouping
of batch changes into single commits. Before, a large batch change would
result in a series of commits, with the first containing only one file,
and each of the rest bundling a number of files. Now, the added wait for
the symlink changes to arrive gives time for additional add changes to
be processed, all within the same commit.
A few places catch IO errors after calling runThreadState,
but since the MVar was not restored, it'd later deadlock trying to read
from it.
I'd like to catch all exceptions here, but I could not get the types
to unify.
Now it starts really, really fast! Down from 15 minutes or so on my big
tree to around 1 minute.
The trick is to remember the last time the daemon was running. Links with a
ctime from before that point don't need to be restaged on startup (as long
as they are correct), since the old daemon would have handled them already.
We also assume that if the daemon has never run before, any links that
already exist are good. The pre-commit hook fixes links, so this should be
a safe assumption.
Adds another MVar holding a DaemonStatus data structure. Also
allowed getting rid of the Annex.Fast hack. This data structure will
probably grow a lot of details about the daemon's status, that will
later be used by the webapp's UI.
The code to actually track when the daemon was last running is not written
yet. It's 3 am.
The idea, not yet done, is to use this to detect when a file
has an old change time, and avoid expensive restaging of the file.
If git-annex watch keeps track of the last time it finished a full scan,
then any symlink that is older than that time must have been scanned
before, so need not be added. (Relying on moving, copying, etc of a file
all updating its change time.)
Anyway, this info is available for free since inotify already checks it,
so it might as well make it available.