Commit graph

50 commits

Author SHA1 Message Date
Joey Hess
60da0d6ad2 full autostart support
git annex assistant --autostart will start separate daemons in each
listed autostart repo

running the webapp outside any git-annex repo will open it on the
first listed autostart repo
2012-08-02 00:42:33 -04:00
Joey Hess
02ec8ea012 much better webapp startup of the assistant
This avoids forking another process, avoids polling, fixes a race,
and avoids a rare forkProcess thread hang that I saw once time
when starting the webapp.
2012-07-27 15:33:24 -04:00
Joey Hess
3ee44cf8fe add assistant command
like watch, but more magic
2012-06-22 13:04:03 -04:00
Joey Hess
ccc5005245 reorganize 2012-06-13 12:46:39 -04:00
Joey Hess
c31ddeda84 optimise link staging at startup
Now it starts really, really fast! Down from 15 minutes or so on my big
tree to around 1 minute.

The trick is to remember the last time the daemon was running. Links with a
ctime from before that point don't need to be restaged on startup (as long
as they are correct), since the old daemon would have handled them already.

We also assume that if the daemon has never run before, any links that
already exist are good. The pre-commit hook fixes links, so this should be
a safe assumption.

Adds another MVar holding a DaemonStatus data structure. Also
allowed getting rid of the Annex.Fast hack. This data structure will
probably grow a lot of details about the daemon's status, that will
later be used by the webapp's UI.

The code to actually track when the daemon was last running is not written
yet. It's 3 am.
2012-06-13 02:56:16 -04:00
Joey Hess
12dbb9d1d0 plumb file status through to event handlers
The idea, not yet done, is to use this to detect when a file
has an old change time, and avoid expensive restaging of the file.

If git-annex watch keeps track of the last time it finished a full scan,
then any symlink that is older than that time must have been scanned
before, so need not be added. (Relying on moving, copying, etc of a file
all updating its change time.)

Anyway, this info is available for free since inotify already checks it,
so it might as well make it available.
2012-06-13 01:20:37 -04:00
Joey Hess
ab076b2e81 move comment 2012-06-13 00:57:48 -04:00
Joey Hess
7d458c40db tweak 2012-06-12 19:36:11 -04:00
Joey Hess
cb2255e93a do fewer commits during long batch jobs
10 thousand queue size does not use appreciable memory in my testing.
2012-06-12 16:25:56 -04:00
Joey Hess
b240418acc better optimisation of add check
Now really only done in the startup scan.

It turns out to be quite hard for event handlers to know when the startup
scan is complete. I tried to make addWatch pass that info, but found
threading the state very difficult. For now, a quick hack, using the fast
flag.

Note that it's actually possible for inotify events to come in while the
startup scan is still ongoing. Due to my hack, the expensive check will
be done for files added in such inotify events.
2012-06-12 16:24:06 -04:00
Joey Hess
7d2c813396 fix bug that turned files already in git into symlinks
This requires a relatively expensive test at file add time to see if it's
in git already. But it can be optimised to only happen during the startup
scan.
2012-06-12 15:57:24 -04:00
Joey Hess
535d9e4998 add a flag indicating if an event was synthesized during initial dir scan 2012-06-12 14:34:09 -04:00
Joey Hess
d3b9b32f21 cleanup 2012-06-12 13:54:00 -04:00
Joey Hess
942d8f7298 hlint 2012-06-12 11:32:06 -04:00
Joey Hess
d3a6f04abf update 2012-06-11 15:41:26 -04:00
Joey Hess
7f3934520a avoid using STM while the MVar is held
I thought this might be a lock conflict that explains the deadlock when
built with -threaded, but it seems not.. it still locks! It even locks
without the committer thread.

Indeed, it locks when running "git annex add"! -threaded is exposing some
other problem.

Still, this seems conceptually cleaner and did not add any inneficiencies.
Also added some high-level documentation about the threads used.
2012-06-11 15:29:11 -04:00
Joey Hess
f7dbcd58ff tweak 2012-06-11 14:24:13 -04:00
Joey Hess
d0a0a6ae21 git annex watch --stop 2012-06-11 02:01:20 -04:00
Joey Hess
0b3e2bed78 add a pid file
Writes pid to a file. Is supposed to take an exclusive lock, but that's not
working, and it's too late for me to understand why.
2012-06-11 01:20:19 -04:00
Joey Hess
d5884388b0 daemonize git annex watch 2012-06-11 00:39:09 -04:00
Joey Hess
ca9ee21bd7 crazy optimisation
Crazy like a fox..
2012-06-10 19:58:34 -04:00
Joey Hess
c1b432ee54 run git add --update after inotify is started
This way, there's no window where deleted files won't be noticed.
2012-06-10 19:10:18 -04:00
Joey Hess
aae0ba1995 fixed the double commits problem 2012-06-10 18:41:05 -04:00
Joey Hess
fc0dd79774 avoid running pre-commit hook from watch commits 2012-06-10 17:53:17 -04:00
Joey Hess
cda6c4dff5 tweak 2012-06-10 17:40:35 -04:00
Joey Hess
2de50f733a smart commit thread
The commit thread now has access to a channel containing the times of
all uncommitted changes. This lets it be smart about detecting busy times
when a batch job is running (such as rm -rf, or untarring something, etc),
and avoid committing until it's done. While at the same time, instantly
committing one-off changes that the user is going to expect to see
immediately.

I had to use STM to implement the channel, because of
http://hackage.haskell.org/trac/ghc/ticket/4154
While this adds a dependency, I always wanted to use STM, so this actually
makes me happy. ;)

Also happy that shouldCommit is a pure function, so other commit smartness
strategies can easily be played with. Although the current one seems pretty
good.

There is one bug, for some reason it does double commits, every time.
2012-06-10 16:07:48 -04:00
Joey Hess
6e54907e35 add a thread to commit changes
Currently the stupidest possible version, just wakes up every second,
and may make empty commits sometimes.
2012-06-10 13:56:39 -04:00
Joey Hess
e5f855b7f8 generalize and improve state MVar code 2012-06-10 13:23:10 -04:00
Joey Hess
5308b51ec0 stage deletions directly using update-index
no need to run git-rm separately
2012-06-10 13:05:58 -04:00
Joey Hess
7f823b56af fix non-linux build 2012-06-09 14:06:56 -04:00
Joey Hess
d45a9a7831 refactor and function name cleanup
(oops, I had a calcMerge and a calc_merge!)
2012-06-08 00:29:39 -04:00
Joey Hess
7d78cbf97c use git queue for rm too 2012-06-07 21:17:10 -04:00
Joey Hess
20f425be19 make watch use the queue
May not work. Certianly needs to flush the queue from time to time
when only symlink changes are being made.
2012-06-07 15:40:44 -04:00
Joey Hess
d5de27ff40 tweak 2012-06-06 23:30:38 -04:00
Joey Hess
b8ae9528ab refactor 2012-06-06 23:20:09 -04:00
Joey Hess
b8f85f7a82 build watch on non-linux, just don't do anything 2012-06-06 22:49:32 -04:00
Joey Hess
c5b11561f0 handle running out of watch descriptors 2012-06-06 16:50:28 -04:00
Joey Hess
db8effb8f3 ignore .gitignore and .gitattributes 2012-06-06 15:50:12 -04:00
Joey Hess
b819f644ad close the git add race
There's a race adding a new file to the annex: The file is moved to the
annex and replaced with a symlink, and then we git add the symlink. If
someone comes along in the meantime and replaces the symlink with
something else, such as a new large file, we add that instead. Which could
be bad..

This race is fixed by avoiding using git add, instead the symlink is
directly staged into the index.

It would be nice to make `git annex add` use this same technique.
I have not done so yet because it currently runs git update-index once per
file, which would slow does `git annex add`. A future enhancement would be
to extend the Git.Queue to include the ability to run update-index with
a list of Streamers.
2012-06-06 14:29:10 -04:00
Joey Hess
cbdaccd44a run event handlers all in the same Annex monad
Uses a MVar again, as there seems no other way to thread the state through
inotify events.

This is a rather unsatisfactory result. I had wanted to run them in
the same monad so that the git queue could be used to coleasce git commands
and speed things up. But, that led to fragility: If several files are
added, and one is removed before queue flush, git add will fail to add
any of them. So, the queue is still explicitly flushed after each add for
now.

TODO: Investigate using git add --ignore-errors. This would need to be done
in Command.Add. And, git add still exits nonzero with it, so would need
to avoid crashing on queue flush.
2012-06-04 21:21:52 -04:00
Joey Hess
48efa2d2d3 avoid explicit queue flush
The queue is still flushed on add, because each add event is handled by a
separate Annex monad. That needs to be fixed to speed up add a lot.
2012-06-04 20:44:15 -04:00
Joey Hess
bd7857d903 ignore-unmatch when removing a staged file
When a file is added, and then deleted before the add action runs,
the delete event was unhappy that the file never did get staged.
2012-06-04 20:13:25 -04:00
Joey Hess
cbf16f1967 refactor 2012-06-04 19:43:29 -04:00
Joey Hess
ec98581112 notice deleted files on startup 2012-06-04 18:14:42 -04:00
Joey Hess
5b4e5ce7e5 deletion
When a new file is annexed, a deletion event occurs when it's moved away
to be replaced by a symlink. Most of the time, there is no problimatic
race, because the same thread runs the add event as the deletion event.
So, once the symlink is in place, the deletion code won't run at all,
due to existing checks that a deleted file is really gone.

But there is a race at startup, as then the inotify thread is running
at the same time as the main thread, which does the initial tree walking
and annexing. It would be possible for the deletion inotify to run
in a perfect race with the addition, and remove the newly added symlink
from the git cache.

To solve this race, added event serialization via a MVar. We putMVar
before running each event, which blocks if an event is already running.
And when an event finishes (or crashes!), we takeMVar to free the lock.

Also, make rm -rf not spew warnings by passing --ignore-unmatch when
deleting directories.
2012-06-04 18:09:18 -04:00
Joey Hess
659e6b1324 suppress "recording state in git" message during add 2012-06-04 17:18:54 -04:00
Joey Hess
677ad74687 add handling of symlink addition events
And just like that, annexed files can be moved and copies around within
the tree, and are automatically fixed to point to the content, and staged
in git. Huzzah!

Delete still remains TODO, with its troublesome race during add..
2012-06-04 15:10:43 -04:00
Joey Hess
7053f5f947 handle directory deletion
When a directory is deleted, or moved away, git rm -r it to stage
the deletion.
2012-06-04 13:30:30 -04:00
Joey Hess
23dbff4b43 add events for symlink creation and directory removal
Improved the inotify code, so it will also notice directory removal
and symlink creation.

In the watch code, optimised away a stat of a file that's being added,
that's done by Command.Add.start. This is the reason symlink creation is
handled separately from file creation, since during initial tree walk
at startup, a stat was already done, and can be reused.
2012-06-04 13:22:56 -04:00
Joey Hess
d5ffd2d99d watch subcommand
So far this only handles auto-annexing new files that are created inside
the repository while it's running. To make this really useful,
it needs to at least:

- notice deleted files and stage the deletion
  (tricky; there's a race with add..)
- notice renamed files, auto-fix the symlink, and stage the new file location
- periodically auto-commit staged changes
- honor .gitignore, not adding files it excludes

Also nice to have would be:

- Somehow sync remotes, possibly using a push sync like dvcs-autosync
  does, so they are immediately updated.
- Somehow get content that is unavilable. This is problimatic with inotify,
  since we only get an event once the user has tried (and failed) to read
  from the file. Perhaps instead, automatically copy content that is added
  out to remotes, with the goal of all repos eventually getting a copy,
  if df allows.
- Drop files that have not been used lately, or meet some other criteria
  (as long as there's a copy elsewhere).
- Perhaps automatically dropunused files that have been deleted,
  although I cannot see a way to do that, since by the time the inotify
  deletion event arrives, the file is deleted, and we cannot see what
  its symlink pointed to! Alternatievely, perhaps automatically
  do an expensive unused/dropunused cleanup process.

Some of this probably needs the currently stateless threads to maintain
a common state.
2012-04-12 17:42:05 -04:00