git annex assistant --autostart will start separate daemons in each
listed autostart repo
running the webapp outside any git-annex repo will open it on the
first listed autostart repo
This prevents multiple runs of the assistant in the foreground, and lets
--stop stop foregrounded runs too.
The webapp firstrun case also now writes a pid file, once it's made the git
repo to put it in.
This is a way to send a notification to a set of clients, any of which can be
blocked waiting for a new notification to arrive. A complication is that any
number of clients may be be dead, and we don't want stale notifications for
those clients to pile up and leak memory.
It took me 3 tries to find the solution, which turns out to be simple: An array
of SampleVars, one per client. Using SampleVars means that clients only see the
most recent notification, but when the notification is just "the assistant's
state changed somehow; display a refreshed rendering of it", that's sufficient.
This avoids forking another process, avoids polling, fixes a race,
and avoids a rare forkProcess thread hang that I saw once time
when starting the webapp.
The webapp is now a constantly updating clock! I accomplished this amazing
feat using "long polling", with some jquery and a little custom java
script.
There are more modern techniques, but this one works everywhere.
Broke hamlet out into standalone files.
I don't like the favicon display; it should be served from /favicon.ico,
but I could only get the static site to serve /static/favicon.ico, so
I had to use a <link rel=icon> to pull it in. I looked at
Yesod.Default.Handlers.getFaviconR, but it doesn't seem to embed
the favicon into the binary?
There's a minor performance overhead to doing this, but this way I don't
have to worry about a situation where statfs might block for a long time.
For example, when it's on a network filesystem.
While this seems to work fine when used in a simple program,
when I load it in ghci, it segfaults about half the time. Don't know why,
and seems ghci specific, but if I get reports of crashes, I'll need to look
into that.
Converted from using c2hs to using hsc2hs, just because other code
in git-annex uses hsc2hs.
Various cleanups.
This code is LGPLed, so I had to include that licence.
Make Utility.Process wrap the parts of System.Process that I use,
and add debug logging to them.
Also wrote some higher-level code that allows running an action
with handles to a processes stdin or stdout (or both), and checking
its exit status, all in a single function call.
As a bonus, the debug logging now indicates whether the process
is being run to read from it, feed it data, chat with it (writing and
reading), or just call it for its side effect.
Test suite now passes with -threaded!
I traced back all the hangs with -threaded to System.Cmd.Utils. It seems
it's just crappy/unsafe/outdated, and should not be used. System.Process
seems to be the cool new thing, so converted all the code to use it
instead.
In the process, --debug stopped printing commands it runs. I may try to
bring that back later.
Note that even SafeSystem was switched to use System.Process. Since that
was a modified version of code from System.Cmd.Utils, it needed to be
converted too. I also got rid of nearly all calls to forkProcess,
and all calls to executeFile, which I'm also doubtful about working
well with -threaded.
This is necessary to generate events when a file is deleted and immediately
replaced. Otherwise, the cache will have the old file, and so no event
would be generated.
Use of getFileStatus is suboptimal, it would be faster to use
the inode returned by readdir -- but getDirectoryContents does not expose
it, so I'd have to copy and modify a lot of low-level code.
that doesn't exist, or cannot be read
The problem is its use of unsafeInterleaveIO, which causes its IO code
to run when the thunk is forced, outside any exception trapping the caller
may do.
Not yet tested and places git-annex-shell is run need to be modified to
pass the new field settings.
Note that rsyncServerSend was changed to fork, rather than directly exec
rsync, because it needs to keep the transfer lock held, and clean up the
transfer log when done.
The reason the DirWatcher had to wait for program termination was because
it used withINotify, so when it finished, its watcher threads were killed.
But since I have two DirWatcher threads now, that was not good, and could
perhaps explain the MVar problem I saw yesterday. In any case, fixed this
part of the code by making the DirWatcher return a handle that can be used
to stop it, and now the main Assistant thread is the only one calling
waitForTermination.
SampleMVar won't work; between getting the current value and changing
it, another thread could made a change, which would get lost.
TMVar works well; this update situation is handled by atomic transactions.
Note that, since this always pushes branch synced/master to the remote, it
assumes that master has already gotten all the commits that are on the
remote merged in. Otherwise, fast-forward prevention may prevent the push.
That's probably ok, because the next stage is to automatically detect
incoming pushes and merge.
This *may* now return Add or Delete Changes as appropriate. All I know
for sure is that it compiles.
I had hoped to avoid maintaining my own state about the content of the
directory tree, and rely on git to check what was changed. But I can't;
I need to know about new and deleted subdirectories to add them to the
watch list, and git doesn't deal with (empty) directories.
So, wrote all the code to scan directories, remember their past contents,
compare with current contents, generate appropriate Change events, and
update bookkeeping info appropriately.
Could not reproduce the build failure I had seen related to this,
but the numbers were wrong with statfs64. Probably pulling from the wrong
place in the structure. statvfs seems to work..
The idea, not yet done, is to use this to detect when a file
has an old change time, and avoid expensive restaging of the file.
If git-annex watch keeps track of the last time it finished a full scan,
then any symlink that is older than that time must have been scanned
before, so need not be added. (Relying on moving, copying, etc of a file
all updating its change time.)
Anyway, this info is available for free since inotify already checks it,
so it might as well make it available.
Now really only done in the startup scan.
It turns out to be quite hard for event handlers to know when the startup
scan is complete. I tried to make addWatch pass that info, but found
threading the state very difficult. For now, a quick hack, using the fast
flag.
Note that it's actually possible for inotify events to come in while the
startup scan is still ongoing. Due to my hack, the expensive check will
be done for files added in such inotify events.
This requires a relatively expensive test at file add time to see if it's
in git already. But it can be optimised to only happen during the startup
scan.
When a new file is annexed, a deletion event occurs when it's moved away
to be replaced by a symlink. Most of the time, there is no problimatic
race, because the same thread runs the add event as the deletion event.
So, once the symlink is in place, the deletion code won't run at all,
due to existing checks that a deleted file is really gone.
But there is a race at startup, as then the inotify thread is running
at the same time as the main thread, which does the initial tree walking
and annexing. It would be possible for the deletion inotify to run
in a perfect race with the addition, and remove the newly added symlink
from the git cache.
To solve this race, added event serialization via a MVar. We putMVar
before running each event, which blocks if an event is already running.
And when an event finishes (or crashes!), we takeMVar to free the lock.
Also, make rm -rf not spew warnings by passing --ignore-unmatch when
deleting directories.
This fixes the scenario where:
* directory foo is moved away (and still watched)
* a new directory foo is made
* file (or directory) foo/bar is created
* the old directory's file (or directory) "bar" is deleted
We don't want a deletion event for foo/bar in this case.
There are two reasons for this test. First, there could be a fifo or other
non-regular file that was closed.
Second, this test avoids ugliness when a subdirectory is moved out of the
top of the watch directory to elsewhere, and a file added to it.
Since the subdirectory has moved, the file won't be present under the
old location, and nothing will be done.
I cannot find a way to stop watching such directories, at least not without
a lot of pain. The inotify interface in Haskell doesn't make it easy to
stop watching a given subdirectory when it's moved out; it would require
keeping a map of all watch handles that is shared between threads.
This workaround avoids the problem in most cases; the only remaining case
being deletion of a file from a moved subdirectory.
Improved the inotify code, so it will also notice directory removal
and symlink creation.
In the watch code, optimised away a stat of a file that's being added,
that's done by Command.Add.start. This is the reason symlink creation is
handled separately from file creation, since during initial tree walk
at startup, a stat was already done, and can be reused.
Baked into the code was an assumption that a repository's git directory
could be determined by adding ".git" to its work tree (or nothing for bare
repos). That fails when core.worktree, or GIT_DIR and GIT_WORK_TREE are
used to separate the two.
This was attacked at the type level, by storing the gitdir and worktree
separately, so Nothing for the worktree means a bare repo.
A complication arose because we don't learn where a repository is bare
until its configuration is read. So another Location type handles
repositories that have not had their config read yet. I am not entirely
happy with this being a Location type, rather than representing them
entirely separate from the Git type. The new code is not worse than the
old, but better types could enforce more safety.
Added support for core.worktree. Overriding it with -c isn't supported
because it's not really clear what to do if a git repo's config is read, is
not bare, and is then overridden to bare. What is the right git directory
in this case? I will worry about this if/when someone has a use case for
overriding core.worktree with -c. (See Git.Config.updateLocation)
Also removed and renamed some functions like gitDir and workTree that
misused git's terminology.
One minor regression is known: git annex add in a bare repository does not
print a nice error message, but runs git ls-files in a way that fails
earlier with a less nice error message. This is because before --work-tree
was always passed to git commands, even in a bare repo, while now it's not.
Amoung other things, this makes unlocking a WORM backed file and then
re-adding it without making any changes not add a new object, as the
timestamp is preserved.
This option avoids gpg key distribution, at the expense of flexability, and
with the requirement that all clones of the git repository be equally
trusted.
This is incomplete, it does not honor it yet for hash directories
and other annex bookkeeping files. Some of that is not needed for a bare
repo; some of it may be.
Not only an optimisation. fsck always tried to preventWrite to make sure
file modes are good, and in a shared repo, that will fail on directories
not owned by the current user. Although there may be other problems with
such a setup.
When preparing the debian stable backport, I am seeing a call to statvfs()
succeed, but also set errno to 2 (ENOENT). Not sure why this happens;
I am in a schroot when it does happen, or perhaps stable's libc is a little
broken and sets errno incorrectly. Anyway, it should be perfectly fine to
clear errno after the successful call, rather than before it.
Recursive inotify has beaten me before, with its bad design and races,
but not this time! (I think.) This is able to follow the strongest
filesystem traffic I can throw at it, and robustly notices every file
add and delete. Mostly that's down to Haskell having a quite nice threaded
inotify library (that does its own buffering). A key insight was realizing
that the inotify directory add race could be dealt with by scanning for
files inside newly added directories.
TODO: Add support for freebsd/osx kqueue; see
http://hackage.haskell.org/package/kqueue
Can a git-annex-monitor be far off?