This fixes the issue mentioned in the last commit.
Turns out just collecting UUID of clients behind a XMPP remote is
insufficient (although I should probably still do it for other reasons),
because a single remote repo might be connected via both XMPP and local
pairing. So a way is needed to know when a push was received from any
client using a given XMPP remote over XMPP, as opposed to via ssh.
This cannot completely guard against a runaway log event, and only runs
every hour anyway, but it should avoid most problems with very
long-running, active assistants using up too much space.
The only thing lost is ./ghci
Speed: make fast used to take 20 seconds here, when rebuilding from
touching Command/Unused.hs. With cabal, it's 29 seconds.
In general, git-annex does not try to preserve file permissions. For
example, they don't round trip through special remotes. So it's ok to not
preserve them for git remotes either.
On crippled filesystems, rsync has been observed failing after the file
was transferred because it couldn't set some permission or other.
This is so gratutious and pointless. It's a shame that everything we
learned about Unix portability and the importance of standards has been
thrown out the window by these guys.
Various things that don't work on Android are just ifdefed out.
* the webapp (needs template haskell for arm)
* --include and --exclude globbing (needs libpcre, which is not ported;
probably I'll make it use the pure haskell glob library instead)
* annex.diskreserve checking (missing sys/statvfs.h)
* timestamp preservation support (yawn)
* S3
* WebDAV
* XMPP
The resulting 17mb binary has been tested on Android, and it is able to,
at least, print its usage message.
It used to not log to daemon.log when a repository was first created, and
when starting the webapp. Now both do. Redirecting stdout and stderr to the
log is tricky when starting the webapp, because the web browser may want to
communicate with the user. (Either a console web browser, or web.browser = echo)
This is handled by restoring the original fds when running the browser.
since some systems may have configuration problems or other issues that
prevent web browsers from connecting to the right localhost IP for the
webapp.
Tested on both ipv4 and ipv6 localhost. Url for the latter looks like:
http://[::1]:50676
This allows it to use Build.SysConfig to always install the programs
configure detected. Amoung other fixes, this ensures the right uuid
generator and checksum programs are installed.
I also cleaned up the handling of lsof's path; configure now checks for
it in PATH, but falls back to looking for it in sbin directories.
* get/copy --auto: Transfer data even if it would exceed numcopies,
when preferred content settings want it.
* drop --auto: Fix dropping content when there are no preferred content
settings.
Both the directory and webdav special remotes used to have to buffer
the whole file contents before it could be decrypted, as they read
from chunks. Now the chunks are streamed through gpg with no buffering.
This adds a dep on haskell's async library, but since that's been
added to the recent haskell platform release, it should not be
much hardship to my poor long-suffering library chasing users.
Inject the required git-remote-xmpp into PATH when running xmpp git push.
Rest of the time it will not be in PATH, and git won't be able to talk to
xmpp remotes.
MountWatcher can't do this, because it uses the session dbus,
and won't have access to the new DBUS_SESSION_BUS_ADDRESS if a new session
is started.
Bumped dbus library version, FD leak in it is fixed.
Currently relies on SRV being set, or the JID's hostname being the server
hostname and the port being default. Future work: Allow manual
configuration of user name, hostname, and port.
Now when the dbus connection is dropped, it'll fall back to polling.
I could make it try to reconnect, but there's a FD leak in the dbus
library, so not yet.
I had been using -ignore-package monads-tf to deal with this, but
the XMPP library uses monads-tf, so that also ignores it. Instead,
use PackageImports to force use of mtl in my own code.
When rsyncProgress pipes rsync's stdout, this turns out to cause a ssh
process started by rsync to be left behind as a zombie. I don't know why,
but my recent zombie reaping cleanup was correct, it's just that this other
zombie, that's not directly started by git-annex, was no longer reaped
due to changes in the cleanup. Make rsyncProgress reap the zombie started
by rsync, as a workaround.
FWIW, the process tree looks like this. It seems like the rsync child
is for some reason starting but not waiting on this extra ssh process.
Ssh connection caching may be involved -- disabling it seemed to change
the shape of the tree, but did not eliminate the zombie.
9378 pts/14 S+ 0:00 | \_ rsync -p --progress --inplace -4 -e 'ssh' '-S' ...
9379 pts/14 S+ 0:00 | | \_ ssh ...
9380 pts/14 S+ 0:00 | | \_ rsync -p --progress --inplace -4 -e 'ssh' '-S' ...
9381 pts/14 Z+ 0:00 | \_ [ssh] <defunct>
Nearly everything that's reading from git is operating on a small
amount of output and has been switched to use that. Only pipeNullSplit
stuff continues using the lazy version that yields zombies.
This includes a full parser for the boolean expressions in the log,
that compiles them into Matchers. Those matchers are not used yet.
A complication is that matching against an expression should never
crash git-annex with an error. Instead, vicfg checks that the expressions
parse. If a bad expression (or an expression understood by some future
git-annex version) gets into the log, it'll be ignored.
Most of the code in Limit couldn't fail anyway, but I did have to make
limitCopies check its parameter first, and return an error if it's bad,
rather than erroring at runtime.
This seems to fix a problem I've recently seen where ctrl-c during rsync
leads to `git annex get` moving on to the next thing rather than exiting.
Seems likely that started happening with the switch to System.Process
(d1da9cf221), as the old code took care
to install a default SIGINT handler.
Note that since the bug was only occurring sometimes, I am not 100% sure
I've squashed it, although I seem to have.
This is a workaround for bind failing with EINVAL sometimes on OSX.
I don't know why; EVINAL should mean the socket is already bound to an
address, but this is with a new socket.
Simplified it using existing functions.
I doubt setSticky needs to return the FileMode; if it does for some
reason, it can be changed to use modifyFileMode'
Converted isSticky to a pure function for consistency with isSymlink.
Note that the sticky bit of a file can be tested thus:
isSticky . fileMode <$> getFileStatus file
This is handled differently for inotify, which can track modifications of
existing files, and kqueue, which cannot (TTBOMK). On the inotify side,
the TransferWatcher just waits for the file to be updated and reads the new
bytesComplete. On the kqueue side, the TransferPoller has to re-read the
file every update (currently 0.5 seconds, might need to increase that).
I did think about working around kqueue's limitations by somehow creating
a new file each time the size changed. But cleaning up all the files that
would result seemed difficult. And really, this is not a lot worse than
the TransferWatcher's behavior for downloads, which stats a file every 0.5
seconds. As long as the OS has decent file caching behavior..
Current implementation parses rsync's output a character a time, which
is hardly efficient. It could be sped up a lot by using hGetBufSome,
but that would require going really lowlevel, down to raw C style buffers
(good example of that here: http://users.aber.ac.uk/afc/stricthaskell.html)
But rsync doesn't output very much, so currently it seems ok.
The webapp can only run on one of ipv4 and ipv6, no both. Some web browsers
may not support ipv6, so ipv4 is the safe choice.
The actual problem I ran into with it only listening to ipv6 was that
Utility.Url.exists was failing to connect to it. I doubt that haskell's
HTTP library is ipv4 only. More likely, it was only trying one address,
and tried ipv4 first.
Actually 3 forms in one, this handles the initial passphrase entry, and the
confirmation, and also varys wording if the same user or a different user
is confirming.
To avoid conflict with different liftIO from MonadIO (in some version of
yesod not the one I have here), and because it's generally clearer, since
this module has both Wai and Yesod stuff, to qualify them both.
This deals with interruptions in network connectevity, by listening
for a new network interface coming up (using dbus to see when
network-manager or wicd do it), and forcing a rescan of
Old 1.0.1 version is still supported as well. Cabal autodetects
which version is available, but in the Makefile, WITH_OLD_YESOD
has to be configured appropriately.
I have not squashed all the $newline warnings with the new Yesod.
They should go away eventually anyway as Yesod moves past that transition.
git annex assistant --autostart will start separate daemons in each
listed autostart repo
running the webapp outside any git-annex repo will open it on the
first listed autostart repo
This prevents multiple runs of the assistant in the foreground, and lets
--stop stop foregrounded runs too.
The webapp firstrun case also now writes a pid file, once it's made the git
repo to put it in.
This is a way to send a notification to a set of clients, any of which can be
blocked waiting for a new notification to arrive. A complication is that any
number of clients may be be dead, and we don't want stale notifications for
those clients to pile up and leak memory.
It took me 3 tries to find the solution, which turns out to be simple: An array
of SampleVars, one per client. Using SampleVars means that clients only see the
most recent notification, but when the notification is just "the assistant's
state changed somehow; display a refreshed rendering of it", that's sufficient.
This avoids forking another process, avoids polling, fixes a race,
and avoids a rare forkProcess thread hang that I saw once time
when starting the webapp.
The webapp is now a constantly updating clock! I accomplished this amazing
feat using "long polling", with some jquery and a little custom java
script.
There are more modern techniques, but this one works everywhere.
Broke hamlet out into standalone files.
I don't like the favicon display; it should be served from /favicon.ico,
but I could only get the static site to serve /static/favicon.ico, so
I had to use a <link rel=icon> to pull it in. I looked at
Yesod.Default.Handlers.getFaviconR, but it doesn't seem to embed
the favicon into the binary?
There's a minor performance overhead to doing this, but this way I don't
have to worry about a situation where statfs might block for a long time.
For example, when it's on a network filesystem.
While this seems to work fine when used in a simple program,
when I load it in ghci, it segfaults about half the time. Don't know why,
and seems ghci specific, but if I get reports of crashes, I'll need to look
into that.
Converted from using c2hs to using hsc2hs, just because other code
in git-annex uses hsc2hs.
Various cleanups.
This code is LGPLed, so I had to include that licence.
Make Utility.Process wrap the parts of System.Process that I use,
and add debug logging to them.
Also wrote some higher-level code that allows running an action
with handles to a processes stdin or stdout (or both), and checking
its exit status, all in a single function call.
As a bonus, the debug logging now indicates whether the process
is being run to read from it, feed it data, chat with it (writing and
reading), or just call it for its side effect.
Test suite now passes with -threaded!
I traced back all the hangs with -threaded to System.Cmd.Utils. It seems
it's just crappy/unsafe/outdated, and should not be used. System.Process
seems to be the cool new thing, so converted all the code to use it
instead.
In the process, --debug stopped printing commands it runs. I may try to
bring that back later.
Note that even SafeSystem was switched to use System.Process. Since that
was a modified version of code from System.Cmd.Utils, it needed to be
converted too. I also got rid of nearly all calls to forkProcess,
and all calls to executeFile, which I'm also doubtful about working
well with -threaded.
This is necessary to generate events when a file is deleted and immediately
replaced. Otherwise, the cache will have the old file, and so no event
would be generated.
Use of getFileStatus is suboptimal, it would be faster to use
the inode returned by readdir -- but getDirectoryContents does not expose
it, so I'd have to copy and modify a lot of low-level code.
that doesn't exist, or cannot be read
The problem is its use of unsafeInterleaveIO, which causes its IO code
to run when the thunk is forced, outside any exception trapping the caller
may do.
Not yet tested and places git-annex-shell is run need to be modified to
pass the new field settings.
Note that rsyncServerSend was changed to fork, rather than directly exec
rsync, because it needs to keep the transfer lock held, and clean up the
transfer log when done.
The reason the DirWatcher had to wait for program termination was because
it used withINotify, so when it finished, its watcher threads were killed.
But since I have two DirWatcher threads now, that was not good, and could
perhaps explain the MVar problem I saw yesterday. In any case, fixed this
part of the code by making the DirWatcher return a handle that can be used
to stop it, and now the main Assistant thread is the only one calling
waitForTermination.
SampleMVar won't work; between getting the current value and changing
it, another thread could made a change, which would get lost.
TMVar works well; this update situation is handled by atomic transactions.
Note that, since this always pushes branch synced/master to the remote, it
assumes that master has already gotten all the commits that are on the
remote merged in. Otherwise, fast-forward prevention may prevent the push.
That's probably ok, because the next stage is to automatically detect
incoming pushes and merge.
This *may* now return Add or Delete Changes as appropriate. All I know
for sure is that it compiles.
I had hoped to avoid maintaining my own state about the content of the
directory tree, and rely on git to check what was changed. But I can't;
I need to know about new and deleted subdirectories to add them to the
watch list, and git doesn't deal with (empty) directories.
So, wrote all the code to scan directories, remember their past contents,
compare with current contents, generate appropriate Change events, and
update bookkeeping info appropriately.
Could not reproduce the build failure I had seen related to this,
but the numbers were wrong with statfs64. Probably pulling from the wrong
place in the structure. statvfs seems to work..
The idea, not yet done, is to use this to detect when a file
has an old change time, and avoid expensive restaging of the file.
If git-annex watch keeps track of the last time it finished a full scan,
then any symlink that is older than that time must have been scanned
before, so need not be added. (Relying on moving, copying, etc of a file
all updating its change time.)
Anyway, this info is available for free since inotify already checks it,
so it might as well make it available.
Now really only done in the startup scan.
It turns out to be quite hard for event handlers to know when the startup
scan is complete. I tried to make addWatch pass that info, but found
threading the state very difficult. For now, a quick hack, using the fast
flag.
Note that it's actually possible for inotify events to come in while the
startup scan is still ongoing. Due to my hack, the expensive check will
be done for files added in such inotify events.
This requires a relatively expensive test at file add time to see if it's
in git already. But it can be optimised to only happen during the startup
scan.