This means that anyone serving up the webapp to users as a service
(ie, without providing any git-annex binary at all to the user) still needs
to provide a link to the source code for it, including any modifications
they may make.
This may make git-annex be covered by the AGPL as a whole when it is built
with the webapp. If in doubt, you should ask a lawyer.
When git-annex is built with the webapp disabled, no AGPLed code is used.
Even building in the assistant does not pull in AGPLed code.
This is handled differently for inotify, which can track modifications of
existing files, and kqueue, which cannot (TTBOMK). On the inotify side,
the TransferWatcher just waits for the file to be updated and reads the new
bytesComplete. On the kqueue side, the TransferPoller has to re-read the
file every update (currently 0.5 seconds, might need to increase that).
I did think about working around kqueue's limitations by somehow creating
a new file each time the size changed. But cleaning up all the files that
would result seemed difficult. And really, this is not a lot worse than
the TransferWatcher's behavior for downloads, which stats a file every 0.5
seconds. As long as the OS has decent file caching behavior..
cp is used here, but we can just watch the size of the destination file
This commit made from within the ruins of an old mill, overlooking a
beautiful waterfall.
This doesn't avoid it sometimes attempting to commit when there are no
changes. Typically that happens when a change is pushed in from another
repo; the watcher sees the file and tries to stage it, resulting in an
empty commit. Really fixing that would probably use more CPU than
occasionally trying to make an empty commit.
However, this does save a lot of unnecessary work, as those empty commits
had to be synced out, which no longer happens.
This ensures file propigate takes place in situations such as: Usb drive A
is connected to B. A's master branch is already in sync with B, but it is
being used to sneakernet some files around, so B downloads those. There is no
master branch change, so C does not request these files. B needs to upload
the files it just downloaded on to C, etc.
My first try at this, I saw loops happen. B uploaded to C, which then
tried to upload back to B (because it had not received the updated
git-annex branch from B yet). B already had the file, but it still created
a transfer info file from the incoming transfer, and its watcher saw
that be removed, and tried to upload back to C.
These loops should have been fixed by my previous commit. (They never
affected ssh remotes, only local ones, it seemed.) While C might still try
to upload to B, or to some other remote that already has the file, the
extra work dies out there.
I was seeing some interesting crashes after the previous commit,
when making file changes slightly faster than the assistant could keep up.
error: Ref refs/heads/master is at 7074f8e0a11110c532d06746e334f2fec6af6ab4 but expected 95ea86008d72a40d97a81cfc8fb47a0da92166bd
fatal: cannot lock HEAD ref
Committer crashed: git commit [Param "--allow-empty-message",Param "-m",Param "",Param "--allow-empty",Param "--quiet"] failed
Pusher crashed: thread blocked indefinitely in an STM transaction
Clearly the the merger ended up running at the same time as the committer,
and with both modifying HEAD the committer crashed. I fixed that by
making the Merger run its merge inside the annex monad, which avoids
it running concurrently with other git operations. Also by making
the committer not crash if git fails.
What I don't understand is why the pusher then crashed with a STM deadlock.
That must be in either the DaemonStatusHandle or the FailedPushMap,
and the latter is only used by the pusher. Did the committer's crash somehow
break STM?
The BlockedIndefinitelyOnSTM exception is described as:
-- |The thread is waiting to retry an STM transaction, but there are no
-- other references to any @TVar@s involved, so it can't ever continue.
If the Committer had a reference to a TVar and crashed, I can sort of see
this leading to that exception..
The crash was quite easy to reproduce after the previous commit, but
after making the above change, I have yet to see it again. Here's hoping.
Now when a download is queued and there's no known remote to get it from,
it's added to a deferred download list, which will be retried later.
The Merger thread tries to queue any deferred downloads when it receives
a push to the git-annex branch.
Note that the Merger thread now also forces an update of the git-annex
branch. The assistant was not updating this branch before, and it saw a
(mostly) correct view of state, but now that incoming pushes go to
synced/git-annex, it needs to be merged in.
Don't expose these as branches in refs/heads/. Instead hide them away in
refs/synced/ where only show-ref will find them.
Make unused only look at branches and tags, not these other things,
so it won't care if some stale sync ref used to use a file.
This means they don't need to be deleted, which could have
led to an incoming sync being missed.
The fallback branches pushed to contain the uuid of the pusher, which is
ugly. That's why syncing doesn't normally use this method.
The merger deletes fallback branches after merging them, to contain the
ugliness, and so unused doesn't look at data from these branches.
(The fallback git-annex branch is left behind for now.)
Now other repositories can configure special remotes, and when their
configuration has propigated out, they'll appear in the webapp's list of
repositories, with a link to enable them.
Added support for enabling rsync special remotes, and directory special
remotes that are on removable drives. However, encrypted directory special
remotes are not supported yet. The removable drive configuator doesn't
support them yet anyway.
Turns out sClose was working fine.. but it was not being run on every
opened socket. The upstream bug is that multicastSender can crash
on an invalid (or ipv6) address and when this happens it's already
opened a socket, which just goes missing with no way to close it.
A simple fix to the library can avoid this, as I describe here:
https://github.com/audreyt/network-multicast/issues/2
In the meantime, just skipping ipv6 addresses will fix the fd leak.
Finally.
Last bug fixes here: Send PairResp with same UUID in the PairReq.
Fix off-by-one in code that filters out our own pairing messages.
Also reworked the pairing alerts, which are still slightly buggy.
Pair requests the the same UUID are part of the same pairing session,
which allows us to detect attempts to brute force the shared secret,
as that will result in pair requests with the same UUID that are
not verified with the right secret.
They work fine. But I had to go to a lot of trouble to get Yesod to render
routes in a pure function. It may instead make more sense to have each
alert have an assocated IO action, and a single route that runs the IO
action of a given alert id. I just wish I'd realized that before the past
several hours of struggling with something Yesod really doesn't want to
allow.
The remote computer may not support mDNS. Instead, pass over the uname -a
hostname, and the IP address, and leave best hostname calculation to the
remote side.
Pair requests are sent on all network interfaces, and contain the best
available hostname to use to contact the host on that interface.
Added a pairing in progress page.
Revert "reduce some boilerplate using ghc extensions", because it caused
overlapping instances for Text.
Actually 3 forms in one, this handles the initial passphrase entry, and the
confirmation, and also varys wording if the same user or a different user
is confirming.
Roughed out a data type that models the whole pairing conversation,
and can be serialized to implement it. And a state machine to run
that conversation. Not yet hooked up to any transport such as multicast
UDP.
Avoid trying to git push/pull to special remotes, but still do transfer
scans of them, after git pull from any other remotes, so we know about
any values that have been placed on them.
I think this makes sense.. Unless the assistant is running on the server,
the repo won't be updated, so it might as well be bare.
Non-bare repos will be handled by the pairing configurator, later.
The code to maintain that TChan in parallel with the list was buggy,
the two were not always the same. And all that TChan was needed for was
blocking on the next transfer, which can be accomplished just as well by
checking the size and retrying, thanks to STM.
Also, this is faster, and uses less memory. Total win.
I had an intuition that throwTo might be blocking because an exception was
caught and the exception handler was running. This seems to be the case,
and is avoided by using try. However, I can't really find anywhere in
throwTo's documentation that justifies this behavior.
When multiple downloads of a key are queued, it starts the first, but leaves the
other downloads in the queue. This ensures that we don't lose a queued
download if the one that got started failed.
Run code that pops off the next queued transfer and adds it to the active
transfer map within an allocated transfer slot, rather than before
allocating a slot. Fixes the transfers display, which had been displaying
the next transfer as a running transfer, while the previous transfer was
still running.
Currently only the web special remote is readonly, but it'd be possible to
also have readonly drives, or other remotes. These are handled in the
assistant by only downloading from them, and never trying to upload to
them.
The expensive transfer scan now scans a whole set of remotes in one pass.
So at startup, or when network comes up, it will run only once.
Note that this can result in transfers from/to higher cost remotes being
queued before other transfers of other content from/to lower cost remotes.
Before, low cost remotes were scanned first and all their transfers came
first. When multiple transfers are queued for a key, the lower cost ones
are still queued first. However, this could result in transfers from slow
remotes running for a long time while transfers of other data from faster
remotes waits.
I expect to make the transfer queue smarter about ordering
and/or make it allow multiple transfers at a time, which should eliminate
this annoyance. (Also, it was already possible to get into that situation,
for example if the network was up, lots of transfers from slow remotes
might be queued, and then a disk is mounted and its faster transfers have
to wait.)
Also note that this means I don't need to improve the code in
Assistant.Sync that currently checks if any of the reconnected remotes
have diverged, and if so, queues scans of all of them. That had been very
innefficient, but now doesn't matter.
Used by the assistant, rather than copy, this is faster because it avoids
using git ls-files, avoids checking the location log redundantly, and
runs in oneshot mode, avoiding making a commit to the git-annex branch
for every file transferred.
There are multiple reasons to do this:
* The local network may be up solid, but a route to a networked remote
is having trouble. Any transfers to it that fail should be retried.
* Someone might have wicd running, but like to bring up new networks
by hand too. This way, it'll eventually notice them.
The problem with using it here is that, if a removable drive is scanned
and gets disconnected during the scan, testing for all the files will
indicate it doesn't have them, and the scan is logged as completed
successfully, without necessary transfers being queued.
Found a very cheap way to determine when a disconnected remote has
diverged, and has new content that needs to be transferred: Piggyback on
the git-annex branch update, which already checks for divergence.
However, this does not check if new content has appeared locally while
disconnected, that should be transferred to the remote.
Also, this does not handle cases where the two git repos are in sync,
but their content syncing has not caught up yet.
This code could have its efficiency improved:
* When multiple remotes are synced, if any one has diverged, they're
all queued for transfer scans.
* The transfer scanner could be told whether the remote has new content,
the local repo has new content, or both, and could optimise its scan
accordingly.
This deals with interruptions in network connectevity, by listening
for a new network interface coming up (using dbus to see when
network-manager or wicd do it), and forcing a rescan of
A paused transfer's thread keeps running, keeping the slot in use.
This is intentional; pausing a transfer should not let other
queued transfers to run in its place.
This seems to work pretty well.
Handled the process groups like this:
- git-annex processes started by the assistant for transfers are run in their
own process groups.
- otherwise, rely on the shell to allocate a process group for git-annex
There is potentially a problem if some other program runs git-annex
directly (not using sh -c) The program and git-annex would then be in
the same process group. If that git-annex starts a transfer and it's
canceled, the program would also get killed. May or may not be a desired
result.
Also, the new updateTransferInfo probably closes a race where it was
possible for the thread id to not be recorded in the transfer info, if
the transfer info file from the transfer process is read first.
This doesn't quite work, because canceling a transfer sends a signal
to git-annex, but not to rsync (etc).
Looked at making git-annex run in its own process group, which could then
be killed, and would kill child processes. But, rsync checks if it's
process group is the foreground process group and doesn't show progress if
not, and when git has run git-annex, if git-annex makes a new process
group, that is not the case. Also, if git has run git-annex, ctrl-c
wouldn't be propigated to it if it made a new process group.
So this seems like a blind alley, but recording it here just in case.
Should work (untested) for transfers being run by other processes.
Not yet by transfers being run by the assistant. killThread does not
kill processes forked off by a thread. To fix this, will probably
need to make `git annex getkey` and `git annex sendkey` commands that
operate on keys, and write their own transfer info. Then the assistant
can run them, and kill them, as needed.
This commit includes a paydown on technical debt incurred two years ago,
when I didn't know that it was bad to make custom Read and Show instances
for types. As the routes need Read and Show for Transfer, which includes a
Key, and deriving my own Read instance of key was not practical,
I had to finally clean that up.
So the compact Key read and show functions are now file2key and key2file,
and Read and Show are now derived instances.
Changed all code that used the old instances, compiler checked.
(There were a few places, particularly in Command.Unused, and the test
suite where the Show instance continue to be used for legitimate
comparisons; ie show key_x == show key_y (though really in a bloom filter))
The TMVar is supposed to be left empty once the map is empty, but the code
neglected to do that, so the next time takeMVar got an empty map, which
is not handled since that was supposed to never happen..
Also, avoid any possibility of this crash. If an empty map somehow creeps
in, just retry.
This should work on linux (xdg-open) and OSX (open). If the program
is not in $PATH, it falls back to opening a browser window/tab with file:///
The only tricky bit is the javascript code, that handles clicking on the
link. This is to avoid unnecessary page refreshes. Until I added the
return false at the end, the <a>'s normal click event also fired, so two
file browsers opened. I have not checked portability extensively.
30 characters would mostly work, but 20 is safer due to some wider letters
like 'w'. Of course this is very heuristic based on filesize anyway.
(Bootstrap does a surprisingly bad job at dealing with overlong words
in the sidebar.)
Now an alert tracks files that have recently been added. As a large file
is added, it will have its own alert, that then combines with the tracker
when dones.
Also used for combining sanity checker alerts, as it could possibly want to
display a lot.
git annex assistant --autostart will start separate daemons in each
listed autostart repo
running the webapp outside any git-annex repo will open it on the
first listed autostart repo
This allows me to not build-depend on blaze-markup, which was causing
me some trouble when tring to build with cabal on debian. Seems debian
ships Text.Blaze.Renderer.String in two packages.
Unifying poll results, it's Annex in lowercase. :)
When cwd is HOME, use ~/Desktop/annex, unless there's no Desktop directory;
then use use ~/annex
If cwd is not $HOME, use cwd
Now the javascript does an ajax call at the start to request the url
to use to poll, and the notification id is generated then, once we know
javascript is working.