Batch detection is heuristic, so can sometimes fail. I observed one such
failure while starting up in a repository with 87000 files. After the first
several batches of ~5000 files, it fell out of batch mode, and never
re-entered it, and so made many more commits of a few files at a time
than necessary.
So, let's always use batch mode when in the startup scan. This avoids the
heuristic there, at least.
There is clearly also room to improve the heuristic. Possibly 10 files is
too high a bar to be found during a commit, on a system that can commit
quickly.
Fixes a test case I received where a corrupted repo was repaired, but the
git-annex branch was not. The root of the problem was that the
MissingObject returned by the repair code was not necessarily a complete
set of all objects that might have been deleted during the repair.
So, stop trying to return that at all, and instead make the index file
checking code explicitly verify that each object the index uses is present.
The assistant's commit code also always avoids git commit, for simplicity.
Indirect mode sync still does a git commit -a to catch unstaged changes.
Note that this means that direct mode sync no longer runs the pre-commit
hook or any other hooks git commit might call. The git annex pre-commit
hook action for direct mode is however explicitly run. (The assistant
already ran git commit with hooks disabled, so no change there.)
The Upgrader avoids checking for upgrades on startup when it was just
upgraded. This avoids an upgrade loop if something goes wrong. One example
of something going wrong would be if the upgrade info file and the
distribution file get out of sync (or the distribution file is cached in
a proxy), so it thinks it has upgraded to a new version, but has really
not.
When an automatic upgrade completes, or when the user clicks on the upgrade
button in one webapp, but also has it open in another browser window/tab,
we have a problem: The current web server is going to stop running in
minutes, but there is no way to send a redirect to the web browser to the
new url.
To solve this, used long polling, so the webapp is always listening for
urls it should redirect to. This allows globally redirecting every open
webapp. Works great! Tested with 2 web browsers with 2 tabs each.
May be useful for other purposes later too, dunno.
The overhead is 2 http requests per page load in the webapp. Due to yesod's
speed, this does not seem to noticibly delay it. Only 1 of the requests
could possibly block the page load, the other is async.
Made alerts be able to have multiple buttons, so the alerts about upgrading
can have a button that enables automatic upgrades.
Implemented automatic upgrading when the program file has changed.
Note that when an automatic upgrade happens, the webapp displays an alert
about it for a few minutes, and then closes. This still needs work.
Not yet wired up to restart the assistant on upgrade; that needs careful
sanity checking to wait until the upgrade is done before restarting.
Used the DirWatcher here, so it gets events for any changes to the
directory containing the program file. (But not subdirs.) This is necessary
in order to detect when the file is renamed as part of the upgrade, which
an inotify on a single file would not detect. (Also, I have DirWatcher code,
but not FileWatcher code.)
Note that upgrades that remove or rename a whole directory tree containing
the executable will *not* trigger this code. So eg, deleting and replacing
the whole standalone tarball dir tree won't work -- but untarring it
over top will. So should dpkg package upgrades.
Added programPath, using a new GHC feature to find the full path to the
executable. The fallback code for old GHC or unsupported OS is less good;
its worst failure mode would be either failing to find the program, and so
not checking for upgrades, or finding a git-annex that's in PATH, but is
not the one running.
This commit was sponsored by John Roepke.
The webapp will check twice a day, when the network is connected, to see if
it can download a distributon upgrade file. If a newer version is found,
display an upgrade alert.
This will need the autobuilders to set UPGRADE_LOCATION to the url
it can be downloaded from when building git-annex. Only builds with that
set need automatic upgrade alerts.
Currently, the upgrade page just requests the user manually download
and upgrade it. But, all the info is provided to do automated upgrades
in the future.
Note that urls used will need to all be https.
This commit was sponsored by Dirk Kraft.
I was able to reproduce something very like this bug by starting
pairing separately on both computers under poor network conditions (ie,
weak wifi on my front porch). Neither computer showed an alert for the
PairReq messages it was seeing (intermittently) from the other.
So, I've made a new PairReq message that has not been seen before
always make the alert pop up, even if the assistant thinks it is
in the middle of its own pairing process (or even another pairing
process with a different box on the LAN).
(This shouldn't cause a rogue PairAck to disrupt a pairing process part
way through.)
The msg contains a haskell-escaped string, so control characters in it can
also be escaped. So this didn't work before, really.
Got rid of the \n check, because current pairing messages actually do
contain a \n, after the ssh public key. Don't want to break
back-compatability.
When starting up the assistant, it'll remind about the current
repository, if it doesn't have checks. And when a removable drive
is plugged in, it will remind if a repository on it lacks checks.
Since that might be annoying, the reminders can be turned off.
This commit was sponsored by Nedialko Andreev.
Added a RemoteChecker thread, that waits for problems to be reported with
remotes, and checks if their git repository is in need of repair.
Currently, only failures to sync with the remote cause a problem to be
reported. This seems enough, but we'll see.
Plugging in a removable drive with a repository on it that is corrupted
does automatically repair the repository, as long as the corruption causes
git push or git pull to fail. Some types of corruption do not, eg
missing/corrupt objects for blobs that git push doesn't need to look at.
So, this is not really a replacement for scheduled git repository fscking.
But it does make the assistant more robust.
This commit is sponsored by Fernando Jimenez.
Currently only implemented for local git remotes. May try to add support
to git-annex-shell for ssh remotes later. Could concevably also be
supported by some special remote, although that seems unlikely.
Cronner user this when available, and when not falls back to
fsck --fast --from remote
git annex fsck --from does not itself use this interface.
To do so, I would need to pass --fast and all other options that influence
fsck on to the git annex fsck that it runs inside the remote. And that
seems like a lot of work for a result that would be no better than
cd remote; git annex fsck
This may need to be revisited if git-annex-shell gets support, since it
may be the case that the user cannot ssh to the server to run git-annex
fsck there, but can run git-annex-shell there.
This commit was sponsored by Damien Diederen.
I probably need to improve handling of the PleaseTerminate exception to
kill the fsck process. Also, if fsck finds bad files, something needs
to requeue downloads of them. Otherwise, this should work, but is probably
quite buggy since I have only tested the pure code over the past 2 days.
Extends the index.lock handling to other git lock files. I surveyed
all lock files used by git, and found more than I expected. All are
handled the same in git; it leaves them open while doing the operation,
possibly writing the new file content to the lock file, and then closes
them when done.
The gc.pid file is excluded because it won't affect the normal operation
of the assistant, and waiting for a gc to finish on startup wouldn't be
good.
All threads except the webapp thread wait on the new startup sanity checker
thread to complete, so they won't try to do things with git that fail
due to stale lock files. The webapp thread mostly avoids doing that kind of
thing itself. A few configurators might fail on lock files, but only if the
user is explicitly trying to run them. The webapp needs to start
immediately when the user has opened it, even if there are stale lock
files.
Arranging for the threads to wait on the startup sanity checker was a bit
of a bear. Have to get all the NotificationHandles set up before the
startup sanity checker runs, or they won't see its signal. Perhaps
the NotificationBroadcaster is not the best interface to have used for
this. Oh well, it works.
This commit was sponsored by Michael Jakl
This is motivated by a user report that the assistant was repeatedly
retrying transfers of files that had been deleted (in direct mode, so
removing the only copy).
Note that the glacier code retries failed transfers after a while to retry
downloads that have aged long enough to be available. This is ok; if we're
doing a full transfer scan we'll retry on every file that is still in the
git tree.
Also note that this makes the assistant less likely to get every file
referenced by old revs of the git tree. Not something the assistant tries
to ensure anyway, so I feel this is acceptable.
To support this, a core.gcrypt-id is stored by git-annex inside the git
config of a local gcrypt repository, when setting it up.
That is compared with the remote's cached gcrypt-id. When different, a
drive has been changed. git-annex then looks up the remote config for
the uuid mapped from the core.gcrypt-id, and tweaks the configuration
appropriately. When there is no known config for the uuid, it will refuse to
use the remote.
Requires git 1.8.4 or newer. When it's installed, a background
git check-ignore process is run, and used to efficiently check ignores
whenever a new file is added.
Thanks to Adam Spiers, for getting the necessary support into git for this.
A complication is what to do about files that are gitignored but have
been checked into git anyway. git commands assume the ignore has been
overridden in this case, and not need any more overriding to commit a
changed version.
However, for the assistant to do the same, it would have to run git ls-files
to check if the ignored file is in git. This is somewhat expensive. Or it
could use the running git-cat-file process to query the file that way,
but that requires transferring the whole file content over a pipe, so it
can be quite expensive too, for files that are not git-annex
symlinks.
Now imagine if the user knows that a file or directory tree will be getting
frequent changes, and doesn't want the assistant to sync it, so gitignores
it. The assistant could overload the system with repeated ls-files checks!
So, I've decided that the assistant will not automatically commit changes
to files that are gitignored. This is a tradeoff. Hopefully it won't be a
problem to adjust .gitignore settings to not ignore files you want the
assistant to autocommit, or to manually git annex add files that are listed
in .gitignore.
(This could be revisited if git-annex gets access to an interface to check
the content of the index w/o forking a git command. This could be libgit2,
or perhaps a separate git cat-file --batch-check process, so it wouldn't
need to ship over the whole file content.)
This commit was sponsored by Francois Marier. Thanks!
This bug was introduced in 82a6db8fe8,
which improved handling of adding very large numbers of files by ensuring
that a minimum number of max size commits (5000 files each) were done.
I accidentially made it wait for another change to appear after such a max
size commit, even if a lot of queued changes were already accumulated.
That resulted in a stall when it got to the end. Now fixed to not wait
any longer than necessary to ensure the watcher has had time to wake back
up after the max size commit.
This commit was sponsored by Michael Linksvayer. Thanks!
This is a laziness problem. Despite the bang pattern on newfiles, the list
was not being fully evaluated before cleanup was called. Moving cleanup out
to after the list is actually used fixes this.
More evidence that I should be using ResourceT or pipes, if any was needed.
This affected both the hourly NetWatcherFallback thread and the syncing
when network connection is detected.
It was a reversion of sorts, introduced in
8861e270be, when annex-ignore was changed to
not control git syncing. I forgot to make it check annex-sync at that
point.
This is a compromise. I would like to nice every thread except for the
webapp thread, but it's not practical to do so. That would need every
thread to run as a bound thread, which could add significant overhead.
And any forkIO would escape the nice level.
I noticed that when my modem hung up and redialed, my xmpp client was left
sending messages into the void. This will also handle any idle
disconnection issues.
I hope this will be easier to reason about, and less buggy. It was
certianly easier to write!
An immediate benefit is that with a traversable queue of push requests to
select from, the threads can be a lot fairer about choosing which client to
service next.
This will avoid losing any messages received from 1 client when a push
involving another client is running.
Additionally, the handling of push initiation is improved,
it's no longer allowed to run multiples of the same type of push to
the same client.
Still stalls sometimes :(
Observed: With 2 xmpp clients, one would sometimes stop responding
to CanPush messages. Often it was in the middle of a receive-pack
of its own (or was waiting for a failed one to time out).
Now these are always immediately responded to, which is fine; the point
of CanPush is to find out if there's another client out there that's
interested in our push.
Also, in queueNetPushMessage, queue push initiation messages when
we're already running the side of the push they would initiate.
Before, these messages were sent into the netMessagesPush channel,
which was wrong. The xmpp send-pack and receive-pack code discarded
such messages.
This still doesn't make XMPP push 100% robust. In testing, I am seeing
it sometimes try to run two send-packs, or two receive-packs at once
to the same client (probably because the client sent two requests).
Also, I'm seeing rather a lot of cases where it stalls out until it
runs into the 120 second timeout and cancels a push.
And finally, there seems to be a bug in runPush. I have logs that
show it running its setup action, but never its cleanup action.
How is this possible given its use of E.bracket? Either some exception
is finding its way through, or the action somehow stalls forever.
When this happens, one of the 2 clients stops syncing.
This fixes a bug with git annex add in direct mode. If some files already
existed in the tree pointing at the same key as a file that was just added,
and their content was not present, add neglected to copy the content to
those files.
I also changed the behavior of moveAnnex slightly: When content is moved
into the annex in direct mode, it does not overwrite any content already
present in direct mode files. That content may be modified after all.
Unless the request is for repo uuid we already know. This way, if A1 pairs
with friend B1, and B1 pairs with device B2, then B1 can request A1 pair
with it and no confirmation is needed. (In future, may want to try to do
that automatically, to make a more robust network.)
Observed that the pushed refs were received, but not merged into master.
The merger never saw an add event for these refs. Either git is not writing
to a new file and renaming it into place, or the inotify code didn't notice
that. Changed it to also watch for modify events and that seems to have
fixed it!
Without this, a very large batch add has commits of sizes approx
5000, 2500, 1250, etc down to 10, and then starts over at 5000.
This fixes it so it's 5000+ every time.
That hook updates associated file bookkeeping info for direct mode.
But, everything already called addAssociatedFile when adding/changing a
file. It only needed to also call removeAssociatedFile when deleting a file,
or a directory.
This should make bulk adds faster, by some possibly significant amount.
Bulk removals may be a little slower, since it has to use catKeyFile now
on each removed file, but will still be faster than adds.
There's a tradeoff between making less frequent commits, and
needing to use memory to store all the changes that are coming
in. At 10 thousand, it needs 150 mb of memory. 5 thousand drops
that down to 90 mb or so.
This also turns out to have significant imact on total run time.
I benchmarked 10k changes taking 27 minutes. But two 5k batches
took only 21 minutes.
If an add failed, we should lose the KeySource, since it, presumably,
differs due to a change that was made to the file.
(The locked down file is already deleted.)
Turns out that a lot of the time spent in a bulk add was just updating the
add alert to rotate through each file that was added. Showing one alert
makes for a significant speedup.
Also, when the webapp is open, this makes it take quite a lot less cpu
during bulk adds.
Also, it lets the user know when a bulk add happened, which is sorta
nice..
This is so git remotes on servers without git-annex installed can be used
to keep clients' git repos in sync.
This is a behavior change, but since annex-sync can be set to disable
syncing with a remote, I think it's acceptable.
Incidentially should work around the last problem that prevented the webapp
building on Android. (Except for a few places I need to clean up after
hand-fixing the spliced TH code.)
* addurl: Register transfer so the webapp can see it.
* addurl: Automatically retry downloads that fail, as long as some
additional content was downloaded.
Fixed by storing a list of cached inodes for a key, instead of just one.
Backwards compatability note: An old git-annex version will fail to parse
an inode cache file that has been written by a new version, and has
multiple items. It will succees if just one. So old git-annexes will have
even worse behavior when there are duplicated files, if that is possible.
I don't think it will be a problem. (Famous last words.)
Also, note that it doesn't expire old and unused inode caches for a key.
It would be possible to add this if needed; just look through the
associated files for a key and if there are more cached inodes, throw out
any not corresponding to associated files. Unless a file is being copied
repeatedly and the old copy deleted, this lack of expiry should not be a
problem.
* since this is a crippled filesystem anyway, git-annex doesn't use
symlinks on it
* so there's no reason to use the mixed case hash directories that we're
stuck using to avoid breaking everyone's symlinks to the content
* so we can do what is already done for all bare repos, and make non-bare
repos on crippled filesystems use the all-lower case hash directories
* which are, happily, all 3 letters long, so they cannot conflict with
mixed case hash directories
* so I was able to 100% fix this and even resuming `git annex add` in the
test case will recover and it will all just work.
This avoids commit churn by the assistant when eg,
replacing a file with a symlink.
But, just as importantly, it prevents the working tree being left with a
deleted file if git-annex, or perhaps the whole system, crashes at the
wrong time.
(It also probably avoids confusing displays in file managers.)
My test case for this bug is to have the assistant running and syncing to
a remote, and create a file in the annex. Then at the command line run
git annex drop. The assistant sees that the file is gone, sees it's a wanted
file, and downloads it from the remote.
With a directory special remote and a small file, I was seeing around 1
time in 3, a race where the file got unstaged from git after it got
downloaded.
Looking at what direct mode content managing code does in this case, it
deletes the symlink, and then adds the file content back. It would be
possible, sometimes, to avoid removing the symlink and do this atomically.
And I probably should.. but in some cases, particularly where the file
needs to be run through `cp` (multiple direct mode files with same
content), there's no way to atomically replace the symlink with the
content.
Anyway, the bug turns out to be something that the watcher does right for
indirect mode, but not for direct mode. When it got an add event, it
checked to see if this was a new file, or one we've already added. In the
latter case, no add event was queued. But that means that only the rm event
is queued, and so it unstages the file.
Fixed by queueing an add event even when the file is already in git.
Tested by running hundreds of drops in a loop; file remained staged.
I would have sort of liked to put this in .gitattributes, but it seems
it does not support multi-word attribute values. Also, making this a single
config setting makes it easy to only parse the expression once.
A natural next step would be to make the assistant `git add` files that
are not annex.largefiles. OTOH, I don't think `git annex add` should
`git add` such files, because git-annex command line tools are
not in the business of wrapping git command line tools.
When a page is loaded, the javascript requests an notification url, and
does long polling on the url to be informed of changes. But if a change
occured before the notification url was requested, it would not be notified
of that change, and so the page display would not update.
I fixed this by *always* updating the page display after it gets
the notification url. This is extra work, but the overhead is not noticable
in the other overhead of loading a page.
(A nicer way would be to somehow record the version of a page initially
loaded, and then compare it with the current version when getting the
notification url, and only force an update if it's changed. But getting
the "version" of the different parts of the page that use long polling
is difficult.)
Like the old one, but does not mention which remotes are scanned.
I think this is less confusing, as it does not imply the remotes
were somehow accessed (which they are not; inaccessible remotes
can be scanned.)
If transferkey crashes or even fails to run, the TransferWatcher will not
see the transfer info file be created, so will not remove the transfer
from the list of active transfers. This causes the list to grow
continually, and all active transfers are displayed in the webapp. So, put
in a guard.
I assume that transferkey will not exit 0 while neglecting to clean up.
Rather than forking a git-annex transferkey only to have it fail,
just immediately record the failed transfer (so when the drive is plugged
in, the scan will retry it).
This may work around google talk's horrible presence handling, in which
clients often don't learn about other clients, at least when using the same
account. This way, every time we start a git push over xmpp, we'll waste
bandwidth asking clients to please try again to identify themselves.
Just before starting a transfer, do one last check that it's still
preferred content.
I was just doing this for uploads, as part of the smarter flood filling
bug, but realized it's also possible for a download that was preferred
content to change to not be before the download begins, so check that too.
Rather than wait a full second, which may be longer than needed, or too
short to get all the rename events, we start a mode where we wait 1/10th of
a second, and if there are Changes received, wait again. Basically we're
back in batch mode when this happens.