Unless the request is for a repo uuid we already know. This way, if A1 pairs
with friend B1, and B1 pairs with device B2, then B1 can request A1 pair
with it and no confirmation is needed. (In future, may want to try to do
that automatically, to make a more robust network.)
Observed that the pushed refs were received, but not merged into master.
The merger never saw an add event for these refs. Either git is not writing
to a new file and renaming it into place, or the inotify code didn't notice
that. Changed it to also watch for modify events and that seems to have
fixed it!
(Except for the actual streaming of receive-pack through XMPP, which
can only run once we've gotten an appropriate uuid in a push initiation
message.)
Pushes are now only initiated when the initiation message comes from a
known uuid. This allows multiple distinct repositories to use the same xmpp
address.
Note: This probably breaks the initial push after xmpp pairing, because at
that point we may not know about the paired uuid, and so reject the push
from it. It won't break in simple cases, because the annex-uuid of the
remote is checked. However, when there are multiple clients behind a single
xmpp address, only the uuid of the first is recorded in annex-uuid, and so
any pushes from the others will be rejected (unless the first remote pushes
their uuids to us beforehand).
Without this, a very large batch add makes commits of sizes approximately
5000, 2500, 1250, etc., down to 10, and then starts over at 5000.
This fixes it so every commit is 5000+.
That hook updates associated file bookkeeping info for direct mode.
But, everything already called addAssociatedFile when adding/changing a
file. It only needed to also call removeAssociatedFile when deleting a file,
or a directory.
This should make bulk adds faster, possibly by a significant amount.
Bulk removals may be a little slower, since catKeyFile now has to be run
on each removed file, but they will still be faster than adds.
There's a tradeoff between making less frequent commits and needing
memory to store all the changes that are coming in. At 10 thousand
changes per commit, it needs 150 MB of memory; 5 thousand drops that
down to 90 MB or so.
This also turns out to have a significant impact on total run time.
I benchmarked 10k changes taking 27 minutes, but two 5k batches
took only 21 minutes.
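A minimal Haskell sketch of capping the commit batch size, assuming the pending changes are available as a list in arrival order. `batchesOf` and `maxCommitBatch` are hypothetical names, not git-annex's actual committer code, which collects changes over time rather than from a ready-made list.

```haskell
-- Hypothetical: the batch cap discussed above (5 thousand changes).
maxCommitBatch :: Int
maxCommitBatch = 5000

-- Split pending changes into commit batches of at most n items,
-- preserving their order.
batchesOf :: Int -> [a] -> [[a]]
batchesOf n xs
  | n <= 0 = error "batchesOf: batch size must be positive"
  | null xs = []
  | otherwise = batch : batchesOf n rest
  where
    (batch, rest) = splitAt n xs
```

For example, `batchesOf maxCommitBatch` applied to 12000 changes yields batches of 5000, 5000 and 2000, so peak memory is bounded by the cap rather than by the size of the add.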
If an add failed, we should discard the KeySource, since it presumably
differs due to a change that was made to the file.
(The locked down file is already deleted.)
Turns out that a lot of the time spent in a bulk add was just updating the
add alert to rotate through each file that was added. Showing one alert
makes for a significant speedup.
Also, when the webapp is open, this makes it use a lot less CPU
during bulk adds.
Also, it lets the user know when a bulk add happened, which is sorta
nice.
This better handles error messages formatted for console display, by
adding a <br> after each line.
Hmm, I wonder if it'd be worth pulling in a markdown formatter, and running
the messages through it?
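The `<br>` approach can be sketched in Haskell as below. The HTML escaping step is an added assumption (the text only mentions the `<br>`), and `escapeHtml` and `consoleToHtml` are hypothetical names.

```haskell
-- Escape the characters that would otherwise be read as HTML markup.
escapeHtml :: String -> String
escapeHtml = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c = [c]

-- Render a console-formatted message so each of its lines ends
-- with a <br>, keeping the original line breaks visible.
consoleToHtml :: String -> String
consoleToHtml = concatMap (\l -> escapeHtml l ++ "<br>") . lines
```

For instance, a two-line message `"a < b\nsecond"` becomes `a &lt; b<br>second<br>`.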
In the case of the inotify limit warning, particularly, if it happens once
it will keep happening, and so combining alerts resulted in an alert
message that was much too large: it took up a lot of memory and was too
large for the webapp to display.
Making this a tset of lists of Changes, rather than a tset of Changes
makes refilling it, in batch mode, much more efficient. Rather than needing
to add every Change it's collected one at a time, it can add them in one
fast batch operation.
It would be more efficient yet to use a Set, but that would need an Eq
instance for InodeCache.
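A rough sketch of the batched refill, modelling the tset as a `TChan [Change]` from the `stm` package (an assumption; git-annex's own TSet wrapper may differ) with a hypothetical one-field `Change`. Each refill is a single `writeTChan` of a whole list, rather than one write per Change.

```haskell
import Control.Concurrent.STM

-- Hypothetical stand-in for the assistant's Change type.
data Change = Change FilePath
  deriving (Show, Eq)

-- Refill: one STM transaction for the whole batch.
refillBatch :: TChan [Change] -> [Change] -> IO ()
refillBatch chan cs = atomically $ writeTChan chan cs

-- Drain every batch currently queued, flattening them in order.
drainChanges :: TChan [Change] -> IO [Change]
drainChanges chan = atomically (go [])
  where
    go acc = do
      next <- tryReadTChan chan
      case next of
        Nothing -> return (concat (reverse acc))
        Just batch -> go (batch : acc)
```

A reader drains whatever batches are queued and flattens them; draining an empty channel yields `[]`.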
This is so git remotes on servers without git-annex installed can be used
to keep clients' git repos in sync.
This is a behavior change, but since annex-sync can be set to disable
syncing with a remote, I think it's acceptable.
Incidentally, this should work around the last problem that prevented the
webapp from building on Android. (Except for a few places I need to clean
up after hand-fixing the spliced TH code.)
assistant: Work around horrible, terrible, very bad behavior of
gnome-keyring, by not storing special-purpose ssh keys in ~/.ssh/*.pub.
Apparently gnome-keyring will load and indiscriminately use such
keys in some cases, even if they are not using any of the standard ssh key
names. Instead, store the keys in ~/.ssh/annex/, which gnome-keyring will
not check.
Note that neither I nor #debian-devel was able to fully reproduce this
problem, but I believe it exists, and that this fixes it. And it certainly
won't hurt anything.
* addurl: Register transfer so the webapp can see it.
* addurl: Automatically retry downloads that fail, as long as some
additional content was downloaded.
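A minimal sketch of the retry rule, assuming each attempt can report whether it succeeded and how many bytes are now on disk. `retryWhileProgress` and its attempt-numbered callback are hypothetical, not git-annex's addurl code.

```haskell
-- Retry a download for as long as each failed attempt leaves more
-- bytes than before. `attempt n` performs try number n (from 0) and
-- reports whether it succeeded and how many bytes are now present.
retryWhileProgress :: Integer -> (Int -> IO (Bool, Integer)) -> IO Bool
retryWhileProgress initialSize attempt = go 0 initialSize
  where
    go n lastSize = do
      (ok, size) <- attempt n
      if ok
        then return True
        else
          if size > lastSize
            then go (n + 1) size -- progress was made, so try again
            else return False -- nothing new downloaded; give up
```

Passing the size present before the first attempt means a download that fails without fetching anything is not retried at all.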
Unless highRandomQuality=false (or --fast) is set, use Libgcrypt's
'GCRY_VERY_STRONG_RANDOM' level by default for cipher generation, as is
done for OpenPGP key generation.
On the assistant side, the random quality is left at the old (lower)
level, in order not to scare the user with an endless page load while
the blocking PRNG waits for IO actions.
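A small sketch of how the setting might map onto gpg's random quality levels. It assumes cipher generation runs `gpg --gen-random <level> <count>`, where levels 0, 1 and 2 correspond to libgcrypt's GCRY_WEAK_RANDOM, GCRY_STRONG_RANDOM and GCRY_VERY_STRONG_RANDOM, and that the old, lower level was 1. The type and function names are hypothetical.

```haskell
-- Hypothetical: the two quality settings discussed above.
data RandomQuality = HighQuality | LowerQuality
  deriving (Show, Eq)

-- highRandomQuality=false (or --fast) selects the lower level.
qualityFor :: Bool -> RandomQuality
qualityFor highRandomQuality
  | highRandomQuality = HighQuality
  | otherwise = LowerQuality

-- gpg --gen-random level: 2 = very strong, 1 = strong (assumed old level).
randomLevel :: RandomQuality -> Int
randomLevel HighQuality = 2
randomLevel LowerQuality = 1

-- Arguments for gpg, asking for ASCII-armored random bytes.
genRandomParams :: RandomQuality -> Int -> [String]
genRandomParams q count =
  ["--gen-random", "--armor", show (randomLevel q), show count]
```

The assistant would simply call `genRandomParams LowerQuality`, so its pages never wait on the blocking high-quality PRNG.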
Fixed by storing a list of cached inodes for a key, instead of just one.
Backwards compatibility note: An old git-annex version will fail to parse
an inode cache file that has been written by a new version and has
multiple items. It will succeed if there is just one. So old git-annexes
will have even worse behavior when there are duplicated files, if that is
possible.
I don't think it will be a problem. (Famous last words.)
Also, note that it doesn't expire old and unused inode caches for a key.
It would be possible to add this if needed; just look through the
associated files for a key and if there are more cached inodes, throw out
any not corresponding to associated files. Unless a file is being copied
repeatedly and the old copy deleted, this lack of expiry should not be a
problem.
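A simplified sketch of keeping several cached inodes per key, plus the optional expiry step described above. `InodeCache` here is a hypothetical three-field record with a derived `Eq`, not git-annex's real type, and the function names are illustrative.

```haskell
import Data.List (nub)

-- Hypothetical, simplified stand-in for an inode cache entry.
data InodeCache = InodeCache
  { cacheInode :: Integer
  , cacheSize :: Integer
  , cacheMtime :: Integer
  }
  deriving (Show, Eq)

-- A file's content is known-good if its current stat matches any
-- of the caches recorded for the key, not just a single one.
matchesAnyCache :: InodeCache -> [InodeCache] -> Bool
matchesAnyCache = elem

-- Record a new cache for a key without dropping existing ones.
addInodeCache :: InodeCache -> [InodeCache] -> [InodeCache]
addInodeCache c cs = nub (c : cs)

-- The expiry that could be added: keep only caches that still
-- correspond to one of the key's associated files.
expireInodeCaches :: [InodeCache] -> [InodeCache] -> [InodeCache]
expireInodeCaches associatedNow = filter (`elem` associatedNow)
```

With two duplicated files of the same key, both of their caches are kept, so either file's stat is recognised as unchanged.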
* since this is a crippled filesystem anyway, git-annex doesn't use
symlinks on it
* so there's no reason to use the mixed case hash directories that we're
stuck using to avoid breaking everyone's symlinks to the content
* so we can do what is already done for all bare repos, and make non-bare
repos on crippled filesystems use the all-lower case hash directories
* which are, happily, all 3 letters long, so they cannot conflict with
mixed case hash directories
* so I was able to fix this 100%, and even resuming `git annex add` in the
test case will recover, and it will all just work.
This avoids commit churn by the assistant when, e.g., replacing a file
with a symlink.
But, just as importantly, it prevents the working tree being left with a
deleted file if git-annex, or perhaps the whole system, crashes at the
wrong time.
(It also probably avoids confusing displays in file managers.)
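The usual way to avoid a window where the file is missing is to create the replacement under a temporary name and rename it over the original, since POSIX rename replaces the destination atomically. This is a sketch using the `unix` package, assuming the temporary name (`file ++ ".tmp"`, a hypothetical choice) is free and on the same filesystem; it is not necessarily how git-annex implements it.

```haskell
-- createSymbolicLink and rename come from the unix package
-- (the module also provides readSymbolicLink and removeLink).
import System.Posix.Files

-- Hypothetical: atomically replace `file` with a symlink to `target`.
replaceWithSymlink :: FilePath -> FilePath -> IO ()
replaceWithSymlink target file = do
  -- Assumed not to exist already; createSymbolicLink fails if it does.
  let tmp = file ++ ".tmp"
  createSymbolicLink target tmp
  -- rename(2) swaps the link into place in one step, so `file` always
  -- exists as either the old file or the new symlink.
  rename tmp file
```

If the process dies between the two calls, only a stray `.tmp` link is left behind; the original file is still intact.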
My test case for this bug is to have the assistant running and syncing to
a remote, and create a file in the annex. Then at the command line run
git annex drop. The assistant sees that the file is gone, sees it's a wanted
file, and downloads it from the remote.
With a directory special remote and a small file, I was seeing, around 1
time in 3, a race where the file got unstaged from git after it got
downloaded.
Looking at what direct mode content managing code does in this case, it
deletes the symlink, and then adds the file content back. It would be
possible, sometimes, to avoid removing the symlink and do this atomically.
And I probably should. But in some cases, particularly where the file
needs to be run through `cp` (multiple direct mode files with the same
content), there's no way to atomically replace the symlink with the
content.
Anyway, the bug turns out to be something that the watcher does right for
indirect mode, but not for direct mode. When it got an add event, it
checked to see if this was a new file, or one we'd already added. In the
latter case, no add event was queued. But that meant only the rm event
was queued, and so the file got unstaged.
Fixed by queueing an add event even when the file is already in git.
Tested by running hundreds of drops in a loop; file remained staged.
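The bug can be modelled as the order of changes applied to the set of staged files. This is a hypothetical simplification, not the watcher's real data types: when direct mode content replaces a symlink, an `Rm` is always queued, and the fix is to queue the `Add` as well.

```haskell
import qualified Data.Set as Set

-- Hypothetical model of the watcher's queued changes.
data Change = Add FilePath | Rm FilePath
  deriving (Show, Eq)

-- Apply queued changes, in order, to the set of staged files.
applyChanges :: Set.Set FilePath -> [Change] -> Set.Set FilePath
applyChanges = foldl step
  where
    step staged (Add f) = Set.insert f staged
    step staged (Rm f) = Set.delete f staged

-- Changes queued when the symlink is removed and the content file then
-- appears. Before the fix, no Add was queued for a file already in git.
queuedForReplacement :: Bool -> FilePath -> [Change]
queuedForReplacement fixed f
  | fixed = [Rm f, Add f]
  | otherwise = [Rm f]
```

With the old behaviour, `applyChanges` leaves the file unstaged; with the fix, it ends up staged again.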
I would have sort of liked to put this in .gitattributes, but it seems
it does not support multi-word attribute values. Also, making this a single
config setting makes it easy to only parse the expression once.
A natural next step would be to make the assistant `git add` files that
are not annex.largefiles. OTOH, I don't think `git annex add` should
`git add` such files, because git-annex command line tools are
not in the business of wrapping git command line tools.
When a page is loaded, the javascript requests a notification url, and
does long polling on the url to be informed of changes. But if a change
occurred before the notification url was requested, the page would not be
notified of that change, and so its display would not update.
I fixed this by *always* updating the page display after it gets
the notification url. This is extra work, but the overhead is not
noticeable amid the other overhead of loading a page.
(A nicer way would be to somehow record the version of a page initially
loaded, and then compare it with the current version when getting the
notification url, and only force an update if it's changed. But getting
the "version" of the different parts of the page that use long polling
is difficult.)
Needed to send a trailing NUL to end a request, and set the read handle
non-blocking.
Also, set fileSystemEncoding on all handles, since there's a filename in
there.
Like the old one, but does not mention which remotes are scanned.
I think this is less confusing, as it does not imply the remotes
were somehow accessed (which they are not; inaccessible remotes
can be scanned.)
If transferkey crashes or even fails to run, the TransferWatcher will not
see the transfer info file being created, and so will not remove the
transfer from the list of active transfers. This causes the list to grow
continually, and every entry on it is displayed in the webapp as an active
transfer. So, put in a guard.
I assume that transferkey will not exit 0 while neglecting to clean up.
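A sketch of such a guard, assuming active transfers are kept in a map keyed by some transfer identifier; the type synonyms and `guardAfterExit` are hypothetical. Following the assumption stated above, a zero exit is trusted, and only a failed exit removes the entry.

```haskell
import qualified Data.Map.Strict as M
import System.Exit (ExitCode (..))

-- Hypothetical types: transfers identified by key, with some info.
type TransferKey = String
type ActiveTransfers = M.Map TransferKey String

-- Run after the transferkey process exits. On success, trust the
-- TransferWatcher to clean up; on failure, remove the entry here,
-- since no transfer info file may ever have been written.
guardAfterExit :: ExitCode -> TransferKey -> ActiveTransfers -> ActiveTransfers
guardAfterExit ExitSuccess _ active = active
guardAfterExit (ExitFailure _) k active = M.delete k active
```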
Rather than forking a git-annex transferkey only to have it fail,
just immediately record the failed transfer (so when the drive is plugged
in, the scan will retry it).
This may work around Google Talk's horrible presence handling, in which
clients often don't learn about other clients, at least when using the same
account. This way, every time we start a git push over xmpp, we'll waste
bandwidth asking clients to please try again to identify themselves.
Just before starting a transfer, do one last check that it's still
preferred content.
I had only been doing this for uploads, as part of fixing the smarter flood
filling bug, but realized it's also possible for a file that was preferred
content when its download was queued to stop being preferred content
before the download begins, so check that too.
Rather than wait a full second, which may be longer than needed, or too
short to get all the rename events, we start a mode where we wait 1/10th of
a second, and if there are Changes received, wait again. Basically we're
back in batch mode when this happens.
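A minimal sketch of the wait-again loop, assuming some action reports how many Changes arrived since it was last called; `waitForQuiet` is a hypothetical name and the real assistant code is structured differently.

```haskell
-- threadDelay comes from here (the module also re-exports the MVar API).
import Control.Concurrent

-- Wait 1/10th of a second at a time for as long as new Changes keep
-- arriving. `pendingCount` returns how many Changes came in since the
-- previous call. Returns the number of waits that were done.
waitForQuiet :: IO Int -> IO Int
waitForQuiet pendingCount = go 1
  where
    go waits = do
      threadDelay 100000 -- 1/10th of a second, in microseconds
      n <- pendingCount
      if n > 0 then go (waits + 1) else return waits
```

A burst of renames that keeps producing Changes for 0.2 seconds is thus collected in about three short waits, instead of always paying a full second.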
This cleaned up the code quite a bit; now the committer just looks at the
Change to see if it's a change that needs to have a transfer queued for it.
If I later want to add dropping keys for files that were removed, or
something like that, this should make it straightforward.
This also fixes a bug. In direct mode, moving a file out of an archive
directory failed to start a transfer to get its content. The problem
was that the file had not been committed to git yet, and so the transfer
code didn't want to touch it, since fileKey failed to get its key.
Only starting transfers after a commit avoids this problem.
I saw this happen in real life, when syncing to a newly added USB drive.
I think it got scanned twice, and files were doubled in the queue.
This could be optimised a little bit more, to only read from the mvar
once, rather than twice.
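Assuming the fix for the doubled queue is to skip transfers that are already queued, a sketch could do the membership check and the insert under one `modifyMVar_`, which also takes the queue MVar only once. The list-based queue and both function names are hypothetical.

```haskell
import Control.Concurrent.MVar

-- Append t unless an equal item is already queued.
enqueueUnique :: Eq t => t -> [t] -> [t]
enqueueUnique t queue
  | t `elem` queue = queue
  | otherwise = queue ++ [t]

-- Check and insert in a single modifyMVar_, so the queue MVar is
-- taken once per enqueue instead of being read and then modified.
enqueueUniqueIO :: Eq t => MVar [t] -> t -> IO ()
enqueueUniqueIO var t = modifyMVar_ var (return . enqueueUnique t)
```

Scanning the same drive twice then enqueues each transfer only once.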
This is not perfect, because on loss of connection, we do not currently
immediately detect it and stop the client. It has to time out, and then
the buddy list will clear.
The NetWatcher should detect disconnects too.
I have a theory that some Google xmpp servers don't send presence for xa
clients, while others do. I'm sometimes seeing a weird lack of presence
messages there.
Noticed that at startup or network reconnect, git push messages were sent,
often before presence info had been gathered, and so were not sent to any
buddies.
To fix this, keep track of which buddies have seen such messages,
and when new presence is received from a buddy that has not yet seen it,
resend.
This is done only for push initiation messages, so very little data needs
to be stored.
This fixes the issue mentioned in the last commit.
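A sketch of the bookkeeping, assuming buddies are identified by a string (such as a JID) and only the most recent push initiation message is kept; `PushState`, `sentPush` and `onPresence` are hypothetical names.

```haskell
import qualified Data.Set as S

-- The pending push initiation message, and which buddies have seen it.
data PushState msg = PushState
  { lastPush :: Maybe msg
  , seenBy :: S.Set String
  }

-- Sending a new push initiation replaces the old one and records
-- the buddies that were present to receive it.
sentPush :: msg -> [String] -> PushState msg -> PushState msg
sentPush m buddies _ = PushState (Just m) (S.fromList buddies)

-- On presence from a buddy that has not seen the pending message,
-- return it for resending and mark the buddy as having seen it.
onPresence :: String -> PushState msg -> (Maybe msg, PushState msg)
onPresence buddy st = case lastPush st of
  Just m
    | not (S.member buddy (seenBy st)) ->
        (Just m, st {seenBy = S.insert buddy (seenBy st)})
  _ -> (Nothing, st)
```

Only one message plus a set of buddy names is stored, matching the point that very little data needs to be kept.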
Turns out just collecting the UUIDs of clients behind an XMPP remote is
insufficient (although I should probably still do it for other reasons),
because a single remote repo might be connected via both XMPP and local
pairing. So a way is needed to know when a push was received from any
client using a given XMPP remote over XMPP, as opposed to via ssh.
Make manualPull send push requests over XMPP.
When reconnecting with remotes, those that are XMPP remotes cannot
immediately be pulled from and scanned, so instead maintain a set of
(probably) desynced remotes, and put XMPP remotes on it. (This set could be
used in other ways later, if we can detect we're out of sync with other
types of remotes.)
The merger handles detecting when an XMPP push is received from a desynced
remote, and triggers a scan then, if they have in fact diverged.
This has one known bug: A single XMPP remote can have multiple clients
behind it. When this happens, only the UUID of one client is recorded
as the UUID of the XMPP remote. Pushes from the other XMPP clients will not
trigger a scan. If the client whose UUID is expected responds to the push
request, it'll work, but when that client is offline, we're SOL.
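A minimal sketch of the desynced set, with UUIDs as plain strings; all names here are hypothetical. It also reproduces the known bug: a push from a client whose UUID is not the one recorded for the XMPP remote does not trigger a scan.

```haskell
import qualified Data.Set as S

type UUID = String

-- Remotes that are (probably) out of sync with us.
type Desynced = S.Set UUID

-- On reconnect, XMPP remotes cannot be pulled from directly, so mark them.
markDesynced :: [UUID] -> Desynced -> Desynced
markDesynced xmppRemotes d = foldr S.insert d xmppRemotes

-- A push arrived over XMPP from UUID u. If u was desynced, remove it
-- and report that a scan is needed, but only if the branches diverged.
onXmppPush :: UUID -> Bool -> Desynced -> (Bool, Desynced)
onXmppPush u diverged d
  | S.member u d = (diverged, S.delete u d)
  | otherwise = (False, d)
```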
Clean up from 9769235d6b.
In some cases, looking up a remote by name even though it has no UUID is
desirable. This includes git annex sync, which can operate on remotes
without an annex, and XMPP pairing, which runs addRemote (which calls
byName) before the UUID of the XMPP remote has been configured in git.
Watcher wants to rewrite symlink to fix it. But in direct mode, the symlink
could be replaced at any time with file content that has finished being
transferred by some other process. So, just don't touch it.
FWIW, I audited the rest of the assistant for places where it removes
files, and the rest is ok. I have not audited the rest of git-annex.
assistant: Fix bug in direct mode that could occur when a symlink is moved
out of an archive directory, and resulted in the file not being set to
direct mode when it was transferred.
The bug was that the direct mode mapping was not up-to-date when the
transferrer finished. So, finding no direct mode place to store the object,
it was put into .git/annex in indirect mode.
To fix this, just make the watcher update the direct mode mapping to
include the new file before it starts the transfer. (Seems we don't need to
update it to remove the old file if the link was moved, because the direct
mode code will notice it's not present and the mapping gets updated for its
removal later.)
This was a race, and was probably not seen often, because
the committer came along and updated the direct mode mapping as part of
adding the moved symlink. But when the file was sufficiently small or
the remote sufficiently fast, the transfer could finish before that
happened.