Unfortunately ReceiveMessage didn't handle unknown messages the way it
was documented to; client sending VERSION would cause the server to
return an ERROR and hang up. Fixed that, but old releases of git-annex
use the P2P protocol for tor and will still have that behavior.
So, version is not negotiated for Remote.P2P connections, only for
Remote.Git connections, which will support VERSION from their first
release. There will need to be a later flag day to change Remote.P2P;
left a commented out line that is the only thing that will need to be
changed then.
Version 1 of the P2P protocol is not implemented yet, but updated
the docs for the DATA change that will be allowed by that version.
This commit was sponsored by Jeff Goeke-Smith on Patreon.
The former can be useful to make remotes that don't get fully synced with
local changes, which comes up in a lot of situations.
The latter was mostly added for symmetry, but could be useful (though less
likely to be).
Implementing `remote.<name>.annex-pull` was a bit tricky, as there's no one
place where git-annex pulls/fetches from remotes. I audited all
instances of "fetch" and "pull". A few cases were left not checking this
config:
* Git.Repair can try to pull missing refs from a remote, and if the local
repo is corrupted, that seems a reasonable thing to do even though
the config would normally prevent it.
* Assistant.WebApp.Gpg and Remote.Gcrypt and Remote.Git do fetches
as part of the setup process of a remote. The config would probably not
be set then, and having the setup fail seems worse than honoring it if it
is already set.
I have not prevented all the code that does a "merge" from merging branches
from remotes with remote.<name>.annex-pull=false. That could perhaps
be done, but it would need a way to map from branch name to remote name,
and the way refspecs work makes that hard to get really correct. So if the
user fetches manually, the git-annex branch will get merged, for example.
Anther way of looking at/justifying this is that the setting is called
"annex-pull", not "annex-merge".
This commit was supported by the NSF-funded DataLad project.
... to avoid it consuming stdin that it shouldn't.
This fixes git-annex-checkpresentkey --batch remote, which didn't output
results for all keys passed into it.
Other git-annex commands that communicate with a remote over ssh may also
have been consuming stdin that they shouldn't have, which could have
impacted using them in eg, shell scripts. For example, a shell script
reading files from stdin and passing them to git annex drop would be
impacted by this bug, whenever git annex drop ran git-annex-shell
checkpresent, it would consume part/all of the stdin that the shell script
was supposed to consume.
Fixed by adding a ConsumeStdin parameter to Annex.Ssh.sshOptions, which
is used throughout git-annex to run ssh (in order for ssh connection
caching to work). Every call site was checked to see if it used
CreatePipe for stdin, and if not was marked NoConsumeStdin.
weasel explained that apparmor limits on what files tor can read do not
apply to sockets (because they're not files). And apparently the
problems I was seeing with hidden services not being accessible had to
do with onion address propigation and not the location of the socket
file.
remotedaemon looks up the HiddenServicePort in torrc, so if it was
previously configured with the socket in /etc, that will still work.
This commit was sponsored by Denis Dzyubenko on Patreon.
Old stm lacks isFullTMQueue.
To avoid needing to update stm on the Android autobuilder, I switched to
a TBMQueue. It never needs to be closed, but the overhead is minimal.
10 seemed too low because more than 10 friends could be linked to a repo
over tor, and if all were running the remotedaemon, which makes a
persistent connection for change notification, then the 11th friend
would not be able to access that repo.
100 might be too low, but it's a much larger group of people. And at
that size group, it probably makes sense to structure the network so
that 100 peers are not all trying to access one central node.
This is more efficient. Note that the peer will get CHANGED messages for
all refs changed since the connection opened, even if those changes
happened before it sent NOTIFYCHANGE.
Added to change notification to P2P protocol.
Switched to a TBChan so that a single long-running thread can be
started, and serve perhaps intermittent requests for change
notifications, without buffering all changes in memory.
The P2P runner currently starts up a new thread each times it waits
for a change, but that should allow later reusing a thread. Although
each connection from a peer will still need a new watcher thread to run.
The dependency on stm-chans is more or less free; some stuff in yesod
uses it, so it was already indirectly pulled in when building with the
webapp.
This commit was sponsored by Francois Marier on Patreon.
The attacker could just send a very lot of data, with no \n and it would
all be buffered in memory until the kernel killed git-annex or perhaps OOM
killed some other more valuable process.
This is a low impact security hole, only affecting communication between
local git-annex and git-annex-shell on the remote system. (With either
able to be the attacker). Only those with the right ssh key can do it. And,
there are probably lots of ways to construct git repositories that make git
use a lot of memory in various ways, which would have similar impact as
this attack.
The fix in P2P/IO.hs would have been higher impact, if it had made it to a
released version, since it would have allowed DOSing the tor hidden
service without needing to authenticate.
(The LockContent and NotifyChanges instances may not be really
exploitable; since the line is read and ignored, it probably gets read
lazily and does not end up staying buffered in memory.)
Each worker thread needs to run in the Annex monad, but the
remote-daemon's liftAnnex can only run 1 action at a time. Used
Annex.Concurrent to deal with that.
P2P.Annex is incomplete as of yet.
Almost working, but there's a bug in the relaying.
Also, made tor hidden service setup pick a random port, to make it harder
to port scan.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
* sync: Use the ssh-options git config when doing git pull and push.
* remotedaemon: Use the ssh-options git config.
Note that the rename env var means that if a new git-annex calls an old one
for git-annex ssh, or a new calls an old, nothing much will go wrong;
just ssh caching won't happen.
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.
Done as a background task while listening to episode 2 of the Type Theory
podcast.
Removed old extensible-exceptions, only needed for very old ghc.
Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.
Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.
However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.
Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
For sync, saves 1 ssh connection per remote. For remotedaemon, the same
ssh connection that is already open to run git-annex-shell notifychanges
is reused to pull from the remote.
Only potential problem is that this also enables connection caching
when the assistant syncs with a ssh remote. Including the sync it does
when a network connection has just come up. In that case, cached ssh
connections are likely to be stale, and so using them would hang.
Until I'm sure such problems have been dealt with, this commit needs to
stay on the remotecontrol branch, and not be merged to master.
This commit was sponsored by Alexandre Dupas.
* Remote system might be available, and connection get lost. Should
reconnect, but needs to avoid bad behavior (ie, constant reconnect
attempts.) Use exponential backoff.
* Detect if old system had a too old git-annex-shell, and show the user
a nice message in the webapp. Required parsing error messages, so perhaps
this code shoudl be removed once enough time has passed..
* Switch the protocol to using remote URI's, rather than remote names.
Names change. Also avoids issues with serialization of names containing
whitespace.
This is nearly ready for merge into master now. I'd still like to make the ssh
transport smarter about reusing ssh connection caching during git pull.
This commit was sponsored by Jim Paris.