Added to change notification to P2P protocol.
Switched to a TBChan so that a single long-running thread can be
started, and serve perhaps intermittent requests for change
notifications, without buffering all changes in memory.
The P2P runner currently starts up a new thread each times it waits
for a change, but that should allow later reusing a thread. Although
each connection from a peer will still need a new watcher thread to run.
The dependency on stm-chans is more or less free; some stuff in yesod
uses it, so it was already indirectly pulled in when building with the
webapp.
This commit was sponsored by Francois Marier on Patreon.
The attacker could just send a very lot of data, with no \n and it would
all be buffered in memory until the kernel killed git-annex or perhaps OOM
killed some other more valuable process.
This is a low impact security hole, only affecting communication between
local git-annex and git-annex-shell on the remote system. (With either
able to be the attacker). Only those with the right ssh key can do it. And,
there are probably lots of ways to construct git repositories that make git
use a lot of memory in various ways, which would have similar impact as
this attack.
The fix in P2P/IO.hs would have been higher impact, if it had made it to a
released version, since it would have allowed DOSing the tor hidden
service without needing to authenticate.
(The LockContent and NotifyChanges instances may not be really
exploitable; since the line is read and ignored, it probably gets read
lazily and does not end up staying buffered in memory.)
Would have liked to make the Parser parse the file and key pairs, but it
seems that optparse-applicative is unable to handle eg:
many ((,) <$> argument <*> argument)
This commit was sponsored by Thomas Hochstein on Patreon.
* rmurl: Multiple pairs of files and urls can be provided on the
command line.
* rmurl: Added --batch mode.
This commit was sponsored by Trenton Cronholm on Patreon.
* map: Run xdot if it's available in PATH. On OSX, the dot command
does not support graphical display, while xdot does.
* Debian: xdot is a better interactive viewer than dot, so Suggest
xdot, rather than graphviz.
The file matcher needs to be run on the destination file not the tmp
file, in order for filename matches to work properly. However, it also
needs to be able to probe the file for size and mime type.
This is a quick fix to a regression. The double rename is not pretty.
It would be good to either have a way to run the largeFileMatcher
such that it is matching on the final filename but looks at the temp
file, or to make addAnnexedFile not need the temp file in a different
location.
Almost working, but there's a bug in the relaying.
Also, made tor hidden service setup pick a random port, to make it harder
to port scan.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.
Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.
This commit was sponsored by Ethan Aubin.
This makes merging a remote into a freshly created direct mode repository
work the same as it works in indirect mode.
The git-annex branches would get merged in any case by a sync,
since that doesn't use git merge.
This might need to be revisited later to better mirror git's behavior.
This avoids needing to bind to the right port before something else
does.
The socket is in /var/run/user/$uid/ which ought to be writable by only
that uid. At least it is on linux systems using systemd.
For Windows, may need to revisit this and use ports or something.
The first version of tor to support sockets for hidden services
was 0.2.6.3. That is not in Debian stable, but is available in
backports.
This commit was sponsored by andrea rota.
Tor unfortunately does not come out of the box configured to let hidden
services register themselves on the fly via the ControlPort.
And, changing the config to enable the ControlPort and a particular type
of auth for it may break something already using the ControlPort, or
lessen the security of the system.
So, this leaves only one option to us: Add a hidden service to the
torrc. git-annex enable-tor does so, and picks an unused high port for
tor to listen on for connections to the hidden service.
It's up to the caller to somehow pick a local port to listen on
that won't be used by something else. That may be difficult to do..
This commit was sponsored by Jochen Bartl on Patreon.
If a transfer fails for some reason, but some data managed to be sent, the
transfer will be retried. (The assistant already did this.)
Possible impacts:
* More ssh prompts if ssh needs to prompt for a password to connect to a
host, or is prompting about some other problem like a ssh key mismatch.
* More data transfer due to retrying, epecially when a remote does not
support resuming a transfer.
In the worst case, a lot of data will be transferred but it fails before
the end, and then all that data gets transferred again plus one byte more;
repeat until it manages to get the whole file.
Closes https://github.com/datalad/datalad/issues/1020
The use of runWriter in scanUnlockedFiles broke due to this change;
it failed with blocked indefinitely in mvar, because the database write
handle was taken while linkFromAnnex needed to also write to it (to update
the inode cache). So, switched to using a separate runWriter for each call
to addAssociatedFileFast. A little less efficient, but not greatly; the
writes should all still be cached.
In the case where the pointer file is in place, and not the content
of the object, lock's performNew was called with filemodified=True,
which caused it to try to repopulate the object from an unmodified
associated file, of which there were none. So, the content of the object
got thrown away incorrectly. This was the cause (although not the root
cause) of data loss in https://github.com/datalad/datalad/issues/1020
The same problem could also occur when the work tree file is modified,
but the object is not, and lock is called with --force. Added a test case
for this, since it's excercising the same code path and is easier to set up
than the problem above.
Note that this only occurred when the keys database did not have an inode
cache recorded for the annex object. Normally, the annex object would be in
there, but there are of course circumstances where the inode cache is out
of sync with reality, since it's only a cache.
Fixed by checking if the object is unmodified; if so we don't need to
try to repopulate it. This does add an additional checksum to the unlock
path, but it's already checksumming the worktree file in another case,
so it doesn't slow it down overall.
Further investigation found a similar problem occurred when smudge --clean
is called on a file and the inode cache is not populated. cleanOldKeys
deleted the unmodified old object file in this case. This was also
fixed by checking if the object is unmodified.
In general, use of getInodeCaches and sameInodeCache is potentially
dangerous if the inode cache has not gotten populated for some reason.
Better to use isUnmodified. I breifly auited other places that check the
inode cache, and did not see any immediate problems, but it would be easy
to miss this kind of problem.
Fixes a bug introduced with v6 mode that I didn't notice until now.
Probably not many v3 repos left out there, and upgrading them to v6 mode
is not disastrous, only a little premature.
This commit was sponsored by Riku Voipio
* sync: Previously, when run in a branch with a slash in its name,
such as "foo/bar", the sync branch was "synced/bar". That conflicted
with the sync branch used for branch "bar", so has been changed to
"synced/foo/bar".
* adjust: Previously, when adjusting a branch with a slash in its name,
such as "foo/bar", the adjusted branch was "adjusted/bar(unlocked)".
That conflicted with the adjusted branch used for branch "bar",
so has been changed to "adjusted/foo/bar(unlocked)"
* Also, running sync in an adjusted branch did not correctly sync
changes back to the parent branch when it had a slash in its name.
This bug has been fixed.
Eliminate use of Git.Ref.under and Git.Ref.basename; using
Git.Ref.underBase and Git.Ref.base make everything handle deep branches
correctly.
Probably noone was adjusting deep branches, and v6 is still experimental
anyway, so I'm not going to worry about the mess that was left by that bug.
In the case of git-annex sync, using a fixed git-annex with an old unfixed
one will mean they use different sync branches for a deep branch, and so
they may stop syncing until the old one is upgraded. However, that's only
a problem when syncing between repositories without going via a central
bare repository. Added a warning about this to the CHANGELOG, but it's
probably not going to affect many people at all.
This commit was sponsored by Riku Voipio.
Avoid threads emitting json at the same time and scrambling, which was
still possible even with the buffering, just less likely.
Converted json IO actions to JSONChunk data too.
This makes -Jn work with --json and --quiet, where before
setting -Jn disabled those options.
Concurrent json output is currently a mess though since threads output
chunks over top of one-another.