* Only export to remotes that were initialized to support it.
* Prevent storing key/value on export remotes.
* Prevent enabling exporttree=yes and encryption in the same remote.
SetupStage Enable was changed to take the old RemoteConfig.
This allowed only setting exporttree when initially setting up a
remote, and not configuring it later after stuff might already be stored
in the remote.
Went with =yes rather than =true for consistency with other parts of
git-annex. Changed docs accordingly.
This commit was supported by the NSF-funded DataLad project.
This will allow disabling exports for remotes that are not configured to
allow them. Also, exportSupported will be useful for the external
special remote to probe.
This commit was supported by the NSF-funded DataLad project
This avoids needing to deal with the complexity of partially transferred
files in the export. We'd not be able to resume uploading to such a file
anyway, so just avoid them.
The implementation in Remote.Directory is not completely ideal, because
it could leave the temp file hanging around in the export directory.
This only happens if it's killed with -9, or there's a power failure;
normally viaTmp cleans up after itself, even when interrupted. I could
not see a better way to do it though, since the export directory might
be the root of a filesystem.
Also some design thoughts on resuming, which depend on storeExport being
atomic.
This commit was sponsored by Fernando Jimenez on Partreon.
Rather than providing the key to export, provide the file.
When exporting a treeish that contains files that are not annexed,
this will let the content of those files also be exported.
There's still a Key in the interface; it will be used by the external
special remote protocol. A SHA1 key can be used when exporting
non-annexed files.
This commit was sponsored by Brock Spratlen on Patreon.
Implemented so far for the directory special remote.
Several remotes don't make sense to export to. Regular Git remotes,
obviously, do not. Bup remotes almost certianly do not, since bup would
need to be used to extract the export; same store for Ddar. Web and
Bittorrent are download-only. GCrypt is always encrypted so exporting to
it would be pointless. There's probably no point complicating the Hook
remotes with exporting at this point. External, S3, Glacier, WebDAV,
Rsync, and possibly Tahoe should be modified to support export.
Thought about trying to reuse the storeKey/retrieveKeyFile/removeKey
interface, rather than adding a new interface. But, it seemed better to
keep it separate, to avoid a complicated interface that sometimes
encrypts/chunks key/value storage and sometimes users non-key/value
storage. Any common parts can be factored out.
Note that storeExport is not atomic.
doc/design/exporting_trees_to_special_remotes.mdwn has some things in
the "resuming exports" section that bear on this decision. Basically,
I don't think, at this time, that an atomic storeExport would help with
resuming, because exports are not key/value storage, and we can't be
sure that a partially uploaded file is the same content we're currently
trying to export.
Also, note that ExportLocation will always use unix path separators.
This is important, because users may export from a mix of windows and
unix, and it avoids complicating the API with path conversions,
and ensures that in such a mix, they always use the same locations for
exports.
This commit was sponsored by Bruno BEAUFILS on Patreon.
Security fix: Disallow hostname starting with a dash, which would get
passed to ssh and be treated an option. This could be used by an attacker
who provides a crafted ssh url (for eg a git remote) to execute arbitrary
code via ssh -oProxyCommand.
No CVE has yet been assigned for this hole.
The same class of security hole recently affected git itself,
CVE-2017-1000117.
Method: Identified all places where ssh is run, by git grep '"ssh"'
Converted them all to use a SshHost, if they did not already, for
specifying the hostname.
SshHost was made a data type with a smart constructor, which rejects
hostnames starting with '-'.
Note that git-annex already contains extensive use of Utility.SafeCommand,
which fixes a similar class of problem where a filename starting with a
dash gets passed to a program which treats it as an option.
This commit was sponsored by Jochen Bartl on Patreon.
External special remotes will refuse to operate on keys with spaces in
their names. That has never worked correctly due to the design of the
external special remote protocol. Display an error message suggesting
migration.
Not super happy with this, but it's a pragmatic solution. Better than
complicating the external special remote interface and all external special
remotes.
Note that I only made it use SafeKey in Request, not Response. git-annex
does not construct a Response, so that would not add any safety. And
presumably, if git-annex avoids feeding any such keys to an external
special remote, it will never have a reason to make a Response using such a
key. If it did, it would result in a protocol error anyway.
There's still a Serializeable instance for Key; it's used by P2P.Protocol.
There, the Key is always in the final position, so it's ok if it contains
spaces.
Note that the protocol documentation has been fixed to say that the File
may contain spaces. One way that can happen, even though the Key can't,
is when using direct mode, and the work tree filename contains spaces.
When sending such a file to the external special remote the worktree
filename is used.
This commit was sponsored by Thom May on Patreon.
Added remote configuration settings annex-ignore-command and
annex-sync-command, which are dynamic equivilants of the annex-ignore
and annex-sync configurations.
For this I needed a new DynamicConfig infrastructure. Its implementation
should be as fast as before when there is no dynamic config, and it caches
so shell commands are only run once.
Note that annex-ignore-command exits nonzero when the remote should be ignored.
While that may seem backwards, it allows using the same command for it as
for annex-sync-command when you want to disable both.
This commit was sponsored by Trenton Cronholm on Patreon.
Removed dependency on MissingH, instead depending on the split
library.
After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.
Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.
This commit was sponsored by Shane-o on Patreon.
This was never supported before. And it doesn't re-encrypt the
gcrypt repo to the new gcrypt-participants, but it does at least now not
crash, and set gcrypt-participants.
This commit was sponsored by andrea rota.
They are handled close the same as they are by git. However, unlike git,
git-annex sometimes needs to pass the -n parameter when using these.
So, this has the potential for breaking some setup, and perhaps there ought
to be a ANNEX_USE_GIT_SSH=1 needed to use these. But I'd rather avoid that
if possible, so let's see if anyone complains.
Almost all places where "ssh" was run have been changed to support the env
vars. Anything still calling sshOptions does not support them. In
particular, rsync special remotes don't. Seems that annex-rsync-transport
already gives sufficient control there.
(Fixed in passing: Remote.Helper.Ssh.toRepo used to extract
remoteAnnexSshOptions and pass them to sshOptions, which was redundant
since sshOptions also extracts those.)
This commit was sponsored by Jeff Goeke-Smith on Patreon.
findShellCommand needs a full path to a file in order to check it for a
shebang on Windows. It was being run with only the base name of the external
special remote program, which would only work when it was in the current
directory.
This is why users in
https://github.com/DanielDent/git-annex-remote-rclone/pull/10 and elsewhere
were complaining that the previous improvements to git-annex didn't make
git-remote-rclone work on Windows.
Also, reworked checkearlytermination, which while it worked, seemed
to rely on a race condition. And, improved its error messages.
This commit was sponsored by Shane-o on Patreon.
sync: When syncing with a local repository located on a crippled
filesystem, run the post-receive hook there, since it wouldn't get run
otherwise. This makes pushing to repos on FAT-formatted removable drives
update them when receive.denyCurrentBranch=updateInstead.
Made Remote.Git export onLocal, which was cleaned up to not have so many
caveats about its use.
This commit was sponsored by Jeff Goeke-Smith on Patreon.
... to avoid it consuming stdin that it shouldn't.
This fixes git-annex-checkpresentkey --batch remote, which didn't output
results for all keys passed into it.
Other git-annex commands that communicate with a remote over ssh may also
have been consuming stdin that they shouldn't have, which could have
impacted using them in eg, shell scripts. For example, a shell script
reading files from stdin and passing them to git annex drop would be
impacted by this bug, whenever git annex drop ran git-annex-shell
checkpresent, it would consume part/all of the stdin that the shell script
was supposed to consume.
Fixed by adding a ConsumeStdin parameter to Annex.Ssh.sshOptions, which
is used throughout git-annex to run ssh (in order for ssh connection
caching to work). Every call site was checked to see if it used
CreatePipe for stdin, and if not was marked NoConsumeStdin.
The check was broken in two ways.. First, nowhere did it error out when
checkUUIDFile found a different UUID already in the file. Instead,
it overwrote the uuid file.
And, checkUUIDFile's implementation was for some reason always failing with
a ConnectionClosed exception. Apparently something to do with using two
different runResourceT's and a response getting GCed inbetween. I'm pretty
sure that used to work, but changed to a more obviously correct
implementation.
This commit was sponsored by Peter Hogg on Patreon.
Most remotes have an idempotent setup that can be reused for
enableremote, but in a few cases, it needs to tell which, and whether
a UUID was provided to setup was used.
This is groundwork for making initremote be able to provide a UUID.
It should not change any behavior.
Note that it would be nice to make the UUID always be provided to setup,
and make setup not need to generate and return a UUID. What prevented
this simplification is Remote.Git.gitSetup, which needs to reuse the
UUID of the git remote when setting it up, and so has to return that
UUID.
This commit was sponsored by Thom May on Patreon.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.
As well as being an optimisation, this helps move toward eliminating use of
missingh.
(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)
I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
The attacker could just send a very lot of data, with no \n and it would
all be buffered in memory until the kernel killed git-annex or perhaps OOM
killed some other more valuable process.
This is a low impact security hole, only affecting communication between
local git-annex and git-annex-shell on the remote system. (With either
able to be the attacker). Only those with the right ssh key can do it. And,
there are probably lots of ways to construct git repositories that make git
use a lot of memory in various ways, which would have similar impact as
this attack.
The fix in P2P/IO.hs would have been higher impact, if it had made it to a
released version, since it would have allowed DOSing the tor hidden
service without needing to authenticate.
(The LockContent and NotifyChanges instances may not be really
exploitable; since the line is read and ignored, it probably gets read
lazily and does not end up staying buffered in memory.)
Display progress meter on send and receive from remote.
Added a new hGetMetered that can read an exact number of bytes (or
less), updating a meter as it goes.
This commit was sponsored by Andreas on Patreon.
In case the repo on the peer changes uuid (eg by a new repo being moved
into place).
Also, added some warning messages when unable to communicate with a
peer.
This commit was sponsored by Anthony DeRobertis on Patreon.
Not tested at all, but it just might work.
Only known problem is that progress is not updated when storing to a P2P
remote.
This commit was sponsored by Nick Daly on Patreon.
Similar to GCrypt remotes, P2P remotes have an url, so Remote.Git has to
separate them out and handle them, passing off to Remote.P2P.
This commit was sponsored by Ignacio on Patreon.
git upload-pack makes some uncessary writes in sequence, this tries to
gather them together to avoid needing to send multiple DATA packets when
just one will do.
In a small pull, this reduces the average number of DATA packets from
4.5 to 2.5.
Still a couple bugs:
* Closing the connection to the server leaves git upload-pack /
receive-pack running, which could be used to DOS.
* Sometimes the data is transferred, but it fails at the end, sometimes
with:
git-remote-tor-annex: <socket: 10>: commitBuffer: resource vanished (Broken pipe)
Must be a race condition around shutdown.
Almost working, but there's a bug in the relaying.
Also, made tor hidden service setup pick a random port, to make it harder
to port scan.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
This is most of the way to having the p2p protocol working over tor
hidden services, at least enough to do git push/pull.
The free monad was split into two, one for network operations and the
other for local (Annex) operations. This will allow git-remote-tor-annex
to run only an IO action, not needing the Annex monad.
This commit was sponsored by Remy van Elst on Patreon.
A bit tricky since Proto doesn't support threads. Rather than adding
threading support to it, ended up using a callback that waits for both
data on a Handle, and incoming messages at the same time.
This commit was sponsored by Denis Dzyubenko on Patreon.
Is content locking needed in the P2P protocol? Based on re-reading
bugs/concurrent_drop--from_presence_checking_failures.mdwn,
I think so: Peers can form cycles, and multiple peers can all be trying
to drop the same content.
So, added content locking to the protocol, with some difficulty.
The implementation is fine as far as it goes, but note the warning
comment for lockContentWhile -- if the connection to the peer is dropped
unexpectedly, the peer will then unlock the content, and yet the local
side will still think it's locked.
To be honest I'm not sure if Remote.Git's lockKey for ssh remotes
doesn't have the same problem. It checks that the
"ssh remote git-annex-shell lockcontent"
process has not exited, but if the connection closes afer that check,
the lockcontent command will unlock it, and yet the local side will
still think it's locked.
Probably this needs to be fixed by eg, making lockcontent catch any
execptions due to the connection closing, and in that case, wait a
significantly long time before dropping the lock.
This commit was sponsored by Anthony DeRobertis on Patreon.
For use with tor hidden services, and perhaps other transports later.
Based on Utility.SimpleProtocol, it's a line-based protocol,
interspersed with transfers of bytestrings of a specified size.
Implementation of the local and remote sides of the protocol is done
using a free monad. This lets monadic code be included here, without
tying it to any particular way to get bytes peer-to-peer.
This adds a dependency on the haskell package "free", although that
was probably pulled in transitively from other dependencies already.
This commit was sponsored by Jeff Goeke-Smith on Patreon.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.
Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.
This commit was sponsored by Ethan Aubin.
* S3: Support the special case endpoint needed for the cn-north-1 region.
* Webapp: Don't list the Frankfurt region, as this (and some other new
regions) need V4 authorization which the aws library does not yet use.
This commit was sponsored by Nick Daly on Patreon.
If a transfer fails for some reason, but some data managed to be sent, the
transfer will be retried. (The assistant already did this.)
Possible impacts:
* More ssh prompts if ssh needs to prompt for a password to connect to a
host, or is prompting about some other problem like a ssh key mismatch.
* More data transfer due to retrying, epecially when a remote does not
support resuming a transfer.
In the worst case, a lot of data will be transferred but it fails before
the end, and then all that data gets transferred again plus one byte more;
repeat until it manages to get the whole file.
Multiple external special remote processes for the same remote will be
started as needed when using -J.
This should not beak any existing external special remotes, because running
multiple git-annex commands at the same time could already start multiple
processes for the same external special remotes.
Note that get --from foo --failed will get things that a previous get --from bar
tried and failed to get, etc. I considered making --failed only retry
transfers from the same remote, but it was easier, and seems more useful,
to not have the same remote requirement.
Noisy due to some refactoring into Types/
Ignore exceptions when getting the cost and availability for the remote,
and return sane defaults. These defaults are not cached, so if a special
remote program has a transient problem, it will re-query it later.
Removed the instance LensGpgEncParams RemoteConfig because it encouraged
code that does not take the RemoteGitConfig into account.
RemoteType's setup was changed to take a RemoteGitConfig,
although the only place that is able to provide a non-empty one is
enableremote, when it's changing an existing remote. This led to several
folow-on changes, and got RemoteGitConfig plumbed through.
This is useful for makking a special remote that anyone with a clone of the
repo and your public keys can upload files to, but only you can decrypt the
files stored in it.
The naming is unofrtunately not consistent, but the gnupg-options
were only used for encrypting, and it's too late to change that.
It would be nice to have a third setting that is always passed to gnupg,
but ~/.gnupg/options can be used to specify such global options when really
needed.
Since git-annex unsets these when started, they have to be explicitly
propigated. Also, this makes --git-dir and --work-tree settings be
reflected in the environment.
The need for this came up in
https://github.com/DanielDent/git-annex-remote-rclone/issues/3