Commit graph

312 commits

Author SHA1 Message Date
Joey Hess
d3ab5e626b
rename key2file and file2key
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.

Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
2019-01-14 13:03:35 -04:00
Joey Hess
370757087d
catch lockContentForRemoval exception
removeKey should not throw exceptions, so catch exception there

In Assistant.Unused, keep trying to drop other keys if one drop fails
2018-11-15 15:39:57 -04:00
Joey Hess
303d10cee6
Improve display when git config download from a http remote fails.
The error message displayed used to only come from curl/wget and perhaps
was clearer than the one displayed now that http-client is used. In any
case, it does make sense to hide it because git-annex prints its own
warning message.

This commit was sponsored by Jake Vosloo on Patreon.
2018-10-03 12:31:09 -04:00
Joey Hess
02630b39ee
add Remote.readonly
Does nothing yet.

Considered making bup readonly, but while the content can't be removed,
it is able to delete a branch, so didn't.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:12:18 -04:00
Joey Hess
ae11394efa
added annex.commitmessage
Added annex.commitmessage config that can specify a commit message for the
git-annex branch instead of the usual "update".

This commit was supported by the NSF-funded DataLad project.
2018-08-02 14:06:06 -04:00
Joey Hess
9f3a346f25
fix nested exception bug
Fix reversion introduced in version 6.20180316 that caused git-annex to
stop processing files when unable to contact a ssh remote.

The bug was not in any of the changed lines, but this one in inAnnex:

P2PHelper.checkpresent (Ssh.runProto rmt connpool (cantCheck rmt) fallback) key

cantCheck throws an exception, but that parameter to runProto expects a
value, which it returns. So, inAnnex is returning a Bool containing an
exception. This defeats the usual checks for checkPresent throwing an
exception, crashing git-annex.

Fixed by making runProto take an `Annex a` instead of an `a`, so
passing cantCheck to it doesn't nest exceptions.

This commit was sponsored by andrea rota.
2018-07-03 13:10:43 -04:00
Joey Hess
b657242f5d
enforce retrievalSecurityPolicy
Leveraged the existing verification code by making it also check the
retrievalSecurityPolicy.

Also, prevented getViaTmp from running the download action at all when the
retrievalSecurityPolicy is going to prevent verifying and so storing it.

Added annex.security.allow-unverified-downloads. A per-remote version
would be nice to have too, but would need more plumbing, so KISS.
(Bill the Cat reference not too over the top I hope. The point is to
make this something the user reads the documentation for before using.)

A few calls to verifyKeyContent and getViaTmp, that don't
involve downloads from remotes, have RetrievalAllKeysSecure hard-coded.
It was also hard-coded for P2P.Annex and Command.RecvKey,
to match the values of the corresponding remotes.

A few things use retrieveKeyFile/retrieveKeyFileCheap without going
through getViaTmp.
* Command.Fsck when downloading content from a remote to verify it.
  That content does not get into the annex, so this is ok.
* Command.AddUrl when using a remote to download an url; this is new
  content being added, so this is ok.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-06-21 13:37:01 -04:00
Joey Hess
4315bb9e42
add retrievalSecurityPolicy
This will be used to protect against CVE-2018-10859, where an encrypted
special remote is fed the wrong encrypted data, and so tricked into
decrypting something that the user encrypted with their gpg key and did
not store in git-annex.

It also protects against CVE-2018-10857, where a remote follows a http
redirect to a file:// url or to a local private web server. While that's
already been prevented in git-annex's own use of http, external special
remotes, hooks, etc use other http implementations and could still be
vulnerable.

The policy is not yet enforced, this commit only adds the appropriate
metadata to remotes.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-06-21 11:36:36 -04:00
Joey Hess
0f566ed242
removal of the rest of remoteGitConfig
In keyUrls, the GitConfig is used only by annexLocations
to support configured Differences. Since such configurations affect all
clones of a repository, the local repo's GitConfig must have the same
information as the remote's GitConfig would have. So, used getGitConfig
to get the local GitConfig, which is cached and so available cheaply.

That actually fixed a bug noone had ever noticed: keyUrls is
used for remotes accessed over http. The full git config of such a
remote is normally not available, so the remoteGitConfig that keyUrls
used would not have the necessary information in it.

In copyFromRemoteCheap', it uses gitAnnexLocation,
which does need the GitConfig of the remote repo itself in order to
check if it's crippled, supports symlinks, etc. So, made the
State include that GitConfig, cached. The use of gitAnnexLocation is
within a (not $ Git.repoIsUrl repo) guard, so it's local, and so
its git config will always be read and available.

(Note that gitAnnexLocation in turn calls annexLocations, so the
Differences config it uses in this case comes from the remote repo's
GitConfig and not from the local repo's GitConfig. As explained above
this is ok since they must have the same value.)

Not very happy with this mess of different GitConfigs not type-safe and
some read only sometimes etc. Very hairy. Think I got it this change
right. Test suite passes..

This commit was sponsored by Ethan Aubin.
2018-06-05 14:48:37 -04:00
Joey Hess
fc5888300f
fix annex-checkuuid
Fixed annex-checkuuid implementation, so that remotes configured that way
can be used. This was 100% broken from the first commit of it, oops.

This commit was sponsored by Øyvind Andersen Holm.
2018-06-04 16:52:22 -04:00
Joey Hess
67e46229a5
change Remote.repo to Remote.getRepo
This is groundwork for letting a repo be instantiated the first time
it's actually used, instead of at startup.

The only behavior change is that some old special cases for xmpp remotes
were removed. Where before git-annex silently did nothing with those
no-longer supported remotes, it may now fail in some way.

The additional IO action should have no performance impact as long as
it's simply return.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2018-06-04 15:30:26 -04:00
Joey Hess
c34152777b
Use http-conduit for url downloads by default, annex.web-options enables curl
* For url downloads, git-annex now defaults to using a http library,
  rather than wget or curl. But, if annex.web-options is set, it will
  use curl. To use the .netrc file, run:
    git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
  git-annex builds).

Note that curl is always run in silent mode, since the new API for
download has a MeterUpdate and doesn't make way for curl progress
output. It might be worth writing a parser for curl's progress output
to update the meter when using it, but I didn't bother with this edge
case for now.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 17:36:20 -04:00
Joey Hess
9b98d3f630
better HTTP connection reuse
Enable HTTP connection reuse across multiple files, when git-annex
uses http-conduit. Before, a new Manager was created each time
Utility.Url used it. Now, a single Manager gets created the first time,
so connections are reused.

Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.

Testing fsck --fast --from web with 3 files, over high-latency
satellite internet, it sped up from 19.37s to 14.96s.

This commit was supported by the NSF-funded DataLad project.
2018-04-04 15:39:40 -04:00
Joey Hess
2ec07bc29f
Avoid running annex.http-headers-command more than once. 2018-04-04 15:15:08 -04:00
Joey Hess
46d4316954
implement annex.retry et al
Added annex.retry, annex.retry-delay, and per-remote versions to configure
transfer retries.

This commit was supported by the NSF-funded DataLad project.
2018-03-29 13:04:07 -04:00
Joey Hess
31e1adc005
deal with unlocked files
P2P protocol version 1 adds VALID|INVALID after DATA; INVALID means the
file was detected to change content while it was being sent and so we
may not have received the valid content of the file.

Added new MustVerify constructor for Verification, which forces
verification even when annex.verify=false etc. This is used when INVALID
and in protocol version 0.

As well as changing git-annex-shell p2psdio, this makes git-annex tor
remotes always force verification, since they don't yet use protocol
version 1. Previously, annex.verify=false could skip verification when
using tor remotes, and let bad data into the repository.

This commit was sponsored by Jack Hill on Patreon.
2018-03-13 14:27:14 -04:00
Joey Hess
b96b845ffd
fix nested progress meters when using git-annex-shell fallback
Caused an ugly blank line when the first progress meter was not used,
but also it may have confused -J display.
2018-03-12 19:20:10 -04:00
Joey Hess
1c2c8995ac
hide rsync progress output when metered but not in other uses of rsync 2018-03-12 18:36:07 -04:00
Joey Hess
cb05ef06bf
fix lost metering for fallback rsyncs
08814327ff accidentially got rid of it,
when it removed commandMetered.
2018-03-12 18:22:48 -04:00
Joey Hess
c3df5d1f10
avoid double-connect to unreachable ssh remote
When git-annex-shell p2pstdio fails with 255, it's because the ssh
server is not reachable. Avoid running the fallback action in this case,
since it would just try a second time to connect, and presumably fail.

Note that the closed P2PSshConnection will not be stored in the pool,
so the next request tries again to connect. This is just the right
behavior; when the remote becomes reachable again, the same git-annex
process will start using it.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-03-12 16:50:21 -04:00
Joey Hess
d7f54671bf
refactoring 2018-03-09 13:48:10 -04:00
Joey Hess
936ab43932
use P2P for locking keys
The P2P protocol is now fully used for git-annex-shell.

This commit was sponsored by Ewen McNeill on Patreon.
2018-03-09 13:42:55 -04:00
Joey Hess
08814327ff
use P2P protocol for checkpresent, retrieve, and store
Note that, due to not using rsync to transfer files to ssh remotes
any longer, permissions and other file metadata of annexed files
will no longer be preserved when copying them to ssh remotes.
Other remotes never supported preserving that information, so
this is not considered a regression. Added NEWS item about this.

Another significant side effect of this is that, even when rsync is run to
retrieve a file, its progress display will no longer be shown, and
instead the native git-annex progress display will appear. It would be
possible to use the rsync process display when rsync is used (old
git-annex-shell and also retrieval from a local repository), but it
would have complicated the code unncessarily, and been inconsistent
behavior.

(I'd been thinking for a while about eliminating the rsync progress
display, since it's got some annoying verbosities, including display of
the key and the "(xfr#1, to-chk=0/1)" bit and was already somewhat
inconsistent.)

retrieveKeyFileCheap still uses rsync, since that ensures that it gets
the actual file content from the remote. Using the P2P protocol would
use the local content, as long as the local and remote size are the
same.

This commit was sponsored by John Pellman on Patreon.
2018-03-09 13:25:16 -04:00
Joey Hess
5bc0ab3f31
going AGPL
Remote/Git.hs now contains AGPL licensed code, thus the license
of git-annex as a whole is AGPL. This was already the case when git-annex
was built with the webapp enabled.

The AGPL license will apply to all code added to Remote/Git.hs in the
future, which is going to include support for using
`git-annex-shell p2pstdio`.
2018-03-09 01:03:46 -04:00
Joey Hess
6a59bc4845
use P2P protocol for drop
Not yet used for everything else, but this is enough to
verify that it works, and do some benchmarking.

Some bugfixes included, which got it working. Also fallback to old
actions has been verified to work correctly.

Benchmarked dropping one thousand files from a ssh remote on localhost.
Using the old git-annex	40.867 seconds.
With the P2P protocol	9.905 seconds!

This commit was sponsored by Jochen Bartl on Patreon.
2018-03-08 16:56:17 -04:00
Joey Hess
16af259209
refactor p2p remote action code
Make a Remote.Helper.P2P using code that was in Remote.P2P, converted to
use generic protocol runner actions.

This will allow it to be reused in Remote.Git.

This commit was sponsored by mo on Patreon.
2018-03-08 16:11:00 -04:00
Joey Hess
c036a380b2
p2p ssh connection pools
Much like Remote.P2P, there's a pool of connections to a peer, in order
to support concurrent operations.

Deals with old git-annex-ssh on the remote that does not support p2pstdio,
by only trying once to use it, and remembering if it's not supported.

Made p2pstdio send an AUTH_SUCCESS with its uuid, which serves the dual
purposes of something to detect to see that the connection is working,
and a way to verify that it's connected to the right uuid.
(There's a redundant uuid check since the uuid field is sent
by git_annex_shell, but I anticipate that being removed later when
the legacy git-annex-shell stuff gets removed.)

Not entirely happy with Remote.Git.runSsh's behavior
when the proto action fails. Running the fallback will work ok, but what
will we do when the fallbacks later get removed? It might be better to
try to reconnect, in case the connection got closed.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-03-08 15:11:31 -04:00
Joey Hess
f4103744c3
make sure that lockContentShared is always paired with an inAnnex check
lockContentShared had a screwy caveat that it didn't verify that the content
was present when locking it, but in the most common case, eg indirect mode,
it failed to lock when the content is not present.

That led to a few callers forgetting to check inAnnex when using it,
but the potential data loss was unlikely to be noticed because it only
affected direct mode I think.

Fix data loss bug when the local repository uses direct mode, and a
locally modified file is dropped from a remote repsitory. The bug
caused the modified file to be counted as a copy of the original file.
(This is not a severe bug because in such a situation, dropping
from the remote and then modifying the file is allowed and has the same
end result.)

And, in content locking over tor, when the remote repository is
in direct mode, it neglected to check that the content was actually
present when locking it. This could cause git annex drop to remove
the only copy of a file when it thought the tor remote had a copy.

So, make lockContentShared do its own inAnnex check. This could perhaps
be optimised for direct mode, to avoid the check then, since locking
the content necessarily verifies it exists there, but I have not bothered
with that.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-03-07 14:23:52 -04:00
Joey Hess
a28c541e23
add remote.<name>.annex-checkuuid
Added remote.<name>.annex-checkuuid config, which can be set to false to
disable the default checking of the uuid of remotes that point to
directories. This can be useful to avoid unncessary drive spin-ups and
automounting.

Note that the UUID check is still done before writing to the repository,
to avoid writing to the wrong repository if it got relocated. Check is
also done before checkPresent to avoid getting confused about what is in
which repo. This is effectively the same as the use of git-annex-shell
with a uuid to check that the remote repository is the expected one.
Did not bother with the check for retrieveKeyFile because it doesn't
matter if the wrong repo is used then.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-01-10 14:21:18 -04:00
Joey Hess
2b66492d6e
Improve startup time for commands that do not operate on remotes
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.

Got rid of the remotes member of Git.Repo. This was a bit painful.

Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.

This commit was sponsored by Jake Vosloo on Patreon.
2018-01-09 16:22:07 -04:00
Joey Hess
f5edb16729
Display progress meter when uploading a key without size information
Getting the size by statting the content file.

This commit was supported by the NSF-funded DataLad project.
2017-11-14 16:40:49 -04:00
Joey Hess
5c32196a37
fix process and FD leak
Fix process and file descriptor leak that was exposed when git-annex was
built with ghc 8.2.1. Apparently ghc has changed its behavior of GC
of open file handles that are pipes to running processes. That
broke git-annex test on OSX due to running out of FDs.

Audited for all uses of Annex.new and made stopCoProcesses be called
once it's done with the state. Fixed several places that might have
leaked in other situations than running the test suite.

This commit was sponsored by Ewen McNeill.
2017-09-29 22:36:08 -04:00
Joey Hess
16eb2f976c
prevent exporttree=yes on remotes that don't support exports
Don't allow "exporttree=yes" to be set when the special remote
does not support exports. That would be confusing since the user would
set up a special remote for exports, but `git annex export` to it would
later fail.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 13:48:44 -04:00
Joey Hess
28e2cad849
implement exporttree=yes configuration
* Only export to remotes that were initialized to support it.
* Prevent storing key/value on export remotes.
* Prevent enabling exporttree=yes and encryption in the same remote.

SetupStage Enable was changed to take the old RemoteConfig.
This allowed only setting exporttree when initially setting up a
remote, and not configuring it later after stuff might already be stored
in the remote.

Went with =yes rather than =true for consistency with other parts of
git-annex. Changed docs accordingly.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 13:09:38 -04:00
Joey Hess
a4328b49d2
refactor ExportActions
This will allow disabling exports for remotes that are not configured to
allow them. Also, exportSupported will be useful for the external
special remote to probe.

This commit was supported by the NSF-funded DataLad project
2017-09-01 13:05:09 -04:00
Joey Hess
e55e445a36
add API for exporting
Implemented so far for the directory special remote.

Several remotes don't make sense to export to. Regular Git remotes,
obviously, do not. Bup remotes almost certianly do not, since bup would
need to be used to extract the export; same store for Ddar. Web and
Bittorrent are download-only. GCrypt is always encrypted so exporting to
it would be pointless. There's probably no point complicating the Hook
remotes with exporting at this point. External, S3, Glacier, WebDAV,
Rsync, and possibly Tahoe should be modified to support export.

Thought about trying to reuse the storeKey/retrieveKeyFile/removeKey
interface, rather than adding a new interface. But, it seemed better to
keep it separate, to avoid a complicated interface that sometimes
encrypts/chunks key/value storage and sometimes users non-key/value
storage. Any common parts can be factored out.

Note that storeExport is not atomic.
doc/design/exporting_trees_to_special_remotes.mdwn has some things in
the "resuming exports" section that bear on this decision. Basically,
I don't think, at this time, that an atomic storeExport would help with
resuming, because exports are not key/value storage, and we can't be
sure that a partially uploaded file is the same content we're currently
trying to export.

Also, note that ExportLocation will always use unix path separators.
This is important, because users may export from a mix of windows and
unix, and it avoids complicating the API with path conversions,
and ensures that in such a mix, they always use the same locations for
exports.

This commit was sponsored by Bruno BEAUFILS on Patreon.
2017-08-29 13:00:41 -04:00
Joey Hess
d39c120afa
add annex-ignore-command and annex-sync-command configs
Added remote configuration settings annex-ignore-command and
annex-sync-command, which are dynamic equivilants of the annex-ignore
and annex-sync configurations.

For this I needed a new DynamicConfig infrastructure. Its implementation
should be as fast as before when there is no dynamic config, and it caches
so shell commands are only run once.

Note that annex-ignore-command exits nonzero when the remote should be ignored.
While that may seem backwards, it allows using the same command for it as
for annex-sync-command when you want to disable both.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-08-17 13:54:14 -04:00
Joey Hess
db1600b2de
de-Maybe remoteGitConfig
It's always set, so does not need to be a Maybe.
2017-05-11 16:05:01 -04:00
Joey Hess
3c8eb59860
When a http remote does not expose an annex.uuid config, only warn about it once, not every time git-annex is run.
Same behavior as for a ssh remote.
2017-03-29 12:43:47 -04:00
Joey Hess
c8e1e3dada
AssociatedFile newtype
To prevent any further mistakes like 301aff34c4

This commit was sponsored by Francois Marier on Patreon.
2017-03-10 13:35:31 -04:00
Joey Hess
e6857e75a6
sync hack to make updateInstead work on eg FAT
sync: When syncing with a local repository located on a crippled
filesystem, run the post-receive hook there, since it wouldn't get run
otherwise. This makes pushing to repos on FAT-formatted removable drives
update them when receive.denyCurrentBranch=updateInstead.

Made Remote.Git export onLocal, which was cleaned up to not have so many
caveats about its use.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-02-17 15:21:52 -04:00
Joey Hess
00464fbed7
have onLocal stop any coprocesses, not only cat-file
I have not seen any other coprocesses being started, but let's avoid
problems if any do for whatever reason.
2017-02-17 14:30:18 -04:00
Joey Hess
f07af03018
Run ssh with -n whenever input is not being piped into it
... to avoid it consuming stdin that it shouldn't.

This fixes git-annex-checkpresentkey --batch remote, which didn't output
results for all keys passed into it.

Other git-annex commands that communicate with a remote over ssh may also
have been consuming stdin that they shouldn't have, which could have
impacted using them in eg, shell scripts. For example, a shell script
reading files from stdin and passing them to git annex drop would be
impacted by this bug, whenever git annex drop ran git-annex-shell
checkpresent, it would consume part/all of the stdin that the shell script
was supposed to consume.

Fixed by adding a ConsumeStdin parameter to Annex.Ssh.sshOptions, which
is used throughout git-annex to run ssh (in order for ssh connection
caching to work). Every call site was checked to see if it used
CreatePipe for stdin, and if not was marked NoConsumeStdin.
2017-02-15 15:08:46 -04:00
Joey Hess
5c804cf42e
add SetupStage parameter to RemoteType.setup
Most remotes have an idempotent setup that can be reused for
enableremote, but in a few cases, it needs to tell which, and whether
a UUID was provided to setup was used.

This is groundwork for making initremote be able to provide a UUID.
It should not change any behavior.

Note that it would be nice to make the UUID always be provided to setup,
and make setup not need to generate and return a UUID. What prevented
this simplification is Remote.Git.gitSetup, which needs to reuse the
UUID of the git remote when setting it up, and so has to return that
UUID.

This commit was sponsored by Thom May on Patreon.
2017-02-07 14:55:58 -04:00
Joey Hess
15be5c04a6
git-annex-shell, remotedaemon, git remote: Fix some memory DOS attacks.
The attacker could just send a very lot of data, with no \n and it would
all be buffered in memory until the kernel killed git-annex or perhaps OOM
killed some other more valuable process.

This is a low impact security hole, only affecting communication between
local git-annex and git-annex-shell on the remote system. (With either
able to be the attacker). Only those with the right ssh key can do it. And,
there are probably lots of ways to construct git repositories that make git
use a lot of memory in various ways, which would have similar impact as
this attack.

The fix in P2P/IO.hs would have been higher impact, if it had made it to a
released version, since it would have allowed DOSing the tor hidden
service without needing to authenticate.

(The LockContent and NotifyChanges instances may not be really
exploitable; since the line is read and ignored, it probably gets read
lazily and does not end up staying buffered in memory.)
2016-12-09 13:34:32 -04:00
Joey Hess
58f5d41cac
fix 2016-12-09 12:56:38 -04:00
Joey Hess
0f3a3ff1e5
make clear that log is only updated after successful removal
This does not change behavior, because an exception is thrown on
unsuccessful removal. But is clearer.
2016-12-09 12:54:18 -04:00
Joey Hess
b29088b8dc
stub Remote.P2P
Similar to GCrypt remotes, P2P remotes have an url, so Remote.Git has to
separate them out and handle them, passing off to Remote.P2P.

This commit was sponsored by Ignacio on Patreon.
2016-12-06 12:27:58 -04:00
Joey Hess
0a4479b8ec
Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.

Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.

This commit was sponsored by Ethan Aubin.
2016-11-15 21:29:54 -04:00
Joey Hess
8dcf79694d
enable forwardRetry for command-line transfers
If a transfer fails for some reason, but some data managed to be sent, the
transfer will be retried. (The assistant already did this.)

Possible impacts:

* More ssh prompts if ssh needs to prompt for a password to connect to a
  host, or is prompting about some other problem like a ssh key mismatch.

* More data transfer due to retrying, epecially when a remote does not
  support resuming a transfer.

  In the worst case, a lot of data will be transferred but it fails before
  the end, and then all that data gets transferred again plus one byte more;
  repeat until it manages to get the whole file.
2016-10-26 15:38:27 -04:00