Commit graph

449 commits

Author SHA1 Message Date
Joey Hess
b657242f5d
enforce retrievalSecurityPolicy
Leveraged the existing verification code by making it also check the
retrievalSecurityPolicy.

Also, prevented getViaTmp from running the download action at all when the
retrievalSecurityPolicy is going to prevent verifying and so storing it.

Added annex.security.allow-unverified-downloads. A per-remote version
would be nice to have too, but would need more plumbing, so KISS.
(Bill the Cat reference not too over the top I hope. The point is to
make this something the user reads the documentation for before using.)

A few calls to verifyKeyContent and getViaTmp, that don't
involve downloads from remotes, have RetrievalAllKeysSecure hard-coded.
It was also hard-coded for P2P.Annex and Command.RecvKey,
to match the values of the corresponding remotes.

A few things use retrieveKeyFile/retrieveKeyFileCheap without going
through getViaTmp.
* Command.Fsck when downloading content from a remote to verify it.
  That content does not get into the annex, so this is ok.
* Command.AddUrl when using a remote to download an url; this is new
  content being added, so this is ok.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-06-21 13:37:01 -04:00
Joey Hess
f34faad9aa
finalize changelog for release 2018-06-19 11:41:50 -04:00
Joey Hess
c81b879d39
got a CVE number 2018-06-18 17:56:18 -04:00
Joey Hess
3c0a538335
allow ftp urls by default
They're no worse than http certianly. And, the backport of these
security fixes has to deal with wget, which supports http https and ftp
and has no way to turn off individual schemes, so this will make that
easier.
2018-06-18 15:37:17 -04:00
Joey Hess
cc08135e65
prevent using local http proxies per annex.security.allowed-http-addresses
A local http proxy would bypass the security configuration. So,
the security configuration has to be applied when choosing whether to
use the proxy.

While http rebinding attacks against the dns lookup of the proxy IP
address seem very unlikely, this implementation does prevent them, since
it resolves the IP address once, checks it, and then reconfigures
http-client's proxy using the resolved address.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-06-18 13:32:20 -04:00
Joey Hess
e62c4543c3
default to not using youtube-dl, for security
Pity, but same reasoning as curl applies to it.

This commit was sponsored by Peter on Patreon.
2018-06-17 14:51:02 -04:00
Joey Hess
b54b2cdc0e
prevent http connections to localhost and private ips by default
Security fix!

* git-annex will refuse to download content from http servers on
  localhost, or any private IP addresses, to prevent accidental
  exposure of internal data. This can be overridden with the
  annex.security.allowed-http-addresses setting.
* Since curl's interface does not have a way to prevent it from accessing
  localhost or private IP addresses, curl defaults to not being used
  for url downloads, even if annex.web-options enabled it before.
  Only when annex.security.allowed-http-addresses=all will curl be used.

Since S3 and WebDav use the Manager, the same policies apply to them too.

youtube-dl is not handled yet, and a http proxy configuration can bypass
these checks too. Those cases are still TBD.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-06-17 13:30:28 -04:00
Joey Hess
28720c795f
limit url downloads to whitelisted schemes
Security fix! Allowing any schemes, particularly file: and
possibly others like scp: allowed file exfiltration by anyone who had
write access to the git repository, since they could add an annexed file
using such an url, or using an url that redirected to such an url,
and wait for the victim to get it into their repository and send them a copy.

* Added annex.security.allowed-url-schemes setting, which defaults
  to only allowing http and https URLs. Note especially that file:/
  is no longer enabled by default.

* Removed annex.web-download-command, since its interface does not allow
  supporting annex.security.allowed-url-schemes across redirects.
  If you used this setting, you may want to instead use annex.web-options
  to pass options to curl.

With annex.web-download-command removed, nearly all url accesses in
git-annex are made via Utility.Url via http-client or curl. http-client
only supports http and https, so no problem there.
(Disabling one and not the other is not implemented.)

Used curl --proto to limit the allowed url schemes.

Note that this will cause git annex fsck --from web to mark files using
a disallowed url scheme as not being present in the web. That seems
acceptable; fsck --from web also does that when a web server is not available.

youtube-dl already disabled file: itself (probably for similar
reasons). The scheme check was also added to youtube-dl urls for
completeness, although that check won't catch any redirects it might
follow. But youtube-dl goes off and does its own thing with other
protocols anyway, so that's fine.

Special remotes that support other domain-specific url schemes are not
affected by this change. In the bittorrent remote, aria2c can still
download magnet: links. The download of the .torrent file is
otherwise now limited by annex.security.allowed-url-schemes.

This does not address any external special remotes that might download
an url themselves. Current thinking is all external special remotes will
need to be audited for this problem, although many of them will use
http libraries that only support http and not curl's menagarie.

The related problem of accessing private localhost and LAN urls is not
addressed by this commit.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-06-16 11:57:50 -04:00
Joey Hess
b6e4ed9aa7
export: re-send lost exported files after fsck notices they're gone
When content has been lost from an export remote and  git-annex fsck --from
remote has noticed it's gone, re-running git-annex export or git-annex sync
--content will re-upload it.

Note that normally there's no way to remove a single file from an export.
doc/design/exporting_trees_to_special_remotes.mdwn talks about this
in the section "dropping from exports and copying to exports". But, if
a file is somehow deleted or corrupted on the export, and fsck notices
this, it will update the location log to say it's missing.

So, checking the location log when determining if a file needs to be sent
to the export will let such missing files be added back in. There's
otherwise no way to do so. It does not fall afoul of the races documented
in the abovementioned section, I think.

This commit was sponsored by Ryan Newton on Patreon.
2018-06-14 12:22:12 -04:00
Joey Hess
760f66829a
display p2pstdio stderr after auth
Display error messages that come from git-annex-shell when the p2p protocol
is used, so that diskreserve messages, IO errors, etc from the remote side
are visible again.

Felt like it should perhaps use outputError, so --json-error-messages would
include these, but as an async IO action, it can't, and this would need
MessageState to be converted to a tvar. Anyway, when not using p2pstdio,
that's not done; nor is it done for stderr from external special remotes
or other commands, so punted on the idea for now.

This commit was sponsored by mo on Patreon.
2018-06-12 14:59:05 -04:00
Joey Hess
90a3afb60f
adb: Android serial numbers are not all 16 characters long, so accept other lengths.
I can't find any documentation of how long it should be. Hard to imagine
it being shorter than 4 characters though, so put that in as a conservative
lower bound.

This commit was sponsored by Nick Piper on Patreon.
2018-06-12 13:56:01 -04:00
Joey Hess
c3c28f7617
add GETINFO to external protocol (for ronnypfa)
External special remotes can now add info to `git annex info $remote`, by
replying to the GETINFO message.

Had to generalize some helpers to allow consuming multiple messages from
the remote.

The code added to Remote/* here is AGPL licensed, thus changed the license
of the files.

This commit was sponsored by Jake Vosloo on Patreon.
2018-06-08 11:56:24 -04:00
Joey Hess
0f566ed242
removal of the rest of remoteGitConfig
In keyUrls, the GitConfig is used only by annexLocations
to support configured Differences. Since such configurations affect all
clones of a repository, the local repo's GitConfig must have the same
information as the remote's GitConfig would have. So, used getGitConfig
to get the local GitConfig, which is cached and so available cheaply.

That actually fixed a bug noone had ever noticed: keyUrls is
used for remotes accessed over http. The full git config of such a
remote is normally not available, so the remoteGitConfig that keyUrls
used would not have the necessary information in it.

In copyFromRemoteCheap', it uses gitAnnexLocation,
which does need the GitConfig of the remote repo itself in order to
check if it's crippled, supports symlinks, etc. So, made the
State include that GitConfig, cached. The use of gitAnnexLocation is
within a (not $ Git.repoIsUrl repo) guard, so it's local, and so
its git config will always be read and available.

(Note that gitAnnexLocation in turn calls annexLocations, so the
Differences config it uses in this case comes from the remote repo's
GitConfig and not from the local repo's GitConfig. As explained above
this is ok since they must have the same value.)

Not very happy with this mess of different GitConfigs not type-safe and
some read only sometimes etc. Very hairy. Think I got it this change
right. Test suite passes..

This commit was sponsored by Ethan Aubin.
2018-06-05 14:48:37 -04:00
Joey Hess
fc5888300f
fix annex-checkuuid
Fixed annex-checkuuid implementation, so that remotes configured that way
can be used. This was 100% broken from the first commit of it, oops.

This commit was sponsored by Øyvind Andersen Holm.
2018-06-04 16:52:22 -04:00
Joey Hess
2e6a6024c2
avoid unncessary version output differences in different contexts
Show operating system and repository version list when run outside
a git repo too.

Also made it only display the local repository version when in a git-annex
repo. Before it showed "unknown" when run in a git repo that was not
git-annex initialized. That seemed like confusing behavior.

This commit was sponsored by Jochen Bartl on Patreon.
2018-06-04 12:26:18 -04:00
Joey Hess
1c8ee99b46
Fix build with ghc 8.4+, which broke due to the Semigroup Monoid change
https://prime.haskell.org/wiki/Libraries/Proposals/SemigroupMonoid

I am not happy with the fragile pile of CPP boilerplate required to support
ghc back to 7.0, which git-annex still targets for both the android build
and the standalone build targeting old linux kernels. It makes me unlikely
to want to use Semigroup more in git-annex, because the benefit of the
abstraction is swamped by the ugliness. I actually considered ripping out
all the Semigroup instances, but some are needed to use
optparse-applicative.

The problem, I think, is they made this transaction on too fast a timeline.
(Although ironically, work on it started in 2015 or earlier!)
In particular, Debian oldstable is not out of security support, and it's
not possible to follow the simpler workarounds documented on the wiki and
have it build on oldstable (because the semigroups package in it is too
old).

I have only tested this build with ghc 8.2.2, not the newer and older
versions that branches of the CPP support. So there could be typoes, we'll
see.

This commit was sponsored by Brock Spratlen on Patreon.
2018-05-30 12:28:43 -04:00
Joey Hess
33834140e6
releasing package git-annex version 6.20180529 2018-05-29 13:06:56 -04:00
Joey Hess
c3064edac9
setpresentkey: Added --batch support (for ronnypfa)
This commit was sponsored by Peter on Patreon.
2018-05-27 14:56:14 -04:00
Joey Hess
85f9360d9b
GIT_ANNEX_SHELL_APPENDONLY
Makes it allow writes, but not deletion of annexed content. Note that
securing pushes to the git repository is left up to the user.

This commit was sponsored by Jack Hill on Patreon.
2018-05-25 13:17:56 -04:00
Joey Hess
4b748970ad
reorder 2018-05-25 12:10:49 -04:00
Joey Hess
2da2ae0919
fix migration bug and make fsck warn
* migrate: Fix bug in migration between eg SHA256 and SHA256E,
  that caused the extension to be included in SHA256 keys,
  and omitted from SHA256E keys.
  (Bug introduced in version 6.20170214)
* migrate: Check for above bug when migrating from SHA256 to SHA256
  (and same for SHA1 to SHA1 etc), and remove the extension that should
  not be in the SHA256 key.
* fsck: Detect and warn when keys need an upgrade, either to fix up
  from the above migrate bug, or to add missing size information
  (a long ago transition), or because of a few other past key related
  bugs.

This commit was sponsored by Henrik Riomar on Patreon.
2018-05-23 14:07:51 -04:00
Joey Hess
caaedb2993
fix http-client gzip decompression bug
Prevent haskell http-client from decompressing gzip files, so downloads of
such files works the same as it used to with wget and curl.

Explicitly setting accept-encoding to "identity" is probably not needed,
but that's what wget sends (curl does not send the header), and since
http-client is trying to be excessively smart, it seems we need to set
hAcceptEncoding to something to prevent it from inserting its own,
and this seems better than some hack like "".

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-05-21 15:10:25 -04:00
Joey Hess
2fabd7cdb5
remove the older move --force, which never behaved as documented and seems useless
* move: --force was accidentially enabling two unrelated behaviors
  since 6.20180427. The older behavior, which has never been well
  documented and seems almost entirely useless, has been removed.
* copy: --force no longer does anything.

This commit was sponsored by Øyvind Andersen Holm.
2018-05-21 13:21:19 -04:00
Joey Hess
5204e1dd9d
Workaround for bug in an old version of cryptonite that broke https downloads, by using curl for downloads when git-annex is built with it.
This commit was supported by the NSF-funded DataLad project.
2018-05-20 14:12:37 -04:00
Joey Hess
442e607b0a
Don't allow entering a view with staged or unstaged changes.
In some cases, unstaged changes are safe, eg dotfiles in the top which
are not affected by a view. Or non-annexed files in general which would
prevent view branch checkout from proceeding. But in other cases,
particularly unstaged changes to annexed files, entering a view would wipe
out those changes! And so don't allow entering a view with any unstaged
changes.

Staged changes are not safe when entering a view, because the changes get
committed to the view branch, and so the user is unlikely to remember them
when they exit the view, and so will effectively lose them, even if they're
still present in the view branch.

Also, improved the git status parser, although the improvement turned out
to not really be needed.

This commit was sponsored by Eric Drechsel on Patreon.
2018-05-14 16:51:06 -04:00
Joey Hess
d7021d420f
reuse hashes of dotfiles/dirs/submodules when entering view
This fixes a crash when a git submodule has a name starting with a dot.
Such a submodule might contain dotfiles that are intended to be used when
inside the view (since a dot-directory that's not a submodule was already
preserved when entering a view). So, rather than eliminating the submodule
from the view, its git ls-files --stage hash is copied over into the view.

dotfiles/dirs have their git ls-files --stage hashes similarly copied over
to the view. This is more efficient and simpler than the old method,
and also won't break if git ever adds a new type of tree item, like was
done with submodules.

Since the content of dotfiles in the working tree is no longer hashed
when entering a view, when there are unstaged modifications, they are
not included in the view branch. Entering the view branch still works,
but git checkout shows "M .dotfile", and git diff will show the unstaged
changes. This seems like an improvement over the old behavior.

Also made Command.View not delete empty directories that are submodules
when entering a view, while still deleting other empty directories.

This commit was supported by the NSF-funded DataLad project.
2018-05-14 15:35:20 -04:00
Joey Hess
0632c49c22
releasing package git-annex version 6.20180509 2018-05-09 16:20:43 -04:00
Joey Hess
db720f6a9c
Display error message when http download fails.
* Display error message when http download fails.

  There's nothing in the http-client library to nicely format a http
  exception, so in some cases it has to fall back to using show on it.
  Seems better than just saying "it failed" or only showing the http
  status code.

* Avoid forward retry when 0 bytes were received.

  forwardRetry was comparing Nothing to Just 0, and so thought there had
  been progress made when 0 bytes were received.

This commit was supported by the NSF-funded DataLad project.
2018-05-08 16:11:45 -04:00
Joey Hess
c0ffd02ac5
close almost all old Android app bug reports
The old git-annex Android app is now deprecated in favor of running
git-annex in termux. I suspect all or nearly all of these no longer apply.

This commit was sponsored by Jochen Bartl on Patreon.
2018-05-08 15:00:46 -04:00
Joey Hess
7dc28dc705
Support building with hinotify-0.3.10.
Kept backwards compat with old versions via a shim.

This commit was sponsored by mo on Patreon.
2018-05-08 14:43:06 -04:00
Joey Hess
2948f6d916
avoid uname -o on !linux and catch any exception from it
Fix bug in last release that prevented the webapp opening on non-Linux systems.

This commit was sponsored by Jake Vosloo on Patreon.
2018-05-08 14:06:19 -04:00
Joey Hess
71f450f677
use proot to support Android 8
runshell: Use proot when running on Android, to work around Android 8's
ill-advised seccomp filtering of system calls, including ones crucial for
reliable thread locking. (This will only work with termux's version of
proot.)

See https://github.com/termux/termux-packages/issues/420#issuecomment-386636938

This commit was sponsored by andrea rota.
2018-05-08 13:55:10 -04:00
Joey Hess
d1961e4498
back out incorrect IO interleaving change
Fix regression in last release that crashes when using --all or running
git-annex in a bare repository. May have also affected git-annex unused and
git-annex info.

Reversed the order of the (++) in Annex.Branch.files so --all will stream
lazily still when there are not a bunch of uncommitted journal files.
Added a todo to maybe improve this later.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-05-08 13:54:42 -04:00
Joey Hess
f98251c97c
releasing package git-annex version 6.20180427 2018-04-27 12:37:01 -04:00
Joey Hess
2fc768ce72
avoid git annex info remote buffering list of keys
This leaves git annex unused --from remote still using loggedKeysFor
and buffering more than ought to be necessary, but I can't see a way to
improve that.
2018-04-26 16:13:05 -04:00
Joey Hess
bea0ad220a
avoid --all buffering list of all keys
In Annex.Branch.branch, the (++) was killing laziness.
Rewrote so it streams lazily.

filterM also kills laziness, so made loggedKeys use a Unchecked type,
and check if the key is dead in the seek loop.

Note that loggedKeysFor still buffers, so git-annex info <remote> and
git-annex unused --from remote still use more memory than necessary.

Also removed some unused functions from Annex.Journal.
2018-04-26 16:00:20 -04:00
Joey Hess
bfa26661d1
import: Avoid buffering all filenames to be imported in memory.
Test case is 24 directories each containing files named 1..10000.
The concat and filterM destroyed what laziness there is in
dirContentsRecursive, making it buffer all the filenames. Memory
use was around 300 mb (possibly growing slightly as it progressed).
After this fix, memory use drops to a constant 59 mb.

Note that dirContentsRecursive still buffers the entire content of a
directory (not subdirectories) so this is still not optimal.
2018-04-26 12:06:12 -04:00
Joey Hess
b2accf9da1
Assistant: Fix installation of menus, icons, etc when run from within runshell.
runshell followed by git annex webapp didn't install that stuff, because
GIT_ANNEX_APP_BASE is not set. Running git-annex.linux/git-annex-webapp did
install that stuff, since that script set the env var. I noticed this with
the termux port whose instructions currently go that way.

Seems the right thing to do is to move the env var setting to runshell.
2018-04-25 17:58:00 -04:00
Joey Hess
de491ad20f
Termux:Boot integration
Assistant: Integrate with Termux:Boot, so when it's installed, the
assistant is autostarted on boot.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-04-25 15:31:25 -04:00
Joey Hess
118ed8f92b
runshell: hacks for termux; add tip
Added some tweaks to make git-annex work in termux on Android. The regular
arm standalone tarball now works in termux.

I guess the test for "$base/bin/git" is not really necessary, since it
tests for git-annex. Since that gets deleted on android, removed that test.

These are pretty hackish hacks, especially adding it to PATH. The goal is
to make it work well enough out of the box on Android.

This commit was sponsored by Eric Drechsel on Patreon.
2018-04-25 13:48:37 -04:00
Joey Hess
dd7ab91f97
runshell: Unset LD_PRELOAD
Preloaded libraries from the host system may not get along with the bundled
linker.

This was observed by users in termux:

ERROR: ld.so: object '/data/data/com.termux/files/usr/lib/libtermux-exec.so' from LD_PRELOAD cannot be preloaded (wrong ELF class:
ELFCLASS64): ignored.
Bad system call

But it could also affect more usual systems; the preloaded library might rely
on symbols from the host libc that are not available or have the wrong versions
in the bundled libc. Unsetting LD_PRELOAD entirely seems safest.
2018-04-25 13:40:48 -04:00
Joey Hess
6ea356034d
update 2018-04-22 13:58:18 -04:00
Joey Hess
aebf9e6dd5
Fix build with yesod 1.6.
Also avoid some depreaction warnings.
2018-04-22 13:56:35 -04:00
Joey Hess
89e1a05a8f
Fix mangling of --json output of utf-8 characters when not running in a utf-8 locale
As long as all code imports Utility.Aeson rather than Data.Aeson,
and no Strings that may contain utf-8 characters are used for eg, object
keys via T.pack, this is guaranteed to fix the problem everywhere that
git-annex generates json.

It's kind of annoying to need to wrap ToJSON with a ToJSON', especially
since every data type that has a ToJSON instance has to be ported over.
However, that only took 50 lines of code, which is worth it to ensure full
coverage. I initially tried an alternative approach of a newtype FileEncoded,
which had to be used everywhere a String was fed into aeson, and chasing
down all the sites would have been far too hard. Did consider creating an
intentionally overlapping instance ToJSON String, and letting ghc fail
to build anything that passed in a String, but am not sure that wouldn't
pollute some library that git-annex depends on that happens to use ToJSON
String internally.

This commit was supported by the NSF-funded DataLad project.
2018-04-16 16:21:21 -04:00
Joey Hess
64980db7d9
move: Avoid drops that make bad situations worse, but otherwise allow
See the big comment at the bottom of Command.Drop for the full details.

(The --safe/--unsafe options were never released.)

This commit was sponsored by Jake Vosloo on Patreon.
2018-04-13 14:36:43 -04:00
Joey Hess
af8546990d
move: --safe/--unsafe and potential drop race fix
move: Added --safe option, which makes move honor numcopies settings.
Also --unsafe enables the default behavior, anticipating that the
default may one day change.

This commit was sponsored by Ethan Aubin.
2018-04-09 16:20:10 -04:00
Joey Hess
ba8a3156ea
releasing package git-annex version 6.20180409 2018-04-09 13:24:45 -04:00
Joey Hess
c34152777b
Use http-conduit for url downloads by default, annex.web-options enables curl
* For url downloads, git-annex now defaults to using a http library,
  rather than wget or curl. But, if annex.web-options is set, it will
  use curl. To use the .netrc file, run:
    git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
  git-annex builds).

Note that curl is always run in silent mode, since the new API for
download has a MeterUpdate and doesn't make way for curl progress
output. It might be worth writing a parser for curl's progress output
to update the meter when using it, but I didn't bother with this edge
case for now.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 17:36:20 -04:00
Joey Hess
36e6b8abbf
Fix resuming a download when using curl.
Noticed a bug; when using curl a workaround for its empty file behavior
overwrote the file content, so it never resumed and always started over.
2018-04-06 16:09:53 -04:00
Joey Hess
6cb5b7294f
info: Changed sorting of numcopies stats table, so it's ordered by the variance from the desired number of copies.
Compare these...

numcopies stats:
	numcopies -1: 1986
	numcopies +0: 1170
	numcopies -2: 769
	numcopies +1: 716
	numcopies -4: 696
	numcopies -3: 485
	numcopies -6: 230
	numcopies -5: 111
	numcopies -7: 91
	numcopies -9: 9

numcopies stats:
	numcopies +1: 716
	numcopies +0: 1170
	numcopies -1: 1986
	numcopies -2: 769
	numcopies -3: 485
	numcopies -4: 696
	numcopies -5: 111
	numcopies -6: 230
	numcopies -7: 91
	numcopies -9: 9

I feel that the former is a jumbled mess that doesn't tell much overall,
while the second shows pretty clearly that most files are within 1 degree
of the desired number of copies, with some outliers without enough.
2018-04-05 14:54:39 -04:00