git-annex can use the web as a special remote, associating a url with an
annexed file, and downloading the file content from the web.

See [[tips/using_the_web_as_a_special_remote]] for usage examples.

The web special remote is always enabled, without any manual setup being
needed. Its name is "web".

This special remote can only be used for downloading content,
not uploading content, or removing content from the web.

This special remote uses urls on the web as the source for content.
There are several other ways http can be used to download annexed objects,
including a git remote accessible by http, S3 with a `publicurl` configured,
and the [[httpalso]] special remote.

## configuration

These parameters can be passed to `git-annex initremote` or
`git-annex enableremote` to configure a web remote:

* `urlinclude` - Only use urls that match the specified glob.
  For example, `urlinclude="https://s3.amazonaws.com/*"`
* `urlexclude` - Don't use urls that match the specified glob.
  For example, to prohibit http urls, but allow https,
  use `urlexclude="http:*"`

Globs are matched case-insensitively.

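To get a feel for how such globs behave, here is a minimal sketch that approximates the matching with the shell's own `case` patterns, lowercasing both sides to imitate case-insensitive matching. The `url_matches` helper is hypothetical and is not part of git-annex. It also assumes that shell glob rules are close enough to git-annex's for simple `*` patterns like the ones above, and the two may differ in details.

```shell
#!/bin/sh
# Hypothetical helper, not part of git-annex: succeed if URL matches GLOB.
# Both are lowercased first to approximate case-insensitive matching.
# Assumption: shell pattern rules stand in for git-annex's glob matcher.
url_matches() {
    um_glob=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    um_url=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$um_url" in
        # Left unquoted on purpose so it is treated as a pattern.
        $um_glob) return 0 ;;
        *) return 1 ;;
    esac
}

# The urlinclude example from above, with a mixed-case url.
if url_matches 'https://s3.amazonaws.com/*' 'HTTPS://S3.AmazonAws.com/bucket/key'; then
    echo "matched by urlinclude"
fi

# The urlexclude example from above: an https url is not matched by http:*
if ! url_matches 'http:*' 'https://example.com/file'; then
    echo "not matched by urlexclude"
fi
```

In this sketch, `http:*` does not match https urls because the fifth character of the url has to be a colon, which is why the `urlexclude` example prohibits http while still allowing https.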
When there are multiple special remotes of type web, and some are not
configured with `urlinclude` and/or `urlexclude`, those will avoid using
urls that are matched by the configuration of other web remotes.

For example, this creates a second web special remote named "slowweb" that
is only used for urls on one host, and that has a higher cost than the
"web" special remote. With this configuration, `git-annex get` will first
try to get the file from the "web" special remote, which will avoid
using any urls that match slowweb's urlinclude. Only if the content
can't be downloaded from "web" (or some other remote) will it fall back
to downloading from slowweb.

	git-annex initremote --sameas=web slowweb type=web urlinclude='*//slowhost.com/*'
	git config remote.slowweb.cost 300
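
The split of urls between "web" and "slowweb" in this example can be modelled with a small shell sketch. It is only one reading of the behavior described above, not git-annex's implementation. The helpers `url_matches` and `remote_for_url` are hypothetical, and shell `case` patterns are assumed to approximate git-annex's glob matching.

```shell
#!/bin/sh
# Hypothetical helper, not part of git-annex: case-insensitive glob match
# approximated with shell patterns.
url_matches() {
    um_glob=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    um_url=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$um_url" in
        $um_glob) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical model of the example configuration: print the name of the
# web remote that would use the given url.
remote_for_url() {
    slowweb_include='*//slowhost.com/*'
    if url_matches "$slowweb_include" "$1"; then
        # Matches slowweb's urlinclude, so the unconfigured "web" avoids it.
        echo slowweb
    else
        # Not claimed by any other web remote, so "web" uses it.
        echo web
    fi
}

remote_for_url 'https://slowhost.com/big.iso'
remote_for_url 'https://example.com/small.txt'
```

Under these assumptions each url is used by exactly one of the two remotes. The cost setting then only affects which remote `git-annex get` tries first when a file has urls on both hosts.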