git-annex/doc/bugs/http_remotes_ignore_annex.web-options_--netrc.mdwn
nick.guenther@e418ed3c763dff37995c2ed5da4232a7c6cee0a9 50cdf369b9 closing
2022-09-13 23:32:49 +00:00

294 lines
14 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

### Please describe the problem.
The documentation for `git config annex.web-options` says that I should be able to use it to set up HTTP credentials in a ~/.netrc file, but it doesn't work.
I have been given some repos that are password-protected, I want to be able to download them non-interactively in a CI system. I won't sit there typing in the password 500 times for 500 files, and ideally I don't want to even type it once.
`git` reads `~/.netrc` if it exists, and does so consistently enough that http://droneci.com/ has built that in as the default way it passes CI credentials to workers. It would be really great if `git-annex` did the same, and did it instead of spawning `curl`. When using an ssh remote, git and git-annex already share the same ssh credentials; it would be awesome if the same could be transparently true for http remotes as well :)
### What steps will reproduce the problem?
1. Set up an HTTP server following https://git-annex.branchable.com/tips/setup_a_public_repository_on_a_web_site/, but password-protect it.
I set up my server on Arch, but I tested the client from both Arch and Ubuntu. Here's the server set up; it should adapt to Debian or Fedora easily enough:
1. `sudo pacman -S --noconfirm apache`
2. `echo 'Include conf/extra/git-annex.conf' | sudo tee -a /etc/httpd/conf/httpd.conf`
3. `sudo mkdir -p /srv/http/annex && sudo chown -R http:http /srv/http/annex`
4. ```
cat <<EOF | sudo tee /etc/httpd/conf/extra/git-annex.conf
DocumentRoot "/srv/http/annex"
<Directory "/srv/http/annex">
AllowOverride All
Options FollowSymlinks Indexes
Require all granted
</Directory>
```
5. Set up a repo:
1. Switch to `http`: `sudo -u http bash`; `cd /srv/http/annex`
2. `git config user.name httpd; git config user.email httpd@httpd`
3. `git init; git annex init`
4. `git config core.sharedrepository world; git config receive.denyCurrentBranch updateInstead`
5. `mv .git/hooks/post-update.sample .git/hooks/post-update`
6. `echo Hello > README.md && git add README.md && git commit -m "README.md"`
7. `dd if=/dev/urandom of=large.bin bs=1M count=1 && git annex add large.bin && git add large.bin && git commit -m "large.bin"`
5. (optional): verify the repo is functional:
1. `git clone http://localhost/.git annex-test; cd annex-test`
2. `git config annex.security.allowed-ip-addresses all`
3. `sha256sum large.bin` should fail
4. `git annex get`
3. `sha256sum large.bin` should succeed, and match the value shown in the symlink in `ls -l large.bin`
6. Password protect the repo
While still in `/srv/http/annex`:
1. ```
cat <<EOF | tee .htaccess
AuthType Basic
AuthName gitannex
AuthUserFile /srv/http/annex/.htpasswd
Require valid-user
```
2. `htpasswd -bc .htpasswd user4 password`
2. Download the password-protected repo
0. If the test server is on the same machine: `git config --global annex.security.allowed-ip-addresses all`
1. Download the repo without any password helper: 🫤🫤🫤
1. `git clone http://localhost/.git annex-test; cd annex-test`; this will prompt for the password set above, e.g.
```
$ git clone http://localhost/.git annex-test
Cloning into 'annex-test'...
Username for 'http://localhost': user4
Password for 'http://user4@localhost':
```
2. `git annex get`; this will prompt for the password **twice**: once for the implicit `git annex init` (that needs to read the remote `.git/config`) and once for downloading large.bin.
Running `pstree` while the prompts are waiting, or using `git config annex.debug true`, reveals that the prompts are coming from [`git credential fill`](https://git-scm.com/docs/git-credential).
8. Drop the annoying redundant password prompts using [git-credential-store(1)](https://git-scm.com/docs/git-credential-store):
1. `cd $(mktemp -d)`
2. `git config --global credential.helper store`
3. `git clone http://localhost/.git annex-test; cd annex-test`; this will prompt for the password
4. `git annex get`; but this will not prompt for any passwords
This works. So that's awesome, I can use `credential.helper store` to make my passworded downloads non-interactive by filling in `~/.git-credentials`, which, for the record, has one credential per line in this format:
```
$ cat ~/.git-credentials
http://user4:password@localhost
```
or if a non-standard port is involved:
```
$ cat ~/.git-credentials
http://user4:password@localhost%3a8080
```
5. (Undo: `git config --global --unset credential.helper` to avoid contaminating the next test)
9. Attempt to drop the redundant password prompts using `annex.web-options`: ❌❌❌
1. `cd $(mktemp -d)`
2. `git config --global annex.web-options --netrc`
3. Set up a `~/.netrc`:
```
cat <<EOF | tee -a ~/.netrc
machine localhost
login user4
password password localhost
```
4. (optional) verify it works as expected with directly through curl:
```
$ curl --no-progress-meter -f -o /dev/null http://localhost/.git; echo $? # fails
curl: (22) The requested URL returned error: 401
22
$ curl --netrc --no-progress-meter -f -o /dev/null http://localhost/.git; echo $? # works
0
```
5. `git clone http://localhost/.git annex-test; cd annex-test`; this will **not** prompt for a password because `git` picks up `~/.netrc` automatically.
6. `git annex get`; this **will** prompt for passwords, n+1 times in fact for n=the number of annexed files
I don't understand why this isn't working. The docs say
> Setting this option makes git-annex use curl, but only when annex.security.allowed-ip-addresses is configured in a specific way.
and I set `allowed-ip-addressess` in the specific way, so why is this no bueno?
I've searched the wiki and all I've found is:
* https://git-annex.branchable.com/news/security_fix_release/
* https://git-annex.branchable.com/devblog/day_494__url_download_changes/
* https://git-annex.branchable.com/forum/Use_addurl_with_a_file_on_an_HPC_cluster/
From these, I understand I need to `git config --global annex.security.allowed-ip-addresses all`, which I did, but otherwise my best guess is that `web-options` only works when [using the web as as _special remote_](https://git-annex.branchable.com/tips/using_the_web_as_a_special_remote/) with `addurl`. But here I'm using the web as a _regular remote_, something which [git-annex has support for](https://git-annex.branchable.com/tips/setup_a_public_repository_on_a_web_site/). But seemingly this corner case isn't working.
I can work around it by rewriting the contents of `~/.netrc` into `~/.git-credentials` and setting `git config --global credential.helper store`, but I don't want to duplicate the credentials every time I'm in this situation.
### What version of git-annex are you using? On what operating system?
git-annex 10.20220504-g4e4c44ed8 on ArchLinux, and git-annex 8.20210223 on Ubuntu 22.04.
### Please provide any additional information below.
[[!format sh """
[kousu@nigiri tmp.ztnHTYA3ZC]$ cd $(mktemp -d)
[kousu@nigiri tmp.H5EkrNMUPc]$ git config --global annex.security.allowed-ip-addresses all
[kousu@nigiri tmp.H5EkrNMUPc]$ git config --global annex.web-options --netrc
[kousu@nigiri tmp.H5EkrNMUPc]$ cat <<EOF | tee ~/.netrc
machine localhost
login user4
password password
EOF
machine localhost
login user4
password password
[kousu@nigiri tmp.H5EkrNMUPc]$
[kousu@nigiri tmp.H5EkrNMUPc]$ # demonstrate that curl respects --netrc and behaves as expected:
[kousu@nigiri tmp.H5EkrNMUPc]$ curl -v -o /dev/null -f --no-progress-meter http://localhost:80
* Trying 127.0.0.1:80...
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.84.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 401 Unauthorized
< Date: Thu, 08 Sep 2022 00:49:44 GMT
< Server: Apache/2.4.54 (Unix)
< WWW-Authenticate: Basic realm="gitannex"
< Vary: accept-language,accept-charset
< Accept-Ranges: bytes
< Transfer-Encoding: chunked
< Content-Type: text/html; charset=utf-8
< Content-Language: en
* The requested URL returned error: 401
* Closing connection 0
curl: (22) The requested URL returned error: 401
[kousu@nigiri tmp.H5EkrNMUPc]$ curl --netrc -v -o /dev/null -f --no-progress-meter http://localhost:80
* Trying 127.0.0.1:80...
* Connected to localhost (127.0.0.1) port 80 (#0)
* Server auth using Basic with user 'user4'
> GET / HTTP/1.1
> Host: localhost
> Authorization: Basic dXNlcjQ6cGFzc3dvcmQ=
> User-Agent: curl/7.84.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Thu, 08 Sep 2022 00:49:41 GMT
< Server: Apache/2.4.54 (Unix)
< Content-Length: 693
< Content-Type: text/html;charset=ISO-8859-1
<
{ [693 bytes data]
* Connection #0 to host localhost left intact
[kousu@nigiri tmp.H5EkrNMUPc]$
[kousu@nigiri tmp.H5EkrNMUPc]$ # demonstrate git respects .netrc:
[kousu@nigiri tmp.H5EkrNMUPc]$ git clone http://localhost/.git annex-test
Cloning into 'annex-test'...
[kousu@nigiri tmp.H5EkrNMUPc]$ cd annex-test/
[kousu@nigiri tmp.H5EkrNMUPc]$
[kousu@nigiri tmp.H5EkrNMUPc]$ # demonstrate that git-annex *does not* respect .netrc
[kousu@nigiri annex-test]$ git annex get
Username for 'http://localhost': ^C
[kousu@nigiri tmp.H5EkrNMUPc]$
[kousu@nigiri tmp.H5EkrNMUPc]$
[kousu@nigiri tmp.H5EkrNMUPc]$
[kousu@nigiri annex-test]$ git annex version
git-annex version: 10.20220504-g4e4c44ed8
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.30 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.11 persistent-sqlite-2.13.0.3 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.2
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 8
[kousu@nigiri annex-test]$ cat /etc/os-release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
LOGO=archlinux-logo
"""]]
With the older git-annex, I set up a proxy so I could reuse the same server, which changed the port, but otherwise everything else is the same:
[[!format sh """
$ ssh -R 8080:localhost:80 joplin
p115628@joplin:~$ cd $(mktemp -d)
p115628@joplin:/tmp/tmp.glF9EdYhnR$ git config --global annex.security.allowed-ip-addresses all
p115628@joplin:/tmp/tmp.glF9EdYhnR$ git config --global annex.web-options "--netrc"
p115628@joplin:/tmp/tmp.glF9EdYhnR$ git clone http://localhost:8080/.git annex-test # verify it's password protected
Cloning into 'annex-test'...
Username for 'http://localhost:8080': ^C
p115628@joplin:/tmp/tmp.glF9EdYhnR$ cat <<EOF | tee ~/.netrc
machine localhost
login user4
password password
EOF
machine localhost
login user4
password password
p115628@joplin:/tmp/tmp.glF9EdYhnR$ git clone http://localhost:8080/.git annex-test # verify the .netrc file works with git
Cloning into 'annex-test'...
p115628@joplin:/tmp/tmp.glF9EdYhnR$ cd annex-test/
p115628@joplin:/tmp/tmp.glF9EdYhnR/annex-test$ git annex get # but does not work with git-annex
(merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
Username for 'http://localhost:8080': ^C
p115628@joplin:/tmp/tmp.glF9EdYhnR/annex-test$
p115628@joplin:/tmp/tmp.glF9EdYhnR/annex-test$
p115628@joplin:/tmp/tmp.glF9EdYhnR/annex-test$
p115628@joplin:/tmp/tmp.glF9EdYhnR/annex-test$ git annex version
git-annex version: 8.20210223
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.4 feed-1.3.0.1 ghc-8.8.4 http-client-0.6.4.1 persistent-sqlite-2.10.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
local repository version: 8
p115628@joplin:/tmp/tmp.glF9EdYhnR/annex-test$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Sure! Lots! We use it to share a large open access dataset at https://github.com/spine-generic, and [I'm working on](https://github.com/neuropoly/gitea/pull/1) helping other researchers share their datasets on their own infrastructure using git-annex + gitea.
[[done]]