Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2022-07-15 15:07:05 -04:00
commit ee8acd5b5d
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
2 changed files with 167 additions and 0 deletions

View file

@ -0,0 +1,158 @@
### Please describe the problem.
Amazon has [deprecated ACLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/about-object-ownership.html)
> A majority of modern use cases in Amazon S3 no longer require the use of ACLs, and we recommend that you disable ACLs except in unusual circumstances where you need to control access for each object individually. With Object Ownership, you can disable ACLs and rely on policies for access control. When you disable ACLs, you can easily maintain a bucket with objects uploaded by different AWS accounts. You, as the bucket owner, own all the objects in the bucket and can manage access to them using policies.
They are encouraging everyone to [migrate to bucket policies](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-ownership-migrating-acls-prerequisites.html) instead.
But if such a bucket is public (meaning: write needs credentials, reads are open to the world):
* `public=yes` causes `git annex copy --to` to always set an ACL, which fails, which fails the entire upload
* But setting `public=no` causes `publicurl` to be ignored by `git annex copy --from`, failing the download
#### Feature Request
* If `public=yes`, instead of trying to set an ACL, first try `HTTP HEAD` on the newly uploaded object without using the AWS credentials. Only if that fails, fall over to trying to set an ACL using credential. And if you get AccessControlListNotSupported (i.e. the error due to BucketOwnerEnforced), then give a warning that the bucket policy is not configured for public access.
* Make `publicurl` orthogonal to `public`: if set, `git annex copy --from` should _always_ use it unconditionally.
* Update [the docs here](https://git-annex.branchable.com/special_remotes/S3/) to explain how to set up a public bucket policy as recommended by Amazon, and that `public=yes` will either try to confirm that the bucket policy is public, or will fallback to using (legacy) ACLs.
### What steps will reproduce the problem?
In a bucket I run, I reset the ACLs on that bucket to Amazon's default permissions:
* Bucket owner (your AWS account):
* Objects:
* List
* Write
* Bucket ACL (i.e. what ACLs are applied by default to all objects):
* Read
* Write
and with that set Amazon let me also set
> Object ownership: Bucket owner enforced
This should be the **default configuration** for any new bucket created now, so you only need to do the above if you're migrating an existing bucket like I was; for reproducing, just creating an empty bucket should be enough.
It should look like this:
[[!img S3_bucket.png align="right" size="" alt=""]]
With that in place, I wrote and attached this Bucket Policy to make it public:
```
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": [
"arn:aws:s3:::BUCKET-NAME",
"arn:aws:s3:::BUCKET-NAME/*"
]
}
]
}
```
[[!img S3_bucket-perms.png align="right" size="" alt=""]]
I told `git-annex` about it with
```
git annex initremote amazon type=S3 bucket=BUCKET_NAME public=yes publicurl=https://BUCKET_NAME.s3.amazonaws.com autoenable=true
```
But, attempting to use the new setup failed:
```
$ git annex copy --to amazon what.nii.gz
copy what.nii.gz (checking amazon...) (to amazon...)
41% 8.15 MiB 20 MiB/s 0s
S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "a6+ieujj4z3Z4P8ooA306DdbGAoxWDiXd6O2ZwjdfapGnuOGPyL5/WQ4UBEytR80FG+5b6xdlsM=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
32% 6.43 MiB 16 MiB/s 0s
S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "bFOgMomROCOes9yI6HZHysQGoZaTbsPI5b7rHjcTI0wA8Yx5Dm1JOky9BvXvpcXxzY1kVt48FRQ=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
37% 7.37 MiB 21 MiB/s 0s
S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "hqd4HRNk5yp3tKJ6yMhcECEpCjBw8qB6oTpKF3PaOsYFeVG0C+dGI06xq3zgmvnPoFUttI040sY=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
39% 7.81 MiB 21 MiB/s 0s
S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "7m7wwG5woSPmICIuXr9QnBOEjUikuyzHSebMLuaNyZMc2Ki2vaqKpU9U+GOTYmR/NzFjOeyxngk=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
failed
git-annex: copy: 1 failed
```
### What version of git-annex are you using? On what operating system?
```
$ git annex version
git-annex version: 8.20210223
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.4 feed-1.3.0.1 ghc-8.8.4 http-client-0.6.4.1 persistent-sqlite-2.10.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
local repository version: 8
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
```
### Please provide any additional information below.
Disabling `public` allows uploads:
```
$ git annex enableremote amazon public=no
enableremote amazon ok
(recording state in git...)
$ git annex copy --to amazon what.nii.gz
copy what.nii.gz (checking amazon...) (to amazon...)
ok
```
But then causes the new problem that downloads fail when other users try to download the updated dataset:
```
$ git annex get what.nii.gz
get what.nii.gz (from amazon...)
S3 bucket does not allow public access; Set both AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to use S3
cannot download content
Unable to access these remotes: amazon
(Note that these git remotes have annex-ignore set: origin)
failed
git-annex: get: 1 failed
```
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
We use git-annex to share large datasets with the scientific community at https://github.com/spine-generic/data-multi-subject !

View file

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="nick.guenther@e418ed3c763dff37995c2ed5da4232a7c6cee0a9"
nickname="nick.guenther"
avatar="http://cdn.libravatar.org/avatar/9e85c6ca61c3f877fef4f91c2bf6e278"
subject="comment 38"
date="2022-07-15T15:55:38Z"
content="""
Thanks for taking the time to direct me, Joey. I usually find myself getting lost in this wiki amongst all the old notes and worklogs and documentation about old features, so sometimes someone coming along with a sign post is a very kind help :)
"""]]