deal with Amazon S3 breaking change for public=yes

* S3: Amazon S3 buckets created after April 2023 do not support ACLs,
  so public=yes cannot be used with them. Existing buckets configured
  with public=yes will keep working.
* S3: Allow setting publicurl=yes without public=yes, to support
  buckets that are configured with a Bucket Policy that allows public
  access.

Sponsored-by: Joshua Antonishen on Patreon
Joey Hess 2023-07-21 13:48:49 -04:00
parent ddc7f36d53
commit 33ba537728
7 changed files with 105 additions and 23 deletions

@ -155,4 +155,4 @@ git-annex: get: 1 failed
We use git-annex to share large datasets with the scientific community at https://github.com/spine-generic/data-multi-subject !
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,29 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2023-07-21T17:53:25Z"
content="""
This only affects new S3 buckets. Existing S3 buckets that were
created before April 2023 and were set up to allow public access should
keep working, including ACL settings when storing new files in them.
Per [Amazon's announcement](https://aws.amazon.com/about-aws/whats-new/2022/12/amazon-s3-automatically-enable-block-public-access-disable-access-control-lists-buckets-april-2023/),
"There is no change for existing buckets."
I've made `publicurl` orthogonal to `public`.
As for the idea of `HTTP HEAD` before trying to set the ACL,
the ACL is currently sent as part of the PutObject request. And
either there is not a way to change the ACL later, or the aws haskell library
is missing support for the API to do that.
While git-annex could HEAD without creds when publicurl=yes to verify that the
user has configured the bucket correctly, and at least warn about a
misconfiguration, that would add some overhead, and I guess if the user has not
configured the bucket correctly, they will notice in some other way eventually
and can fix their bucket policy after the fact. So I'm inclined not to do
that.
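
For illustration only, the unauthenticated check discussed above could look roughly like the following Python sketch. This is not something git-annex does. The helper names, the virtual-hosted-style URL layout, and the `s3.amazonaws.com` host are all assumptions made for the example.

```python
import urllib.error
import urllib.parse
import urllib.request


def public_object_url(bucket, key, host="s3.amazonaws.com"):
    # Hypothetical helper: virtual-hosted-style URL; host and layout are assumptions.
    return "https://{}.{}/{}".format(bucket, host, urllib.parse.quote(key))


def is_publicly_readable(url, opener=urllib.request.urlopen, timeout=10):
    # Send a HEAD request with no credentials attached.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        # An HTTP error status (such as 403) suggests the bucket policy
        # does not allow public reads of this object.
        return False
```

Network failures other than HTTP error statuses are left to propagate, so a caller can tell "not public" apart from "could not check". The extra request per check is the overhead mentioned above.
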
Instead I've simply deprecated `public`, noting that it should not be set
on new buckets. The user will have to deal with setting up the Bucket
Policy themselves.
"""]]

@ -37,3 +37,5 @@ upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
I work daily with git-annex and I never fail to be amazed by it. Thank you for your work!
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,43 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-07-21T17:04:19Z"
content="""
This only affects new S3 buckets. Existing S3 buckets that were
created before April 2023 and were set up to allow public access should
keep working, including ACL settings when storing new files in them.
Per Amazon's announcement, "There is no change for existing buckets."
So users who create new buckets will need to set `public=no`
(the default) and set a bucket policy instead. See
[this comment](https://git-annex.branchable.com/special_remotes/S3/#comment-fcfba0021592de4c1425d3bf3c9563d3)
for an example policy.
That comment also suggests:
* If public=yes, instead of trying to set an ACL, first try HEAD on the
newly uploaded object without using the AWS_ACCESS_KEY. Only if that
fails, fall over to trying to set an ACL. And if you get
AccessControlListNotSupported (i.e. the error due to
BucketOwnerEnforced), then give a warning that the bucket policy is not
configured for public access.
However, the ACL is currently sent as part of the PutObject request. And
either there is not a way to change the ACL later, or the aws haskell library
is missing support for the API to do that.
I think what needs to be done is to discourage initializing new S3 remotes
with public=yes, since it won't work. (Unless some S3 implementation other
than Amazon's keeps on supporting ACLs.)
And allow setting publicurl=yes without public=yes, so users who create
new buckets and configure a bucket policy to allow public access can tell
git-annex it's set up that way, so it will download from the bucket w/o S3
credentials.
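
As a rough sketch of what such a bucket policy might contain, the Python below builds a policy document that lets anyone read objects from a bucket. It is not the policy from the linked comment. The function name and the bucket name are placeholders, and the bucket's Block Public Access settings may also need to permit public policies before it takes effect.

```python
import json


def public_read_policy(bucket):
    # Hypothetical helper: allow anyone to GET objects in the bucket.
    # No ACLs are involved; access is granted by the policy alone.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::{}/*".format(bucket),
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the bucket's policy editor.
    print(json.dumps(public_read_policy("example-bucket"), indent=2))
```
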
While git-annex could HEAD without creds when publicurl=yes to verify that
the user has configured the bucket correctly, that would add some overhead,
and I guess if the user has not configured the bucket correctly, they will
notice in some other way eventually and can fix their bucket policy after the
fact. So I'm inclined not to do that.
"""]]