S3 export finalization
Fixed ACL issue, and updated some documentation.
commit 650d0955a0
parent 44cd5ae313
10 changed files with 120 additions and 80 deletions
@@ -357,14 +357,16 @@ checkPresentExportS3 r info _k loc =
 	go = withS3Handle (config r) (gitconfig r) (uuid r) $ \h -> do
 		checkKeyHelper info h (T.pack $ bucketExportLocation info loc)
 
+-- S3 has no move primitive; copy and delete.
 renameExportS3 :: Remote -> S3Info -> Key -> ExportLocation -> ExportLocation -> Annex Bool
 renameExportS3 r info _k src dest = catchNonAsync go (\e -> warning (show e) >> return False)
   where
 	go = withS3Handle (config r) (gitconfig r) (uuid r) $ \h -> do
-		-- S3 has no move primitive; copy and delete.
-		void $ sendS3Handle h $ S3.copyObject (bucket info) dstobject
+		let co = S3.copyObject (bucket info) dstobject
 			(S3.ObjectId (bucket info) srcobject Nothing)
 			S3.CopyMetadata
+		-- ACL is not preserved by copy.
+		void $ sendS3Handle h $ co { S3.coAcl = acl info }
 		void $ sendS3Handle h $ S3.DeleteObject srcobject (bucket info)
 		return True
 	srcobject = T.pack $ bucketExportLocation info src
@@ -2,6 +2,9 @@ Here's how to create an Amazon [[S3 special remote|special_remotes/S3]] that
 can be read by anyone who gets a clone of your git-annex repository,
 without them needing Amazon AWS credentials.
 
+If you want to publish files to S3 so they can be accessed without using
+git-annex, see [[publishing_your_files_to_the_public]].
+
 Note: Bear in mind that Amazon will charge the owner of the bucket
 for public downloads from that bucket.
@@ -52,6 +55,3 @@ who are not using git-annex. To find the url, use `git annex whereis`.
 ----
 
 See [[special_remotes/S3]] for details about configuring S3 remotes.
-
-See [[publishing_your_files_to_the_public]] for other ways to use a public
-S3 bucket.
@@ -1,88 +1,39 @@
 # Creating a special S3 remote to hold files shareable by URL
 
-(In this example, I'll assume you'll be creating a bucket in S3 named **public-annex** and a special remote in git-annex named **public-s3**, which will store its files in that bucket. Change these names if you are doing this for real.)
+In this example, I'll assume you'll be creating a bucket in Amazon S3 named
+$BUCKET and a special remote named public-s3. Be sure to replace $BUCKET
+with something like "public-bucket-joey" when you follow along in your
+shell.
 
-Set up your special [S3](http://git-annex.branchable.com/special_remotes/S3/) remote with (at least) these options:
+Set up your special [[S3 remote|special_remotes/S3]] with (at least) these options:
 
-	git annex initremote public-s3 type=s3 encryption=none bucket=public-annex chunk=0 public=yes
+	git annex initremote public-s3 type=s3 encryption=none bucket=$BUCKET exporttree=yes public=yes
 
-This way git-annex will upload the files to this repo (when you call `git
-annex copy [FILES...] --to public-s3`) without encrypting them and without
-chunking them. And, thanks to public=yes, they will be
-accessible by anyone with the link.
-
-(Note that public=yes was added in git-annex version 5.20150605.
-If you have an older version, it will be silently ignored, and you
-will instead need to use the AWS dashboard to configure a public get policy
-for the bucket.)
-
-Following the example, the files will be accessible at `http://public-annex.s3.amazonaws.com/KEY`, where `KEY` is the key created by git-annex, which you can discover by running
-
-	git annex lookupkey FILEPATH
-
-This way you can share a link to each file you have at your S3 remote.
+Then export the files in the master branch to the remote:
+
+	git annex export master --to public-s3
+
+You can run that command again to update the export. See
+[[git-annex-export]] for details.
+
+Each exported file will be available to the public from
+`http://$BUCKET.s3.amazonaws.com/$file`
+
+Note: Bear in mind that Amazon will charge the owner of the bucket
+for public downloads from that bucket.
 
-## Sharing all links in a folder
+# Indexes
 
-To share all the links in a given folder, you can go to that folder and run (this is an example with the _fish_ shell, but you can do the same in _bash_):
+By default, there is no index.html file exported, so if you open
+`http://$BUCKET.s3.amazonaws.com/` in a web browser, you'll see an
+XML document listing the files.
 
-	for filename in (ls)
-		echo $filename": https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)
-	end
+For a nicer list of files, you can make an index.html file, check it into
+git, and export it to the bucket. You'll need to configure the bucket to
+use index.html as its index document, as
+[explained here](https://stackoverflow.com/questions/27899/is-there-a-way-to-have-index-html-functionality-with-content-hosted-on-s3).
 
-## Sharing all links matching certain metadata
+# Old method
 
-The same applies to all the filters you can do with git-annex.
-
-For example, let's share links to all the files whose _author_ metadata starts with "Mario" and that are stored at your public-s3 remote.
-However, instead of just a list of links, we will output a markdown-formatted list of the filenames linked to their S3 urls:
-
-	for filename in (git annex find --metadata "author=Mario*" --and --in public-s3)
-		echo "* ["$filename"](https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)")"
-	end
-
-Very useful.
-
-## Sharing links with time-limited URLs
-
-By using pre-signed URLs it is possible to limit how long a URL stays valid for retrieving an object.
-To enable this, use a private S3 bucket for the remote, and then pre-sign the actual URL with the script in [AWS-Tools](https://github.com/gdbtek/aws-tools).
-Example:
-
-	key=`git annex lookupkey "$fname"`; sign_s3_url.bash --region 'eu-west-1' --bucket 'mybuck' --file-path $key --aws-access-key-id XX --aws-secret-access-key XX --method 'GET' --minute-expire 10
-
-## Adding the S3 URL as a source
-
-Assuming all files in the current directory are available on S3, this will register the public S3 url for each file in git-annex, making it available for everyone *through git-annex*:
-
-<pre>
-git annex find --in public-s3 | while read file ; do
-  key=$(git annex lookupkey $file)
-  echo $key https://public-annex.s3.amazonaws.com/$key
-done | git annex registerurl
-</pre>
-
-`registerurl` was introduced in `5.20150317`.
-
-## Manually configuring a public get policy
-
-Here is how to manually configure a public get policy
-for a bucket, in the AWS dashboard:
-
-	{
-		"Version": "2008-10-17",
-		"Statement": [
-			{
-				"Sid": "AllowPublicRead",
-				"Effect": "Allow",
-				"Principal": {
-					"AWS": "*"
-				},
-				"Action": "s3:GetObject",
-				"Resource": "arn:aws:s3:::public-annex/*"
-			}
-		]
-	}
-
-This should not be necessary if using a new enough version
-of git-annex, which can instead be configured with public=yes.
+To use `git annex export`, you need git-annex version 6.20170909 or
+newer. Before we had `git annex export`, an [[old_method]] was used instead.
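The new "Indexes" section above describes a workflow compact enough to script; here is a minimal sketch (assuming the public-s3 remote configured above, a trivial placeholder index.html, and that the bucket is also set to serve index.html as its index document, as the tip explains):

	# create a placeholder index page (contents are up to you)
	cat > index.html <<'EOF'
	<html><body><h1>My public files</h1></body></html>
	EOF
	# check it into git, then export the branch to the bucket
	git add index.html
	git commit -m 'add index.html'
	git annex export master --to public-s3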
doc/tips/publishing_your_files_to_the_public/old_method.mdwn (new file, 88 lines)
@@ -0,0 +1,88 @@
# Creating a special S3 remote to hold files shareable by URL

(In this example, I'll assume you'll be creating a bucket in S3 named **public-annex** and a special remote in git-annex named **public-s3**, which will store its files in that bucket. Change these names if you are doing this for real.)

Set up your special [S3](http://git-annex.branchable.com/special_remotes/S3/) remote with (at least) these options:

	git annex initremote public-s3 type=s3 encryption=none bucket=public-annex chunk=0 public=yes

This way git-annex will upload the files to this repo (when you call `git
annex copy [FILES...] --to public-s3`) without encrypting them and without
chunking them. And, thanks to public=yes, they will be
accessible by anyone with the link.

(Note that public=yes was added in git-annex version 5.20150605.
If you have an older version, it will be silently ignored, and you
will instead need to use the AWS dashboard to configure a public get policy
for the bucket.)

Following the example, the files will be accessible at `http://public-annex.s3.amazonaws.com/KEY`, where `KEY` is the key created by git-annex, which you can discover by running

	git annex lookupkey FILEPATH

This way you can share a link to each file you have at your S3 remote.
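For a single file, the two steps can be combined into one line (a minimal bash sketch using the example bucket name and the URL pattern shown above):

	# prints the public URL for FILEPATH
	echo "https://public-annex.s3.amazonaws.com/$(git annex lookupkey FILEPATH)"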

## Sharing all links in a folder

To share all the links in a given folder, you can go to that folder and run (this is an example with the _fish_ shell, but the same can be done in _bash_, as sketched after this block):

	for filename in (ls)
		echo $filename": https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)
	end
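A bash equivalent (a sketch of my own, not from the original page; it loops over the files in the current directory the same way):

	# same loop as the fish version above, in bash
	for filename in *; do
		echo "$filename: https://public-annex.s3.amazonaws.com/$(git annex lookupkey "$filename")"
	done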

## Sharing all links matching certain metadata

The same applies to all the filters you can do with git-annex.

For example, let's share links to all the files whose _author_ metadata starts with "Mario" and that are stored at your public-s3 remote.
However, instead of just a list of links, we will output a markdown-formatted list of the filenames linked to their S3 urls:

	for filename in (git annex find --metadata "author=Mario*" --and --in public-s3)
		echo "* ["$filename"](https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)")"
	end

Very useful.

## Sharing links with time-limited URLs

By using pre-signed URLs it is possible to limit how long a URL stays valid for retrieving an object.
To enable this, use a private S3 bucket for the remote, and then pre-sign the actual URL with the script in [AWS-Tools](https://github.com/gdbtek/aws-tools).
Example:

	key=`git annex lookupkey "$fname"`; sign_s3_url.bash --region 'eu-west-1' --bucket 'mybuck' --file-path $key --aws-access-key-id XX --aws-secret-access-key XX --method 'GET' --minute-expire 10
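If you have the official AWS CLI installed, `aws s3 presign` produces the same kind of time-limited URL; a sketch (assuming the example bucket name, with the 10-minute expiry given in seconds):

	key=$(git annex lookupkey "$fname")
	# 600 seconds = 10 minutes
	aws s3 presign "s3://mybuck/$key" --expires-in 600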

## Adding the S3 URL as a source

Assuming all files in the current directory are available on S3, this will register the public S3 url for each file in git-annex, making it available for everyone *through git-annex*:

<pre>
git annex find --in public-s3 | while read file ; do
  key=$(git annex lookupkey $file)
  echo $key https://public-annex.s3.amazonaws.com/$key
done | git annex registerurl
</pre>
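To verify that a url was recorded, `git annex whereis` should now list it (a usage sketch; the url is expected to show up under the web special remote):

	git annex whereis FILEPATH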

`registerurl` was introduced in git-annex `5.20150317`.

## Manually configuring a public get policy

Here is how to manually configure a public get policy
for a bucket, in the AWS dashboard:

	{
		"Version": "2008-10-17",
		"Statement": [
			{
				"Sid": "AllowPublicRead",
				"Effect": "Allow",
				"Principal": {
					"AWS": "*"
				},
				"Action": "s3:GetObject",
				"Resource": "arn:aws:s3:::public-annex/*"
			}
		]
	}
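The same policy can also be applied without the dashboard (a sketch using the AWS CLI, assuming the JSON above has been saved to a file; the name policy.json is only an example):

	# policy.json holds the bucket policy shown above
	aws s3api put-bucket-policy --bucket public-annex --policy file://policy.json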

This should not be necessary if using a new enough version
of git-annex, which can instead be configured with public=yes.
@@ -29,7 +29,6 @@ Work is in progress. Todo list:
 	Would need git-annex sync to export to the master tree?
 	This is similar to the little-used preferreddir= preferred content
 	setting and the "public" repository group.
-* Test S3 export.
 * Test export to IA via S3. In particular, does removing an exported file
   work?