S3 versioning=yes config

Not yet used.

This commit was supported by the NSF-funded DataLad project.
Joey Hess 2018-08-30 13:45:28 -04:00
parent 358178fbfb
commit 0ff5a41311
GPG key ID: DB12DB0FF05F8F38
4 changed files with 57 additions and 10 deletions

@@ -559,6 +559,7 @@ data S3Info = S3Info
 	, metaHeaders :: [(T.Text, T.Text)]
 	, partSize :: Maybe Integer
 	, isIA :: Bool
+	, versioning :: Bool
 	, public :: Bool
 	, getpublicurl :: Maybe (BucketObject -> URLString)
 	}
@@ -577,9 +578,8 @@ extractS3Info c = do
 	, metaHeaders = getMetaHeaders c
 	, partSize = getPartSize c
 	, isIA = configIA c
-	, public = case M.lookup "public" c of
-		Just "yes" -> True
-		_ -> False
+	, versioning = boolcfg "versioning"
+	, public = boolcfg "public"
 	, getpublicurl = case M.lookup "publicurl" c of
 		Just u -> Just $ \p -> genericPublicUrl p u
 		Nothing -> case M.lookup "host" c of
@@ -591,6 +591,10 @@ extractS3Info c = do
 			_ -> Nothing
 	}
 	return info
+  where
+	boolcfg k = case M.lookup k c of
+		Just "yes" -> True
+		_ -> False
 
 putObject :: S3Info -> T.Text -> RequestBody -> S3.PutObject
 putObject info file rbody = (S3.putObject (bucket info) file rbody)
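
The `boolcfg` helper introduced in this hunk can be tried in isolation. The following is a minimal standalone sketch, not the code from the commit: the `RemoteConfig` alias and the explicit config argument are assumptions made so the example compiles on its own (in the diff, `c` is captured from the enclosing `extractS3Info`).

```haskell
import qualified Data.Map as M

-- Assumption for this sketch: a remote's configuration is a map from
-- setting name to its raw string value.
type RemoteConfig = M.Map String String

-- A setting counts as enabled only when it is present and exactly "yes";
-- a missing key or any other value (including "Yes") reads as False.
boolcfg :: RemoteConfig -> String -> Bool
boolcfg c k = case M.lookup k c of
    Just "yes" -> True
    _ -> False
```

With a config of `M.fromList [("versioning", "yes")]`, `boolcfg` is `True` for `"versioning"` and `False` for `"public"`, which matches the behavior the old inline `case` had for `public`.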

@@ -38,6 +38,13 @@ verification of content downloaded from an export. Some types of keys,
 that are not based on checksums, cannot be downloaded from an export.
 And, git-annex will never trust an export to retain the content of a key.
 
+However, some special remotes, notably S3, support keeping track of old
+versions of files stored in them. If a special remote is set up to do
+that, it can be used as a key/value store and the limitations in the above
+paragraph do not apply. Note that dropping content from such a remote is
+not supported. See individual special remotes' documentation for
+details of how to enable such versioning.
+
 # OPTIONS
 
 * `--to=remote`

@@ -70,8 +70,20 @@ the S3 remote.
   and UUID. This can be specified to pick a bucket name.
 
 * `exporttree` - Set to "yes" to make this special remote usable
-  by [[git-annex-export]]. It will not be usable as a general-purpose
-  special remote.
+  by [[git-annex export|git-annex-export]].
+  It will not be usable as a general-purpose special remote.
 
+* `versioning` - Setting this to "yes" along with "exporttree=yes",
+  and [manually enabling versioning for the S3 bucket in the AWS console](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/enable-versioning.html)
+  allows git-annex to access old versions of files exported to the
+  special remote with [[git-annex export|git-annex-export]].
+
+  Note that git-annex needs to remember S3 version IDs for files
+  sent to a remote configured this way, which will make the git-annex
+  branch a bit larger.
+
+  Also note that git-annex does not support dropping content from versioned
+  S3 buckets.
+
 * `public` - Set to "yes" to allow public read access to files sent
   to the S3 remote. This is accomplished by setting an ACL when each

@@ -5,6 +5,8 @@ content and it can be accessed using a version ID (that S3 returns when
 storing the content). So it should be possible for git-annex to allow
 downloading old versions of files from such a remote.
 
+<https://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectVersioning.html>
+
 Basically, store the S3 version ID in git-annex branch and support
 downloading using it.
@@ -28,6 +30,17 @@ which files still need to be sent to an export. It uses the export database
 to keep track of that. This is important, because the location tracking
 won't be updated, as discussed above.
 
+The Haskell aws library does not seem to support enabling versioning when
+creating a bucket, so it would need to be done from the web console.
+
+If the user enables versioning in git-annex but forgets to enable it
+in the bucket (or later suspends versioning in the bucket), it's no
+big problem; old files will not be retained and git-annex will notice
+this in the usual way (drop locking, fsck). So, it seems that initremote
+does not need to check if the versioning=yes setting matches the bucket
+configuration. For the same reasons, it's ok to enable versioning for an
+existing remote.
+
 ## final plan
 
 Add an "appendOnly" field to Remote, indicating it retains all content stored
@@ -44,15 +57,26 @@ Make exporttree=yes remotes that are appendOnly not be untrusted, and not force
 verification of content, since the usual concerns about losing data when an
 export is updated by someone else don't apply. done
 
-Let S3 remotes be configured with versioned=yes or something like that
-(what does S3 call the feature?) which enables appendOnly.
+Let S3 remotes be configured with versioning=yes which enables appendOnly.
+done
 
-Make S3 store version IDs for uploaded keys in the per-remote log when so
+Make S3 store version IDs for exported files in the per-remote log when so
 configured, and use them for when retrieving keys and for checkpresent.
 
+Make S3 refuse to removeKey when configured appendOnly, failing with an error.
+
 When a file was deleted from an exported tree, and then put back
 in a later exported tree, it might get re-uploaded even though the content
 is still retained in the versioned remote. S3 might have a way to avoid
 such a redundant upload, if so it could support using it.
+
+S3 does allow DELETE of a version of an object from a bucket. So it would
+be possible to support `git annex drop` of old versions of a file from an
+export remote. Dropping the current version though, would make the export
+database inconsistent; it would not know that a file in the exported tree
+was no longer present. I don't think that inconsistency can easily be
+resolved -- bear in mind that multiple repositories can have an export db,
+so it would need to look at location tracking for all objects in the export
+to find ones that some other repository dropped. And dropping of only
+keys that are not used in the current export doesn't help because another
+repository may have changed the exported tree and be relying on the dropped
+key being present in the export. So, DELETE from an appendonly export
+won't be supported, at least for now.
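
The plan above (record a version ID per stored key, use it for presence checks, refuse removal when the remote is append-only) could be modeled roughly as follows. This is a hypothetical, pure sketch for illustration only: the `Remote` record, `recordVersion`, `checkPresent`, and the error text are assumptions and do not reflect git-annex's actual types or the S3 API.

```haskell
import qualified Data.Map as M

-- Hypothetical stand-ins for git-annex's real Key and S3 version ID types.
type Key = String
type VersionID = String

-- A toy remote: the appendOnly flag plus a per-remote log of version IDs.
data Remote = Remote
    { appendOnly :: Bool
    , versionLog :: M.Map Key [VersionID]
    }

-- Remember the version ID returned when a key's content was stored
-- (newest first).
recordVersion :: Key -> VersionID -> Remote -> Remote
recordVersion k v r = r { versionLog = M.insertWith (++) k [v] (versionLog r) }

-- A key is considered present when at least one version ID is logged for it.
checkPresent :: Key -> Remote -> Bool
checkPresent k r = maybe False (not . null) (M.lookup k (versionLog r))

-- Removal fails with an error on an append-only remote and otherwise
-- forgets the key.
removeKey :: Key -> Remote -> Either String Remote
removeKey k r
    | appendOnly r = Left "dropping content from an appendOnly remote is not supported"
    | otherwise = Right r { versionLog = M.delete k (versionLog r) }
```

In this model, a key recorded with `recordVersion` stays present for `checkPresent` even after the exported tree changes, and `removeKey` on a remote with `appendOnly = True` always returns `Left`, which mirrors the decision not to support DELETE for now.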