S3 versioning=yes config
Not yet used. This commit was supported by the NSF-funded DataLad project.
parent 358178fbfb
commit 0ff5a41311
4 changed files with 57 additions and 10 deletions
10
Remote/S3.hs
@@ -559,6 +559,7 @@ data S3Info = S3Info
 	, metaHeaders :: [(T.Text, T.Text)]
 	, partSize :: Maybe Integer
 	, isIA :: Bool
+	, versioning :: Bool
 	, public :: Bool
 	, getpublicurl :: Maybe (BucketObject -> URLString)
 	}
@@ -577,9 +578,8 @@ extractS3Info c = do
 		, metaHeaders = getMetaHeaders c
 		, partSize = getPartSize c
 		, isIA = configIA c
-		, public = case M.lookup "public" c of
-			Just "yes" -> True
-			_ -> False
+		, versioning = boolcfg "versioning"
+		, public = boolcfg "public"
 		, getpublicurl = case M.lookup "publicurl" c of
 			Just u -> Just $ \p -> genericPublicUrl p u
 			Nothing -> case M.lookup "host" c of
@@ -591,6 +591,10 @@ extractS3Info c = do
 				_ -> Nothing
 		}
 	return info
+  where
+	boolcfg k = case M.lookup k c of
+		Just "yes" -> True
+		_ -> False
 
 putObject :: S3Info -> T.Text -> RequestBody -> S3.PutObject
 putObject info file rbody = (S3.putObject (bucket info) file rbody)
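
The `boolcfg` helper added to `extractS3Info` treats a setting as enabled only when it is present and exactly "yes". A minimal standalone sketch of that behaviour follows, assuming the remote config can be modelled as a `Map String String`; the `Config` alias, the explicit config argument, and `main` are hypothetical and not part of git-annex.

```haskell
import qualified Data.Map as M

-- Hypothetical stand-in for git-annex's remote config type.
type Config = M.Map String String

-- True only when the key is present and set to exactly "yes";
-- a missing key or any other value counts as False.
boolcfg :: Config -> String -> Bool
boolcfg c k = case M.lookup k c of
    Just "yes" -> True
    _ -> False

main :: IO ()
main = do
    let c = M.fromList [("versioning", "yes"), ("public", "no")]
    print (boolcfg c "versioning") -- set to "yes"
    print (boolcfg c "public")     -- set, but not "yes"
    print (boolcfg c "exporttree") -- not set at all
```

Under this reading, values such as "true" or "Yes" would not enable a setting, which matches the `case` expression the commit replaces.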

@@ -38,6 +38,13 @@ verification of content downloaded from an export. Some types of keys,
 that are not based on checksums, cannot be downloaded from an export.
 And, git-annex will never trust an export to retain the content of a key.
 
+However, some special remotes, notably S3, support keeping track of old
+versions of files stored in them. If a special remote is set up to do
+that, it can be used as a key/value store and the limitations in the above
+paragraph do not apply. Note that dropping content from such a remote is
+not supported. See individual special remotes' documentation for
+details of how to enable such versioning.
+
 # OPTIONS
 
 * `--to=remote`

@@ -70,8 +70,20 @@ the S3 remote.
   and UUID. This can be specified to pick a bucket name.
 
 * `exporttree` - Set to "yes" to make this special remote usable
-  by [[git-annex-export]]. It will not be usable as a general-purpose
-  special remote.
+  by [[git-annex export|git-annex-export]].
+  It will not be usable as a general-purpose special remote.
 
+* `versioning` - Setting this to "yes" along with "exporttree=yes",
+  and [manually enabling versioning for the S3 bucket in the AWS console](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/enable-versioning.html)
+  allows git-annex to access old versions of files exported to the
+  special remote with [[git-annex export|git-annex-export]].
+
+  Note that git-annex needs to remember S3 version IDs for files
+  sent to a remote configured this way, which will make the git-annex
+  branch a bit larger.
+
+  Also note that git-annex does not support dropping content from versioned
+  S3 buckets.
+
 * `public` - Set to "yes" to allow public read access to files sent
   to the S3 remote. This is accomplished by setting an ACL when each

@@ -5,6 +5,8 @@ content and it can be accessed using a version ID (that S3 returns when
 storing the content). So it should be possible for git-annex to allow
 downloading old versions of files from such a remote.
 
+<https://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectVersioning.html>
+
 Basically, store the S3 version ID in git-annex branch and support
 downloading using it.
 
@@ -28,6 +30,17 @@ which files still need to be sent to an export. It uses the export database
 to keep track of that. This is important, because the location tracking
 won't be updated, as discussed above.
 
+The Haskell aws library does not seem to support enabling versioning when
+creating a bucket, so it would need to be done from the web console.
+
+If the user enables versioning in git-annex but forgets to enable it
+in the bucket (or later suspends versioning in the bucket), it's no
+big problem; old files will not be retained and git-annex will notice
+this in the usual way (drop locking, fsck). So, it seems that initremote
+does not need to check if the versioning=yes setting matches the bucket
+configuration. For the same reasons, it's ok to enable versioning for an
+existing remote.
+
 ## final plan
 
 Add an "appendOnly" field to Remote, indicating it retains all content stored
@@ -44,15 +57,26 @@ Make exporttree=yes remotes that are appendOnly not be untrusted, and not force
 verification of content, since the usual concerns about losing data when an
 export is updated by someone else don't apply. done
 
-Let S3 remotes be configured with versioned=yes or something like that
-(what does S3 call the feature?) which enables appendOnly.
+Let S3 remotes be configured with versioning=yes which enables appendOnly.
+done
 
-Make S3 store version IDs for uploaded keys in the per-remote log when so
+Make S3 store version IDs for exported files in the per-remote log when so
 configured, and use them for when retrieving keys and for checkpresent.
 
-Make S3 refuse to removeKey when configured appendOnly, failing with an error.
-
 When a file was deleted from an exported tree, and then put back
 in a later exported tree, it might get re-uploaded even though the content
 is still retained in the versioned remote. S3 might have a way to avoid
 such a redundant upload, if so it could support using it.
+
+S3 does allow DELETE of a version of an object from a bucket. So it would
+be possible to support `git annex drop` of old versions of a file from an
+export remote. Dropping the current version though, would make the export
+database inconsistent; it would not know that a file in the exported tree
+was no longer present. I don't think that inconsistency can easily be
+resolved -- bear in mind that multiple repositories can have an export db,
+so it would need to look at location tracking for all objects in the export
+to find ones that some other repository dropped. And dropping of only
+keys that are not used in the current export doesn't help because another
+repository may have changed the exported tree and be relying on the dropped
+key being present in the export. So, DELETE from an appendonly export
+won't be supported, at least for now.
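
The remaining plan items (remember S3 version IDs per remote, use them for retrieval and checkpresent, and do not support dropping from an appendOnly export) can be pictured with a small pure model. This is only a sketch under stated assumptions: `VersionLog`, `recordVersion`, `knownVersions`, and `dropKey` are hypothetical names, the key and version ID strings are made up, and none of this reflects how git-annex actually stores its per-remote log.

```haskell
import qualified Data.Map as M

type Key = String
type VersionId = String

-- Hypothetical per-remote log: for each key, the version IDs that S3
-- returned when content was stored, newest first.
type VersionLog = M.Map Key [VersionId]

-- Remember a version ID returned by an upload.
recordVersion :: Key -> VersionId -> VersionLog -> VersionLog
recordVersion k v = M.insertWith (++) k [v]

-- Version IDs that retrieval or checkpresent could try for a key.
knownVersions :: Key -> VersionLog -> [VersionId]
knownVersions = M.findWithDefault []

-- Dropping is refused outright when the remote is append-only.
dropKey :: Bool -> Key -> VersionLog -> Either String VersionLog
dropKey appendOnly k l
    | appendOnly = Left ("dropping " ++ k ++ " from an appendOnly export is not supported")
    | otherwise = Right (M.delete k l)

main :: IO ()
main = do
    let l = recordVersion "KEY-a" "v2" (recordVersion "KEY-a" "v1" M.empty)
    print (knownVersions "KEY-a" l)
    putStrLn (either id (const "dropped") (dropKey True "KEY-a" l))
```

Keeping every recorded version ID, rather than only the latest, is what would let an old version of a file be fetched after the exported tree has moved on.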