S3 versioning=yes config
Not yet used. This commit was supported by the NSF-funded DataLad project.
parent 358178fbfb
commit 0ff5a41311
4 changed files with 57 additions and 10 deletions
Remote/S3.hs | 10
@@ -559,6 +559,7 @@ data S3Info = S3Info
     , metaHeaders :: [(T.Text, T.Text)]
     , partSize :: Maybe Integer
     , isIA :: Bool
+    , versioning :: Bool
     , public :: Bool
     , getpublicurl :: Maybe (BucketObject -> URLString)
     }
@@ -577,9 +578,8 @@ extractS3Info c = do
         , metaHeaders = getMetaHeaders c
         , partSize = getPartSize c
         , isIA = configIA c
-        , public = case M.lookup "public" c of
-            Just "yes" -> True
-            _ -> False
+        , versioning = boolcfg "versioning"
+        , public = boolcfg "public"
         , getpublicurl = case M.lookup "publicurl" c of
             Just u -> Just $ \p -> genericPublicUrl p u
             Nothing -> case M.lookup "host" c of
@@ -591,6 +591,10 @@ extractS3Info c = do
                 _ -> Nothing
         }
     return info
+  where
+    boolcfg k = case M.lookup k c of
+        Just "yes" -> True
+        _ -> False
 
 putObject :: S3Info -> T.Text -> RequestBody -> S3.PutObject
 putObject info file rbody = (S3.putObject (bucket info) file rbody)
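The `boolcfg` helper introduced above treats a setting as enabled only when its value is exactly "yes". A minimal standalone sketch of that behaviour follows. It assumes a plain `Data.Map` of strings as a stand-in for the remote config, and `Config`, `Flags` and `extractFlags` are hypothetical names used only for illustration, not part of Remote/S3.hs.

```haskell
import qualified Data.Map as M

-- Hypothetical stand-in for the remote's configuration map.
type Config = M.Map String String

-- A setting counts as enabled only when it is exactly "yes";
-- a missing key or any other value is treated as False.
boolcfg :: Config -> String -> Bool
boolcfg c k = case M.lookup k c of
    Just "yes" -> True
    _ -> False

-- Cut-down record holding just the two flags parsed this way
-- (hypothetical; the real S3Info has more fields).
data Flags = Flags
    { versioning :: Bool
    , public :: Bool
    } deriving (Show, Eq)

-- Build the flags from a config, mirroring how extractS3Info
-- fills in the versioning and public fields.
extractFlags :: Config -> Flags
extractFlags c = Flags
    { versioning = boolcfg c "versioning"
    , public = boolcfg c "public"
    }
```

With this sketch, a config containing `versioning=yes` and `public=no` yields `versioning = True` and `public = False`, and an empty config yields `False` for both.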
@@ -38,6 +38,13 @@ verification of content downloaded from an export. Some types of keys,
 that are not based on checksums, cannot be downloaded from an export.
 And, git-annex will never trust an export to retain the content of a key.
 
+However, some special remotes, notably S3, support keeping track of old
+versions of files stored in them. If a special remote is set up to do
+that, it can be used as a key/value store and the limitations in the above
+paragraph do not apply. Note that dropping content from such a remote is
+not supported. See individual special remotes' documentation for
+details of how to enable such versioning.
+
 # OPTIONS
 
 * `--to=remote`
@@ -70,8 +70,20 @@ the S3 remote.
   and UUID. This can be specified to pick a bucket name.
 
 * `exporttree` - Set to "yes" to make this special remote usable
-  by [[git-annex-export]]. It will not be usable as a general-purpose
-  special remote.
+  by [[git-annex export|git-annex-export]].
+  It will not be usable as a general-purpose special remote.
+
+* `versioning` - Setting this to "yes" along with "exporttree=yes",
+  and [manually enabling versioning for the S3 bucket in the AWS console](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/enable-versioning.html)
+  allows git-annex to access old versions of files exported to the
+  special remote with [[git-annex export|git-annex-export]].
+
+  Note that git-annex needs to remember S3 version IDs for files
+  sent to a remote configured this way, which will make the git-annex
+  branch a bit larger.
+
+  Also note that git-annex does not support dropping content from versioned
+  S3 buckets.
 
 * `public` - Set to "yes" to allow public read access to files sent
   to the S3 remote. This is accomplished by setting an ACL when each
@@ -5,6 +5,8 @@ content and it can be accessed using a version ID (that S3 returns when
 storing the content). So it should be possible for git-annex to allow
 downloading old versions of files from such a remote.
 
+<https://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectVersioning.html>
+
 Basically, store the S3 version ID in git-annex branch and support
 downloading using it.
 
@@ -28,6 +30,17 @@ which files still need to be sent to an export. It uses the export database
 to keep track of that. This is important, because the location tracking
 won't be updated, as discussed above.
 
+The Haskell aws library does not seem to support enabling versioning when
+creating a bucket, so it would need to be done from the web console.
+
+If the user enables versioning in git-annex but forgets to enable it
+in the bucket (or later suspends versioning in the bucket), it's no
+big problem; old files will not be retained and git-annex will notice
+this in the usual way (drop locking, fsck). So, it seems that initremote
+does not need to check if the versioning=yes setting matches the bucket
+configuration. For the same reasons, it's ok to enable versioning for an
+existing remote.
+
 ## final plan
 
 Add an "appendOnly" field to Remote, indicating it retains all content stored
@@ -44,15 +57,26 @@ Make exporttree=yes remotes that are appendOnly not be untrusted, and not force
 verification of content, since the usual concerns about losing data when an
 export is updated by someone else don't apply. done
 
-Let S3 remotes be configured with versioned=yes or something like that
-(what does S3 call the feature?) which enables appendOnly.
+Let S3 remotes be configured with versioning=yes which enables appendOnly.
+done
 
-Make S3 store version IDs for uploaded keys in the per-remote log when so
+Make S3 store version IDs for exported files in the per-remote log when so
 configured, and use them for when retrieving keys and for checkpresent.
 
-Make S3 refuse to removeKey when configured appendOnly, failing with an error.
-
 When a file was deleted from an exported tree, and then put back
 in a later exported tree, it might get re-uploaded even though the content
 is still retained in the versioned remote. S3 might have a way to avoid
 such a redundant upload, if so it could support using it.
+
+S3 does allow DELETE of a version of an object from a bucket. So it would
+be possible to support `git annex drop` of old versions of a file from an
+export remote. Dropping the current version though, would make the export
+database inconsistent; it would not know that a file in the exported tree
+was no longer present. I don't think that inconsistency can easily be
+resolved -- bear in mind that multiple repositories can have an export db,
+so it would need to look at location tracking for all objects in the export
+to find ones that some other repository dropped. And dropping of only
+keys that are not used in the current export doesn't help because another
+repository may have changed the exported tree and be relying on the dropped
+key being present in the export. So, DELETE from an appendonly export
+won't be supported, at least for now.
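The plan above amounts to remembering the version IDs that S3 returns for exported files, consulting them for retrieval and checkpresent, and not supporting removal from an append-only remote. A rough sketch of that bookkeeping is below. It is not how git-annex implements it: `VersionLog`, `recordVersion`, `knownVersions` and `removeKey` are hypothetical names, the log is an in-memory map rather than the per-remote log in the git-annex branch, and the error text is made up for illustration.

```haskell
import qualified Data.Map as M

-- All names below are hypothetical; they are not git-annex's real types.
type Key = String
type S3VersionID = String

-- Stand-in for the per-remote log: every version ID recorded for a key,
-- newest first.
type VersionLog = M.Map Key [S3VersionID]

-- Record the version ID that S3 returned when an exported file was stored.
recordVersion :: Key -> S3VersionID -> VersionLog -> VersionLog
recordVersion k v = M.insertWith (++) k [v]

-- Version IDs known for a key; an empty list means the key has no
-- recorded version and so cannot be retrieved by version ID.
knownVersions :: Key -> VersionLog -> [S3VersionID]
knownVersions = M.findWithDefault []

-- Removal is refused with an error when the remote is append-only;
-- otherwise the key's entries are forgotten.
removeKey :: Bool -> Key -> VersionLog -> Either String VersionLog
removeKey appendOnly k l
    | appendOnly = Left "removal from an append-only remote is not supported"
    | otherwise = Right (M.delete k l)
```

Keeping a list of version IDs per key, rather than a single one, reflects that the same content can be exported more than once and receive a new version ID each time.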