From 0ff5a413110527573f739e5ac2e4cdebcb347571 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Thu, 30 Aug 2018 13:45:28 -0400
Subject: [PATCH] S3 versioning=yes config

Not yet used.

This commit was supported by the NSF-funded DataLad project.
---
 Remote/S3.hs                               | 10 +++++--
 doc/git-annex-export.mdwn                  |  7 +++++
 doc/special_remotes/S3.mdwn                | 16 ++++++++--
 doc/todo/versioning_in_export_remotes.mdwn | 34 ++++++++++++++++++----
 4 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/Remote/S3.hs b/Remote/S3.hs
index a318e23229..eca71ea745 100644
--- a/Remote/S3.hs
+++ b/Remote/S3.hs
@@ -559,6 +559,7 @@ data S3Info = S3Info
 	, metaHeaders :: [(T.Text, T.Text)]
 	, partSize :: Maybe Integer
 	, isIA :: Bool
+	, versioning :: Bool
 	, public :: Bool
 	, getpublicurl :: Maybe (BucketObject -> URLString)
 	}
@@ -577,9 +578,8 @@ extractS3Info c = do
 		, metaHeaders = getMetaHeaders c
 		, partSize = getPartSize c
 		, isIA = configIA c
-		, public = case M.lookup "public" c of
-			Just "yes" -> True
-			_ -> False
+		, versioning = boolcfg "versioning"
+		, public = boolcfg "public"
 		, getpublicurl = case M.lookup "publicurl" c of
 			Just u -> Just $ \p -> genericPublicUrl p u
 			Nothing -> case M.lookup "host" c of
@@ -591,6 +591,10 @@ extractS3Info c = do
 				_ -> Nothing
 		}
 	return info
+  where
+	boolcfg k = case M.lookup k c of
+		Just "yes" -> True
+		_ -> False
 
 putObject :: S3Info -> T.Text -> RequestBody -> S3.PutObject
 putObject info file rbody = (S3.putObject (bucket info) file rbody)
diff --git a/doc/git-annex-export.mdwn b/doc/git-annex-export.mdwn
index 1d7170cee3..95f56e024b 100644
--- a/doc/git-annex-export.mdwn
+++ b/doc/git-annex-export.mdwn
@@ -38,6 +38,13 @@ verification of content downloaded from an export. Some types of keys,
 that are not based on checksums, cannot be downloaded from an export.
 And, git-annex will never trust an export to retain the content of a key.
 
+However, some special remotes, notably S3, support keeping track of old
+versions of files stored in them. If a special remote is set up to do
+that, it can be used as a key/value store and the limitations in the above
+paragraph do not apply. Note that dropping content from such a remote is
+not supported. See individual special remotes' documentation for
+details of how to enable such versioning.
+
 # OPTIONS
 
 * `--to=remote`
diff --git a/doc/special_remotes/S3.mdwn b/doc/special_remotes/S3.mdwn
index f432e6a6bb..2947db015e 100644
--- a/doc/special_remotes/S3.mdwn
+++ b/doc/special_remotes/S3.mdwn
@@ -70,8 +70,20 @@ the S3 remote.
   and UUID. This can be specified to pick a bucket name.
 
 * `exporttree` - Set to "yes" to make this special remote usable
-  by [[git-annex-export]]. It will not be usable as a general-purpose
-  special remote.
+  by [[git-annex export|git-annex-export]].
+  It will not be usable as a general-purpose special remote.
+
+* `versioning` - Setting this to "yes" along with "exporttree=yes",
+  and [manually enabling versioning for the S3 bucket in the AWS console](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/enable-versioning.html)
+  allows git-annex to access old versions of files exported to the
+  special remote with [[git-annex export|git-annex-export]].
+
+  Note that git-annex needs to remember S3 version IDs for files
+  sent to a remote configured this way, which will make the git-annex
+  branch a bit larger.
+
+  Also note that git-annex does not support dropping content from versioned
+  S3 buckets.
 
 * `public` - Set to "yes" to allow public read access to files sent
   to the S3 remote. This is accomplished by setting an ACL when each
diff --git a/doc/todo/versioning_in_export_remotes.mdwn b/doc/todo/versioning_in_export_remotes.mdwn
index 18e82e2479..e13a0aab3e 100644
--- a/doc/todo/versioning_in_export_remotes.mdwn
+++ b/doc/todo/versioning_in_export_remotes.mdwn
@@ -5,6 +5,8 @@ content and it can be accessed using a version ID (that S3 returns when
 storing the content). So it should be possible for git-annex to allow
 downloading old versions of files from such a remote.
 
+
+
 Basically, store the S3 version ID in git-annex branch and support
 downloading using it.
 
@@ -28,6 +30,17 @@ which files still need to be sent to an export. It uses the export database
 to keep track of that. This is important, because the location tracking
 won't be updated, as discussed above.
 
+The haskell aws library does not seem to support enabling versioning when
+creating a bucket, so it would need to be done from the web console.
+
+If the user enables versioning in git-annex but forgets to enable it
+in the bucket (or later suspends versioning in the bucket), it's no
+big problem; old files will not be retained and git-annex will notice
+this in the usual way (drop locking, fsck). So, it seems that initremote
+does not need to check if the versioning=yes setting matches the bucket
+configuration. For the same reasons, it's ok to enable versioning for an
+existing remote.
+
 ## final plan
 
 Add an "appendOnly" field to Remote, indicating it retains all content stored
@@ -44,15 +57,26 @@ Make exporttree=yes remotes that are appendOnly not be untrusted, and not force
 verification of content, since the usual concerns about losing data when an
 export is updated by someone else don't apply. done
 
-Let S3 remotes be configured with versioned=yes or something like that
-(what does S3 call the feature?) which enables appendOnly.
+Let S3 remotes be configured with versioning=yes which enables appendOnly.
+done
 
-Make S3 store version IDs for uploaded keys in the per-remote log when so
+Make S3 store version IDs for exported files in the per-remote log when so
 configured, and use them for when retrieving keys and for checkpresent.
 
-Make S3 refuse to removeKey when configured appendOnly, failing with an error.
-
 When a file was deleted from an exported tree, and then put back in a later
 exported tree, it might get re-uploaded even though the content is still
 retained in the versioned remote. S3 might have a way to avoid such a
 redundant upload, if so it could support using it.
+
+S3 does allow DELETE of a version of an object from a bucket. So it would
+be possible to support `git annex drop` of old versions of a file from an
+export remote. Dropping the current version though, would make the export
+database inconsistent; it would not know that a file in the exported tree
+was no longer present. I don't think that inconsistency can easily be
+resolved -- bear in mind that multiple repositories can have an export db,
+so it would need to look at location tracking for all objects in the export
+to find ones that some other repository dropped. And dropping of only
+keys that are not used in the current export doesn't help because another
+repository may have changed the exported tree and be relying on the dropped
+key being present in the export. So, DELETE from an appendonly export
+won't be supported, at least for now.