S3 versioning=yes config

Not yet used.

This commit was supported by the NSF-funded DataLad project.
Joey Hess 2018-08-30 13:45:28 -04:00
parent 358178fbfb
commit 0ff5a41311
GPG key ID: DB12DB0FF05F8F38
4 changed files with 57 additions and 10 deletions

@@ -5,6 +5,8 @@ content and it can be accessed using a version ID (that S3 returns when
storing the content). So it should be possible for git-annex to allow
downloading old versions of files from such a remote.
<https://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectVersioning.html>
Basically, store the S3 version ID in git-annex branch and support
downloading using it.
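
A minimal sketch of that idea, using a toy in-memory stand-in for a versioned bucket rather than the real S3 API. `VersionedBucket`, `branch_log`, and `download` are hypothetical names for illustration only; they are not part of git-annex or any S3 library.

```python
import itertools


class VersionedBucket:
    """Toy in-memory stand-in for a bucket with versioning enabled."""

    def __init__(self):
        self._versions = {}  # (object name, version ID) -> content
        self._counter = itertools.count(1)

    def put(self, name, content):
        # Every store yields a fresh version ID; older versions are kept.
        version_id = f"v{next(self._counter)}"
        self._versions[(name, version_id)] = content
        return version_id

    def get(self, name, version_id):
        return self._versions[(name, version_id)]


bucket = VersionedBucket()

# Hypothetical stand-in for what would be recorded in the git-annex branch:
# key -> (object name, version ID returned when the content was stored).
branch_log = {}
branch_log["KEY-old"] = ("foo.txt", bucket.put("foo.txt", b"first"))
branch_log["KEY-new"] = ("foo.txt", bucket.put("foo.txt", b"second"))


def download(key):
    """Fetch the content of a key using its recorded version ID."""
    name, version_id = branch_log[key]
    return bucket.get(name, version_id)
```

Both keys live under the same object name, yet the old content stays reachable because its version ID was recorded at store time.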
@@ -28,6 +30,17 @@ which files still need to be sent to an export. It uses the export database
to keep track of that. This is important, because the location tracking
won't be updated, as discussed above.
The Haskell aws library does not seem to support enabling versioning when
creating a bucket, so it would need to be done from the web console.
If the user enables versioning in git-annex but forgets to enable it
in the bucket (or later suspends versioning in the bucket), it's no
big problem; old files will not be retained and git-annex will notice
this in the usual way (drop locking, fsck). So, it seems that initremote
does not need to check if the versioning=yes setting matches the bucket
configuration. For the same reasons, it's ok to enable versioning for an
existing remote.
## final plan
Add an "appendOnly" field to Remote, indicating it retains all content stored
@@ -44,15 +57,26 @@ Make exporttree=yes remotes that are appendOnly not be untrusted, and not force
verification of content, since the usual concerns about losing data when an
export is updated by someone else don't apply. done
Let S3 remotes be configured with versioned=yes or something like that
(what does S3 call the feature?) which enables appendOnly.
Let S3 remotes be configured with versioning=yes which enables appendOnly.
done
Make S3 store version IDs for uploaded keys in the per-remote log when so
Make S3 store version IDs for exported files in the per-remote log when so
configured, and use them when retrieving keys and for checkpresent.
Make S3 refuse to removeKey when configured appendOnly, failing with an error.
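
The three behaviours above (log the version ID, use it for retrieval and checkpresent, refuse removal) could fit together roughly as follows. This is a sketch under stated assumptions, not the git-annex implementation: `VersionedRemote`, `AppendOnlyError`, and the method names are invented for illustration, and the bucket is modelled as a plain dictionary.

```python
class AppendOnlyError(Exception):
    """Raised when removal is attempted on an appendOnly remote."""


class VersionedRemote:
    """Hypothetical model of an S3 remote configured with versioning=yes."""

    def __init__(self, append_only=True):
        self.append_only = append_only
        self._objects = {}  # (file name, version ID) -> content
        self._log = {}      # per-remote log: key -> (file name, version ID)
        self._next = 0

    def store_export(self, key, filename, content):
        # Record the version ID of the stored object in the per-remote log.
        self._next += 1
        version_id = f"v{self._next}"
        self._objects[(filename, version_id)] = content
        self._log[key] = (filename, version_id)

    def retrieve(self, key):
        # Retrieval goes through the logged version ID, not the current object.
        return self._objects[self._log[key]]

    def check_present(self, key):
        location = self._log.get(key)
        return location is not None and location in self._objects

    def remove_key(self, key):
        if self.append_only:
            raise AppendOnlyError("remote is appendOnly; refusing to remove " + key)
        filename, version_id = self._log.pop(key)
        del self._objects[(filename, version_id)]
```

Exporting two successive versions of the same file leaves both keys present and retrievable, and `remove_key` fails with an error instead of deleting anything.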
When a file was deleted from an exported tree, and then put back
in a later exported tree, it might get re-uploaded even though the content
is still retained in the versioned remote. S3 might have a way to avoid
such a redundant upload, if so it could support using it.
S3 does allow DELETE of a version of an object from a bucket. So it would
be possible to support `git annex drop` of old versions of a file from an
export remote. Dropping the current version though, would make the export
database inconsistent; it would not know that a file in the exported tree
was no longer present. I don't think that inconsistency can easily be
resolved -- bear in mind that multiple repositories can have an export db,
so it would need to look at location tracking for all objects in the export
to find ones that some other repository dropped. And dropping of only
keys that are not used in the current export doesn't help because another
repository may have changed the exported tree and be relying on the dropped
key being present in the export. So, DELETE from an appendOnly export
won't be supported, at least for now.
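
To illustrate the inconsistency described above, here is a toy model with two repositories that each keep their own export database. All names are hypothetical, and it optimistically assumes the dropping repository updates its own database, which is more than the text above grants.

```python
# What the export remote actually holds as current versions: file -> key.
remote_contents = {"foo.txt": "KEY-A", "bar.txt": "KEY-B"}

# Each repository has its own export db describing the exported tree.
export_db_repo1 = dict(remote_contents)
export_db_repo2 = dict(remote_contents)


def drop_current(filename, own_db):
    """Delete the current version of a file; only the local db is updated."""
    del remote_contents[filename]
    del own_db[filename]


drop_current("foo.txt", export_db_repo1)

# The second repository's db still claims foo.txt is in the export.
stale = [
    filename
    for filename, key in export_db_repo2.items()
    if remote_contents.get(filename) != key
]
```

After the drop, `stale` contains `foo.txt`: the second repository cannot tell from its own export db that the file is gone, which is why it would have to consult location tracking for every object in the export.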