store S3 version IDs

This is only done when versioning=yes is configured. It could be done
whenever S3 sends back a version ID, but some buckets may have
versioning enabled by accident, so it seemed better to honor the
configuration.
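A minimal sketch of that gating, in Python. It assumes the remote's configuration is a plain dict of strings and that the upload response is a dict with an optional `VersionId` field, as boto3's `put_object` response has when the bucket is versioned. The helper name `version_id_to_store` is hypothetical, not git-annex's.

```python
from typing import Mapping, Optional


def version_id_to_store(config: Mapping[str, str],
                        upload_response: Mapping[str, object]) -> Optional[str]:
    """Return the version ID to record for an upload, or None.

    Hypothetical helper: the version ID is only kept when the remote is
    explicitly configured with versioning=yes, even if S3 returned one,
    because the bucket may have versioning enabled by accident.
    """
    if config.get("versioning") != "yes":
        return None
    version_id = upload_response.get("VersionId")
    # The field may be absent, e.g. when the bucket is not versioned.
    return version_id if isinstance(version_id, str) else None
```

The decision is driven by the configuration, not by whether the response happens to carry a version ID.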

S3's docs say version IDs are "randomly generated", so presumably
storing the same content twice yields two different version IDs rather
than the same one.
So I considered storing a list of version IDs for a key, which would
allow removing the key completely by deleting every version. But with
Logs.RemoteState, when there are multiple writers, the last writer wins.
So storing a list would need a different log format that merges, which
seemed overkill just to support removing a key from an append-only remote.

Note that Logs.RemoteState for S3 is now dedicated to version IDs.
If something else needs to be stored, a new log will be needed for it.

This commit was supported by the NSF-funded DataLad project.
Joey Hess 2018-08-30 14:22:26 -04:00
parent 0ff5a41311
commit 794e9a7a44
2 changed files with 43 additions and 6 deletions

@ -61,7 +61,11 @@ Let S3 remotes be configured with versioning=yes which enables appendOnly.
done
Make S3 store version IDs for exported files in the per-remote log when so
configured, and use them for when retrieving keys and for checkpresent.
configured. done
Use version IDs when retrieving keys and for checkpresent.
Can public urls be generated using version IDs?
When a file was deleted from an exported tree, and then put back
in a later exported tree, it might get re-uploaded even though the content
@ -80,3 +84,7 @@ keys that are not used in the current export doesn't help because another
repository may have changed the exported tree and be relying on the dropped
key being present in the export. So, DELETE from an appendonly export
won't be supported, at least for now.

Another reason DELETE from appendonly is not supported is that only one
version ID is stored per key, but the same key could have its content in
the bucket multiple times under different version IDs.
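A rough Python sketch of using a stored version ID for retrieval and checkpresent, and of why a single stored version ID is not enough to support DELETE. It uses a hypothetical in-memory `FakeVersionedBucket` whose method and argument names mirror boto3's S3 client (`put_object`, `get_object`, `head_object`, `delete_object`, the last three accepting `VersionId`). With real boto3 a missing version raises `botocore.exceptions.ClientError` rather than `KeyError`, and `Body` is a stream rather than bytes. The log is just a dict from key to one version ID, not git-annex's actual log format.

```python
from typing import Dict, List, Tuple


class FakeVersionedBucket:
    """Hypothetical in-memory stand-in for a versioned S3 bucket."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[str, bytes]]] = {}
        self._counter = 0

    def put_object(self, Bucket: str, Key: str, Body: bytes) -> dict:
        self._counter += 1
        version_id = f"v{self._counter}"  # real S3 IDs are opaque strings
        self._versions.setdefault(Key, []).append((version_id, Body))
        return {"VersionId": version_id}

    def _find(self, Key: str, VersionId: str) -> bytes:
        for vid, body in self._versions.get(Key, []):
            if vid == VersionId:
                return body
        raise KeyError((Key, VersionId))

    def get_object(self, Bucket: str, Key: str, VersionId: str) -> dict:
        return {"Body": self._find(Key, VersionId)}

    def head_object(self, Bucket: str, Key: str, VersionId: str) -> dict:
        self._find(Key, VersionId)
        return {}

    def delete_object(self, Bucket: str, Key: str, VersionId: str) -> dict:
        # Deleting with a VersionId removes only that one version.
        self._versions[Key] = [(v, b) for v, b in self._versions.get(Key, [])
                               if v != VersionId]
        return {}


def retrieve(client, bucket: str, path: str, version_id: str) -> bytes:
    """Fetch exactly the recorded version, not whatever is now at path."""
    return client.get_object(Bucket=bucket, Key=path, VersionId=version_id)["Body"]


def checkpresent(client, bucket: str, path: str, version_id: str) -> bool:
    try:
        client.head_object(Bucket=bucket, Key=path, VersionId=version_id)
        return True
    except KeyError:  # with boto3, a botocore ClientError (404) instead
        return False


bucket = FakeVersionedBucket()
log: Dict[str, str] = {}  # key -> a single version ID

# The same key's content lands in the bucket twice under one path.
first = bucket.put_object(Bucket="b", Key="dir/file", Body=b"content")["VersionId"]
second = bucket.put_object(Bucket="b", Key="dir/file", Body=b"content")["VersionId"]
log["KEY1"] = second  # only one version ID per key is kept

# Deleting the recorded version leaves the earlier copy in the bucket.
bucket.delete_object(Bucket="b", Key="dir/file", VersionId=log["KEY1"])
print(checkpresent(bucket, "b", "dir/file", second))  # False
print(checkpresent(bucket, "b", "dir/file", first))   # True
```

Because the log only knows about `second`, nothing records that `first` still holds the content, so a DELETE based on the log would not actually remove the key.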