git-annex/doc/todo/versioning_in_export_remotes.mdwn

84 lines
3.9 KiB
Text
Raw Normal View History

2018-08-28 16:14:06 +00:00
Some remotes like S3 support versioning of data stored in them.
When git-annex updates an export, it deletes the old
content from eg the S3 bucket, but with versioning enabled, S3 retains the
content and it can be accessed using a version ID (that S3 returns when
storing the content). So it should be possible for git-annex to allow
downloading old versions of files from such a remote.
<https://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectVersioning.html>
2018-08-29 18:12:18 +00:00
Basically, store the S3 version ID in git-annex branch and support
downloading using it.
2018-08-28 16:14:06 +00:00
2018-08-29 18:12:18 +00:00
But this has the problem that dropping makes git-annex think it's not in S3
any more, while what we want for export is for it to be removed from the
current bucket, but still tracked as present in S3.
2018-08-28 16:14:06 +00:00
The drop from S3 could fail, or "succeed" in a way that prevents the location
tracking being updated to say it lacks the content. Failing is how bup deals
2018-08-29 17:59:52 +00:00
with it. It seems confusing to have a drop appear to succeed but not really drop,
especially since dropping again would seem to do something a second time.
This does mean that git-annex drop/sync --content/assistant might try to do a
lot of drops from the remote, and generate a lot of noise when they fail.
Which is kind of ok for drop, since the user should be told that they can't
delete the data. Could add a way to say "this remote does not support drop",
and make at sync --content/assistant use that.
Note that git-annex export does not rely on location tracking to determine
which files still need to be sent to an export. It uses the export database
2018-08-29 18:12:18 +00:00
to keep track of that. This is important, because the location tracking
won't be updated, as discussed above.
2018-08-29 17:59:52 +00:00
The haskell aws library does not seem to support enabling versioning when
creating a bucket, so it would need to be done from the web console.
2018-08-31 17:56:32 +00:00
## other considerations
If the user enables versioning in git-annex but forgets to enable it
in the bucket (or later suspends versioning in the bucket), it's no
big problem; old files will not be retained and git-annex will notice
this in the usual way (drop locking, fsck). So, it seems that initremote
does not need to check if the versioning=yes setting matches the bucket
configuration. For same reasons, it's ok to enable versioning for an
existing remote.
S3 does allow DELETE of a version of an object from a bucket. So it would
be possible to support `git annex drop` of old versions of a file from an
export remote. Dropping the current version though, would make the export
database inconsistent; it would not know that a file in the exported tree
was no longer present. I don't think that inconsitency can easily be
resolved -- bear in ming that multiple repositories can have an export db,
so it would need to look at location tracking for all objects in the export
to find ones that some other repository dropped. And dropping of only
keys that are not used in the current export doesn't help because another
repository may have changed the exported tree and be relying on the dropped
key being present in the export. Unless... Could export conflict resultion
somehow detect that?
2018-08-31 18:01:24 +00:00
## final plan
2018-08-29 17:59:52 +00:00
Add an "appendOnly" field to Remote, indicating it retains all content stored
in it. done
2018-08-29 17:59:52 +00:00
Make `git annex export` check appendOnly when removing a file from an
export, and not update the location log, since the remote still contains
the content. done
2018-08-29 17:59:52 +00:00
Make git-annex sync and the assistant skip trying to drop from appendOnly
remotes since it's just going to fail. done
2018-08-29 17:59:52 +00:00
Make exporttree=yes remotes that are appendOnly not be untrusted, and not force
2018-08-29 17:59:52 +00:00
verification of content, since the usual concerns about losing data when an
export is updated by someone else don't apply. done
Let S3 remotes be configured with versioning=yes which enables appendOnly.
done
Make S3 store version IDs for exported files in the per-remote log when so
configured. done
Use version IDs when retrieving keys and for checkpresent. done
2018-09-05 16:21:52 +00:00
> all [[done]]! (but see [[support_public_versioned_S3_access]]) --[[Joey]]