From dad627fa9e58258e651fab7f777cad8634b29760 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Wed, 29 Aug 2018 14:12:18 -0400
Subject: [PATCH] remove false starts, simplify

---
 doc/todo/versioning_in_export_remotes.mdwn | 63 +++-------------------
 1 file changed, 7 insertions(+), 56 deletions(-)

diff --git a/doc/todo/versioning_in_export_remotes.mdwn b/doc/todo/versioning_in_export_remotes.mdwn
index 718443830c..86790d3fef 100644
--- a/doc/todo/versioning_in_export_remotes.mdwn
+++ b/doc/todo/versioning_in_export_remotes.mdwn
@@ -5,62 +5,12 @@ content and it can be accessed using a version ID (that S3 returns when
 storing the content). So it should be possible for git-annex to allow
 downloading old versions of files from such a remote.
 
-## remote pair approach
+Basically, store the S3 version ID in git-annex branch and support
+downloading using it.
 
-One way would be to have the S3 remote, when storing a file to a S3 bucket
-that is known to support versioning, to add an url using the S3 version ID
-to the web remote.
-
-However, some remotes that support versioning won't be accessible via the
-web, so that's not a general solution.
-
-(Also, S3 buckets only support web access when configured to be public.)
-
-This generalizes to a pair of remotes, it could be S3+web or S3 could instantiate
-two remotes automatically, and use the second for versioned data.
-
-Note that location tracking info has to be carefully managed, to avoid
-there appearing to be two copies of data that's only really stored in one place.
-When uploading to S3, it should not yet add the url or mark the content
-as present in the web. Then when dropping from S3, after the
-drop succeeds, it can mark the content as present in the web and add its url.
-
-There's a potential race there still, since the remote does not update location
-tracking when dropping, the caller of the remote does. So if S3 marks content
-as being present in the web, it will breifly appear present in both locations
-and break numcopies counting. Would need to extend the API to avoid this race.
-
-> Ah, but: exporttree remotes are always untrusted for other reasons,
-> so location tracking is less of a problem. Even if location tracking
-> shows the content in two places, a drop will skip the exporttree remote
-> so will only treat the pair as one copy.
->
-> So the location tracking problem is limited to --copies=N matching incorrectly,
-> and whereis listing both locations, and some preferred content
-> expressions behaving in surprising ways.
-
-Unfortunately this remote pair approach will leak out into git-annex's interface;
-it will show two remotes. Not a problem for S3+web really, but if S3 instantiates
-an S3oldversions remote, that necessarily adds the potential for confusion,
-and adds complexity in configuration of preferred content settings, repo groups,
-etc.
-
-> Could flip it; make the main remote track the versioned data, and the
-> exporttree remote be secondary. Since only git-annex export/sync need to
-> access that remote, they could have a special case to look for such a
-> secondary remote and act on it. All other commands would only operate on
-> the main remote. Indeed, the secondary remote would not need to be
-> in the RemoteList at all.
->
-> Doesn't avoid preferred content etc complexity, still.
-
-## location tracking approach
-
-Another way is to store the S3 version ID in git-annex branch and support
-downloading using it. But this has the problem that dropping makes
-git-annex think it's not in S3 any more, while what we want for export
-is for it to be removed from the current bucket, but still tracked as
-present in S3.
+But this has the problem that dropping makes git-annex think it's not in S3
+any more, while what we want for export is for it to be removed from the
+current bucket, but still tracked as present in S3.
 
 The drop from S3 could fail, or "succeed" in a way that prevents the
 location tracking being updated to say it lacks the content. Failing is how bup deals
@@ -75,7 +25,8 @@ and make at sync --content/assistant use that.
 
 Note that git-annex export does not rely on location tracking to determine
 which files still need to be sent to an export. It uses the export database
-to keep track of that.
+to keep track of that. This is important, because the location tracking
+won't be updated, as discussed above.
 
 ## final plan
 
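A toy model may help make the location tracking approach that this patch keeps concrete. The sketch below is hypothetical Python, not git-annex code (git-annex is written in Haskell) and not its API: `VersionedExport`, its methods, and the key and version ID strings are invented for illustration. It assumes the S3 version IDs are recorded per key (a dict standing in for the git-annex branch), that the export database maps bucket paths to keys, and that dropping simply fails, the option the notes mention in connection with bup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VersionedExport:
    """Toy model of a versioned S3 export remote (hypothetical, illustration only).

    exported:    stands in for the export database: bucket path -> key currently there.
    version_log: stands in for S3 version IDs recorded in the git-annex branch:
                 key -> list of (path, version ID) pairs.
    """
    exported: Dict[str, str] = field(default_factory=dict)
    version_log: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def store(self, path: str, key: str, version_id: str) -> None:
        # Uploading records both the current export and the version ID S3 returned.
        self.exported[path] = key
        self.version_log.setdefault(key, []).append((path, version_id))

    def unexport(self, path: str) -> None:
        # Removing a file from the export only changes the current bucket tree;
        # with bucket versioning, the old version is still stored in S3.
        self.exported.pop(path, None)

    def needs_upload(self, path: str, key: str) -> bool:
        # Export decides what to send from the export database, not location tracking.
        return self.exported.get(path) != key

    def present(self, key: str) -> bool:
        # Location tracking: the key stays present while any version ID is known.
        return bool(self.version_log.get(key))

    def locate(self, key: str) -> Optional[Tuple[str, str]]:
        # A download would fetch this (path, version ID) pair from S3.
        versions = self.version_log.get(key)
        return versions[-1] if versions else None

    def drop(self, key: str) -> bool:
        # Refuse the drop: the old version cannot really be removed here,
        # so location tracking must not be updated to say the content is gone.
        return False


if __name__ == "__main__":
    remote = VersionedExport()
    remote.store("foo.txt", "KEY1", "v1")  # hypothetical key and version ID
    remote.unexport("foo.txt")
    print(remote.present("KEY1"))                  # True
    print(remote.locate("KEY1"))                   # ('foo.txt', 'v1')
    print(remote.needs_upload("foo.txt", "KEY1"))  # True
```

After `foo.txt` is removed from the export, the key is still reported present and still locatable by its old version ID, while the export database says the path needs uploading again. That split is why export relying on the export database, rather than on location tracking, matters in this design.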