remove false starts, simplify
parent 5b78952f78
commit dad627fa9e
1 changed files with 7 additions and 56 deletions
@@ -5,62 +5,12 @@ content and it can be accessed using a version ID (that S3 returns when
 storing the content). So it should be possible for git-annex to allow
 downloading old versions of files from such a remote.
 
-## remote pair approach
+Basically, store the S3 version ID in git-annex branch and support
+downloading using it.
 
-One way would be to have the S3 remote, when storing a file to a S3 bucket
-that is known to support versioning, to add an url using the S3 version ID
-to the web remote.
+But this has the problem that dropping makes git-annex think it's not in S3
+any more, while what we want for export is for it to be removed from the
+current bucket, but still tracked as present in S3.
 
-However, some remotes that support versioning won't be accessible via the
-web, so that's not a general solution.
-
-(Also, S3 buckets only support web access when configured to be public.)
-
-This generalizes to a pair of remotes, it could be S3+web or S3 could instantiate
-two remotes automatically, and use the second for versioned data.
-
-Note that location tracking info has to be carefully managed, to avoid
-there appearing to be two copies of data that's only really stored in one place.
-When uploading to S3, it should not yet add the url or mark the content
-as present in the web. Then when dropping from S3, after the
-drop succeeds, it can mark the content as present in the web and add its url.
-
-There's a potential race there still, since the remote does not update location
-tracking when dropping, the caller of the remote does. So if S3 marks content
-as being present in the web, it will breifly appear present in both locations
-and break numcopies counting. Would need to extend the API to avoid this race.
-
-> Ah, but: exporttree remotes are always untrusted for other reasons,
-> so location tracking is less of a problem. Even if location tracking
-> shows the content in two places, a drop will skip the exporttree remote
-> so will only treat the pair as one copy.
->
-> So the location tracking problem is limited to --copies=N matching incorrectly,
-> and whereis listing both locations, and some preferred content
-> expressions behaving in surprising ways.
-
-Unfortunately this remote pair approach will leak out into git-annex's interface;
-it will show two remotes. Not a problem for S3+web really, but if S3 instantiates
-an S3oldversions remote, that necessarily adds the potential for confusion,
-and adds complexity in configuration of preferred content settings, repo groups,
-etc.
-
-> Could flip it; make the main remote track the versioned data, and the
-> exporttree remote be secondary. Since only git-annex export/sync need to
-> access that remote, they could have a special case to look for such a
-> secondary remote and act on it. All other commands would only operate on
-> the main remote. Indeed, the secondary remote would not need to be
-> in the RemoteList at all.
->
-> Doesn't avoid preferred content etc complexity, still.
-
-## location tracking approach
-
-Another way is to store the S3 version ID in git-annex branch and support
-downloading using it. But this has the problem that dropping makes
-git-annex think it's not in S3 any more, while what we want for export
-is for it to be removed from the current bucket, but still tracked as
-present in S3.
-
 The drop from S3 could fail, or "succeed" in a way that prevents the location
 tracking being updated to say it lacks the content. Failing is how bup deals
@@ -75,7 +25,8 @@ and make at sync --content/assistant use that.
 
 Note that git-annex export does not rely on location tracking to determine
 which files still need to be sent to an export. It uses the export database
-to keep track of that.
+to keep track of that. This is important, because the location tracking
+won't be updated, as discussed above.
 
 ## final plan
 
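The behaviour the new text asks for can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not git-annex's implementation: `VersionedBucket`, `Repo`, and all of their fields and methods are hypothetical names invented here. The sketch assumes a versioned bucket keeps old versions retrievable by version ID after a file is removed from the current view, and that the export database is kept separately from location tracking.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedBucket:
    """Toy stand-in for a versioned S3 bucket (hypothetical, in-memory)."""
    current: dict = field(default_factory=dict)   # filename -> version ID
    versions: dict = field(default_factory=dict)  # version ID -> content
    counter: int = 0

    def store(self, filename, content):
        # Storing returns a version ID, as a versioned bucket would.
        self.counter += 1
        version_id = f"v{self.counter}"
        self.versions[version_id] = content
        self.current[filename] = version_id
        return version_id

    def remove(self, filename):
        # Removal only hides the file from the current bucket view;
        # the old version stays retrievable by its version ID.
        self.current.pop(filename, None)

    def retrieve_version(self, version_id):
        return self.versions[version_id]


@dataclass
class Repo:
    """Hypothetical bookkeeping; the three stores are kept apart on purpose."""
    version_ids: dict = field(default_factory=dict)  # key -> version ID
    present: set = field(default_factory=set)        # location tracking
    exported: dict = field(default_factory=dict)     # export database

    def export_file(self, bucket, filename, key, content):
        self.version_ids[key] = bucket.store(filename, content)
        self.present.add(key)
        self.exported[filename] = key

    def unexport_file(self, bucket, filename):
        bucket.remove(filename)
        # Only the export database changes. Location tracking is left
        # alone, so the key is still considered present in S3.
        del self.exported[filename]

    def needs_export(self, filename, key):
        # Decided from the export database, never from location tracking.
        return self.exported.get(filename) != key


bucket, repo = VersionedBucket(), Repo()
repo.export_file(bucket, "foo.txt", "KEY1", b"old content")
repo.unexport_file(bucket, "foo.txt")
old = bucket.retrieve_version(repo.version_ids["KEY1"])
```

After the unexport, `foo.txt` is gone from the current bucket view, `KEY1` is still tracked as present, and the old content can still be fetched by its recorded version ID. `needs_export` gives the right answer only because it consults the export database rather than location tracking.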