simplified design for indirect uploads

Joey Hess 2024-10-28 13:29:33 -04:00
parent 9db69a4c2c
commit 926b632faa


@@ -685,6 +685,9 @@ When a client wants to upload an object, the proxy could indicate that the
upload should not be sent to it, but instead be PUT to an HTTP url that it
provides to the client.
(This would presumably only be used with unencrypted and unchunked special
remotes.)
An example use case involves
[presigned S3 urls](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html).
When the proxy is to an S3 bucket, having the client upload
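To make the presigned-url idea concrete, here is a rough sketch of minting a SigV4 presigned PUT url using only the Python standard library. The bucket, key, and credentials are placeholders, not from this design; a real proxy would more likely use an S3 client library.

```python
# Rough sketch of SigV4 query-string presigning for a PUT.
# Bucket, key, and credentials below are illustrative placeholders.
import datetime
import hashlib
import hmac
import urllib.parse

def presign_put(bucket, key, access_key, secret_key,
                region="us-east-1", expires=3600):
    host = f"{bucket}.s3.{region}.amazonaws.com"
    now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    scope = f"{now.strftime('%Y%m%d')}/{region}/s3/aws4_request"
    query = sorted({
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }.items())
    qs = "&".join(f"{urllib.parse.quote(k, safe='')}="
                  f"{urllib.parse.quote(v, safe='')}" for k, v in query)
    # Canonical request over the PUT, the query, and the host header only.
    canonical = "\n".join(["PUT", "/" + key, qs,
                           f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"])
    to_sign = "\n".join(["AWS4-HMAC-SHA256", amz_date, scope,
                         hashlib.sha256(canonical.encode()).hexdigest()])
    # Derive the signing key: date -> region -> service -> aws4_request.
    sig_key = ("AWS4" + secret_key).encode()
    for part in (now.strftime("%Y%m%d"), region, "s3", "aws4_request"):
        sig_key = hmac.new(sig_key, part.encode(), hashlib.sha256).digest()
    sig = hmac.new(sig_key, to_sign.encode(), hashlib.sha256).hexdigest()
    return f"https://{host}/{key}?{qs}&X-Amz-Signature={sig}"
```

The resulting url can be handed to the client, which PUTs the object body to it directly, so the content never flows through the proxy.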
@@ -739,3 +742,49 @@ in cases like OpenNeuro's, where something other than git-annex is
communicating with the git-annex proxy. Alternatively, allow the wrong
content to be sent to the url, but then indicate that with INVALID. A later
reupload would overwrite the bad content.
> Something like this can already be accomplished another way: Don't change
> the protocol at all, have the client generate the presigned url itself
> (or request it from something other than git-annex), upload the object,
> and update the git-annex branch to indicate it's stored in the S3
> special remote.
>
> That needs the client to be aware of what filename to use in the S3
> bucket (either a git-annex key or the exported filename depending on the
> special remote configuration). And it has to know how to update the
> git-annex location log. And for exporttree remotes, the export log.
> So effectively, the client needs to either be git-annex or implement a
> decent amount of its internals, or git-annex would need some additional
> plumbing commands for the client to use. (If the client is javascript
> running in the browser, it would be difficult for it to run git-annex
> though.)
>
> Perhaps there is something in the middle between these two extremes.
> Extend the P2P protocol to let the client indicate it has
> uploaded a key to the remote. The proxy then updates the git-annex branch
> to reflect that the upload happened. (Maybe it uses checkpresent to
> verify it first.)
>
> This leaves it up to the client to understand what filename to
> use to store a key in the S3 bucket (or wherever). For an exporttree=yes
> remote, it's simply the file being added, and for other remotes,
> `git-annex examinekey` can be used. Perhaps the protocol could indicate
> the filename for the client to use. But generally, what name to use
> for a key in a special remote is something only the special remote's
> implementation knows; there is no interface to query it.
> In practice, there are a few common patterns and anyway this would only
> be used with some particular special remote, like S3, that the client
> understands how to write to.
>
> The P2P protocol could be extended by letting ALREADY-STORED be
> sent by the client instead of DATA:
>
> PUT associatedfile key
> PUT-FROM 0
> ALREADY-STORED
> SUCCESS
>
> That lets the server send ALREADY-HAVE instead of PUT-FROM, preventing
> the client from uploading content that is already present. And it can
> send SUCCESS-PLUS at the end as well, or FAILURE if the checkpresent
> verification fails.
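The exchange proposed in the quote above can be modeled, very roughly, as a toy server-side handler. The message names come from the proposal; `have_key` and `checkpresent` are hypothetical stand-ins for the proxy's location-log lookup and its presence verification of the remote.

```python
# Toy model of the proposed server side of PUT with ALREADY-STORED.
# have_key and checkpresent are hypothetical stand-in callbacks.
def handle_put(key, client_messages, have_key, checkpresent):
    # Client opened with: PUT associatedfile key
    if have_key(key):
        # Content already recorded as present; stop the client early.
        return ["ALREADY-HAVE"]
    replies = ["PUT-FROM 0"]
    msg = next(client_messages)
    if msg == "ALREADY-STORED":
        # Client uploaded out of band (eg, to a presigned S3 url).
        # Verify with checkpresent before updating the git-annex branch.
        replies.append("SUCCESS" if checkpresent(key) else "FAILURE")
    return replies
```

For example, a client whose out-of-band upload passes the checkpresent verification sees `PUT-FROM 0` then `SUCCESS`; one whose upload cannot be verified sees `FAILURE` instead, and a key already recorded as present gets `ALREADY-HAVE` up front.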