add tip for DATA-PRESENT feature
parent 0117cdab11
commit 2ca6ecad58
3 changed files with 70 additions and 2 deletions
@@ -1,4 +1,4 @@
-git-annex (10.20240928) UNRELEASED; urgency=medium
+git-annex (10.20241031) UNRELEASED; urgency=medium
 
   * Sped up proxied downloads from special remotes, by streaming.
   * Added GETORDERED request to external special remote protocol.
67
doc/tips/client_side_upload_to_a_special_remote.mdwn
Normal file
@@ -0,0 +1,67 @@
Suppose you are gathering files from users on the web and want to ingest
that data into a git-annex repository, with a special remote that is, eg, an
S3 bucket.

You could have the web browser upload to your server, and run git-annex
there to add it to the git repository and move it on to the S3 bucket.
That is inefficient though: the file goes into the server and back out,
and needs to be spooled to the server's disk as well.

This page shows a more efficient way to do it, where the web browser
uploads directly to S3, and a git-annex repository is updated accordingly.
There is not (currently) a way to run git-annex in a web browser,
so you will need to write some custom code to do this. But with the
method described here, you won't need to re-implement all of git-annex in
the web browser.

Uploading from the browser to S3 is left as an exercise for the reader.
All that really matters is what filename to use in the S3 bucket. It's
simplest to make the S3 special remote an exporttree=yes special remote,
and then you can upload whatever filenames you want to it, rather than
needing to use the same filenames git-annex uses for storing keys in an S3
bucket.

Once the browser uploads the file to S3, you need to add a git-annex
symlink or pointer file to the git repository. This can be done in the
browser, using [js-git](https://github.com/creationix/js-git). Generating a
git-annex key is not hard: just hash the file content before or while
uploading it, and see [[internals/key_format]]. Write that key to a pointer
file, or make a symlink to the appropriate directory under
.git/annex/objects (a bit harder). Commit it to git and push to your
server using js-git.
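
As a rough sketch of the key generation step, here is how a key and the
matching pointer file content could be computed. It is written in Python
for brevity; the same steps would need to be ported to JavaScript to run in
the browser. The function names are made up for this example, and it
assumes the plain `SHA256` backend (no filename extension in the key) and
the usual `/annex/objects/` pointer file layout. Check
[[internals/key_format]] before relying on it.

```python
import hashlib
from typing import Iterable


def annex_key_sha256(chunks: Iterable[bytes]) -> str:
    """Build a SHA256-backend key, hashing the content chunk by chunk.

    Hypothetical helper. Hashing incrementally means the key can be
    computed while the file is being uploaded.
    """
    digest = hashlib.sha256()
    size = 0
    for chunk in chunks:
        digest.update(chunk)
        size += len(chunk)
    # Assumed key layout: BACKEND-sSIZE--NAME, with NAME the hex digest.
    return f"SHA256-s{size}--{digest.hexdigest()}"


def pointer_file_content(key: str) -> bytes:
    """Content to commit as the pointer file for the key (assumed layout)."""
    return f"/annex/objects/{key}\n".encode("ascii")


if __name__ == "__main__":
    key = annex_key_sha256([b"ab", b"c"])
    print(key)
    print(pointer_file_content(key))
```

The pointer file content is what gets committed to git in place of the
uploaded file.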

Now git-annex knows about the file. But it doesn't yet know it's been
uploaded to the S3 special remote. To tell it, you will need to have your
server set up to run git-annex. Set up the S3 special
remote there, and make git-annex on the server a
[[proxy|git-annex-updateproxy]] for the S3 special remote:

    git-annex initremote s3 type=S3 exporttree=yes encryption=none bucket=mybucket
    git config remote.s3.annex-proxy true
    git-annex updateproxy

For the web browser to be able to easily talk with git-annex on the server,
you can run [[git-annex p2phttp|git-annex-p2phttp]].
The web browser will be speaking the [[doc/design/P2P_protocol_over_HTTP]].

Make sure you have git-annex 10.20241031 or newer installed. That version
extended the [[design/p2p_protocol]] with a `DATA-PRESENT` feature, which
is just what you need.

All the web browser needs to do is `POST /git-annex/$uuid/v4/put`
with `data-present=true` included in the URL parameters, along with the
key of the file that was added to the git repository.
Replace `$uuid` with the UUID of the S3 special remote.
You can look that up with, eg, `git config remote.s3.annex-uuid`.
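
To make the shape of that request concrete, here is a minimal sketch that
only builds the URL to POST to. It is in Python for brevity; a browser
would do the same with its own URL handling. The base URL and the function
name are made up, and the `key` and `clientuuid` parameter names are
assumptions that should be checked against the protocol page linked above.
Any request headers and body the protocol requires are not covered here.

```python
from urllib.parse import quote, urlencode


def data_present_put_url(base_url: str, remote_uuid: str,
                         key: str, client_uuid: str) -> str:
    """Build the URL for a data-present put request (hypothetical helper)."""
    # $uuid in the path is the UUID of the S3 special remote.
    path = f"/git-annex/{quote(remote_uuid, safe='')}/v4/put"
    params = {
        "key": key,                 # assumed parameter name
        "clientuuid": client_uuid,  # assumed parameter name
        "data-present": "true",     # the content is already in the remote
    }
    return base_url.rstrip("/") + path + "?" + urlencode(params)


if __name__ == "__main__":
    print(data_present_put_url(
        "https://annex.example.com",  # hypothetical p2phttp server
        "1111-2222",                  # placeholder for the remote's UUID
        "SHA256-s3--abc",             # placeholder key
        "3333-4444",                  # placeholder client UUID
    ))
```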

When the git-annex HTTP server receives that request, since it is
configured to be able to proxy for the S3 special remote, it will act the
same as if the content of the file had been sent in the request. But thanks
to `data-present=true`, it knows the data is already in the S3 special
remote. So it updates the git-annex branch to reflect that the file is
stored there.

Now if someone else clones the git repository, they can `git-annex get` the
file, and it will be downloaded from the S3 bucket, if that bucket is
configured to let them read it. Your server never needs to deal with the
content of the file.
@@ -127,12 +127,13 @@ Planned schedule of work:
* Support using a proxy when its url is a P2P address.
  (Eg tor-annex remotes.)


## completed items for October's work on streaming through proxy to special remotes

* Stream downloads through proxy for all special remotes that indicate
  they download in order.
* Added ORDERED message to external special remote protocol.
* Added DATA-PRESENT and documented in
  [[tips/client_side_upload_to_a_special_remote]]

## completed items for September's work on proving behavior of preferred content