finalized design for proxying to exporttree=yes annexobjects=yes special remotes

Joey Hess 2024-08-06 11:45:45 -04:00
parent 84d27cf34f
commit 4750ffbd3b
GPG key ID: DB12DB0FF05F8F38
3 changed files with 39 additions and 22 deletions

@@ -616,27 +616,33 @@ store any key:
* Configure annex-tracking-branch in the proxy's git config.
* Then the user's workflow is simply: `git-annex push`
* The proxy handles PUT/GET/REMOVE of a key that is not in the
annex-tracking branch that it currently knows about, by using
the special remote's .git/annex/objects/ location.
* Upon receiving a new annex-tracking-branch or any transfer of a key
used in the current annex-tracking-branch, the proxy can update
the exporttree=yes remote. This needs to happen incrementally,
eg upon receiving a key, just proxy it on to the exporttree=yes remote,
and update the export database. Once all keys are received, update
the git-annex branch to indicate a new tree has been exported.
* `git-annex sync` may optionally push updates to the annex-tracking-branch
before sending content. This can let the proxy be more efficient,
especially when the special remote does not support renaming.
Note that this necessarily means that an object that the client uploads
once to the proxy might need to be uploaded multiple times from the proxy
to the special remote. Eg, if a key is used 10 times in a tree, it will
need to upload 10 times. Adding a "copy" operation to exportActions would
avoid this problem, but only for special remotes that were able to
implement it. Even a rename of a single file can need the proxy to download
it from the special remote and upload it back under a new name, when the
special remote does not support renames.
* The proxy handles PUT by always storing to the special remote's
.git/annex/objects/ location, not updating the exported tree.
* The proxy allows REMOVE from the special remote's
.git/annex/objects/ location, but not removal of keys
that are in the currently exported tree.
* When `git-annex post-receive` is run by the post-receive hook
and the annex-tracking-branch has been updated, it exports
the tree to the special remote.
(But, `git-annex push` sends the updated tree first, so
this will often be an incomplete export.)
* When there is an incomplete export and a key is received
that is part of that export, check if it is the *last* key
that is needed to complete the export. If so, export the tree to the
special remote again.
(This avoids overhead and complication of incrementally updating
the export. It relies on the special remote supporting renameExport.
Incrementally updating the export might be worth doing eventually,
for special remotes that do not support renameExport.)
* When exporting a tree to the special remote, handle cases
where a single key is used by multiple files, and the key is not
present locally. In this case it currently fails to update
one of the files (and renames the annexobjects location to the other
one). It will need to download the content from the special remote and
send it back to it.
* When the special remote does not support renameExport, the proxy will
need to download from the annexobjects location in order to store to the
export location.
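
The PUT/REMOVE rules and the "last key" check in the list above can be sketched roughly as follows. This is a minimal Python model, not git-annex's Haskell implementation: `ProxyState`, `handle_post_receive`, `handle_put`, `handle_remove`, and the `export_tree` callback are all hypothetical names, and keys are plain strings standing in for whatever the proxy actually tracks.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Set


@dataclass
class ProxyState:
    """Hypothetical model of the proxy's view of one exporttree=yes remote."""
    tree_keys: Set[str] = field(default_factory=set)  # keys used by the tracked tree
    received: Set[str] = field(default_factory=set)   # keys stored via the proxy so far
    export_complete: bool = False

    def missing(self) -> Set[str]:
        """Keys the tracked tree needs that have not been received yet."""
        return self.tree_keys - self.received


def handle_post_receive(state: ProxyState, new_tree_keys: Iterable[str],
                        export_tree: Callable[[], None]) -> bool:
    """Tracking branch was updated: export right away, possibly incompletely."""
    state.tree_keys = set(new_tree_keys)
    export_tree()
    state.export_complete = not state.missing()
    return state.export_complete


def handle_put(state: ProxyState, key: str,
               export_tree: Callable[[], None]) -> bool:
    """Store the key; re-export only if it was the last one the tree needed.

    Returns True when this PUT triggered the completing export.
    """
    state.received.add(key)  # always stored to the annexobjects location
    if (not state.export_complete and key in state.tree_keys
            and not state.missing()):
        export_tree()
        state.export_complete = True
        return True
    return False


def handle_remove(state: ProxyState, key: str) -> bool:
    """Refuse to remove keys that the tracked tree uses."""
    if key in state.tree_keys:
        return False
    state.received.discard(key)
    return True
```

In this model the export runs at most twice per tree update: once from the hook, and once more when the final missing key arrives.
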
## possible enhancement: indirect uploads

@@ -351,7 +351,6 @@ content from the key-value store.
See [[git-annex-extendcluster]](1) for details.
* `updateproxy`
Update records with proxy configuration.

@@ -33,6 +33,18 @@ Planned schedule of work:
* Working on `exportreeplus` branch which is groundwork for proxying to
exporttree=yes special remotes.
* `git-annex post-receive` of a proxied exporttree=yes special remote's
annex-tracking-branch needs to exporttree.
* When there is an incomplete export and a key is received, the proxy
should check if it's the *last* key that is needed to complete the
export, and when so, do a final exporttree.
* Handle cases where a single key is used by multiple files in the exported
tree. Need to download from the special remote in order to export
multiple copies to it.
* Handle case where the special remote does not support renameExport.
Each key will need to be downloaded from it in order to export the key
back to it, if the proxy is to support such a remote.
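
The last two items can be illustrated with a small planning function. This is only a sketch under stated assumptions: `plan_export` and the action tuples are hypothetical, the key is assumed not to be present locally on the proxy, and the ordering (download and upload the extra copies before renaming) is one possible choice rather than what git-annex does.

```python
from typing import List, Tuple

Action = Tuple[str, ...]


def plan_export(key: str, files: List[str], supports_rename: bool) -> List[Action]:
    """Plan how to get one key from the annexobjects location to every
    exported file that uses it (hypothetical action names)."""
    if not files:
        return []
    actions: List[Action] = []
    # With renameExport, one file can be satisfied by a rename; every other
    # file needs its own copy of the content. Without it, all files do.
    upload_targets = files[1:] if supports_rename else files
    if upload_targets:
        # Download once from the special remote, then send it back per file.
        actions.append(("download", key))
        for name in upload_targets:
            actions.append(("upload", key, name))
    if supports_rename:
        actions.append(("rename", key, files[0]))
    return actions
```

For example, a key used by files `a` and `b` on a remote with rename support yields one download, one upload to `b`, and one rename to `a`. Without rename support it yields one download and two uploads.
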
## items deferred until later for p2p protocol over http
* `git-annex p2phttp` should support serving several repositories at the same