This is a summary todo covering several subprojects, which would extend
git-annex to be able to use proxies which sit in front of a cluster of
repositories.

1. [[design/passthrough_proxy]]
2. [[design/p2p_protocol_over_http]]
3. [[design/balanced_preferred_content]]
4. [[todo/track_free_space_in_repos_via_git-annex_branch]]
5. [[todo/proving_preferred_content_behavior]]

Joey has received funding to work on this.

Planned schedule of work:

* June: git-annex proxy
* July, part 1: git-annex proxy support for exporttree
* July, part 2: p2p protocol over http
* August: balanced preferred content
* September: streaming through proxy to special remotes (especially S3)
* October: proving behavior of balanced preferred content with proxies

[[!tag projects/openneuro]]

# work notes

In development on the `proxy` branch.

For June's work on [[design/passthrough_proxy]], implementation plan:

* UUID discovery via git-annex branch. Add a log file listing UUIDs
accessible via proxy UUIDs. It will also contain the names
of the remotes that the proxy is a proxy for,
from the perspective of the proxy. (done)
* Add `git-annex updateproxy` command and `remote.name.annex-proxy`
configuration. (done)
* Remote instantiation for proxies. (done)
* Implement git-annex-shell proxying to git remotes. (done)
* Proxy should update location tracking information for proxied remotes,
so it is available to other users who sync with it. (done)
* Would it be possible to get instantiated remotes into git remote list?
This would make, e.g., tab completion of remote names work. Just setting
annex-uuid would suffice, but currently any such config prevents setting
up a remote as an instantiated remote. Perhaps if only annex-uuid and no
other config is set, treat that as an instantiated remote, and overwrite
the annex-uuid config as necessary? Or, add a config that says this is
an instantiated remote, and when set, allow overwriting configs.
This seems better; it would let `git push proxy-foo` work, for example.
* Cycle prevention. See design.
* Make `git-annex copy --from $proxy` pick a node that contains each
file, and use the instantiated remote for getting the file. Same for
similar commands.
* Make `git-annex drop --from $proxy` drop, when possible, from every
remote accessible by the proxy. Communicate partial drops somehow.
* Let `storeKey` return a list of UUIDs where content was stored,
and make proxies accept uploads directed at them, rather than a specific
instantiated remote, and fan out the upload to whatever nodes behind
the proxy want it. This will need P2P protocol extensions.
* Make commands like `git-annex push` not iterate over instantiated
remotes, and instead just send content to the proxy for fanout.
* Optimise proxy speed. See design for ideas.
* Use `sendfile()` to avoid data copying overhead when
`receiveBytes` is being fed right into `sendBytes`.
* Encryption and chunking. See design for issues.
* Indirect uploads (to be considered). See design.
* Support using a proxy when its URL is a P2P address.
(E.g. tor-annex remotes.)
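
Three of the open items above (picking a node for `copy --from $proxy`, dropping from every node behind the proxy, and having `storeKey` return the list of UUIDs that received an upload) can be modelled together as a toy. The following is only a rough Python sketch, not git-annex's Haskell code: `Node`, `Proxy`, the `wants` and `droppable` flags, and the method names are all hypothetical stand-ins. Preferred content and drop safety checks are reduced to booleans.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one repository behind the proxy."""
    uuid: str
    wants: bool = True        # assumption: stands in for preferred content
    droppable: bool = True    # assumption: stands in for "drop is safe here"
    keys: set = field(default_factory=set)


@dataclass
class Proxy:
    """Hypothetical model of a proxy sitting in front of several nodes."""
    nodes: list

    def store_key(self, key):
        """Fan an upload out to the nodes that want it.

        Returns the list of UUIDs where the content was stored.
        """
        stored = []
        for node in self.nodes:
            if node.wants:
                node.keys.add(key)
                stored.append(node.uuid)
        return stored

    def pick_node(self, key):
        """Pick a node that contains the key, or None if no node does."""
        for node in self.nodes:
            if key in node.keys:
                return node.uuid
        return None

    def drop_key(self, key):
        """Drop from every node where possible.

        Returns (dropped, remaining) so a partial drop is visible
        to the caller.
        """
        dropped, remaining = [], []
        for node in self.nodes:
            if key not in node.keys:
                continue
            if node.droppable:
                node.keys.discard(key)
                dropped.append(node.uuid)
            else:
                remaining.append(node.uuid)
        return dropped, remaining


proxy = Proxy([
    Node("uuid-a"),
    Node("uuid-b", wants=False),
    Node("uuid-c", droppable=False),
])
stored = proxy.store_key("KEY-1")             # upload directed at the proxy
source = proxy.pick_node("KEY-1")             # node to fetch the content from
dropped, remaining = proxy.drop_key("KEY-1")  # possibly a partial drop
```

Returning both the dropped and the remaining UUIDs is one possible way to "communicate partial drops somehow". The real mechanism would need to fit the P2P protocol, which this sketch does not attempt to model.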