This is a summary todo covering several subprojects, which would extend
git-annex to be able to use proxies which sit in front of a cluster of
repositories.

1. [[design/passthrough_proxy]]
2. [[design/p2p_protocol_over_http]]
3. [[design/balanced_preferred_content]]
4. [[todo/track_free_space_in_repos_via_git-annex_branch]]
5. [[todo/proving_preferred_content_behavior]]

Joey has received funding to work on this.
Planned schedule of work:

* June: git-annex proxy
* July, part 1: git-annex proxy support for exporttree
* July, part 2: p2p protocol over http
* August: balanced preferred content
* September: streaming through proxy to special remotes (especially S3)
* October: proving behavior of balanced preferred content with proxies

[[!tag projects/openneuro]]

# work notes

In development on the `proxy` branch.

For June's work on [[design/passthrough_proxy]], remaining todos:

* Getting a key from a cluster currently always selects the lowest cost
  remote, and always the same remote if cost is the same. Should
  round-robin among remotes, and prefer to avoid using remotes that
  other git-annex processes are currently using.

* Support annex.jobs for clusters.

* Basic proxying to special remote support (non-streaming).

* Support proxies-of-proxies better, eg foo-bar-baz.
  Currently, it does work, but you have to run `git-annex updateproxy`
  on foo in order for it to notice the bar-baz proxied remote exists,
  and record it as foo-bar-baz. Make it skip recording proxies of
  proxies like that, and instead automatically generate those from the log.
  (With cycle prevention there of course.)

* Cycle prevention including cluster-in-cluster cycles. See design.

* Optimise proxy speed. See design for ideas.

* Use `sendfile()` to avoid data copying overhead when
  `receiveBytes` is being fed right into `sendBytes`.

* Encryption and chunking. See design for issues.

* Indirect uploads (to be considered). See design.

* Support using a proxy when its url is a P2P address.
  (Eg tor-annex remotes.)

* `viconfig` support for setting preferred content, group,
  and description of clusters
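
The first todo in this list (round-robin among equal-cost remotes, while avoiding remotes that other processes are using) could look roughly like the following. This is only a sketch in Python, not git-annex's Haskell code: `Remote`, `pick_remote`, the `in_use` set and the `turn` counter are all hypothetical names, and it assumes that avoiding a busy remote takes priority over picking the lowest cost, which the todo does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Remote:
    name: str    # hypothetical remote name
    cost: float  # lower is preferred


def pick_remote(candidates, in_use, turn):
    """Pick one remote to get a key from.

    candidates: remotes known to have the key
    in_use: names of remotes that other processes are currently using
    turn: an integer that the caller increases on every request
    """
    if not candidates:
        return None
    # Prefer remotes nobody else is using; if all are busy, use them anyway.
    idle = [r for r in candidates if r.name not in in_use]
    pool = idle or list(candidates)
    # Keep only the lowest-cost remotes of that pool, in a stable order.
    lowest = min(r.cost for r in pool)
    cheapest = sorted((r for r in pool if r.cost == lowest),
                      key=lambda r: r.name)
    # Rotate among equal-cost remotes instead of always taking the first.
    return cheapest[turn % len(cheapest)]
```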
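
For the proxies-of-proxies todo above, generating names like foo-bar-baz from the log with cycle prevention might be sketched as below. This is an illustration in Python under simplifying assumptions: the `proxies` dict stands in for whatever the proxy log records, repositories are identified by name rather than by UUID, and `proxied_names` is a hypothetical helper.

```python
def proxied_names(remote, proxies, seen=None):
    """List names such as foo-bar and foo-bar-baz reachable via `remote`.

    proxies maps a repository name to the names of the remotes it
    directly proxies for (a stand-in for the contents of the proxy log).
    """
    seen = (seen or frozenset()) | {remote}
    names = []
    for child in proxies.get(remote, []):
        if child in seen:
            # Cycle prevention: never revisit a repository on this path.
            continue
        names.append(f"{remote}-{child}")
        # Remotes proxied by the child are reachable through it too.
        for deeper in proxied_names(child, proxies, seen):
            names.append(f"{remote}-{deeper}")
    return names
```

With a log where foo proxies bar, bar proxies baz, and baz proxies foo, calling `proxied_names("foo", ...)` yields foo-bar and foo-bar-baz and stops there, without foo needing to record bar-baz itself.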
# completed items for June's work on [[design/passthrough_proxy]]:

* UUID discovery via git-annex branch. Add a log file listing UUIDs
  accessible via proxy UUIDs. It also will contain the names
  of the remotes that the proxy is a proxy for,
  from the perspective of the proxy. (done)

* Add `git-annex updateproxy` command (done)

* Remote instantiation for proxies. (done)

* Implement git-annex-shell proxying to git remotes. (done)

* Proxy should update location tracking information for proxied remotes,
  so it is available to other users who sync with it. (done)

* Implement `git-annex updatecluster` command (done)

* Implement cluster UUID insertion on location log load, and removal
  on location log store. (done)

* Omit cluster UUIDs when constructing drop proofs, since lockcontent will
  always fail on a cluster. (done)

* Don't count cluster UUID as a copy in numcopies checking etc. (done)

* Tab complete proxied remotes and clusters in eg --from option. (done)

* Getting a key from a cluster should proxy from one of the nodes that has
  it. (done)

* Implement upload with fanout to multiple cluster nodes and reporting back
  additional UUIDs over P2P protocol. (done)

* Implement cluster drops, trying to remove from all nodes, and returning
  which UUIDs it was dropped from. (done)

* `git-annex testremote` works against proxied remote and cluster. (done)

* Avoid `git-annex sync --content` etc from operating on cluster nodes by
  default, since syncing with a cluster implicitly syncs with its nodes. (done)

* On upload to cluster, send to nodes where it is preferred content, and not
  to other nodes. (done)
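
The items above about cluster UUIDs in the location log and in numcopies checking can be pictured with a small Python sketch. It is not the actual implementation: the function names and the `clusters` mapping are hypothetical, and it assumes a cluster UUID is inserted whenever at least one of its nodes holds the key. Clusters nested inside clusters are left out.

```python
def add_cluster_uuids(locations, clusters):
    """On location log load: add each cluster UUID whose nodes hold the key.

    locations: UUIDs read from the log for one key
    clusters: maps a cluster UUID to the set of its node UUIDs
    """
    present = set(locations)
    for cluster_uuid, nodes in clusters.items():
        if present & set(nodes):
            present.add(cluster_uuid)
    return present


def strip_cluster_uuids(locations, clusters):
    """On location log store: drop cluster UUIDs, keeping real repositories."""
    return set(locations) - set(clusters)


def count_copies(locations, clusters):
    """A cluster is not itself a copy; only the nodes behind it are."""
    return len(strip_cluster_uuids(locations, clusters))
```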
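
The upload behavior described in the list above (fan out to the nodes that want the key, then report back which UUIDs stored it) might be sketched as follows. This is a Python illustration only: `fanout_upload`, `wants` and `send` are hypothetical stand-ins for the preferred content check and the actual transfer, and nothing here reflects the P2P protocol's wire format.

```python
def fanout_upload(key, nodes, wants, send):
    """Send `key` to every cluster node that wants it.

    nodes: UUIDs of the cluster's nodes
    wants(node, key): True when the key is preferred content for the node
    send(node, key): True when the node stored the key successfully
    Returns the UUIDs that now hold the key, to report back to the client.
    """
    stored = []
    for node in nodes:
        if not wants(node, key):
            # Not preferred content there, so do not send to this node.
            continue
        if send(node, key):
            stored.append(node)
    return stored
```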