This is a summary todo covering several subprojects, which would extend
git-annex to be able to use proxies which sit in front of a cluster of
repositories.

1. [[design/passthrough_proxy]]
2. [[design/p2p_protocol_over_http]]
3. [[design/balanced_preferred_content]]
4. [[todo/track_free_space_in_repos_via_git-annex_branch]]
5. [[todo/proving_preferred_content_behavior]]

## table of contents
[[!toc ]]

## planned schedule
Joey has received funding to work on this.

Planned schedule of work:

* June: git-annex proxies and clusters
* July, part 1: p2p protocol over http
* July, part 2: git-annex proxy support for exporttree
* August: balanced preferred content
* September: streaming through proxy to special remotes (especially S3)
* October: proving behavior of balanced preferred content with proxies

[[!tag projects/openneuro]]

## work notes
* The rest of Remote.Git needs implementing.

* `git-annex p2phttp` needs to support https, including serving `.well-known`
  for ACME.

* Initially, a Locker should expire the lock on its own after 10 minutes.
  Once keeplocked is called, the lock should instead expire when that
  call ends.

* Make the http server support proxies and clusters.

* `git-annex p2phttp` could support systemd socket activation. This would
  allow making a systemd unit that listens on port 80.

* Perhaps: support a cgi program that proxies over to a webserver
  speaking the http protocol.
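A minimal sketch of the Locker lifetime described in the note above, written in Python rather than git-annex's Haskell. The `Locker` class, its `keeplocked(done)` method and the `done` event are hypothetical stand-ins, and reading the note as "the lock is held until keeplocked returns" is an assumption. The 10-minute default comes from the note; a test would use much shorter timeouts.

```python
import threading


class Locker:
    """Hypothetical sketch: the lock expires by itself after `timeout`
    seconds unless keeplocked() takes over first, in which case it is
    held until that call returns."""

    def __init__(self, timeout=600.0):  # 600 s is the 10 minutes from the note
        self.held = True
        self._kept = False
        self._mutex = threading.Lock()
        self._timer = threading.Timer(timeout, self._expire)
        self._timer.daemon = True
        self._timer.start()

    def _expire(self):
        # Timer callback: only expire if keeplocked() has not taken over.
        with self._mutex:
            if not self._kept:
                self.held = False

    def keeplocked(self, done):
        """Hold the lock until the threading.Event `done` is set, then
        release it. Returns False if the lock had already expired."""
        with self._mutex:
            if not self.held:
                return False
            self._kept = True
        self._timer.cancel()  # the initial expiry no longer applies
        done.wait()
        with self._mutex:
            self.held = False
        return True
```

The `_kept` flag matters because `Timer.cancel()` cannot stop a callback that has already started running; checking the flag under the mutex keeps a late timer from releasing a lock that keeplocked has taken over.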
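One way socket activation could look, sketched in Python. Under the sd_listen_fds(3) protocol, systemd passes already-bound listening sockets starting at file descriptor 3 and describes them with the `LISTEN_PID` and `LISTEN_FDS` environment variables, so a `.socket` unit with `ListenStream=80` can hold the privileged port while the server process itself runs unprivileged. The helper names are hypothetical, and nothing here reflects how `git-annex p2phttp` actually sets up its listening socket.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first fd passed by systemd, per sd_listen_fds(3)


def inherited_sockets(environ=None):
    """Return listening sockets passed in by systemd socket activation,
    or an empty list when the process was not socket-activated."""
    env = os.environ if environ is None else environ
    try:
        if int(env.get("LISTEN_PID", "")) != os.getpid():
            return []  # absent, or meant for another process
        count = int(env.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    # socket.socket(fileno=...) detects the family and type of each fd.
    return [socket.socket(fileno=fd)
            for fd in range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count)]


def listening_socket(port):
    """Prefer a socket inherited from systemd; otherwise bind one ourselves."""
    inherited = inherited_sockets()
    if inherited:
        return inherited[0]
    return socket.create_server(("", port))
```

Falling back to binding the port directly means the same server works both under a systemd socket unit and when started by hand.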
## completed items for July's work on p2p protocol over http
* HTTP P2P protocol document [[design/p2p_protocol_over_http]].

* Addressed [[doc/todo/P2P_locking_connection_drop_safety]].

* Implemented server and client for HTTP P2P protocol.

* Added `git-annex p2phttp` command to serve HTTP P2P protocol.

* Allow using annex+http urls in `remote.name.annexUrl`.
## items deferred until later for [[design/passthrough_proxy]]
* Check annex.diskreserve when proxying for special remotes
  to avoid the proxy's disk filling up with the temporary object file
  cached there.

* Resuming an interrupted download from a proxied special remote makes the
  proxy re-download the whole content. It could instead keep some of the
  object files around when the client does not send SUCCESS. This would
  use more disk, but without streaming, proxying a special remote already
  needs some disk. And it could limit this to eg, the last 2 or so.
  The design doc has some more thoughts about this.

* Streaming download from proxied special remotes. See design.
  (Planned for September)

* When an upload to a cluster is distributed to multiple special remotes,
  a temporary file is written for each one, which may even happen in
  parallel. This is a lot of extra work and may use excess disk space.
  It should be possible to only write a single temp file.
  (With streaming this won't be an issue.)

* Indirect uploads when proxying for special remotes
  (to be considered). See design.

* Getting a key from a cluster currently picks from among
  the lowest cost remotes at random. This could be smarter,
  eg prefer to avoid using remotes that are doing other transfers at the
  same time.

* The cost of a proxied node that is accessed via an intermediate gateway
  is currently the same as that of a node accessed via the cluster gateway.
  To fix this, there needs to be some way to tell how many hops through
  gateways it takes to get to a node. Currently the only way is to
  guess based on the number of dashes in the node name, which is not
  satisfying.

  Even counting hops is not very satisfying; one cluster gateway could
  be much more expensive to traverse than another one.

  If seriously tackling this, it might be worth making enough information
  available to use a spanning tree protocol for routing inside clusters.

* Optimise proxy speed. See design for ideas.

* Use `sendfile()` to avoid data copying overhead when
  `receiveBytes` is being fed right into `sendBytes`.
  Library to use:
  <https://hackage.haskell.org/package/hsyscall-0.4/docs/System-Syscall.html>

* Support using a proxy when its url is a P2P address.
  (Eg tor-annex remotes.)
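A sketch of the missing annex.diskreserve check for proxied special remotes, assuming the object's size is known up front (it is not for every key) and that `annex.diskreserve` has already been converted to a byte count. The function names are hypothetical; `shutil.disk_usage()` reports the free space of the filesystem holding the proxy's cache directory.

```python
import shutil


def fits_with_reserve(free_bytes, object_size, diskreserve):
    """True if storing `object_size` more bytes still leaves at least
    `diskreserve` bytes free."""
    return free_bytes - object_size >= diskreserve


def can_cache_object(cache_dir, object_size, diskreserve):
    """Check the filesystem that will hold the proxy's temporary object
    file before downloading from the special remote into it."""
    free = shutil.disk_usage(cache_dir).free
    return fits_with_reserve(free, object_size, diskreserve)
```

Keeping the arithmetic in a separate pure function makes the reserve logic testable without depending on the real disk.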
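A sketch of keeping only the last few temporary object files left behind when a client does not send SUCCESS, so that a resumed download need not make the proxy fetch the whole object from the special remote again. `PartialObjectCache` and its methods are hypothetical; the limit of 2 follows the note.

```python
import os
from collections import OrderedDict


class PartialObjectCache:
    """Hypothetical sketch: remember temp object files from interrupted
    proxied downloads, keeping only the most recent `limit` of them."""

    def __init__(self, limit=2):
        self.limit = limit
        self._files = OrderedDict()  # key -> path, oldest first

    def interrupted(self, key, path):
        """Client went away without SUCCESS: keep the file for a retry,
        deleting the oldest kept files beyond the limit."""
        self._files[key] = path
        self._files.move_to_end(key)
        while len(self._files) > self.limit:
            _, old_path = self._files.popitem(last=False)
            if os.path.exists(old_path):
                os.remove(old_path)

    def resume(self, key):
        """Return the kept file for `key`, or None if the proxy has to
        download the object from the special remote again."""
        path = self._files.pop(key, None)
        if path is not None and os.path.exists(path):
            return path
        return None
```

Bounding the number of kept files caps the extra disk use at roughly `limit` objects, which is the trade-off the note describes.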
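The smarter node selection suggested above could be as simple as narrowing the lowest-cost nodes to the least busy ones before choosing at random. This Python sketch assumes a per-node count of in-progress transfers is available, which is a hypothetical input; with no transfers running it reduces to the current behavior of a random pick among the lowest cost remotes.

```python
import random


def pick_remote(remotes, active_transfers, rng=random):
    """Choose a node to get a key from.

    `remotes` is a list of (name, cost) pairs for nodes that have the key;
    `active_transfers` maps a name to its number of in-progress transfers.
    Among the lowest-cost nodes, prefer those with the fewest active
    transfers, breaking any remaining tie at random."""
    if not remotes:
        return None
    lowest = min(cost for _, cost in remotes)
    cheapest = [name for name, cost in remotes if cost == lowest]
    least_busy = min(active_transfers.get(n, 0) for n in cheapest)
    candidates = [n for n in cheapest if active_transfers.get(n, 0) == least_busy]
    return rng.choice(candidates)
```

For example, with nodes `a` and `b` both at cost 100 and `a` already running two transfers, `b` is chosen.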
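Since hop counts ignore that one gateway may be costlier to traverse than another, a different technique (not from the design notes, and not the spanning tree protocol they mention) is to give every gateway-to-gateway and gateway-to-node link its own cost and compute each node's cheapest route with Dijkstra's algorithm. The names and costs used here are made up for illustration.

```python
import heapq


def node_costs(links, start):
    """Dijkstra's algorithm over a graph of gateways and nodes.

    `links` maps each name to a list of (neighbour, cost) pairs. Returns
    the cheapest total cost from `start` to every reachable name."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        cost, here = heapq.heappop(queue)
        if cost > best.get(here, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbour, step in links.get(here, []):
            new_cost = cost + step
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best
```

For example, if `client` reaches `gw1` at cost 100 and `gw2` at cost 120, and `gw1` also links on to `gw2` at cost 50, then `node2` behind `gw2` (link cost 1) gets cost 121 via the direct route rather than 151 via `gw1`.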
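A Python sketch of the zero-copy idea using `os.sendfile()`, rather than the Haskell binding the note points to. One caveat, from the Linux sendfile(2) man page: the input descriptor must be a file supporting mmap-like operations, not a socket, so where `receiveBytes` reads from a socket, splice(2) through a pipe is the usual Linux way to avoid the copy. The sketch therefore shows the file-to-socket case; `send_file` is a hypothetical helper name.

```python
import os
import socket


def send_file(sock: socket.socket, path: str) -> int:
    """Copy a file's contents to a socket with os.sendfile(), letting the
    kernel move the data without copying it through user space.
    Returns the number of bytes sent."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break  # file ended early (it shrank while sending)
            offset += sent
    return offset
```

The loop is needed because a single `sendfile()` call may transfer fewer bytes than requested.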
## completed items for June's work on [[design/passthrough_proxy]]
* UUID discovery via git-annex branch. Add a log file listing UUIDs
  accessible via proxy UUIDs. It will also contain the names
  of the remotes that the proxy is a proxy for,
  from the perspective of the proxy. (done)

* Add `git-annex updateproxy` command (done)

* Remote instantiation for proxies. (done)

* Implement git-annex-shell proxying to git remotes. (done)

* Proxy should update location tracking information for proxied remotes,
  so it is available to other users who sync with it. (done)

* Implement `git-annex initcluster` and `git-annex updatecluster` commands (done)

* Implement cluster UUID insertion on location log load, and removal
  on location log store. (done)

* Omit cluster UUIDs when constructing drop proofs, since lockcontent will
  always fail on a cluster. (done)

* Don't count cluster UUID as a copy in numcopies checking etc. (done)

* Tab complete proxied remotes and clusters in eg the `--from` option. (done)

* Getting a key from a cluster should proxy from one of the nodes that has
  it. (done)

* Implement upload with fanout to multiple cluster nodes and reporting back
  additional UUIDs over P2P protocol. (done)

* Implement cluster drops, trying to remove from all nodes, and returning
  which UUIDs it was dropped from. (done)

* `git-annex testremote` works against proxied remotes and clusters. (done)

* Avoid `git-annex sync --content` etc from operating on cluster nodes by
  default since syncing with a cluster implicitly syncs with its nodes. (done)

* On upload to cluster, send to nodes where it is preferred content, and not
  to other nodes. (done)

* Support annex.jobs for clusters. (done)

* Add `git-annex extendcluster` command and extend `git-annex updatecluster`
  to support clusters with multiple gateways. (done)

* Support proxying for a remote that is proxied by another gateway of
  a cluster. (done)

* Support distributed clusters: Make a proxy for a cluster repeat
  protocol messages on to any remotes that have the same UUID as
  the cluster. Needs extension to P2P protocol to avoid cycles.
  (done)

* Proxied cluster nodes should have slightly higher cost than the cluster
  gateway. (done)

* Basic support for proxying special remotes. (But not exporttree=yes ones
  yet.) (done)

* Tab complete remotes in all relevant commands (done)

* Display cluster and proxy information in git-annex info (done)
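A sketch of the location log handling for clusters, assuming a cluster counts as having a key whenever any of its nodes does. Sets of UUID strings stand in for git-annex's location log, and all function names are hypothetical. Stripping the cluster UUIDs is the same filtering needed when counting copies for numcopies and when building drop proofs.

```python
def load_locations(stored, clusters):
    """On load, add each cluster's UUID when any of its nodes has the key.

    `stored` is the set of UUIDs recorded in the location log;
    `clusters` maps a cluster UUID to the set of its node UUIDs."""
    present = set(stored)
    for cluster, nodes in clusters.items():
        if present & nodes:
            present.add(cluster)
    return present


def store_locations(present, clusters):
    """On store, strip cluster UUIDs so they never reach the log."""
    return set(present) - set(clusters)


def count_copies(present, clusters):
    """Cluster UUIDs are virtual, so they do not count toward numcopies."""
    return len(store_locations(present, clusters))
```

Because the cluster UUID is added on every load and removed on every store, the log only ever records real repositories.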
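A sketch of upload fanout to cluster nodes: only nodes whose preferred content wants the key are sent it, up to `jobs` transfers run at once (in the spirit of annex.jobs), and the UUIDs that succeeded are what would be reported back to the client. `wants` and `send` are hypothetical callbacks standing in for the preferred content check and the actual transfer.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_to_cluster(key, nodes, wants, send, jobs=1):
    """Send `key` to every node whose preferred content wants it.

    `wants(node, key)` decides whether a node should get the key;
    `send(node, key)` returns True on success. Returns the nodes that
    now have the key, in the order they were given."""
    targets = [n for n in nodes if wants(n, key)]
    with ThreadPoolExecutor(max_workers=max(1, jobs)) as pool:
        results = list(pool.map(lambda n: send(n, key), targets))
    return [n for n, ok in zip(targets, results) if ok]
```

Using `pool.map` keeps results in the same order as `targets`, so pairing each node with its outcome via `zip` is safe even when transfers finish out of order.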
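A sketch of why distributed clusters need cycle avoidance: when gateways for the same cluster UUID repeat requests on to each other, a gateway can be reached again through a loop. Here each forwarded request carries the set of gateways it has already passed through, and a gateway skips peers in that set. The data layout and names are hypothetical; this is not the actual P2P protocol extension.

```python
def reachable_nodes(gateway, peers, local_nodes, visited=None):
    """Collect the nodes a request reaches as gateways forward it on.

    `peers` maps a gateway to the other gateways for the same cluster UUID;
    `local_nodes` maps a gateway to the nodes it proxies directly.
    `visited` plays the role of the list of already-traversed gateways
    that a forwarded request would carry."""
    visited = set() if visited is None else visited
    visited.add(gateway)
    reached = set(local_nodes.get(gateway, ()))
    for peer in peers.get(gateway, ()):
        if peer not in visited:
            reached |= reachable_nodes(peer, peers, local_nodes, visited)
    return reached
```

With three gateways whose peer lists form a loop, the traversal still terminates, visits each gateway once, and reaches every node.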