This is a summary todo covering several subprojects, which would extend git-annex to be able to use proxies which sit in front of a cluster of repositories.

1. [[design/passthrough_proxy]]
2. [[design/p2p_protocol_over_http]]
3. [[design/balanced_preferred_content]]
4. [[todo/track_free_space_in_repos_via_git-annex_branch]]
5. [[todo/proving_preferred_content_behavior]]

Joey has received funding to work on this.
Planned schedule of work:

* June: git-annex proxy
* July, part 1: git-annex proxy support for exporttree
* July, part 2: p2p protocol over http
* August: balanced preferred content
* September: streaming through proxy to special remotes (especially S3)
* October: proving behavior of balanced preferred content with proxies

[[!tag projects/openneuro]]

# work notes

In development on the `proxy` branch.

For June's work on [[design/passthrough_proxy]], implementation plan:

* UUID discovery via git-annex branch. Add a log file listing UUIDs accessible via proxy UUIDs. It also will contain the names of the remotes that the proxy is a proxy for, from the perspective of the proxy. (done)

* Add `git-annex updateproxy` command (done)

* Remote instantiation for proxies. (done)

* Implement git-annex-shell proxying to git remotes. (done)

* Proxy should update location tracking information for proxied remotes, so it is available to other users who sync with it. (done)

* Implement `git-annex updatecluster` command (done)

* Implement cluster UUID insertion on location log load, and removal on location log store. (done)

* Omit cluster UUIDs when constructing drop proofs, since lockcontent will always fail on a cluster. (done)

* Don't count cluster UUID as a copy. (Including in `whereis` display.)
  Work in progress. `fromNumCopies` is sometimes used to get a number that is compared with a list of UUIDs. And `limitCopies` doesn't use numcopies machinery.

* Basic proxying to special remote support (non-streaming).

* Consider getting instantiated remotes into git remote list.
  See design.

* Implement upload with fanout and reporting back additional UUIDs over P2P protocol.

* Getting a key from a cluster should proxy from one of the nodes that has it, or from the proxy repository itself if it has the key.

* On upload to a cluster, as well as fanout to nodes, if the key is preferred content of the proxy repository, store it there. (But not when preferred content is not configured.)

* Implement cluster drops, trying to remove from all nodes, and returning which UUIDs it was dropped from.

* Support proxies-of-proxies better, eg foo-bar-baz. Currently, it does work, but have to run `git-annex updateproxy` on foo in order for it to notice the bar-baz proxied remote exists, and record it as foo-bar-baz. Make it skip recording proxies of proxies like that, and instead automatically generate those from the log. (With cycle prevention there of course.)

* Cycle prevention including cluster-in-cluster cycles. See design.

* Optimise proxy speed. See design for ideas.

* Use `sendfile()` to avoid data copying overhead when `receiveBytes` is being fed right into `sendBytes`.

* Encryption and chunking. See design for issues.

* Indirect uploads (to be considered). See design.

* Support using a proxy when its url is a P2P address. (Eg tor-annex remotes.)
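
The cluster UUID handling listed in the plan above (insertion on location log load, removal on location log store, and not counting a cluster UUID as a copy) could be sketched roughly as follows. This is a Python illustration only, not git-annex's Haskell code. The function names, the in-memory `clusters` map, and the rule that a cluster is treated as containing a key when at least one of its nodes does are all assumptions made for the example. Cluster-in-cluster setups are not handled here.

```python
from typing import Dict, Set

# Hypothetical representation: cluster UUID -> UUIDs of the cluster's nodes.
Clusters = Dict[str, Set[str]]


def add_cluster_uuids(locations: Set[str], clusters: Clusters) -> Set[str]:
    """On load: add each cluster UUID that has at least one node in locations.

    Assumption: a cluster "has" a key when any of its nodes has it.
    """
    present = {cluster for cluster, nodes in clusters.items() if nodes & locations}
    return locations | present


def remove_cluster_uuids(locations: Set[str], clusters: Clusters) -> Set[str]:
    """On store: strip cluster UUIDs so they are never written to the log."""
    return locations - set(clusters)


def count_copies(locations: Set[str], clusters: Clusters) -> int:
    """Count copies while ignoring cluster UUIDs, which are not real copies."""
    return len(remove_cluster_uuids(locations, clusters))


# Made-up UUIDs, for illustration only.
clusters = {"cluster-1": {"node-a", "node-b"}}
loaded = add_cluster_uuids({"node-a", "laptop"}, clusters)
stored = remove_cluster_uuids(loaded, clusters)
copies = count_copies(loaded, clusters)
```

In this sketch the cluster UUID exists only in the in-memory view, so anything that counts copies or builds a drop proof has to filter it out first, which is the concern raised in the `fromNumCopies` and `limitCopies` note above.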