This is a summary todo covering several subprojects, which extend git-annex to be able to use proxies which sit in front of a cluster of repositories. 1. [[design/passthrough_proxy]] 2. [[design/p2p_protocol_over_http]] 3. [[design/balanced_preferred_content]] 4. [[todo/track_free_space_in_repos_via_git-annex_branch]] 5. [[todo/proving_preferred_content_behavior]] ## table of contents [[!toc ]] ## plan Joey has received funding to work on this. Planned schedule of work: * June: git-annex proxies and clusters * July: p2p protocol over http * August, part 1: git-annex proxy support for exporttree * August, part 2: balanced preferred content * September: proving behavior of balanced preferred content with proxies * October: streaming through proxy to special remotes (especially S3) > This project is now complete! [[done]] --[[Joey]] [[!tag projects/openneuro]] ## some todos that spun off from this project and didn't get implemented during it: For balanced preferred content and maxsize tracking: * [[todo/assistant_does_not_use_LiveUpdate]] * [[todo/git-annex_info_with_limit_overcounts]] For p2p protocol over http: * [[p2phttp_serve_multiple_repositories]] * [[git-remote-annex_support_for_p2phttp]] For proxying: * [[proxying_for_p2phttp_and_tor-annex_remotes]] * [[faster_proxying]] * [[smarter_use_of_disk_when_proxying]] ## completed items for October's work on streaming through proxy to special remotes * Stream downloads through proxy for all special remotes that indicate they download in order. * Added ORDERED message to external special remote protocol. * Added DATA-PRESENT and documented in [[tips/client_side_upload_to_a_special_remote]] ## completed items for September's work on proving behavior of preferred content * Static analysis to detect "not present", "not balanced", and similar unstable preferred content expressions and avoid problems with them. * Implemented `git-annex sim` command. * Simulated a variety of repository networks, and random preferred content expressions, checking that a stable state is always reached. * Fix bug that prevented anything being stored in an empty repository whose preferred content expression uses sizebalanced. (Identified via `git-annex sim`) ## completed items for August's work on balanced preferred content * Balanced preferred content basic implementation, including --rebalance option. * Implemented [[track_free_space_in_repos_via_git-annex_branch]] * Implemented tracking of live changes to repository sizes. * `git-annex maxsize` * annex.fullybalancedthreshhold ## completed items for August's work on git-annex proxy support for exporttre * Special remotes configured with exporttree=yes annexobjects=yes can store objects in .git/annex/objects, as well as an exported tree. * Support proxying to special remotes configured with exporttree=yes annexobjects=yes. * post-retrieve: When proxying is enabled for an exporttree=yes special remote and the configured remote.name.annex-tracking-branch is received, the tree is exported to the special remote. * When getting from a P2P HTTP remote, prompt for credentials when required, instead of failing. * Prevent `updateproxy` and `updatecluster` from adding an exporttree=yes special remote that does not have annexobjects=yes, to avoid foot shooting. * Implement `git-annex export treeish --to=foo --from=bar`, which gets from bar as needed to send to foo. Make post-retrieve use `--to=r --from=r` to handle the multiple files case. ## completed items for July's work on p2p protocol over http * HTTP P2P protocol design [[design/p2p_protocol_over_http]]. * addressed [[doc/todo/P2P_locking_connection_drop_safety]] * implemented server and client for HTTP P2P protocol * added git-annex p2phttp command to serve HTTP P2P protocol * Make git-annex p2phttp support https. * Allow using annex+http urls in remote.name.annexUrl * Make http server support proxying. * Make http server support serving a cluster. ## completed items for June's work on [[design/passthrough_proxy]]: * UUID discovery via git-annex branch. Add a log file listing UUIDs accessible via proxy UUIDs. It also will contain the names of the remotes that the proxy is a proxy for, from the perspective of the proxy. (done) * Add `git-annex updateproxy` command (done) * Remote instantiation for proxies. (done) * Implement git-annex-shell proxying to git remotes. (done) * Proxy should update location tracking information for proxied remotes, so it is available to other users who sync with it. (done) * Implement `git-annex initcluster` and `git-annex updatecluster` commands (done) * Implement cluster UUID insertation on location log load, and removal on location log store. (done) * Omit cluster UUIDs when constructing drop proofs, since lockcontent will always fail on a cluster. (done) * Don't count cluster UUID as a copy in numcopies checking etc. (done) * Tab complete proxied remotes and clusters in eg --from option. (done) * Getting a key from a cluster should proxy from one of the nodes that has it. (done) * Implement upload with fanout to multiple cluster nodes and reporting back additional UUIDs over P2P protocol. (done) * Implement cluster drops, trying to remove from all nodes, and returning which UUIDs it was dropped from. (done) * `git-annex testremote` works against proxied remote and cluster. (done) * Avoid `git-annex sync --content` etc from operating on cluster nodes by default since syncing with a cluster implicitly syncs with its nodes. (done) * On upload to cluster, send to nodes where its preferred content, and not to other nodes. (done) * Support annex.jobs for clusters. (done) * Add `git-annex extendcluster` command and extend `git-annex updatecluster` to support clusters with multiple gateways. (done) * Support proxying for a remote that is proxied by another gateway of a cluster. (done) * Support distributed clusters: Make a proxy for a cluster repeat protocol messages on to any remotes that have the same UUID as the cluster. Needs extension to P2P protocol to avoid cycles. (done) * Proxied cluster nodes should have slightly higher cost than the cluster gateway. (done) * Basic support for proxying special remotes. (But not exporttree=yes ones yet.) (done) * Tab complete remotes in all relevant commands (done) * Display cluster and proxy information in git-annex info (done)