oops, add the new todos meant to be in prev commit

Joey Hess 2024-10-30 14:50:24 -04:00
parent 87871f724e
commit 3c973aba57
5 changed files with 94 additions and 0 deletions

@@ -0,0 +1,8 @@
The assistant is using NoLiveUpdate, but it should be possible to plumb
a LiveUpdate through it from preferred content checking to location log
updating.

The benefit would be that, when using balanced preferred content expressions,
the assistant would get live updates about repo sizes.
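
A minimal, self-contained sketch of the shape of that plumbing; the
types and function names below are hypothetical stand-ins, not
git-annex's actual LiveUpdate machinery:

    import Control.Concurrent.STM

    -- Hypothetical stand-ins: NoLiveUpdate drops size changes on the
    -- floor, while a LiveUpdate carries a handle that repo size
    -- changes can be fed into as they happen.
    data LiveUpdate = NoLiveUpdate | LiveUpdate (TVar Integer)

    -- Preferred content checking would allocate the handle...
    startLiveUpdate :: IO LiveUpdate
    startLiveUpdate = LiveUpdate <$> newTVarIO 0

    -- ...and location log updating would feed repo size changes back
    -- through it, so balanced preferred content expressions see live
    -- repo sizes.
    updateLocationLog :: LiveUpdate -> Integer -> IO ()
    updateLocationLog NoLiveUpdate _ = pure ()
    updateLocationLog (LiveUpdate sizevar) delta =
        atomically $ modifyTVar' sizevar (+ delta)

    main :: IO ()
    main = do
        lu <- startLiveUpdate
        updateLocationLog lu 1048576  -- a 1 MiB object arrived
        case lu of
            LiveUpdate sizevar -> readTVarIO sizevar >>= print
            NoLiveUpdate -> pure ()
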
(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]

@@ -0,0 +1,35 @@
Not that proxying is super slow, but it does involve bouncing content
through the proxy, and could be made faster. Some ideas:

* A proxy to a local git repository spawns git-annex-shell
  to communicate with it. It would be more efficient to operate
  directly on the Remote, especially when transferring content to/from it.
  But: When a cluster has several nodes that are local git repositories,
  and is sending data to all of them, this would need an alternate
  interface to `storeKey` that supports streaming of chunks
  of a ByteString (see the sketch after this list).
* Use `sendfile()` to avoid data copying overhead when
  `receiveBytes` is being fed right into `sendBytes`.
  Library to use:
  <https://hackage.haskell.org/package/hsyscall-0.4/docs/System-Syscall.html>
* Getting a key from a cluster currently picks from among
  the lowest cost nodes at random. This could be smarter,
  eg, prefer to avoid using nodes that are doing other transfers at the
  same time.
* The cost of a proxied node that is accessed via an intermediate gateway
  is currently the same as a node accessed via the cluster gateway. So in
  such a situation, git-annex may make a suboptimal choice of path.
  To fix this, there needs to be some way to tell how many hops through
  gateways it takes to reach a node. Currently the only way is to
  guess based on the number of dashes in the node name, which is not satisfying.
  Even counting hops is not very satisfying; one cluster gateway could
  be much more expensive to traverse than another one.
  If seriously tackling this, it might be worth making enough information
  available to use spanning tree protocol for routing inside clusters.
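
A rough sketch of what a chunk-streaming alternative to `storeKey`
could look like, for the first idea above; the type and field names
are illustrative assumptions, not git-annex's actual Remote interface:

    module StreamStoreSketch where

    import qualified Data.ByteString as B

    newtype Key = Key String -- stand-in for git-annex's real Key type

    -- Instead of handing the remote a complete file to send, the proxy
    -- would hand it a sink that consumes ByteString chunks as they
    -- arrive, so one incoming stream can fan out to several cluster
    -- nodes without buffering per-node.
    data StreamStore = StreamStore
        { beginStore :: Key -> IO ChunkSink
        }

    data ChunkSink = ChunkSink
        { feedChunk :: B.ByteString -> IO ()
        , finishStore :: IO Bool -- commit and report success
        }

    -- Fan a stream of chunks out to all nodes of a cluster.
    storeToCluster :: [StreamStore] -> Key -> [B.ByteString] -> IO Bool
    storeToCluster nodes key chunks = do
        sinks <- mapM (\node -> beginStore node key) nodes
        mapM_ (\chunk -> mapM_ (\sink -> feedChunk sink chunk) sinks) chunks
        and <$> mapM finishStore sinks
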
(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]

@@ -0,0 +1,12 @@
Should be possible to use a git-remote-annex annex::$uuid url as
remote.foo.url with remote.foo.annexUrl using annex+http, and so
not need a separate web server to serve the git repository when using
`git-annex p2phttp`.

Doesn't work currently because git-remote-annex urls only support
special remotes.

It would need a new form of git-remote-annex url, eg:

    annex::$uuid?annex+http://example.com/git-annex/
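
A tiny sketch of parsing that proposed form; splitting the uuid from
the annex+http part on the first `?` is an assumption about the
syntax, and the uuid in main is a made-up placeholder:

    import Data.List (stripPrefix)

    -- Hypothetical parse of annex::$uuid?annex+http://... ; not
    -- git-remote-annex's real url handling.
    data AnnexUrl = AnnexUrl
        { annexUuid :: String
        , annexHttpUrl :: Maybe String
        }
        deriving Show

    parseAnnexUrl :: String -> Maybe AnnexUrl
    parseAnnexUrl u = do
        rest <- stripPrefix "annex::" u
        pure $ case break (== '?') rest of
            (uuid, '?':url) -> AnnexUrl uuid (Just url)
            (uuid, _)       -> AnnexUrl uuid Nothing

    main :: IO ()
    main = print $ parseAnnexUrl
        "annex::00000000-0000-0000-0000-000000000000?annex+http://example.com/git-annex/"
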
(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]

@@ -0,0 +1,13 @@
git-annex can proxy for remotes that are accessed locally or over
ssh, as well as for special remotes. But, it cannot proxy for remotes that
themselves have an annex+http annexUrl.

This would need a translation from the P2P protocol to the servant
client. Should not be very hard to implement if someone needs it for
some reason.

Also, git-annex could support proxying to remotes whose url is a P2P
address, eg, tor-annex remotes. This only needs a way to
generate a RemoteSide for them.
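
To give an idea of the shape of that translation, a hedged sketch;
the record, its fields, and the endpoint paths are invented for
illustration and do not match git-annex's real RemoteSide type or the
p2phttp API. It uses httpBS from the http-conduit package:

    module HttpRemoteSideSketch where

    import qualified Data.ByteString as B
    import Network.HTTP.Simple

    -- Invented stand-in for RemoteSide: just two of the P2P protocol
    -- actions a proxy performs against a remote.
    data RemoteSideSketch = RemoteSideSketch
        { p2pCheckPresent :: String -> IO Bool
        , p2pGet :: String -> IO (Maybe B.ByteString)
        }

    -- Build one from an annex+http annexUrl by mapping each P2P
    -- action onto an HTTP request. The paths are hypothetical.
    httpRemoteSide :: String -> RemoteSideSketch
    httpRemoteSide base = RemoteSideSketch
        { p2pCheckPresent = \key -> do
            req <- parseRequest (base ++ "/checkpresent/" ++ key)
            resp <- httpBS req
            pure (getResponseStatusCode resp == 200)
        , p2pGet = \key -> do
            req <- parseRequest (base ++ "/get/" ++ key)
            resp <- httpBS req
            pure $ if getResponseStatusCode resp == 200
                then Just (getResponseBody resp)
                else Nothing
        }
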
(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]

@@ -0,0 +1,26 @@
When proxying for a special remote, downloads can stream in from it and
out of the proxy, but that does happen via a temporary file, which grows
to the full size of the file being downloaded. And uploads to a special
remote get buffered to a temporary file.

It would be nice to do full streaming without temp files, but it's also
a hard change to make.

Some improvements that could be made without making such a big change:

* When an upload to a cluster is distributed to multiple special remotes,
  a temporary file is written for each one, which may even happen in
  parallel. This is a lot of extra work and may use excess disk space.
  It should be possible to only write a single temp file.
* Check annex.diskreserve when proxying for special remotes
  to avoid the proxy's disk filling up with the temporary object file
  cached there (see the sketch after this list).
* Resuming an interrupted download from a proxied special remote makes
  the proxy re-download the whole content. It could instead keep some of
  the object files around when the client does not send SUCCESS. This
  would use more disk, but could be limited to, eg, the last 2 or so
  object files.
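
A minimal sketch of the annex.diskreserve check from the second idea
above, using getAvailSpace from the disk-free-space package; the
reserve value and path in main are illustrative:

    import System.DiskSpace (getAvailSpace)

    -- Before caching a proxied object to a temp file, refuse when
    -- doing so would take free space under the configured reserve.
    wouldBreachReserve
        :: Integer  -- annex.diskreserve, in bytes
        -> FilePath -- directory the temp object file is written to
        -> Integer  -- size of the object being proxied
        -> IO Bool
    wouldBreachReserve reserve tmpdir objectsize = do
        avail <- getAvailSpace tmpdir
        pure (avail - objectsize < reserve)

    main :: IO ()
    main = wouldBreachReserve (100 * 1024 * 1024) "." 0 >>= print
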
The [[design/passthrough_proxy]] design doc has some more thoughts about this.

(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]