oops, add the new todos meant to be in prev commit

Joey Hess 2024-10-30 14:50:24 -04:00
parent 87871f724e
commit 3c973aba57
5 changed files with 94 additions and 0 deletions

@@ -0,0 +1,8 @@
The assistant is using NoLiveUpdate, but it should be possible to plumb
a LiveUpdate through it from preferred content checking to location log
updating.

The benefit would be that, when using balanced preferred content expressions,
the assistant would get live updates about repo sizes.
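
A minimal, self-contained sketch of the shape of that plumbing; the
types and function names below are hypothetical stand-ins, not
git-annex's actual LiveUpdate machinery:

    import Control.Concurrent.STM

    -- Hypothetical stand-ins: NoLiveUpdate drops size changes on the
    -- floor, while a LiveUpdate carries a handle that repo size
    -- changes can be fed into as they happen.
    data LiveUpdate = NoLiveUpdate | LiveUpdate (TVar Integer)

    -- Preferred content checking would allocate the handle...
    startLiveUpdate :: IO LiveUpdate
    startLiveUpdate = LiveUpdate <$> newTVarIO 0

    -- ...and location log updating would feed repo size changes back
    -- through it, so balanced preferred content expressions see live
    -- repo sizes.
    updateLocationLog :: LiveUpdate -> Integer -> IO ()
    updateLocationLog NoLiveUpdate _ = pure ()
    updateLocationLog (LiveUpdate sizevar) delta =
        atomically $ modifyTVar' sizevar (+ delta)

    main :: IO ()
    main = do
        lu <- startLiveUpdate
        updateLocationLog lu 1048576  -- a 1 MiB object arrived
        case lu of
            LiveUpdate sizevar -> readTVarIO sizevar >>= print
            NoLiveUpdate -> pure ()
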
(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]

@@ -0,0 +1,35 @@
Not that proxying is super slow, but it does involve bouncing content
through the proxy, and could be made faster. Some ideas:

* A proxy to a local git repository spawns git-annex-shell
  to communicate with it. It would be more efficient to operate
  directly on the Remote, especially when transferring content to/from it.
  But: When a cluster has several nodes that are local git repositories,
  and is sending data to all of them, this would need an alternate
  interface to `storeKey` that supports streaming of chunks
  of a ByteString (see the sketch after this list).
* Use `sendfile()` to avoid data copying overhead when
  `receiveBytes` is being fed right into `sendBytes`.
  Library to use:
  <https://hackage.haskell.org/package/hsyscall-0.4/docs/System-Syscall.html>
* Getting a key from a cluster currently picks from among
  the lowest cost nodes at random. This could be smarter,
  eg, prefer to avoid using nodes that are doing other transfers at the
  same time.
* The cost of a proxied node that is accessed via an intermediate gateway
  is currently the same as a node accessed via the cluster gateway. So in
  such a situation, git-annex may make a suboptimal choice of path.
  To fix this, there needs to be some way to tell how many hops through
  gateways it takes to reach a node. Currently the only way is to
  guess based on the number of dashes in the node name, which is not satisfying.
  Even counting hops is not very satisfying; one cluster gateway could
  be much more expensive to traverse than another one.
  If seriously tackling this, it might be worth making enough information
  available to use spanning tree protocol for routing inside clusters.
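
A rough sketch of what a chunk-streaming alternative to `storeKey`
could look like, for the first idea above; the type and field names
are illustrative assumptions, not git-annex's actual Remote interface:

    module StreamStoreSketch where

    import qualified Data.ByteString as B

    newtype Key = Key String -- stand-in for git-annex's real Key type

    -- Instead of handing the remote a complete file to send, the proxy
    -- would hand it a sink that consumes ByteString chunks as they
    -- arrive, so one incoming stream can fan out to several cluster
    -- nodes without buffering per-node.
    data StreamStore = StreamStore
        { beginStore :: Key -> IO ChunkSink
        }

    data ChunkSink = ChunkSink
        { feedChunk :: B.ByteString -> IO ()
        , finishStore :: IO Bool -- commit and report success
        }

    -- Fan a stream of chunks out to all nodes of a cluster.
    storeToCluster :: [StreamStore] -> Key -> [B.ByteString] -> IO Bool
    storeToCluster nodes key chunks = do
        sinks <- mapM (\node -> beginStore node key) nodes
        mapM_ (\chunk -> mapM_ (\sink -> feedChunk sink chunk) sinks) chunks
        and <$> mapM finishStore sinks
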
(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]

@@ -0,0 +1,12 @@
Should be possible to use a git-remote-annex annex::$uuid url as
remote.foo.url with remote.foo.annexUrl using annex+http, and so
not need a separate web server to serve the git repository when using
`git-annex p2phttp`.

Doesn't work currently because git-remote-annex urls only support
special remotes.

It would need a new form of git-remote-annex url, eg:

    annex::$uuid?annex+http://example.com/git-annex/
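
A tiny sketch of parsing that proposed form; splitting the uuid from
the annex+http part on the first `?` is an assumption about the
syntax, and the uuid in main is a made-up placeholder:

    import Data.List (stripPrefix)

    -- Hypothetical parse of annex::$uuid?annex+http://... ; not
    -- git-remote-annex's real url handling.
    data AnnexUrl = AnnexUrl
        { annexUuid :: String
        , annexHttpUrl :: Maybe String
        }
        deriving Show

    parseAnnexUrl :: String -> Maybe AnnexUrl
    parseAnnexUrl u = do
        rest <- stripPrefix "annex::" u
        pure $ case break (== '?') rest of
            (uuid, '?':url) -> AnnexUrl uuid (Just url)
            (uuid, _)       -> AnnexUrl uuid Nothing

    main :: IO ()
    main = print $ parseAnnexUrl
        "annex::00000000-0000-0000-0000-000000000000?annex+http://example.com/git-annex/"
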
(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]

@@ -0,0 +1,13 @@
git-annex can proxy for remotes that are accessed locally or over
ssh, as well as for special remotes. But, it cannot proxy for remotes that
themselves have an annex+http annexUrl.

This would need a translation from the P2P protocol to the servant
client. Should not be very hard to implement if someone needs it for
some reason.

Also, git-annex could support proxying to remotes whose url is a P2P
address, eg, tor-annex remotes. This only needs a way to
generate a RemoteSide for them.
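
To give an idea of the shape of that translation, a hedged sketch;
the record, its fields, and the endpoint paths are invented for
illustration and do not match git-annex's real RemoteSide type or the
p2phttp API. It uses httpBS from the http-conduit package:

    module HttpRemoteSideSketch where

    import qualified Data.ByteString as B
    import Network.HTTP.Simple

    -- Invented stand-in for RemoteSide: just two of the P2P protocol
    -- actions a proxy performs against a remote.
    data RemoteSideSketch = RemoteSideSketch
        { p2pCheckPresent :: String -> IO Bool
        , p2pGet :: String -> IO (Maybe B.ByteString)
        }

    -- Build one from an annex+http annexUrl by mapping each P2P
    -- action onto an HTTP request. The paths are hypothetical.
    httpRemoteSide :: String -> RemoteSideSketch
    httpRemoteSide base = RemoteSideSketch
        { p2pCheckPresent = \key -> do
            req <- parseRequest (base ++ "/checkpresent/" ++ key)
            resp <- httpBS req
            pure (getResponseStatusCode resp == 200)
        , p2pGet = \key -> do
            req <- parseRequest (base ++ "/get/" ++ key)
            resp <- httpBS req
            pure $ if getResponseStatusCode resp == 200
                then Just (getResponseBody resp)
                else Nothing
        }
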
(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]

@@ -0,0 +1,26 @@
When proxying for a special remote, downloads can stream in from it and
out of the proxy, but that does happen via a temporary file, which grows
to the full size of the file being downloaded. And uploads to a special
remote get buffered to a temporary file.

It would be nice to do full streaming without temp files, but it's also
a hard change to make.

Some improvements that could be made without making such a big change:

* When an upload to a cluster is distributed to multiple special remotes,
  a temporary file is written for each one, which may even happen in
  parallel. This is a lot of extra work and may use excess disk space.
  It should be possible to only write a single temp file.
* Check annex.diskreserve when proxying for special remotes
  to avoid the proxy's disk filling up with the temporary object file
  cached there (see the sketch after this list).
* Resuming an interrupted download from a proxied special remote makes
  the proxy re-download the whole content. It could instead keep some of
  the object files around when the client does not send SUCCESS. This
  would use more disk, but could be limited to, eg, the last 2 or so
  object files.
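
A minimal sketch of the annex.diskreserve check from the second idea
above, using getAvailSpace from the disk-free-space package; the
reserve value and path in main are illustrative:

    import System.DiskSpace (getAvailSpace)

    -- Before caching a proxied object to a temp file, refuse when
    -- doing so would take free space under the configured reserve.
    wouldBreachReserve
        :: Integer  -- annex.diskreserve, in bytes
        -> FilePath -- directory the temp object file is written to
        -> Integer  -- size of the object being proxied
        -> IO Bool
    wouldBreachReserve reserve tmpdir objectsize = do
        avail <- getAvailSpace tmpdir
        pure (avail - objectsize < reserve)

    main :: IO ()
    main = wouldBreachReserve (100 * 1024 * 1024) "." 0 >>= print
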
The [[design/passthrough_proxy]] design doc has some more thoughts about this.

(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]