update status and design work on proxy encryption and chunking

Joey Hess 2024-06-07 12:35:04 -04:00
parent a0e59c1d17
commit 43ff697f25
GPG key ID: DB12DB0FF05F8F38
2 changed files with 49 additions and 16 deletions

@ -189,24 +189,60 @@ The remote interface operates on object files stored on disk. See
[[todo/transitive_transfers]] for discussion of that problem. If proxies
get implemented, that problem should be revisited.
## chunking
When the proxy is in front of a special remote that is chunked,
where does the chunking happen? It could happen on the client, or on the
proxy.
Git remotes don't ever do chunking currently, so chunking on the client
would need changes there.
Also, a given upload via a proxy may get sent to several special remotes,
each with different chunk sizes, or perhaps some not chunked and some
chunked. For uploads to be efficient, chunking needs to happen on the proxy.
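A minimal sketch of why proxy-side chunking keeps uploads efficient: the client sends the object once, and the proxy splits it separately for each special remote behind it. This is illustrative Python, not git-annex code; `Remote`, `split_chunks`, and `fan_out` are hypothetical names, and real chunked remotes also deal with chunk keys and chunk logs, which are left out here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Remote:
    """Hypothetical stand-in for a special remote behind the proxy."""
    name: str
    chunk_size: Optional[int]  # None means the remote is not chunked


def split_chunks(data: bytes, chunk_size: Optional[int]) -> List[bytes]:
    """Split data into pieces of chunk_size bytes; unchunked remotes get one piece."""
    if chunk_size is None:
        return [data]
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return pieces or [b""]  # an empty object is still stored as one piece


def fan_out(data: bytes, remotes: List[Remote]) -> Dict[str, List[bytes]]:
    """The client uploads data once; the proxy chunks it per remote."""
    return {r.name: split_chunks(data, r.chunk_size) for r in remotes}


if __name__ == "__main__":
    remotes = [Remote("a", 4), Remote("b", None), Remote("c", 3)]
    for name, pieces in fan_out(b"0123456789", remotes).items():
        print(name, pieces)
```

If chunking happened on the client instead, the client would have to upload the same content once per distinct chunk size.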
## encryption
When the proxy is in front of a special remote that uses encryption, where
does the encryption happen? It could either happen on the client before
sending to the proxy, or the proxy could do the encryption since it
communicates with the special remote. For security, doing the encryption on
the client seems like the best choice by far.
communicates with the special remote.
But, git-annex's git remotes don't currently ever do encryption. And
special remotes don't communicate via the P2P protocol with a git remote.
So none of git-annex's existing remote implementations would be able to handle
this case. Something will need to be changed in the remote
implementation for this.
If the client does not want the proxy to see unencrypted data,
they would obviously prefer that encryption happen locally.
(Chunking has the same problem.)
But, the proxy could be the only thing that has access to a security key
that is used in encrypting a special remote that's located behind it.
There's a security benefit there too.
So there are two perspectives here, the client's and the proxy's,
and they can reasonably disagree.
Also if encryption for a special remote behind a proxy happened
client-side, and the client relied on that, nothing would stop the proxy
from replacing that encrypted special remote with an unencrypted remote.
Then the client side encryption would not happen, the user would not
notice, and the proxy could see their unencrypted content.
Of course, if a client really wanted to, they could make a special remote
that uses the remote behind the proxy as a key/value backend.
Then the client could encrypt locally.
On the implementation side, git-annex's git remotes don't currently ever do
encryption. And special remotes don't communicate via the P2P protocol with
a git remote. So none of git-annex's existing remote implementations would
be able to handle client-side encryption.
There's potentially a layering problem here, because exactly how encryption
(or chunking) works can vary depending on the type of special remote.
works can vary depending on the type of special remote.
Encrypted and chunked special remotes first chunk, then encrypt.
So if chunking happens on the proxy, encryption *must* also happen there.
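The ordering constraint can be sketched as a small pipeline: whoever does the chunking hands plaintext chunks to the encryption step, so encryption cannot happen earlier than chunking. This is an illustrative Python sketch only. `toy_cipher` is a hypothetical placeholder (an XOR keystream derived with SHA-256) that is NOT secure and is not how git-annex encrypts; `proxy_store` and `proxy_retrieve` are likewise made-up names.

```python
import hashlib
from typing import List


def chunk(data: bytes, size: int) -> List[bytes]:
    """Step 1: split the plaintext object into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def toy_cipher(piece: bytes, secret: bytes, index: int) -> bytes:
    """Placeholder cipher: XOR with a SHA-256 derived keystream.

    NOT secure, for illustration only. Applying it twice with the same
    secret and index returns the original bytes.
    """
    stream = b""
    counter = 0
    while len(stream) < len(piece):
        block = secret + index.to_bytes(8, "big") + counter.to_bytes(8, "big")
        stream += hashlib.sha256(block).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(piece, stream))


def proxy_store(data: bytes, size: int, secret: bytes) -> List[bytes]:
    """Chunk first, then encrypt each chunk, both on the proxy."""
    return [toy_cipher(c, secret, i) for i, c in enumerate(chunk(data, size))]


def proxy_retrieve(stored: List[bytes], secret: bytes) -> bytes:
    """Decrypt each stored chunk, then reassemble the object."""
    return b"".join(toy_cipher(c, secret, i) for i, c in enumerate(stored))


if __name__ == "__main__":
    stored = proxy_store(b"0123456789", 4, b"proxy-only-secret")
    print([len(c) for c in stored])
    print(proxy_retrieve(stored, b"proxy-only-secret"))
```

Because `proxy_store` needs the plaintext to produce the chunks, a client that encrypted first would leave the proxy chunking ciphertext, which is a different on-remote format from what an encrypted, chunked special remote uses.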
So overall, it seems better to do proxy-side encryption. But it may be
worth adding a special remote that does its own client-side encryption
in front of the proxy.
## cycles

@ -34,16 +34,13 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
1. Add `git-annex updateproxy` command and remote.name.annex-proxy
configuration. (done)
2. Remote instantiation for proxies almost works, but fails at:
"git-annex: cannot determine uuid for origin-foo"
getRepoUUID does not look at the Repo's UUID setting, but reads it
from git-config. It's not set there for a proxied remote.
So: Add annex-uuid parsing to RemoteConfig.
2. Remote instantiation for proxies. (done)
3. Implement proxying in git-annex-shell.
4. Either implement proxying for local path remotes, or prevent
listProxied from operating on them.
4. Let `storeKey` return a list of UUIDs where content was stored,
and make proxies accept uploads directed at them, rather than a specific
instantiated remote, and fan out the upload to whatever nodes behind