From 43ff697f2513a3a2cab85008fe64646e084c3b07 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Fri, 7 Jun 2024 12:35:04 -0400
Subject: [PATCH] update status and design work on proxy encryption and chunking

---
 doc/design/passthrough_proxy.mdwn | 54 +++++++++++++++++++++++++------
 doc/todo/git-annex_proxies.mdwn   | 11 +++----
 2 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/doc/design/passthrough_proxy.mdwn b/doc/design/passthrough_proxy.mdwn
index e943363369..76e6c2cc18 100644
--- a/doc/design/passthrough_proxy.mdwn
+++ b/doc/design/passthrough_proxy.mdwn
@@ -189,24 +189,60 @@ The remote interface operates on object files stored on disk.
 See [[todo/transitive_transfers]] for discussion of that problem. If proxies
 get implemented, that problem should be revisited.
 
+## chunking
+
+When the proxy is in front of a special remote that is chunked,
+where does the chunking happen? It could happen on the client, or on the
+proxy.
+
+Git remotes don't ever do chunking currently, so chunking on the client
+would need changes there.
+
+Also, a given upload via a proxy may get sent to several special remotes,
+each with different chunk sizes, or perhaps some not chunked and some
+chunked. For uploads to be efficient, chunking needs to happen on the proxy.
+
 ## encryption
 
 When the proxy is in front of a special remote that uses encryption,
 where does the encryption happen? It could either happen on the client
 before sending to the proxy, or the proxy could do the encryption since it
-communicates with the special remote. For security, doing the encryption on
-the client seems like the best choice by far.
+communicates with the special remote.
 
-But, git-annex's git remotes don't currently ever do encryption. And
-special remotes don't communicate via the P2P protocol with a git remote.
-So none of git-annex's existing remote implementations would be able to handle
-this case. Something will need to be changed in the remote
-implementation for this.
+If the client does not want the proxy to see unencrypted data,
+they would obviously prefer encryption happens locally.
 
-(Chunking has the same problem.)
+But, the proxy could be the only thing that has access to a security key
+that is used in encrypting a special remote that's located behind it.
+There's a security benefit there too.
+
+So there are kind of two different perspectives here that can have
+different opinions.
+
+Also if encryption for a special remote behind a proxy happened
+client-side, and the client relied on that, nothing would stop the proxy
+from replacing that encrypted special remote with an unencrypted remote.
+Then the client side encryption would not happen, the user would not
+notice, and the proxy could see their unencrypted content.
+
+Of course, if a client really wanted to, they could make a special remote
+that uses the remote behind the proxy as a key/value backend.
+Then the client could encrypt locally.
+
+On the implementation side, git-annex's git remotes don't currently ever do
+encryption. And special remotes don't communicate via the P2P protocol with
+a git remote. So none of git-annex's existing remote implementations would
+be able to handle client-side encryption.
 
 There's potentially a layering problem here, because exactly how encryption
-(or chunking) works can vary depending on the type of special remote.
+works can vary depending on the type of special remote.
+
+Encrypted and chunked special remotes first chunk, then encrypt.
+So if chunking happens on the proxy, encryption *must* also happen there.
+
+So overall, it seems better to do proxy-side encryption. But it may be
+worth adding a special remote that does its own client-side encryption
+in front of the proxy.
 
 ## cycles
 
diff --git a/doc/todo/git-annex_proxies.mdwn b/doc/todo/git-annex_proxies.mdwn
index 90dc9c614d..2e8bad27cd 100644
--- a/doc/todo/git-annex_proxies.mdwn
+++ b/doc/todo/git-annex_proxies.mdwn
@@ -34,16 +34,13 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
 1. Add `git-annex updateproxy` command and remote.name.annex-proxy
    configuration. (done)
 
-2. Remote instantiation for proxies almost works, but fails at:
-   "git-annex: cannot determine uuid for origin-foo"
-
-   getRepoUUID does not look at the Repo's UUID setting, but reads it
-   from git-config. It's not set there for a proxied remote.
-
-   So: Add annex-uuid parsing to RemoteConfig.
+2. Remote instantiation for proxies. (done)
 
 3. Implement proxying in git-annex-shell.
 
+4. Either implement proxying for local path remotes, or prevent
+   listProxied from operating on them.
+
 4. Let `storeKey` return a list of UUIDs where content was stored,
    and make proxies accept uploads directed at them, rather than a specific
    instantiated remote, and fan out the upload to whatever nodes behind