Merge branch 'master' into proxy

commit 6568ba4904

3 changed files with 82 additions and 16 deletions
@@ -0,0 +1,33 @@
+[[!comment format=mdwn
+username="joey"
+subject="""comment 7"""
+date="2024-06-04T15:15:36Z"
+content="""
+Decoding the export.log, we have these events:
+
+Tue Aug 4 13:44:10 2020 (PST): An export is run on an openneuro worker
+sending to `s3-PRIVATE`, of b78b723042e6d7a967c806b52258e8554caa1696 which
+is now lost to history. After that export completed, there was a subsequent
+started but not completed export of
+ae2937297eb1b4f6c9bfdfcf9d7a41b1adcea32e, also lost to history.
+
+Fri Jan 19 21:04:26 2024: An export run on the same worker, sending to
+a `s3-PUBLIC` (not the current one, one that has been marked dead and
+forgotten), of ae2937297eb1b4f6c9bfdfcf9d7a41b1adcea32e. After that export
+completed, there was a subsequent started but not completed export of
+28b655e8207f916122bbcbd22c0369d86bb4ffc1.
+
+Later the same day, an export run on the same worker, sending to
+`s3-PUBLIC` (the current one), of 28b655e8207f916122bbcbd22c0369d86bb4ffc1.
+This export completed.
+
+Interesting that two exports were apparently started but left incomplete.
+This could have been because git-annex was interrupted, which would go a
+way toward confirming my analysis of this bug. But it's also possible
+there was an error exporting one or more files.
+
+According to Nell, the git history of main was rewritten to remove a large
+file from git. The tree 28b655e8207f916122bbcbd22c0369d86bb4ffc1 appears
+to still contain the large binary file. No commit in main references it.
+It did get grafted into the git-annex branch which is why it was not lost.
+"""]]
@@ -189,24 +189,60 @@ The remote interface operates on object files stored on disk. See
 [[todo/transitive_transfers]] for discussion of that problem. If proxies
 get implemented, that problem should be revisited.
 
+## chunking
+
+When the proxy is in front of a special remote that is chunked,
+where does the chunking happen? It could happen on the client, or on the
+proxy.
+
+Git remotes don't ever do chunking currently, so chunking on the client
+would need changes there.
+
+Also, a given upload via a proxy may get sent to several special remotes,
+each with different chunk sizes, or perhaps some not chunked and some
+chunked. For uploads to be efficient, chunking needs to happen on the proxy.
+
 ## encryption
 
 When the proxy is in front of a special remote that uses encryption, where
 does the encryption happen? It could either happen on the client before
 sending to the proxy, or the proxy could do the encryption since it
-communicates with the special remote. For security, doing the encryption on
-the client seems like the best choice by far.
+communicates with the special remote.
 
-But, git-annex's git remotes don't currently ever do encryption. And
-special remotes don't communicate via the P2P protocol with a git remote.
-So none of git-annex's existing remote implementations would be able to handle
-this case. Something will need to be changed in the remote
-implementation for this.
+If the client does not want the proxy to see unencrypted data,
+they would obviously prefer encryption happens locally.
 
-(Chunking has the same problem.)
+But, the proxy could be the only thing that has access to a security key
+that is used in encrypting a special remote that's located behind it.
+There's a security benefit there too.
+
+So there are kind of two different perspectives here that can have
+different opinions.
+
+Also if encryption for a special remote behind a proxy happened
+client-side, and the client relied on that, nothing would stop the proxy
+from replacing that encrypted special remote with an unencrypted remote.
+Then the client side encryption would not happen, the user would not
+notice, and the proxy could see their unencrypted content.
+
+Of course, if a client really wanted to, they could make a special remote
+that uses the remote behind the proxy as a key/value backend.
+Then the client could encrypt locally.
+
+On the implementation side, git-annex's git remotes don't currently ever do
+encryption. And special remotes don't communicate via the P2P protocol with
+a git remote. So none of git-annex's existing remote implementations would
+be able to handle client-side encryption.
 
 There's potentially a layering problem here, because exactly how encryption
-(or chunking) works can vary depending on the type of special remote.
+works can vary depending on the type of special remote.
 
+Encrypted and chunked special remotes first chunk, then encrypt.
+So if chunking happens on the proxy, encryption *must* also happen there.
+
+So overall, it seems better to do proxy-side encryption. But it may be
+worth adding a special remote that does its own client-side encryption
+in front of the proxy.
+
 ## cycles
 
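The chunking and encryption sections argue that one upload may fan out to several special remotes with different chunk sizes, so chunking belongs on the proxy, and that because chunked-and-encrypted remotes chunk first and then encrypt, encryption has to happen wherever chunking does. The following is a minimal Python sketch of that ordering, not git-annex code: `SpecialRemote`, `chunk`, `toy_encrypt` and `proxy_store` are hypothetical names, and the XOR keystream is a stand-in for real encryption that must not be used for anything real.

```python
from dataclasses import dataclass
from hashlib import sha256
from typing import Dict, List, Optional


@dataclass
class SpecialRemote:
    """Hypothetical description of a special remote sitting behind the proxy."""
    name: str
    chunk_size: Optional[int]  # None: the remote is not chunked
    key: Optional[bytes]       # None: the remote is not encrypted


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy XOR keystream (NOT secure); applying it twice restores the input."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


def chunk(data: bytes, size: Optional[int]) -> List[bytes]:
    """Split an object into fixed-size chunks; unchunked remotes get one piece."""
    if size is None:
        return [data]
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def proxy_store(obj: bytes, remotes: List[SpecialRemote]) -> Dict[str, List[bytes]]:
    """The client sends the object once; the proxy chunks, then encrypts, per remote."""
    stored = {}
    for remote in remotes:
        pieces = chunk(obj, remote.chunk_size)  # chunk first ...
        if remote.key is not None:
            # ... then encrypt each chunk with that remote's own key
            pieces = [toy_encrypt(remote.key, p) for p in pieces]
        stored[remote.name] = pieces
    return stored


remotes = [
    SpecialRemote("s3-a", chunk_size=4, key=b"k1"),
    SpecialRemote("s3-b", chunk_size=None, key=None),
]
result = proxy_store(b"hello world", remotes)
print({name: len(pieces) for name, pieces in result.items()})  # -> {'s3-a': 3, 's3-b': 1}
```

Because the two remotes disagree on chunk size, a client could not pre-chunk once for both of them; and because each encrypted remote encrypts its own chunks, the encryption step in this sketch necessarily sits on the same node as the chunking step.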
@@ -34,16 +34,13 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
 1. Add `git-annex updateproxy` command and remote.name.annex-proxy
 configuration. (done)
 
-2. Remote instantiation for proxies almost works, but fails at:
-"git-annex: cannot determine uuid for origin-foo"
-
-getRepoUUID does not look at the Repo's UUID setting, but reads it
-from git-config. It's not set there for a proxied remote.
-
-So: Add annex-uuid parsing to RemoteConfig.
+2. Remote instantiation for proxies. (done)
 
 3. Implement proxying in git-annex-shell.
 
+4. Either implement proxying for local path remotes, or prevent
+listProxied from operating on them.
+
 4. Let `storeKey` return a list of UUIDs where content was stored,
 and make proxies accept uploads directed at them, rather than a specific
 instantiated remote, and fan out the upload to whatever nodes behind
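The plan item about letting `storeKey` return a list of UUIDs can be illustrated with a small model. This is a Python sketch under stated assumptions, not git-annex's Haskell interface: `Node`, `store_key` and the key string are hypothetical, and since the plan's description of which nodes receive the upload is cut off above, the sketch simply tries every node it is given.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical repository behind a proxy, identified by its UUID."""
    uuid: str
    online: bool = True
    objects: Dict[str, bytes] = field(default_factory=dict)

    def store(self, key: str, content: bytes) -> bool:
        """Pretend to store one object; fails when the node is unreachable."""
        if not self.online:
            return False
        self.objects[key] = content
        return True


def store_key(key: str, content: bytes, nodes: List[Node]) -> List[str]:
    """Fan one upload out to every node and return the UUIDs that now have it.

    Returning a list instead of a single success flag lets the caller
    record presence only for the nodes that actually received the content.
    """
    stored = []
    for node in nodes:
        if node.store(key, content):
            stored.append(node.uuid)
    return stored


nodes = [Node("uuid-a"), Node("uuid-b", online=False), Node("uuid-c")]
print(store_key("example-key", b"hello", nodes))  # -> ['uuid-a', 'uuid-c']
```

An empty list would mean the upload reached none of the nodes, while a partial list lets the caller update location tracking for just the nodes that succeeded.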