Merge branch 'master' into proxy
commit 6568ba4904
3 changed files with 82 additions and 16 deletions
@@ -0,0 +1,33 @@
+[[!comment format=mdwn
+username="joey"
+subject="""comment 7"""
+date="2024-06-04T15:15:36Z"
+content="""
+Decoding the export.log, we have these events:
+
+Tue Aug 4 13:44:10 2020 (PST): An export is run on an openneuro worker
+sending to `s3-PRIVATE`, of b78b723042e6d7a967c806b52258e8554caa1696 which
+is now lost to history. After that export completed, there was a subsequent
+started but not completed export of
+ae2937297eb1b4f6c9bfdfcf9d7a41b1adcea32e, also lost to history.
+
+Fri Jan 19 21:04:26 2024: An export run on the same worker, sending to
+a `s3-PUBLIC` (not the current one, one that has been marked dead and
+forgotten), of ae2937297eb1b4f6c9bfdfcf9d7a41b1adcea32e. After that export
+completed, there was a subsequent started but not completed export of
+28b655e8207f916122bbcbd22c0369d86bb4ffc1.
+
+Later the same day, an export run on the same worker, sending to
+`s3-PUBLIC` (the current one), of 28b655e8207f916122bbcbd22c0369d86bb4ffc1.
+This export completed.
+
+Interesting that two exports were apparently started but left incomplete.
+This could have been because git-annex was interrupted, which would go a
+way toward confirming my analysis of this bug. But it is also possible that
+there was an error exporting one or more files.
+
+According to Nell, the git history of main was rewritten to remove a large
+file from git. The tree 28b655e8207f916122bbcbd22c0369d86bb4ffc1 appears
+to still contain the large binary file. No commit in main references it.
+It did get grafted into the git-annex branch which is why it was not lost.
+"""]]

@@ -189,24 +189,60 @@ The remote interface operates on object files stored on disk. See
 [[todo/transitive_transfers]] for discussion of that problem. If proxies
 get implemented, that problem should be revisited.

+## chunking
+
+When the proxy is in front of a special remote that is chunked,
+where does the chunking happen? It could happen on the client, or on the
+proxy.
+
+Git remotes don't ever do chunking currently, so chunking on the client
+would need changes there.
+
+Also, a given upload via a proxy may get sent to several special remotes,
+each with different chunk sizes, or perhaps some not chunked and some
+chunked. For uploads to be efficient, chunking needs to happen on the proxy.
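
The efficiency argument above can be illustrated with a minimal Python sketch. It is not git-annex code: the function names (`chunk`, `fan_out`) and the remote names are hypothetical, and each remote is assumed to be described only by an optional chunk size. The client sends the object to the proxy once, and the proxy re-chunks it separately for each remote behind it.

```python
from typing import Dict, Iterator, List, Optional


def chunk(data: bytes, chunk_size: Optional[int]) -> Iterator[bytes]:
    """Yield data in pieces of at most chunk_size bytes.

    A chunk_size of None models an unchunked remote: one piece, the whole object.
    """
    if chunk_size is None or not data:
        yield data
        return
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def fan_out(data: bytes, remotes: Dict[str, Optional[int]]) -> Dict[str, List[bytes]]:
    """Hypothetical proxy-side fan-out: one received object, chunked per remote."""
    return {name: list(chunk(data, size)) for name, size in remotes.items()}


# One upload from the client, two differently configured remotes behind the proxy.
stored = fan_out(b"abcdefghij", {"small-chunks": 4, "unchunked": None})
```

If chunking happened on the client instead, the client would have to know every remote's chunk size and send the object once per distinct configuration.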
+
 ## encryption

 When the proxy is in front of a special remote that uses encryption, where
 does the encryption happen? It could either happen on the client before
 sending to the proxy, or the proxy could do the encryption since it
-communicates with the special remote. For security, doing the encryption on
-the client seems like the best choice by far.
+communicates with the special remote.

-But, git-annex's git remotes don't currently ever do encryption. And
-special remotes don't communicate via the P2P protocol with a git remote.
-So none of git-annex's existing remote implementations would be able to handle
-this case. Something will need to be changed in the remote
-implementation for this.
+If the client does not want the proxy to see unencrypted data,
+they would obviously prefer encryption happens locally.

-(Chunking has the same problem.)
+But, the proxy could be the only thing that has access to a security key
+that is used in encrypting a special remote that's located behind it.
+There's a security benefit there too.
+
+So there are kind of two different perspectives here that can have
+different opinions.
+
+Also if encryption for a special remote behind a proxy happened
+client-side, and the client relied on that, nothing would stop the proxy
+from replacing that encrypted special remote with an unencrypted remote.
+Then the client side encryption would not happen, the user would not
+notice, and the proxy could see their unencrypted content.
+
+Of course, if a client really wanted to, they could make a special remote
+that uses the remote behind the proxy as a key/value backend.
+Then the client could encrypt locally.
+
+On the implementation side, git-annex's git remotes don't currently ever do
+encryption. And special remotes don't communicate via the P2P protocol with
+a git remote. So none of git-annex's existing remote implementations would
+be able to handle client-side encryption.

 There's potentially a layering problem here, because exactly how encryption
-(or chunking) works can vary depending on the type of special remote.
+works can vary depending on the type of special remote.
+
+Encrypted and chunked special remotes first chunk, then encrypt.
+So if chunking happens on the proxy, encryption *must* also happen there.
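
The ordering constraint can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: `toy_cipher` is a deliberately insecure stand-in (an XOR keystream derived from SHA-256), not the encryption git-annex actually uses, and `chunk_then_encrypt` is a hypothetical name. The point is only that each chunk is encrypted on its own, so whichever side decides the chunk boundaries must also be the side holding the key.

```python
import hashlib
from typing import Callable, List


def toy_cipher(key: bytes) -> Callable[[bytes], bytes]:
    """Return a symmetric toy transform (XOR with a SHA-256 counter keystream).

    Applying it twice restores the input. It is NOT secure; it only stands in
    for a real cipher so the example is self-contained.
    """
    def apply(data: bytes) -> bytes:
        stream = bytearray()
        counter = 0
        while len(stream) < len(data):
            stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return bytes(a ^ b for a, b in zip(data, stream))
    return apply


def chunk_then_encrypt(data: bytes, chunk_size: int,
                       encrypt: Callable[[bytes], bytes]) -> List[bytes]:
    """Split first, then encrypt every chunk independently."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [encrypt(c) for c in chunks]


cipher = toy_cipher(b"key held by whoever does the chunking")
stored = chunk_then_encrypt(b"hello proxy world", 5, cipher)

# Retrieval reverses the steps: decrypt each chunk, then reassemble.
restored = b"".join(cipher(c) for c in stored)
```

A client that encrypted the whole object before sending it would leave the proxy holding opaque ciphertext, which it could still split but could not turn into individually encrypted chunks in the chunk-then-encrypt layout.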
+
+So overall, it seems better to do proxy-side encryption. But it may be
+worth adding a special remote that does its own client-side encryption
+in front of the proxy.

 ## cycles

@@ -34,16 +34,13 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
 1. Add `git-annex updateproxy` command and remote.name.annex-proxy
 configuration. (done)

-2. Remote instantiation for proxies almost works, but fails at:
-"git-annex: cannot determine uuid for origin-foo"
-
-getRepoUUID does not look at the Repo's UUID setting, but reads it
-from git-config. It's not set there for a proxied remote.
-
-So: Add annex-uuid parsing to RemoteConfig.
+2. Remote instantiation for proxies. (done)

 3. Implement proxying in git-annex-shell.

+4. Either implement proxying for local path remotes, or prevent
+listProxied from operating on them.
+
 4. Let `storeKey` return a list of UUIDs where content was stored,
 and make proxies accept uploads directed at them, rather than a specific
 instantiated remote, and fan out the upload to whatever nodes behind