diff --git a/doc/design/passthrough_proxy.mdwn b/doc/design/passthrough_proxy.mdwn
index 01720beba7..578b79a427 100644
--- a/doc/design/passthrough_proxy.mdwn
+++ b/doc/design/passthrough_proxy.mdwn
@@ -364,6 +364,10 @@ remote to the usual temp object file on the proxy, but without moving that
 to the annex object file at the end. As the temp object file grows, stream
 the content out via the proxy.
 
+> This needs the same process to read and write the same file, which is
+> disallowed in Haskell (without going low-level in a way that seems
+> difficult).
+
 Some special remotes will overwrite or truncate an existing temp object
 file when starting a download. So the proxy should wait until the file is
 growing to start streaming it.
@@ -383,6 +387,20 @@ stream downloads from such special remotes. So there will be a perhaps long
 delay before the client sees their download start. Extend the P2P protocol
 with a way to send pre-download progress perhaps?
 
+> That seems pretty complicated. Alternatively, require that
+> retrieveKeyFile only writes to the file in-order. Even the bittorrent
+> special remote currently does, since it waits for the bittorrent download
+> to complete before moving the file to the destination. All other
+> special remotes built into git-annex are ok as well.
+>
+> Possibly some external special remote does not (eg maybe rclone in some
+> situation)?
+>
+> This could be handled with a special remote protocol extension that asks
+> the special remote to confirm if it retrieves in order. When a special
+> remote does not support that extension, Remote.External can just download
+> to a temp file and rename after download.
+
 A simple approach for proxying uploads is to buffer the upload to the temp
 object file, and once it's complete (and hash verified), send it on to the
 special remote(s). Then delete the temp object file. This has a problem that
@@ -412,6 +430,16 @@ another process could open the temp file and stream it out to its client.
 But how to detect when the whole content has been received? Could check key
 size, but what about unsized keys?
 
+## special remotes using P2P protocol
+
+Another way to handle proxying to special remotes would be to make some
+special remotes speak the P2P protocol. Then the proxy can just proxy P2P
+protocol to them the same as it does to git-annex remotes.
+
+The difficulty with this though is that encryption and chunking are
+implemented as transformations of special remotes, and would need to be
+re-implemented on top of the P2P protocol.
+
 ## chunking
 
 When the proxy is in front of a special remote that is chunked,
diff --git a/doc/todo/git-annex_proxies.mdwn b/doc/todo/git-annex_proxies.mdwn
index cb13f9b5f1..0b78d852c8 100644
--- a/doc/todo/git-annex_proxies.mdwn
+++ b/doc/todo/git-annex_proxies.mdwn
@@ -28,6 +28,23 @@ Planned schedule of work:
 
 ## work notes
 
+* Currently working on streaming download via proxy from special remote.
+
+* Tried implementing a background thread in the proxy that runs while
+  retrieving a file, to stream it out as it comes in. That failed because
+  reading from a file that the same process is writing to is prevented by
+  locking in Haskell. (Could be gotten around by using FD rather than Handle,
+  but would need to read from the FD and use packCString to make a ByteString.)
+
+  But also, remotes using fileRetriever retrieve to the temp object file,
+  before it is renamed to the requested file. In the case of a proxy,
+  that is a different file, and so it won't see the file until it's all
+  been transferred and renamed.
+
+* Could the P2P protocol be used as an alternate interface for a special
+  remote? Would avoid needing temp files when proxying for special remotes,
+  and would support resume from offset as well for special remotes for
+  which that makes sense.
 
 ## completed items for September's work on proving behavior of preferred content
 
diff --git a/doc/todo/transitive_transfers/comment_10_0cbd6ef5c09734a181d165c0d5e2f123._comment b/doc/todo/transitive_transfers/comment_10_0cbd6ef5c09734a181d165c0d5e2f123._comment
new file mode 100644
index 0000000000..38a3decad3
--- /dev/null
+++ b/doc/todo/transitive_transfers/comment_10_0cbd6ef5c09734a181d165c0d5e2f123._comment
@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 10"""
+ date="2024-10-07T17:57:05Z"
+ content="""
+Strictly speaking it's possible to do better than `git-annex copy --from --to` currently does.
+
+When git-annex is used as a proxy to a P2P remote, it streams the P2P
+protocol from client to remote, and so needs no temp files.
+
+So in a way, the P2P protocol is the real solution to this? Except special
+remotes don't use the P2P protocol.
+"""]]
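
Review note: the design text in this patch describes streaming a temp object file out while it grows, waiting until the file exists and is growing, and using the key size to detect completion. The following is a minimal, language-neutral sketch of that idea in Python (not git-annex's Haskell, and not its actual implementation). The function name `stream_growing_file` and all of its parameters are hypothetical. It assumes the writer only appends in order, that the file is not truncated or replaced once streaming has begun, and that the key is sized; it does not address unsized keys or the Haskell Handle locking problem mentioned in the work notes.

```python
import os
import time
from typing import Iterator


def stream_growing_file(
    path: str,
    expected_size: int,
    chunk_size: int = 65536,
    poll_interval: float = 0.01,
    stall_timeout: float = 5.0,
) -> Iterator[bytes]:
    """Yield the content of `path` in order while another writer appends to it.

    Hypothetical helper: assumes in-order, append-only writes and a known
    final size (a sized key). Raises TimeoutError if no new data appears
    within `stall_timeout` seconds.
    """
    sent = 0
    last_progress = time.monotonic()

    # The temp file may not exist yet when streaming is requested.
    while True:
        try:
            # buffering=0 gives a raw file object, so every read asks the OS
            # for the current content instead of relying on a buffer.
            f = open(path, "rb", buffering=0)
            break
        except FileNotFoundError:
            if time.monotonic() - last_progress > stall_timeout:
                raise TimeoutError("temp file never appeared") from None
            time.sleep(poll_interval)

    with f:
        while sent < expected_size:
            data = f.read(min(chunk_size, expected_size - sent))
            if data:
                sent += len(data)
                last_progress = time.monotonic()
                yield data
            elif time.monotonic() - last_progress > stall_timeout:
                raise TimeoutError(
                    f"no new data after {sent} of {expected_size} bytes"
                )
            else:
                # Reached the current end of file; wait for the writer.
                time.sleep(poll_interval)
```

A proxy loop would forward each yielded chunk to the client as it arrives. Stopping at `expected_size` is what stands in for "the whole content has been received"; for unsized keys some other completion signal would be needed, which is the open question the design text raises.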