tried a blind alley on streaming special remote download via proxy
This didn't work. In case I want to revisit, here's what I tried.

diff --git a/Annex/Proxy.hs b/Annex/Proxy.hs
index 48222872c1..e4e526d3dd 100644
--- a/Annex/Proxy.hs
+++ b/Annex/Proxy.hs
@@ -26,16 +26,21 @@ import Logs.UUID
 import Logs.Location
 import Utility.Tmp.Dir
 import Utility.Metered
+import Utility.ThreadScheduler
+import Utility.OpenFd
 import Git.Types
 import qualified Database.Export as Export
 
 import Control.Concurrent.STM
 import Control.Concurrent.Async
+import Control.Concurrent.MVar
 import qualified Data.ByteString as B
+import qualified Data.ByteString as BS
 import qualified Data.ByteString.Lazy as L
 import qualified System.FilePath.ByteString as P
 import qualified Data.Map as M
 import qualified Data.Set as S
+import System.IO.Unsafe
 
 proxyRemoteSide :: ProtocolVersion -> Bypass -> Remote -> Annex RemoteSide
 proxyRemoteSide clientmaxversion bypass r
@@ -240,21 +245,99 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go
         writeVerifyChunk iv h b
         storetofile iv h (n - fromIntegral (B.length b)) bs
 
-    proxyget offset af k = withproxytmpfile k $ \tmpfile -> do
+    proxyget offset af k = withproxytmpfile k $ \tmpfile ->
+        let retrieve = tryNonAsync $ Remote.retrieveKeyFile r k af
+            (fromRawFilePath tmpfile) nullMeterUpdate vc
+        in case fromKey keySize k of
+            Just size | size > 0 -> do
+                cancelv <- liftIO newEmptyMVar
+                donev <- liftIO newEmptyMVar
+                streamer <- liftIO $ async $
+                    streamdata offset tmpfile size cancelv donev
+                retrieve >>= \case
+                    Right _ -> liftIO $ do
+                        putMVar donev ()
+                        wait streamer
+                    Left err -> liftIO $ do
+                        putMVar cancelv ()
+                        wait streamer
+                        propagateerror err
+            _ -> retrieve >>= \case
+                Right _ -> liftIO $ senddata offset tmpfile
+                Left err -> liftIO $ propagateerror err
+      where
         -- Don't verify the content from the remote,
         -- because the client will do its own verification.
-        let vc = Remote.NoVerify
-        tryNonAsync (Remote.retrieveKeyFile r k af (fromRawFilePath tmpfile) nullMeterUpdate vc) >>= \case
-            Right _ -> liftIO $ senddata offset tmpfile
-            Left err -> liftIO $ propagateerror err
+        vc = Remote.NoVerify
+
+    streamdata (Offset offset) f size cancelv donev = do
+        sendlen offset size
+        waitforfile
+        x <- tryNonAsync $ do
+            fd <- openFdWithMode f ReadOnly Nothing defaultFileFlags
+            h <- fdToHandle fd
+            hSeek h AbsoluteSeek offset
+            senddata' h (getcontents size)
+        case x of
+            Left err -> do
+                throwM err
+            Right res -> return res
+      where
+        -- The file doesn't exist at the start.
+        -- Wait for some data to be written to it as well,
+        -- in case an empty file is first created and then
+        -- overwritten. When there is an offset, wait for
+        -- the file to get that large. Note that this is not used
+        -- when the size is 0.
+        waitforfile = tryNonAsync (fromIntegral <$> getFileSize f) >>= \case
+            Right sz | sz > 0 && sz >= offset -> return ()
+            _ -> ifM (isEmptyMVar cancelv)
+                ( do
+                    threadDelaySeconds (Seconds 1)
+                    waitforfile
+                , do
+                    return ()
+                )
+
+        getcontents n h = unsafeInterleaveIO $ do
+            isdone <- isEmptyMVar donev <||> isEmptyMVar cancelv
+            c <- BS.hGet h defaultChunkSize
+            let n' = n - fromIntegral (BS.length c)
+            let c' = L.fromChunks [BS.take (fromIntegral n) c]
+            if BS.null c
+                then if isdone
+                    then return mempty
+                    else do
+                        -- Wait for more data to be
+                        -- written to the file.
+                        threadDelaySeconds (Seconds 1)
+                        getcontents n h
+                else if n' > 0
+                    then do
+                        -- unsafeInterleaveIO causes
+                        -- this to be deferred until
+                        -- data is read from the lazy
+                        -- ByteString.
+                        cs <- getcontents n' h
+                        return $ L.append c' cs
+                    else return c'
+
     senddata (Offset offset) f = do
         size <- fromIntegral <$> getFileSize f
-        let n = max 0 (size - offset)
-        sendmessage $ DATA (Len n)
+        sendlen offset size
         withBinaryFile (fromRawFilePath f) ReadMode $ \h -> do
             hSeek h AbsoluteSeek offset
-            sendbs =<< L.hGetContents h
+            senddata' h L.hGetContents
+
+    senddata' h getcontents = do
+        sendbs =<< getcontents h
         -- Important to keep the handle open until
         -- the client responds. The bytestring
         -- could still be lazily streaming out to
@@ -272,6 +355,11 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go
             Just FAILURE -> return ()
             Just _ -> giveup "protocol error"
             Nothing -> return ()
+
+    sendlen offset size = do
+        let n = max 0 (size - offset)
+        sendmessage $ DATA (Len n)
+
 {- Check if this repository can proxy for a specified remote uuid,
  - and if so enable proxying for it. -}
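The core of the quoted patch is `getcontents`, which uses `unsafeInterleaveIO` to expose a still-growing temp file as a lazy ByteString. Here is a minimal standalone sketch of that technique, with an arbitrary 64 KiB chunk size and one-second poll interval standing in for git-annex's `defaultChunkSize` and `threadDelaySeconds`:

    import Control.Concurrent (threadDelay)
    import Control.Concurrent.MVar
    import qualified Data.ByteString as B
    import qualified Data.ByteString.Lazy as L
    import System.IO
    import System.IO.Unsafe (unsafeInterleaveIO)

    -- Lazily read a file that a writer thread is still appending to.
    -- Polls when no new data is available yet, and stops once donev
    -- has been filled and the file is drained.
    tailFile :: MVar () -> Handle -> IO L.ByteString
    tailFile donev h = unsafeInterleaveIO $ do
        c <- B.hGet h 65536
        if B.null c
            then do
                done <- not <$> isEmptyMVar donev
                if done
                    then return L.empty
                    else do
                        -- The writer has not caught up yet; poll again.
                        threadDelay 1000000
                        tailFile donev h
            else do
                -- This recursive read is deferred until the consumer
                -- demands more of the lazy ByteString.
                rest <- tailFile donev h
                return (L.fromStrict c <> rest)

The patch's version additionally caps the read at the key's declared size and uses a second MVar to bail out when the retrieval fails.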
3 changed files with 58 additions and 0 deletions
@@ -364,6 +364,10 @@ remote to the usual temp object file on the proxy, but without moving that
 to the annex object file at the end. As the temp object file grows, stream
 the content out via the proxy.
 
+> This needs the same process to read and write the same file, which is
+> disallowed in Haskell (without going lowlevel in a way that seems
+> difficult).
+
 Some special remotes will overwrite or truncate an existing temp object
 file when starting a download. So the proxy should wait until the file is
 growing to start streaming it.
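The restriction noted in the reply above is GHC's per-process single-writer/multiple-reader Handle locking: while a file is open for writing, a second open of the same file in the same process fails. A small sketch that reproduces it, assuming GHC's default locking behavior (the file name is arbitrary):

    import Control.Exception (IOException, try)
    import System.IO

    main :: IO ()
    main = withFile "tmpobject" WriteMode $ \_writer -> do
        -- With the write handle still open, this throws
        -- "resource busy (file is locked)" under GHC.
        r <- try (openFile "tmpobject" ReadMode)
        case r :: Either IOException Handle of
            Left err -> putStrLn ("read open failed: " ++ show err)
            Right h -> hClose h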
@@ -383,6 +387,20 @@ stream downloads from such special remotes. So there will be a perhaps long
 delay before the client sees their download start. Extend the P2P protocol
 with a way to send pre-download progress perhaps?
 
+> That seems pretty complicated. Alternatively, require that
+> retrieveKeyFile only writes to the file in-order. Even the bittorrent
+> special remote currently does, since it waits for the bittorrent download
+> to complete before moving the file to the destination. All other
+> special remotes built into git-annex are ok as well.
+>
+> Possibly some external special remote does not (eg maybe rclone in some
+> situation)?
+>
+> This could be handled with a special remote protocol extension that asks
+> the special remote to confirm if it retrieves in order. When a special
+> remote does not support that extension, Remote.External can just download
+> to a temp file and rename after download.
+
 A simple approach for proxying uploads is to buffer the upload to the temp
 object file, and once it's complete (and hash verified), send it on to the
 special remote(s). Then delete the temp object file. This has a problem that
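If such an extension existed, the proxy-side decision could be as small as the sketch below. The `ORDEREDRETRIEVE` keyword is invented here for illustration; the external special remote protocol defines the `EXTENSIONS` handshake, but no such keyword today:

    -- Hypothetical: classify a special remote by whether it promised,
    -- via an (invented) ORDEREDRETRIEVE extension keyword, to write
    -- the destination file strictly in order.
    data RetrieveBehavior
        = StreamsInOrder      -- safe to stream the temp file as it grows
        | MayWriteOutOfOrder  -- must buffer fully before sending on

    retrieveBehavior :: [String] -> RetrieveBehavior
    retrieveBehavior extensions
        | "ORDEREDRETRIEVE" `elem` extensions = StreamsInOrder
        | otherwise = MayWriteOutOfOrder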
@@ -412,6 +430,16 @@ another process could open the temp file and stream it out to its client.
 But how to detect when the whole content has been received? Could check key
 size, but what about unsized keys?
 
+## special remotes using P2P protocol
+
+Another way to handle proxying to special remotes would be to make some
+special remotes speak the P2P protocol. Then the proxy can just proxy P2P
+protocol to them the same as it does to git-annex remotes.
+
+The difficulty with this though is that encryption and chunking are
+implemented as transformations of special remotes, and would need to be
+re-implemented on top of the P2P protocol.
+
 ## chunking
 
 When the proxy is in front of a special remote that is chunked,
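The quoted patch answers the unsized-key question by only streaming when the key carries a known nonzero size, falling back to download-then-send otherwise. Pulled out as a sketch, using the same `fromKey keySize` accessor the patch calls (assumed importable from git-annex's `Types.Key`):

    import Types.Key (Key, fromKey, keySize)

    -- Just the declared size when streaming is possible; Nothing for
    -- unsized keys, where completion of the growing temp file cannot
    -- be detected.
    streamableSize :: Key -> Maybe Integer
    streamableSize k = case fromKey keySize k of
        Just size | size > 0 -> Just size
        _ -> Nothing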
@@ -28,6 +28,23 @@ Planned schedule of work:
 
 ## work notes
 
+* Currently working on streaming download via proxy from special remote.
+
+* Tried implementing a background thread in the proxy that runs while
+  retrieving a file, to stream it out as it comes in. That failed because
+  reading from a file that the same process is writing to is prevented by
+  locking in haskell. (Could be gotten around by using FD rather than Handle,
+  but would need to read from the FD and use packCString to make a ByteString.)
+
+  But also, remotes using fileRetriever retrieve to the temp object file,
+  before it is renamed to the requested file. In the case of a proxy,
+  that is a different file, and so it won't see the file until it's all
+  been transferred and renamed.
+
+* Could the P2P protocol be used as an alternate interface for a special
+  remote? Would avoid needing temp files when proxying for special remotes,
+  and would support resume from offset as well for special remotes for
+  which that makes sense.
+
 ## completed items for September's work on proving behavior of preferred content
 
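A sketch of the FD workaround mentioned in parentheses above (not attempted in the quoted patch): read from a raw file descriptor, which bypasses GHC's Handle-level lock, and pack the bytes into a strict ByteString:

    import qualified Data.ByteString as B
    import Foreign.Marshal.Alloc (allocaBytes)
    import Foreign.Ptr (castPtr)
    import System.Posix.IO (fdReadBuf)
    import System.Posix.Types (Fd)

    -- Read up to n bytes from an Fd opened with System.Posix.IO.openFd,
    -- avoiding GHC's per-process Handle locking.
    readChunk :: Fd -> Int -> IO B.ByteString
    readChunk fd n = allocaBytes n $ \buf -> do
        got <- fdReadBuf fd buf (fromIntegral n)
        B.packCStringLen (castPtr buf, fromIntegral got)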
@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 10"""
+ date="2024-10-07T17:57:05Z"
+ content="""
+Strictly speaking it's possible to do better than `git-annex copy --from --to` currently does.
+
+When git-annex is used as a proxy to a P2P remote, it streams the P2P
+protocol from client to remote, and so needs no temp files.
+
+So in a way, the P2P protocol is the real solution to this? Except special
+remotes don't use the P2P protocol.
+"""]]