Commit graph

45794 commits

Author SHA1 Message Date
Joey Hess
f920d90781
smaller delay in proxy streamer
A one second delay made it seem really choppy and slow when the special
remote was sending content fairly steadily but was bottlenecked on
running gpg on 10 MB chunks.

This does not appreciably increase CPU, although of course if the
special remote is very slow it will add up over time.

It would perhaps be better to use inotify, like tailVerify does.
2024-10-15 14:45:19 -04:00
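
The trade-off in the commit above (a shorter poll delay gives smoother streaming at the cost of more wakeups) can be illustrated with a small polling reader. This is only a sketch, not git-annex's code: `streamGrowing`, its parameters, and the 32 KiB read size are hypothetical names and values chosen for illustration, and it assumes the caller already holds a read handle on the file being written. The inotify approach mentioned in the message is not shown.

```haskell
import Control.Concurrent (threadDelay)
import qualified Data.ByteString as B
import System.IO

-- | Hypothetical helper (not from git-annex): forward up to 'remaining'
-- bytes from a handle whose underlying file may still be growing.
-- Returns the number of expected bytes that were never delivered.
streamGrowing
  :: Int                      -- ^ poll delay in microseconds
  -> IO Bool                  -- ^ has the writer finished?
  -> Handle                   -- ^ handle opened for reading
  -> Integer                  -- ^ bytes still expected
  -> (B.ByteString -> IO ())  -- ^ consumer for each chunk
  -> IO Integer
streamGrowing delay writerDone h = go
  where
    go remaining sink
      | remaining <= 0 = return 0
      | otherwise = do
          -- Sample the flag before reading, so that data written just
          -- before the writer finished is not missed.
          finished <- writerDone
          c <- B.hGetSome h (fromIntegral (min remaining 32768))
          if B.null c
            then if finished
              -- Writer is done and the file has no more data.
              then return remaining
              -- Nothing new yet: sleep, then poll again.
              else threadDelay delay >> go remaining sink
            else do
              sink c
              go (remaining - fromIntegral (B.length c)) sink
```

With a delay of 1000000 (one second) the consumer sees data in bursts whenever the writer is briefly slower than the reader; a smaller value shortens those stalls, and the extra polling only matters while the file is not growing.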
Joey Hess
835283b862
stream through proxy when using fileRetriever
The problem was that when the proxy requests a key be retrieved to its
own temp file, fileRetriever was retrieving it to the key's temp
location, and then moving it at the end, which broke streaming.

So, plumb through the path where the key is being retrieved to.
2024-10-15 14:29:06 -04:00
Joey Hess
54fcc2ec51
fix logic error 2024-10-15 14:28:47 -04:00
yarikoptic
6eb32468bc initial report on not all files being saved 2024-10-15 18:15:16 +00:00
Joey Hess
9e8bbb3aac
remove change that was accidentally committed 2024-10-15 13:30:52 -04:00
Joey Hess
c6c794a27d
comment 2024-10-15 13:27:27 -04:00
Joey Hess
c1b0348307
Merge branch 'master' of ssh://git-annex.branchable.com 2024-10-15 12:27:01 -04:00
Joey Hess
930c078965
working in streamproxy branch 2024-10-15 12:26:53 -04:00
Joey Hess
edaed18e4c
Sped up proxied downloads from special remotes, by streaming
Currently works for special remotes that don't use fileRetriever. Ones that
do will download to another filename and rename it into place, defeating
the streaming.

This actually benchmarks slightly slower when getting a large file from
a fast proxied special remote. However, when the proxied special remote
is slow, it will be a big win.
2024-10-15 12:25:15 -04:00
Joey Hess
76a1989a0e
implement openFileBeingWritten
This bypasses the usual haskell file locking used to prevent opening a
file for read that is being written to.

This is unfortunately a bit of a hack. But it seems fairly unlikely to
get broken by changes to ghc. I hope. Using fdToHandle' will also work.

This does not work on Windows because it uses openFd from posix. It
would probably be possible to implement it for Windows too, by opening
the FD using the Win32 library instead. However, I don't know whether
Windows will allow reading from a file that is also being written to.
And since in the git-annex case the writer could be another process (eg
an external special remote) that might be doing its own locking on
Windows, that seems a can of worms I'd prefer not to open.
2024-10-15 11:56:42 -04:00
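
The "usual haskell file locking" that the commit above bypasses can be observed with base alone. The sketch below does not reproduce openFileBeingWritten; it only demonstrates the behaviour being worked around, on the assumption that the program is compiled with GHC, whose runtime refuses to open a file for reading while the same process holds a write handle on it. The name `readBlockedWhileWriting` is hypothetical.

```haskell
import Control.Exception (try)
import System.IO
import System.IO.Error (isAlreadyInUseError)

-- | Hypothetical demonstration: returns True when opening the file for
-- reading is refused because this process has it open for writing.
readBlockedWhileWriting :: FilePath -> IO Bool
readBlockedWhileWriting f =
  withFile f WriteMode $ \w -> do
    hPutStrLn w "partial content"
    hFlush w
    -- A second open in ReadMode while 'w' is still open trips the
    -- runtime's multiple-readers-or-single-writer check.
    r <- try (openFile f ReadMode) :: IO (Either IOError Handle)
    case r of
      Left e -> return (isAlreadyInUseError e)
      Right h -> hClose h >> return False
```

This check is tracked inside the running process, so it does not by itself stop a separate process (such as an external special remote) from writing the file.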
Joey Hess
57ac43e4f1
update 2024-10-15 10:31:42 -04:00
matrss
c7155366c7 Added a comment 2024-10-14 12:21:54 +00:00
matrss
872d97eb2a Added a comment 2024-10-14 12:01:40 +00:00
Joey Hess
9574e3a8bb
Merge branch 'master' of ssh://git-annex.branchable.com 2024-10-12 10:57:52 -04:00
Spencer
10b6539174 Added a comment: [FR] Remote Settings for All Clones 2024-10-09 23:10:17 +00:00
annex@9cc004f218c318a28099ff2645959be0fcbc6d94
4404cc4c8b Added a comment: Support for importtree 2024-10-09 06:42:40 +00:00
matrss
4e5dcf4207 2024-10-08 07:22:39 +00:00
Spencer
cbc88a878f 2024-10-08 03:58:55 +00:00
Spencer
0f7ba08e95 2024-10-08 03:31:05 +00:00
Spencer
d87d725b1c Correction: rclonelayout=lower is not synonymous with the directory remote, directory is. 2024-10-07 21:26:42 +00:00
Spencer
ae09255c05 Added a comment: How to Clone? 2024-10-07 20:00:24 +00:00
Joey Hess
8baa43ee12
tried a blind alley on streaming special remote download via proxy
This didn't work. In case I want to revisit, here's what I tried.

diff --git a/Annex/Proxy.hs b/Annex/Proxy.hs
index 48222872c1..e4e526d3dd 100644
--- a/Annex/Proxy.hs
+++ b/Annex/Proxy.hs
@@ -26,16 +26,21 @@ import Logs.UUID
 import Logs.Location
 import Utility.Tmp.Dir
 import Utility.Metered
+import Utility.ThreadScheduler
+import Utility.OpenFd
 import Git.Types
 import qualified Database.Export as Export

 import Control.Concurrent.STM
 import Control.Concurrent.Async
+import Control.Concurrent.MVar
 import qualified Data.ByteString as B
+import qualified Data.ByteString as BS
 import qualified Data.ByteString.Lazy as L
 import qualified System.FilePath.ByteString as P
 import qualified Data.Map as M
 import qualified Data.Set as S
+import System.IO.Unsafe

 proxyRemoteSide :: ProtocolVersion -> Bypass -> Remote -> Annex RemoteSide
 proxyRemoteSide clientmaxversion bypass r
@@ -240,21 +245,99 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go
 		writeVerifyChunk iv h b
 		storetofile iv h (n - fromIntegral (B.length b)) bs

-	proxyget offset af k = withproxytmpfile k $ \tmpfile -> do
+	proxyget offset af k = withproxytmpfile k $ \tmpfile ->
+		let retrieve = tryNonAsync $ Remote.retrieveKeyFile r k af
+			(fromRawFilePath tmpfile) nullMeterUpdate vc
+		in case fromKey keySize k of
+			Just size | size > 0 -> do
+				cancelv <- liftIO newEmptyMVar
+				donev <- liftIO newEmptyMVar
+				streamer <- liftIO $ async $
+					streamdata offset tmpfile size cancelv donev
+				retrieve >>= \case
+					Right _ -> liftIO $ do
+						putMVar donev ()
+						wait streamer
+					Left err -> liftIO $ do
+						putMVar cancelv ()
+						wait streamer
+						propagateerror err
+			_ -> retrieve >>= \case
+				Right _ -> liftIO $ senddata offset tmpfile
+				Left err -> liftIO $ propagateerror err
+	  where
 		-- Don't verify the content from the remote,
 		-- because the client will do its own verification.
-		let vc = Remote.NoVerify
-		tryNonAsync (Remote.retrieveKeyFile r k af (fromRawFilePath tmpfile) nullMeterUpdate vc) >>= \case
-			Right _ -> liftIO $ senddata offset tmpfile
-			Left err -> liftIO $ propagateerror err
+		vc = Remote.NoVerify

+	streamdata (Offset offset) f size cancelv donev = do
+		sendlen offset size
+		waitforfile
+		x <- tryNonAsync $ do
+			fd <- openFdWithMode f ReadOnly Nothing defaultFileFlags
+			h <- fdToHandle fd
+			hSeek h AbsoluteSeek offset
+			senddata' h (getcontents size)
+		case x of
+			Left err -> do
+				throwM err
+			Right res -> return res
+	  where
+		-- The file doesn't exist at the start.
+		-- Wait for some data to be written to it as well,
+		-- in case an empty file is first created and then
+		-- overwritten. When there is an offset, wait for
+		-- the file to get that large. Note that this is not used
+		-- when the size is 0.
+		waitforfile = tryNonAsync (fromIntegral <$> getFileSize f) >>= \case
+			Right sz | sz > 0 && sz >= offset -> return ()
+			_ -> ifM (isEmptyMVar cancelv)
+				( do
+					threadDelaySeconds (Seconds 1)
+					waitforfile
+				, do
+					return ()
+				)
+
+		getcontents n h = unsafeInterleaveIO $ do
+			isdone <- isEmptyMVar donev <||> isEmptyMVar cancelv
+			c <- BS.hGet h defaultChunkSize
+			let n' = n - fromIntegral (BS.length c)
+			let c' = L.fromChunks [BS.take (fromIntegral n) c]
+			if BS.null c
+				then if isdone
+					then return mempty
+					else do
+						-- Wait for more data to be
+						-- written to the file.
+						threadDelaySeconds (Seconds 1)
+						getcontents n h
+				else if n' > 0
+					then do
+						-- unsafeInterleaveIO causes
+						-- this to be deferred until
+						-- data is read from the lazy
+						-- ByteString.
+						cs <- getcontents n' h
+						return $ L.append c' cs
+					else return c'
+
 	senddata (Offset offset) f = do
 		size <- fromIntegral <$> getFileSize f
-		let n = max 0 (size - offset)
-		sendmessage $ DATA (Len n)
+		sendlen offset size
 		withBinaryFile (fromRawFilePath f) ReadMode $ \h -> do
 			hSeek h AbsoluteSeek offset
-			sendbs =<< L.hGetContents h
+			senddata' h L.hGetContents
+
+	senddata' h getcontents = do
+			sendbs =<< getcontents h
 			-- Important to keep the handle open until
 			-- the client responds. The bytestring
 			-- could still be lazily streaming out to
@@ -272,6 +355,11 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go
 				Just FAILURE -> return ()
 				Just _ -> giveup "protocol error"
 				Nothing -> return ()
+
+	sendlen offset size = do
+		let n = max 0 (size - offset)
+		sendmessage $ DATA (Len n)
+

 {- Check if this repository can proxy for a specified remote uuid,
  - and if so enable proxying for it. -}
2024-10-07 15:12:09 -04:00
Spencer
cb196337f4 additional question of spaces in URL 2024-10-07 19:10:19 +00:00
Spencer
abd56608cf 2024-10-07 19:02:17 +00:00
matrss
f650627b23 2024-10-07 14:40:19 +00:00
matrss
b0a6301cde Added a comment 2024-10-07 14:12:23 +00:00
Joey Hess
b501d23f9b
update 2024-10-07 10:06:12 -04:00
matrss
6b6ec39997 2024-10-07 13:59:56 +00:00
sng@353ca358075d9aa328f60a5439a3cee10f8301fe
b57677251b Added a comment 2024-10-06 21:42:13 +00:00
matrss
19f7b0e7d4 2024-10-02 15:07:54 +00:00
matrss
470bd1f441 2024-10-02 14:51:58 +00:00
matrss
4a794ce0ba 2024-10-02 14:42:37 +00:00
yarikoptic
13580427c8 filing an issue on yt-dlp not used for some reason 2024-10-01 21:01:40 +00:00
Joey Hess
f3403e9691
add news item for git-annex 10.20240927 2024-09-30 19:16:06 -04:00
Joey Hess
fca26db22b
releasing package git-annex version 10.20240927 2024-09-30 19:15:57 -04:00
Joey Hess
3d7f94ea39
Merge branch 'master' of ssh://git-annex.branchable.com 2024-09-30 17:36:45 -04:00
Joey Hess
743690d022
fix build with old random
getStdGen used to be an IO not a MonadIO action
2024-09-30 17:36:19 -04:00
brendan.ward@a2e11ad27f6b2fa2c556aea6811496e0d95dd0da
191e84d82a 2024-09-30 20:54:14 +00:00
Joey Hess
d2ad07f5a3
fix build with random-1.2
getStdGen worked with that version but initStdGen is newer. For our
purposes, they are equivalent.
2024-09-30 14:56:06 -04:00
Joey Hess
75b3f0eb75
fix build with old base
i386ancient has a base too old for NE.singleton
2024-09-30 11:02:08 -04:00
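
For context on the commit above: `Data.List.NonEmpty.singleton` was only added in base 4.15, so code that must build against an older base can use the `:|` constructor directly. The sketch below is one way to do that; `singletonCompat` is a hypothetical name, and the source does not say how the actual fix was written.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- | Hypothetical stand-in for NE.singleton that also builds with
-- versions of base older than 4.15.
singletonCompat :: a -> NonEmpty a
singletonCompat x = x :| []
```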
Joey Hess
1d8bf92724
Merge branch 'master' of ssh://git-annex.branchable.com 2024-09-27 15:31:31 -04:00
Joey Hess
5225812659
Revert "remove stack-lts-18.13.yaml"
This reverts commit b0546e8bde.

https://github.com/datalad/git-annex/issues/204 is still not fixed yet
2024-09-27 15:30:51 -04:00
Joey Hess
e8e4347fcc
update version for release 2024-09-27 10:01:44 -04:00
mike@2d6d71f56ce2a992244350475251df87c26fe351
7b5dda33e0 removed 2024-09-27 12:18:59 +00:00
mike@2d6d71f56ce2a992244350475251df87c26fe351
39e02528f0 Added a comment: corruption using git-annex-remote-rclone 2024-09-27 12:18:41 +00:00
mike@2d6d71f56ce2a992244350475251df87c26fe351
82538a9cd3 Added a comment: corruption using git-annex-remote-rclone 2024-09-27 07:39:06 +00:00
Joey Hess
b0546e8bde
remove stack-lts-18.13.yaml
windows autobuilder should have been fixed by now

(offline so didn't check)
2024-09-26 18:46:12 -04:00
Joey Hess
4ca3d1d584
remove read of the heads
and one tail

Removed head from Utility.PartialPrelude in order to avoid the build
warning with recent ghc versions as well.
2024-09-26 18:43:59 -04:00
Joey Hess
10216b44d2
use NonEmpty for dirHashes
This avoids 4 uses of head.
2024-09-26 18:15:00 -04:00
Joey Hess
43f31121a5
Git: use NonEmpty in fullconfig
This is a nice win. Avoids partial functions, by encoding at the type
level the fact that fullconfig is never an empty list.
2024-09-26 17:54:36 -04:00
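
The idea behind the two NonEmpty commits above can be sketched in a few lines. This is an illustrative example only: `preferred` and `preferredFrom` are hypothetical names, not functions from git-annex. It shows how a NonEmpty argument makes taking the first element total, and moves the empty-list case to the point where an ordinary list enters.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- | Hypothetical example: the type guarantees at least one element,
-- so NE.head cannot fail the way Prelude's head can.
preferred :: NonEmpty FilePath -> FilePath
preferred = NE.head

-- | Callers holding an ordinary list deal with the empty case once,
-- at the boundary, instead of at every use of head.
preferredFrom :: [FilePath] -> Maybe FilePath
preferredFrom = fmap preferred . NE.nonEmpty
```

For example, `preferredFrom []` yields `Nothing`, while `preferred ("a" :| ["b"])` yields `"a"` with no partial function involved.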