Commit graph

4869 commits

matrss
84c86ad294 Added a comment 2024-12-15 18:13:00 +00:00
matrss
9f17cec7ba Added a comment 2024-12-13 22:02:15 +00:00
Joey Hess
cac47364a6
comment 2024-12-12 14:24:57 -04:00
matrss
71a4b51de6 2024-12-08 17:28:48 +00:00
matrss
01386a982b Added a comment 2024-12-08 17:27:59 +00:00
matrss
b6fc7ded82 2024-12-08 17:20:50 +00:00
Joey Hess
9c1ab28112
close 2024-12-04 13:44:48 -04:00
yarikoptic
78bff9bcab Added a comment 2024-12-03 20:26:10 +00:00
Joey Hess
dd052dcba1
annexInsteadOf config
Added config `url.<base>.annexInsteadOf` corresponding to git's
`url.<base>.pushInsteadOf`, to configure the urls to use for accessing the
git-annex repositories on a server without needing to configure
remote.name.annexUrl in each repository.

While one use case for this would be rewriting urls to use annex+http,
I decided not to add any kind of special case for that. So while
git-annex p2phttp, when serving multiple repositories, needs an url
of eg "annex+http://example.com/git-annex/" for each of them, rewriting an
url like "https://example.com/git/foo/bar" with this config set to
"https://example.com/git/" will result in eg
"annex+http://example.com/git-annex/foo/bar", which p2phttp does not
support.

That seems better dealt with in either git-annex p2phttp or a http
middleware, rather than complicating the config with a special case for
annex+http.

Anyway, there are other use cases for this that don't involve annex+http.
2024-12-03 14:39:07 -04:00
Joey Hess
0404968d10
comments 2024-12-03 13:00:14 -04:00
Joey Hess
aa2d543930
comment 2024-11-25 12:32:09 -04:00
Joey Hess
f5e1a7f4e4
comment 2024-11-21 15:21:08 -04:00
Joey Hess
41672b01bb
comment 2024-11-20 13:41:34 -04:00
Joey Hess
b8a717a617
reuse http url password for p2phttp url when on same host
When remote.name.annexUrl is an annex+http(s) url that uses the same
hostname as remote.name.url, which is itself a http(s) url, they are
assumed to share a username and password.

This avoids unnecessary duplicate password prompts.
2024-11-19 15:27:26 -04:00
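The condition described in the commit above can be sketched as a predicate. This is a rough illustration only; the real check is in git-annex's Haskell code and may differ in detail.

```python
from urllib.parse import urlparse

def may_share_credentials(annex_url: str, remote_url: str) -> bool:
    """Sketch: remote.name.annexUrl and remote.name.url are assumed to
    share a username and password when the annexUrl is annex+http(s),
    the remote url is http(s), and both point at the same host."""
    a, r = urlparse(annex_url), urlparse(remote_url)
    return (a.scheme in ("annex+http", "annex+https")
            and r.scheme in ("http", "https")
            and a.hostname is not None
            and a.hostname == r.hostname)

print(may_share_credentials("annex+https://example.com/git-annex/",
                            "https://example.com/git/foo"))   # → True
print(may_share_credentials("annex+https://other.example/git-annex/",
                            "https://example.com/git/foo"))   # → False
```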
Joey Hess
3510072883
update 2024-11-19 14:42:50 -04:00
Joey Hess
aaba82f3c8
comments 2024-11-19 14:26:47 -04:00
Joey Hess
6489342b71
tag INM7 2024-11-19 14:12:11 -04:00
matrss
6b3920b168 Added a comment 2024-11-15 08:54:07 +00:00
yarikoptic
dcfca3f49d Added a comment 2024-11-03 14:48:03 +00:00
Joey Hess
3c973aba57
oops, add the new todos meant to be in prev commit 2024-10-30 14:50:24 -04:00
Joey Hess
87871f724e
split up remaining items from todo/git-annex_proxies and close it! 2024-10-30 14:49:54 -04:00
Joey Hess
126daf949d
DATA-PRESENT working for exporttree=yes remotes
Since the annex-tracking-branch is pushed first, git-annex has already
updated the export database when the DATA-PRESENT arrives. Which means
that just using checkPresent is enough to verify that there is some file
on the special remote in the export location for the key.

So, the simplest possible implementation of this happened to work!

(I also tested it with chunked specialremotes, which also works, as long
as the chunk size used is the same as the configured chunk size. In that
case, the lack of a chunk log is not a problem. It's doubtful this will
ever make sense to use with a chunked special remote though; that gets
pretty deep into re-implementing git-annex.)

Updated the client side upload tip with a missing step, and reorged for clarity.
2024-10-30 13:55:47 -04:00
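The verification step described in the commit above can be sketched like so. The function and database interfaces here are hypothetical stand-ins for git-annex's internals, chosen only to illustrate why checkPresent alone suffices.

```python
def handle_data_present(key, export_db, check_present) -> bool:
    """Sketch of DATA-PRESENT handling for an exporttree=yes remote:
    because the annex-tracking-branch is pushed first, the export
    database is already up to date when DATA-PRESENT arrives, so it is
    enough to ask the special remote whether some file exists at an
    export location for the key. `export_db` maps keys to export
    locations and `check_present` queries the remote; both are
    hypothetical interfaces."""
    locations = export_db.get(key, [])
    return any(check_present(loc) for loc in locations)

# Toy usage, with a dict and a set standing in for the export database
# and the special remote's contents:
db = {"SHA256E-s5--abc": ["foo/bar.txt"]}
remote_files = {"foo/bar.txt"}
print(handle_data_present("SHA256E-s5--abc", db,
                          lambda loc: loc in remote_files))   # → True
```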
Joey Hess
fda151a4e2
Merge branch 'master' into p2pv4 2024-10-30 08:13:49 -04:00
Joey Hess
4d03ada12f
break out todo item 2024-10-30 08:13:33 -04:00
Joey Hess
2ca6ecad58
add tip for DATA-PRESENT feature 2024-10-29 16:15:01 -04:00
Joey Hess
95d1d29724
update 2024-10-28 13:46:57 -04:00
Joey Hess
7dde035ac8
planning 2024-10-22 11:09:47 -04:00
Joey Hess
8baccda98f
Merge branch 'master' into streamproxy 2024-10-22 09:49:28 -04:00
matrss
49a57bf25f Added a comment 2024-10-21 16:32:18 +00:00
Joey Hess
378e878d1e
comment 2024-10-21 11:39:27 -04:00
Joey Hess
028b4a9203
comment 2024-10-21 11:32:02 -04:00
Joey Hess
8c7047fc77
Merge branch 'master' into streamproxy 2024-10-18 10:18:59 -04:00
mih
b642800d8c Tag with project ID 2024-10-17 20:26:08 +00:00
matrss
8f96f7b16b 2024-10-17 09:39:56 +00:00
yarikoptic
460cdb5623 Added a comment 2024-10-16 18:58:04 +00:00
yarikoptic
bcc243f5b1 make freeze/thaw relative paths 2024-10-16 18:51:45 +00:00
Joey Hess
c4dfeaef53
streaming uploads 2024-10-15 16:02:19 -04:00
Joey Hess
d9b4bf4224
added retrieveKeyFileInOrder and ORDERED to external special remote protocol
I anticipate lots of external special remote programs will neglect
implementing this. Still, it's the right thing to do to assume that some
of them may write files out of order. Probably most external special
remotes will not be used with a proxy. When someone is using one with a
proxy, they can always get it fixed to send ORDERED.
2024-10-15 15:40:14 -04:00
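The capability check described in the commit above can be sketched as a simple decision on how the proxy retrieves content. The names below are hypothetical; only the ORDERED reply itself comes from the commit message.

```python
def retrieval_mode(remote_replies: set[str]) -> str:
    """Sketch: a proxy can stream a download while the external special
    remote is still writing the file only when the remote has declared
    ORDERED, i.e. it writes the file strictly from start to finish.
    Remotes that neglect to send ORDERED fall back to waiting for the
    whole retrieval to finish before sending."""
    return "stream" if "ORDERED" in remote_replies else "whole-file"

print(retrieval_mode({"ORDERED"}))   # → stream
print(retrieval_mode(set()))         # → whole-file
```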
Joey Hess
835283b862
stream through proxy when using fileRetriever
The problem was that when the proxy requests a key be retrieved to its
own temp file, fileRetriever was retrieving it to the key's temp
location, and then moving it at the end, which broke streaming.

So, plumb through the path where the key is being retrieved to.
2024-10-15 14:29:06 -04:00
Joey Hess
c1b0348307
Merge branch 'master' of ssh://git-annex.branchable.com 2024-10-15 12:27:01 -04:00
Joey Hess
930c078965
working in streamproxy branch 2024-10-15 12:26:53 -04:00
Joey Hess
edaed18e4c
Sped up proxied downloads from special remotes, by streaming
Currently works for special remotes that don't use fileRetriever. Ones that
do will download to another filename and rename it into place, defeating
the streaming.

This actually benchmarks slightly slower when getting a large file from
a fast proxied special remote. However, when the proxied special remote
is slow, it will be a big win.
2024-10-15 12:25:15 -04:00
Joey Hess
57ac43e4f1
update 2024-10-15 10:31:42 -04:00
matrss
c7155366c7 Added a comment 2024-10-14 12:21:54 +00:00
Joey Hess
9574e3a8bb
Merge branch 'master' of ssh://git-annex.branchable.com 2024-10-12 10:57:52 -04:00
Joey Hess
8baa43ee12
tried a blind alley on streaming special remote download via proxy
This didn't work. In case I want to revisit, here's what I tried.

diff --git a/Annex/Proxy.hs b/Annex/Proxy.hs
index 48222872c1..e4e526d3dd 100644
--- a/Annex/Proxy.hs
+++ b/Annex/Proxy.hs
@@ -26,16 +26,21 @@ import Logs.UUID
 import Logs.Location
 import Utility.Tmp.Dir
 import Utility.Metered
+import Utility.ThreadScheduler
+import Utility.OpenFd
 import Git.Types
 import qualified Database.Export as Export

 import Control.Concurrent.STM
 import Control.Concurrent.Async
+import Control.Concurrent.MVar
 import qualified Data.ByteString as B
+import qualified Data.ByteString as BS
 import qualified Data.ByteString.Lazy as L
 import qualified System.FilePath.ByteString as P
 import qualified Data.Map as M
 import qualified Data.Set as S
+import System.IO.Unsafe

 proxyRemoteSide :: ProtocolVersion -> Bypass -> Remote -> Annex RemoteSide
 proxyRemoteSide clientmaxversion bypass r
@@ -240,21 +245,99 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go
 		writeVerifyChunk iv h b
 		storetofile iv h (n - fromIntegral (B.length b)) bs

-	proxyget offset af k = withproxytmpfile k $ \tmpfile -> do
+	proxyget offset af k = withproxytmpfile k $ \tmpfile ->
+		let retrieve = tryNonAsync $ Remote.retrieveKeyFile r k af
+			(fromRawFilePath tmpfile) nullMeterUpdate vc
+		in case fromKey keySize k of
+			Just size | size > 0 -> do
+				cancelv <- liftIO newEmptyMVar
+				donev <- liftIO newEmptyMVar
+				streamer <- liftIO $ async $
+					streamdata offset tmpfile size cancelv donev
+				retrieve >>= \case
+					Right _ -> liftIO $ do
+						putMVar donev ()
+						wait streamer
+					Left err -> liftIO $ do
+						putMVar cancelv ()
+						wait streamer
+						propagateerror err
+			_ -> retrieve >>= \case
+				Right _ -> liftIO $ senddata offset tmpfile
+				Left err -> liftIO $ propagateerror err
+	  where
 		-- Don't verify the content from the remote,
 		-- because the client will do its own verification.
-		let vc = Remote.NoVerify
-		tryNonAsync (Remote.retrieveKeyFile r k af (fromRawFilePath tmpfile) nullMeterUpdate vc) >>= \case
-			Right _ -> liftIO $ senddata offset tmpfile
-			Left err -> liftIO $ propagateerror err
+		vc = Remote.NoVerify

+	streamdata (Offset offset) f size cancelv donev = do
+		sendlen offset size
+		waitforfile
+		x <- tryNonAsync $ do
+			fd <- openFdWithMode f ReadOnly Nothing defaultFileFlags
+			h <- fdToHandle fd
+			hSeek h AbsoluteSeek offset
+			senddata' h (getcontents size)
+		case x of
+			Left err -> do
+				throwM err
+			Right res -> return res
+	  where
+		-- The file doesn't exist at the start.
+		-- Wait for some data to be written to it as well,
+		-- in case an empty file is first created and then
+		-- overwritten. When there is an offset, wait for
+		-- the file to get that large. Note that this is not used
+		-- when the size is 0.
+		waitforfile = tryNonAsync (fromIntegral <$> getFileSize f) >>= \case
+			Right sz | sz > 0 && sz >= offset -> return ()
+			_ -> ifM (isEmptyMVar cancelv)
+				( do
+					threadDelaySeconds (Seconds 1)
+					waitforfile
+				, do
+					return ()
+				)
+
+		getcontents n h = unsafeInterleaveIO $ do
+			isdone <- isEmptyMVar donev <||> isEmptyMVar cancelv
+			c <- BS.hGet h defaultChunkSize
+			let n' = n - fromIntegral (BS.length c)
+			let c' = L.fromChunks [BS.take (fromIntegral n) c]
+			if BS.null c
+				then if isdone
+					then return mempty
+					else do
+						-- Wait for more data to be
+						-- written to the file.
+						threadDelaySeconds (Seconds 1)
+						getcontents n h
+				else if n' > 0
+					then do
+						-- unsafeInterleaveIO causes
+						-- this to be deferred until
+						-- data is read from the lazy
+						-- ByteString.
+						cs <- getcontents n' h
+						return $ L.append c' cs
+					else return c'
+
 	senddata (Offset offset) f = do
 		size <- fromIntegral <$> getFileSize f
-		let n = max 0 (size - offset)
-		sendmessage $ DATA (Len n)
+		sendlen offset size
 		withBinaryFile (fromRawFilePath f) ReadMode $ \h -> do
 			hSeek h AbsoluteSeek offset
-			sendbs =<< L.hGetContents h
+			senddata' h L.hGetContents
+
+	senddata' h getcontents = do
+			sendbs =<< getcontents h
 			-- Important to keep the handle open until
 			-- the client responds. The bytestring
 			-- could still be lazily streaming out to
@@ -272,6 +355,11 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go
 				Just FAILURE -> return ()
 				Just _ -> giveup "protocol error"
 				Nothing -> return ()
+
+	sendlen offset size = do
+		let n = max 0 (size - offset)
+		sendmessage $ DATA (Len n)
+

 {- Check if this repository can proxy for a specified remote uuid,
  - and if so enable proxying for it. -}
2024-10-07 15:12:09 -04:00
Joey Hess
b501d23f9b
update 2024-10-07 10:06:12 -04:00
matrss
19f7b0e7d4 2024-10-02 15:07:54 +00:00
matrss
470bd1f441 2024-10-02 14:51:58 +00:00
matrss
4a794ce0ba 2024-10-02 14:42:37 +00:00