Commit graph

281 commits

Author SHA1 Message Date
Joey Hess
87871f724e
split up remaining items from todo/git-annex_proxies and close it! 2024-10-30 14:49:54 -04:00
Joey Hess
126daf949d
DATA-PRESENT working for exporttree=yes remotes
Since the annex-tracking-branch is pushed first, git-annex has already
updated the export database when the DATA-PRESENT arrives. Which means
that just using checkPresent is enough to verify that there is some file
on the special remote in the export location for the key.

So, the simplest possible implementation of this happened to work!

(I also tested it with chunked specialremotes, which also works, as long
as the chunk size used is the same as the configured chunk size. In that
case, the lack of a chunk log is not a problem. Doubtful this will ever
make sense to use with a chunked special remote though, that gets pretty
deep into re-implementing git-annex.)

Updated the client side upload tip with a missing step, and reorged for clarity.
2024-10-30 13:55:47 -04:00
Joey Hess
fda151a4e2
Merge branch 'master' into p2pv4 2024-10-30 08:13:49 -04:00
Joey Hess
4d03ada12f
break out todo item 2024-10-30 08:13:33 -04:00
Joey Hess
2ca6ecad58
add tip for DATA-PRESENT feature 2024-10-29 16:15:01 -04:00
Joey Hess
95d1d29724
update 2024-10-28 13:46:57 -04:00
Joey Hess
7dde035ac8
planning 2024-10-22 11:09:47 -04:00
Joey Hess
8c7047fc77
Merge branch 'master' into streamproxy 2024-10-18 10:18:59 -04:00
Joey Hess
c4dfeaef53
streaming uploads 2024-10-15 16:02:19 -04:00
Joey Hess
d9b4bf4224
added retrieveKeyFileInOrder and ORDERED to external special remote protocol
I anticipate lots of external special remote programs will neglect
implementing this. Still, it's the right thing to do to assume that some
of them may write files out of order. Probably most external special
remotes will not be used with a proxy. When someone is using one with a
proxy, they can always get it fixed to send ORDERED.
2024-10-15 15:40:14 -04:00
Joey Hess
835283b862
stream through proxy when using fileRetriever
The problem was that when the proxy requests a key be retrieved to its
own temp file, fileRetriever was retriving it to the key's temp
location, and then moving it at the end, which broke streaming.

So, plumb through the path where the key is being retrieved to.
2024-10-15 14:29:06 -04:00
Joey Hess
930c078965
working in streamproxy branch 2024-10-15 12:26:53 -04:00
Joey Hess
edaed18e4c
Sped up proxied downloads from special remotes, by streaming
Currently works for special remotes that don't use fileRetriever. Ones that
do will download to another filename and rename it into place, defeating
the streaming.

This actually benchmarks slightly slower when getting a large file from
a fast proxied special remote. However, when the proxied special remote
is slow, it will be a big win.
2024-10-15 12:25:15 -04:00
Joey Hess
57ac43e4f1
update 2024-10-15 10:31:42 -04:00
Joey Hess
8baa43ee12
tried a blind alley on streaming special remote download via proxy
This didn't work. In case I want to revisit, here's what I tried.

diff --git a/Annex/Proxy.hs b/Annex/Proxy.hs
index 48222872c1..e4e526d3dd 100644
--- a/Annex/Proxy.hs
+++ b/Annex/Proxy.hs
@@ -26,16 +26,21 @@ import Logs.UUID
 import Logs.Location
 import Utility.Tmp.Dir
 import Utility.Metered
+import Utility.ThreadScheduler
+import Utility.OpenFd
 import Git.Types
 import qualified Database.Export as Export

 import Control.Concurrent.STM
 import Control.Concurrent.Async
+import Control.Concurrent.MVar
 import qualified Data.ByteString as B
+import qualified Data.ByteString as BS
 import qualified Data.ByteString.Lazy as L
 import qualified System.FilePath.ByteString as P
 import qualified Data.Map as M
 import qualified Data.Set as S
+import System.IO.Unsafe

 proxyRemoteSide :: ProtocolVersion -> Bypass -> Remote -> Annex RemoteSide
 proxyRemoteSide clientmaxversion bypass r
@@ -240,21 +245,99 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go
 		writeVerifyChunk iv h b
 		storetofile iv h (n - fromIntegral (B.length b)) bs

-	proxyget offset af k = withproxytmpfile k $ \tmpfile -> do
+	proxyget offset af k = withproxytmpfile k $ \tmpfile ->
+		let retrieve = tryNonAsync $ Remote.retrieveKeyFile r k af
+			(fromRawFilePath tmpfile) nullMeterUpdate vc
+		in case fromKey keySize k of
+			Just size | size > 0 -> do
+				cancelv <- liftIO newEmptyMVar
+				donev <- liftIO newEmptyMVar
+				streamer <- liftIO $ async $
+					streamdata offset tmpfile size cancelv donev
+				retrieve >>= \case
+					Right _ -> liftIO $ do
+						putMVar donev ()
+						wait streamer
+					Left err -> liftIO $ do
+						putMVar cancelv ()
+						wait streamer
+						propagateerror err
+			_ -> retrieve >>= \case
+				Right _ -> liftIO $ senddata offset tmpfile
+				Left err -> liftIO $ propagateerror err
+	  where
 		-- Don't verify the content from the remote,
 		-- because the client will do its own verification.
-		let vc = Remote.NoVerify
-		tryNonAsync (Remote.retrieveKeyFile r k af (fromRawFilePath tmpfile) nullMeterUpdate vc) >>= \case
-			Right _ -> liftIO $ senddata offset tmpfile
-			Left err -> liftIO $ propagateerror err
+		vc = Remote.NoVerify

+	streamdata (Offset offset) f size cancelv donev = do
+		sendlen offset size
+		waitforfile
+		x <- tryNonAsync $ do
+			fd <- openFdWithMode f ReadOnly Nothing defaultFileFlags
+			h <- fdToHandle fd
+			hSeek h AbsoluteSeek offset
+			senddata' h (getcontents size)
+		case x of
+			Left err -> do
+				throwM err
+			Right res -> return res
+	  where
+		-- The file doesn't exist at the start.
+		-- Wait for some data to be written to it as well,
+		-- in case an empty file is first created and then
+		-- overwritten. When there is an offset, wait for
+		-- the file to get that large. Note that this is not used
+		-- when the size is 0.
+		waitforfile = tryNonAsync (fromIntegral <$> getFileSize f) >>= \case
+			Right sz | sz > 0 && sz >= offset -> return ()
+			_ -> ifM (isEmptyMVar cancelv)
+				( do
+					threadDelaySeconds (Seconds 1)
+					waitforfile
+				, do
+					return ()
+				)
+
+		getcontents n h = unsafeInterleaveIO $ do
+			isdone <- isEmptyMVar donev <||> isEmptyMVar cancelv
+			c <- BS.hGet h defaultChunkSize
+			let n' = n - fromIntegral (BS.length c)
+			let c' = L.fromChunks [BS.take (fromIntegral n) c]
+			if BS.null c
+				then if isdone
+					then return mempty
+					else do
+						-- Wait for more data to be
+						-- written to the file.
+						threadDelaySeconds (Seconds 1)
+						getcontents n h
+				else if n' > 0
+					then do
+						-- unsafeInterleaveIO causes
+						-- this to be deferred until
+						-- data is read from the lazy
+						-- ByteString.
+						cs <- getcontents n' h
+						return $ L.append c' cs
+					else return c'
+
 	senddata (Offset offset) f = do
 		size <- fromIntegral <$> getFileSize f
-		let n = max 0 (size - offset)
-		sendmessage $ DATA (Len n)
+		sendlen offset size
 		withBinaryFile (fromRawFilePath f) ReadMode $ \h -> do
 			hSeek h AbsoluteSeek offset
-			sendbs =<< L.hGetContents h
+			senddata' h L.hGetContents
+
+	senddata' h getcontents = do
+			sendbs =<< getcontents h
 			-- Important to keep the handle open until
 			-- the client responds. The bytestring
 			-- could still be lazily streaming out to
@@ -272,6 +355,11 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go
 				Just FAILURE -> return ()
 				Just _ -> giveup "protocol error"
 				Nothing -> return ()
+
+	sendlen offset size = do
+		let n = max 0 (size - offset)
+		sendmessage $ DATA (Len n)
+

 {- Check if this repository can proxy for a specified remote uuid,
  - and if so enable proxying for it. -}
2024-10-07 15:12:09 -04:00
Joey Hess
b501d23f9b
update 2024-10-07 10:06:12 -04:00
Joey Hess
99236376e7
sim: document interruption and concurrency issues
Does not seem worth doing a lot of locking and detection of these
problems.
2024-09-26 12:26:47 -04:00
Joey Hess
783e910d0c
sim: Add metadata command
Only really needed for completeness, preferred content expressions can
match against metadata.
2024-09-26 12:20:37 -04:00
Joey Hess
6f084524bd
Merge branch 'sim' 2024-09-25 14:42:27 -04:00
Joey Hess
d026e585be
update 2024-09-25 14:29:37 -04:00
Joey Hess
8e94b75a61
support simulating clusters
Without actually simulating cluster implementation at all. Instead, only
the essential fact that cluster gateways know what changes they have
made to each node of a cluster. That is enough for sims like
sizebalanced_cluster.
2024-09-25 14:06:41 -04:00
Joey Hess
61c95f4d29
design for simulating clusters w/o simulating cluster gateways 2024-09-25 12:58:53 -04:00
Joey Hess
85418d6c72
update 2024-09-25 12:10:55 -04:00
Joey Hess
4ed58d7894
sim: random preferred content expression generation 2024-09-24 11:23:23 -04:00
Joey Hess
7cc4312695
fix state overwrite bug
I have needed to excercise a lot of care in threading st through, and I
got it wrong here. Probably using a state monad would be a good idea.
2024-09-24 10:00:38 -04:00
Joey Hess
76fa43e882
update test case for bug
after recent changes broke the test case

the other bug I cannot reproduce though
2024-09-23 16:05:11 -04:00
Joey Hess
969e6c2747
sped up sim step by about 200%
Noticed that it was quite slow compared with things like action
sendwanted. Guessed that the slowdown is largely due to every step
doing a simulated git pull/push.

So, rather than always doing a pull/push, only do those when no actions
are found without doing a pull/push.

This does mean that step will sometimes experience a split brain
situation, but that seems like a good thing? Because step ought to
explore as many possible scenarios as it reasonably can.
2024-09-23 15:45:47 -04:00
Joey Hess
6cf9a101b8
sim: Fix size tracking for balanced preferred content 2024-09-23 12:42:32 -04:00
Joey Hess
a6b8082119
update 2024-09-23 09:38:56 -04:00
Joey Hess
2daa8a8f21
puzzling bug 2024-09-20 16:53:40 -04:00
Joey Hess
19b966f0fd
sim: better step
On each step, find all the actions that could be done, and pick one of them
to do.

Should detect stability, but that is broken.
2024-09-20 15:23:34 -04:00
Joey Hess
24b3aed84a
update 2024-09-20 11:59:35 -04:00
Joey Hess
fd24d0d66f
update 2024-09-20 11:26:40 -04:00
Joey Hess
7c10d6846c
update 2024-09-20 11:05:57 -04:00
Joey Hess
f061ae92fb
sim: implement addtree 2024-09-20 10:34:52 -04:00
Joey Hess
29d8429779
sim: tested concurrency over actions
This demonstrates concurrent behavior that looks right. And with a
random seed, the results are deterministic.

init foo
init bar
init backup
connect foo <-> bar
connect foo <-> backup
addmulti 10 testfiles 1mb 1gb foo backup
action foo gitpull backup
wanted foo nothing
wanted bar anything
wanted backup anything
action bar gitpull foo
action foo dropunwanted while action bar getwanted foo
2024-09-17 14:39:53 -04:00
Joey Hess
6751f23978
sim: fix get bug
When getting from a remote, have to check that the repo doing the
getting thinks the remote contains the key, but also that the remote
actually does. Before this bug fix, it would get from a repo that used
to have the key, but that had dropped it since the last git pull.
2024-09-17 14:29:49 -04:00
Joey Hess
b85965cb3c
sim: implement dropunwantedfrom 2024-09-17 13:35:35 -04:00
Joey Hess
eb5fad4e79
fix ActionDropUnwanted
Now tested working
2024-09-17 11:55:57 -04:00
Joey Hess
4c7db31c20
addmulti 2024-09-17 11:22:14 -04:00
Joey Hess
2a16796a1c
move pull/push/sync into getSimActionComponents
As well as being a more pleasing implementation than I managed
yesterday, this allows for those actions to be run concurrently in the
sim.
2024-09-17 10:54:44 -04:00
Joey Hess
3b7e3cb2f4
add 2024-09-17 08:31:55 -04:00
Joey Hess
00e3531169
update 2024-09-04 11:36:46 -04:00
Joey Hess
1b6c33a38e
update 2024-09-03 14:24:32 -04:00
Joey Hess
03864a2c3b
update 2024-09-03 11:52:54 -04:00
Joey Hess
53b7375cc6
update 2024-08-30 11:14:45 -04:00
Joey Hess
f89a1b8216
remove stale live changes from reposize database
Reorganized the reposize database directory, and split up a column.

checkStaleSizeChanges needs to run before needLiveUpdate,
otherwise the process won't be holding a lock on its pid file, and
another process could go in and expire the live update it records. It
just so happens that they do get called in the correct order, since
checking balanced preferred content calls getLiveRepoSizes before
needLiveUpdate.

The 1 minute delay between checks is arbitrary, but will avoid excess
work. The downside of it is that, if a process is dropping a file and
gets interrupted, for 1 minute another process can expect a repository
will soon be smaller than it is. And so a process might send data to a
repository when a file is not really going to be dropped from it. But
note that can already happen if a drop takes some time in eg locking and
then fails. So it seems possible that live updates should only be
allowed to increase, rather than decrease the size of a repository.
2024-08-28 13:57:25 -04:00
Joey Hess
278adbb726
combine 2 queries 2024-08-28 11:00:59 -04:00
Joey Hess
e006acef22
avoid reposize database locking overhead when not needed
Only when the preferred content expression being matched uses balanced
preferred content is this overhead needed.

It might be possible to eliminate the locking entirely. Eg, check the
live changes before and after the action and re-run if they are not
stable. For now, this is good enough, it avoids existing preferred
content getting slow. If balanced preferred content turns out to be too
slow to check, that could be tried later.
2024-08-28 10:52:34 -04:00
Joey Hess
0a119184e6
thoughts 2024-08-27 14:59:13 -04:00