git-annex

Author	SHA1	Message	Date
Joey Hess	930c078965	working in streamproxy branch	2024-10-15 12:26:53 -04:00
Joey Hess	57ac43e4f1	update	2024-10-15 10:31:42 -04:00
Joey Hess	9574e3a8bb	Merge branch 'master' of ssh://git-annex.branchable.com	2024-10-12 10:57:52 -04:00
Joey Hess	8baa43ee12	tried a blind alley on streaming special remote download via proxy This didn't work. In case I want to revisit, here's what I tried. diff --git a/Annex/Proxy.hs b/Annex/Proxy.hs index 48222872c1..e4e526d3dd 100644 --- a/Annex/Proxy.hs +++ b/Annex/Proxy.hs @@ -26,16 +26,21 @@ import Logs.UUID import Logs.Location import Utility.Tmp.Dir import Utility.Metered +import Utility.ThreadScheduler +import Utility.OpenFd import Git.Types import qualified Database.Export as Export import Control.Concurrent.STM import Control.Concurrent.Async +import Control.Concurrent.MVar import qualified Data.ByteString as B +import qualified Data.ByteString as BS import qualified Data.ByteString.Lazy as L import qualified System.FilePath.ByteString as P import qualified Data.Map as M import qualified Data.Set as S +import System.IO.Unsafe proxyRemoteSide :: ProtocolVersion -> Bypass -> Remote -> Annex RemoteSide proxyRemoteSide clientmaxversion bypass r @@ -240,21 +245,99 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go writeVerifyChunk iv h b storetofile iv h (n - fromIntegral (B.length b)) bs - proxyget offset af k = withproxytmpfile k $ \tmpfile -> do + proxyget offset af k = withproxytmpfile k $ \tmpfile -> + let retrieve = tryNonAsync $ Remote.retrieveKeyFile r k af + (fromRawFilePath tmpfile) nullMeterUpdate vc + in case fromKey keySize k of + Just size \| size > 0 -> do + cancelv <- liftIO newEmptyMVar + donev <- liftIO newEmptyMVar + streamer <- liftIO $ async $ + streamdata offset tmpfile size cancelv donev + retrieve >>= \case + Right _ -> liftIO $ do + putMVar donev () + wait streamer + Left err -> liftIO $ do + putMVar cancelv () + wait streamer + propagateerror err + _ -> retrieve >>= \case + Right _ -> liftIO $ senddata offset tmpfile + Left err -> liftIO $ propagateerror err + where -- Don't verify the content from the remote, -- because the client will do its own verification. 
- let vc = Remote.NoVerify - tryNonAsync (Remote.retrieveKeyFile r k af (fromRawFilePath tmpfile) nullMeterUpdate vc) >>= \case - Right _ -> liftIO $ senddata offset tmpfile - Left err -> liftIO $ propagateerror err + vc = Remote.NoVerify + streamdata (Offset offset) f size cancelv donev = do + sendlen offset size + waitforfile + x <- tryNonAsync $ do + fd <- openFdWithMode f ReadOnly Nothing defaultFileFlags + h <- fdToHandle fd + hSeek h AbsoluteSeek offset + senddata' h (getcontents size) + case x of + Left err -> do + throwM err + Right res -> return res + where + -- The file doesn't exist at the start. + -- Wait for some data to be written to it as well, + -- in case an empty file is first created and then + -- overwritten. When there is an offset, wait for + -- the file to get that large. Note that this is not used + -- when the size is 0. + waitforfile = tryNonAsync (fromIntegral <$> getFileSize f) >>= \case + Right sz \| sz > 0 && sz >= offset -> return () + _ -> ifM (isEmptyMVar cancelv) + ( do + threadDelaySeconds (Seconds 1) + waitforfile + , do + return () + ) + + getcontents n h = unsafeInterleaveIO $ do + isdone <- isEmptyMVar donev <\|\|> isEmptyMVar cancelv + c <- BS.hGet h defaultChunkSize + let n' = n - fromIntegral (BS.length c) + let c' = L.fromChunks [BS.take (fromIntegral n) c] + if BS.null c + then if isdone + then return mempty + else do + -- Wait for more data to be + -- written to the file. + threadDelaySeconds (Seconds 1) + getcontents n h + else if n' > 0 + then do + -- unsafeInterleaveIO causes + -- this to be deferred until + -- data is read from the lazy + -- ByteString. 
+ cs <- getcontents n' h + return $ L.append c' cs + else return c' + senddata (Offset offset) f = do size <- fromIntegral <$> getFileSize f - let n = max 0 (size - offset) - sendmessage $ DATA (Len n) + sendlen offset size withBinaryFile (fromRawFilePath f) ReadMode $ \h -> do hSeek h AbsoluteSeek offset - sendbs =<< L.hGetContents h + senddata' h L.hGetContents + + senddata' h getcontents = do + sendbs =<< getcontents h -- Important to keep the handle open until -- the client responds. The bytestring -- could still be lazily streaming out to @@ -272,6 +355,11 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go Just FAILURE -> return () Just _ -> giveup "protocol error" Nothing -> return () + + sendlen offset size = do + let n = max 0 (size - offset) + sendmessage $ DATA (Len n) + {- Check if this repository can proxy for a specified remote uuid, - and if so enable proxying for it. -}	2024-10-07 15:12:09 -04:00
Joey Hess	b501d23f9b	update	2024-10-07 10:06:12 -04:00
matrss	19f7b0e7d4		2024-10-02 15:07:54 +00:00
matrss	470bd1f441		2024-10-02 14:51:58 +00:00
matrss	4a794ce0ba		2024-10-02 14:42:37 +00:00
Joey Hess	99236376e7	sim: document interruption and concurrency issues Does not seem worth doing a lot of locking and detection of these problems.	2024-09-26 12:26:47 -04:00
Joey Hess	783e910d0c	sim: Add metadata command Only really needed for completeness, preferred content expressions can match against metadata.	2024-09-26 12:20:37 -04:00
Joey Hess	6f084524bd	Merge branch 'sim'	2024-09-25 14:42:27 -04:00
Joey Hess	d026e585be	update	2024-09-25 14:29:37 -04:00
Joey Hess	8e94b75a61	support simulating clusters Without actually simulating cluster implementation at all. Instead, only the essential fact that cluster gateways know what changes they have made to each node of a cluster. That is enough for sims like sizebalanced_cluster.	2024-09-25 14:06:41 -04:00
Joey Hess	61c95f4d29	design for simulating clusters w/o simulating cluster gateways	2024-09-25 12:58:53 -04:00
Joey Hess	85418d6c72	update	2024-09-25 12:10:55 -04:00
Joey Hess	4ed58d7894	sim: random preferred content expression generation	2024-09-24 11:23:23 -04:00
Joey Hess	7cc4312695	fix state overwrite bug I have needed to exercise a lot of care in threading st through, and I got it wrong here. Probably using a state monad would be a good idea.	2024-09-24 10:00:38 -04:00
Joey Hess	76fa43e882	update test case for bug after recent changes broke the test case. The other bug I cannot reproduce, though.	2024-09-23 16:05:11 -04:00
Joey Hess	969e6c2747	sped up sim step by about 200% Noticed that it was quite slow compared with things like action sendwanted. Guessed that the slowdown is largely due to every step doing a simulated git pull/push. So, rather than always doing a pull/push, only do those when no actions are found without doing a pull/push. This does mean that step will sometimes experience a split brain situation, but that seems like a good thing? Because step ought to explore as many possible scenarios as it reasonably can.	2024-09-23 15:45:47 -04:00
Joey Hess	6cf9a101b8	sim: Fix size tracking for balanced preferred content	2024-09-23 12:42:32 -04:00
Joey Hess	a6b8082119	update	2024-09-23 09:38:56 -04:00
Joey Hess	2daa8a8f21	puzzling bug	2024-09-20 16:53:40 -04:00
Joey Hess	19b966f0fd	sim: better step On each step, find all the actions that could be done, and pick one of them to do. Should detect stability, but that is broken.	2024-09-20 15:23:34 -04:00
Joey Hess	24b3aed84a	update	2024-09-20 11:59:35 -04:00
Joey Hess	fd24d0d66f	update	2024-09-20 11:26:40 -04:00
Joey Hess	7c10d6846c	update	2024-09-20 11:05:57 -04:00
Joey Hess	f061ae92fb	sim: implement addtree	2024-09-20 10:34:52 -04:00
Joey Hess	5e51e7c339	comment	2024-09-18 09:08:42 -04:00
Joey Hess	29d8429779	sim: tested concurrency over actions This demonstrates concurrent behavior that looks right. And with a random seed, the results are deterministic. init foo init bar init backup connect foo <-> bar connect foo <-> backup addmulti 10 testfiles 1mb 1gb foo backup action foo gitpull backup wanted foo nothing wanted bar anything wanted backup anything action bar gitpull foo action foo dropunwanted while action bar getwanted foo	2024-09-17 14:39:53 -04:00
Joey Hess	6751f23978	sim: fix get bug When getting from a remote, have to check that the repo doing the getting thinks the remote contains the key, but also that the remote actually does. Before this bug fix, it would get from a repo that used to have the key, but that had dropped it since the last git pull.	2024-09-17 14:29:49 -04:00
Joey Hess	b85965cb3c	sim: implement dropunwantedfrom	2024-09-17 13:35:35 -04:00
Joey Hess	eb5fad4e79	fix ActionDropUnwanted Now tested working	2024-09-17 11:55:57 -04:00
Joey Hess	4c7db31c20	addmulti	2024-09-17 11:22:14 -04:00
Joey Hess	2a16796a1c	move pull/push/sync into getSimActionComponents As well as being a more pleasing implementation than I managed yesterday, this allows for those actions to be run concurrently in the sim.	2024-09-17 10:54:44 -04:00
Joey Hess	3b7e3cb2f4	add	2024-09-17 08:31:55 -04:00
nobodyinperson	f8d1022db0	Added a comment: 👍 +1 for encrypting the annex on regular git remotes	2024-09-12 14:51:20 +00:00
m.szczepanik@8dd0314f20fa09be99ee3903d1c04a80eafbd849	3a03ed42e6		2024-09-12 12:13:06 +00:00
Joey Hess	ed740bc31e	comment	2024-09-05 09:20:38 -04:00
Joey Hess	00e3531169	update	2024-09-04 11:36:46 -04:00
Joey Hess	1b6c33a38e	update	2024-09-03 14:24:32 -04:00
Joey Hess	3398514c38	sim design	2024-09-03 14:23:48 -04:00
Joey Hess	340bdd0dac	treat "not present" in preferred content as invalid Detect when a preferred content expression contains "not present", which would lead to repeatedly getting and then dropping files, and make it never match. This also applies to "not balanced" and "not sizebalanced". --explain will tell the user when this happens Note that getMatcher calls matchMrun' and does not check for unstable negated limits. While there is no --present anyway, if there was, it would not make sense for --not --present to complain about instability and fail to match.	2024-09-03 13:50:06 -04:00
Joey Hess	03864a2c3b	update	2024-09-03 11:52:54 -04:00
Joey Hess	53b7375cc6	update	2024-08-30 11:14:45 -04:00
Joey Hess	d0938d730b	Merge branch 'master' into balanced	2024-08-30 11:01:39 -04:00
yarikoptic	e2b7895cbc	Added a comment	2024-08-29 18:35:47 +00:00
Joey Hess	f89a1b8216	remove stale live changes from reposize database Reorganized the reposize database directory, and split up a column. checkStaleSizeChanges needs to run before needLiveUpdate, otherwise the process won't be holding a lock on its pid file, and another process could go in and expire the live update it records. It just so happens that they do get called in the correct order, since checking balanced preferred content calls getLiveRepoSizes before needLiveUpdate. The 1 minute delay between checks is arbitrary, but will avoid excess work. The downside of it is that, if a process is dropping a file and gets interrupted, for 1 minute another process can expect a repository will soon be smaller than it is. And so a process might send data to a repository when a file is not really going to be dropped from it. But note that can already happen if a drop takes some time in eg locking and then fails. So it seems possible that live updates should only be allowed to increase, rather than decrease the size of a repository.	2024-08-28 13:57:25 -04:00
Joey Hess	278adbb726	combine 2 queries	2024-08-28 11:00:59 -04:00
Joey Hess	e006acef22	avoid reposize database locking overhead when not needed Only when the preferred content expression being matched uses balanced preferred content is this overhead needed. It might be possible to eliminate the locking entirely. Eg, check the live changes before and after the action and re-run if they are not stable. For now, this is good enough, it avoids existing preferred content getting slow. If balanced preferred content turns out to be too slow to check, that could be tried later.	2024-08-28 10:52:34 -04:00
matrss	3f62116d64	Added a comment	2024-08-28 08:47:33 +00:00


4827 commits