Commit graph

3079 commits

Author SHA1 Message Date
Joey Hess
a2fc471e14
safer git sha object filename
Rather than use the filename provided by INPUT, which could come from user
input, and so could be something that looks like a dashed parameter,
use a .git/object/<sha> filename.

This avoids user input passing through INPUT and back out, with the file
path then passed to a command, which could do something unexpected with
a dashed parameter, or other special parameter.

Added a note in the design about being careful of passing user input to
commands. They still have to be careful of that in general, just not in
this case.
2025-03-04 14:54:13 -04:00
Joey Hess
f32d2aecce
autoenable security for compute special remote
Added annex.security.autoenable-compute-programs and only allow
autoenabling special remotes that use compute programs on that list.

The reason this is needed is a user might have some compute programs
that are less safe to use than others. They might want to use an unsafe
one only with one repository, where they are the only committer or other
committers are trusted. They might be ok with others being used by any
repository, and if so they can add them to the list.

Another reason would be a user who has installed a compute program by
accident. Eg, it might be included with git-annex at some point, or
pulled in by some dependency. That user doesn't necessarily want that
compute program to be used in an autoenabled special remote.
2025-03-03 15:52:56 -04:00
Joey Hess
2b8428bb17
wording 2025-02-25 17:26:28 -04:00
Joey Hess
f8c7cea019
pdate demo program
needed a mkdir
2025-02-25 17:23:38 -04:00
Joey Hess
2e1fe1620e
handle comutations in subdirs of the git repository
Eg, a computation might be run in "foo/" and refer to "../bar" as an
input or output.

So, the subdir is part of the computation state.

Also, prevent input or output of files that are outside the git
repository. Of course, the program can access any file on disk if it
wants to; this is just a guard against mistakes. And it may also be
useful if the program comunicates with something less trusted than it,
eg a container image, so input/output files communicated by that are not
the source of security problems.
2025-02-25 15:08:38 -04:00
Joey Hess
921850d05c
support addcomputed --fast
This complicates the interface but it's still simpler to understand than
the old interface.
2025-02-24 13:48:46 -04:00
Joey Hess
490174b068
new compute program interface
This is much more flexible, and also simpler to understand.
2025-02-24 12:44:20 -04:00
Joey Hess
b804f8a3cc
update 2025-02-21 15:09:46 -04:00
Joey Hess
e897229088
wip 2025-02-20 17:23:15 -04:00
Joey Hess
4f3d9f8115
update 2025-02-20 13:27:59 -04:00
Joey Hess
c1b53dbbd0
wip 2025-02-20 13:27:47 -04:00
Joey Hess
ace9944d1c
add REPRODUCIBLE 2025-02-19 14:16:36 -04:00
Joey Hess
f52385f63d
optional and required inputs and some other changes 2025-02-19 12:47:32 -04:00
Joey Hess
f4c3fdeaed
improved draft design 2025-02-18 15:46:47 -04:00
Joey Hess
e6e69f8f93
draft 2025-02-13 16:12:07 -04:00
Joey Hess
cbb6df35aa
merge in doc changes from master 2025-01-29 18:57:25 -04:00
Joey Hess
a4e9057486
implement put data-present parameter in http servant
Changed the protocol docs because servant parses "true" and "false" for
booleans in query parameters, not "1" and "0".

clientPut with datapresent=True is not used by git-annex, and I don't
anticipate it being used in git-annex, except for testing.

I've tested this by making clientPut be called with datapresent=True and
git-annex copy to a remote succeeds once the object file is first
manually copied to the remote. That would be a good test for the test
suite, but running the http client means exposing it to at least
localhost, and would fail if a real http client was already running on
that port.
2024-10-29 13:32:43 -04:00
Joey Hess
d782b136e0
p2p protocol version 4 for DATA-PRESENT 2024-10-29 10:39:12 -04:00
Joey Hess
dc7aec77a4
formatting 2024-10-28 13:49:58 -04:00
Joey Hess
926b632faa
simplified design for indirect uploads 2024-10-28 13:29:33 -04:00
Joey Hess
a0f1fbb613
pondering 2024-10-22 11:37:43 -04:00
Joey Hess
ed679f2a51
add missing space 2024-10-22 11:12:12 -04:00
Joey Hess
7dde035ac8
planning 2024-10-22 11:09:47 -04:00
Joey Hess
8baccda98f
Merge branch 'master' into streamproxy 2024-10-22 09:49:28 -04:00
Joey Hess
d5b59ecba9
clarification on 403 2024-10-18 11:05:41 -04:00
Joey Hess
8c7047fc77
Merge branch 'master' into streamproxy 2024-10-18 10:18:59 -04:00
Joey Hess
c4dfeaef53
streaming uploads 2024-10-15 16:02:19 -04:00
Joey Hess
d9b4bf4224
added retrieveKeyFileInOrder and ORDERED to external special remote protocol
I anticipate lots of external special remote programs will neglect
implementing this. Still, it's the right thing to do to assume that some
of them may write files out of order. Probably most external special
remotes will not be used with a proxy. When someone is using one with a
proxy, they can always get it fixed to send ORDERED.
2024-10-15 15:40:14 -04:00
Joey Hess
8baa43ee12
tried a blind alley on streaming special remote download via proxy
This didn't work. In case I want to revisit, here's what I tried.

diff --git a/Annex/Proxy.hs b/Annex/Proxy.hs
index 48222872c1..e4e526d3dd 100644
--- a/Annex/Proxy.hs
+++ b/Annex/Proxy.hs
@@ -26,16 +26,21 @@ import Logs.UUID
 import Logs.Location
 import Utility.Tmp.Dir
 import Utility.Metered
+import Utility.ThreadScheduler
+import Utility.OpenFd
 import Git.Types
 import qualified Database.Export as Export

 import Control.Concurrent.STM
 import Control.Concurrent.Async
+import Control.Concurrent.MVar
 import qualified Data.ByteString as B
+import qualified Data.ByteString as BS
 import qualified Data.ByteString.Lazy as L
 import qualified System.FilePath.ByteString as P
 import qualified Data.Map as M
 import qualified Data.Set as S
+import System.IO.Unsafe

 proxyRemoteSide :: ProtocolVersion -> Bypass -> Remote -> Annex RemoteSide
 proxyRemoteSide clientmaxversion bypass r
@@ -240,21 +245,99 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go
 		writeVerifyChunk iv h b
 		storetofile iv h (n - fromIntegral (B.length b)) bs

-	proxyget offset af k = withproxytmpfile k $ \tmpfile -> do
+	proxyget offset af k = withproxytmpfile k $ \tmpfile ->
+		let retrieve = tryNonAsync $ Remote.retrieveKeyFile r k af
+			(fromRawFilePath tmpfile) nullMeterUpdate vc
+		in case fromKey keySize k of
+			Just size | size > 0 -> do
+				cancelv <- liftIO newEmptyMVar
+				donev <- liftIO newEmptyMVar
+				streamer <- liftIO $ async $
+					streamdata offset tmpfile size cancelv donev
+				retrieve >>= \case
+					Right _ -> liftIO $ do
+						putMVar donev ()
+						wait streamer
+					Left err -> liftIO $ do
+						putMVar cancelv ()
+						wait streamer
+						propagateerror err
+			_ -> retrieve >>= \case
+				Right _ -> liftIO $ senddata offset tmpfile
+				Left err -> liftIO $ propagateerror err
+	  where
 		-- Don't verify the content from the remote,
 		-- because the client will do its own verification.
-		let vc = Remote.NoVerify
-		tryNonAsync (Remote.retrieveKeyFile r k af (fromRawFilePath tmpfile) nullMeterUpdate vc) >>= \case
-			Right _ -> liftIO $ senddata offset tmpfile
-			Left err -> liftIO $ propagateerror err
+		vc = Remote.NoVerify

+	streamdata (Offset offset) f size cancelv donev = do
+		sendlen offset size
+		waitforfile
+		x <- tryNonAsync $ do
+			fd <- openFdWithMode f ReadOnly Nothing defaultFileFlags
+			h <- fdToHandle fd
+			hSeek h AbsoluteSeek offset
+			senddata' h (getcontents size)
+		case x of
+			Left err -> do
+				throwM err
+			Right res -> return res
+	  where
+		-- The file doesn't exist at the start.
+		-- Wait for some data to be written to it as well,
+		-- in case an empty file is first created and then
+		-- overwritten. When there is an offset, wait for
+		-- the file to get that large. Note that this is not used
+		-- when the size is 0.
+		waitforfile = tryNonAsync (fromIntegral <$> getFileSize f) >>= \case
+			Right sz | sz > 0 && sz >= offset -> return ()
+			_ -> ifM (isEmptyMVar cancelv)
+				( do
+					threadDelaySeconds (Seconds 1)
+					waitforfile
+				, do
+					return ()
+				)
+
+		getcontents n h = unsafeInterleaveIO $ do
+			isdone <- isEmptyMVar donev <||> isEmptyMVar cancelv
+			c <- BS.hGet h defaultChunkSize
+			let n' = n - fromIntegral (BS.length c)
+			let c' = L.fromChunks [BS.take (fromIntegral n) c]
+			if BS.null c
+				then if isdone
+					then return mempty
+					else do
+						-- Wait for more data to be
+						-- written to the file.
+						threadDelaySeconds (Seconds 1)
+						getcontents n h
+				else if n' > 0
+					then do
+						-- unsafeInterleaveIO causes
+						-- this to be deferred until
+						-- data is read from the lazy
+						-- ByteString.
+						cs <- getcontents n' h
+						return $ L.append c' cs
+					else return c'
+
 	senddata (Offset offset) f = do
 		size <- fromIntegral <$> getFileSize f
-		let n = max 0 (size - offset)
-		sendmessage $ DATA (Len n)
+		sendlen offset size
 		withBinaryFile (fromRawFilePath f) ReadMode $ \h -> do
 			hSeek h AbsoluteSeek offset
-			sendbs =<< L.hGetContents h
+			senddata' h L.hGetContents
+
+	senddata' h getcontents = do
+			sendbs =<< getcontents h
 			-- Important to keep the handle open until
 			-- the client responds. The bytestring
 			-- could still be lazily streaming out to
@@ -272,6 +355,11 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go
 				Just FAILURE -> return ()
 				Just _ -> giveup "protocol error"
 				Nothing -> return ()
+
+	sendlen offset size = do
+		let n = max 0 (size - offset)
+		sendmessage $ DATA (Len n)
+

 {- Check if this repository can proxy for a specified remote uuid,
  - and if so enable proxying for it. -}
2024-10-07 15:12:09 -04:00
Joey Hess
b800ea6826
2 level toc 2024-09-02 16:32:28 -04:00
Joey Hess
1e1c13dd38
fix number of headers 2024-09-02 16:31:03 -04:00
Joey Hess
68a99a8f48
size based rebalancing design 2024-08-18 16:25:12 -04:00
Joey Hess
bcd2b9a5c4
idea 2024-08-12 09:43:14 -04:00
Joey Hess
3019b21c40
more formal documentation of balancing 2024-08-11 13:29:06 -04:00
Joey Hess
3ce2e95a5f
balanced preferred content and --rebalance
This all works fine. But it doesn't check repository sizes yet, and
without repository size checking, once a repository gets full, there
will be no other repository that will want its files.

Use of sha2 seems unncessary, probably alder2 or md5 or crc would have
been enough. Possibly just summing up the bytes of the key mod the number
of repositories would have sufficed. But sha2 is there, and probably
hardware accellerated. I doubt very much there is any security benefit
to using it though. If someone wants to construct a key that will be
balanced onto a given repository, sha2 is certianly not going to stop
them.
2024-08-09 14:16:09 -04:00
Joey Hess
3ea835c7e8
proxied exporttree=yes versionedexport=yes remotes are not untrusted
This removes versionedExport, which was only used by the S3 special
remote. Instead, versionedexport=yes is a common way for remotes to
indicate that they are versioned.
2024-08-08 15:24:19 -04:00
Joey Hess
4750ffbd3b
finalized design for proxying to exporttree=yes annexobjects=yes special remotes 2024-08-06 11:45:45 -04:00
Joey Hess
84d27cf34f
update 2024-08-06 11:13:51 -04:00
Joey Hess
d52fd3cf83
update 2024-07-30 12:17:05 -04:00
Joey Hess
1560e0eee9
comment 2024-07-30 10:50:13 -04:00
Joey Hess
b4eb6e3ced
comment 2024-07-29 11:59:33 -04:00
Joey Hess
321e2adf66
don't think I ever implementned the 422 idea, it will 404 2024-07-29 11:49:40 -04:00
Joey Hess
d3f584fcdb
wording 2024-07-29 11:44:44 -04:00
Joey Hess
5f5c29fbe7
link 2024-07-29 11:43:30 -04:00
Joey Hess
f3b207a4b9
wording 2024-07-29 11:37:13 -04:00
Joey Hess
74f81ebd04
Merge remote-tracking branch 'origin/httpproto' 2024-07-29 11:25:27 -04:00
stv0g
6352cebb92 Added a comment: importtree=yes Support 2024-07-29 06:50:01 +00:00
Joey Hess
cd89f91aa5
remove uuid from annex+http urls
Not needed it turns out.
2024-07-28 20:29:42 -04:00
Joey Hess
0ea645944e
thoughts on exporttree 2024-07-27 19:59:54 -04:00
Joey Hess
0fb86d2916
UNLOCKCONTENT is not a top-level request
proxyRequest was treating UNLOCKCONTENT as a separate request.
That made it possible for there to be two different connections to the
proxied remote, with LOCKCONTENT being sent to one, and UNLOCKCONTENT
to the other one. A protocol error.

git-annex testremote now passes against a http proxied remote.
2024-07-26 20:39:06 -04:00