Commit graph

3099 commits

Author SHA1 Message Date
Joey Hess
a690c02361
add section on security 2025-08-01 13:51:25 -04:00
Joey Hess
398664d160
inline didn't work due to extension 2025-08-01 13:46:53 -04:00
Joey Hess
3b6b3416d9
add example git-annex-p2p-unix-sockets program and end-user docs 2025-08-01 13:44:00 -04:00
Joey Hess
7643c716cd
changed design for p2p generic socket
Having the git-annex-p2p-<netname> command output the socket filename
left git-annex scrambling to listen to it in order to not miss incoming
connections. And if the command uses something like socat UNIX-CONNECT,
that expects the socket to be accepting connections and errors out when
it's not, that would be a problem.

Rather than complicating the protocol with git-annex needing to send
back a message when it's listening to the socket, simplified it by
having git-annex provide the socket path to the command.

This does mean that, if a P2P network has its own place it expects to
find a socket file, the git-annex-p2p-<netname> command would need to
somehow arrange for it to use the git-annex socket path. A symlink would
be one way to handle that situation.
2025-07-31 13:18:30 -04:00
Joey Hess
4fb9b7cb67
support P2PAnnex in connectPeer
This is probably enough to support accessing remotes using p2p-annex:: urls.
Not tested yet of course since there is not yet support for serving the
other side of such a connection, or for setting up such a connection.

P2P.Generic has an implementation of the whole interface to the
git-annex-p2p-<netname> commands.
2025-07-30 13:23:23 -04:00
Joey Hess
f631bc9e56
add P2PAnnex constructor
This is for p2p-annex:: urls that will use the new generic P2P
transport.

In addressCredsFile, threw in an url encoding of any non-alphanumeric
characters that are in the address. This is to avoid any possible path
traversal attacks via a p2p-annex:: url, since the address part of it
could contain any characters. And, went ahead and did the same url
encoding of tor-annex:: urls, even though tor onion addresses are all
alphanumerics, on the off chance that might avoid a similar problem.
(It does not seem likely enough to treat it as a security hole.)
2025-07-30 12:09:17 -04:00
Joey Hess
2a81b26e8e
document output as a single line 2025-07-29 14:26:10 -04:00
Joey Hess
d70a8de5c5
rename design page 2025-07-29 14:24:05 -04:00
Joey Hess
c4a0ecaad1
documentation for generic P2P transports 2025-07-29 14:22:25 -04:00
Joey Hess
05c016084d
design for p2p socket transport 2025-07-29 14:00:21 -04:00
Joey Hess
0f998f0698
improve 2025-03-11 12:54:34 -04:00
Joey Hess
0477a8d098
add INPUT-REQUIRED
Used by git-annex-compute-singularity to make addcomputed --fast work.

Also, simplified git-annex-compute-singularity; there is no need to hard
link the container into place. singularity does not care about the
extension of the container, so can just pass it the annex object file.
2025-03-11 11:46:31 -04:00
Joey Hess
b02aca8627
reorg and expand security section 2025-03-11 11:12:59 -04:00
Joey Hess
e0b7653495
added git-annex-compute-singularity
And implemented SANDBOX, which it needs.
2025-03-10 16:41:26 -04:00
Joey Hess
7bda5f470c
document output files must be regular files 2025-03-10 14:15:07 -04:00
Joey Hess
ed51924211
redirect command stdout to stderr
Otherwise it will be interpreted as compute program protocol
2025-03-07 16:01:27 -04:00
Joey Hess
2c6dce83de
make OUTPUT subdirs
Simplifies compute programs.
2025-03-07 14:57:12 -04:00
Joey Hess
81ce4264df
compute: add response to OUTPUT
This allows rejecting output filenames that are outside the repository,
and also handles converting eg "-foo" to "./-foo" to prevent a command
that it's passed to interpreting the output filename as a dashed option.
2025-03-07 14:47:34 -04:00
Joey Hess
825a648670
prefix output with ./ in example 2025-03-06 14:42:07 -04:00
Joey Hess
b835c8c937
no longer a draft 2025-03-06 14:29:07 -04:00
Joey Hess
a2fc471e14
safer git sha object filename
Rather than use the filename provided by INPUT, which could come from user
input, and so could be something that looks like a dashed parameter,
use a .git/object/<sha> filename.

This avoids user input passing through INPUT and back out, with the file
path then passed to a command, which could do something unexpected with
a dashed parameter, or other special parameter.

Added a note in the design about being careful of passing user input to
commands. They still have to be careful of that in general, just not in
this case.
2025-03-04 14:54:13 -04:00
Joey Hess
f32d2aecce
autoenable security for compute special remote
Added annex.security.autoenable-compute-programs and only allow
autoenabling special remotes that use compute programs on that list.

The reason this is needed is a user might have some compute programs
that are less safe to use than others. They might want to use an unsafe
one only with one repository, where they are the only committer or other
committers are trusted. They might be ok with others being used by any
repository, and if so they can add them to the list.

Another reason would be a user who has installed a compute program by
accident. Eg, it might be included with git-annex at some point, or
pulled in by some dependency. That user doesn't necessarily want that
compute program to be used in an autoenabled special remote.
2025-03-03 15:52:56 -04:00
Joey Hess
2b8428bb17
wording 2025-02-25 17:26:28 -04:00
Joey Hess
f8c7cea019
pdate demo program
needed a mkdir
2025-02-25 17:23:38 -04:00
Joey Hess
2e1fe1620e
handle comutations in subdirs of the git repository
Eg, a computation might be run in "foo/" and refer to "../bar" as an
input or output.

So, the subdir is part of the computation state.

Also, prevent input or output of files that are outside the git
repository. Of course, the program can access any file on disk if it
wants to; this is just a guard against mistakes. And it may also be
useful if the program comunicates with something less trusted than it,
eg a container image, so input/output files communicated by that are not
the source of security problems.
2025-02-25 15:08:38 -04:00
Joey Hess
921850d05c
support addcomputed --fast
This complicates the interface but it's still simpler to understand than
the old interface.
2025-02-24 13:48:46 -04:00
Joey Hess
490174b068
new compute program interface
This is much more flexible, and also simpler to understand.
2025-02-24 12:44:20 -04:00
Joey Hess
b804f8a3cc
update 2025-02-21 15:09:46 -04:00
Joey Hess
e897229088
wip 2025-02-20 17:23:15 -04:00
Joey Hess
4f3d9f8115
update 2025-02-20 13:27:59 -04:00
Joey Hess
c1b53dbbd0
wip 2025-02-20 13:27:47 -04:00
Joey Hess
ace9944d1c
add REPRODUCIBLE 2025-02-19 14:16:36 -04:00
Joey Hess
f52385f63d
optional and required inputs and some other changes 2025-02-19 12:47:32 -04:00
Joey Hess
f4c3fdeaed
improved draft design 2025-02-18 15:46:47 -04:00
Joey Hess
e6e69f8f93
draft 2025-02-13 16:12:07 -04:00
Joey Hess
cbb6df35aa
merge in doc changes from master 2025-01-29 18:57:25 -04:00
Joey Hess
a4e9057486
implement put data-present parameter in http servant
Changed the protocol docs because servant parses "true" and "false" for
booleans in query parameters, not "1" and "0".

clientPut with datapresent=True is not used by git-annex, and I don't
anticipate it being used in git-annex, except for testing.

I've tested this by making clientPut be called with datapresent=True and
git-annex copy to a remote succeeds once the object file is first
manually copied to the remote. That would be a good test for the test
suite, but running the http client means exposing it to at least
localhost, and would fail if a real http client was already running on
that port.
2024-10-29 13:32:43 -04:00
Joey Hess
d782b136e0
p2p protocol version 4 for DATA-PRESENT 2024-10-29 10:39:12 -04:00
Joey Hess
dc7aec77a4
formatting 2024-10-28 13:49:58 -04:00
Joey Hess
926b632faa
simplified design for indirect uploads 2024-10-28 13:29:33 -04:00
Joey Hess
a0f1fbb613
pondering 2024-10-22 11:37:43 -04:00
Joey Hess
ed679f2a51
add missing space 2024-10-22 11:12:12 -04:00
Joey Hess
7dde035ac8
planning 2024-10-22 11:09:47 -04:00
Joey Hess
8baccda98f
Merge branch 'master' into streamproxy 2024-10-22 09:49:28 -04:00
Joey Hess
d5b59ecba9
clarification on 403 2024-10-18 11:05:41 -04:00
Joey Hess
8c7047fc77
Merge branch 'master' into streamproxy 2024-10-18 10:18:59 -04:00
Joey Hess
c4dfeaef53
streaming uploads 2024-10-15 16:02:19 -04:00
Joey Hess
d9b4bf4224
added retrieveKeyFileInOrder and ORDERED to external special remote protocol
I anticipate lots of external special remote programs will neglect
implementing this. Still, it's the right thing to do to assume that some
of them may write files out of order. Probably most external special
remotes will not be used with a proxy. When someone is using one with a
proxy, they can always get it fixed to send ORDERED.
2024-10-15 15:40:14 -04:00
Joey Hess
8baa43ee12
tried a blind alley on streaming special remote download via proxy
This didn't work. In case I want to revisit, here's what I tried.

diff --git a/Annex/Proxy.hs b/Annex/Proxy.hs
index 48222872c1..e4e526d3dd 100644
--- a/Annex/Proxy.hs
+++ b/Annex/Proxy.hs
@@ -26,16 +26,21 @@ import Logs.UUID
 import Logs.Location
 import Utility.Tmp.Dir
 import Utility.Metered
+import Utility.ThreadScheduler
+import Utility.OpenFd
 import Git.Types
 import qualified Database.Export as Export

 import Control.Concurrent.STM
 import Control.Concurrent.Async
+import Control.Concurrent.MVar
 import qualified Data.ByteString as B
+import qualified Data.ByteString as BS
 import qualified Data.ByteString.Lazy as L
 import qualified System.FilePath.ByteString as P
 import qualified Data.Map as M
 import qualified Data.Set as S
+import System.IO.Unsafe

 proxyRemoteSide :: ProtocolVersion -> Bypass -> Remote -> Annex RemoteSide
 proxyRemoteSide clientmaxversion bypass r
@@ -240,21 +245,99 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go
 		writeVerifyChunk iv h b
 		storetofile iv h (n - fromIntegral (B.length b)) bs

-	proxyget offset af k = withproxytmpfile k $ \tmpfile -> do
+	proxyget offset af k = withproxytmpfile k $ \tmpfile ->
+		let retrieve = tryNonAsync $ Remote.retrieveKeyFile r k af
+			(fromRawFilePath tmpfile) nullMeterUpdate vc
+		in case fromKey keySize k of
+			Just size | size > 0 -> do
+				cancelv <- liftIO newEmptyMVar
+				donev <- liftIO newEmptyMVar
+				streamer <- liftIO $ async $
+					streamdata offset tmpfile size cancelv donev
+				retrieve >>= \case
+					Right _ -> liftIO $ do
+						putMVar donev ()
+						wait streamer
+					Left err -> liftIO $ do
+						putMVar cancelv ()
+						wait streamer
+						propagateerror err
+			_ -> retrieve >>= \case
+				Right _ -> liftIO $ senddata offset tmpfile
+				Left err -> liftIO $ propagateerror err
+	  where
 		-- Don't verify the content from the remote,
 		-- because the client will do its own verification.
-		let vc = Remote.NoVerify
-		tryNonAsync (Remote.retrieveKeyFile r k af (fromRawFilePath tmpfile) nullMeterUpdate vc) >>= \case
-			Right _ -> liftIO $ senddata offset tmpfile
-			Left err -> liftIO $ propagateerror err
+		vc = Remote.NoVerify

+	streamdata (Offset offset) f size cancelv donev = do
+		sendlen offset size
+		waitforfile
+		x <- tryNonAsync $ do
+			fd <- openFdWithMode f ReadOnly Nothing defaultFileFlags
+			h <- fdToHandle fd
+			hSeek h AbsoluteSeek offset
+			senddata' h (getcontents size)
+		case x of
+			Left err -> do
+				throwM err
+			Right res -> return res
+	  where
+		-- The file doesn't exist at the start.
+		-- Wait for some data to be written to it as well,
+		-- in case an empty file is first created and then
+		-- overwritten. When there is an offset, wait for
+		-- the file to get that large. Note that this is not used
+		-- when the size is 0.
+		waitforfile = tryNonAsync (fromIntegral <$> getFileSize f) >>= \case
+			Right sz | sz > 0 && sz >= offset -> return ()
+			_ -> ifM (isEmptyMVar cancelv)
+				( do
+					threadDelaySeconds (Seconds 1)
+					waitforfile
+				, do
+					return ()
+				)
+
+		getcontents n h = unsafeInterleaveIO $ do
+			isdone <- isEmptyMVar donev <||> isEmptyMVar cancelv
+			c <- BS.hGet h defaultChunkSize
+			let n' = n - fromIntegral (BS.length c)
+			let c' = L.fromChunks [BS.take (fromIntegral n) c]
+			if BS.null c
+				then if isdone
+					then return mempty
+					else do
+						-- Wait for more data to be
+						-- written to the file.
+						threadDelaySeconds (Seconds 1)
+						getcontents n h
+				else if n' > 0
+					then do
+						-- unsafeInterleaveIO causes
+						-- this to be deferred until
+						-- data is read from the lazy
+						-- ByteString.
+						cs <- getcontents n' h
+						return $ L.append c' cs
+					else return c'
+
 	senddata (Offset offset) f = do
 		size <- fromIntegral <$> getFileSize f
-		let n = max 0 (size - offset)
-		sendmessage $ DATA (Len n)
+		sendlen offset size
 		withBinaryFile (fromRawFilePath f) ReadMode $ \h -> do
 			hSeek h AbsoluteSeek offset
-			sendbs =<< L.hGetContents h
+			senddata' h L.hGetContents
+
+	senddata' h getcontents = do
+			sendbs =<< getcontents h
 			-- Important to keep the handle open until
 			-- the client responds. The bytestring
 			-- could still be lazily streaming out to
@@ -272,6 +355,11 @@ proxySpecialRemote protoversion r ihdl ohdl owaitv oclosedv mexportdb = go
 				Just FAILURE -> return ()
 				Just _ -> giveup "protocol error"
 				Nothing -> return ()
+
+	sendlen offset size = do
+		let n = max 0 (size - offset)
+		sendmessage $ DATA (Len n)
+

 {- Check if this repository can proxy for a specified remote uuid,
  - and if so enable proxying for it. -}
2024-10-07 15:12:09 -04:00
Joey Hess
b800ea6826
2 level toc 2024-09-02 16:32:28 -04:00