git-annex/Command/RecvKey.hs

{- git-annex command
 -
 - Copyright 2010 Joey Hess <id@joeyh.name>
 -
 - Licensed under the GNU AGPL version 3 or higher.
 -}

module Command.RecvKey where

import Command
import Annex.Content
import Annex.Action
import Annex
import Utility.Rsync
import Types.Transfer
import Logs.Location
import Command.SendKey (fieldTransfer)

cmd :: Command
cmd = noCommit $ command "recvkey" SectionPlumbing 
	"runs rsync in server mode to receive content"
	paramKey (withParams seek)

seek :: CmdParams -> CommandSeek
seek = withKeys (commandAction . start)

start :: (SeekInput, Key) -> CommandStart
start (_, key) = fieldTransfer Download key $ \_p -> do
	-- This matches the retrievalSecurityPolicy of Remote.Git
	let rsp = RetrievalAllKeysSecure
	ifM (getViaTmp rsp DefaultVerify key (AssociatedFile Nothing) Nothing go)
		( do
			logStatus key InfoPresent
			_ <- quiesce True
			return True
		, return False
		)
  where
	go tmp = unVerified $ do
		opts <- filterRsyncSafeOptions . maybe [] words
			<$> getField "RsyncOptions"
		liftIO $ rsyncServerReceive (map Param opts) (fromRawFilePath tmp)
git-annex-shell is complete still not used 2010-12-31 17:39:30 +00:00			`{- git-annex command`
			`-`
update my email address and homepage url 2015-01-21 16:50:09 +00:00			`- Copyright 2010 Joey Hess <id@joeyh.name>`
git-annex-shell is complete still not used 2010-12-31 17:39:30 +00:00			`-`
update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.) 2019-03-13 19:48:14 +00:00			`- Licensed under the GNU AGPL version 3 or higher.`
git-annex-shell is complete still not used 2010-12-31 17:39:30 +00:00			`-}`

			`module Command.RecvKey where`

			`import Command`
rename 2011-10-04 04:40:47 +00:00			`import Annex.Content`
Improve shutdown due to --time-limit, especially for fsck * Perform a clean shutdown when --time-limit is reached. This includes running queued git commands, and cleanup actions normally run when a command is finished. * fsck: Commit incremental fsck database when --time-limit is reached. Previously, some of the last files fscked did not make it into the database when using --time-limit. Note that this changes Annex.addCleanup hooks, to run after --time-limit expires. Fsck was using such a hook to clean up after a --incremental-schedule, and that shouldn't run when --time-limit exipires it. So, instead, moved that cleanup code to be run by cleanupIncremental. Resulted in some data type juggling. 2015-07-31 20:00:13 +00:00			`import Annex.Action`
Make git-annex-shell call the command with its (safe) options. 2013-03-29 00:34:07 +00:00			`import Annex`
renamed RsyncFile -> Rsync 2012-09-19 18:28:32 +00:00			`import Utility.Rsync`
get, move, copy, mirror: Added --failed switch which retries failed copies/moves Note that get --from foo --failed will get things that a previous get --from bar tried and failed to get, etc. I considered making --failed only retry transfers from the same remote, but it was easier, and seems more useful, to not have the same remote requirement. Noisy due to some refactoring into Types/ 2016-08-03 16:37:12 +00:00			`import Types.Transfer`
make getViaTmpFrom no longer update location log All callers adjusted to update it themselves. In Command.ReKey, and Command.SetKey, the cleanup action already did, so it was updating the log twice before. This fixes a bug when annex.stalldetection is set, as now Command.Transferrer can skip updating the location log, and let it be updated by the calling process. 2020-12-11 15:33:10 +00:00			`import Logs.Location`
keep logs of failed transfers, and requeue them when doing a non-full scan of a remote 2012-08-23 19:22:23 +00:00			`import Command.SendKey (fieldTransfer)`
git-annex-shell is complete still not used 2010-12-31 17:39:30 +00:00
started converting to use optparse-applicative This is a work in progress. It compiles and is able to do basic command dispatch, including git autocorrection, while using optparse-applicative for the core commandline parsing. * Many commands are temporarily disabled before conversion. * Options are not wired in yet. * cmdnorepo actions don't work yet. Also, removed the [Command] list, which was only used in one place. 2015-07-08 16:33:27 +00:00			`cmd :: Command`
convert all commands to work with optparse-applicative Still no options though. 2015-07-08 19:08:02 +00:00			`cmd = noCommit $ command "recvkey" SectionPlumbing`
			`"runs rsync in server mode to receive content"`
			`paramKey (withParams seek)`
git-annex-shell is complete still not used 2010-12-31 17:39:30 +00:00
started converting to use optparse-applicative This is a work in progress. It compiles and is able to do basic command dispatch, including git autocorrection, while using optparse-applicative for the core commandline parsing. * Many commands are temporarily disabled before conversion. * Options are not wired in yet. * cmdnorepo actions don't work yet. Also, removed the [Command] list, which was only used in one place. 2015-07-08 16:33:27 +00:00			`seek :: CmdParams -> CommandSeek`
move commandAction out of CmdLine.Seek This is groundwork for nested seek loops, eg seeking over all files and then performing commandActions on a list of remotes, which can be done concurrently. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon. 2018-10-01 18:12:06 +00:00			`seek = withKeys (commandAction . start)`
git-annex-shell is complete still not used 2010-12-31 17:39:30 +00:00
add SeekInput (not yet used) No behavior changes (hopefully), just adding SeekInput and plumbing it through to the JSON display code for later use. Over the course of 2 grueling days. withFilesNotInGit reimplemented in terms of seekHelper should be the only possible behavior change. It seems to test as behaving the same. Note that seekHelper dummies up the SeekInput in the case where segmentPaths' gives up on sorting the expanded paths because there are too many input paths. When SeekInput later gets exposed as a json field, that will result in it being a little bit wrong in the case where 100 or more paths are passed to a git-annex command. I think this is a subtle enough problem to not matter. If it does turn out to be a problem, fixing it would require splitting up the input parameters into groups of < 100, which would make git ls-files run perhaps more than is necessary. May want to revisit this, because that fix seems fairly low-impact. 2020-09-14 20:49:33 +00:00			`start :: (SeekInput, Key) -> CommandStart`
			`start (_, key) = fieldTransfer Download key $ \_p -> do`
enforce retrievalSecurityPolicy Leveraged the existing verification code by making it also check the retrievalSecurityPolicy. Also, prevented getViaTmp from running the download action at all when the retrievalSecurityPolicy is going to prevent verifying and so storing it. Added annex.security.allow-unverified-downloads. A per-remote version would be nice to have too, but would need more plumbing, so KISS. (Bill the Cat reference not too over the top I hope. The point is to make this something the user reads the documentation for before using.) A few calls to verifyKeyContent and getViaTmp, that don't involve downloads from remotes, have RetrievalAllKeysSecure hard-coded. It was also hard-coded for P2P.Annex and Command.RecvKey, to match the values of the corresponding remotes. A few things use retrieveKeyFile/retrieveKeyFileCheap without going through getViaTmp. * Command.Fsck when downloading content from a remote to verify it. That content does not get into the annex, so this is ok. * Command.AddUrl when using a remote to download an url; this is new content being added, so this is ok. This commit was sponsored by Fernando Jimenez on Patreon. 2018-06-21 17:34:11 +00:00			`-- This matches the retrievalSecurityPolicy of Remote.Git`
			`let rsp = RetrievalAllKeysSecure`
disk free checking for unsized keys Improve disk free space checking when transferring unsized keys to local git remotes. Since the size of the object file is known, can check that instead. Getting unsized keys from local git remotes does not check the actual object size. It would be harder to handle that direction because the size check is run locally, before anything involving the remote is done. So it doesn't know the size of the file on the remote. Also, transferring unsized keys to other remotes, including ssh remotes and p2p remotes don't do disk size checking for unsized keys. This would need a change in protocol. (It does seem like it would be possible to implement the same thing for directory special remotes though.) In some sense, it might be better to not ever do disk free checking for unsized keys, than to do it only sometimes. A user might notice this direction working and consider it a bug that the other direction does not. On the other hand, disk reserve checking is not implemented for most special remotes at all, and yet it is implemented for a few, which is also inconsistent, but best effort. And so doing this best effort seems to make some sense. Fundamentally, if the user wants the size to always be checked, they should not use unsized keys. Sponsored-by: Brock Spratlen on Patreon 2024-01-16 18:29:10 +00:00			`ifM (getViaTmp rsp DefaultVerify key (AssociatedFile Nothing) Nothing go)`
fix "storeKey when already present" test for git-annex-shell transfers Now git-annex-shell recvkey, when the key is already present, allows another copy to be rsynced up, and just throws it away. This same behavior could have already happened before, when eg, two repos tried to upload the same object at the same time. So this makes the test suite pass, and should not add any bad behavior, other than slightly more work being done in a rather edge case. This relies on moveAnnex's behavior of keeping the current version of an object. 2014-08-04 13:16:47 +00:00			`( do`
make getViaTmpFrom no longer update location log All callers adjusted to update it themselves. In Command.ReKey, and Command.SetKey, the cleanup action already did, so it was updating the log twice before. This fixes a bug when annex.stalldetection is set, as now Command.Transferrer can skip updating the location log, and let it be updated by the calling process. 2020-12-11 15:33:10 +00:00			`logStatus key InfoPresent`
avoid flushing keys db queue after each Annex action The flush was only done Annex.run' to make sure that the queue was flushed before git-annex exits. But, doing it there means that as soon as one change gets queued, it gets flushed soon after, which contributes to excessive writes to the database, slowing git-annex down. (This does not yet speed git-annex up, but it is a stepping stone to doing so.) Database queues do not autoflush when garbage collected, so have to be flushed explicitly. I don't think it's possible to make them autoflush (except perhaps if git-annex sqitched to using ResourceT..). The comment in Database.Keys.closeDb used to be accurate, since the automatic flushing did mean that all writes reached the database even when closeDb was not called. But now, closeDb or flushDb needs to be called before stopping using an Annex state. So, removed that comment. In Remote.Git, change to using quiesce everywhere that it used to use stopCoProcesses. This means that uses on onLocal in there are just as slow as before. I considered only calling closeDb on the local git remotes when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses in each onLocal is so as not to leave git processes running that have files open on the remote repo, when it's on removable media. So, it seemed to make sense to also closeDb after each one, since sqlite may also keep files open. Although that has not seemed to cause problems with removable media so far. It was also just easier to quiesce in each onLocal than once at the end. This does likely leave performance on the floor, so could be revisited. In Annex.Content.saveState, there was no reason to close the db, flushing it is enough. The rest of the changes are from auditing for Annex.new, and making sure that quiesce is called, after any action that might possibly need it. After that audit, I'm pretty sure that the change to Annex.run' is safe. The only concern might be that this does let more changes get queued for write to the db, and if git-annex is interrupted, those will be lost. But interrupting git-annex can obviously already prevent it from writing the most recent change to the db, so it must recover from such lost data... right? Sponsored-by: Dartmouth College's Datalad project 2022-10-12 17:50:46 +00:00			`_ <- quiesce True`
fix "storeKey when already present" test for git-annex-shell transfers Now git-annex-shell recvkey, when the key is already present, allows another copy to be rsynced up, and just throws it away. This same behavior could have already happened before, when eg, two repos tried to upload the same object at the same time. So this makes the test suite pass, and should not add any bad behavior, other than slightly more work being done in a rather edge case. This relies on moveAnnex's behavior of keeping the current version of an object. 2014-08-04 13:16:47 +00:00			`return True`
			`, return False`
			`)`
safe recv-key in direct mode Checks the key's size and checksum. This is sorta expensive, but it avoids needing to add another round-trip to the protocol. 2013-01-11 19:43:09 +00:00			`where`
other 80% of avoding verification when hard linking to objects in shared repo In c6632ee5c8e66c26ef18317f56ae02bae1e7e280, it actually only handled uploading objects to a shared repository. To avoid verification when downloading objects from a shared repository, was a lot harder. On the plus side, if the process of downloading a file from a remote is able to verify its content on the side, the remote can indicate this now, and avoid the extra post-download verification. As of yet, I don't have any remotes (except Git) using this ability. Some more work would be needed to support it in special remotes. It would make sense for tahoe to implicitly verify things downloaded from it; as long as you trust your tahoe server (which typically runs locally), there's cryptographic integrity. OTOH, despite bup being based on shas, a bup repo under an attacker's control could have the git ref used for an object changed, and so a bup repo shouldn't implicitly verify. Indeed, tahoe seems unique in being trustworthy enough to implicitly verify. 2015-10-02 17:56:42 +00:00			`go tmp = unVerified $ do`
minor refactoring 2013-03-30 23:05:51 +00:00			`opts <- filterRsyncSafeOptions . maybe [] words`
			`<$> getField "RsyncOptions"`
more RawFilePath conversion This commit was sponsored by Luke Shumaker on Patreon. 2020-11-02 20:31:28 +00:00			`liftIO $ rsyncServerReceive (map Param opts) (fromRawFilePath tmp)`