git-annex/Command/Reinject.hs

{- git-annex command
 -
 - Copyright 2011-2016 Joey Hess <id@joeyh.name>
 -
 - Licensed under the GNU AGPL version 3 or higher.
 -}

module Command.Reinject where

import Command
import Logs.Location
import Annex.Content
import Backend
import Types.KeySource
import Utility.Metered

cmd :: Command
cmd = command "reinject" SectionUtility
	"inject content of file back into annex"
	(paramRepeating (paramPair "SRC" "DEST"))
	(seek <$$> optParser)

data ReinjectOptions = ReinjectOptions
	{ params :: CmdParams
	, knownOpt :: Bool
	}

optParser :: CmdParamsDesc -> Parser ReinjectOptions
optParser desc = ReinjectOptions
	<$> cmdParams desc
	<*> switch
		( long "known"
		<> help "inject all known files"
		<> hidden
		)

seek :: ReinjectOptions -> CommandSeek
seek os
	| knownOpt os = withStrings (commandAction . startKnown) (params os)
	| otherwise = withWords (commandAction . startSrcDest) (params os)

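{- Reinjects src as the content of the annexed file dest, after verifying
 - that src has the content expected for dest's key. Does nothing when
 - dest is not an annexed file. -}
startSrcDest :: [FilePath] -> CommandStart
startSrcDest [src, dest]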
	| src == dest = stop
	| otherwise = notAnnexed src $ ifAnnexed (toRawFilePath dest) go stop
  where
	go key = starting "reinject" (ActionItemOther (Just src)) $
		ifM (verifyKeyContent RetrievalAllKeysSecure DefaultVerify UnVerified key src)
			( perform src key
			, giveup $ src ++ " does not have expected content of " ++ dest
			)
startSrcDest _ = giveup "specify a src file and a dest file"

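{- Reinjects src when the key generated from it is already known to the
 - annex; otherwise warns and skips the file. -}
startKnown :: FilePath -> CommandStart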
startKnown src = notAnnexed src $
	starting "reinject" (ActionItemOther (Just src)) $ do
		mkb <- genKey (KeySource src src Nothing) nullMeterUpdate Nothing
		case mkb of
			Nothing -> giveup "Failed to generate key"
			Just (key, _) -> ifM (isKnownKey key)
				( perform src key
				, do
					warning "Not known content; skipping"
					next $ return True
				)

{- Runs the action only when src is not itself an annexed file. -}
notAnnexed :: FilePath -> CommandStart -> CommandStart
notAnnexed src = ifAnnexed (toRawFilePath src) $
	giveup $ "cannot use annexed file as src: " ++ src

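{- Moves src into the annex as the content of the key, after checking
 - that there is enough disk space for it. -}
perform :: FilePath -> Key -> CommandPerform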
perform src key = ifM move
	( next $ cleanup key
	, error "failed"
	)
  where
	move = checkDiskSpaceToGet key False $
		moveAnnex key src

{- Records that the content of the key is now present. -}
cleanup :: Key -> CommandCleanup
cleanup key = do
	logStatus key InfoPresent
	return True