git-annex/Command/Migrate.hs

105 lines
3.3 KiB
Haskell
Raw Normal View History

{- git-annex command
-
- Copyright 2011 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
module Command.Migrate where
import Command
import Backend
import Types.Backend (canUpgradeKey, fastMigrate)
import Types.KeySource
2011-10-04 04:40:47 +00:00
import Annex.Content
import qualified Command.ReKey
import qualified Command.Fsck
import qualified Annex
import Logs.MetaData
import Logs.Web
import Utility.Metered
cmd :: Command
cmd = withGlobalOptions [annexedMatchingOptions] $
command "migrate" SectionUtility
"switch data to different backend"
paramPaths (withParams seek)
seek :: CmdParams -> CommandSeek
seek = withFilesInGitAnnex ww seeker <=< workTreeItems ww
where
ww = WarnUnmatchLsFiles
seeker = AnnexedFileSeeker
{ startAction = start
, checkContentPresent = Nothing
, usesLocationLog = False
}
start :: SeekInput -> RawFilePath -> Key -> CommandStart
start si file key = do
forced <- Annex.getState Annex.force
v <- Backend.getBackend (fromRawFilePath file) key
case v of
Nothing -> stop
Just oldbackend -> do
exists <- inAnnex key
newbackend <- maybe defaultBackend return
=<< chooseBackend file
if (newbackend /= oldbackend || upgradableKey oldbackend key || forced) && exists
then starting "migrate" (mkActionItem (key, file)) si $
make CommandStart return a StartMessage The goal is to be able to run CommandStart in the main thread when -J is used, rather than unncessarily passing it off to a worker thread, which incurs overhead that is signficant when the CommandStart is going to quickly decide to stop. To do that, the message it displays needs to be displayed in the worker thread, after the CommandStart has run. Also, the change will mean that CommandStart will no longer necessarily run with the same Annex state as CommandPerform. While its docs already said it should avoid modifying Annex state, I audited all the CommandStart code as part of the conversion. (Note that CommandSeek already sometimes runs with a different Annex state, and that has not been a source of any problems, so I am not too worried that this change will lead to breakage going forward.) The only modification of Annex state I found was it calling allowMessages in some Commands that default to noMessages. Dealt with that by adding a startCustomOutput and a startingUsualMessages. This lets a command start with noMessages and then select the output it wants for each CommandStart. One bit of breakage: onlyActionOn has been removed from commands that used it. The plan is that, since a StartMessage contains an ActionItem, when a Key can be extracted from that, the parallel job runner can run onlyActionOn' automatically. Then commands won't need to worry about this detail. Future work. Otherwise, this was a fairly straightforward process of making each CommandStart compile again. Hopefully other behavior changes were mostly avoided. In a few cases, a command had a CommandStart that called a CommandPerform that then called showStart multiple times. I have collapsed those down to a single start action. The main command to perhaps suffer from it is Command.Direct, which used to show a start for each file, and no longer does. Another minor behavior change is that some commands used showStart before, but had an associated file and a Key available, so were changed to ShowStart with an ActionItemAssociatedFile. That will not change the normal output or behavior, but --json output will now include the key. This should not break it for anyone using a real json parser.
2019-06-06 19:42:30 +00:00
perform file key oldbackend newbackend
else stop
{- Checks if a key is upgradable to a newer representation.
-
- Reasons for migration:
- - Ideally, all keys have file size metadata. Old keys may not.
- - Something has changed in the backend, such as a bug fix.
-}
upgradableKey :: Backend -> Key -> Bool
upgradableKey backend key = isNothing (fromKey keySize key) || backendupgradable
where
backendupgradable = maybe False (\a -> a key) (canUpgradeKey backend)
{- Store the old backend's key in the new backend
- The old backend's key is not dropped from it, because there may
- be other files still pointing at that key.
-
- To ensure that the data we have for the old key is valid, it's
- fscked here. First we generate the new key. This ensures that the
- data cannot get corrupted after the fsck but before the new key is
- generated.
-}
perform :: RawFilePath -> Key -> Backend -> Backend -> CommandPerform
perform file oldkey oldbackend newbackend = go =<< genkey (fastMigrate oldbackend)
2012-11-12 05:05:04 +00:00
where
go Nothing = stop
go (Just (newkey, knowngoodcontent))
| knowngoodcontent = finish newkey
| otherwise = stopUnless checkcontent $ finish newkey
checkcontent = Command.Fsck.checkBackend oldbackend oldkey Command.Fsck.KeyPresent afile
finish newkey = ifM (Command.ReKey.linkKey file oldkey newkey)
( do
2017-10-16 16:54:00 +00:00
_ <- copyMetaData oldkey newkey
-- If the old key had some associated urls, record them for
-- the new key as well.
urls <- getUrls oldkey
forM_ urls $ \url ->
setUrlPresent newkey url
include locked files in the keys database associated files Before only unlocked files were included. The initial scan now scans for locked as well as unlocked files. This does mean it gets a little bit slower, although I optimised it as well as I think it can be. reconcileStaged changed to diff from the current index to the tree of the previous index. This lets it handle deletions as well, removing associated files for both locked and unlocked files, which did not always happen before. On upgrade, there will be no recorded previous tree, so it will diff from the empty tree to current index, and so will fully populate the associated files, as well as removing any stale associated files that were present due to them not being removed before. reconcileStaged now does a bit more work. Most of the time, this will just be due to running more often, after some change is made to the index, and since there will be few changes since the last time, it will not be a noticable overhead. What may turn out to be a noticable slowdown is after changing to a branch, it has to go through the diff from the previous index to the new one, and if there are lots of changes, that could take a long time. Also, after adding a lot of files, or deleting a lot of files, or moving a large subdirectory, etc. Command.Lock used removeAssociatedFile, but now that's wrong because a newly locked file still needs to have its associated file tracked. Command.Rekey used removeAssociatedFile when the file was unlocked. It could remove it also when it's locked, but it is not really necessary, because it changes the index, and so the next time git-annex run and accesses the keys db, reconcileStaged will run and update it. There are probably several other places that use addAssociatedFile and don't need to any more for similar reasons. But there's no harm in keeping them, and it probably is a good idea to, if only to support mixing this with older versions of git-annex. However, mixing this and older versions does risk reconcileStaged not running, if the older version already ran it on a given index state. So it's not a good idea to mix versions. This problem could be dealt with by changing the name of the gitAnnexKeysDbIndexCache, but that would leave the old file dangling, or it would need to keep trying to remove it.
2021-05-21 19:47:37 +00:00
next $ Command.ReKey.cleanup file newkey
2018-10-16 19:52:40 +00:00
, giveup "failed creating link from old to new key"
)
genkey Nothing = do
content <- calcRepo $ gitAnnexLocation oldkey
let source = KeySource
2020-02-21 13:34:59 +00:00
{ keyFilename = file
, contentLocation = content
, inodeCache = Nothing
}
newkey <- fst <$> genKey source nullMeterUpdate (Just newbackend)
return $ Just (newkey, False)
genkey (Just fm) = fm oldkey newbackend afile >>= \case
Just newkey -> return (Just (newkey, True))
Nothing -> genkey Nothing
afile = AssociatedFile (Just file)