migrate: New --remove-size option

While intended for converting URL keys added by addurl --fast to be
as if added by addurl --relaxed, it can also be used to remove size
from other types of keys. Although that is not likely to be useful
for checksummed keys, I suppose it could be used for WORM or other
non-checksum keys.
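
To make the effect on a key concrete, here is a minimal sketch. It assumes
the usual serialized key layout BACKEND-sSIZE--NAME and models only the size
field. SimpleKey, serializeKey, dropSize and the example URL are hypothetical
names for illustration, not git-annex's own Key API.

```haskell
-- Hypothetical, simplified stand-in for a git-annex key. Real keys can
-- carry further fields (mtime, chunk information) that are ignored here.
data SimpleKey = SimpleKey
  { skBackend :: String         -- e.g. "URL", "WORM", "SHA256E"
  , skSize    :: Maybe Integer  -- Nothing means the key has no size field
  , skName    :: String         -- the part after the "--" separator
  } deriving (Eq, Show)

-- Render a key as BACKEND[-sSIZE]--NAME.
serializeKey :: SimpleKey -> String
serializeKey k = skBackend k ++ sizeField ++ "--" ++ skName k
  where
    sizeField = maybe "" (\n -> "-s" ++ show n) (skSize k)

-- Sketch of what --remove-size does to the size field: a record update
-- that clears it and leaves the backend and name untouched.
dropSize :: SimpleKey -> SimpleKey
dropSize k = k { skSize = Nothing }

main :: IO ()
main = do
  -- Assumed shape of a key as added by addurl --fast (size recorded).
  let fastKey = SimpleKey "URL" (Just 1024) "http://example.com/file"
  putStrLn (serializeKey fastKey)
  -- The same key with its size dropped, as addurl --relaxed would record it.
  putStrLn (serializeKey (dropSize fastKey))
```

Run with runghc, this prints URL-s1024--http://example.com/file followed by
URL--http://example.com/file.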

Specifying the --remove-size option does not prevent other migrations
from taking effect if there's a key upgrade to perform, or if the
backend has changed. So --backend=URL needs to be used to prevent
migrating a URL key to the default backend.
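
The interaction described above can be sketched as a small pure function.
This is a simplified model of the branch in start, not the real code:
Situation, Action and decide are hypothetical names, and the field
comments are my reading of the corresponding values in the diff.

```haskell
-- Hypothetical summary of the inputs that `start` looks at.
data Situation = Situation
  { backendChanged :: Bool  -- chosen backend differs from the key's backend
  , keyUpgradable  :: Bool  -- key can be upgraded (e.g. it lacks a size)
  , forced         :: Bool  -- --force was given
  , contentPresent :: Bool  -- the file's content is in the local repository
  , removeSizeOpt  :: Bool  -- --remove-size was given
  }

data Action
  = FullMigration   -- generate a new key (its size is dropped afterwards if requested)
  | OnlyRemoveSize  -- keep the old key and backend, only clear the size
  | Skip
  deriving (Eq, Show)

-- The ordinary migration check comes first, so --remove-size on its own
-- never suppresses a backend change or a key upgrade.
decide :: Situation -> Action
decide s
  | (backendChanged s || keyUpgradable s || forced s) && contentPresent s = FullMigration
  | removeSizeOpt s && contentPresent s = OnlyRemoveSize
  | otherwise = Skip

main :: IO ()
main = do
  -- URL key, --remove-size given, but the default backend differs.
  print (decide (Situation True False False True True))
  -- Same file with --backend=URL, so the backend is unchanged.
  print (decide (Situation False False False True True))
```

The first call yields FullMigration and the second OnlyRemoveSize, which is
why --backend=URL is needed to keep a URL key on the URL backend.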

Note that it's not possible to use git-annex migrate to convert from a
non-URL key to a URL key, as URL keys cannot be generated, except by
addurl. So while this can get the same effect as --relaxed would have
had when addurl --fast was used, it does not help when --fast was not
used: with --backend=URL it won't work, and without --backend=URL it
will remove the size but not prevent checksum verification, which is
not useful. Due to this complexity, I decided not to mention it in the
git-annex addurl man page.

Sponsored-by: Jochen Bartl on Patreon
Joey Hess 2021-11-12 12:59:30 -04:00
parent f3326b8b5a
commit 51b73ea1fc
6 changed files with 87 additions and 13 deletions

@@ -13,6 +13,7 @@ git-annex (8.20211029) UNRELEASED; urgency=medium
   * uninit: Avoid error message when there is no git-annex branch.
   * git-lfs: Fix interoperability with gitlab's implementation of the
     git-lfs protocol, which requests Content-Encoding chunked.
+  * migrate: New --remove-size option.
 
  -- Joey Hess <id@joeyh.name>  Mon, 01 Nov 2021 13:19:46 -0400

@@ -23,20 +23,33 @@ cmd :: Command
 cmd = withGlobalOptions [annexedMatchingOptions] $
 	command "migrate" SectionUtility
 		"switch data to different backend"
-		paramPaths (withParams seek)
+		paramPaths (seek <$$> optParser)
 
-seek :: CmdParams -> CommandSeek
-seek = withFilesInGitAnnex ww seeker <=< workTreeItems ww
+data MigrateOptions = MigrateOptions
+	{ migrateThese :: CmdParams
+	, removeSize :: Bool
+	}
+
+optParser :: CmdParamsDesc -> Parser MigrateOptions
+optParser desc = MigrateOptions
+	<$> cmdParams desc
+	<*> switch
+		( long "remove-size"
+		<> help "remove size field from keys"
+		)
+
+seek :: MigrateOptions -> CommandSeek
+seek o = withFilesInGitAnnex ww seeker =<< workTreeItems ww (migrateThese o)
   where
 	ww = WarnUnmatchLsFiles
 	seeker = AnnexedFileSeeker
-		{ startAction = start
+		{ startAction = start o
 		, checkContentPresent = Nothing
 		, usesLocationLog = False
 		}
 
-start :: SeekInput -> RawFilePath -> Key -> CommandStart
-start si file key = do
+start :: MigrateOptions -> SeekInput -> RawFilePath -> Key -> CommandStart
+start o si file key = do
 	forced <- Annex.getState Annex.force
 	v <- Backend.getBackend (fromRawFilePath file) key
 	case v of
@@ -46,9 +59,14 @@ start si file key = do
 			newbackend <- maybe defaultBackend return
 				=<< chooseBackend file
 			if (newbackend /= oldbackend || upgradableKey oldbackend key || forced) && exists
-				then starting "migrate" (mkActionItem (key, file)) si $
-					perform file key oldbackend newbackend
-				else stop
+				then go False oldbackend newbackend
+				else if removeSize o && exists
+					then go True oldbackend oldbackend
+					else stop
+  where
+	go onlyremovesize oldbackend newbackend =
+		starting "migrate" (mkActionItem (key, file)) si $
+			perform onlyremovesize o file key oldbackend newbackend
 
 {- Checks if a key is upgradable to a newer representation.
  -
@@ -70,13 +88,14 @@ upgradableKey backend key = isNothing (fromKey keySize key) || backendupgradable
  - data cannot get corrupted after the fsck but before the new key is
  - generated.
  -}
-perform :: RawFilePath -> Key -> Backend -> Backend -> CommandPerform
-perform file oldkey oldbackend newbackend = go =<< genkey (fastMigrate oldbackend)
+perform :: Bool -> MigrateOptions -> RawFilePath -> Key -> Backend -> Backend -> CommandPerform
+perform onlyremovesize o file oldkey oldbackend newbackend = go =<< genkey (fastMigrate oldbackend)
   where
 	go Nothing = stop
 	go (Just (newkey, knowngoodcontent))
-		| knowngoodcontent = finish newkey
-		| otherwise = stopUnless checkcontent $ finish newkey
+		| knowngoodcontent = finish (removesize newkey)
+		| otherwise = stopUnless checkcontent $
+			finish (removesize newkey)
 	checkcontent = Command.Fsck.checkBackend oldbackend oldkey Command.Fsck.KeyPresent afile
 	finish newkey = ifM (Command.ReKey.linkKey file oldkey newkey)
 		( do
@@ -89,6 +108,7 @@ perform file oldkey oldbackend newbackend = go =<< genkey (fastMigrate oldbacken
 			next $ Command.ReKey.cleanup file newkey
 		, giveup "failed creating link from old to new key"
 		)
+	genkey _ | onlyremovesize = return $ Just (oldkey, False)
 	genkey Nothing = do
 		content <- calcRepo $ gitAnnexLocation oldkey
 		let source = KeySource
@@ -101,4 +121,7 @@ perform file oldkey oldbackend newbackend = go =<< genkey (fastMigrate oldbacken
 	genkey (Just fm) = fm oldkey newbackend afile >>= \case
 		Just newkey -> return (Just (newkey, True))
 		Nothing -> genkey Nothing
+	removesize k
+		| removeSize o = alterKey k $ \kd -> kd { keySize = Nothing }
+		| otherwise = k
 	afile = AssociatedFile (Just file)

@@ -0,0 +1,17 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2021-11-12T16:10:01Z"
+ content="""
+Migrating to URL will not do anything since they already are url keys.
+
+This could be scripted using `git-annex examinekey` to
+convert such a key into one without a size, and then using
+`git-annex rekey`, which lets the new key for a file be specified.
+
+However, that command is a low level plumbing command, and does not copy
+over the url list from the old to the new key as migrate does (nor other
+metadata). So you would also have to use `git-annex addurl file url`
+afterwards to add the url, and use `git-annex metadata` if you have
+metadata. Very unergonomic.
+"""]]

@@ -0,0 +1,16 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2021-11-12T16:57:15Z"
+ content="""
+Implemented: `git-annex migrate --remove-size --backend=URL`
+
+Be sure to only run it on files using url keys, since it will also
+remove sizes from other keys. (Or use `--inbackend=URL` with it.)
+
+Do note that `git-annex migrate` can only migrate files whose content
+is present. If you have never downloaded those urls, and `git-annex get`
+cannot download them now, because their size has changed, you
+won't be able to migrate data you don't have. In this case, re-running
+`git-annex addurl` with `--relaxed` seems like the only option.
+"""]]

@@ -39,6 +39,18 @@ it's best to run migrate in all of them.
 
 * Also the [[git-annex-common-options]](1) can be used.
 
+* `--remove-size`
+
+  Keys often include the size of their content, which is generally a useful
+  thing. In fact, this command defaults to adding missing size information
+  to keys. With this option, the size information is removed instead.
+
+  One use of this option is to convert URL keys that were added
+  by `git-annex addurl --fast` to ones that would have been added if
+  that command was run with the `--relaxed` option. Eg:
+
+        git-annex migrate --remove-size --backend=URL somefile
+
 # SEE ALSO
 
 [[git-annex]](1)

@@ -13,6 +13,9 @@ both the file, and the new key to use for it.
 
 Multiple pairs of file and key can be given in a single command line.
 
+Note that, unlike `git-annex migrate`, this does not copy over metadata,
+urls, and other such information from the old to the new key
+
 # OPTIONS
 
 * `--force`
@@ -37,6 +40,8 @@ Multiple pairs of file and key can be given in a single command line.
 
 [[git-annex]](1)
 
+[[git-annex-migrate]](1)
+
 # AUTHOR
 
 Joey Hess <id@joeyh.name>