importtree: support preferred content expressions needing keys

When importing from a special remote, support preferred content expressions
that use terms that match on keys (eg "present", "copies=1"). Such terms
are ignored when importing, since the key is not known yet.

When "standard" or "groupwanted" is used, the terms in those
expressions also get pruned accordingly.

This does allow setting preferred content to "not (copies=1)" to make a
special remote into a "source" type of repository. Importing from it will
import all files. Then exporting to it will drop all files from it.

In the case of setting preferred content to "present", it's pruned on
import, so everything gets imported from it. Then on export, it's applied,
and everything in it is left on it, and no new content is exported to it.

Since the old behavior on these preferred content expressions was for
importtree to error out, there's no backwards compatability to worry about.
Except that sync/pull/etc will now import where before it errored out.
This commit is contained in:
Joey Hess 2023-12-18 16:27:26 -04:00
parent 0e161a7404
commit 9a67ed0f10
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
8 changed files with 94 additions and 54 deletions

View file

@ -16,7 +16,6 @@ module Annex.FileMatcher (
matchAll, matchAll,
PreferredContentData(..), PreferredContentData(..),
preferredContentTokens, preferredContentTokens,
preferredContentKeylessTokens,
preferredContentParser, preferredContentParser,
ParseToken, ParseToken,
parsedToMatcher, parsedToMatcher,
@ -139,19 +138,15 @@ tokenizeMatcher = filter (not . null) . concatMap splitparens . words
where where
splitparens = segmentDelim (`elem` "()") splitparens = segmentDelim (`elem` "()")
commonKeylessTokens :: LimitBy -> [ParseToken (MatchFiles Annex)] commonTokens :: LimitBy -> [ParseToken (MatchFiles Annex)]
commonKeylessTokens lb = commonTokens lb =
[ SimpleToken "anything" (simply limitAnything) [ SimpleToken "anything" (simply limitAnything)
, SimpleToken "nothing" (simply limitNothing) , SimpleToken "nothing" (simply limitNothing)
, ValueToken "include" (usev limitInclude) , ValueToken "include" (usev limitInclude)
, ValueToken "exclude" (usev limitExclude) , ValueToken "exclude" (usev limitExclude)
, ValueToken "largerthan" (usev $ limitSize lb "largerthan" (>)) , ValueToken "largerthan" (usev $ limitSize lb "largerthan" (>))
, ValueToken "smallerthan" (usev $ limitSize lb "smallerthan" (<)) , ValueToken "smallerthan" (usev $ limitSize lb "smallerthan" (<))
] , SimpleToken "unused" (simply limitUnused)
commonKeyedTokens :: [ParseToken (MatchFiles Annex)]
commonKeyedTokens =
[ SimpleToken "unused" (simply limitUnused)
] ]
data PreferredContentData = PCD data PreferredContentData = PCD
@ -162,25 +157,12 @@ data PreferredContentData = PCD
, repoUUID :: Maybe UUID , repoUUID :: Maybe UUID
} }
-- Tokens of preferred content expressions that do not need a Key to be preferredContentTokens :: PreferredContentData -> [ParseToken (MatchFiles Annex)]
-- known. preferredContentTokens pcd =
--
-- When importing from a special remote, this is used to match
-- some preferred content expressions before the content is downloaded,
-- so the Key is not known.
preferredContentKeylessTokens :: PreferredContentData -> [ParseToken (MatchFiles Annex)]
preferredContentKeylessTokens pcd =
[ SimpleToken "standard" (call "standard" $ matchStandard pcd) [ SimpleToken "standard" (call "standard" $ matchStandard pcd)
, SimpleToken "groupwanted" (call "groupwanted" $ matchGroupWanted pcd) , SimpleToken "groupwanted" (call "groupwanted" $ matchGroupWanted pcd)
, SimpleToken "inpreferreddir" (simply $ limitInDir preferreddir "inpreferreddir") , SimpleToken "inpreferreddir" (simply $ limitInDir preferreddir "inpreferreddir")
] ++ commonKeylessTokens LimitAnnexFiles , SimpleToken "present" (simply $ limitPresent $ repoUUID pcd)
where
preferreddir = maybe "public" fromProposedAccepted $
M.lookup preferreddirField =<< (`M.lookup` configMap pcd) =<< repoUUID pcd
preferredContentKeyedTokens :: PreferredContentData -> [ParseToken (MatchFiles Annex)]
preferredContentKeyedTokens pcd =
[ SimpleToken "present" (simply $ limitPresent $ repoUUID pcd)
, SimpleToken "securehash" (simply limitSecureHash) , SimpleToken "securehash" (simply limitSecureHash)
, ValueToken "copies" (usev limitCopies) , ValueToken "copies" (usev limitCopies)
, ValueToken "lackingcopies" (usev $ limitLackingCopies "lackingcopies" False) , ValueToken "lackingcopies" (usev $ limitLackingCopies "lackingcopies" False)
@ -189,13 +171,10 @@ preferredContentKeyedTokens pcd =
, ValueToken "metadata" (usev limitMetaData) , ValueToken "metadata" (usev limitMetaData)
, ValueToken "inallgroup" (usev $ limitInAllGroup $ getGroupMap pcd) , ValueToken "inallgroup" (usev $ limitInAllGroup $ getGroupMap pcd)
, ValueToken "onlyingroup" (usev $ limitOnlyInGroup $ getGroupMap pcd) , ValueToken "onlyingroup" (usev $ limitOnlyInGroup $ getGroupMap pcd)
] ++ commonKeyedTokens ] ++ commonTokens LimitAnnexFiles
where
preferredContentTokens :: PreferredContentData -> [ParseToken (MatchFiles Annex)] preferreddir = maybe "public" fromProposedAccepted $
preferredContentTokens pcd = concat M.lookup preferreddirField =<< (`M.lookup` configMap pcd) =<< repoUUID pcd
[ preferredContentKeylessTokens pcd
, preferredContentKeyedTokens pcd
]
preferredContentParser :: [ParseToken (MatchFiles Annex)] -> String -> [ParseResult (MatchFiles Annex)] preferredContentParser :: [ParseToken (MatchFiles Annex)] -> String -> [ParseResult (MatchFiles Annex)]
preferredContentParser tokens = map (parseToken tokens) . tokenizeMatcher preferredContentParser tokens = map (parseToken tokens) . tokenizeMatcher
@ -210,8 +189,7 @@ mkMatchExpressionParser = do
const $ Left $ "\""++n++"\" not supported; not built with MagicMime support" const $ Left $ "\""++n++"\" not supported; not built with MagicMime support"
#endif #endif
let parse = parseToken $ let parse = parseToken $
commonKeyedTokens ++ commonTokens LimitDiskFiles ++
commonKeylessTokens LimitDiskFiles ++
#ifdef WITH_MAGICMIME #ifdef WITH_MAGICMIME
[ mimer "mimetype" $ [ mimer "mimetype" $
matchMagic "mimetype" getMagicMimeType providedMimeType userProvidedMimeType matchMagic "mimetype" getMagicMimeType providedMimeType userProvidedMimeType

View file

@ -990,20 +990,22 @@ addBackExportExcluded remote importtree =
- -
- Only keyless tokens are supported, because the keys are not known - Only keyless tokens are supported, because the keys are not known
- until an imported file is downloaded, which is too late to bother - until an imported file is downloaded, which is too late to bother
- excluding it from an import. - excluding it from an import. So prunes any tokens in the preferred
- content expression that need keys.
-} -}
makeImportMatcher :: Remote -> Annex (Either String (FileMatcher Annex)) makeImportMatcher :: Remote -> Annex (Either String (FileMatcher Annex))
makeImportMatcher r = load preferredContentKeylessTokens >>= \case makeImportMatcher r = load preferredContentTokens >>= \case
Nothing -> return $ Right (matchAll, matcherdesc) Nothing -> return $ Right (matchAll, matcherdesc)
Just (Right v) -> return $ Right (v, matcherdesc) Just (Right v) -> return $ Right (v, matcherdesc)
Just (Left err) -> load preferredContentTokens >>= \case Just (Left err) -> return $ Left err
Just (Left err') -> return $ Left err'
_ -> return $ Left $
"The preferred content expression contains terms that cannot be checked when importing: " ++ err
where where
load t = M.lookup (Remote.uuid r) . fst <$> preferredRequiredMapsLoad' t load t = M.lookup (Remote.uuid r) . fst
<$> preferredRequiredMapsLoad' pruneImportMatcher t
matcherdesc = MatcherDesc "preferred content" matcherdesc = MatcherDesc "preferred content"
pruneImportMatcher :: Utility.Matcher.Matcher (MatchFiles a) -> Utility.Matcher.Matcher (MatchFiles a)
pruneImportMatcher = Utility.Matcher.pruneMatcher matchNeedsKey
{- Gets the ImportableContents from the remote. {- Gets the ImportableContents from the remote.
- -
- Filters out any paths that include a ".git" component, because git does - Filters out any paths that include a ".git" component, because git does

View file

@ -19,6 +19,10 @@ git-annex (10.20231130) UNRELEASED; urgency=medium
filesystems that have problems with such filenames. filesystems that have problems with such filenames.
* Lower precision of timestamps in git-annex branch, which can reduce the * Lower precision of timestamps in git-annex branch, which can reduce the
size of the branch by up to 8%. size of the branch by up to 8%.
* When importing from a special remote, support preferred content
expressions that use terms that match on keys (eg "present", "copies=1").
Such terms are ignored when importing, since the key is not known yet.
Before, such expressions caused the import to fail.
-- Joey Hess <id@joeyh.name> Thu, 30 Nov 2023 14:48:12 -0400 -- Joey Hess <id@joeyh.name> Thu, 30 Nov 2023 14:48:12 -0400

View file

@ -82,7 +82,7 @@ requiredContentMap = maybe (snd <$> preferredRequiredMapsLoad preferredContentTo
preferredRequiredMapsLoad :: (PreferredContentData -> [ParseToken (MatchFiles Annex)]) -> Annex (FileMatcherMap Annex, FileMatcherMap Annex) preferredRequiredMapsLoad :: (PreferredContentData -> [ParseToken (MatchFiles Annex)]) -> Annex (FileMatcherMap Annex, FileMatcherMap Annex)
preferredRequiredMapsLoad mktokens = do preferredRequiredMapsLoad mktokens = do
(pc, rc) <- preferredRequiredMapsLoad' mktokens (pc, rc) <- preferredRequiredMapsLoad' id mktokens
let pc' = handleunknown (MatcherDesc "preferred content") pc let pc' = handleunknown (MatcherDesc "preferred content") pc
let rc' = handleunknown (MatcherDesc "required content") rc let rc' = handleunknown (MatcherDesc "required content") rc
Annex.changeState $ \s -> s Annex.changeState $ \s -> s
@ -94,12 +94,12 @@ preferredRequiredMapsLoad mktokens = do
handleunknown matcherdesc = M.mapWithKey $ \u v -> handleunknown matcherdesc = M.mapWithKey $ \u v ->
(either (const $ unknownMatcher u) id v, matcherdesc) (either (const $ unknownMatcher u) id v, matcherdesc)
preferredRequiredMapsLoad' :: (PreferredContentData -> [ParseToken (MatchFiles Annex)]) -> Annex (M.Map UUID (Either String (Matcher (MatchFiles Annex))), M.Map UUID (Either String (Matcher (MatchFiles Annex)))) preferredRequiredMapsLoad' :: (Matcher (MatchFiles Annex) -> Matcher (MatchFiles Annex)) -> (PreferredContentData -> [ParseToken (MatchFiles Annex)]) -> Annex (M.Map UUID (Either String (Matcher (MatchFiles Annex))), M.Map UUID (Either String (Matcher (MatchFiles Annex))))
preferredRequiredMapsLoad' mktokens = do preferredRequiredMapsLoad' matcherf mktokens = do
groupmap <- groupMap groupmap <- groupMap
configmap <- remoteConfigMap configmap <- remoteConfigMap
let genmap l gm = let genmap l gm =
let mk u = makeMatcher groupmap configmap gm u mktokens let mk u = makeMatcher groupmap configmap gm u matcherf mktokens
in simpleMap in simpleMap
. parseLogOldWithUUID (\u -> mk u . decodeBS <$> A.takeByteString) . parseLogOldWithUUID (\u -> mk u . decodeBS <$> A.takeByteString)
<$> Annex.Branch.get l <$> Annex.Branch.get l
@ -123,13 +123,14 @@ makeMatcher
-> M.Map UUID RemoteConfig -> M.Map UUID RemoteConfig
-> M.Map Group PreferredContentExpression -> M.Map Group PreferredContentExpression
-> UUID -> UUID
-> (Matcher (MatchFiles Annex) -> Matcher (MatchFiles Annex))
-> (PreferredContentData -> [ParseToken (MatchFiles Annex)]) -> (PreferredContentData -> [ParseToken (MatchFiles Annex)])
-> PreferredContentExpression -> PreferredContentExpression
-> Either String (Matcher (MatchFiles Annex)) -> Either String (Matcher (MatchFiles Annex))
makeMatcher groupmap configmap groupwantedmap u mktokens = go True True makeMatcher groupmap configmap groupwantedmap u matcherf mktokens = go True True
where where
go expandstandard expandgroupwanted expr go expandstandard expandgroupwanted expr
| null (lefts tokens) = Right $ generate $ rights tokens | null (lefts tokens) = Right $ matcherf $ generate $ rights tokens
| otherwise = Left $ unwords $ lefts tokens | otherwise = Left $ unwords $ lefts tokens
where where
tokens = preferredContentParser (mktokens pcd) expr tokens = preferredContentParser (mktokens pcd) expr

View file

@ -24,6 +24,7 @@ module Utility.Matcher (
MatchResult(..), MatchResult(..),
syntaxToken, syntaxToken,
generate, generate,
pruneMatcher,
match, match,
match', match',
matchM, matchM,
@ -99,6 +100,28 @@ generate = simplify . process MAny . implicitAnd . tokenGroups
simplify (MNot x) = MNot (simplify x) simplify (MNot x) = MNot (simplify x)
simplify x = x simplify x = x
{- Prunes selected ops from the Matcher. -}
pruneMatcher :: (op -> Bool) -> Matcher op -> Matcher op
pruneMatcher f = fst . go
where
go MAny = (MAny, False)
go (MAnd a b) = case (go a, go b) of
((_, True), (_, True)) -> (MAny, True)
((a', False), (b', False)) -> (MAnd a' b', False)
((_, True), (b', False)) -> (b', False)
((a', False), (_, True)) -> (a', False)
go (MOr a b) = case (go a, go b) of
((_, True), (_, True)) -> (MAny, True)
((a', False), (b', False)) -> (MOr a' b', False)
((_, True), (b', False)) -> (b', False)
((a', False), (_, True)) -> (a', False)
go (MNot a) = case go a of
(_, True) -> (MAny, True)
(a', False) -> (MNot a', False)
go (MOp op)
| f op = (MAny, True)
| otherwise = (MOp op, False)
data TokenGroup op = One (Token op) | Group [TokenGroup op] data TokenGroup op = One (Token op) | Group [TokenGroup op]
deriving (Show, Eq) deriving (Show, Eq)

View file

@ -74,16 +74,17 @@ Any files that are gitignored will not be included in the import,
but will be left on the remote. but will be left on the remote.
When the special remote has a preferred content expression set by When the special remote has a preferred content expression set by
[[git-annex-wanted]](1), it will be honored when importing from it. [[git-annex-wanted]](1), that is used to pick which files to import from
Files that are not preferred content of the remote will not be it. Files that are not preferred content of the remote will not be
imported from it, but will be left on the remote. imported from it, but will be left on the remote.
However, preferred content expressions that relate to the key So for example, a preferred content expression like
can't be matched when importing, because the content of the file is not `"include=*.jpeg or largerthan=100mb"` will make only jpegs and
known. Importing will fail when such a preferred content expression is large files be imported.
set. This includes expressions containing "copies=", "metadata=", and other
things that depend on the key. Preferred content expressions containing Parts of a preferred content expression that relate to the key,
"include=", "exclude=" "smallerthan=", "largerthan=" will work. such as "copies=" are ignored when importing, because the key
is not known before importing.
Things in the expression like "include=" match relative to the top of Things in the expression like "include=" match relative to the top of
the tree of files on the remote, even when importing into a subdirectory. the tree of files on the remote, even when importing into a subdirectory.

View file

@ -13,3 +13,5 @@ key evaluate to true. Then it would import all files
turn out that the special remote doesn't want to contain particular content turn out that the special remote doesn't want to contain particular content
that was imported from it, and it would make sense that an export to the that was imported from it, and it would make sense that an export to the
special remote would remove those files. --[[Joey]] special remote would remove those files. --[[Joey]]
> [[done]] --[[Joey]]

View file

@ -0,0 +1,29 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2023-12-18T18:02:40Z"
content="""
Hmm, consider for example a camera. If the user wants to import all jpeg
files from it, and not export to it jpeg files that have a copy in an
archive, they might use:
not copies=archive:1 and include=*.jpeg
But on import, if "copies=archive:1" were made to evaluate to true
as suggested here, then this expression would not match, and so
nothing would be imported.
Seems that the approach needs to be instead to traverse the expression
and prune terms that operate on keys. So convert the example
above to "include=*.jpeg".
How to prune in some other cases:
not (copies=archive:1 and copies=backup:1) => anything
not (copies=archive:1 and include=*.jpeg) => not (include=*.jpeg)
not (copies=archive:1 or include=*.jpeg) => not (include=*.jpeg)
not ((not copies=archive:1) or include=*.jpeg) => not (include=*.jpeg)
"""]]