Add --url option and url= preferred content expression

To match content that is recorded as present in an url.

Note that this cannot ask remotes to provide an url using whereisKey,
as whereis does, because preferred content expressions need to match
the same way from multiple perspectives, and the remote would not
always be available.

That's why the docs say "recorded as present". Still, this may be
surprising to someone who sees an url in whereis output and finds they
cannot match on it.

getDownloader is used to strip the downloader prefix, such as "yt:",
from urls. Note that when OtherDownloader is used, this strips the ":"
prefix, so those urls can be matched too.
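
A minimal, self-contained sketch of that prefix handling (an illustration,
not git-annex's actual code: the real matcher uses getDownloader, getUrls,
and matchGlob as in the patch below; stripDownloaderPrefix and
matchesUrlGlob are invented names, and only a trailing "*" glob is
handled here):

    import Data.List (isPrefixOf, stripPrefix)

    -- Drop a downloader prefix such as "yt:", or the bare ":" used for
    -- OtherDownloader, so the plain url is what gets matched.
    stripDownloaderPrefix :: String -> String
    stripDownloaderPrefix u
        | Just u' <- stripPrefix "yt:" u = u'
        | Just u' <- stripPrefix ":" u = u'
        | otherwise = u

    -- Does a recorded url match the user's glob once the prefix is gone?
    matchesUrlGlob :: String -> String -> Bool
    matchesUrlGlob glob u = case break (== '*') glob of
        (pre, "") -> pre == u'           -- no wildcard: exact match
        (pre, _)  -> pre `isPrefixOf` u' -- "prefix*": prefix match
      where
        u' = stripDownloaderPrefix u

    main :: IO ()
    main = do
        -- a url recorded with the "yt:" downloader prefix still matches
        print (matchesUrlGlob "https://example.com/*" "yt:https://example.com/video")
        -- and so does one recorded with OtherDownloader's ":" prefix
        print (matchesUrlGlob "https://example.com/*" ":https://example.com/page")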
Joey Hess 2025-07-21 12:13:40 -04:00
parent 549569533b
commit d364e434c8
7 changed files with 42 additions and 1 deletion

@@ -1,6 +1,6 @@
 {- git-annex file matching
  -
- - Copyright 2012-2024 Joey Hess <id@joeyh.name>
+ - Copyright 2012-2025 Joey Hess <id@joeyh.name>
  -
  - Licensed under the GNU AGPL version 3 or higher.
  -}
@@ -194,6 +194,7 @@ preferredContentTokens pcd =
 	, ValueToken "approxlackingcopies" (usev $ limitLackingCopies "approxlackingcopies" True)
 	, ValueToken "inbackend" (usev limitInBackend)
 	, ValueToken "metadata" (usev limitMetaData)
+	, ValueToken "url" (usev limitUrl)
 	, ValueToken "inallgroup" (usev $ limitInAllGroup $ getGroupMap pcd)
 	, ValueToken "onlyingroup" (usev $ limitOnlyInGroup $ getGroupMap pcd)
 	, ValueToken "balanced" (usev $ limitBalanced (repoUUID pcd) (getGroupMap pcd))

@@ -10,6 +10,8 @@ git-annex (10.20250631) UNRELEASED; urgency=medium
     that have experienced the above bug.
   * Fix symlinks generated to annexed content when in adjusted unlocked
     branch in a linked worktree on a filesystem not supporting symlinks.
+  * Add --url option and url= preferred content expression, to match
+    content that is recorded as present in an url.
 
  -- Joey Hess <id@joeyh.name>  Mon, 07 Jul 2025 15:59:42 -0400
 

@@ -348,6 +348,11 @@ keyMatchingOptions' =
 		<> help "match files with attached metadata"
 		<> hidden
 		)
+	, annexOption (setAnnexState . Limit.addUrl) $ strOption
+		( long "url" <> metavar paramGlob
+		<> help "match files by url"
+		<> hidden
+		)
 	, annexFlag (setAnnexState Limit.Wanted.addWantGet)
 		( long "want-get"
 		<> help "match files the local repository wants to get"

@@ -31,6 +31,7 @@ import Types.FileMatcher
 import Types.MetaData
 import Annex.MetaData
 import Logs.MetaData
+import Logs.Web
 import Logs.Group
 import Logs.Unused
 import Logs.Location
@@ -867,6 +868,26 @@ limitMetaData s = case parseMetaDataMatcher s of
 		. S.filter matching
 		. metaDataValues f <$> getCurrentMetaData k
 
+addUrl :: String -> Annex ()
+addUrl = addLimit . limitUrl
+
+limitUrl :: MkLimit Annex
+limitUrl glob = Right $ MatchFiles
+	{ matchAction = const $ const $ checkKey check
+	, matchNeedsFileName = False
+	, matchNeedsFileContent = False
+	, matchNeedsKey = True
+	, matchNeedsLocationLog = False
+	, matchNeedsLiveRepoSize = False
+	, matchNegationUnstable = False
+	, matchDesc = "url" =? glob
+	}
+  where
+	check k = any (matchGlob cglob)
+		. map (fst . getDownloader)
+		<$> getUrls k
+	cglob = compileGlob glob CaseSensitive (GlobFilePath False) -- memoized
+
 addAccessedWithin :: Duration -> Annex ()
 addAccessedWithin duration = do
 	now <- liftIO getPOSIXTime

@@ -178,6 +178,11 @@ in either of two repositories.
   (Note that you will need to quote the second parameter to avoid
   the shell doing redirection.)
 
+* `--url=glob`
+
+  Matches when the content is recorded as being present in an url that
+  matches the glob.
+
 * `--want-get`
 
   Matches only when the preferred content settings for the local repository

@@ -166,6 +166,11 @@ content not being configured.
   To match PDFs with between 100 and 200 pages (assuming something has set
   that metadata), use `metadata=pagecount>=100 and metadata=pagecount<=200`
 
+* `url=glob`
+
+  Matches when the content is recorded as being present in an url that
+  matches the glob.
+
 * `present`
 
   Makes content be wanted if it's present, but not otherwise.

@@ -10,3 +10,5 @@ expression if adding that.
 An alternative way could be to populate a metadata field with the url,
 if that were done without increasing the size of the git repository.
 --[[Joey]]
+
+> [[done]] --[[Joey]]
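
For illustration only (not part of the commit's diff), the new option and
expression might be used along these lines; the globs are made up, and
"." refers to the local repository:

    # list annexed files whose content is recorded as present at a matching url
    git annex find --url='https://archive.org/download/*'

    # do not want content that is recorded as present at a matching url
    git annex wanted . 'not url=https://archive.org/*'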