Add --url option and url= preferred content expression

To match content that is recorded as present in an url.

Note that this cannot ask remotes to provide an url using whereisKey, like
whereis does, because preferred content expressions need to match the same
way from multiple perspectives, and the remote would not always be available.

That's why the docs say "recorded as present". Still, this may surprise
some users who see an url in whereis output and find that they cannot
match on it.

getDownloader is used to strip a downloader prefix, such as "yt:", from
urls. Note that when OtherDownloader is used, this strips the ":" prefix,
which allows matching on those urls too.
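
The prefix stripping described above can be sketched roughly as follows. This is a standalone illustration, not the git-annex source: the name splitDownloader and the WebDownloader and YoutubeDownloader constructors are assumptions made for the example; only getDownloader and OtherDownloader are named in this commit.

```haskell
-- Hypothetical stand-in for git-annex's downloader type; only
-- OtherDownloader is named in the commit message, the rest are assumed.
data Downloader = WebDownloader | YoutubeDownloader | OtherDownloader
  deriving (Show, Eq)

-- Split a recorded url into the bare url and the downloader that its
-- prefix implies. A "yt:" prefix and a bare ":" prefix are both removed;
-- any other url is returned unchanged.
splitDownloader :: String -> (String, Downloader)
splitDownloader u = case break (== ':') u of
  ("yt", _ : rest) -> (rest, YoutubeDownloader)
  ("", _ : rest) -> (rest, OtherDownloader)
  _ -> (u, WebDownloader)
```

With this sketch, both "yt:https://example.com/v" and ":https://example.com/v" reduce to "https://example.com/v", so one glob can match either form.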
Joey Hess 2025-07-21 12:13:40 -04:00
parent 549569533b
commit d364e434c8
7 changed files with 42 additions and 1 deletions

@ -1,6 +1,6 @@
{- git-annex file matching
-
- Copyright 2012-2024 Joey Hess <id@joeyh.name>
- Copyright 2012-2025 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@ -194,6 +194,7 @@ preferredContentTokens pcd =
, ValueToken "approxlackingcopies" (usev $ limitLackingCopies "approxlackingcopies" True)
, ValueToken "inbackend" (usev limitInBackend)
, ValueToken "metadata" (usev limitMetaData)
, ValueToken "url" (usev limitUrl)
, ValueToken "inallgroup" (usev $ limitInAllGroup $ getGroupMap pcd)
, ValueToken "onlyingroup" (usev $ limitOnlyInGroup $ getGroupMap pcd)
, ValueToken "balanced" (usev $ limitBalanced (repoUUID pcd) (getGroupMap pcd))

@ -10,6 +10,8 @@ git-annex (10.20250631) UNRELEASED; urgency=medium
that have experienced the above bug.
* Fix symlinks generated to annexed content when in adjusted unlocked
branch in a linked worktree on a filesystem not supporting symlinks.
* Add --url option and url= preferred content expression, to match
content that is recorded as present in an url.
-- Joey Hess <id@joeyh.name> Mon, 07 Jul 2025 15:59:42 -0400

@ -348,6 +348,11 @@ keyMatchingOptions' =
<> help "match files with attached metadata"
<> hidden
)
, annexOption (setAnnexState . Limit.addUrl) $ strOption
( long "url" <> metavar paramGlob
<> help "match files by url"
<> hidden
)
, annexFlag (setAnnexState Limit.Wanted.addWantGet)
( long "want-get"
<> help "match files the local repository wants to get"

@ -31,6 +31,7 @@ import Types.FileMatcher
import Types.MetaData
import Annex.MetaData
import Logs.MetaData
import Logs.Web
import Logs.Group
import Logs.Unused
import Logs.Location
@ -867,6 +868,26 @@ limitMetaData s = case parseMetaDataMatcher s of
. S.filter matching
. metaDataValues f <$> getCurrentMetaData k
addUrl :: String -> Annex ()
addUrl = addLimit . limitUrl
limitUrl :: MkLimit Annex
limitUrl glob = Right $ MatchFiles
{ matchAction = const $ const $ checkKey check
, matchNeedsFileName = False
, matchNeedsFileContent = False
, matchNeedsKey = True
, matchNeedsLocationLog = False
, matchNeedsLiveRepoSize = False
, matchNegationUnstable = False
, matchDesc = "url" =? glob
}
where
check k = any (matchGlob cglob)
. map (fst . getDownloader)
<$> getUrls k
cglob = compileGlob glob CaseSensitive (GlobFilePath False) -- memoized
addAccessedWithin :: Duration -> Annex ()
addAccessedWithin duration = do
now <- liftIO getPOSIXTime
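
The check in limitUrl above succeeds when any url recorded for the key matches the glob. A minimal standalone sketch of that shape follows. The names globMatch and anyUrlMatches are hypothetical, and this toy matcher supports only `*` and `?` (with `*` matching any run of characters); it is an assumption made for illustration and does not reproduce the behavior of git-annex's compileGlob and matchGlob.

```haskell
-- Toy glob matcher, assumed for illustration only.
-- '*' matches any run of characters (including none), '?' matches exactly
-- one character, and every other character matches itself.
globMatch :: String -> String -> Bool
globMatch [] [] = True
globMatch ('*' : ps) s =
  globMatch ps s || case s of
    [] -> False
    (_ : cs) -> globMatch ('*' : ps) cs
globMatch ('?' : ps) (_ : cs) = globMatch ps cs
globMatch (p : ps) (c : cs) = p == c && globMatch ps cs
globMatch _ _ = False

-- True when at least one of the urls matches the glob. In the commit, the
-- urls come from getUrls and are first passed through getDownloader; here
-- they are simply supplied by the caller.
anyUrlMatches :: String -> [String] -> Bool
anyUrlMatches glob = any (globMatch glob)
```

For example, `anyUrlMatches "*example.com*" ["https://example.com/video"]` is True, while an empty url list never matches.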

@ -178,6 +178,11 @@ in either of two repositories.
(Note that you will need to quote the second parameter to avoid
the shell doing redirection.)
* `--url=glob`
Matches when the content is recorded as being present in an url that
matches the glob.
* `--want-get`
Matches only when the preferred content settings for the local repository

@ -166,6 +166,11 @@ content not being configured.
To match PDFs with between 100 and 200 pages (assuming something has set
that metadata), use `metadata=pagecount>=100 and metadata=pagecount<=200`
* `url=glob`
Matches when the content is recorded as being present in an url that
matches the glob.
* `present`
Makes content be wanted if it's present, but not otherwise.

@ -10,3 +10,5 @@ expression if adding that.
An alternative way could be to populate a metadata field with the url,
if that were done without increasing the size of the git repository.
--[[Joey]]
> [[done]] --[[Joey]]