New matching options --excludesamecontent and --includesamecontent

The normalisation of filenames turns out to be the tricky part here,
because the associated files coming out of the keys db may look like
"./foo/bar" or "../bar". For the former to match a glob like "foo/*",
it needs to be normalised.

Note that on Windows, normalise "./foo/bar" = "foo\\bar",
which a glob like "foo/*" won't match. So the glob is matched a second
time, against the toInternalGitPath form of the filename, which lets the
user provide a glob with the slashes in either direction. However, this
still won't support some wacky edge cases, like the user providing a glob
of "foo/bar\\*".
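
To illustrate the double match described above, here is a minimal sketch using only base. It is not the code from this commit: globMatch, normaliseWindowsLike, toGitStylePath and matchesEitherWay are hypothetical stand-ins for the real glob matcher, for normalise as it behaves on Windows, and for toInternalGitPath. It assumes relative paths only and a glob syntax limited to '*' and '?'.

```haskell
import Data.List (intercalate)

-- Hypothetical stand-in for a glob matcher: '*' matches any run of
-- characters (separators included), '?' matches exactly one character.
globMatch :: String -> String -> Bool
globMatch [] [] = True
globMatch ('*':ps) s = globMatch ps s || case s of
    [] -> False
    (_:cs) -> globMatch ('*':ps) cs
globMatch ('?':ps) (_:cs) = globMatch ps cs
globMatch (p:ps) (c:cs) = p == c && globMatch ps cs
globMatch _ _ = False

-- Split a string on any of the given separator characters.
splitOnAny :: [Char] -> String -> [String]
splitOnAny seps s = case break (`elem` seps) s of
    (chunk, []) -> [chunk]
    (chunk, _:rest) -> chunk : splitOnAny seps rest

-- Assumption: mimics Windows-style normalisation of a relative path by
-- dropping "." and empty components and joining with backslashes.
normaliseWindowsLike :: FilePath -> FilePath
normaliseWindowsLike =
    intercalate "\\" . filter (`notElem` [".", ""]) . splitOnAny "/\\"

-- Assumption: mimics conversion to git's internal form by turning every
-- backslash into a forward slash.
toGitStylePath :: FilePath -> FilePath
toGitStylePath = map (\c -> if c == '\\' then '/' else c)

-- Match the glob against the normalised name, then once more against
-- its forward-slash form, so a glob written with either slash matches.
matchesEitherWay :: String -> FilePath -> Bool
matchesEitherWay glob file = globMatch glob n || globMatch glob (toGitStylePath n)
  where
    n = normaliseWindowsLike file
```

In this sketch, matchesEitherWay accepts both "foo/*" and "foo\\*" for "./foo/bar", while a glob that mixes separators, such as "foo/bar\\*", still fails to match "./foo/bar/baz", because neither form of the filename has separators in that mixed arrangement.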

Sponsored-by: Dartmouth College's Datalad project
Joey Hess 2021-05-25 13:05:42 -04:00
parent cd73fcc92c
commit b5f5475ed6
5 changed files with 102 additions and 3 deletions

@@ -338,6 +338,16 @@ fileMatchingOptions' lb =
 		<> help "limit to files matching the glob pattern"
 		<> hidden
 		)
+	, globalOption (setAnnexState . Limit.addExcludeSameContent) $ strOption
+		( long "excludesamecontent" <> short 'x' <> metavar paramGlob
+		<> help "skip files whose content is the same as another file matching the glob pattern"
+		<> hidden
+		)
+	, globalOption (setAnnexState . Limit.addIncludeSameContent) $ strOption
+		( long "includesamecontent" <> short 'I' <> metavar paramGlob
+		<> help "limit to files whose content is the same as another file matching the glob pattern"
+		<> hidden
+		)
 	, globalOption (setAnnexState . Limit.addLargerThan lb) $ strOption
 		( long "largerthan" <> metavar paramSize
 		<> help "match files larger than a size"