honor preferred content when importing

Importing from a special remote honors its preferred content too; unwanted
files are not imported. But, some preferred content expressions can't be
checked before files are imported, and trying to import with such an
expression will fail.

Tested this with scenarios including changing the preferred content
expression and making sure merging the import didn't delete files that were
no longer wanted.

There was one minor inefficiency mentioned in the todo that I punted on.
Joey Hess 2019-05-21 14:38:00 -04:00
parent ec11575d17
commit e06feb7316
GPG key ID: DB12DB0FF05F8F38
9 changed files with 130 additions and 50 deletions


@@ -13,7 +13,9 @@ module Annex.Import (
     ImportCommitConfig(..),
     buildImportCommit,
     buildImportTrees,
-    downloadImport
+    downloadImport,
+    filterImportableContents,
+    makeImportMatcher,
 ) where
 
 import Annex.Common
@@ -41,6 +43,10 @@ import Messages.Progress
 import Utility.DataUnits
 import Logs.Export
 import Logs.Location
+import Logs.PreferredContent
+import Types.FileMatcher
+import Annex.FileMatcher
+import Utility.Matcher (isEmpty)
 import qualified Database.Export as Export
 import qualified Database.ContentIdentifier as CIDDb
 import qualified Logs.ContentIdentifier as CIDLog
@@ -192,7 +198,7 @@ buildImportCommit' remote importcommitconfig mtrackingcommit imported@(History t
             | otherwise -> do
                 let oldimportedtrees = mapHistory historyCommitTree oldimported
                 mknewcommits oldhc oldimportedtrees imported
-        ti' <- addBackNonPreferredContent remote ti
+        ti' <- addBackExportExcluded remote ti
         Just <$> makeRemoteTrackingBranchMergeCommit'
             trackingcommit importedcommit ti'
       where
@@ -399,11 +405,11 @@ importKey (ContentIdentifier cid) size = stubKey
 -- special remote).
 --
 -- That presents a problem: Merging the imported tree would result
--- in deletion of the non-preferred content. To avoid that happening,
--- this adds the non-preferred content back to the imported tree.
+-- in deletion of the files that were excluded from export.
+-- To avoid that happening, this adds them back to the imported tree.
 --}
-addBackNonPreferredContent :: Remote -> Sha -> Annex Sha
-addBackNonPreferredContent remote importtree =
+addBackExportExcluded :: Remote -> Sha -> Annex Sha
+addBackExportExcluded remote importtree =
     getExportExcluded (Remote.uuid remote) >>= \case
         [] -> return importtree
         excludedlist -> inRepo $
@@ -417,3 +423,60 @@ addBackNonPreferredContent remote importtree =
             (\imported _excluded -> imported)
             []
             importtree
+
+{- Match the preferred content of the remote at import time.
+ -
+ - Only keyless tokens are supported, because the keys are not known
+ - until an imported file is downloaded, which is too late to bother
+ - excluding it from an import.
+ -}
+makeImportMatcher :: Remote -> Annex (Either String (FileMatcher Annex))
+makeImportMatcher r = load preferredContentKeylessTokens >>= \case
+    Nothing -> return $ Right matchAll
+    Just (Right v) -> return $ Right v
+    Just (Left err) -> load preferredContentTokens >>= \case
+        Just (Left err') -> return $ Left err'
+        _ -> return $ Left $
+            "The preferred content expression contains terms that cannot be checked when importing: " ++ err
+  where
+    load t = M.lookup (Remote.uuid r) . fst <$> preferredRequiredMapsLoad' t
+
+wantImport :: FileMatcher Annex -> ImportLocation -> ByteSize -> Annex Bool
+wantImport matcher loc sz = checkMatcher' matcher mi mempty
+  where
+    mi = MatchingInfo $ ProvidedInfo
+        { providedFilePath = Right $ fromImportLocation loc
+        , providedKey = unavail "key"
+        , providedFileSize = Right sz
+        , providedMimeType = unavail "mime"
+        , providedMimeEncoding = unavail "mime"
+        }
+    -- This should never run, as long as the FileMatcher was generated
+    -- using the preferredContentKeylessTokens.
+    unavail v = Left $ error $ "Internal error: unavailable " ++ v
+
+{- If a file is not preferred content, but it was previously exported or
+ - imported to the remote, not importing it would result in a remote
+ - tracking branch that, when merged, would delete the file.
+ -
+ - To avoid that problem, such files are included in the import.
+ - The next export will remove them from the remote.
+ -}
+shouldImport :: Export.ExportHandle -> FileMatcher Annex -> ImportLocation -> ByteSize -> Annex Bool
+shouldImport dbhandle matcher loc sz =
+    wantImport matcher loc sz
+        <||>
+    liftIO (not . null <$> Export.getExportTreeKey dbhandle loc)
+
+filterImportableContents :: Remote -> FileMatcher Annex -> ImportableContents (ContentIdentifier, ByteSize) -> Annex (ImportableContents (ContentIdentifier, ByteSize))
+filterImportableContents r matcher importable
+    | isEmpty matcher = return importable
+    | otherwise = do
+        dbhandle <- Export.openDb (Remote.uuid r)
+        go dbhandle importable
+  where
+    go dbhandle ic = ImportableContents
+        <$> filterM (match dbhandle) (importableContents ic)
+        <*> mapM (go dbhandle) (importableHistory ic)
+
+    match dbhandle (loc, (_cid, sz)) = shouldImport dbhandle matcher loc sz
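The following is a minimal pure model of `filterImportableContents` and `shouldImport`. It may help when tracing the new import filtering. It is a sketch under simplified assumptions, not git-annex code:

- The hypothetical `Contents` type stands in for `ImportableContents`.
- A plain predicate stands in for the `FileMatcher`.
- A `Set` of paths stands in for the export database lookup done with `getExportTreeKey`.

The model keeps a file when the matcher wants it or when the file was previously exported. It applies the same filter to every historical listing.

```haskell
import Data.List (isSuffixOf)
import qualified Data.Set as S

-- Hypothetical stand-in for ImportableContents: the current listing
-- of (path, size) pairs plus listings from earlier remote versions.
data Contents = Contents
    { contents :: [(FilePath, Integer)]
    , history :: [Contents]
    } deriving (Show, Eq)

-- Hypothetical stand-in for a FileMatcher: a predicate on path and size.
type Matcher = FilePath -> Integer -> Bool

-- Import a file if it is wanted, or if it was previously exported,
-- so that merging the import does not delete it.
shouldImport :: S.Set FilePath -> Matcher -> FilePath -> Integer -> Bool
shouldImport exported want loc sz = want loc sz || loc `S.member` exported

-- Filter the current listing and, recursively, every historical one.
filterContents :: S.Set FilePath -> Matcher -> Contents -> Contents
filterContents exported want (Contents cur hist) = Contents
    (filter (uncurry (shouldImport exported want)) cur)
    (map (filterContents exported want) hist)

main :: IO ()
main = do
    let wantMp3 loc _ = ".mp3" `isSuffixOf` loc
        exported = S.fromList ["old.wav"]
        listing = Contents
            [("song.mp3", 10), ("song.wav", 50), ("old.wav", 40)]
            [Contents [("demo.wav", 30), ("demo.mp3", 5)] []]
    print (filterContents exported wantMp3 listing)
```

Running `main` keeps two files in the current listing: `song.mp3` (wanted) and `old.wav` (previously exported). Only `demo.mp3` survives in the history.

The real code differs in two ways visible in the diff:

- It skips filtering entirely when the matcher is empty (`isEmpty matcher`).
- It combines the two checks with `<||>`, which only runs the export database lookup when the matcher returns `False`.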


@@ -7,6 +7,10 @@ git-annex (7.20190508) UNRELEASED; urgency=medium
     annex.jobs=cpus, or using option --jobs=cpus or -Jcpus.
   * Honor preferred content of a special remote when exporting trees to it;
     unwanted files are filtered out of the tree that is exported.
+  * Importing from a special remote honors its preferred content too;
+    unwanted files are not imported. But, some preferred content
+    expressions can't be checked before files are imported, and trying to
+    import with such an expression will fail.
   * Improve shape of commit tree when importing from unversioned special
     remotes.


@@ -293,9 +293,13 @@ listContents remote tvar = do
     showStart' "list" (Just (Remote.name remote))
     next $ Remote.listImportableContents (Remote.importActions remote) >>= \case
         Nothing -> giveup $ "Unable to list contents of " ++ Remote.name remote
-        Just importable -> next $ do
-            liftIO $ atomically $ writeTVar tvar (Just importable)
-            return True
+        Just importable -> do
+            importable' <- makeImportMatcher remote >>= \case
+                Right matcher -> filterImportableContents remote matcher importable
+                Left err -> giveup $ "Cannot import from " ++ Remote.name remote ++ " because of a problem with its configuration: " ++ err
+            next $ do
+                liftIO $ atomically $ writeTVar tvar (Just importable')
+                return True
 
 commitRemote :: Remote -> Branch -> RemoteTrackingBranch -> Maybe Sha -> ImportTreeConfig -> ImportCommitConfig -> ImportableContents Key -> CommandStart
 commitRemote remote branch tb trackingcommit importtreeconfig importcommitconfig importable = do
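`listContents` now refuses to import when `makeImportMatcher` returns `Left`. The decision inside `makeImportMatcher` is a two-step parse:

1. It parses with only the keyless tokens.
2. If that fails, it parses again with all tokens, purely to choose the error message.

Below is a toy model of that fallback. The term lists, `parseWith`, and `importMatcher` are hypothetical. The toy parser only accepts whitespace-separated `name=value` terms, with no `and`, `or`, `not`, or parentheses, unlike git-annex's real preferred content parser.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical term names: checkable from path and size alone, versus
-- needing the key, which is only known after a download.
keylessTerms, keyedTerms :: [String]
keylessTerms = ["include=", "exclude=", "smallerthan=", "largerthan="]
keyedTerms = ["copies=", "metadata="]

-- Toy parser: succeeds only if every term starts with a known name.
parseWith :: [String] -> String -> Either String [String]
parseWith known expr = mapM check (words expr)
  where
    check t
        | any (`isPrefixOf` t) known = Right t
        | otherwise = Left ("unknown term " ++ t)

-- Mirrors the shape of makeImportMatcher: no expression matches all;
-- a keyless parse failure is re-checked with every term, to tell a
-- plain syntax error apart from a valid but key-dependent expression.
importMatcher :: Maybe String -> Either String [String]
importMatcher Nothing = Right []  -- stands in for matchAll
importMatcher (Just expr) = case parseWith keylessTerms expr of
    Right m -> Right m
    Left err -> case parseWith (keylessTerms ++ keyedTerms) expr of
        Left err' -> Left err'
        Right _ -> Left ("The preferred content expression contains terms that cannot be checked when importing: " ++ err)

main :: IO ()
main = do
    print (importMatcher (Just "include=*.mp3 smallerthan=10mb"))
    print (importMatcher (Just "include=*.mp3 copies=2"))
    print (importMatcher (Just "bogus=1"))
```

The three calls in `main` show the three outcomes:

- A keyless expression is accepted.
- An expression using `copies=` is rejected with the "cannot be checked when importing" message.
- An unknown term is reported as an ordinary parse error.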


@@ -20,6 +20,7 @@ module Logs.PreferredContent (
     setStandardGroup,
     defaultStandardGroup,
     preferredRequiredMapsLoad,
+    preferredRequiredMapsLoad',
     prop_standardGroups_parse,
 ) where
 
@@ -71,24 +72,37 @@ requiredContentMap = maybe (snd <$> preferredRequiredMapsLoad preferredContentTo
 
 preferredRequiredMapsLoad :: (PreferredContentData -> [ParseToken (MatchFiles Annex)]) -> Annex (FileMatcherMap Annex, FileMatcherMap Annex)
 preferredRequiredMapsLoad mktokens = do
+    (pc, rc) <- preferredRequiredMapsLoad' mktokens
+    let pc' = handleunknown pc
+    let rc' = handleunknown rc
+    Annex.changeState $ \s -> s
+        { Annex.preferredcontentmap = Just pc'
+        , Annex.requiredcontentmap = Just rc'
+        }
+    return (pc', rc')
+  where
+    handleunknown = M.mapWithKey $ \u ->
+        fromRight (unknownMatcher u)
+
+preferredRequiredMapsLoad' :: (PreferredContentData -> [ParseToken (MatchFiles Annex)]) -> Annex (M.Map UUID (Either String (FileMatcher Annex)), M.Map UUID (Either String (FileMatcher Annex)))
+preferredRequiredMapsLoad' mktokens = do
     groupmap <- groupMap
     configmap <- readRemoteLog
     let genmap l gm =
-            let mk u = fromRight (unknownMatcher u) .
-                    makeMatcher groupmap configmap gm u mktokens
+            let mk u = makeMatcher groupmap configmap gm u mktokens
             in simpleMap
                 . parseLogOldWithUUID (\u -> mk u . decodeBS <$> A.takeByteString)
                 <$> Annex.Branch.get l
     pc <- genmap preferredContentLog =<< groupPreferredContentMapRaw
     rc <- genmap requiredContentLog M.empty
-    -- Required content is implicitly also preferred content, so
-    -- combine.
-    let m = M.unionWith combineMatchers pc rc
-    Annex.changeState $ \s -> s
-        { Annex.preferredcontentmap = Just m
-        , Annex.requiredcontentmap = Just rc
-        }
-    return (m, rc)
+    -- Required content is implicitly also preferred content, so combine.
+    let pc' = M.unionWith combiner pc rc
+    return (pc', rc)
+  where
+    combiner (Right a) (Right b) = Right (combineMatchers a b)
+    combiner (Left a) (Left b) = Left (a ++ " " ++ b)
+    combiner (Left a) (Right _) = Left a
+    combiner (Right _) (Left b) = Left b
 
 {- This intentionally never fails, even on unparsable expressions,
  - because the configuration is shared among repositories and newer
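`preferredRequiredMapsLoad'` now keeps parse failures as `Left` values rather than turning them into `unknownMatcher`. As a result, merging the preferred and required content maps needs the new `combiner` over `Either`.

Below is a minimal model of that merge under these assumptions:

- `Matcher` is hypothetical: just a list of alternative terms.
- List concatenation stands in for `combineMatchers`.
- The map keys stand in for repository UUIDs.

```haskell
import qualified Data.Map as M

-- Hypothetical stand-in for FileMatcher: alternative terms, any of
-- which may match, so combining two matchers concatenates them.
type Matcher = [String]

combineMatchers :: Matcher -> Matcher -> Matcher
combineMatchers = (++)

-- Same shape as the new combiner: an error on either side wins,
-- and two errors are joined with a space.
combiner :: Either String Matcher -> Either String Matcher -> Either String Matcher
combiner (Right a) (Right b) = Right (combineMatchers a b)
combiner (Left a) (Left b) = Left (a ++ " " ++ b)
combiner (Left a) (Right _) = Left a
combiner (Right _) (Left b) = Left b

preferred, required :: M.Map String (Either String Matcher)
preferred = M.fromList [("u1", Right ["include=*.mp3"]), ("u2", Left "bad preferred")]
required = M.fromList [("u1", Right ["copies=2"]), ("u2", Left "bad required"), ("u3", Right ["largerthan=1mb"])]

main :: IO ()
main = print (M.unionWith combiner preferred required)
```

Keeping the error lets import report why an expression was rejected. `preferredRequiredMapsLoad` then applies `handleunknown`, which maps any remaining `Left` back to `unknownMatcher`. This preserves its "never fails" behavior for other callers.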


@@ -31,7 +31,7 @@ data FileInfo = FileInfo
     }
 
 -- This is used when testing a matcher, with values to match against
--- provided by the user, rather than queried from files.
+-- provided in some way, rather than queried from files on disk.
 data ProvidedInfo = ProvidedInfo
     { providedFilePath :: OptInfo FilePath
     , providedKey :: OptInfo Key
@@ -48,7 +48,7 @@ getInfo :: MonadIO m => OptInfo a -> m a
 getInfo (Right i) = return i
 getInfo (Left e) = liftIO e
 
-type FileMatcherMap a = M.Map UUID (Utility.Matcher.Matcher (S.Set UUID -> MatchInfo -> a Bool))
+type FileMatcherMap a = M.Map UUID (FileMatcher a)
 
 type MkLimit a = String -> Either String (MatchFiles a)


@@ -68,8 +68,6 @@ let an export overwrite the modified file; then `git annex import`
 will create a sequence of commits that includes the modified file,
 so the overwritten modification is not lost.)
 
-# PREFERRED
-
 # OPTIONS
 
 * `--to=remote`


@@ -68,18 +68,14 @@ to tell it what branch to track. For example:
 If a preferred content expression is configured for the special remote,
 it will be honored when importing from it. Files that are not preferred
 content of the remote will not be imported from it, but will be left on the
-remote. A couple of caveats:
+remote.
 
-References to directories in the preferred content expression
-are relative to the top of the special remote, not of the git repository
-it's being imported into.
-
-Preferred content expressions that relate to the content of a file will
-make the file be downloaded from the special remote, even when it turns out
-not to be preferred content. The download will only happen once for each
-version of a file, and the unwanted content will be thrown away. Such
-expressions include "copies=", "metadata=", and other things that depend on
-the key, but not "smallerthan=", "largerthan=", "include=", "exclude="
+However, preferred content expressions that relate to the key
+can't be matched when importing, because the content of the file is not
+known. Importing will fail when such a preferred content expression is
+set. This includes expressions containing "copies=", "metadata=", and other
+things that depend on the key. Preferred content expressions containing
+"include=", "exclude=" "smallerthan=", "largerthan=" will work.
 
 # IMPORTING FROM A DIRECTORY


@@ -44,9 +44,9 @@ elsewhere to allow removing it).
   when you're done with them. Then you could configure your laptop to prefer
   to not retain those files, like this: `exclude=*/archive/*`
 
-  When a subdirectory is being exported to a special remote (see
-  [[git-annex-export]](1)), these match relative to the top of the
-  subdirectory.
+  When a subdirectory is being exported or imported to a special remote (see
+  [[git-annex-export]](1)) and [[git-annex-import]](1), these match relative
+  to the top of the subdirectory.
 
 * `copies=number`
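To make "relative to the top of the subdirectory" concrete, here is a sketch that strips the exported or imported subdirectory from a repository path before a glob is tested. Both helpers are hypothetical:

- `relativeToTop` is not a git-annex function.
- `globMatch` is a toy matcher in which `*` matches any run of characters, including `/`. git-annex's own glob rules are not reproduced here.

```haskell
import Data.List (isSuffixOf, stripPrefix, tails)

-- Toy glob (assumption for this sketch): '*' matches any run of
-- characters, and every other character must match literally.
globMatch :: String -> String -> Bool
globMatch [] s = null s
globMatch ('*' : ps) s = any (globMatch ps) (tails s)
globMatch (p : ps) (c : cs) = p == c && globMatch ps cs
globMatch _ [] = False

-- Hypothetical helper: the path a glob sees when only the subdirectory
-- `top` is exported or imported; Nothing if the path is outside it.
relativeToTop :: FilePath -> FilePath -> Maybe FilePath
relativeToTop top = stripPrefix withSlash
  where
    withSlash = if "/" `isSuffixOf` top then top else top ++ "/"

main :: IO ()
main = do
    let rel = relativeToTop "Music" "Music/rock/archive/old.mp3"
    print rel
    print (fmap (globMatch "*/archive/*") rel)
```

With `Music` as the subdirectory, `Music/rock/archive/old.mp3` is tested as `rock/archive/old.mp3`. Under this toy glob, that path matches `*/archive/*`.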


@@ -4,6 +4,8 @@ But, in some situations, the user may want to export a subset of files,
 in a way that can be well expressed by a preferred content expression.
 
 > started work on this in the `preferred` branch. --[[Joey]]
+>
+> > And [[done]]! --[[Joey]]
 
 For example, they may want to export .mp3 files but not the .wav
 files used to produce those.
@@ -39,10 +41,9 @@ exclude= etc match relative to the top of the exported tree when exporting
 a subtree.
 
 > done
 
-Problem: Each `git-annex sync --content` re-filters the exported tree.
+Note: Each `git-annex sync --content` re-filters the exported tree.
 Unnecessary work. If there were a way to look up the original tree that
 corresponds with the filtered exported tree, that could be avoided.
 
-TODO
 ----
@@ -54,6 +55,7 @@ TODO
 > is added to the remote, it shouldn't be downloaded. Or a better example,
 > if directory Music is excluded from an android remote, importing from
 > it should exclude that directory.
+> > done
 
 ## import after limited export
@@ -167,6 +169,9 @@ TODO
 > and if they changed to `exclude=*.mp3 or metadata=tag=podcast`
 > and it did all that extra work, that would be surprising.
 
+> > done; it seemed to make sense at least at first to make import
+> > fail when the preferred content dependened on a key.
+
 ## different preferred content for export and import?
 
 May be cases where this makes sense. For example, I might make my phone
@@ -226,14 +231,10 @@ But, if some other file got deleted from the special remote after the
 export, the import would then not delete it.
 
 Alternatively, when a preferred content expression doesn't match a file at
-import, could check if the same file was present in the last export. (With
-same or different content.) If so, assume the preferred content has changed
-and that the user does not want to delete this file, so keep it in the
-import anyway (using the content that was last exported to it).
-(The state does not currently differentiate between the last export
-and the last import, so the file would keep being included in the
-imports until an export was made that removed it.)
-
-OR, don't match preferred content expressions on import at all; download
-everything, and let the user delete unwanted imports locally. Does avoid
-all these complications.
+import, could check if the same file is known to be present on the remote
+as of the last import or export. (With same or different content.) If so,
+assume the preferred content has changed and that the user does not want to
+delete this file, so keep it in the import anyway. This way the import does
+not delete files from master, and when the next export removes it from
+remote it will still not get deleted from master.
+> done