avoid git check-ignore overhead on importing known files

isKnownImportLocation does a database lookup and there's an index
to make that lookup fast, so it's probably faster than talking to git
check-ignore. Checking the matcher is faster still.

Before the gitignore check was added, it did not always need to check
isknown. Now it does, because the alternative is the more expensive
notignored. But at least we can skip notignored when a file is known,
which will be the common case: When importing from a remote that has been
exported to and/or imported from before, only new files will not be
known, so only those will need to check notignored.

At first, I had this:
	(matches <&&> (isknown <||> notignored)) <||> isknown
Notice that this checks isknown every time, whether it matches or not.

So, it's no slower to instead do this:
	isknown <||> (matches <&&> notignored)
That has the benefit that, when the file is known, it doesn't need to run
matches, which, while faster than isknown, still uses some CPU.

And it perhaps more clearly expresses the condition: Any known file is
wanted, otherwise it's down to what matches and is not ignored.
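
To make the comparison concrete, here is a minimal standalone sketch,
not the git-annex code. It assumes short-circuiting monadic operators
defined locally with the same names and fixity as the ones used above,
and it replaces the real checks with a hypothetical logging helper
(`check`), so that it can show which checks each ordering actually runs.
The names `firstTry`, `final` and `runWith` are also made up for this
illustration.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Short-circuiting monadic "or" and "and". Local definitions for this
-- sketch only; assumed to behave like the operators used in the commit.
(<||>) :: Monad m => m Bool -> m Bool -> m Bool
ma <||> mb = do
    a <- ma
    if a then pure True else mb

(<&&>) :: Monad m => m Bool -> m Bool -> m Bool
ma <&&> mb = do
    a <- ma
    if a then mb else pure False

infixr 2 <||>
infixr 3 <&&>

-- Hypothetical stand-in for a real check: records its name each time
-- it runs, then returns a fixed answer.
check :: IORef [String] -> String -> Bool -> IO Bool
check logRef name result = do
    modifyIORef' logRef (++ [name])
    pure result

-- The ordering tried first.
firstTry :: IO Bool -> IO Bool -> IO Bool -> IO Bool
firstTry matches isknown notignored =
    (matches <&&> (isknown <||> notignored)) <||> isknown

-- The ordering this commit settles on.
final :: IO Bool -> IO Bool -> IO Bool -> IO Bool
final matches isknown notignored =
    isknown <||> (matches <&&> notignored)

-- Run one ordering with fixed answers for (matches, isknown, notignored).
-- Returns the result and the names of the checks that ran, in order.
runWith :: (IO Bool -> IO Bool -> IO Bool -> IO Bool)
        -> Bool -> Bool -> Bool -> IO (Bool, [String])
runWith expr m k n = do
    logRef <- newIORef []
    r <- expr (check logRef "matches" m)
              (check logRef "isknown" k)
              (check logRef "notignored" n)
    ran <- readIORef logRef
    pure (r, ran)

main :: IO ()
main = mapM_ report
    [ (m, k, n) | m <- [False, True], k <- [False, True], n <- [False, True] ]
  where
    report (m, k, n) = do
        old <- runWith firstTry m k n
        new <- runWith final m k n
        putStrLn $ show (m, k, n)
            ++ " firstTry=" ++ show old
            ++ " final=" ++ show new
```

Both orderings give the same answer for every input, since
(m && (k || n)) || k reduces to k || (m && n). The difference is only in
which checks run: for a known file, `final` runs isknown and nothing
else.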

This commit was sponsored by Jack Hill on Patreon.
Joey Hess 2020-09-30 11:09:09 -04:00
parent c56efbbdb6
commit 41271e4eb4
GPG key ID: DB12DB0FF05F8F38


@@ -643,25 +643,22 @@ getImportableContents r importtreeconfig ci matcher =
 		<*> mapM (filterunwanted dbhandle) (importableHistory ic)
 
 	wanted dbhandle (loc, (_cid, sz))
-		| ".git" `elem` Posix.splitDirectories (fromImportLocation loc) =
-			pure False
-		| otherwise = wantImport importtreeconfig ci matcher loc sz
-			<||> isKnownImportLocation dbhandle loc
+		| ingitdir = pure False
+		| otherwise =
+			isknown <||> (matches <&&> notignored)
+	  where
+		-- Checks, from least to most expensive.
+		ingitdir = ".git" `elem` Posix.splitDirectories (fromImportLocation loc)
+		matches = matchesImportLocation matcher loc sz
+		isknown = isKnownImportLocation dbhandle loc
+		notignored = notIgnoredImportLocation importtreeconfig ci loc
 
 isKnownImportLocation :: Export.ExportHandle -> ImportLocation -> Annex Bool
 isKnownImportLocation dbhandle loc = liftIO $
 	not . null <$> Export.getExportTreeKey dbhandle loc
 
-{- The matcher is matched relative to the top of the tree of files on the
- - remote, even when importing into a subdirectory.
- -
- - However, when checking gitignores, the subdirectory is included
- - so it will look at the gitignore file in it.
- -}
-wantImport :: ImportTreeConfig -> CheckGitIgnore -> FileMatcher Annex -> ImportLocation -> ByteSize -> Annex Bool
-wantImport importtreeconfig ci matcher loc sz =
-	checkMatcher' matcher mi mempty
-		<&&> (not <$> checkIgnored ci f)
+matchesImportLocation :: FileMatcher Annex -> ImportLocation -> Integer -> Annex Bool
+matchesImportLocation matcher loc sz = checkMatcher' matcher mi mempty
   where
 	mi = MatchingInfo $ ProvidedInfo
 		{ providedFilePath = fromImportLocation loc
@@ -670,6 +667,10 @@ wantImport importtreeconfig ci matcher loc sz =
 		, providedMimeType = Nothing
 		, providedMimeEncoding = Nothing
 		}
+
+notIgnoredImportLocation :: ImportTreeConfig -> CheckGitIgnore -> ImportLocation -> Annex Bool
+notIgnoredImportLocation importtreeconfig ci loc = not <$> checkIgnored ci f
+  where
 	f = fromRawFilePath $ case importtreeconfig of
 		ImportSubTree dir _ ->
 			getTopFilePath dir P.</> fromImportLocation loc