Improve startup time for commands that do not operate on remotes

And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.

Got rid of the remotes member of Git.Repo. This was a bit painful.

Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.

This commit was sponsored by Jake Vosloo on Patreon.
This commit is contained in:
Joey Hess 2018-01-09 15:36:56 -04:00
parent d0fe4d7308
commit 2b66492d6e
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
22 changed files with 148 additions and 70 deletions

View file

@ -1,6 +1,6 @@
{- git-annex monad
-
- Copyright 2010-2013 Joey Hess <id@joeyh.name>
- Copyright 2010-2018 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU GPL version 3 or higher.
-}
@ -34,12 +34,14 @@ module Annex (
getRemoteGitConfig,
withCurrentState,
changeDirectory,
getGitRemotes,
incError,
) where
import Common
import qualified Git
import qualified Git.Config
import qualified Git.Construct
import Annex.Fixup
import Git.CatFile
import Git.HashObject
@ -98,6 +100,7 @@ data AnnexState = AnnexState
{ repo :: Git.Repo
, repoadjustment :: (Git.Repo -> IO Git.Repo)
, gitconfig :: GitConfig
, gitremotes :: Maybe [Git.Repo]
, backend :: Maybe (BackendA Annex)
, remotes :: [Types.Remote.RemoteA Annex]
, remoteannexstate :: M.Map UUID AnnexState
@ -153,6 +156,7 @@ newState c r = do
{ repo = r
, repoadjustment = return
, gitconfig = c
, gitremotes = Nothing
, backend = Nothing
, remotes = []
, remoteannexstate = M.empty
@ -357,3 +361,13 @@ incError = changeState $ \s ->
let ! c = errcounter s + 1
! s' = s { errcounter = c }
in s'
getGitRemotes :: Annex [Git.Repo]
getGitRemotes = do
s <- getState id
case gitremotes s of
Just rs -> return rs
Nothing -> do
rs <- liftIO $ Git.Construct.fromRemotes (repo s)
changeState $ \s' -> s' { gitremotes = Just rs }
return rs

View file

@ -60,7 +60,6 @@ import Logs.Transitions
import Logs.File
import Logs.Trust.Pure
import Logs.Difference.Pure
import Annex.ReplaceFile
import qualified Annex.Queue
import Annex.Branch.Transitions
import qualified Annex

View file

@ -32,7 +32,6 @@ module Annex.Content.Direct (
import Annex.Common
import Annex.Perms
import qualified Git
import Utility.Tmp
import Logs.Location
import Logs.File
import Utility.InodeCache

View file

@ -10,7 +10,6 @@ module Annex.Fixup where
import Git.Types
import Git.Config
import Types.GitConfig
import qualified Git.Construct as Construct
import qualified Git.BuildVersion
import Utility.Path
import Utility.SafeCommand
@ -30,7 +29,7 @@ fixupRepo r c = do
let r' = disableWildcardExpansion r
r'' <- fixupSubmodule r' c
if annexDirect c
then fixupDirect r''
then return (fixupDirect r'')
else return r''
{- Disable git's built-in wildcard expansion, which is not wanted
@ -44,19 +43,16 @@ disableWildcardExpansion r
{- Direct mode repos have core.bare=true, but are not really bare.
- Fix up the Repo to be a non-bare repo, and arrange for git commands
- run by git-annex to be passed parameters that override this setting. -}
fixupDirect :: Repo -> IO Repo
fixupDirect :: Repo -> Repo
fixupDirect r@(Repo { location = l@(Local { gitdir = d, worktree = Nothing }) }) = do
let r' = r
r
{ location = l { worktree = Just (parentDir d) }
, gitGlobalOpts = gitGlobalOpts r ++
[ Param "-c"
, Param $ coreBare ++ "=" ++ boolConfig False
]
}
-- Recalc now that the worktree is correct.
rs' <- Construct.fromRemotes r'
return $ r' { remotes = rs' }
fixupDirect r = return r
fixupDirect r = r
{- Submodules have their gitdir containing ".git/modules/", and
- have core.worktree set, and also have a .git file in the top

View file

@ -16,6 +16,7 @@ import qualified Remote.Rsync as Rsync
import qualified Remote.GCrypt as GCrypt
import qualified Git
import qualified Git.Command
import qualified Annex
import qualified Annex.SpecialRemote
import Logs.UUID
import Logs.Remote
@ -122,26 +123,26 @@ makeGitRemote basename location = makeRemote basename location $ \name ->
- Returns the name of the remote. -}
makeRemote :: String -> String -> (RemoteName -> Annex ()) -> Annex RemoteName
makeRemote basename location a = do
g <- gitRepo
if not (any samelocation $ Git.remotes g)
rs <- Annex.getGitRemotes
if not (any samelocation rs)
then do
let name = uniqueRemoteName basename 0 g
let name = uniqueRemoteName basename 0 rs
a name
return name
else return basename
where
samelocation x = Git.repoLocation x == location
{- Generate an unused name for a remote, adding a number if
- necessary.
{- Given a list of all remotes, generate an unused name for a new
- remote, adding a number if necessary.
-
- Ensures that the returned name is a legal git remote name. -}
uniqueRemoteName :: String -> Int -> Git.Repo -> RemoteName
uniqueRemoteName basename n r
uniqueRemoteName :: String -> Int -> [Git.Repo] -> RemoteName
uniqueRemoteName basename n rs
| null namecollision = name
| otherwise = uniqueRemoteName legalbasename (succ n) r
| otherwise = uniqueRemoteName legalbasename (succ n) rs
where
namecollision = filter samename (Git.remotes r)
namecollision = filter samename rs
samename x = Git.remoteName x == Just name
name
| n == 0 = legalbasename

View file

@ -94,7 +94,7 @@ setRepoConfig uuid mremote oldc newc = do
void uuidMapLoad
when nameChanged $ do
liftAnnex $ do
name <- fromRepo $ uniqueRemoteName (legalName newc) 0
name <- uniqueRemoteName (legalName newc) 0 <$> Annex.getGitRemotes
{- git remote rename expects there to be a
- remote.<name>.fetch, and exits nonzero if
- there's not. Special remotes don't normally

View file

@ -69,7 +69,7 @@ withNewSecretKey use = do
-}
getGCryptRemoteName :: UUID -> String -> Annex RemoteName
getGCryptRemoteName u repoloc = do
tmpremote <- uniqueRemoteName "tmpgcryptremote" 0 <$> gitRepo
tmpremote <- uniqueRemoteName "tmpgcryptremote" 0 <$> Annex.getGitRemotes
void $ inRepo $ Git.Command.runBool
[ Param "remote"
, Param "add"

View file

@ -13,6 +13,9 @@ git-annex (6.20171215) UNRELEASED; urgency=medium
* git-annex.cabal: Add back custom-setup stanza, so cabal new-build works.
* Fix several places where files in .git/annex/ were written with modes
that did not take the core.sharedRepository config into account.
* Improve startup time for commands that do not operate on remotes,
and for tab completion, by not unnessessarily statting paths to
remotes, which used to cause eg, spin-up of removable drives.
-- Joey Hess <id@joeyh.name> Wed, 20 Dec 2017 12:11:46 -0400

View file

@ -1,6 +1,6 @@
{- git-annex command-line option parsing
-
- Copyright 2010-2017 Joey Hess <id@joeyh.name>
- Copyright 2010-2018 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU GPL version 3 or higher.
-}
@ -10,10 +10,12 @@ module CmdLine.GitAnnex.Options where
import Options.Applicative
import Options.Applicative.Builder.Internal
import Control.Concurrent
import qualified Data.Map as M
import Annex.Common
import qualified Git.Config
import qualified Git.Construct
import Git.Remote
import Git.Types
import Types.Key
import Types.TrustLevel
@ -348,9 +350,10 @@ completeRemotes :: HasCompleter f => Mod f a
completeRemotes = completer $ mkCompleter $ \input -> do
r <- maybe (pure Nothing) (Just <$$> Git.Config.read)
=<< Git.Construct.fromCwd
return $ filter (input `isPrefixOf`)
(maybe [] (mapMaybe remoteName . remotes) r)
return $ filter (input `isPrefixOf`) $
map remoteKeyToRemoteName $
filter isRemoteKey $
maybe [] (M.keys . config) r
completeBackends :: HasCompleter f => Mod f a
completeBackends = completeWith $

View file

@ -36,7 +36,7 @@ seek = withWords start
start :: [String] -> CommandStart
start [] = unknownNameError "Specify the remote to enable."
start (name:rest) = go =<< filter matchingname <$> Annex.fromRepo Git.remotes
start (name:rest) = go =<< filter matchingname <$> Annex.getGitRemotes
where
matchingname r = Git.remoteName r == Just name
go [] = startSpecialRemote name (Logs.Remote.keyValToConfig rest)
@ -104,7 +104,7 @@ unknownNameError prefix = do
else Remote.prettyPrintUUIDsDescs
"known special remotes"
descm (M.keys m)
disabledremotes <- filterM isdisabled =<< Annex.fromRepo Git.remotes
disabledremotes <- filterM isdisabled =<< Annex.getGitRemotes
let remotesmsg = unlines $ map ("\t" ++) $
mapMaybe Git.remoteName disabledremotes
giveup $ concat $ filter (not . null) [prefix ++ "\n", remotesmsg, specialmsg]

View file

@ -27,6 +27,9 @@ import qualified Utility.Dot as Dot
-- a link from the first repository to the second (its remote)
data Link = Link Git.Repo Git.Repo
-- a repo and its remotes
type RepoRemotes = (Git.Repo, [Git.Repo])
cmd :: Command
cmd = dontCheck repoExists $
command "map" SectionQuery
@ -76,11 +79,11 @@ runViewer file ((c, ps):rest) = ifM (liftIO $ inPath c)
- the repositories first, followed by uuids that were not matched
- to a repository.
-}
drawMap :: [Git.Repo] -> TrustMap -> M.Map UUID String -> String
drawMap :: [RepoRemotes] -> TrustMap -> M.Map UUID String -> String
drawMap rs trustmap umap = Dot.graph $ repos ++ others
where
repos = map (node umap rs trustmap) rs
ruuids = map getUncachedUUID rs
repos = map (node umap (map fst rs) trustmap) rs
ruuids = map (getUncachedUUID . fst) rs
others = map uuidnode $
filter (\u -> M.lookup u trustmap /= Just DeadTrusted) $
filter (`notElem` ruuids) (M.keys umap)
@ -113,13 +116,13 @@ nodeId r =
UUID u -> u
{- A node representing a repo. -}
node :: M.Map UUID String -> [Git.Repo] -> TrustMap -> Git.Repo -> String
node umap fullinfo trustmap r = unlines $ n:edges
node :: M.Map UUID String -> [Git.Repo] -> TrustMap -> RepoRemotes -> String
node umap fullinfo trustmap (r, rs) = unlines $ n:edges
where
n = Dot.subGraph (hostname r) (basehostname r) "lightblue" $
trustDecorate trustmap (getUncachedUUID r) $
Dot.graphNode (nodeId r) (repoName umap r)
edges = map (edge umap fullinfo r) (Git.remotes r)
edges = map (edge umap fullinfo r) rs
{- An edge between two repos. The second repo is a remote of the first. -}
edge :: M.Map UUID String -> [Git.Repo] -> Git.Repo -> Git.Repo -> String
@ -149,21 +152,21 @@ trustDecorate trustmap u s = case M.lookup u trustmap of
Nothing -> Dot.fillColor "white" s
{- Recursively searches out remotes starting with the specified repo. -}
spider :: Git.Repo -> Annex [Git.Repo]
spider :: Git.Repo -> Annex [RepoRemotes]
spider r = spider' [r] []
spider' :: [Git.Repo] -> [Git.Repo] -> Annex [Git.Repo]
spider' :: [Git.Repo] -> [RepoRemotes] -> Annex [RepoRemotes]
spider' [] known = return known
spider' (r:rs) known
| any (same r) known = spider' rs known
| any (same r) (map fst known) = spider' rs known
| otherwise = do
r' <- scan r
-- The remotes will be relative to r', and need to be
-- made absolute for later use.
remotes <- mapM (absRepo r') (Git.remotes r')
let r'' = r' { Git.remotes = remotes }
spider' (rs ++ remotes) (r'':known)
remotes <- mapM (absRepo r')
=<< (liftIO $ Git.Construct.fromRemotes r')
spider' (rs ++ remotes) ((r', remotes):known)
{- Converts repos to a common absolute form. -}
absRepo :: Git.Repo -> Git.Repo -> Annex Git.Repo
@ -260,11 +263,11 @@ tryScan r
{- Spidering can find multiple paths to the same repo, so this is used
- to combine (really remove) duplicate repos with the same UUID. -}
combineSame :: [Git.Repo] -> [Git.Repo]
combineSame :: [RepoRemotes] -> [RepoRemotes]
combineSame = map snd . nubBy sameuuid . map pair
where
sameuuid (u1, _) (u2, _) = u1 == u2 && u1 /= NoUUID
pair r = (getUncachedUUID r, r)
pair (r, rs) = (getUncachedUUID r, (r, rs))
safely :: IO Git.Repo -> IO (Maybe Git.Repo)
safely a = do

View file

@ -76,7 +76,7 @@ seek (Pair, Nothing) = commandAction $ do
unusedPeerRemoteName :: Annex RemoteName
unusedPeerRemoteName = go (1 :: Integer) =<< usednames
where
usednames = mapMaybe remoteName . remotes <$> Annex.gitRepo
usednames = mapMaybe remoteName <$> Annex.getGitRemotes
go n names = do
let name = "peer" ++ show n
if name `elem` names

View file

@ -13,8 +13,8 @@ import Data.Char
import Common
import Git
import Git.Types
import qualified Git.Construct
import qualified Git.Command
import qualified Git.Construct
import Utility.UserInfo
{- Returns a single git config setting, or a default value if not set. -}
@ -89,12 +89,10 @@ hRead repo h = do
store :: String -> Repo -> IO Repo
store s repo = do
let c = parse s
repo' <- updateLocation $ repo
updateLocation $ repo
{ config = (M.map Prelude.head c) `M.union` config repo
, fullconfig = M.unionWith (++) c (fullconfig repo)
}
rs <- Git.Construct.fromRemotes repo'
return $ repo' { remotes = rs }
{- Updates the location of a repo, based on its configuration.
-

View file

@ -127,8 +127,7 @@ fromRemotes repo = mapM construct remotepairs
where
filterconfig f = filter f $ M.toList $ config repo
filterkeys f = filterconfig (\(k,_) -> f k)
remotepairs = filterkeys isremote
isremote k = "remote." `isPrefixOf` k && ".url" `isSuffixOf` k
remotepairs = filterkeys isRemoteKey
construct (k,v) = remoteNamedFromKey k $ fromRemoteLocation v repo
{- Sets the name of a remote when constructing the Repo to represent it. -}
@ -140,10 +139,7 @@ remoteNamed n constructor = do
{- Sets the name of a remote based on the git config key, such as
- "remote.foo.url". -}
remoteNamedFromKey :: String -> IO Repo -> IO Repo
remoteNamedFromKey k = remoteNamed basename
where
basename = intercalate "." $
reverse $ drop 1 $ reverse $ drop 1 $ splitc '.' k
remoteNamedFromKey = remoteNamed . remoteKeyToRemoteName
{- Constructs a new Repo for one of a Repo's remotes using a given
- location (ie, an url). -}
@ -233,7 +229,6 @@ newFrom l = Repo
{ location = l
, config = M.empty
, fullconfig = M.empty
, remotes = []
, remoteName = Nothing
, gitEnv = Nothing
, gitEnvOverridesGitDir = False

View file

@ -20,6 +20,15 @@ import Network.URI
import Git.FilePath
#endif
{- Is a git config key one that specifies the location of a remote? -}
isRemoteKey :: String -> Bool
isRemoteKey k = "remote." `isPrefixOf` k && ".url" `isSuffixOf` k
{- Get a remote's name from the config key that specifies its location. -}
remoteKeyToRemoteName :: String -> RemoteName
remoteKeyToRemoteName k = intercalate "." $
reverse $ drop 1 $ reverse $ drop 1 $ splitc '.' k
{- Construct a legal git remote name out of an arbitrary input string.
-
- There seems to be no formal definition of this in the git source,

View file

@ -11,7 +11,6 @@ module Git.Repair (
removeBadBranches,
successfulRepair,
cleanCorruptObjects,
retrieveMissingObjects,
resetLocalBranches,
checkIndex,
checkIndexFast,
@ -102,10 +101,11 @@ retrieveMissingObjects missing referencerepo r
unlessM (boolSystem "git" [Param "init", File tmpdir]) $
error $ "failed to create temp repository in " ++ tmpdir
tmpr <- Config.read =<< Construct.fromAbsPath tmpdir
stillmissing <- pullremotes tmpr (remotes r) fetchrefstags missing
rs <- Construct.fromRemotes r
stillmissing <- pullremotes tmpr rs fetchrefstags missing
if S.null (knownMissing stillmissing)
then return stillmissing
else pullremotes tmpr (remotes r) fetchallrefs stillmissing
else pullremotes tmpr rs fetchallrefs stillmissing
where
pullremotes tmpr [] fetchrefs stillmissing = case referencerepo of
Nothing -> return stillmissing

View file

@ -34,8 +34,8 @@ data Repo = Repo
, config :: M.Map String String
-- a given git config key can actually have multiple values
, fullconfig :: M.Map String [String]
, remotes :: [Repo]
-- remoteName holds the name used for this repo in remotes
-- remoteName holds the name used for this repo in some other
-- repo's list of remotes, when this repo is such a remote
, remoteName :: Maybe RemoteName
-- alternate environment to use when running git commands
, gitEnv :: Maybe [(String, String)]

View file

@ -182,7 +182,7 @@ gCryptSetup _ mu _ c gc = go $ M.lookup "gitrepo" c
(c', _encsetup) <- encryptionSetup c gc
let url = Git.GCrypt.urlPrefix ++ gitrepo
rs <- fromRepo Git.remotes
rs <- Annex.getGitRemotes
case filter (\r -> Git.remoteName r == Just remotename) rs of
[] -> inRepo $ Git.Command.run
[ Param "remote", Param "add"

View file

@ -78,7 +78,7 @@ remote = RemoteType
list :: Bool -> Annex [Git.Repo]
list autoinit = do
c <- fromRepo Git.config
rs <- mapM (tweakurl c) =<< fromRepo Git.remotes
rs <- mapM (tweakurl c) =<< Annex.getGitRemotes
mapM (configRead autoinit) rs
where
annexurl n = "remote." ++ n ++ ".annexurl"
@ -104,8 +104,8 @@ gitSetup :: SetupStage -> Maybe UUID -> Maybe CredPair -> RemoteConfig -> Remote
gitSetup Init mu _ c _ = do
let location = fromMaybe (giveup "Specify location=url") $
Url.parseURIRelaxed =<< M.lookup "location" c
g <- Annex.gitRepo
u <- case filter (\r -> Git.location r == Git.Url location) (Git.remotes g) of
rs <- Annex.getGitRemotes
u <- case filter (\r -> Git.location r == Git.Url location) rs of
[r] -> getRepoUUID r
[] -> giveup "could not find existing git remote with specified location"
_ -> giveup "found multiple git remotes with specified location"
@ -263,10 +263,9 @@ tryGitConfigRead autoinit r
return r
store = observe $ \r' -> do
g <- gitRepo
let l = Git.remotes g
let g' = g { Git.remotes = exchange l r' }
Annex.changeState $ \s -> s { Annex.repo = g' }
l <- Annex.getGitRemotes
let rs = exchange l r'
Annex.changeState $ \s -> s { Annex.gitremotes = Just rs }
exchange [] _ = []
exchange (old:ls) new

View file

@ -17,6 +17,7 @@ import RemoteDaemon.Transport
import qualified Git
import qualified Git.Types as Git
import qualified Git.CurrentRepo
import qualified Git.Construct
import Utility.SimpleProtocol
import Utility.ThreadScheduler
import Config
@ -137,8 +138,9 @@ runController ichan ochan = do
-- Generates a map with a transport for each supported remote in the git repo,
-- except those that have annex.sync = false
genRemoteMap :: TransportHandle -> TChan Emitted -> IO RemoteMap
genRemoteMap h@(TransportHandle (LocalRepo g) _) ochan =
M.fromList . catMaybes <$> mapM gen (Git.remotes g)
genRemoteMap h@(TransportHandle (LocalRepo g) _) ochan = do
rs <- Git.Construct.fromRemotes g
M.fromList . catMaybes <$> mapM gen rs
where
gen r = do
gc <- atomically $ extractRemoteGitConfig g (Git.repoDescribe r)

View file

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2018-01-09T17:02:41Z"
content="""
There are a couple of parts to this, so let's get this one out of the way
first: Tab completion etc should not be looking at remotes.
It seems that even `git annex --help` does for some reason; so does
stuff like `git annex examinekey`. So it's happening in a core code-path.
Ah, ok.. Git.Config.read uses Git.Construct.fromRemotes,
which uses Git.Construct.fromAbsPath, which stats
the remote directory to handle ".git" canonicalization.
Fixed this part of it; now only when the remoteList is built does it
stat remotes.
"""]]

View file

@ -0,0 +1,39 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2018-01-09T19:56:42Z"
content="""
With the above dealt with, the remaining problem is with commands
like `git annex whereis` or `git annex info`, which don't really
any on any remote, but still need to examine the remotes as part of
building the remoteList.
git-annex supports remotes that point to a mount point that might have
different drives mounted at it at different times. So, it needs to
check the git config of the remote each time, to see what repository is
currently there.
Even commands like "whereis" and "info" have output that depends on
what repository a remote is currently pointing to. In some cases,
"whereis" might not output anything that depends on a given remote,
so in theory it could avoid looking at the config of that remote.
And a command like "git annex copy --to origin" doesn't really
need to look at the configs of any other remotes.
But to avoid unncessarily checking the git configs of remotes that a
command does not use would need each use of the current remoteList
to be replaced with something else that does the minimal needed work,
instead of building the whole remoteList. I think this would be quite
complicated.
And, I don't know that it would address the bug report adequequately, even
if it were done. Running `git annex info` would
still block waiting for the automount; `git annex whereis` would
only *sometimes* block, depending on where content is.
So instead of that approach, perhaps a config setting will do?
A per-remote config that tells git-annex that only one repository
should ever be mounted at its location. That would make git-annex
avoid checking the git config of that remote each time, except
when it's actually storing/dropping content on it.
"""]]