don't frontload reconcileStaged in git-annex init

init: Avoid scanning for annexed files, which can be lengthy in a
large repository. Instead that scan is done on demand. This lets git-annex
init be run and some query commands be used in a repository without
waiting.

Note that autoinit already behaved this way, so while this will mean some
commands like git-annex get/unlock/add will do the scan the first time run,
that is not really a significant behavior change.

And, it's really better to have a consistent behavior. The reason for
the inconsistency was a strange bug discussed in
b3c4579c79. Avoiding reconcileStaged in
init will keep avoiding whatever that was.

Sponsored-by: Dartmouth College's DANDI project
Joey Hess 2022-11-18 13:58:35 -04:00
parent c834d2025a
commit 2b014f1a8b
GPG key ID: DB12DB0FF05F8F38
8 changed files with 26 additions and 16 deletions

@@ -37,7 +37,6 @@ import Types.RepoVersion
import Annex.Version
import Annex.Difference
import Annex.UUID
-import Annex.WorkTree
import Annex.Fixup
import Annex.Path
import Config
@@ -102,8 +101,8 @@ genDescription Nothing = do
Right username -> [username, at, hostname, ":", reldir]
Left _ -> [hostname, ":", reldir]
-initialize :: Bool -> Maybe String -> Maybe RepoVersion -> Annex ()
-initialize autoinit mdescription mversion = checkInitializeAllowed $ \initallowed -> do
+initialize :: Maybe String -> Maybe RepoVersion -> Annex ()
+initialize mdescription mversion = checkInitializeAllowed $ \initallowed -> do
{- Has to come before any commits are made as the shared
- clone heuristic expects no local objects. -}
sharedclone <- checkSharedClone
@@ -113,7 +112,7 @@ initialize autoinit mdescription mversion = checkInitializeAllowed $ \initallowe
ensureCommit $ Annex.Branch.create
prepUUID
-initialize' autoinit mversion initallowed
+initialize' mversion initallowed
initSharedClone sharedclone
@@ -125,8 +124,8 @@ initialize autoinit mdescription mversion = checkInitializeAllowed $ \initallowe
-- Everything except for uuid setup, shared clone setup, and initial
-- description.
-initialize' :: Bool -> Maybe RepoVersion -> InitializeAllowed -> Annex ()
-initialize' autoinit mversion _initallowed = do
+initialize' :: Maybe RepoVersion -> InitializeAllowed -> Annex ()
+initialize' mversion _initallowed = do
checkLockSupport
checkFifoSupport
checkCrippledFileSystem
@@ -143,8 +142,6 @@ initialize' autoinit mversion _initallowed = do
unlessM isBareRepo $ do
hookWrite postCheckoutHook
hookWrite postMergeHook
-unless autoinit $
-scanAnnexedFiles
AdjustedBranch.checkAdjustedClone >>= \case
AdjustedBranch.InAdjustedClone -> return ()
@@ -206,7 +203,7 @@ ensureInitialized remotelist = getInitializedVersion >>= maybe needsinit checkUp
where
needsinit = ifM autoInitializeAllowed
( do
-tryNonAsync (initialize True Nothing Nothing) >>= \case
+tryNonAsync (initialize Nothing Nothing) >>= \case
Right () -> noop
Left e -> giveup $ show e ++ "\n" ++
"git-annex: automatic initialization failed due to above problems"
@@ -259,7 +256,7 @@ autoInitialize remotelist = getInitializedVersion >>= maybe needsinit checkUpgra
where
needsinit =
whenM (initializeAllowed <&&> autoInitializeAllowed) $ do
-initialize True Nothing Nothing
+initialize Nothing Nothing
autoEnableSpecialRemotes remotelist
{- Checks if a repository is initialized. Does not check version for ugrade. -}

@@ -85,7 +85,7 @@ initRepo False _ dir desc mgroup = inDir dir $ do
initRepo' :: Maybe String -> Maybe StandardGroup -> Annex ()
initRepo' desc mgroup = unlessM isInitialized $ do
-initialize False desc Nothing
+initialize desc Nothing
u <- getUUID
maybe noop (defaultStandardGroup u) mgroup
{- Ensure branch gets committed right away so it is

@@ -1,7 +1,9 @@
git-annex (10.20221105) UNRELEASED; urgency=medium
* Support quettabyte and yottabyte.
-* Sped up the initial scanning for annexed files by 21%.
+* Sped up the initial scan for annexed files by 21%.
+* init: Avoid scanning for annexed files, which can be lengthy in a
+large repository. Instead that scan is done on demand.
-- Joey Hess <id@joeyh.name> Fri, 18 Nov 2022 12:58:06 -0400

@@ -47,7 +47,7 @@ findOrGenUUID = do
else ifM (Annex.Branch.hasSibling <||> (isJust <$> Fields.getField Fields.autoInit))
( do
liftIO checkNotReadOnly
-initialize True Nothing Nothing
+initialize Nothing Nothing
getUUID
, return NoUUID
)

@@ -75,7 +75,7 @@ perform os = do
Just v | v /= wantversion ->
giveup $ "This repository is already a initialized with version " ++ show (fromRepoVersion v) ++ ", not changing to requested version."
_ -> noop
-initialize False
+initialize
(if null (initDesc os) then Nothing else Just (initDesc os))
(initVersion os)
unless (noAutoEnable os)

@@ -35,6 +35,6 @@ perform s = do
then return $ toUUID s
else Remote.nameToUUID s
storeUUID u
-checkInitializeAllowed $ initialize' False Nothing
+checkInitializeAllowed $ initialize' Nothing
Annex.SpecialRemote.autoEnable
next $ return True

@@ -45,6 +45,6 @@ start (UpgradeOptions { autoOnly = True }) =
start _ =
starting "upgrade" (ActionItemOther Nothing) (SeekInput []) $ do
whenM (isNothing <$> getVersion) $ do
-initialize False Nothing Nothing
+initialize Nothing Nothing
r <- upgrade False latestVersion
next $ return r

@@ -5,4 +5,15 @@
content="""
Implemented the two optimisations discussed above, and init in that
repository dropped from 24 seconds to 19 seconds, a 21% speedup.
+
+I think that's as fast as reconcileStaged is likely to get without
+some deep optimisation of the persistent library.
+
+Then I realized that `git-annex init` does not really need to scan for
+associated files. That can be done later, when running a command that needs
+to access the keys database. Indeed, when git-annex is used in a clone of
+an annexed repo without explicitly running `git-annex init`, that's what
+it already did. I've implemented that, so now `git-annex init` takes 3
+seconds or so. The price will be paid later, the first time running a
+`git-annex add` or `git-annex unlock` or `git-annex get`.
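
For illustration only, the idea of deferring the scan until a command needs it might be sketched like this. It is not git-annex's actual code: `Repo`, `expensiveScan`, `ensureScanned`, `initCmd` and `getCmd` are hypothetical names made up for the example, and the real scan is tracked through the keys database rather than an in-memory flag.

```haskell
import Control.Monad (unless)
import Data.IORef (IORef, newIORef, readIORef, writeIORef, modifyIORef')

-- Hypothetical stand-in for repository state; not git-annex's real types.
data Repo = Repo
    { scanDone  :: IORef Bool -- has the expensive scan run yet?
    , scanCount :: IORef Int  -- how many times the scan ran (demo only)
    }

newRepo :: IO Repo
newRepo = Repo <$> newIORef False <*> newIORef 0

-- Stand-in for the lengthy scan for annexed files.
expensiveScan :: Repo -> IO ()
expensiveScan r = modifyIORef' (scanCount r) (+ 1)

-- Run the scan only if it has not been done yet.
-- Assumption: single-threaded use; this flag check is not thread-safe.
ensureScanned :: Repo -> IO ()
ensureScanned r = do
    done <- readIORef (scanDone r)
    unless done $ do
        expensiveScan r
        writeIORef (scanDone r) True

-- Initialization no longer scans, so it returns quickly.
initCmd :: Repo -> IO ()
initCmd _ = return ()

-- A command that needs the scan results pays for the scan on first use.
getCmd :: Repo -> IO ()
getCmd = ensureScanned

main :: IO ()
main = do
    r <- newRepo
    initCmd r
    readIORef (scanCount r) >>= print -- 0: init did not scan
    getCmd r
    getCmd r
    readIORef (scanCount r) >>= print -- 1: scanned once, on first use
```

The point of the sketch is only that the cost moves from initialization to the first command that needs the scan, and is paid once.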
"""]]