git-annex/Database/Keys/Handle.hs
Joey Hess 4fd71c125e
Improve performance when used with a local git remote that has a large working tree
git write-tree was being run once per file git-annex acts on when eg,
getting files, which is slow when the remote repository has a large
tree.

onLocal calls quiesce after each action, and quiesce closes the keys db
since [[!commit ba7ecbc6a9c]]. Which has a relevant comment about
performance. I have not addressed that, the keys db still gets closed and
reopened after each file.

Turns out that, since git write-tree was run by each call to
reconcileStaged, the .git/annex/keysdb.cache value was never the
same as the git index's inode. Because git write-tree updates the index's
mtime even when no changes have been made.

And so, when the database got closed and reopened, reconcileStaged would
see a changed index, and run git write-tree again. Over and over.

I considered writing the index's new inodecache after write-tree to the
keysdb.cache, but that would be vulnerable to a race, if the index was
changed just after write-tree.

The fix was to stop using keysb.cache at all. When the database is closed
and later reopened by the same process, avoid re-doing reconcileStaged.

Now that .git/annex/keysdb.cache is no longer used. It could be removed,
but the time overhead of removing it would be more than the space overhead
of keeping it. Defferred removal to the v11 upgrade.

Sponsored-by: unqueued
2025-09-10 12:08:11 -04:00

76 lines
1.9 KiB
Haskell

{- Handle for the Keys database.
-
- Copyright 2015 Joey Hess <id@joeyh.name>
-:
- Licensed under the GNU AGPL version 3 or higher.
-}
module Database.Keys.Handle (
DbHandle,
newDbHandle,
DbState(..),
DbWasOpen(..),
withDbState,
flushDbQueue,
closeDbHandle,
) where
import qualified Database.Queue as H
import Database.Keys.Tables
import Utility.Exception
import Utility.DebugLocks
import Control.Concurrent
import Control.Monad.IO.Class (liftIO, MonadIO)
import Control.Applicative
import Prelude
-- The MVar is always left full except when actions are run
-- that access the database.
newtype DbHandle = DbHandle (MVar DbState)
-- The database can be closed or open, but it also may have been
-- tried to open (for read) and didn't exist yet or is not readable.
data DbState
= DbClosed DbWasOpen
| DbOpen (H.DbQueue, DbTablesChanged)
| DbUnavailable
-- Was the database previously opened by this process?
data DbWasOpen = DbWasOpen Bool
newDbHandle :: IO DbHandle
newDbHandle = DbHandle <$> newMVar (DbClosed (DbWasOpen False))
-- Runs an action on the state of the handle, which can change its state.
-- The MVar is empty while the action runs, which blocks other users
-- of the handle from running.
withDbState
:: (MonadIO m, MonadCatch m)
=> DbHandle
-> (DbState -> m (v, DbState))
-> m v
withDbState (DbHandle mvar) a = do
st <- liftIO $ debugLocks $ takeMVar mvar
go st `onException` (liftIO $ debugLocks $ putMVar mvar st)
where
go st = do
(v, st') <- a st
liftIO $ debugLocks $ putMVar mvar st'
return v
flushDbQueue :: DbHandle -> IO ()
flushDbQueue h = withDbState h go
where
go (DbOpen (qh, _)) = do
H.flushDbQueue qh
return ((), DbOpen (qh, mempty))
go st = return ((), st)
closeDbHandle :: DbHandle -> IO ()
closeDbHandle h = withDbState h go
where
go (DbOpen (qh, _)) = do
H.closeDbQueue qh
return ((), DbClosed (DbWasOpen True))
go st = return ((), st)