git-annex/Database/Fsck.hs

109 lines
2.9 KiB
Haskell
Raw Normal View History

{- Sqlite database used for incremental fsck.
-
- Copyright 2015 Joey Hess <id@joeyh.name>
-:
- Licensed under the GNU GPL version 3 or higher.
-}
{-# LANGUAGE QuasiQuotes, TypeFamilies, TemplateHaskell #-}
{-# LANGUAGE OverloadedStrings, GADTs, FlexibleContexts #-}
{-# LANGUAGE MultiParamTypeClasses, GeneralizedNewtypeDeriving #-}
2015-02-22 20:57:19 +00:00
{-# LANGUAGE RankNTypes #-}
module Database.Fsck (
FsckHandle,
newPass,
openDb,
closeDb,
addDb,
inDb,
FsckedId,
) where
import Database.Types
import qualified Database.Handle as H
import Locations
import Utility.PosixFiles
import Utility.Exception
import Common
import Annex
import Types.Key
import Types.UUID
import Annex.Perms
import Annex.LockFile
import Database.Persist.TH
import Database.Esqueleto hiding (Key)
import Data.Time.Clock
data FsckHandle = FsckHandle H.DbHandle UUID
{- Each key stored in the database has already been fscked as part
- of the latest incremental fsck pass. -}
share [mkPersist sqlSettings, mkMigrate "migrateFsck"] [persistLowerCase|
Fscked
key SKey
UniqueKey key
|]
{- The database is removed when starting a new incremental fsck pass.
-
- This may fail, if other fsck processes are currently running using the
- database. Removing the database in that situation would lead to crashes
- or unknown behavior.
-}
newPass :: UUID -> Annex Bool
newPass u = isJust <$> tryExclusiveLock (gitAnnexFsckDbLock u) go
where
go = liftIO . void . tryIO . removeDirectoryRecursive
=<< fromRepo (gitAnnexFsckDbDir u)
{- Opens the database, creating it atomically if it doesn't exist yet. -}
openDb :: UUID -> Annex FsckHandle
openDb u = do
dbdir <- fromRepo (gitAnnexFsckDbDir u)
let db = dbdir </> "db"
unlessM (liftIO $ doesFileExist db) $ do
let tmpdbdir = dbdir ++ ".tmp"
let tmpdb = tmpdbdir </> "db"
liftIO $ do
createDirectoryIfMissing True tmpdbdir
H.initDb tmpdb $ void $
runMigrationSilent migrateFsck
setAnnexDirPerm tmpdbdir
setAnnexFilePerm tmpdb
liftIO $ do
void $ tryIO $ removeDirectoryRecursive dbdir
rename tmpdbdir dbdir
lockFileCached =<< fromRepo (gitAnnexFsckDbLock u)
h <- liftIO $ H.openDb db "fscked"
return $ FsckHandle h u
closeDb :: FsckHandle -> Annex ()
closeDb (FsckHandle h u) = do
liftIO $ H.closeDb h
unlockFile =<< fromRepo (gitAnnexFsckDbLock u)
addDb :: FsckHandle -> Key -> IO ()
addDb (FsckHandle h _) k = H.queueDb h checkcommit $
void $ insertUnique $ Fscked sk
allow for concurrent incremental fsck processes again (sorta) Sqlite doesn't support multiple concurrent writers at all. One of them will fail to write. It's not even possible to have two processes building up separate transactions at the same time. Before using sqlite, incremental fsck could work perfectly well with multiple fsck processes running concurrently. I'd like to keep that working. My partial solution, so far, is to make git-annex buffer writes, and every so often send them all to sqlite at once, in a transaction. So most of the time, nothing is writing to the database. (And if it gets unlucky and a write fails due to a collision with another writer, it can just wait and retry the write later.) This lets multiple processes write to the database successfully. But, for the purposes of concurrent, incremental fsck, it's not ideal. Each process doesn't immediately learn of files that another process has checked. So they'll tend to do redundant work. Only way I can see to improve this is to use some other mechanism for short-term IPC between the fsck processes. Not yet done. ---- Also, make addDb check if an item is in the database already, and not try to re-add it. That fixes an intermittent crash with "SQLite3 returned ErrorConstraint while attempting to perform step." I am not 100% sure why; it only started happening when I moved write buffering into the queue. It seemed to generally happen on the same file each time, so could just be due to multiple files having the same key. However, I doubt my sound repo has many duplicate keys, and I suspect something else is going on. ---- Updated benchmark, with the 1000 item queue: 6m33.808s
2015-02-17 20:39:35 +00:00
where
sk = toSKey k
-- commit queue after 1000 files or 5 minutes, whichever comes first
checkcommit sz lastcommittime
| sz > 1000 = return True
| otherwise = do
now <- getCurrentTime
return $ diffUTCTime lastcommittime now > 300
inDb :: FsckHandle -> Key -> IO Bool
inDb (FsckHandle h _) = H.queryDb h . inDb' . toSKey
allow for concurrent incremental fsck processes again (sorta) Sqlite doesn't support multiple concurrent writers at all. One of them will fail to write. It's not even possible to have two processes building up separate transactions at the same time. Before using sqlite, incremental fsck could work perfectly well with multiple fsck processes running concurrently. I'd like to keep that working. My partial solution, so far, is to make git-annex buffer writes, and every so often send them all to sqlite at once, in a transaction. So most of the time, nothing is writing to the database. (And if it gets unlucky and a write fails due to a collision with another writer, it can just wait and retry the write later.) This lets multiple processes write to the database successfully. But, for the purposes of concurrent, incremental fsck, it's not ideal. Each process doesn't immediately learn of files that another process has checked. So they'll tend to do redundant work. Only way I can see to improve this is to use some other mechanism for short-term IPC between the fsck processes. Not yet done. ---- Also, make addDb check if an item is in the database already, and not try to re-add it. That fixes an intermittent crash with "SQLite3 returned ErrorConstraint while attempting to perform step." I am not 100% sure why; it only started happening when I moved write buffering into the queue. It seemed to generally happen on the same file each time, so could just be due to multiple files having the same key. However, I doubt my sound repo has many duplicate keys, and I suspect something else is going on. ---- Updated benchmark, with the 1000 item queue: 6m33.808s
2015-02-17 20:39:35 +00:00
inDb' :: SKey -> SqlPersistM Bool
inDb' sk = do
r <- select $ from $ \r -> do
allow for concurrent incremental fsck processes again (sorta) Sqlite doesn't support multiple concurrent writers at all. One of them will fail to write. It's not even possible to have two processes building up separate transactions at the same time. Before using sqlite, incremental fsck could work perfectly well with multiple fsck processes running concurrently. I'd like to keep that working. My partial solution, so far, is to make git-annex buffer writes, and every so often send them all to sqlite at once, in a transaction. So most of the time, nothing is writing to the database. (And if it gets unlucky and a write fails due to a collision with another writer, it can just wait and retry the write later.) This lets multiple processes write to the database successfully. But, for the purposes of concurrent, incremental fsck, it's not ideal. Each process doesn't immediately learn of files that another process has checked. So they'll tend to do redundant work. Only way I can see to improve this is to use some other mechanism for short-term IPC between the fsck processes. Not yet done. ---- Also, make addDb check if an item is in the database already, and not try to re-add it. That fixes an intermittent crash with "SQLite3 returned ErrorConstraint while attempting to perform step." I am not 100% sure why; it only started happening when I moved write buffering into the queue. It seemed to generally happen on the same file each time, so could just be due to multiple files having the same key. However, I doubt my sound repo has many duplicate keys, and I suspect something else is going on. ---- Updated benchmark, with the 1000 item queue: 6m33.808s
2015-02-17 20:39:35 +00:00
where_ (r ^. FsckedKey ==. val sk)
return (r ^. FsckedKey)
return $ not $ null r