speed up keys database writes
There seems to be no reason to check the time here. I think it was inherited from code in Database.Fsck, which does have a reason to commit every few minutes.

Removing that syscall speeds up a git-annex init in a repo with 100000 annexed files by about 3 seconds.

Sponsored-by: Dartmouth College's Datalad project
parent 0f54e5e0ae
commit eb6f6ff9b8
4 changed files with 26 additions and 8 deletions
@@ -88,7 +88,9 @@ addDb :: FsckHandle -> Key -> IO ()
 addDb (FsckHandle h _) k = H.queueDb h checkcommit $
         void $ insertUnique $ Fscked k
   where
-        -- commit queue after 1000 files or 5 minutes, whichever comes first
+        -- Commit queue after 1000 changes or 5 minutes, whichever comes first.
+        -- The time based commit allows for an incremental fsck to be
+        -- interrupted and not lose much work.
         checkcommit sz lastcommittime
                 | sz > 1000 = return True
                 | otherwise = do
@@ -27,7 +27,6 @@ import Git.FilePath
 
 import Database.Persist.Sql hiding (Key)
 import Database.Persist.TH
-import Data.Time.Clock
 import Control.Monad
 import Data.Maybe
 
@@ -77,12 +76,8 @@ newtype WriteHandle = WriteHandle H.DbQueue
 queueDb :: SqlPersistM () -> WriteHandle -> IO ()
 queueDb a (WriteHandle h) = H.queueDb h checkcommit a
   where
-        -- commit queue after 1000 changes or 5 minutes, whichever comes first
-        checkcommit sz lastcommittime
-                | sz > 1000 = return True
-                | otherwise = do
-                        now <- getCurrentTime
-                        return $ diffUTCTime now lastcommittime > 300
+        -- commit queue after 1000 changes
+        checkcommit sz _lastcommittime = pure (sz > 1000)
 
 addAssociatedFile :: Key -> TopFilePath -> WriteHandle -> IO ()
 addAssociatedFile k f = queueDb $ do
@@ -4,3 +4,6 @@ E.g. following idea came to mind: git-annex could add some flag/beacon file (e.g
 
 [[!meta author=yoh]]
 [[!tag projects/datalad]]
+
+> I think I've improved this all that it can reasonably be sped up,
+> so [[done]]. --[[Joey]]
@@ -0,0 +1,18 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 13"""
+ date="2021-05-31T18:40:59Z"
+ content="""
+There was an unnecessary check of the current time per sql insert, removing
+that sped it up by 3 seconds in my benchmark.
+
+Also tried increasing the number of inserts per sqlite transaction from 1k
+to 10k. Memory use increased to 90 mb, but no measurable speed increase.
+
+I don't see much else that can speed up the sqlite part, without going deep
+into the weeds of populating sqlite databases without using sql, or using
+multi-value inserts ([like described here](https://medium.com/@JasonWyatt/squeezing-performance-from-sqlite-insertions-971aff98eef2)).
+Both would prevent using persistent to abstract sql away, and would
+only be usable in this case, not speeding up git-annex generally,
+so not too enthused.
+"""]]
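
To make the difference between the two commit policies in this change concrete, here is a minimal standalone sketch. It is not the git-annex code: `CommitPolicy`, `countOnly`, and `countOrAge` are hypothetical names, and the shape of the predicate (queue size and last commit time in, `IO Bool` out) is only inferred from how `checkcommit` is used in the diff above. The point it illustrates is that the count-only policy never reads the clock, while the count-or-age policy calls `getCurrentTime` on every queued change that does not already exceed the size limit.

```haskell
import Data.Time.Clock

-- Hypothetical alias (an assumption, not a git-annex type): given the
-- current queue size and the time of the last commit, decide whether
-- the queue should be committed now.
type CommitPolicy = Int -> UTCTime -> IO Bool

-- Count-only policy, in the spirit of the keys database after this
-- commit: the last commit time is ignored, so no clock read is needed.
countOnly :: Int -> CommitPolicy
countOnly limit sz _lastCommit = pure (sz > limit)

-- Count-or-age policy, in the spirit of what Database.Fsck keeps:
-- commit when the queue is large, or when the last commit is older
-- than maxAge. The clock is read on every call that is not already
-- over the size limit.
countOrAge :: Int -> NominalDiffTime -> CommitPolicy
countOrAge limit maxAge sz lastCommit
    | sz > limit = pure True
    | otherwise = do
        now <- getCurrentTime
        pure (diffUTCTime now lastCommit > maxAge)
```

With a limit of 1000 and a maximum age of 300 seconds, a queue of 5 changes whose last commit was ten minutes ago would be committed by `countOrAge 1000 300` but not by `countOnly 1000`. That matches the reasoning in the commit message: the age check matters for an interruptible incremental fsck, but buys nothing for bulk keys database writes.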