speed up keys database writes
There seems to be no reason to check the time here. I think it was
inherited from code in Database.Fsck, which does have a reason to commit
every few minutes. Removing that syscall speeds up a git-annex init in a
repo with 100000 annexed files by about 3 seconds.

Sponsored-by: Dartmouth College's Datalad project
This commit is contained in:
parent 0f54e5e0ae
commit eb6f6ff9b8

4 changed files with 26 additions and 8 deletions
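The change amounts to swapping a time-based commit policy for a pure size check. A minimal sketch of the two checkers, standing alone (names mirror the diff below, but signatures are simplified; this is an illustration, not the actual `H.queueDb` plumbing):

```haskell
import Data.Time.Clock (UTCTime, diffUTCTime, getCurrentTime)

-- Old policy (still appropriate in Database.Fsck, where an interrupted
-- incremental fsck should not lose much work): commit once 1000 changes
-- are queued, or after 5 minutes. The time branch costs a clock syscall
-- on every single check.
checkCommitTimed :: Int -> UTCTime -> IO Bool
checkCommitTimed sz lastcommittime
    | sz > 1000 = return True
    | otherwise = do
        now <- getCurrentTime
        return $ diffUTCTime now lastcommittime > 300

-- New policy for the keys database: size check only, no syscall.
checkCommitSize :: Int -> UTCTime -> IO Bool
checkCommitSize sz _lastcommittime = pure (sz > 1000)
```

Since the keys database is only written during operations like init that run to completion, dropping the time-based branch loses nothing there, while saving one `getCurrentTime` per queued insert.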
```diff
@@ -88,7 +88,9 @@ addDb :: FsckHandle -> Key -> IO ()
 addDb (FsckHandle h _) k = H.queueDb h checkcommit $
 	void $ insertUnique $ Fscked k
   where
-	-- commit queue after 1000 files or 5 minutes, whichever comes first
+	-- Commit queue after 1000 changes or 5 minutes, whichever comes first.
+	-- The time based commit allows for an incremental fsck to be
+	-- interrupted and not lose much work.
 	checkcommit sz lastcommittime
 		| sz > 1000 = return True
 		| otherwise = do
```
```diff
@@ -27,7 +27,6 @@ import Git.FilePath
 import Database.Persist.Sql hiding (Key)
 import Database.Persist.TH
-import Data.Time.Clock
 import Control.Monad
 import Data.Maybe
 
```
```diff
@@ -77,12 +76,8 @@ newtype WriteHandle = WriteHandle H.DbQueue
 queueDb :: SqlPersistM () -> WriteHandle -> IO ()
 queueDb a (WriteHandle h) = H.queueDb h checkcommit a
   where
-	-- commit queue after 1000 changes or 5 minutes, whichever comes first
-	checkcommit sz lastcommittime
-		| sz > 1000 = return True
-		| otherwise = do
-			now <- getCurrentTime
-			return $ diffUTCTime now lastcommittime > 300
+	-- commit queue after 1000 changes
+	checkcommit sz _lastcommittime = pure (sz > 1000)
 
 addAssociatedFile :: Key -> TopFilePath -> WriteHandle -> IO ()
 addAssociatedFile k f = queueDb $ do
```
```diff
@@ -4,3 +4,6 @@ E.g. following idea came to mind: git-annex could add some flag/beacon file (e.g
 [[!meta author=yoh]]
 [[!tag projects/datalad]]
+
+> I think I've improved this all that it can reasonably be sped up,
+> so [[done]]. --[[Joey]]
```
```diff
@@ -0,0 +1,18 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 13"""
+ date="2021-05-31T18:40:59Z"
+ content="""
+There was an unnecessary check of the current time per sql insert; removing
+that sped it up by 3 seconds in my benchmark.
+
+Also tried increasing the number of inserts per sqlite transaction from 1k
+to 10k. Memory use increased to 90 mb, but no measurable speed increase.
+
+I don't see much else that can speed up the sqlite part, without going deep
+into the weeds of populating sqlite databases without using sql, or using
+multi-value inserts ([like described here](https://medium.com/@JasonWyatt/squeezing-performance-from-sqlite-insertions-971aff98eef2)).
+Both would prevent using persistent to abstract sql away, and would
+only be usable in this case, not speeding up git-annex generally,
+so not too enthused.
+"""]]
```
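To illustrate the multi-value insert idea that comment 13 declines to pursue: instead of one `INSERT` per row, many rows are packed into a single statement, chunked so the number of bound `?` variables stays under SQLite's traditional default limit of 999. This is a hypothetical sketch of the SQL such a batcher would generate, not git-annex code (and exactly the kind of hand-written SQL that persistent would otherwise abstract away):

```haskell
import Data.List (intercalate)

-- Hypothetical helper: for nrows rows of the given columns, produce the
-- multi-value INSERT statements needed, keeping each statement within
-- SQLite's default limit of 999 bound variables.
multiInsert :: String -> [String] -> Int -> [String]
multiInsert table cols nrows =
    [ "INSERT INTO " ++ table ++ " (" ++ intercalate ", " cols
      ++ ") VALUES " ++ intercalate ", " (replicate n tuple)
    | n <- chunks nrows ]
  where
    -- One row's worth of placeholders, e.g. "(?, ?)" for two columns.
    tuple = "(" ++ intercalate ", " (replicate (length cols) "?") ++ ")"
    -- Max rows per statement without exceeding 999 bound variables.
    maxrows = 999 `div` length cols
    chunks 0 = []
    chunks r = let n = min r maxrows in n : chunks (r - n)
```

For example, 1000 two-column rows would be folded into just 3 statements instead of 1000, at the cost of bypassing persistent, which is why the comment is not enthusiastic about it.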