speed up keys database writes

There seems to be no reason to check the time here. I think it was
inherited from code in Database.Fsck, which does have a reason to commit
every few minutes. Removing that syscall speeds up a git-annex init
in a repo with 100000 annexed files by about 3 seconds.

Sponsored-by: Dartmouth College's Datalad project
Joey Hess 2021-05-31 14:56:14 -04:00
parent 0f54e5e0ae
commit eb6f6ff9b8
GPG key ID: DB12DB0FF05F8F38
4 changed files with 26 additions and 8 deletions


@@ -88,7 +88,9 @@ addDb :: FsckHandle -> Key -> IO ()
 addDb (FsckHandle h _) k = H.queueDb h checkcommit $
     void $ insertUnique $ Fscked k
   where
-    -- commit queue after 1000 files or 5 minutes, whichever comes first
+    -- Commit queue after 1000 changes or 5 minutes, whichever comes first.
+    -- The time based commit allows for an incremental fsck to be
+    -- interrupted and not lose much work.
     checkcommit sz lastcommittime
         | sz > 1000 = return True
         | otherwise = do
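The hunk above ends at the `otherwise` guard. For context, a minimal sketch of how the time-based branch presumably continues in Database.Fsck, mirroring the code this commit removes from the keys database below; the `Int`/`UTCTime` types are assumptions, as git-annex may use its own aliases:

```haskell
import Data.Time.Clock (UTCTime, diffUTCTime, getCurrentTime)

-- Sketch of the commit check kept for fsck: flush after 1000 changes,
-- or after 5 minutes, so an interrupted incremental fsck loses little work.
checkcommit :: Int -> UTCTime -> IO Bool
checkcommit sz lastcommittime
    | sz > 1000 = return True
    | otherwise = do
        now <- getCurrentTime
        return (diffUTCTime now lastcommittime > 300)
```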


@@ -27,7 +27,6 @@ import Git.FilePath
 import Database.Persist.Sql hiding (Key)
 import Database.Persist.TH
-import Data.Time.Clock
 import Control.Monad
 import Data.Maybe
@@ -77,12 +76,8 @@ newtype WriteHandle = WriteHandle H.DbQueue
 queueDb :: SqlPersistM () -> WriteHandle -> IO ()
 queueDb a (WriteHandle h) = H.queueDb h checkcommit a
   where
-    -- commit queue after 1000 changes or 5 minutes, whichever comes first
-    checkcommit sz lastcommittime
-        | sz > 1000 = return True
-        | otherwise = do
-            now <- getCurrentTime
-            return $ diffUTCTime now lastcommittime > 300
+    -- commit queue after 1000 changes
+    checkcommit sz _lastcommittime = pure (sz > 1000)
 
 addAssociatedFile :: Key -> TopFilePath -> WriteHandle -> IO ()
 addAssociatedFile k f = queueDb $ do
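Per the commit message, the old check ran getCurrentTime once per queued sql insert. A self-contained toy illustration (not git-annex code, all names hypothetical) of the size-only flush policy the new `checkcommit` implements:

```haskell
import Control.Monad (when)
import Data.IORef

-- A write queue that flushes purely on a counter, so queuing a change
-- never needs a clock syscall. Structure is illustrative only.
data Queue a = Queue
    { queuedItems :: IORef [a]
    , queuedCount :: IORef Int
    }

newQueue :: IO (Queue a)
newQueue = Queue <$> newIORef [] <*> newIORef 0

-- Queue one change; flush (oldest first) once more than 1000 are pending.
queueChange :: ([a] -> IO ()) -> Queue a -> a -> IO ()
queueChange flush q x = do
    modifyIORef' (queuedItems q) (x :)
    modifyIORef' (queuedCount q) succ
    sz <- readIORef (queuedCount q)
    when (sz > 1000) $ do
        readIORef (queuedItems q) >>= flush . reverse
        writeIORef (queuedItems q) []
        writeIORef (queuedCount q) 0
```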


@@ -4,3 +4,6 @@ E.g. following idea came to mind: git-annex could add some flag/beacon file (e.g
 [[!meta author=yoh]]
 [[!tag projects/datalad]]
+
+> I think I've improved this all that it can reasonably be sped up,
+> so [[done]]. --[[Joey]]


@@ -0,0 +1,18 @@
[[!comment format=mdwn
username="joey"
subject="""comment 13"""
date="2021-05-31T18:40:59Z"
content="""
There was an unnecessary check of the current time per sql insert; removing
that sped it up by 3 seconds in my benchmark.

Also tried increasing the number of inserts per sqlite transaction from 1k
to 10k. Memory use increased to 90 mb, but no measurable speed increase.

I don't see much else that can speed up the sqlite part, without going deep
into the weeds of populating sqlite databases without using sql, or using
multi-value inserts ([as described here](https://medium.com/@JasonWyatt/squeezing-performance-from-sqlite-insertions-971aff98eef2)).
Both would prevent using persistent to abstract sql away, and would
only be usable in this case, not speeding up git-annex generally,
so not too enthused.
"""]]