optimise journal writes to not mkdir journal directory when it already exists

Sponsored-by: Dartmouth College's DANDI project
Joey Hess 2022-07-14 12:28:16 -04:00
parent 5e407304a2
commit ad467791c1
GPG key ID: DB12DB0FF05F8F38
4 changed files with 19 additions and 4 deletions


@@ -81,14 +81,16 @@ setJournalFile _jl ru file content = withOtherTmp $ \tmp -> do
 		( return gitAnnexPrivateJournalDir
 		, return gitAnnexJournalDir
 		)
-	createAnnexDirectory jd
 	-- journal file is written atomically
 	let jfile = journalFile file
 	let tmpfile = tmp P.</> jfile
-	liftIO $ do
+	let write = liftIO $ do
 		withFile (fromRawFilePath tmpfile) WriteMode $ \h ->
 			writeJournalHandle h content
 		moveFile tmpfile (jd P.</> jfile)
+	-- avoid overhead of creating the journal directory when it already
+	-- exists
+	write `catchIO` (const (createAnnexDirectory jd >> write))
 
 data JournalledContent
 	= NoJournalledContent
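
The change inverts the usual order: instead of unconditionally running createAnnexDirectory before every journal write, it attempts the write first and only creates the directory (then retries once) when the write throws. A minimal standalone sketch of that pattern, assuming plain base-library functions (writeFile, renameFile, createDirectoryIfMissing) in place of git-annex's writeJournalHandle, moveFile, catchIO and createAnnexDirectory, and a caller-supplied tmpdir in place of withOtherTmp:

-- Sketch only: not git-annex's actual code.
import Control.Exception (IOException, catch)
import System.Directory (createDirectoryIfMissing, renameFile)
import System.FilePath (takeDirectory, (</>))

writeJournalSketch :: FilePath -> FilePath -> String -> IO ()
writeJournalSketch tmpdir jfile content = write `catch` retry
  where
    tmpfile = tmpdir </> "journal.tmp"
    write = do
        writeFile tmpfile content   -- tmpdir is assumed to already exist
        renameFile tmpfile jfile    -- fails when the journal dir is missing
    retry :: IOException -> IO ()
    retry _ = do
        -- only now pay for the mkdir, then retry the write once
        createDirectoryIfMissing True (takeDirectory jfile)
        write

main :: IO ()
main = writeJournalSketch "." ("journal" </> "example.log") "example content\n"

In the common case, where the journal directory already exists, this saves a mkdir per journal write; the retry path only runs when the directory is actually missing.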


@@ -27,3 +27,5 @@ May be changes to those .web files in journal could be done "in place" by append
 may be there is a way to "stagger" those --batch additions somehow so all thousands of URLs are added in a single "run" thus having a single "copy/move" and locking/stat'ing syscalls?
 
 PS More information could be found at [dandisets/issues/225](https://github.com/dandi/dandisets/issues/225 )
+
+[[!tag projects/dandi]]


@@ -9,9 +9,10 @@ randomly distributed?
 
 It sounds like it's more randomly distributed, if you're walking a tree and
 adding each file you encounter, and some of them have the same content so
-the same url and key.
+the same key.
 
-If it was not randomly distributed, a nice optimisation would be for
+But your strace shows repeated writes for the same key, so maybe they bunch
+up? If it was not randomly distributed, a nice optimisation would be for
 registerurl to buffer urls as long as the key is the same, and then do a
 single write for that key of all the urls. But it can't really buffer like
 that if it's randomly distributed; the buffer could use a large amount of
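
For illustration, the buffering idea floated in that comment (which this commit does not implement) could be sketched as follows; registerBatch, flushKey and the example keys are hypothetical names, with flushKey standing in for whatever would perform the single per-key write:

-- Sketch only: group consecutive --batch entries that share a key,
-- so each run of identical keys results in one write instead of one per url.
import Data.Function (on)
import Data.List (groupBy)

type Key = String
type Url = String

-- Placeholder for the single write that would record all urls for one key.
flushKey :: Key -> [Url] -> IO ()
flushKey key urls =
    putStrLn $ "single write for " ++ key ++ ": " ++ show (length urls) ++ " urls"

-- Buffer consecutive entries with the same key, flushing once per run.
registerBatch :: [(Key, Url)] -> IO ()
registerBatch entries =
    mapM_ (\run -> flushKey (fst (head run)) (map snd run))
          (groupBy ((==) `on` fst) entries)

main :: IO ()
main = registerBatch
    [ ("SHA256E-s100--aaa", "https://example.com/1")
    , ("SHA256E-s100--aaa", "https://example.com/2")
    , ("SHA256E-s200--bbb", "https://example.com/3")
    ]

Since groupBy only coalesces adjacent entries, this helps only when same-key urls arrive next to each other in the --batch input, which is exactly why a randomly distributed input defeats the buffering.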


@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2022-07-14T16:16:35Z"
+ content="""
+I've optimised away the repeated mkdir of the journal.
+
+Probably not a big win in this particular edge case, but a nice general
+win..
+"""]]