optimise journal writes to not mkdir journal directory when it already exists

Sponsored-by: Dartmouth College's DANDI project
Joey Hess 2022-07-14 12:28:16 -04:00
parent 5e407304a2
commit ad467791c1
GPG key ID: DB12DB0FF05F8F38
4 changed files with 19 additions and 4 deletions


@@ -81,14 +81,16 @@ setJournalFile _jl ru file content = withOtherTmp $ \tmp -> do
 		( return gitAnnexPrivateJournalDir
 		, return gitAnnexJournalDir
 		)
-	createAnnexDirectory jd
 	-- journal file is written atomically
 	let jfile = journalFile file
 	let tmpfile = tmp P.</> jfile
-	liftIO $ do
+	let write = liftIO $ do
 		withFile (fromRawFilePath tmpfile) WriteMode $ \h ->
 			writeJournalHandle h content
 		moveFile tmpfile (jd P.</> jfile)
+	-- avoid overhead of creating the journal directory when it already
+	-- exists
+	write `catchIO` (const (createAnnexDirectory jd >> write))
 
 data JournalledContent
 	= NoJournalledContent
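
The change inverts the usual order: instead of unconditionally running createAnnexDirectory before every journal write, it attempts the write first and only creates the directory (then retries once) when the write throws. A minimal standalone sketch of that pattern, assuming plain base-library functions (writeFile, renameFile, createDirectoryIfMissing) in place of git-annex's writeJournalHandle, moveFile, catchIO and createAnnexDirectory, and a caller-supplied tmpdir in place of withOtherTmp:

-- Sketch only: not git-annex's actual code.
import Control.Exception (IOException, catch)
import System.Directory (createDirectoryIfMissing, renameFile)
import System.FilePath (takeDirectory, (</>))

writeJournalSketch :: FilePath -> FilePath -> String -> IO ()
writeJournalSketch tmpdir jfile content = write `catch` retry
  where
    tmpfile = tmpdir </> "journal.tmp"
    write = do
        writeFile tmpfile content   -- tmpdir is assumed to already exist
        renameFile tmpfile jfile    -- fails when the journal dir is missing
    retry :: IOException -> IO ()
    retry _ = do
        -- only now pay for the mkdir, then retry the write once
        createDirectoryIfMissing True (takeDirectory jfile)
        write

main :: IO ()
main = writeJournalSketch "." ("journal" </> "example.log") "example content\n"

In the common case, where the journal directory already exists, this saves a mkdir per journal write; the retry path only runs when the directory is actually missing.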


@@ -27,3 +27,5 @@ May be changes to those .web files in journal could be done "in place" by append
 may be there is a way to "stagger" those --batch additions somehow so all thousands of URLs are added in a single "run" thus having a single "copy/move" and locking/stat'ing syscalls?
 
 PS More information could be found at [dandisets/issues/225](https://github.com/dandi/dandisets/issues/225 )
+
+[[!tag projects/dandi]]


@@ -9,9 +9,10 @@ randomly distributed?
 
 It sounds like it's more randomly distributed, if you're walking a tree and
 adding each file you encounter, and some of them have the same content so
-the same url and key.
+the same key.
 
-If it was not randomly distributed, a nice optimisation would be for
+But your strace shows repeated writes for the same key, so maybe they bunch
+up? If it was not randomly distributed, a nice optimisation would be for
 registerurl to buffer urls as long as the key is the same, and then do a
 single write for that key of all the urls. But it can't really buffer like
 that if it's randomly distributed; the buffer could use a large amount of
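
For illustration, the buffering idea floated in that comment (which this commit does not implement) could be sketched as follows; registerBatch, flushKey and the example keys are hypothetical names, with flushKey standing in for whatever would perform the single per-key write:

-- Sketch only: group consecutive --batch entries that share a key,
-- so each run of identical keys results in one write instead of one per url.
import Data.Function (on)
import Data.List (groupBy)

type Key = String
type Url = String

-- Placeholder for the single write that would record all urls for one key.
flushKey :: Key -> [Url] -> IO ()
flushKey key urls =
    putStrLn $ "single write for " ++ key ++ ": " ++ show (length urls) ++ " urls"

-- Buffer consecutive entries with the same key, flushing once per run.
registerBatch :: [(Key, Url)] -> IO ()
registerBatch entries =
    mapM_ (\run -> flushKey (fst (head run)) (map snd run))
          (groupBy ((==) `on` fst) entries)

main :: IO ()
main = registerBatch
    [ ("SHA256E-s100--aaa", "https://example.com/1")
    , ("SHA256E-s100--aaa", "https://example.com/2")
    , ("SHA256E-s200--bbb", "https://example.com/3")
    ]

Since groupBy only coalesces adjacent entries, this helps only when same-key urls arrive next to each other in the --batch input, which is exactly why a randomly distributed input defeats the buffering.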


@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2022-07-14T16:16:35Z"
+ content="""
+I've optimised away the repeated mkdir of the journal.
+
+Probably not a big win in this particular edge case, but a nice general
+win..
+"""]]