optimise journal writes to not mkdir journal directory when it already exists
Sponsored-by: Dartmouth College's DANDI project
commit ad467791c1
parent 5e407304a2
4 changed files with 19 additions and 4 deletions
@@ -81,14 +81,16 @@ setJournalFile _jl ru file content = withOtherTmp $ \tmp -> do
         ( return gitAnnexPrivateJournalDir
         , return gitAnnexJournalDir
         )
-    createAnnexDirectory jd
     -- journal file is written atomically
     let jfile = journalFile file
     let tmpfile = tmp P.</> jfile
-    liftIO $ do
+    let write = liftIO $ do
         withFile (fromRawFilePath tmpfile) WriteMode $ \h ->
             writeJournalHandle h content
         moveFile tmpfile (jd P.</> jfile)
+    -- avoid overhead of creating the journal directory when it already
+    -- exists
+    write `catchIO` (const (createAnnexDirectory jd >> write))
 
 data JournalledContent
     = NoJournalledContent
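
The change replaces an unconditional createAnnexDirectory with an optimistic write that only creates the directory (and retries) when the first attempt fails. A minimal standalone sketch of the same pattern, using plain Control.Exception and createDirectoryIfMissing in place of git-annex's internal helpers (writeOptimistic and its parameters are illustrative, not git-annex's API):

    import Control.Exception (IOException, catch)
    import System.Directory (createDirectoryIfMissing, renameFile)
    import System.FilePath ((</>))
    import System.IO (IOMode (WriteMode), hPutStr, withFile)

    -- Optimistically write tmp/name and rename it into dir, assuming
    -- dir usually already exists; only on failure is dir created and
    -- the whole write retried.
    writeOptimistic :: FilePath -> FilePath -> FilePath -> String -> IO ()
    writeOptimistic tmp dir name content = write `catch` retry
      where
        tmpfile = tmp </> name
        write = do
            withFile tmpfile WriteMode $ \h -> hPutStr h content
            renameFile tmpfile (dir </> name)
        retry :: IOException -> IO ()
        retry _ = createDirectoryIfMissing True dir >> write

In the common case this saves one mkdir syscall per journal write; the retry path runs only the first time, when the journal directory does not yet exist.
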
@@ -27,3 +27,5 @@ May be changes to those .web files in journal could be done "in place" by append
 may be there is a way to "stagger" those --batch additions somehow so all thousands of URLs are added in a single "run" thus having a single "copy/move" and locking/stat'ing syscalls?
 
 PS More information could be found at [dandisets/issues/225](https://github.com/dandi/dandisets/issues/225)
+
+[[!tag projects/dandi]]
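
For the "in place" idea quoted above, appending would avoid a tmp-file-and-rename per addition; a hypothetical sketch (git-annex does not do this today, since journal files are replaced atomically):

    import System.IO (IOMode (AppendMode), hPutStrLn, withFile)

    -- Hypothetical: add one more url line to an existing journal .web
    -- file by appending, instead of rewriting it via tmp file + rename.
    appendUrl :: FilePath -> String -> IO ()
    appendUrl journalfile url =
        withFile journalfile AppendMode $ \h -> hPutStrLn h url

The trade-off is atomicity: a reader could observe a partially written line, which the tmp-file-plus-rename scheme rules out.
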
@@ -9,9 +9,10 @@ randomly distributed?
 
 It sounds like it's more randomly distributed, if you're walking a tree and
 adding each file you encounter, and some of them have the same content so
-the same url and key.
+the same key.
 
-If it was not randomly distributed, a nice optimisation would be for
+But your strace shows repeated writes for the same key, so maybe they bunch
+up? If it was not randomly distributed, a nice optimisation would be for
 registerurl to buffer urls as long as the key is the same, and then do a
 single write for that key of all the urls. But it can't really buffer like
 that if it's randomly distributed; the buffer could use a large amount of
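
Such buffering would only ever hold one run of consecutive same-key urls, so it stays cheap exactly when the input is not randomly distributed. A rough sketch, where writeUrls stands in for whatever performs the single journal write per key (an assumed callback, not git-annex's actual interface):

    import Data.Function (on)
    import Data.List (groupBy)

    -- Group consecutive (key, url) pairs sharing a key, doing one
    -- write per run instead of one per url. Degenerates to one write
    -- per url when keys are randomly distributed, buffering almost
    -- nothing, since groupBy only groups adjacent equal keys.
    bufferByKey :: (String -> [String] -> IO ()) -> [(String, String)] -> IO ()
    bufferByKey writeUrls input =
        mapM_ flush (groupBy ((==) `on` fst) input)
      where
        flush pairs = writeUrls (fst (head pairs)) (map snd pairs)

Because only adjacent duplicates are merged, memory use is bounded by the longest run of one key, matching the concern about the buffer growing large.
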
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2022-07-14T16:16:35Z"
+ content="""
+I've optimised away the repeated mkdir of the journal.
+
+Probably not a big win in this particular edge case, but a nice general
+win..
+"""]]