optimise journal writes to not mkdir journal directory when it already exists
Sponsored-by: Dartmouth College's DANDI project
parent 5e407304a2
commit ad467791c1

4 changed files with 19 additions and 4 deletions
@@ -81,14 +81,16 @@ setJournalFile _jl ru file content = withOtherTmp $ \tmp -> do
 		( return gitAnnexPrivateJournalDir
 		, return gitAnnexJournalDir
 		)
-	createAnnexDirectory jd
 	-- journal file is written atomically
 	let jfile = journalFile file
 	let tmpfile = tmp P.</> jfile
-	liftIO $ do
+	let write = liftIO $ do
 		withFile (fromRawFilePath tmpfile) WriteMode $ \h ->
			writeJournalHandle h content
 		moveFile tmpfile (jd P.</> jfile)
+	-- avoid overhead of creating the journal directory when it already
+	-- exists
+	write `catchIO` (const (createAnnexDirectory jd >> write))
 
 data JournalledContent
 	= NoJournalledContent
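For illustration, here is a minimal standalone sketch of the pattern this hunk introduces: attempt the journal write first, and only create the directory (then retry once) if the write fails. This is not git-annex code; `writeJournalEntry` and the example paths are made up for the sketch.

```haskell
-- Minimal sketch (not git-annex code): try the write first, and only
-- create the journal directory and retry once when the write fails.
import Control.Exception (IOException, catch)
import System.Directory (createDirectoryIfMissing, renameFile)
import System.FilePath ((</>))

writeJournalEntry :: FilePath -> FilePath -> String -> IO ()
writeJournalEntry journalDir name content = write `catch` retryAfterMkdir
  where
    -- Common case: the directory already exists, so no mkdir syscall is paid.
    write = do
        let tmpfile = journalDir </> (name ++ ".tmp")
        writeFile tmpfile content
        renameFile tmpfile (journalDir </> name)
    -- Rare case: the directory is missing; create it and retry the write.
    retryAfterMkdir :: IOException -> IO ()
    retryAfterMkdir _ = createDirectoryIfMissing True journalDir >> write

main :: IO ()
main = writeJournalEntry "/tmp/example-journal" "example.log.web"
    "https://example.com/file\n"
```

The retry path mirrors what the diff does with `catchIO`: the mkdir only happens when it turns out to be needed.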
@@ -27,3 +27,5 @@ May be changes to those .web files in journal could be done "in place" by append
 may be there is a way to "stagger" those --batch additions somehow so all thousands of URLs are added in a single "run" thus having a single "copy/move" and locking/stat'ing syscalls?
 
+PS More information could be found at [dandisets/issues/225](https://github.com/dandi/dandisets/issues/225 )
+
 [[!tag projects/dandi]]
@@ -9,9 +9,10 @@ randomly distributed?
 
 It sounds like it's more randomly distributed, if you're walking a tree and
 adding each file you encounter, and some of them have the same content so
-the same url and key.
+the same key.
 
-If it was not randomly distributed, a nice optimisation would be for
+But your strace shows repeated writes for the same key, so maybe they bunch
+up? If it was not randomly distributed, a nice optimisation would be for
 registerurl to buffer urls as long as the key is the same, and then do a
 single write for that key of all the urls. But it can't really buffer like
 that if it's randomly distributed; the buffer could use a large amount of
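The buffering idea discussed in this comment could look roughly like the sketch below: consecutive --batch entries that share a key are grouped, so each run of identical keys costs a single write. This is purely illustrative; `registerBatch` and `flushKey` are hypothetical names, not registerurl's actual implementation.

```haskell
-- Hypothetical sketch of the buffering idea: group consecutive --batch
-- lines that share a key and do one write per run of identical keys.
import Data.List (groupBy)
import Data.Function (on)

type Key = String
type URL = String

-- One journal write per run of identical keys, instead of one per url.
registerBatch :: (Key -> [URL] -> IO ()) -> [(Key, URL)] -> IO ()
registerBatch flushKey entries =
    mapM_ flush (groupBy ((==) `on` fst) entries)
  where
    flush run = flushKey (fst (head run)) (map snd run)

main :: IO ()
main = registerBatch
    (\k urls -> putStrLn (k ++ ": " ++ show (length urls) ++ " urls in one write"))
    [ ("SHA256E-s100--aaa", "https://example.com/1")
    , ("SHA256E-s100--aaa", "https://example.com/2")
    , ("SHA256E-s200--bbb", "https://example.com/3")
    ]
```

Because only consecutive equal keys are grouped, memory stays bounded even when the input is mostly randomly distributed, which is the concern raised above.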
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2022-07-14T16:16:35Z"
+ content="""
+I've optimised away the repeated mkdir of the journal.
+
+Probably not a big win in this particular edge case, but a nice general
+win..
+"""]]