split out appending to journal from writing, high level only

Currently this is not an improvement, but it allows for optimising
appendJournalFile later. With an optimised appendJournalFile, this will
greatly speed up access patterns like git-annex addurl of a lot of urls
to the same key, where the log file can grow rather large. Appending
rather than re-writing the journal file for each line can save a lot of
disk writes.

It still has to read the current journal or branch file, to check
if it can append to it, and so when the journal file does not exist yet,
it can write the old content from the branch to it. Probably the re-reads
are better cached by the filesystem than repeated writes. (If the
re-reads turn out to keep performance bad, they could be eliminated, at
the cost of not being able to compact the log when replacing old
information in it. That could be enabled by a switch.)

While the immediate need is to affect addurl writes, it was implemented
at the level of presence logs, so will also perhaps speed up location logs.
The only added overhead is the call to isNewInfo, which only needs to
compare ByteStrings. Helping to balance that out, it avoids compactLog
when it's able to append.

Sponsored-by: Dartmouth College's DANDI project
This commit is contained in:
Joey Hess 2022-07-18 13:22:50 -04:00
parent 2ce1eaf56a
commit ce455223df
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
4 changed files with 61 additions and 6 deletions

View file

@ -6,7 +6,7 @@
- A line of the log will look like: "date N INFO"
- Where N=1 when the INFO is present, 0 otherwise.
-
- Copyright 2010-2021 Joey Hess <id@joeyh.name>
- Copyright 2010-2022 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@ -35,10 +35,13 @@ addLog ru file logstatus loginfo =
addLog' :: Annex.Branch.RegardingUUID -> RawFilePath -> LogStatus -> LogInfo -> CandidateVectorClock -> Annex ()
addLog' ru file logstatus loginfo c =
Annex.Branch.change ru file $ \b ->
Annex.Branch.changeOrAppend ru file $ \b ->
let old = parseLog b
line = genLine logstatus loginfo c old
in buildLog $ compactLog (line : old)
in if isNewInfo line old
then Annex.Branch.Append $ buildLog [line]
else Annex.Branch.Change $ buildLog $
compactLog (line : old)
{- When a LogLine already exists with the same status and info, but an
- older timestamp, that LogLine is preserved, rather than updating the log

View file

@ -1,6 +1,6 @@
{- git-annex presence log, pure operations
-
- Copyright 2010-2019 Joey Hess <id@joeyh.name>
- Copyright 2010-2021 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@ -91,6 +91,12 @@ mapLog = M.elems
logMap :: [LogLine] -> LogMap
logMap = foldr insertNewerLogLine M.empty
{- Check if the info of the given line is not in the list of LogLines. -}
isNewInfo :: LogLine -> [LogLine] -> Bool
isNewInfo l old = not (any issame old)
where
issame l' = info l' == info l
insertBetter :: (LogLine -> Bool) -> LogLine -> LogMap -> Maybe LogMap
insertBetter betterthan l m
| better = Just (M.insert i l m)