fix lack of laziness streaming large diffs

A commit last year that made a partial function use Maybe unfortunately
caused the whole input to need to be consumed, breaking streaming. So,
revert it.

This commit was sponsored by Nick Daly on Patreon.
Joey Hess 2017-01-31 17:43:11 -04:00
parent a130bc4f9b
commit dbaea98836
GPG key ID: C910D9222512E3C7
2 changed files with 32 additions and 14 deletions

@@ -89,7 +89,7 @@ commitDiff ref = getdiff (Param "show")
 getdiff :: CommandParam -> [CommandParam] -> Repo -> IO ([DiffTreeItem], IO Bool)
 getdiff command params repo = do
     (diff, cleanup) <- pipeNullSplit ps repo
-    return (fromMaybe (error $ "git " ++ show (toCommand ps) ++ " parse failed") (parseDiffRaw diff), cleanup)
+    return (parseDiffRaw diff, cleanup)
   where
     ps =
         command :
@@ -100,24 +100,23 @@ getdiff command params repo = do
         params
 {- Parses --raw output used by diff-tree and git-log. -}
-parseDiffRaw :: [String] -> Maybe [DiffTreeItem]
+parseDiffRaw :: [String] -> [DiffTreeItem]
 parseDiffRaw l = go l []
   where
-    go [] c = Just c
-    go (info:f:rest) c = case mk info f of
-        Nothing -> Nothing
-        Just i -> go rest (i:c)
-    go (_:[]) _ = Nothing
+    go [] c = c
+    go (info:f:rest) c = go rest (mk info f : c)
+    go (s:[]) _ = error $ "diff-tree parse error near \"" ++ s ++ "\""
     mk info f = DiffTreeItem
-        <$> readmode srcm
-        <*> readmode dstm
-        <*> extractSha ssha
-        <*> extractSha dsha
-        <*> pure s
-        <*> pure (asTopFilePath $ fromInternalGitPath $ Git.Filename.decode f)
+        { srcmode = readmode srcm
+        , dstmode = readmode dstm
+        , srcsha = fromMaybe (error "bad srcsha") $ extractSha ssha
+        , dstsha = fromMaybe (error "bad dstsha") $ extractSha dsha
+        , status = s
+        , file = asTopFilePath $ fromInternalGitPath $ Git.Filename.decode f
+        }
       where
-        readmode = fst <$$> headMaybe . readOct
+        readmode = fst . Prelude.head . readOct
         -- info = :<srcmode> SP <dstmode> SP <srcsha> SP <dstsha> SP <status>
         -- All fields are fixed, so we can pull them out of

@@ -0,0 +1,19 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2017-01-31T20:24:04Z"
+ content="""
+The heap profile has multiple spikes (so not an accumulating memory leak).
+The diff parsing code is indeed what's using so much memory. Looks like
+data is failing to stream through that code and instead the whole
+diff output gets buffered.
+
+Aha.. Git.DiffTree.parseDiffRaw used to return a list, but changed
+in [[!commit 8d124beba8]]
+to a Maybe list in order to avoid being a partial function. But
+that change destroyed laziness, since the whole input has to be parsed
+in order to determine if Nothing should be returned.
+
+However, fixing that only eliminated part of the spike. There's something
+else keeping data from streaming.
+"""]]
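
The laziness problem described in the comment can be reproduced outside git-annex. The following is a minimal sketch, not the project's code: `parseItem`, `parseAllMaybe` and `parseAllLazy` are hypothetical names, and parsing an `Int` stands in for parsing a diff-tree record. It only illustrates why a `Maybe [a]` result forces the whole input, while a plain list that calls `error` on bad input can be consumed as it is produced.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for parsing one record (not git-annex code).
parseItem :: String -> Maybe Int
parseItem s = case reads s of
    [(n, "")] -> Just n
    _ -> Nothing

-- Total version: the outer constructor (Just or Nothing) is only known
-- once every element has been parsed, so the whole input is consumed
-- before the caller sees the first item.
parseAllMaybe :: [String] -> Maybe [Int]
parseAllMaybe = traverse parseItem

-- Partial version: each item is produced as soon as it is demanded;
-- a bad record raises an error only when that record is reached.
parseAllLazy :: [String] -> [Int]
parseAllLazy = map (fromMaybe (error "parse error") . parseItem)

main :: IO ()
main = do
    -- An infinite input stands in for a very large diff.
    let input = map show [1 :: Int ..]
    -- Only the first three records are ever parsed.
    print (take 3 (parseAllLazy input))
    -- By contrast, (take 3 <$> parseAllMaybe input) would never finish,
    -- because Just cannot be returned until the input ends.
```

On a finite input, `parseAllMaybe` behaves as expected (`Nothing` if any record is malformed), so the trade-off is between totality and streaming, which is the choice this commit makes in favour of streaming.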