fix lack of laziness streaming large diffs

A commit last year that made a partial function use Maybe unfortunately
caused the whole input to need to be consumed, breaking streaming. So,
revert it.

This commit was sponsored by Nick Daly on Patreon.
This commit is contained in:
Joey Hess 2017-01-31 17:43:11 -04:00
parent a130bc4f9b
commit dbaea98836
No known key found for this signature in database
GPG key ID: C910D9222512E3C7
2 changed files with 32 additions and 14 deletions

View file

@ -89,7 +89,7 @@ commitDiff ref = getdiff (Param "show")
getdiff :: CommandParam -> [CommandParam] -> Repo -> IO ([DiffTreeItem], IO Bool) getdiff :: CommandParam -> [CommandParam] -> Repo -> IO ([DiffTreeItem], IO Bool)
getdiff command params repo = do getdiff command params repo = do
(diff, cleanup) <- pipeNullSplit ps repo (diff, cleanup) <- pipeNullSplit ps repo
return (fromMaybe (error $ "git " ++ show (toCommand ps) ++ " parse failed") (parseDiffRaw diff), cleanup) return (parseDiffRaw diff, cleanup)
where where
ps = ps =
command : command :
@ -100,24 +100,23 @@ getdiff command params repo = do
params params
{- Parses --raw output used by diff-tree and git-log. -} {- Parses --raw output used by diff-tree and git-log. -}
parseDiffRaw :: [String] -> Maybe [DiffTreeItem] parseDiffRaw :: [String] -> [DiffTreeItem]
parseDiffRaw l = go l [] parseDiffRaw l = go l []
where where
go [] c = Just c go [] c = c
go (info:f:rest) c = case mk info f of go (info:f:rest) c = go rest (mk info f : c)
Nothing -> Nothing go (s:[]) _ = error $ "diff-tree parse error near \"" ++ s ++ "\""
Just i -> go rest (i:c)
go (_:[]) _ = Nothing
mk info f = DiffTreeItem mk info f = DiffTreeItem
<$> readmode srcm { srcmode = readmode srcm
<*> readmode dstm , dstmode = readmode dstm
<*> extractSha ssha , srcsha = fromMaybe (error "bad srcsha") $ extractSha ssha
<*> extractSha dsha , dstsha = fromMaybe (error "bad dstsha") $ extractSha dsha
<*> pure s , status = s
<*> pure (asTopFilePath $ fromInternalGitPath $ Git.Filename.decode f) , file = asTopFilePath $ fromInternalGitPath $ Git.Filename.decode f
}
where where
readmode = fst <$$> headMaybe . readOct readmode = fst . Prelude.head . readOct
-- info = :<srcmode> SP <dstmode> SP <srcsha> SP <dstsha> SP <status> -- info = :<srcmode> SP <dstmode> SP <srcsha> SP <dstsha> SP <status>
-- All fields are fixed, so we can pull them out of -- All fields are fixed, so we can pull them out of

View file

@ -0,0 +1,19 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2017-01-31T20:24:04Z"
content="""
The heap profile has multiple spikes (so not an accumulating memory leak).
The diff parsing code is indeed what's using so much memory. Looks like
data is failing to stream through that code and instead the whole
diff output gets buffered.
Aha.. Git.DiffTree.parseDiffRaw used to return a list, but changed
in [[!commit 8d124beba8]]
to a Maybe list in order to avoid being a partial function. But
that change destroyed laziness, since the whole input has to be parsed
in order to determine if Nothing should be returned.
However, fixing that only eliminated part of the spike. There's something
else keeping data from streaming.
"""]]