import: Avoid buffering all filenames to be imported in memory.

Test case is 24 directories each containing files named 1..10000.
The concat and filterM destroyed what laziness there is in
dirContentsRecursive, making it buffer all the filenames. Memory
use was around 300 MB (possibly growing slightly as it progressed).
After this fix, memory use drops to a constant 59 MB.

Note that dirContentsRecursive still buffers the entire content of a
directory (not subdirectories) so this is still not optimal.
Joey Hess 2018-04-26 12:06:12 -04:00
parent a81dcfdafd
commit bfa26661d1
2 changed files with 5 additions and 2 deletions

@@ -22,6 +22,7 @@ git-annex (6.20180410) UNRELEASED; urgency=medium
     assistant is autostarted on boot.
   * Assistant: Fix installation of menus, icons, etc when run
     from within runshell.
+  * import: Avoid buffering all filenames to be imported in memory.
  -- Joey Hess <id@joeyh.name>  Mon, 09 Apr 2018 14:03:28 -0400

@@ -93,9 +93,11 @@ withFilesInRefs a = mapM_ go
 withPathContents :: ((FilePath, FilePath) -> CommandStart) -> CmdParams -> CommandSeek
 withPathContents a params = do
     matcher <- Limit.getMatcher
-    seekActions $ map a <$> (filterM (checkmatch matcher) =<< ps)
+    forM_ params $ \p -> do
+        fs <- liftIO $ get p
+        forM fs $ \f -> whenM (checkmatch matcher f) $
+            commandAction (a f)
   where
-    ps = concat <$> liftIO (mapM get params)
     get p = ifM (isDirectory <$> getFileStatus p)
         ( map (\f -> (f, makeRelative (parentDir p) f))
             <$> dirContentsRecursiveSkipping (".git" `isSuffixOf`) True p
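
The difference between the old and new shape of withPathContents can be seen in isolation with a small sketch. It does not use git-annex code: listDir, wanted, act, buffered, streamed and runLog are hypothetical stand-ins invented for illustration, and the sketch records the order of events rather than measuring memory. In the old shape every parameter is listed before the first file is acted on, so all the lists are held at once. In the new shape each list is consumed before the next one is produced.

```haskell
import Control.Monad (filterM, forM_, when)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for listing one parameter; logs when it runs.
listDir :: IORef [String] -> String -> IO [String]
listDir logRef dir = do
  modifyIORef' logRef (("list " ++ dir) :)
  return [dir ++ "/" ++ show n | n <- [1 :: Int .. 2]]

-- Hypothetical stand-in for the matcher check; rejects one file.
wanted :: String -> IO Bool
wanted f = return (f /= "a/2")

-- Hypothetical stand-in for the per-file action; logs the file.
act :: IORef [String] -> String -> IO ()
act logRef f = modifyIORef' logRef (("act " ++ f) :)

-- Old shape: list every parameter, concatenate, filter, then act.
-- mapM in IO finishes all listings before any file is processed.
buffered :: IORef [String] -> [String] -> IO ()
buffered logRef dirs = do
  fs <- filterM wanted . concat =<< mapM (listDir logRef) dirs
  mapM_ (act logRef) fs

-- New shape: list one parameter, filter and act on its files,
-- then move on to the next parameter.
streamed :: IORef [String] -> [String] -> IO ()
streamed logRef dirs =
  forM_ dirs $ \d -> do
    fs <- listDir logRef d
    forM_ fs $ \f -> do
      ok <- wanted f
      when ok (act logRef f)

-- Run one of the two shapes on parameters "a" and "b" and
-- return the events in the order they happened.
runLog :: (IORef [String] -> [String] -> IO ()) -> IO [String]
runLog go = do
  logRef <- newIORef []
  go logRef ["a", "b"]
  reverse <$> readIORef logRef
```

Under these assumptions, runLog buffered yields "list a", "list b" and then the three act events, while runLog streamed yields "list a", "act a/1", "list b", "act b/1", "act b/2". Each call to listDir still returns a whole list, which mirrors the note that a single directory's content is still buffered.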