Improve memory use of --all when using annex.private

This does not improve Annex.Branch.files at all, since it still uses ++ to
combine the lists, which forces all but the last one.

But when there are a lot of files in the private journal, it does keep
--all (or a bare repo) from buffering the filenames in memory.

See commit 653b719472 for prior discussion of this buffering.

Sponsored-by: Graham Spencer on Patreon
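
To illustrate the change to overBranchFileContents', here is a minimal standalone sketch of draining two lists in turn without concatenating them. It is not the code from this commit: the function takes an explicit select argument, String stands in for RawFilePath, and selectLog and drainAll are hypothetical helpers made up for the example.

```haskell
import Data.List (isSuffixOf)

-- Standalone variant of the two-list getnext: take the next selected item
-- from the first list, falling back to the second (private) list only once
-- the first is exhausted. The remainders of both lists are returned, so the
-- caller never has to build fs ++ pfs.
getNext :: (a -> Maybe v) -> [a] -> [a] -> Maybe (v, a, [a], [a])
getNext _ [] [] = Nothing
getNext select (f:fs) pfs = case select f of
    Nothing -> getNext select fs pfs
    Just v -> Just (v, f, fs, pfs)
getNext select [] (pf:pfs) = case select pf of
    Nothing -> getNext select [] pfs
    Just v -> Just (v, pf, [], pfs)

-- Hypothetical selector: accept only names ending in ".log".
selectLog :: String -> Maybe Int
selectLog f
    | ".log" `isSuffixOf` f = Just (length f)
    | otherwise = Nothing

-- Hypothetical driver: call getNext repeatedly until both lists are empty,
-- collecting each selected value together with its file name.
drainAll :: (a -> Maybe v) -> [a] -> [a] -> [(v, a)]
drainAll select fs pfs = case getNext select fs pfs of
    Nothing -> []
    Just (v, f, fs', pfs') -> (v, f) : drainAll select fs' pfs'
```

For example, drainAll selectLog ["a.log", "b.txt"] ["c.log"] yields [(5, "a.log"), (5, "c.log")]: the regular list is consumed first, rejected names are skipped, and the private list is only touched afterwards.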
parent 18f902efa9
commit 0da1d40cd4
3 changed files with 42 additions and 25 deletions

@@ -1,6 +1,6 @@
 {- management of the git-annex branch
  -
- - Copyright 2011-2022 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2023 Joey Hess <id@joeyh.name>
  -
  - Licensed under the GNU AGPL version 3 or higher.
  -}
@@ -597,21 +597,24 @@ files = do
 		then return Nothing
 		else do
 			(bfs, cleanup) <- branchFiles
+			jfs <- journalledFiles
+			pjfs <- journalledFilesPrivate
 			-- ++ forces the content of the first list to be
 			-- buffered in memory, so use journalledFiles,
 			-- which should be much smaller most of the time.
 			-- branchFiles will stream as the list is consumed.
-			l <- (++) <$> journalledFiles <*> pure bfs
+			let l = jfs ++ pjfs ++ bfs
 			return (Just (l, cleanup))

-{- Lists all files currently in the journal. There may be duplicates in
- - the list when using a private journal. -}
+{- Lists all files currently in the journal, but not files in the private
+ - journal. -}
 journalledFiles :: Annex [RawFilePath]
-journalledFiles = ifM privateUUIDsKnown
-	( (++)
-		<$> getJournalledFilesStale gitAnnexPrivateJournalDir
-		<*> getJournalledFilesStale gitAnnexJournalDir
-	, getJournalledFilesStale gitAnnexJournalDir
+journalledFiles = getJournalledFilesStale gitAnnexJournalDir
+
+journalledFilesPrivate :: Annex [RawFilePath]
+journalledFilesPrivate = ifM privateUUIDsKnown
+	( getJournalledFilesStale gitAnnexPrivateJournalDir
+	, return []
 	)

 {- Files in the branch, not including any from journalled changes,
@@ -992,8 +995,11 @@ overBranchFileContents' select go st = do
 			-- This can cause the action to be run a
 			-- second time with a file it already ran on.
 			| otherwise -> liftIO (tryTakeMVar buf) >>= \case
-				Nothing -> drain buf =<< journalledFiles
-				Just fs -> drain buf fs
+				Nothing -> do
+					jfs <- journalledFiles
+					pjfs <- journalledFilesPrivate
+					drain buf jfs pjfs
+				Just (jfs, pjfs) -> drain buf jfs pjfs
 	catObjectStreamLsTree l (select' . getTopFilePath . Git.LsTree.file) g go'
 		`finally` liftIO (void cleanup)
   where
@@ -1007,9 +1013,9 @@ overBranchFileContents' select go st = do
 			PossiblyStaleJournalledContent journalledcontent ->
 				Just (fromMaybe mempty branchcontent <> journalledcontent)

-	drain buf fs = case getnext fs of
-		Just (v, f, fs') -> do
-			liftIO $ putMVar buf fs'
+	drain buf fs pfs = case getnext fs pfs of
+		Just (v, f, fs', pfs') -> do
+			liftIO $ putMVar buf (fs', pfs')
 			content <- getJournalFileStale (GetPrivate True) f >>= \case
 				NoJournalledContent -> return Nothing
 				JournalledContent journalledcontent ->
@@ -1022,13 +1028,16 @@ overBranchFileContents' select go st = do
 					return (Just (content <> journalledcontent))
 			return (Just (v, f, content))
 		Nothing -> do
-			liftIO $ putMVar buf []
+			liftIO $ putMVar buf ([], [])
 			return Nothing

-	getnext [] = Nothing
-	getnext (f:fs) = case select f of
-		Nothing -> getnext fs
-		Just v -> Just (v, f, fs)
+	getnext [] [] = Nothing
+	getnext (f:fs) pfs = case select f of
+		Nothing -> getnext fs pfs
+		Just v -> Just (v, f, fs, pfs)
+	getnext [] (pf:pfs) = case select pf of
+		Nothing -> getnext [] pfs
+		Just v -> Just (v, pf, [], pfs)

 {- Check if the git-annex branch has been updated from the oldtree.
  - If so, returns the tuple of the old and new trees. -}

@@ -4,6 +4,7 @@ git-annex (10.20230927) UNRELEASED; urgency=medium
   * Fix crash of enableremote when the special remote has embedcreds=yes.
   * importfeed: Use caching database to avoid needing to list urls
     on every run, and avoid using too much memory.
+  * Improve memory use of --all when using annex.private.

  -- Joey Hess <id@joeyh.name>  Tue, 10 Oct 2023 13:17:31 -0400

@@ -1,16 +1,23 @@
-Using --all, or running in a bare repo, as well as
-`git annex unused` and `git annex info` all end up buffering the list of
-all keys that have uncommitted journalled changes in memory.
-This is due to Annex.Branch.files's call to getJournalledFilesStale which
-reads all the files in the directory into a buffer.
+`git annex unused --from=$remote` and `git annex info $remote`
+buffer the list of keys that have uncommitted journalled changes
+in memory. This is due to Annex.Branch.files's which reads all the
+files in the journal into a buffer.

 Note that the list of keys in the branch *does* stream in, so this
 is only really a problem when using annex.alwayscommit=false to build
-up big git-annex branch commits via the journal.
+up big git-annex branch commits via the journal. Or using annex.private,
+since the private journal can build up a lot of keys in it.

 An attempt at making it stream via unsafeInterleaveIO failed miserably
 and that is not the right approach. This would be a good place to use
 ResourceT, but it might need some changes to the Annex monad to allow
 combining the two. --[[Joey]]

+> This used to also affect --all and using git-annex in a bare repo, but
+> that was avoided by using the overBranchFileContents interface. This
+> suggests that changing to that interface in unused and info would be a
+> solution.
+
 [[!tag confirmed]]
+
+[[!meta title="improve memory usage of unused and info when the journal contains a lot of files"]]