Improve memory use of --all when using annex.private

This does not improve Annex.Branch.files at all, since it still uses ++ to
combine the lists, which forces all but the last one.

But when there are a lot of files in the private journal, it does prevent
--all (or a bare repo) from buffering the filenames in memory.

See commit 653b719472 for prior discussion of
this buffering.
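
To illustrate the difference between buffering and streaming described
above, here is a minimal sketch. It is not git-annex's code: streamDir and
streamDirs are hypothetical names, and it assumes the unix package's
System.Posix.Directory. Each filename is handed to a callback as it is read,
so no list of names is built, and several directories are walked in turn
without combining lists with ++.

```haskell
import Control.Exception (bracket)
import Data.IORef (newIORef, modifyIORef', readIORef)
import System.Environment (getArgs)
import System.Posix.Directory (openDirStream, readDirStream, closeDirStream)

-- Hypothetical helper: pass each entry of a directory to a callback as it
-- is read, instead of returning all entries in one list.
streamDir :: FilePath -> (FilePath -> IO ()) -> IO ()
streamDir dir act = bracket (openDirStream dir) closeDirStream go
  where
    go ds = do
        name <- readDirStream ds
        case name of
            ""   -> return ()        -- empty name marks the end of the stream
            "."  -> go ds            -- skip the directory itself
            ".." -> go ds            -- skip the parent directory
            _    -> act name >> go ds

-- Hypothetical helper: walk several directories one after another.
-- Nothing is concatenated, so no directory's listing is held in memory.
streamDirs :: [FilePath] -> (FilePath -> IO ()) -> IO ()
streamDirs dirs act = mapM_ (\d -> streamDir d act) dirs

-- Count the entries in the directories named on the command line.
main :: IO ()
main = do
    dirs <- getArgs
    n <- newIORef (0 :: Int)
    streamDirs dirs (\_ -> modifyIORef' n (+1))
    readIORef n >>= print
```

Memory use here stays constant regardless of how many files the directories
contain, at the cost of the caller having to work in callback style.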

Sponsored-by: Graham Spencer on Patreon
Joey Hess 2023-10-24 13:06:54 -04:00
parent 18f902efa9
commit 0da1d40cd4
3 changed files with 42 additions and 25 deletions

@@ -1,16 +1,23 @@
Using --all, or running in a bare repo, as well as
`git annex unused` and `git annex info` all end up buffering the list of
all keys that have uncommitted journalled changes in memory.
This is due to Annex.Branch.files's call to getJournalledFilesStale which
reads all the files in the directory into a buffer.
`git annex unused --from=$remote` and `git annex info $remote`
buffer the list of keys that have uncommitted journalled changes
in memory. This is due to Annex.Branch.files's which reads all the
files in the journal into a buffer.
Note that the list of keys in the branch *does* stream in, so this
is only really a problem when using annex.alwayscommit=false to build
up big git-annex branch commits via the journal.
up big git-annex branch commits via the journal. Or using annex.private,
since the private journal can build up a lot of keys in it.
An attempt at making it stream via unsafeInterleaveIO failed miserably
and that is not the right approach. This would be a good place to use
ResourceT, but it might need some changes to the Annex monad to allow
combining the two. --[[Joey]]
> This used to also affect --all and using git-annex in a bare repo, but
> that was avoided by using the overBranchFileContents interface. This
> suggests that changing to that interface in unused and info would be a
> solution.
[[!tag confirmed]]
[[!meta title="improve memory usage of unused and info when the journal contains a lot of files"]]