ImportableContentsChunkable

This improves the borg special remote's memory usage by letting it load
only one archive's worth of filenames into memory at a time, building up
the larger tree out of those chunks.
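The chunked interface can be sketched roughly as follows. This is a simplified illustration, not git-annex's exact types; the field names and the `ImportLocation` / `info` parameters are assumptions based on the description above:

```haskell
-- Sketch: each chunk holds one archive's worth of files, placed under a
-- subdirectory of the tree, plus a monadic continuation that fetches
-- the next chunk on demand. Because the next chunk is only produced
-- when the continuation runs, only one archive's filenames need to be
-- in memory at a time.
data ImportableContentsChunk m info = ImportableContentsChunk
    { importableContentsSubDir :: [RawFilePath]
      -- ^ subdirectory of the tree this chunk's files go in
      -- (eg, the name of one borg archive)
    , importableContentsSubTree :: [(ImportLocation, info)]
      -- ^ files in this chunk, relative to the subdirectory
    , importableContentsNextChunk :: m (Maybe (ImportableContentsChunk m info))
      -- ^ Nothing once the last chunk has been reached
    }
```

Each chunk's subtree is converted to a git tree as it arrives, so the final tree is assembled incrementally rather than from one giant in-memory listing.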

When a borg repository has many archives, git-annex could previously
easily get OOM-killed. Now it uses only memory proportional to the
number of annexed keys in a single archive.

Minor implementation wart: each new chunk re-opens the content
identifier database, and a new vector clock is used for each chunk.
This is only a minor inefficiency; the use of continuations makes
it hard to avoid, although putting the database handle into a Reader
monad would be one way to fix it.
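The Reader monad approach mentioned above could look something like this. This is a hypothetical sketch: `ChunkEnv`, `ContentIdentifierHandle`, and `VectorClock` as used here are illustrative names, not the actual git-annex types:

```haskell
-- Hypothetical fix: open the database handle and sample the vector
-- clock once, and thread both through all chunk processing in a
-- ReaderT, instead of re-opening/re-sampling per chunk.
data ChunkEnv = ChunkEnv
    { chunkCidDb :: ContentIdentifierHandle  -- opened once, up front
    , chunkVectorClock :: VectorClock        -- sampled once, up front
    }

type ChunkM = ReaderT ChunkEnv Annex

processChunk :: ImportableContentsChunk ChunkM info -> ChunkM ()
processChunk chunk = do
    db <- asks chunkCidDb
    -- ... record content identifiers using db ...
    next <- importableContentsNextChunk chunk
    maybe (return ()) processChunk next
```

Since the continuations would then run in `ChunkM`, every chunk sees the same handle and clock without any explicit plumbing.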

It may later be possible to extend the ImportableContentsChunkable
interface to remotes that are not third-party populated. However, that
would perhaps need an interface that does not use continuations.

The ImportableContentsChunkable interface currently does not allow
populating the top of the tree with anything other than subtrees. It
would be easy to extend it to allow putting files in that tree, but borg
doesn't need that so I left it out for now.
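Such an extension might just add a field for top-level files to each chunk. Purely illustrative; this field is not part of the current interface:

```haskell
-- Hypothetical extension of the chunk type sketched above: files to
-- place at the top of the tree, alongside the subtrees. An empty list
-- would preserve the current behavior, which is all borg needs.
data ImportableContentsChunk m info = ImportableContentsChunk
    { importableContentsSubDir :: [RawFilePath]
    , importableContentsSubTree :: [(ImportLocation, info)]
    , importableContentsTopLevel :: [(ImportLocation, info)]
    , importableContentsNextChunk :: m (Maybe (ImportableContentsChunk m info))
    }
```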

Sponsored-by: Noam Kremen on Patreon
commit 69f8e6c7c0
parent 153f3600fb
Joey Hess 2021-10-06 17:05:32 -04:00
GPG key ID: DB12DB0FF05F8F38
13 changed files with 286 additions and 92 deletions


@@ -3,3 +3,5 @@ memory, then got OOM-killed.
I don't know if this is a memory leak or just trying to load too much, but it seems like this is a thing you should be able to do on
a machine with 64G of RAM.
> [[fixed|done]] --[[Joey]]


@@ -0,0 +1,9 @@
[[!comment format=mdwn
username="joey"
subject="""comment 12"""
date="2021-10-08T17:06:05Z"
content="""
I've fixed this problem; my test case tops out at 160 MB now, and adding more
archives to the borg repo no longer increases memory use. Memory use is now
proportional to the number of annexed objects in a borg archive.
"""]]