ImportableContentsChunkable

This improves the borg special remote's memory usage by letting it
load only one archive's worth of filenames into memory at a time,
building up a larger tree out of the chunks.

When a borg repository has many archives, git-annex could previously
run out of memory easily. Now it uses memory proportional only to the
number of annexed keys in an archive.
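
A minimal sketch of the chunking idea, for illustration only. The names
here (Chunk, foldChunks, chunksFrom, demo) are made up for this example
and are simplified stand-ins, not the definitions this commit adds. Each
chunk carries one subdirectory's files plus an action that yields the
next chunk, so a consumer only ever holds one chunk's file list.

```haskell
-- Simplified, hypothetical types; not git-annex's actual definitions.

-- One chunk: a subdirectory (for borg, one archive), the files under
-- it, and an action that produces the next chunk, or Nothing at the end.
data Chunk m info = Chunk
    { chunkSubDir :: FilePath
    , chunkFiles :: [(FilePath, info)]
    , nextChunk :: m (Maybe (Chunk m info))
    }

-- Visit the chunks one at a time, threading an accumulator through.
-- Only the current chunk's file list needs to be live at any point.
foldChunks
    :: Monad m
    => (acc -> FilePath -> [(FilePath, info)] -> m acc)
    -> acc
    -> Chunk m info
    -> m acc
foldChunks step acc c = do
    acc' <- step acc (chunkSubDir c) (chunkFiles c)
    next <- nextChunk c
    case next of
        Nothing -> return acc'
        Just c' -> foldChunks step acc' c'

-- Build a chain of chunks from an in-memory list. A real remote would
-- list each archive lazily inside the nextChunk action instead.
chunksFrom :: Monad m => [(FilePath, [(FilePath, info)])] -> Maybe (Chunk m info)
chunksFrom [] = Nothing
chunksFrom ((dir, files) : rest) =
    Just (Chunk dir files (return (chunksFrom rest)))

-- Record a small per-archive summary instead of keeping every filename.
demo :: IO [(FilePath, Int)]
demo = case chunksFrom archives of
    Nothing -> return []
    Just c -> foldChunks summarize [] c
  where
    archives :: [(FilePath, [(FilePath, Int)])]
    archives =
        [ ("archive1", [("a", 1), ("b", 2)])
        , ("archive2", [("c", 3)])
        ]
    summarize acc dir files = return (acc ++ [(dir, length files)])
```

In this sketch the accumulator plays the role of the larger tree being
built: each step would record a subtree for the chunk and then let the
chunk's file list be garbage collected.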

Minor implementation wart: Each new chunk re-opens the content
identifier database, and also a new vector clock is used for each chunk.
This is a minor inefficiency only; the use of continuations makes
it hard to avoid, although putting the database handle into a Reader
monad would be one way to fix it.
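
A rough sketch of the Reader idea, under stated assumptions: DbHandle,
openDb, recordChunk and processAll are hypothetical stand-ins, not the
real content identifier database API, and this does not show how it
would interact with the continuations. It only shows a handle being
opened once and shared by every chunk via ReaderT from the transformers
package.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for an open database handle.
newtype DbHandle = DbHandle (IORef [String])

-- Hypothetical open action; bumps a counter so opens can be observed.
openDb :: IORef Int -> IO DbHandle
openDb opens = do
    modifyIORef' opens (+ 1)
    DbHandle <$> newIORef []

-- Per-chunk work reads the shared handle from the environment
-- instead of opening the database itself.
recordChunk :: String -> ReaderT DbHandle IO ()
recordChunk name = do
    DbHandle ref <- ask
    liftIO (modifyIORef' ref (name :))

-- Open once, run every chunk against the same handle, and report
-- how many opens happened along with what was recorded.
processAll :: [String] -> IO (Int, [String])
processAll chunks = do
    opens <- newIORef 0
    h@(DbHandle ref) <- openDb opens
    runReaderT (mapM_ recordChunk chunks) h
    n <- readIORef opens
    recorded <- readIORef ref
    return (n, reverse recorded)
```

With this shape, processAll opens the handle once regardless of how many
chunk names it is given.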

It may later be possible to extend the ImportableContentsChunkable
interface to remotes that are not third-party populated. However, that
would perhaps need an interface that does not use continuations.

The ImportableContentsChunkable interface currently does not allow
populating the top of the tree with anything other than subtrees. It
would be easy to extend it to allow putting files in that tree, but borg
doesn't need that so I left it out for now.

Sponsored-by: Noam Kremen on Patreon
Joey Hess 2021-10-06 17:05:32 -04:00
parent 153f3600fb
commit 69f8e6c7c0
13 changed files with 286 additions and 92 deletions

@@ -309,7 +309,7 @@ data ImportActions a = ImportActions
--
-- Throws exception on failure to access the remote.
-- May return Nothing when the remote is unchanged since last time.
-{ listImportableContents :: a (Maybe (ImportableContents (ContentIdentifier, ByteSize)))
+{ listImportableContents :: a (Maybe (ImportableContentsChunkable a (ContentIdentifier, ByteSize)))
-- Generates a Key (of any type) for the file stored on the
-- remote at the ImportLocation. Does not download the file
-- from the remote.
@@ -322,7 +322,7 @@ data ImportActions a = ImportActions
-- since the ContentIdentifier was generated.
--
-- When it returns nothing, the file at the ImportLocation
--- not by included in the imported tree.
+-- will not be included in the imported tree.
--
-- When the remote is thirdPartyPopulated, this should check if the
-- file stored on the remote is the content of an annex object,