ImportableContentsChunkable
This improves the borg special remote's memory usage by letting it load only one archive's worth of filenames into memory at a time, building up a larger tree out of the chunks.

When a borg repository has many archives, git-annex could previously run out of memory easily. Now it uses only memory proportional to the number of annexed keys in a single archive.

Minor implementation wart: each new chunk re-opens the content identifier database, and a new vector clock is used for each chunk. This is only a minor inefficiency; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it.

It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations.

The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that, so I left it out for now.

Sponsored-by: Noam Kremen on Patreon
parent 153f3600fb
commit 69f8e6c7c0
13 changed files with 286 additions and 92 deletions
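The chunk-at-a-time traversal described in the commit message can be sketched as follows. This is a minimal illustration, not git-annex's code: `Chunk`, `chunksFrom`, `foldChunks`, and `demoCounts` are hypothetical names, `FilePath` stands in for `RawFilePath`, and the "archives" are hard-coded lists. The point is only that each chunk carries a continuation for the next one, so a consumer never needs more than one chunk's file list at a time.

```haskell
-- Simplified stand-in for ImportableContentsChunk (hypothetical names).
data Chunk m info = Chunk
    { chunkSubDir :: FilePath
    -- ^ subtree this chunk populates, e.g. one borg archive
    , chunkSubTree :: [(FilePath, info)]
    -- ^ files in the subtree, relative to chunkSubDir
    , chunkNext :: m (Maybe (Chunk m info))
    -- ^ continuation; Nothing when there are no more chunks
    }

-- Build a chain of chunks from (subdir, lister) pairs. A lister is only
-- run when its chunk is actually requested via the continuation.
chunksFrom :: Monad m => [(FilePath, m [(FilePath, info)])] -> m (Maybe (Chunk m info))
chunksFrom [] = return Nothing
chunksFrom ((dir, lister) : rest) = do
    files <- lister
    return $ Just $ Chunk dir files (chunksFrom rest)

-- Consume chunks one at a time, keeping only the accumulator between
-- chunks so that an already processed chunk can be garbage collected.
foldChunks :: Monad m => (acc -> Chunk m info -> m acc) -> acc -> Maybe (Chunk m info) -> m acc
foldChunks _ acc Nothing = return acc
foldChunks f acc (Just c) = do
    acc' <- f acc c
    next <- acc' `seq` chunkNext c
    foldChunks f acc' next

-- Example: count the files in each (made up) archive.
demoCounts :: IO [(FilePath, Int)]
demoCounts = do
    first <- chunksFrom
        [ ("archive1", return [("a", 1 :: Int), ("b", 2)])
        , ("archive2", return [("c", 3)])
        ]
    reverse <$> foldChunks step [] first
  where
    step acc c = return ((chunkSubDir c, length (chunkSubTree c)) : acc)
```

In this sketch the accumulator holds only a small per-archive summary; a real consumer would presumably record a tree for each subdirectory and graft those trees together afterwards.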
@@ -1,6 +1,6 @@
 {- git-annex import types
  -
- - Copyright 2019 Joey Hess <id@joeyh.name>
+ - Copyright 2019-2021 Joey Hess <id@joeyh.name>
  -
  - Licensed under the GNU AGPL version 3 or higher.
  -}
@@ -13,6 +13,7 @@ import qualified Data.ByteString as S
 import Data.Char
 import Control.DeepSeq
 import GHC.Generics
+import qualified System.FilePath.Posix.ByteString as Posix
 
 import Types.Export
 import Utility.QuickCheck
@@ -69,3 +70,34 @@ data ImportableContents info = ImportableContents
 	deriving (Show, Generic)
 
 instance NFData info => NFData (ImportableContents info)
+
+{- ImportableContents, but it can be chunked into subtrees to avoid
+ - all needing to fit in memory at the same time. -}
+data ImportableContentsChunkable m info
+	= ImportableContentsComplete (ImportableContents info)
+	-- ^ Used when not chunking
+	| ImportableContentsChunked
+		{ importableContentsChunk :: ImportableContentsChunk m info
+		, importableHistoryComplete :: [ImportableContents info]
+		-- ^ Chunking the history is not supported
+		}
+
+{- A chunk of ImportableContents, which is the entire content of a subtree
+ - of the main tree. Nested subtrees are not allowed. -}
+data ImportableContentsChunk m info = ImportableContentsChunk
+	{ importableContentsSubDir :: ImportChunkSubDir
+	, importableContentsSubTree :: [(RawFilePath, info)]
+	-- ^ locations are relative to importableContentsSubDir
+	, importableContentsNextChunk :: m (Maybe (ImportableContentsChunk m info))
+	-- ^ Continuation to get the next chunk.
+	-- Returns Nothing when there are no more chunks.
+	}
+
+newtype ImportChunkSubDir = ImportChunkSubDir { importChunkSubDir :: RawFilePath }
+
+importableContentsChunkFullLocation
+	:: ImportChunkSubDir
+	-> RawFilePath
+	-> ImportLocation
+importableContentsChunkFullLocation (ImportChunkSubDir root) loc =
+	mkImportLocation $ Posix.combine root loc

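To illustrate what importableContentsChunkFullLocation computes, here is a small sketch using the String-based `System.FilePath.Posix` from the filepath library instead of the ByteString variant imported in the diff. `SubDir` and `fullLocation` are hypothetical stand-ins, and the result is left as a plain `FilePath` rather than being wrapped with `mkImportLocation`.

```haskell
import qualified System.FilePath.Posix as Posix

-- Hypothetical stand-in for ImportChunkSubDir, using FilePath.
newtype SubDir = SubDir FilePath

-- Join a chunk's subdirectory with a location relative to it,
-- giving the location within the whole imported tree.
fullLocation :: SubDir -> FilePath -> FilePath
fullLocation (SubDir root) loc = Posix.combine root loc

-- For example, a file "docs/a.txt" in the chunk for "archive1":
example :: FilePath
example = fullLocation (SubDir "archive1") "docs/a.txt"
```

Note that in the String-based library, `combine` returns its second argument unchanged when that argument is an absolute path, so a sketch like this assumes the per-chunk locations really are relative.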
@@ -309,7 +309,7 @@ data ImportActions a = ImportActions
 	--
 	-- Throws exception on failure to access the remote.
 	-- May return Nothing when the remote is unchanged since last time.
-	{ listImportableContents :: a (Maybe (ImportableContents (ContentIdentifier, ByteSize)))
+	{ listImportableContents :: a (Maybe (ImportableContentsChunkable a (ContentIdentifier, ByteSize)))
 	-- Generates a Key (of any type) for the file stored on the
 	-- remote at the ImportLocation. Does not download the file
 	-- from the remote.
@@ -322,7 +322,7 @@ data ImportActions a = ImportActions
 	-- since the ContentIdentifier was generated.
 	--
 	-- When it returns nothing, the file at the ImportLocation
-	-- not by included in the imported tree.
+	-- will not be included in the imported tree.
 	--
 	-- When the remote is thirdPartyPopulated, this should check if the
 	-- file stored on the remote is the content of an annex object,