convert ExportLocation to ShortByteString to avoid PINNED memory fragmentation

This adds the overhead of a copy whenever converting to/from ExportLocation and
ImportLocation.

borg: Some improvements to memory use when importing a lot of archives.
(It's still pretty bad.)

Sponsored-by: Mark Reidenbach on Patreon
Joey Hess 2021-10-05 14:51:55 -04:00
parent 8b4f331b09
commit 45dfddd33f
3 changed files with 44 additions and 8 deletions

@@ -15,6 +15,7 @@ git-annex (8.20210904) UNRELEASED; urgency=medium
     incrementally verified, when used on NTFS and perhaps other filesystems.
   * reinject: Fix crash when reinjecting a file from outside the repository.
     (Reversion in version 8.20210621)
+  * borg: Some improvements to memory use when importing a lot of archives.

  -- Joey Hess <id@joeyh.name> Fri, 03 Sep 2021 12:02:55 -0400

@@ -1,6 +1,6 @@
 {- git-annex export types
  -
- - Copyright 2017 Joey Hess <id@joeyh.name>
+ - Copyright 2017-2021 Joey Hess <id@joeyh.name>
  -
  - Licensed under the GNU AGPL version 3 or higher.
  -}
@@ -21,23 +21,28 @@ import Git.FilePath
 import Utility.Split
 import Utility.FileSystemEncoding

+import Data.ByteString.Short as S
 import qualified System.FilePath.Posix as Posix
 import GHC.Generics
 import Control.DeepSeq

--- A location on a remote that a key can be exported to.
--- The RawFilePath will be relative to the top of the remote,
--- and uses unix-style path separators.
-newtype ExportLocation = ExportLocation RawFilePath
+-- A location such as a path on a remote, that a key can be exported to.
+-- The path is relative to the top of the remote, and uses unix-style
+-- path separators.
+--
+-- This uses a ShortByteString to avoid problems with ByteString getting
+-- PINNED in memory which caused memory fragmentation and excessive memory
+-- use.
+newtype ExportLocation = ExportLocation S.ShortByteString
 	deriving (Show, Eq, Generic)

 instance NFData ExportLocation

 mkExportLocation :: RawFilePath -> ExportLocation
-mkExportLocation = ExportLocation . toInternalGitPath
+mkExportLocation = ExportLocation . S.toShort . toInternalGitPath

 fromExportLocation :: ExportLocation -> RawFilePath
-fromExportLocation (ExportLocation f) = f
+fromExportLocation (ExportLocation f) = S.fromShort f

 newtype ExportDirectory = ExportDirectory RawFilePath
 	deriving (Show, Eq)
@@ -58,4 +63,4 @@ exportDirectories (ExportLocation f) =
 	subs ps (d:ds) = (d:ps) : subs (d:ps) ds

 	dirs = map Posix.dropTrailingPathSeparator $
-		dropFromEnd 1 $ Posix.splitPath $ decodeBS f
+		dropFromEnd 1 $ Posix.splitPath $ decodeBS $ S.fromShort f

@@ -0,0 +1,30 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2021-10-05T17:39:29Z"
+ content="""
+A heap profile shows that the problem is an accumulation of PINNED memory.
+That includes both the memory used by borg list and the memory used by
+constructing the git tree.
+At least the borg list part seems very similar to the problem described here:
+<https://well-typed.com/blog/2020/08/memory-fragmentation/>
+
+The borg list gets read into a lazy bytestring, then it's split up
+and copied into strict bytestring chunks. But those get bundled
+back up into larger memory allocations as explained there. Then the files
+that are not git-annex objects are filtered out, resulting in memory
+fragmentation.
+
+I tried throwing in some S.copy in the borg list and filter part. Didn't
+help.
+
+I converted ImportLocation to use a ShortByteString, and that solved,
+or at least improved, the borg list part of the problem. With 20 borg
+archives with 10000 annex objects each, the heap profile which had
+shown around 90 MB, mostly PINNED during that first stage, went down
+to 8 MB, none PINNED. (Although looking at the git-annex process
+from outside, it still allocated 120 MB or so.)
+
+That leaves the memory use when constructing the git tree,
+which would probably also affect importtree special remotes
+when they have a large number of files.
+"""]]
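
The pattern the commit applies can be sketched in isolation. This is a minimal, self-contained illustration, not git-annex code: `Location`, `mkLocation`, `fromLocation`, `keepMatching` and the sample listing are hypothetical names invented here, standing in for `ExportLocation` and the borg list filtering described in the comment. It only assumes the `bytestring` package.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.ByteString.Short as S

-- Hypothetical stand-in for ExportLocation: the path is held as a
-- ShortByteString, which lives in ordinary unpinned heap memory.
newtype Location = Location S.ShortByteString
    deriving (Show, Eq)

-- toShort copies the bytes out of the strict ByteString.
mkLocation :: B.ByteString -> Location
mkLocation = Location . S.toShort

-- fromShort copies the bytes back into a fresh strict ByteString.
-- These two copies are the overhead the commit message mentions.
fromLocation :: Location -> B.ByteString
fromLocation (Location s) = S.fromShort s

-- Split a listing into lines, keep the lines that satisfy the
-- predicate, and store each kept line as a Location.
keepMatching :: (B.ByteString -> Bool) -> B.ByteString -> [Location]
keepMatching p = map mkLocation . filter p . B8.lines

-- Made-up sample data: two annex-like paths and one other file.
kept :: [Location]
kept = keepMatching ("annex/objects/" `B.isPrefixOf`) listing
  where
    listing = B8.unlines ["annex/objects/a", "notes.txt", "annex/objects/b"]

main :: IO ()
main = mapM_ (B8.putStrLn . fromLocation) kept
```

The trade-off is the one stated in the commit message: every conversion copies, in exchange for long-lived values that the garbage collector is free to move and compact. Whether this helps in a given program is something a heap profile would have to confirm.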