convert ExportLocation to ShortByteString to avoid PINNED memory fragmentation
This adds the overhead of a copy whenever converting to/from ExportLocation
and ImportLocation.

borg: Some improvements to memory use when importing a lot of archives.
(It's still pretty bad.)

Sponsored-by: Mark Reidenbach on Patreon
parent 8b4f331b09
commit 45dfddd33f
3 changed files with 44 additions and 8 deletions
@@ -15,6 +15,7 @@ git-annex (8.20210904) UNRELEASED; urgency=medium
     incrementally verified, when used on NTFS and perhaps other filesystems.
   * reinject: Fix crash when reinjecting a file from outside the repository.
     (Reversion in version 8.20210621)
+  * borg: Some improvements to memory use when importing a lot of archives.
 
  -- Joey Hess <id@joeyh.name> Fri, 03 Sep 2021 12:02:55 -0400
 
@@ -1,6 +1,6 @@
 {- git-annex export types
  -
- - Copyright 2017 Joey Hess <id@joeyh.name>
+ - Copyright 2017-2021 Joey Hess <id@joeyh.name>
  -
  - Licensed under the GNU AGPL version 3 or higher.
  -}
 
@@ -21,23 +21,28 @@ import Git.FilePath
 import Utility.Split
 import Utility.FileSystemEncoding
 
+import Data.ByteString.Short as S
 import qualified System.FilePath.Posix as Posix
 import GHC.Generics
 import Control.DeepSeq
 
--- A location on a remote that a key can be exported to.
--- The RawFilePath will be relative to the top of the remote,
--- and uses unix-style path separators.
-newtype ExportLocation = ExportLocation RawFilePath
+-- A location such as a path on a remote, that a key can be exported to.
+-- The path is relative to the top of the remote, and uses unix-style
+-- path separators.
+--
+-- This uses a ShortByteString to avoid problems with ByteString getting
+-- PINNED in memory which caused memory fragmentation and excessive memory
+-- use.
+newtype ExportLocation = ExportLocation S.ShortByteString
 	deriving (Show, Eq, Generic)
 
 instance NFData ExportLocation
 
 mkExportLocation :: RawFilePath -> ExportLocation
-mkExportLocation = ExportLocation . toInternalGitPath
+mkExportLocation = ExportLocation . S.toShort . toInternalGitPath
 
 fromExportLocation :: ExportLocation -> RawFilePath
-fromExportLocation (ExportLocation f) = f
+fromExportLocation (ExportLocation f) = S.fromShort f
 
 newtype ExportDirectory = ExportDirectory RawFilePath
 	deriving (Show, Eq)
@@ -58,4 +63,4 @@ exportDirectories (ExportLocation f) =
 	subs ps (d:ds) = (d:ps) : subs (d:ps) ds
 
 	dirs = map Posix.dropTrailingPathSeparator $
-		dropFromEnd 1 $ Posix.splitPath $ decodeBS f
+		dropFromEnd 1 $ Posix.splitPath $ decodeBS $ S.fromShort f
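Note on the change above: the pattern is a newtype over ShortByteString with a copy on the way in and a copy on the way out. The following is a minimal standalone sketch of that pattern, not the git-annex code itself. The names `Location`, `mkLocation`, `fromLocation` and `roundTrips` are hypothetical stand-ins, and the `toInternalGitPath` normalization step is left out.

```haskell
import qualified Data.ByteString.Char8 as B8
import qualified Data.ByteString.Short as S

-- Hypothetical stand-in for ExportLocation. The stored value is a
-- ShortByteString, which lives in ordinary unpinned heap memory.
newtype Location = Location S.ShortByteString
  deriving (Show, Eq)

-- Converting in copies the bytes out of the (pinned) ByteString.
mkLocation :: B8.ByteString -> Location
mkLocation = Location . S.toShort

-- Converting out copies the bytes into a freshly allocated ByteString.
fromLocation :: Location -> B8.ByteString
fromLocation (Location s) = S.fromShort s

-- The two copies are the overhead mentioned in the commit message;
-- the contents themselves are unchanged by the round trip.
roundTrips :: B8.ByteString -> Bool
roundTrips p = fromLocation (mkLocation p) == p
```

Both conversions are O(n) copies, so this trades some CPU time per conversion for long-lived values that the garbage collector is free to move and compact.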
@@ -0,0 +1,30 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2021-10-05T17:39:29Z"
+ content="""
+A heap profile shows that the problem is an accumulation of PINNED memory.
+Both the memory used by borg list and by constructing the git tree.
+
+At least the borg list part seems very similar to the problem described here.
+<https://well-typed.com/blog/2020/08/memory-fragmentation/>
+The borg list gets read into a lazy bytestring, then it's split up
+and copied into strict bytestring chunks. But those get bundled
+back up into larger memory allocations as explained there. Then the files
+that are not git-annex objects are filtered out, resulting in memory
+fragmentation.
+
+I tried throwing in some S.copy in the borg list and filter part. Didn't
+help.
+
+I converted ImportLocation to use a ShortByteString, and that solved,
+or at least improved, the borg list part of the problem. With 20 borg
+archives with 10000 annex objects each, the heap profile which had
+showed around 90 mb, mostly PINNED during that first stage, went down
+to 8 mb, none PINNED. (Although looking at the git-annex process
+from outside, it still allocated 120 mb or so.)
+
+That leaves the memory use when constructing the git tree.
+Which would also probably affect importtree special remotes,
+when they have a large number of files.
+"""]]
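Note on the comment above: the pipeline it describes (read a listing into a lazy bytestring, split it into strict chunks, filter, keep the survivors) might be sketched as below. This is an illustration under stated assumptions, not the code from the borg remote. The names `looksLikeAnnexObject` and `keepAnnexPaths` and the `annex/objects` substring test are hypothetical. The only point shown is that the values kept long-term are ShortByteStrings, so no pinned ByteString has to stay alive after filtering.

```haskell
import qualified Data.ByteString.Char8 as B8
import qualified Data.ByteString.Lazy.Char8 as L8
import qualified Data.ByteString.Short as S

-- Hypothetical filter: treat a line as an annex object path when it
-- contains "annex/objects". The real check in git-annex is different.
looksLikeAnnexObject :: B8.ByteString -> Bool
looksLikeAnnexObject = B8.isInfixOf (B8.pack "annex/objects")

-- Split the listing into lines, copy each line to a strict ByteString,
-- drop the lines that are not wanted, and store the rest as
-- ShortByteString. The intermediate strict ByteStrings are pinned, but
-- nothing retains them once the Short copies have been made.
keepAnnexPaths :: L8.ByteString -> [S.ShortByteString]
keepAnnexPaths =
  map S.toShort
    . filter looksLikeAnnexObject
    . map L8.toStrict
    . L8.lines
```

Whether this lowers peak memory in a given program still depends on evaluation order and on what else holds references to the intermediate ByteStrings. The comment above reports the measured effect for the borg case.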