convert ExportLocation to ShortByteString to avoid PINNED memory fragmentation

This adds the overhead of a copy whenever converting to/from
ExportLocation and ImportLocation.

borg: Some improvements to memory use when importing a lot of archives.
(It's still pretty bad.)

Sponsored-by: Mark Reidenbach on Patreon
parent 8b4f331b09
commit 45dfddd33f
3 changed files with 44 additions and 8 deletions
@@ -15,6 +15,7 @@ git-annex (8.20210904) UNRELEASED; urgency=medium
     incrementally verified, when used on NTFS and perhaps other filesystems.
   * reinject: Fix crash when reinjecting a file from outside the repository.
     (Reversion in version 8.20210621)
+  * borg: Some improvements to memory use when importing a lot of archives.
 
  -- Joey Hess <id@joeyh.name>  Fri, 03 Sep 2021 12:02:55 -0400
 
@@ -1,6 +1,6 @@
 {- git-annex export types
  -
- - Copyright 2017 Joey Hess <id@joeyh.name>
+ - Copyright 2017-2021 Joey Hess <id@joeyh.name>
  -
  - Licensed under the GNU AGPL version 3 or higher.
  -}
@@ -21,23 +21,28 @@ import Git.FilePath
 import Utility.Split
 import Utility.FileSystemEncoding
 
+import Data.ByteString.Short as S
 import qualified System.FilePath.Posix as Posix
 import GHC.Generics
 import Control.DeepSeq
 
--- A location on a remote that a key can be exported to.
--- The RawFilePath will be relative to the top of the remote,
--- and uses unix-style path separators.
-newtype ExportLocation = ExportLocation RawFilePath
+-- A location such as a path on a remote, that a key can be exported to.
+-- The path is relative to the top of the remote, and uses unix-style
+-- path separators.
+--
+-- This uses a ShortByteString to avoid problems with ByteString getting
+-- PINNED in memory which caused memory fragmentation and excessive memory
+-- use.
+newtype ExportLocation = ExportLocation S.ShortByteString
 	deriving (Show, Eq, Generic)
 
 instance NFData ExportLocation
 
 mkExportLocation :: RawFilePath -> ExportLocation
-mkExportLocation = ExportLocation . toInternalGitPath
+mkExportLocation = ExportLocation . S.toShort . toInternalGitPath
 
 fromExportLocation :: ExportLocation -> RawFilePath
-fromExportLocation (ExportLocation f) = f
+fromExportLocation (ExportLocation f) = S.fromShort f
 
 newtype ExportDirectory = ExportDirectory RawFilePath
 	deriving (Show, Eq)
@@ -58,4 +63,4 @@ exportDirectories (ExportLocation f) =
 	subs ps (d:ds) = (d:ps) : subs (d:ps) ds
 
 	dirs = map Posix.dropTrailingPathSeparator $
-		dropFromEnd 1 $ Posix.splitPath $ decodeBS f
+		dropFromEnd 1 $ Posix.splitPath $ decodeBS $ S.fromShort f
@@ -0,0 +1,30 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2021-10-05T17:39:29Z"
+ content="""
+A heap profile shows that the problem is an accumulation of PINNED memory.
+Both the memory used by borg list and by constructing the git tree.
+
+At least the borg list part seems very similar to the problem described here.
+<https://well-typed.com/blog/2020/08/memory-fragmentation/>
+The borg list gets read into a lazy bytestring, then it's split up
+and copied into strict bytestring chunks. But those get bundled
+back up into larger memory allocations as explained there. Then the files
+that are not git-annex objects are filtered out, resulting in memory
+fragmentation.
+
+I tried throwing in some S.copy in the borg list and filter part. Didn't
+help.
+
+I converted ImportLocation to use a ShortByteString, and that solved,
+or at least improved, the borg list part of the problem. With 20 borg
+archives with 10000 annex objects each, the heap profile which had
+showed around 90 mb, mostly PINNED during that first stage, went down
+to 8 mb, none PINNED. (Although looking at the git-annex process
+from outside, it still allocated 120 mb or so.)
+
+That leaves the memory use when constructing the git tree.
+Which would also probably affect importtree special remotes,
+when they have a large number of files.
+"""]]
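
Illustration (not part of the commit): a minimal, self-contained sketch of the pattern the change above applies, assuming only the bytestring package. The names Location, mkLocation, fromLocation and keepMatching are hypothetical stand-ins; the real mkExportLocation also runs toInternalGitPath, which is left out here. The idea is that a strict ByteString lives in pinned memory, while S.toShort copies the bytes into an unpinned ShortByteString that the garbage collector is free to move and compact.

```haskell
module Main where

import Control.Monad (unless)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Short as S

-- Hypothetical stand-in for ExportLocation: the path is stored as an
-- unpinned ShortByteString rather than a pinned strict ByteString.
newtype Location = Location S.ShortByteString
  deriving (Show, Eq)

-- Copies the bytes out of the ByteString into a ShortByteString.
mkLocation :: B.ByteString -> Location
mkLocation = Location . S.toShort

-- Copies the bytes back into a fresh strict ByteString.
fromLocation :: Location -> B.ByteString
fromLocation (Location s) = S.fromShort s

-- Keep only the entries accepted by the predicate, storing the survivors
-- as Locations so they hold no reference to the original ByteStrings.
keepMatching :: (B.ByteString -> Bool) -> [B.ByteString] -> [Location]
keepMatching p = map mkLocation . filter p

main :: IO ()
main = do
  -- Made-up listing entries, for illustration only.
  let listing = map BC.pack ["annex/objects/a", "README", "annex/objects/b"]
      kept = keepMatching (BC.isPrefixOf (BC.pack "annex/")) listing
  unless (map fromLocation kept == map BC.pack ["annex/objects/a", "annex/objects/b"]) $
    error "round trip failed"
  mapM_ (BC.putStrLn . fromLocation) kept
```

Each conversion in either direction is a copy, which is the overhead the commit message mentions. The trade is paid once per path in exchange for long-lived values that do not keep pinned blocks alive.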