progress in my head
parent cc66c9f9ad
commit 153f3600fb
3 changed files with 74 additions and 12 deletions
@@ -0,0 +1,35 @@
[[!comment format=mdwn
username="joey"
subject="""comment 10"""
date="2021-10-06T17:09:50Z"
content="""
There is still a big PINNED spike though. I measured this memory use:

    115344 post listContents
    133816 post importKeys
    236676 post recordImportTree

listContents produces an `ImportableContents (ContentIdentifier, ByteSize)`
and that gets transformed through importKeys
to `ImportableContents (Either Sha Key)`. The GC should be able to
free up the first as it's being traversed, but PINNED still goes up during
that, and memory increases by 20% or so.

Then recordImportTree calls mktreeitem and treeItemsToTree, which between
them double the memory.

So I think I understand where the memory use is, although why it's PINNED
is still not clear, and unpinning could still help. I did try converting
TopFilePath to ShortByteString, since TreeItems contain them, but it didn't
reduce the amount PINNED and actually used more memory.
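
For context on the PINNED question: a strict `ByteString` keeps its payload in pinned memory, while a `ShortByteString` keeps it in an ordinary unpinned heap object. A minimal sketch of the kind of conversion described above, using a hypothetical `ShortPath` newtype as a stand-in (it is not git-annex's actual `TopFilePath`):

```haskell
import qualified Data.ByteString.Char8 as B8
import qualified Data.ByteString.Short as S

-- Hypothetical stand-in for a path wrapper; git-annex's TopFilePath differs.
newtype ShortPath = ShortPath S.ShortByteString
  deriving (Eq, Ord, Show)

-- Copy the bytes out of the pinned ByteString into an unpinned
-- ShortByteString.
toShortPath :: B8.ByteString -> ShortPath
toShortPath = ShortPath . S.toShort

-- Copy back into a (pinned) ByteString, e.g. for an API that needs one.
fromShortPath :: ShortPath -> B8.ByteString
fromShortPath (ShortPath s) = S.fromShort s
```

Both `toShort` and `fromShort` copy the bytes, so any value that is later converted back to a `ByteString` ends up in pinned memory again.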

To avoid the allocation entirely, it seems that borg's
listImportableContents would need to generate a Tree itself, rather than
using ImportableContents. And it could, probably fairly efficiently, but it
would not be able to reuse the tree import interface as it does now.

(borg could return an `ImportableContents (Either Sha Key)` more easily,
and still reuse part of the interface, but the conversion to that only
uses 20% or so of memory so it's not a big enough win. Also when I looked
at it, it was still not going to be an easy refactoring.)
"""]]
@@ -0,0 +1,39 @@
[[!comment format=mdwn
username="joey"
subject="""comment 11"""
date="2021-10-06T18:03:23Z"
content="""
@tomdhunt the tree is being stored in git, so the natural way
to do something like a difference encoding would be a series of trees
in a commit sequence.

The tree import interface does support that, but the borg remote
doesn't bother and puts all the items in a single tree. But even if it did,
it would still populate the same ImportableContents data structure with
the same amount of data, just in a different layout.

But maybe this line of thinking does point toward a solution. Suppose that
there was a way for listImportableContents to generate an
ImportableContentsChunk that contained a subtree, and a continuation to get
the next subtree. Then each subtree's worth of ImportableContents would be
passed through to recordImportTree (a version omitting the parts of it that
commit the tree), and only one subtree at a time would occupy memory. At
the end a tree would be constructed containing all the subtrees, and
committed.

For borg, each archive would be a subtree; 500k filenames will fit in memory
or at least fit better than `365*500k`.

The interface I'm thinking about is something like this:

    data ChunkedImportableContents info
        = ImportableContentsChunk
            { importableContentsRoot :: ImportLocation
            , importableContentsSubTree :: [(ImportLocation, info)]
            -- ^ locations are relative to importableContentsRoot
            , importableContentsContinuation :: Annex (ChunkedImportableContents info)
            }
        | ImportableContentsComplete (ImportableContents info)

This is a promising idea!
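
A rough sketch of how a consumer might walk such a chunked structure one subtree at a time. Everything here is an assumption made for illustration: `IO` stands in for `Annex`, `ImportLocation` and `ImportableContents` are simplified stand-ins for the real git-annex types, `foldChunks` and `recordSubTree` are hypothetical names, and treating `ImportableContentsComplete` as a final batch of items at the root is a guess at the intended meaning.

```haskell
-- Simplified, hypothetical stand-ins so the sketch compiles on its own;
-- git-annex's real ImportLocation, ImportableContents and Annex differ.
newtype ImportLocation = ImportLocation FilePath
  deriving (Eq, Show)

newtype ImportableContents info = ImportableContents [(ImportLocation, info)]

data ChunkedImportableContents info
  = ImportableContentsChunk
      { importableContentsRoot :: ImportLocation
      , importableContentsSubTree :: [(ImportLocation, info)]
      , importableContentsContinuation :: IO (ChunkedImportableContents info)
      }
  | ImportableContentsComplete (ImportableContents info)

-- Process one subtree at a time, keeping only the per-subtree result
-- (for example a tree hash) rather than the subtree's items.
foldChunks
  :: (ImportLocation -> [(ImportLocation, info)] -> IO r)
  -> ChunkedImportableContents info
  -> IO [r]
foldChunks recordSubTree = go []
  where
    go acc (ImportableContentsChunk root items next) = do
      r <- recordSubTree root items
      -- Force the result before fetching the next chunk, so the
      -- accumulator holds a value rather than an unevaluated thunk.
      r `seq` (next >>= go (r : acc))
    go acc (ImportableContentsComplete (ImportableContents items)) = do
      -- Assumption: the final constructor carries items at the root.
      r <- recordSubTree (ImportLocation ".") items
      return (reverse (r : acc))
```

The point of the continuation is that the next subtree is only generated after the previous one has been recorded, so at most one subtree's items need to be live at a time, provided `recordSubTree` returns something small.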
"""]]
@@ -8,16 +8,4 @@ and the -hc profile is unchanged. So the pinned memory is not in refs.

Also tried converting Key to use ShortByteString. That was a win!
My 20 borg archive test case is down from 320 mb to 242 mb.

Looking at Command.SyncpullThirdPartyPopulated,
it calls listContents, which calls borg's listImportableContents,
and produces an `ImportableContents (ContentIdentifier, ByteSize)`
then that gets passed through importKeys to produce
an `ImportableContents (Either Sha Key)`. Probably
double memory is used while doing that conversion, unless
the GC manages to free the first one while it's traversed.

If borg's listImportableContents included a Key (which it does
produce already only to throw away!) that might
eliminate the big spike just before treeItemsToTree.
"""]]