progress in my head
parent cc66c9f9ad
commit 153f3600fb
3 changed files with 74 additions and 12 deletions
@@ -0,0 +1,35 @@
+[[!comment format=mdwn
+username="joey"
+subject="""comment 10"""
+date="2021-10-06T17:09:50Z"
+content="""
+There is still a big PINNED spike though. I measured this memory use:
+
+    115344 post listContents
+    133816 post importKeys
+    236676 post recordImportTree
+
+listContents produces an `ImportableContents (ContentIdentifier, ByteSize)`
+and that gets transformed through importKeys
+to `ImportableContents (Either Sha Key)`. The GC should be able to
+free up the first as it's being traversed, but PINNED still goes up during
+that, and memory increases by 20% or so.
+
+Then recordImportTree calls mktreeitem and treeItemsToTree, which between
+them double the memory.
+
+So I think I understand where the memory use is, although why it's PINNED
+is still not clear, and unpinning could still help. I did try converting
+TopFilePath to ShortByteString, since TreeItems contain them, but it didn't
+reduce the amount PINNED and actually used more memory.
+
+To avoid the allocation entirely, it seems that borg's
+listImportableContents would need to generate a Tree itself, rather than
+using ImportableContents. And it could, probably fairly efficiently, but it
+would not be able to reuse the tree import interface as it does now.
+
+(borg could return an `ImportableContents (Either Sha Key)` more easily,
+and still reuse part of the interface, but the conversion to that only
+uses 20% or so of memory so it's not a big enough win. Also when I looked
+at it, it was still not going to be an easy refactoring.)
+"""]]
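As a quick check on the three figures quoted in comment 10, the growth between measurement points can be computed directly. This is only an illustrative sketch: `growthPercent`, `measurements` and `stepGrowth` are names made up here, not part of git-annex, and the comment does not state the unit of the numbers.

```haskell
-- Hypothetical helper, not part of git-annex: relative growth between
-- two memory measurements, as a percentage.
growthPercent :: Double -> Double -> Double
growthPercent before after = (after - before) / before * 100

-- The three figures quoted in comment 10, in the order they were taken.
measurements :: [(String, Double)]
measurements =
    [ ("post listContents", 115344)
    , ("post importKeys", 133816)
    , ("post recordImportTree", 236676)
    ]

-- Growth of each step relative to the measurement before it.
stepGrowth :: [(String, Double)]
stepGrowth = zipWith step measurements (drop 1 measurements)
  where
    step (_, before) (name, after) = (name, growthPercent before after)
```

With these inputs, `stepGrowth` gives about 16% for the importKeys step and about 77% for the recordImportTree step, so the final figure is roughly twice the post-listContents one.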
@@ -0,0 +1,39 @@
+[[!comment format=mdwn
+username="joey"
+subject="""comment 11"""
+date="2021-10-06T18:03:23Z"
+content="""
+@tomdhunt the tree is being stored in git, so the natural way
+to do something like a difference encoding would be a series of trees
+in a commit sequence.
+
+The tree import interface does support that, but the borg remote
+doesn't bother and puts all the items in a single tree. But even if it did,
+it would still populate the same ImportableContents data structure with
+the same amount of data, just in a different layout.
+
+But maybe this line of thinking does point toward a solution. Suppose that
+there was a way for listImportableContents to generate an
+ImportableContentsChunk that contained a subtree, and a continuation to get
+the next subtree. Then each subtree's worth of ImportableContents would be
+passed through to recordImportTree (a version omitting the parts of it that
+commit the tree), and only one subtree at a time would occupy memory. At
+the end a tree would be constructed containing all the subtrees, and
+committed.
+
+For borg, each archive would be a subtree; 500k filenames will fit in memory
+or at least fit better than `365*500k`.
+
+The interface I'm thinking about is something like this:
+
+    data ChunkedImportableContents info
+        = ImportableContentsChunk
+            { importableContentsRoot :: ImportLocation
+            , importableContentsSubTree :: [(ImportLocation, info)]
+            -- ^ locations are relative to importableContentsRoot
+            , importableContentsContinuation :: Annex (ChunkedImportableContents info)
+            }
+        | ImportableContentsComplete (ImportableContents info)
+
+This is a promising idea!
+"""]]
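To make the continuation idea in comment 11 concrete, here is a minimal sketch of how a consumer might walk a `ChunkedImportableContents` one subtree at a time. It rests on several assumptions that are not in the comment: `Annex` is replaced by `IO` and `ImportLocation` by `FilePath` so the block compiles on its own, `ImportableContents` is reduced to a single list field, and `processChunks`, `recordSubTree` and `example` are hypothetical names. Treating `ImportableContentsComplete` as a final batch recorded at the root is also a guess at the intended meaning.

```haskell
-- Simplified stand-ins so the sketch is self-contained (assumptions):
-- ImportLocation is a plain path and Annex is replaced by IO.
type ImportLocation = FilePath
type Annex = IO

-- Assumed, reduced shape of the existing type; the real one may differ.
data ImportableContents info = ImportableContents
    { importableContents :: [(ImportLocation, info)]
    }

-- The type proposed in comment 11.
data ChunkedImportableContents info
    = ImportableContentsChunk
        { importableContentsRoot :: ImportLocation
        , importableContentsSubTree :: [(ImportLocation, info)]
        -- ^ locations are relative to importableContentsRoot
        , importableContentsContinuation :: Annex (ChunkedImportableContents info)
        }
    | ImportableContentsComplete (ImportableContents info)

-- Hypothetical driver: hand each subtree to recordSubTree, keep only its
-- (small) result, then run the continuation to fetch the next chunk.
processChunks
    :: (ImportLocation -> [(ImportLocation, info)] -> Annex r)
    -> ChunkedImportableContents info
    -> Annex [r]
processChunks recordSubTree = go []
  where
    go acc (ImportableContentsComplete c) = do
        r <- recordSubTree "" (importableContents c)
        return (reverse (r : acc))
    go acc (ImportableContentsChunk root subtree continuation) = do
        r <- recordSubTree root subtree
        next <- continuation
        go (r : acc) next

-- Toy input: two archives delivered one at a time, then an empty tail.
example :: ChunkedImportableContents Int
example = ImportableContentsChunk "archive1" [("a", 1), ("b", 2)] $ return $
    ImportableContentsChunk "archive2" [("a", 3)] $ return $
        ImportableContentsComplete (ImportableContents [])
```

The point of the shape is that the accumulator only holds whatever `recordSubTree` returns (in the proposal, presumably a reference to an already written subtree), so the item list of an earlier chunk can be collected before the next one is produced.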
@@ -8,16 +8,4 @@ and the -hc profile is unchanged. So the pinned memory is not in refs.
 
 Also tried converting Key to use ShortByteString. That was a win!
 My 20 borg archive test case is down from 320 mb to 242 mb.
-
-Looking at Command.Sync.pullThirdPartyPopulated,
-it calls listContents, which calls borg's listImportableContents,
-and produces an `ImportableContents (ContentIdentifier, ByteSize)`
-then that gets passed through importKeys to produce
-an `ImportableContents (Either Sha Key)`. Probably
-double memory is used while doing that conversion, unless
-the GC manages to free the first one while it's traversed.
-
-If borg's listImportableContents included a Key (which it does
-produce already only to throw away!) that might
-eliminate the big spike just before treeItemsToTree.
 """]]
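For background on the Key change kept in this hunk: a strict `ByteString` keeps its payload in pinned memory, while a `ShortByteString` stores it in an ordinary unpinned heap object, which is why the conversion can lower the PINNED figure. The sketch below only illustrates the conversion with the `bytestring` API; `KeyName`, `mkKeyName` and `keyNameBytes` are hypothetical names, and git-annex's real Key type has more structure than this.

```haskell
import qualified Data.ByteString.Char8 as B8
import qualified Data.ByteString.Short as S

-- Hypothetical stand-in for a key field, not git-annex's actual Key.
newtype KeyName = KeyName S.ShortByteString
    deriving (Eq, Ord, Show)

-- Copy the bytes out of the pinned ByteString buffer into an unpinned
-- ShortByteString for long-term storage.
mkKeyName :: B8.ByteString -> KeyName
mkKeyName = KeyName . S.toShort

-- Convert back to a ByteString only at the point of use (this copies again).
keyNameBytes :: KeyName -> B8.ByteString
keyNameBytes (KeyName s) = S.fromShort s
```

Each direction copies the bytes, so this pays off mainly for values that are held for a long time and converted rarely. That would be consistent with comment 10, where the same change on TopFilePath did not help.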