update exportdb tree in getImportableContents
This avoids bottlenecking on git check-ignore in a particular situation. Also, there may have been a correctness issue with it not having updated it. When the exportdb is already up-to-date, this is not expensive. And the exportdb is updated elsewhere, so usually it is up-to-date. Sponsored-by: Joshua Antonishen on Patreon
This commit is contained in:
parent
5934e7d402
commit
532b227086
3 changed files with 64 additions and 1 deletions
|
@ -1049,7 +1049,10 @@ getImportableContents r importtreeconfig ci matcher = do
|
||||||
Just c' -> Just <$> filterunwantedchunk dbhandle c'
|
Just c' -> Just <$> filterunwantedchunk dbhandle c'
|
||||||
)
|
)
|
||||||
|
|
||||||
opendbhandle = Export.openDb (Remote.uuid r)
|
opendbhandle = do
|
||||||
|
h <- Export.openDb (Remote.uuid r)
|
||||||
|
void $ Export.updateExportTreeFromLog h
|
||||||
|
return h
|
||||||
|
|
||||||
wanted dbhandle (loc, (_cid, sz))
|
wanted dbhandle (loc, (_cid, sz))
|
||||||
| ingitdir = pure False
|
| ingitdir = pure False
|
||||||
|
|
|
@ -0,0 +1,8 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="joey"
|
||||||
|
subject="""Re: comment 17"""
|
||||||
|
date="2023-06-08T20:56:25Z"
|
||||||
|
content="""
|
||||||
|
I've fixed that, now `git-annex sync` avoids updating the adjusted branch
|
||||||
|
when there have been no changes to available content.
|
||||||
|
"""]]
|
|
@ -0,0 +1,52 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="joey"
|
||||||
|
subject="""comment 19"""
|
||||||
|
date="2023-06-08T21:14:35Z"
|
||||||
|
content="""
|
||||||
|
I ran the second test case with 150000 files, and here's how long the syncs
|
||||||
|
took:
|
||||||
|
|
||||||
|
1. 0m0.170s
|
||||||
|
2. 33m37.810s
|
||||||
|
3. 0m36.644s
|
||||||
|
4. 2m58.773s
|
||||||
|
5. 13m38.126s
|
||||||
|
6. 0m3.933s
|
||||||
|
7. still running after 85 minutes
|
||||||
|
8. tbd
|
||||||
|
|
||||||
|
Sync 2 took longer than your results in comment #11, but consistent with my
|
||||||
|
laptop being slower. And I think 33 minutes to import 150k files is fine.
|
||||||
|
|
||||||
|
Still not seeing sync 5 take as long as it did for you.
|
||||||
|
Nowhere in the same ballpark. 13 minutes seems ok for a sync --content
|
||||||
|
that has to scan 150000 files.
|
||||||
|
|
||||||
|
The 7th sync is seeming too slow to me. For you it took equally long as the
|
||||||
|
5th sync, and both are --content syncs. So maybe I'm seeing the same
|
||||||
|
problem but only on the 7th for some reason?
|
||||||
|
|
||||||
|
For me it seemed to take a long time after outputting "list source ok". At
|
||||||
|
that point strace showed only a lot of futex(). And the cpu was pegged. And
|
||||||
|
it had the cidsdb open. Hmmm.. This is feeling a bit like the problem you
|
||||||
|
originally reported.
|
||||||
|
|
||||||
|
Interrupted the 7th sync and ran again...
|
||||||
|
|
||||||
|
The "list source" takes more than 15 minutes. It's bottlenecked on checking
|
||||||
|
git ignores. Bottleneck that I didn't notice with a smaller
|
||||||
|
number of files. Fixed that by making sure the export db was
|
||||||
|
populated, which it usually is, but not in the 7th sync's situation.
|
||||||
|
Now "list source" completes in less than 2 minutes.
|
||||||
|
|
||||||
|
And.. after that, it was back to the tight futex() loop.. And this time I
|
||||||
|
had intrumented the cidsdb, and it was importKeys
|
||||||
|
calling getContentIdentifierKeys.
|
||||||
|
|
||||||
|
Here's the kicker: It's only running getContentIdentifierKeys 15 times
|
||||||
|
per second. So that will take 166 minutes for all 150000 files.
|
||||||
|
|
||||||
|
Each call to getContentIdentifierKeys is taking 0.05 seconds.
|
||||||
|
So, this bug is back to the original problem of being bottlenecked on the
|
||||||
|
cidsdb. And it is smelling like a lack of indexes. Yay!
|
||||||
|
"""]]
|
Loading…
Add table
Add a link
Reference in a new issue