update exportdb tree in getImportableContents
This avoids bottlenecking on git check-ignore in a particular situation. Also, there may have been a correctness issue with it not updating the exportdb.

When the exportdb is already up-to-date, this is not expensive. And the exportdb is updated elsewhere, so usually it is up-to-date.

Sponsored-by: Joshua Antonishen on Patreon
parent 5934e7d402
commit 532b227086
3 changed files with 64 additions and 1 deletion
@@ -1049,7 +1049,10 @@ getImportableContents r importtreeconfig ci matcher = do
         Just c' -> Just <$> filterunwantedchunk dbhandle c'
         )
 
-    opendbhandle = Export.openDb (Remote.uuid r)
+    opendbhandle = do
+        h <- Export.openDb (Remote.uuid r)
+        void $ Export.updateExportTreeFromLog h
+        return h
 
     wanted dbhandle (loc, (_cid, sz))
         | ingitdir = pure False
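The new `opendbhandle` opens the export db and then calls `Export.updateExportTreeFromLog` before returning the handle. This relies on the update being cheap when the db is already current. Below is a minimal Haskell sketch of that refresh-before-use pattern. Everything in it is hypothetical and only illustrates the idea: `ExportDb`, `newExportDb`, `updateFromLog`, `openAndUpdate`, and the integer log revision used to detect staleness. It is not git-annex's implementation, which this commit does not show.

```haskell
import Control.Monad (void)
import Data.IORef

-- Hypothetical stand-in for the export database. It records which log
-- revision its tree was last built from (Nothing = never populated) and
-- counts how many expensive rebuilds have run.
data ExportDb = ExportDb
  { dbLogRev   :: IORef (Maybe Int)
  , dbRebuilds :: IORef Int
  }

-- Hypothetical "open": returns a db that has not been populated yet.
newExportDb :: IO ExportDb
newExportDb = ExportDb <$> newIORef Nothing <*> newIORef 0

-- Bring the db tree up to date with the given log revision. If the db is
-- already current this is a single comparison, so it is cheap to call on
-- every open. Returns True only when a rebuild actually happened.
updateFromLog :: ExportDb -> Int -> IO Bool
updateFromLog db currentRev = do
  rev <- readIORef (dbLogRev db)
  if rev == Just currentRev
    then return False
    else do
      -- A real implementation would replay the log into the tree here;
      -- this sketch only records that the work was done.
      modifyIORef' (dbRebuilds db) (+ 1)
      writeIORef (dbLogRev db) (Just currentRev)
      return True

-- Same shape as the new opendbhandle: open, refresh, return the handle.
openAndUpdate :: IO ExportDb -> Int -> IO ExportDb
openAndUpdate opener currentRev = do
  h <- opener
  void $ updateFromLog h currentRev
  return h
```

Loaded into GHCi, `openAndUpdate newExportDb 1` pays for one rebuild. Calling `updateFromLog` again with the same revision returns `False` without doing any work, which is the property the commit message relies on.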
@@ -0,0 +1,8 @@
[[!comment format=mdwn
 username="joey"
 subject="""Re: comment 17"""
 date="2023-06-08T20:56:25Z"
 content="""
I've fixed that: `git-annex sync` now avoids updating the adjusted branch
when there have been no changes to available content.
"""]]
@@ -0,0 +1,52 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 19"""
 date="2023-06-08T21:14:35Z"
 content="""
I ran the second test case with 150000 files, and here's how long the syncs
took:

1. 0m0.170s
2. 33m37.810s
3. 0m36.644s
4. 2m58.773s
5. 13m38.126s
6. 0m3.933s
7. still running after 85 minutes
8. tbd

Sync 2 took longer than in your results in comment #11, which is consistent
with my laptop being slower. And I think 33 minutes to import 150k files is fine.

Still not seeing sync 5 take as long as it did for you; it's not even in
the same ballpark. 13 minutes seems ok for a sync --content
that has to scan 150000 files.

The 7th sync seems too slow to me. For you it took as long as the
5th sync, and both are --content syncs. So maybe I'm seeing the same
problem, but only on the 7th sync for some reason?

For me it seemed to take a long time after outputting "list source ok". At
that point strace showed only a lot of futex() calls, the CPU was pegged, and
it had the cidsdb open. Hmm... This is feeling a bit like the problem you
originally reported.

I interrupted the 7th sync and ran it again...

The "list source" step takes more than 15 minutes. It's bottlenecked on checking
git ignores, a bottleneck that I didn't notice with a smaller
number of files. I fixed that by making sure the export db was
populated, which it usually is, but not in the 7th sync's situation.
Now "list source" completes in less than 2 minutes.

And... after that, it was back to the tight futex() loop. This time I
had instrumented the cidsdb, and it was importKeys
calling getContentIdentifierKeys.

Here's the kicker: it's only running getContentIdentifierKeys 15 times
per second. So that will take 166 minutes for all 150000 files.

Each call to getContentIdentifierKeys is taking 0.05 seconds.
So this bug is back to the original problem of being bottlenecked on the
cidsdb. And it smells like a lack of indexes. Yay!
"""]]
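The throughput figure in comment 19 can be checked with a little arithmetic: at 15 calls per second, 150000 calls take 150000 / 15 = 10000 seconds, about 166.7 minutes. The suspected missing indexes can also be illustrated. The Haskell sketch below computes that estimate. It then contrasts an unindexed lookup, which scans every row on each call, with an indexed one, a `Data.Map` (from the `containers` package shipped with GHC) built once. The indexed version is an analogy for what an index on the content identifier column would buy. The row type and all function names are hypothetical. This is not how the cidsdb or `getContentIdentifierKeys` is actually implemented.

```haskell
import qualified Data.Map.Strict as M

-- Back-of-envelope estimate: minutes needed for a number of calls at a
-- measured rate in calls per second.
minutesAt :: Double -> Double -> Double
minutesAt calls perSecond = calls / perSecond / 60

-- Hypothetical rows of a content identifier table: (content id, key).
type Row = (String, String)

-- Unindexed lookup: every call scans all n rows, so looking up keys for
-- all n files costs on the order of n^2 comparisons.
keysByScan :: [Row] -> String -> [String]
keysByScan rows cid = [k | (c, k) <- rows, c == cid]

-- Build an index once, mapping each content id to its keys. Inserting the
-- rows in reverse and prepending keeps each key list in row order.
buildIndex :: [Row] -> M.Map String [String]
buildIndex rows = M.fromListWith (++) [(c, [k]) | (c, k) <- reverse rows]

-- Indexed lookup: O(log n) per call, analogous to querying a column that
-- has a B-tree index.
keysByIndex :: M.Map String [String] -> String -> [String]
keysByIndex idx cid = M.findWithDefault [] cid idx
```

Both lookups return the same keys; only the cost differs. With n rows, n unindexed lookups do on the order of n² comparisons, versus roughly n log n for building and querying the index.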