comment
This commit is contained in:
parent
40017089f2
commit
594110a6af
1 changed files with 28 additions and 0 deletions
|
@ -0,0 +1,28 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 5"""
|
||||
date="2023-06-01T17:47:31Z"
|
||||
content="""
|
||||
As far as how reads from the cidsdb scale, I think that sqlite databases
|
||||
like this do slow down somewhat when tables get massive, but I don't really
|
||||
know the details. And it's of course possible that something could be
|
||||
improved in the schema or queries.
|
||||
|
||||
I've been working on that todo I linked above, and the speed gain is
|
||||
impressive when there are few or no changed files in the remote. With 20,000
|
||||
unchanged files, re-running git-annex import[1] sped up from 125.95 to 3.84
|
||||
seconds. With 40,000 unchanged files, it sped up from 477 to 8.13 seconds. I
|
||||
haven't tried with 150000 files yet but the pattern is clear.
|
||||
|
||||
> I can rerun the sync with an unchanged import directory. It still takes
|
||||
> 107 minutes, the majority of which is spent reading cidsdb. Only the
|
||||
> first minute or two are spent scanning the source area.
|
||||
|
||||
Well, I think I've certianly solved that problem. But I don't know if there's
|
||||
something else that is making the initial sync slower than it needs to
|
||||
be.
|
||||
|
||||
[1] More accurately, re-running it a second time, both to get a warm cache
|
||||
result, and because the first time, it is busy updating the cidsdb with
|
||||
the files that were imported earlier, as described in comment #2.
|
||||
"""]]
|
Loading…
Add table
Reference in a new issue