reproduced
parent 26a9ea12d1
commit 0eff5a3f71
2 changed files with 48 additions and 0 deletions
@@ -0,0 +1,31 @@
[[!comment format=mdwn
username="joey"
subject="""comment 23"""
date="2021-06-14T14:09:07Z"
content="""
The file contents all being the same is the crucial thing. On Linux,
adding 1000 dup files at a time (all in the same directory), I get:

run 1: 0:08
run 2: 0:42
run 3: 1:14
run 4: 1:46

After run 4, adding 1000 files with all different content takes
0:11, so not appreciably slowed down; it only affects adding dups,
and only when there are a *lot* of them.

This feels like quite an edge case, and also not
really a new problem, since unlocked files would have already
had the same problem before recent changes.

I thought this might be an inefficiency in sqlite's index, similar to how
hash tables can scale poorly when a lot of things end up in the same
bucket. But disabling the index did not improve performance.

Aha -- the slowdown is caused by `git-annex add` looking to see what other
annexed files use the same content, so that it can populate any unlocked
files that didn't have the content present before. With all these locked
files now recorded in the db, it has to check each file in turn, and
that's where the `O(N^2)` comes from.
"""]]
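The quadratic growth described in comment 23 can be illustrated with a small model. This is only a sketch, not git-annex code: `AssociatedFiles`, `add`, `fresh_key` and `add_batch` are hypothetical names, and a simple counter stands in for the real database lookups. It assumes each add inspects every file already recorded for the same key.

```python
from collections import defaultdict
from itertools import count


class AssociatedFiles:
    # Hypothetical stand-in for the keys database: key -> files using it.
    def __init__(self):
        self.files_by_key = defaultdict(list)
        self.checks = 0        # per-file checks performed so far
        self._ids = count()    # source of unique file and key names

    def add(self, key):
        # Assumed model of the add step: look at every file already
        # recorded for this key before recording the new one.
        for _existing in self.files_by_key[key]:
            self.checks += 1
        self.files_by_key[key].append(f"file{next(self._ids)}")

    def fresh_key(self):
        # A key that no other file in this model uses.
        return f"KEY{next(self._ids)}"


def add_batch(db, n, dup):
    # Add n files and return how many checks that batch cost.
    before = db.checks
    for _ in range(n):
        db.add("SAMEKEY" if dup else db.fresh_key())
    return db.checks - before


if __name__ == "__main__":
    db = AssociatedFiles()
    for run in range(1, 5):
        # Prints 499500, 1499500, 2499500, 3499500 checks for runs 1-4.
        print(f"run {run}: {add_batch(db, 1000, dup=True)} checks")
    # Files with distinct content cost 0 checks in this model.
    print(f"distinct: {add_batch(db, 1000, dup=False)} checks")
```

In this model each batch of 1000 dups costs a constant 1,000,000 checks more than the previous one. If the timings above are read as minutes:seconds, the runs grow by a similarly steady 32 to 34 seconds each, which fits a total cost that is quadratic in the number of dups.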
@@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 24"""
date="2021-06-14T15:36:30Z"
content="""
If the database recorded whether files were unlocked or not, that could be
avoided, but tracking that would add a lot of complexity for what is just
an edge case. And it would probably slow things down generally by some
amount due to the db being larger.

It seems almost like cheating, but it could remember the last few keys it's added,
and avoid trying to populate unlocked files when adding those keys again.
This would slow down the usual case by some tiny amount (eg an IORef access)
but avoid `O(N^2)` in this edge case. Though it wouldn't fix all edge cases,
eg when the files it's adding rotate through X different contents, and X is
larger than the number of keys it remembers.
"""]]
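The "remember the last few keys" idea from comment 24 might look roughly like the following. This is a hedged sketch in Python rather than git-annex's Haskell. `RecentKeys`, `seen` and `count_checks` are hypothetical names, the least-recently-used eviction policy is an assumption (the comment does not say how keys would be forgotten), and the check counter is the same simplified cost model: one check per file already recorded for a key.

```python
from collections import OrderedDict, defaultdict


class RecentKeys:
    # Bounded memory of the most recently added keys (hypothetical helper).
    def __init__(self, capacity):
        self.capacity = capacity
        self._keys = OrderedDict()

    def seen(self, key):
        # Report whether key was added recently, then record it as newest.
        hit = key in self._keys
        self._keys[key] = None
        self._keys.move_to_end(key)
        if len(self._keys) > self.capacity:
            self._keys.popitem(last=False)  # forget the oldest key
        return hit


def count_checks(keys, capacity):
    # Count per-file checks when adding files with the given keys, in order.
    recent = RecentKeys(capacity)
    files_per_key = defaultdict(int)
    checks = 0
    for key in keys:
        if not recent.seen(key):
            # Not remembered: pay one check per file already using this key.
            checks += files_per_key[key]
        files_per_key[key] += 1
    return checks


if __name__ == "__main__":
    dups = ["K"] * 1000
    print(count_checks(dups, 0))      # no memory at all: 499500
    print(count_checks(dups, 4))      # remembered after the first add: 0
    rotating = [f"K{i % 5}" for i in range(1000)]
    print(count_checks(rotating, 4))  # X=5 contents, 4 remembered: 99500
    print(count_checks(rotating, 5))  # X=5 contents, 5 remembered: 0
```

The third case is the remaining edge case the comment mentions: when the contents rotate through more keys than are remembered, every lookup misses and the quadratic cost returns, only spread across X keys.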