reproduced
This commit is contained in:
parent
26a9ea12d1
commit
0eff5a3f71
2 changed files with 48 additions and 0 deletions
@@ -0,0 +1,31 @@
[[!comment format=mdwn
username="joey"
subject="""comment 23"""
date="2021-06-14T14:09:07Z"
content="""
The file contents all being the same is the crucial thing. On Linux,
adding 1000 duplicate files at a time (all in the same directory), I get:

	run 1: 0:08
	run 2: 0:42
	run 3: 1:14
	run 4: 1:46

After run 4, adding 1000 files with all different content takes
0:11, so that is not appreciably slowed down; it only affects adding dups,
and only when there are a *lot* of them.

This feels like quite an edge case, and also not
really a new problem, since unlocked files would have already
had the same problem before recent changes.

I thought this might be an inefficiency in sqlite's index, similar to how
hash tables can scale poorly when a lot of things end up in the same
bucket. But disabling the index did not improve performance.

Aha -- the slowdown is caused by `git-annex add` looking to see what other
annexed files use the same content, so that it can populate any unlocked
files that didn't have the content present before. With all these locked
files now recorded in the db, it has to check each file in turn, and
that's where the `O(N^2)` comes from.
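To make that scaling concrete, here is a minimal Python model (purely illustrative; the function name and structure are mine, not git-annex code) of N adds that all share one key, where each add scans every file already recorded in the db under that key:

```python
# Illustrative model of the quadratic behavior described above:
# every added file with key K triggers a check of all files already
# recorded under K, so total work across N adds is 0+1+2+...+(N-1).

def adds_with_same_key_scan(n_files):
    """Count total scan work when all n_files share one key."""
    recorded = []          # files already in the db for this key
    total_checks = 0
    for i in range(n_files):
        total_checks += len(recorded)  # check each prior file in turn
        recorded.append("file%d" % i)
    return total_checks

# Total work is N*(N-1)/2 checks, i.e. O(N^2).
assert adds_with_same_key_scan(1000) == 1000 * 999 // 2
```

In this model each additional 1000-file batch costs a roughly constant extra amount of scanning over the batch before it, which is consistent with the roughly linear growth in per-run times above.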
"""]]

@@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 24"""
date="2021-06-14T15:36:30Z"
content="""
||||||
|
If the database recorded when files were unlocked or not, that could be
|
||||||
|
avoided, but tracking that would add a lot of complexity for what is just
|
||||||
|
an edge case. And probably slow things down generally by some amount due to
|
||||||
|
the db being larger.
|
||||||
|
|
||||||
|
It seems almost cheating, but it could remember the last few keys it's added,
|
||||||
|
and avoid trying to populate unlocked files when adding those keys again.
|
||||||
|
This would slow down the usual case by some tiny amount (eg an IORef access)
|
||||||
|
but avoid `O(N^2)` in this edge case. Though it wouldn't fix all edge cases,
|
||||||
|
eg when the files it's adding rotate through X different contents, and X is
|
||||||
|
larger than the number of keys it remembers.
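A rough Python sketch of that idea (hypothetical names throughout; git-annex itself would hold this in an IORef rather than a class): a small bounded cache of recently added keys, with the rotation caveat visible at the end.

```python
# Hypothetical sketch of the "remember the last few keys" idea: if a key
# was added very recently, skip the scan for unlocked files to populate.
# Names are illustrative, not git-annex's actual API.
from collections import OrderedDict

class RecentKeys:
    def __init__(self, capacity=10):
        self.capacity = capacity
        self.keys = OrderedDict()   # insertion-ordered, acts as an LRU

    def seen(self, key):
        """Return True if key was added recently; remember it either way."""
        hit = key in self.keys
        if hit:
            self.keys.move_to_end(key)       # refresh recency
        else:
            self.keys[key] = True
            if len(self.keys) > self.capacity:
                self.keys.popitem(last=False)  # evict the oldest key
        return hit

cache = RecentKeys(capacity=2)
assert cache.seen("K1") is False    # first add: do the full scan
assert cache.seen("K1") is True     # dup: skip populating unlocked files
cache.seen("K2"); cache.seen("K3")  # K1 evicted once capacity is exceeded
assert cache.seen("K1") is False    # rotating through >capacity keys defeats it
```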
"""]]