comment
This commit is contained in:
parent
f6dd34ca81
commit
b40b368857
1 changed files with 33 additions and 0 deletions
|
@ -0,0 +1,33 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="joey"
|
||||||
|
subject="""comment 8"""
|
||||||
|
date="2023-06-02T15:25:12Z"
|
||||||
|
content="""
|
||||||
|
@jgoerzen if you want to take a look at the sql, see
|
||||||
|
`Database/ContentIdentifier.hs`. `getContentIdentifierKeys` is the query
|
||||||
|
that it's running on each file. I'm not really sure right now if the
|
||||||
|
persistent schema in there actually creates an index that is used for that
|
||||||
|
query. persistent's documentation of indexes is lacking and I may have
|
||||||
|
misunderstood that uniqueness constraints result in indexes being created.
|
||||||
|
|
||||||
|
Dumping the database shows this, which really doesn't seem to have an index
|
||||||
|
after all:
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS "content_identifiers"("id" INTEGER
|
||||||
|
PRIMARY KEY,"remote" BLOB NOT NULL,"cid" BLOB NOT NULL,"key" BLOB
|
||||||
|
NOT NULL,CONSTRAINT "content_indentifiers_key_remote_cid_index"
|
||||||
|
UNIQUE ("key","remote","cid"));
|
||||||
|
|
||||||
|
May need some raw sql to add it, like:
|
||||||
|
|
||||||
|
CREATE INDEX cidindex ON "content_identifiers" ("cid");
|
||||||
|
|
||||||
|
Also, I re-ran the 150000 file sync benchmark with `getContentIdentifierKeys`
|
||||||
|
disabled and it took 29:56.78, so 25% faster.
|
||||||
|
|
||||||
|
That gives me the idea for an optimisation -- it could check if the
|
||||||
|
database is empty at start and if so, avoid calling that at all. (It also
|
||||||
|
maintains a map in memory which will still allow it to detect duplicate files.)
|
||||||
|
Speeding up initial imports of a lot of files, but not later imports of a lot
|
||||||
|
of files is kind of a cop out, but..
|
||||||
|
"""]]
|
Loading…
Reference in a new issue