comment
This commit is contained in:
parent
f6dd34ca81
commit
b40b368857
1 changed files with 33 additions and 0 deletions
@@ -0,0 +1,33 @@
[[!comment format=mdwn
username="joey"
subject="""comment 8"""
date="2023-06-02T15:25:12Z"
content="""
@jgoerzen if you want to take a look at the SQL, see
`Database/ContentIdentifier.hs`. `getContentIdentifierKeys` is the query
that it's running on each file. I'm not really sure right now if the
persistent schema in there actually creates an index that is used for that
query. persistent's documentation of indexes is lacking, and I may have been
wrong to assume that uniqueness constraints result in indexes being created.

Dumping the database shows this, which really doesn't seem to have an index
for that query after all:

    CREATE TABLE IF NOT EXISTS "content_identifiers"("id" INTEGER
    PRIMARY KEY,"remote" BLOB NOT NULL,"cid" BLOB NOT NULL,"key" BLOB
    NOT NULL,CONSTRAINT "content_indentifiers_key_remote_cid_index"
    UNIQUE ("key","remote","cid"));

It may need some raw SQL to add one, like:

    CREATE INDEX cidindex ON "content_identifiers" ("cid");
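
A rough way to check this outside of git-annex is to ask SQLite for its query plan before and after adding the index. The sketch below uses Python's `sqlite3` module on an in-memory database with the schema from the dump. The `LOOKUP` query is only an assumption about what `getContentIdentifierKeys` boils down to (a filter on `cid` and `remote`); the SQL that persistent actually generates may differ. SQLite does back a UNIQUE constraint with an automatic index, but here its leading column is `key`, so it cannot be used to seek directly to a `cid`.

```python
import sqlite3

# Schema copied from the database dump above.
SCHEMA = (
    'CREATE TABLE IF NOT EXISTS "content_identifiers"("id" INTEGER '
    'PRIMARY KEY,"remote" BLOB NOT NULL,"cid" BLOB NOT NULL,"key" BLOB '
    'NOT NULL,CONSTRAINT "content_indentifiers_key_remote_cid_index" '
    'UNIQUE ("key","remote","cid"))'
)

# Assumed shape of the per-file lookup; not taken from git-annex itself.
LOOKUP = (
    'SELECT "key" FROM "content_identifiers" '
    'WHERE "cid" = ? AND "remote" = ?'
)


def query_plan(conn, sql, params):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Plan with only the implicit index from the UNIQUE constraint.
before = query_plan(conn, LOOKUP, (b"some-cid", b"some-remote"))

# Plan after adding the proposed index on cid.
conn.execute('CREATE INDEX cidindex ON "content_identifiers" ("cid")')
after = query_plan(conn, LOOKUP, (b"some-cid", b"some-remote"))

print("before:", before)
print("after: ", after)
```

If the first plan is reported as a `SCAN` and the second as a `SEARCH` using `cidindex`, that would support the theory that the existing constraint does not help this lookup. The exact wording of the plan text varies between SQLite versions.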

Also, I re-ran the 150000 file sync benchmark with `getContentIdentifierKeys`
disabled and it took 29:56.78, so 25% faster.

That gives me an idea for an optimisation: it could check if the
database is empty at start and, if so, avoid calling that at all. (It also
maintains a map in memory which will still allow it to detect duplicate files.)
Speeding up initial imports of a lot of files, but not later imports of a lot
of files, is kind of a cop out, but...
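
The idea above could be sketched roughly as follows. This is Python rather than the Haskell that git-annex is written in, and `ImportSession`, `keys_for`, `record` and `db_lookups` are hypothetical names made up for illustration; none of them is git-annex code. It assumes the lookup filters on `cid` and `remote`, and that everything recorded during a run also goes into the in-memory map.

```python
import sqlite3

# Schema copied from the database dump above.
SCHEMA = (
    'CREATE TABLE IF NOT EXISTS "content_identifiers"("id" INTEGER '
    'PRIMARY KEY,"remote" BLOB NOT NULL,"cid" BLOB NOT NULL,"key" BLOB '
    'NOT NULL,CONSTRAINT "content_indentifiers_key_remote_cid_index" '
    'UNIQUE ("key","remote","cid"))'
)


class ImportSession:
    # Hypothetical helper illustrating the optimisation; not git-annex code.

    def __init__(self, conn):
        self.conn = conn
        # Checked once at start: an empty table cannot match any cid.
        row = conn.execute(
            'SELECT 1 FROM "content_identifiers" LIMIT 1').fetchone()
        self.db_was_empty = row is None
        self.seen = {}       # (remote, cid) -> key recorded during this run
        self.db_lookups = 0  # counts queries that actually hit the database

    def keys_for(self, remote, cid):
        # The in-memory map still catches duplicates seen in this run.
        cached = self.seen.get((remote, cid))
        if cached is not None:
            return [cached]
        # Nothing was stored before this run, so skip the query entirely.
        if self.db_was_empty:
            return []
        self.db_lookups += 1
        rows = self.conn.execute(
            'SELECT "key" FROM "content_identifiers" '
            'WHERE "cid" = ? AND "remote" = ?', (cid, remote)).fetchall()
        return [r[0] for r in rows]

    def record(self, remote, cid, key):
        self.seen[(remote, cid)] = key
        self.conn.execute(
            'INSERT INTO "content_identifiers" ("remote","cid","key") '
            'VALUES (?, ?, ?)', (remote, cid, key))


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
session = ImportSession(conn)          # initial import: database is empty
session.record(b"remote1", b"cid1", b"key1")
duplicate = session.keys_for(b"remote1", b"cid1")
```

In this sketch a later run against the same database would find it non-empty and fall back to querying on every file, which is the cop out mentioned above.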
"""]]