make sure all sqlite selects have indexes

Bear in mind that these indexes are really uniqueness constraints
that just happen to also make sqlite generate indexes.
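
As a rough illustration of that point (not git-annex's actual schema; the table and column names here are hypothetical), a minimal Python sketch shows sqlite creating an index automatically for a UNIQUE constraint and then using it for a select:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table for illustration only; not git-annex's real schema.
conn.execute(
    "CREATE TABLE associated ("
    " key TEXT NOT NULL,"
    " file TEXT NOT NULL,"
    " UNIQUE (key, file))"
)

# sqlite backs the UNIQUE constraint with an automatically created index.
# PRAGMA index_list rows are (seq, name, unique, origin, partial).
auto_indexes = [row[1] for row in conn.execute("PRAGMA index_list('associated')")]

# A select on the leading column of the constraint can use that index.
# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT file FROM associated WHERE key = ?", ("k1",)
    )
]

print(auto_indexes)
print(plan)
```

The exact wording of the plan text varies between sqlite versions, so it is printed rather than quoted here.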

In Database.ContentIdentifier, the ContentIndentifiersKeyRemoteCidIndex
is fine as a uniqueness constraint because it contains all columns of the
table, so it only rules out exact duplicate rows. The
ContentIndentifiersCidRemoteIndex is also ok because there
can only be one key for a given (cid, uuid) combination.
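
The effect of those two constraints can be sketched as follows. This is an assumption-laden example: the table name, column names and values are made up, and only the shape of the two UNIQUE constraints mirrors the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the content identifiers table.
conn.execute("""
    CREATE TABLE content_identifiers (
        remote TEXT NOT NULL,
        cid    TEXT NOT NULL,
        key    TEXT NOT NULL,
        UNIQUE (key, remote, cid),  -- spans every column: forbids only exact duplicates
        UNIQUE (cid, remote)        -- at most one key per (cid, remote) pair
    )
""")


def try_insert(remote, cid, key):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO content_identifiers VALUES (?, ?, ?)",
                (remote, cid, key),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("r1", "c1", "k1"),  # first row: accepted
    try_insert("r1", "c2", "k1"),  # same key and remote, new cid: accepted
    try_insert("r1", "c1", "k1"),  # exact duplicate: rejected
    try_insert("r1", "c1", "k2"),  # second key for the same (cid, remote): rejected
]
print(results)
```

The all-columns constraint never rejects a row that differs in any column, which is why it is harmless as a constraint and useful purely for the index it generates.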

In Database.Export, the new ExportTreeFileKeyIndex is the same pair as
the old ExportTreeKeyFileIndex (previously ExportTreeIndex). And
in Database.Keys.SQL, the new InodeCacheKeyIndex is the same pair as the
old KeyInodeCacheIndex.
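
A second constraint over the same pair of columns adds no new restriction on the data, but (assuming, as the index names suggest, that the column order differs) it gives sqlite an index that serves selects on the other column. A small sketch with hypothetical table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: "old_tree" has only the (key, file) constraint,
# "new_tree" additionally has the same pair in the opposite order.
conn.execute(
    "CREATE TABLE old_tree (key TEXT NOT NULL, file TEXT NOT NULL,"
    " UNIQUE (key, file))"
)
conn.execute(
    "CREATE TABLE new_tree (key TEXT NOT NULL, file TEXT NOT NULL,"
    " UNIQUE (key, file), UNIQUE (file, key))"
)


def plan_for_file_lookup(table):
    """Return the query plan detail lines for a select by file on the given table."""
    sql = f"EXPLAIN QUERY PLAN SELECT key FROM {table} WHERE file = ?"
    return [row[-1] for row in conn.execute(sql, ("f",))]


old_plan = plan_for_file_lookup("old_tree")  # no index leads with file: full scan
new_plan = plan_for_file_lookup("new_tree")  # (file, key) index: indexed search

print(old_plan)
print(new_plan)
```

With only the (key, file) index, the lookup by file has to scan; with the reversed pair present it becomes an indexed search.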
Joey Hess 2019-10-30 13:40:29 -04:00
parent 3732f27722
commit 9085a2cfec
5 changed files with 10 additions and 19 deletions

@@ -2,24 +2,14 @@ Collection of non-ideal things about git-annex's use of sqlite databases.
 Would be good to improve these sometime, but it would need a migration
 process.
-* Database.Keys.SQL.isInodeKnown seems likely to get very slow
-  when there are a lot of unlocked annexed files. It needs
-  an index in the database, eg "InodeIndex cache"
-  It also has to do some really ugly SQL LIKE queries. Probably an index
-  would not speed them up. They're only needed when git-annex detects
-  inodes are not stable, eg on fat or probably windows. A better database
+* Database.Keys.SQL.isInodeKnown has some really ugly SQL LIKE queries.
+  Probably an index would not speed them up. They're only needed when
+  git-annex detects inodes are not stable, eg on fat or probably windows.
+  A better database
   schema should be able to eliminate the need for those LIKE queries.
   Eg, store the size and allowable mtimes in a separate table that is
   queried when necessary.
-* Database.Export.getExportedKey would be faster if there was an index
-  in the database, eg "ExportedIndex file key". This only affects
-  the speed of `git annex export`, which is probably swamped by the actual
-  upload of the data to the remote.
-* There may be other selects elsewhere that are not indexed.
 * Database.Types has some suboptimal encodings for Key and InodeCache.
   They are both slow due to being implemented using String
   (which may be fixable w/o changing the DB schema),