Home > Blockchain >  SQLite queryslow when using index
SQLite queryslow when using index

Time:01-22

I have a table indexed on a text column, and I want all my queries to return results ordered by name without any performance hit. Table has around 1 million rows if it matters.

Table -

CREATE TABLE table (Name text)

Index -

CREATE INDEX "NameIndex" ON "Files" (
    "Name" COLLATE nocase   ASC
);

Query 1 -

select * from table where Name like "%a%"

Query plan, as expected a full scan -

SCAN TABLE table

Time -

Result: 179202 rows returned in 53ms

Query 2, now using order by to read from index -

select * from table where Name like "%a%" order by Name collate nocase

Query plan, scan using index -

SCAN TABLE table USING INDEX NameIndex

Time -

Result: 179202 rows returned in 672ms

Used DB Browser for SQLite to get the information above, with default Pragmas.

I'd assume scanning the index would be as performant as scanning the table, is it not the case or am I doing something wrong?

Another interesting thing I noticed, that may be relevant -

Query 3 -

select * from table where Name like "a%"
Result: 23026 rows returned in 9ms

Query 4 -

select * from table where name like "a%" order by name collate nocase
Result: 23026 rows returned in 101ms

And both has them same query plan -

SEARCH TABLE table USING INDEX NameIndex (Name>? AND Name<?)

Is this expected? I'd assume the performance be the same if the plan was the same.

Thanks!

EDIT - The reason the query is slower was because I used select * and not select name, causing SQLite to go between the table and the index.

The solution was to use clustered index, thanks @Tomalak for helping me find it - create table mytable (a text, b text, primary key (a,b)) without rowid The table will be ordered by default using a b combination, meaning that full scan queries will be much faster (now 90ms).

CodePudding user response:

A LIKE pattern that starts with % can never use an index. It will always result in a full table scan (or index scan, if the query can be covered by the index itself).

It's logical when you think about it. Indexes are not magic. They are sorted lists of values, exactly like a keyword index in a book, and that means they are only only quick for looking up a word if you know how the given word starts. If you're searching for the middle part of a word, you would have to look at every index entry in a book as well.


Conclusion from the ensuing discussion in the comments:

The best course of action to get a table that always sorts by a non-unique column without a performance penalty is to create it without ROWID, and turn it into a clustering index over a the column in question plus a second column that makes the combination unique:

CREATE TABLE MyTable (
    Name   TEXT COLLATE NOCASE,
    Id     INTEGER,
    Other  TEXT,
    Stuff  INTEGER,
    PRIMARY KEY(Name, Id)  -- this will sort the whole table by Name
) WITHOUT ROWID;

This will result in a performance penalty for INSERT/UPDATE/DELETE operations, but in exchange sorting will be free since the table is already ordered.

  •  Tags:  
  • Related