Problem
I have two queries where one needs the result of the other one. My first guess was to use an independent subquery:
SELECT P2.*
FROM ExampleTable P2
WHERE P2.delivery_start >= (
SELECT MIN(P1.delivery_start)
FROM ExampleTable P1
WHERE 1641288602 < P1.delivery_end
);
The entire query takes 5-6 seconds which is way to long for my application. Running these queries after another takes only around 800ms for both:
SELECT MIN(P1.delivery_start)
FROM ExampleTable P1
WHERE 1641288602 < P1.delivery_end;
SELECT P2.*
FROM ExampleTable P2
WHERE P2.delivery_start >= 1641286800;
I am using Mariadb 10.2 and have indices on both delivery_start and delivery_end.
What I have tried
I have used a CTE instead of a subquery which resulted in the same performance. Using a Variable with SET yields similar results as to running both queries separately, so thats what I will use for the time being.
I ran EXPLAIN on all 3 Queries:
1. Query with subquery
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|---|---|---|---|---|---|---|---|---|---|
| 1 | PRIMARY | P2 | ALL | delivery_start | NULL | NULL | NULL | 6388282 | Using where |
| 2 | SUBQUERY | P1 | range | delivery_end | delivery_end | 4 | NULL | 36378 | Using index condition |
2. Separate Queries
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|---|---|---|---|---|---|---|---|---|---|
| 1 | SIMPLE | P1 | range | delivery_end | delivery_end | 4 | NULL | 36432 | Using index condition |
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|---|---|---|---|---|---|---|---|---|---|
| 1 | SIMPLE | P2 | range | delivery_start | delivery_start | 4 | NULL | 35944 | Using index condition |
Question
I think the issue is shown in the first EXPLAIN table as it has type ALL which means that the database performs a full table scan. My question is simply: why? Is the optimizer not able to figure out that the subquery produces a number with which we only need a range type query? And why does it not use any index?
CodePudding user response:
The problem is described in the MariaDB docs:
In all remaining cases when NULL cannot be substituted with FALSE, it is not possible to use index lookups. This is not a limitation in the server, but a consequence of the NULL semantics in the ANSI SQL standard.
The result of your subquery can potentially return a NULL in the case no rows were found. Hence, MariaDB cannot use the index for the parent query.
You must rewrite your subquery in a way that it will always return a row with a non-NULL scalar or stick with two separate queries. However, what should happen if your first query returns NULL? With a compound statement you could put an if around the second query and don't even execute it if the first returns NULL.
CodePudding user response:
Replace these
INDEX(delivery_start)
INDEX(delivery_end)
with these:
INDEX(delivery_start, delivery_end)
INDEX(delivery_end, delivery_start)
The second one will help significantly with the subquery. Then the first may help with the outer query.
(If those don't help, please add SHOW CREATE TABLE, EXPLAIN SELECT ... and table sizes.)
