Home > Net >  SQL query with subquery takes longer than both queries separately
SQL query with subquery takes longer than both queries separately

Time:01-06

Problem

I have two queries where one needs the result of the other one. My first guess was to use an independent subquery:

SELECT P2.*
FROM ExampleTable P2
WHERE P2.delivery_start >= (
    SELECT MIN(P1.delivery_start)
    FROM ExampleTable P1
    WHERE 1641288602 < P1.delivery_end
);

The entire query takes 5-6 seconds which is way to long for my application. Running these queries after another takes only around 800ms for both:

SELECT MIN(P1.delivery_start)
FROM ExampleTable P1
WHERE 1641288602 < P1.delivery_end;
SELECT P2.*
FROM ExampleTable P2
WHERE P2.delivery_start >= 1641286800;

I am using Mariadb 10.2 and have indices on both delivery_start and delivery_end.

What I have tried

I have used a CTE instead of a subquery which resulted in the same performance. Using a Variable with SET yields similar results as to running both queries separately, so thats what I will use for the time being.

I ran EXPLAIN on all 3 Queries:

1. Query with subquery

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY P2 ALL delivery_start NULL NULL NULL 6388282 Using where
2 SUBQUERY P1 range delivery_end delivery_end 4 NULL 36378 Using index condition

2. Separate Queries

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE P1 range delivery_end delivery_end 4 NULL 36432 Using index condition
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE P2 range delivery_start delivery_start 4 NULL 35944 Using index condition

Question

I think the issue is shown in the first EXPLAIN table as it has type ALL which means that the database performs a full table scan. My question is simply: why? Is the optimizer not able to figure out that the subquery produces a number with which we only need a range type query? And why does it not use any index?

CodePudding user response:

The problem is described in the MariaDB docs:

In all remaining cases when NULL cannot be substituted with FALSE, it is not possible to use index lookups. This is not a limitation in the server, but a consequence of the NULL semantics in the ANSI SQL standard.

The result of your subquery can potentially return a NULL in the case no rows were found. Hence, MariaDB cannot use the index for the parent query.

You must rewrite your subquery in a way that it will always return a row with a non-NULL scalar or stick with two separate queries. However, what should happen if your first query returns NULL? With a compound statement you could put an if around the second query and don't even execute it if the first returns NULL.

CodePudding user response:

Replace these

INDEX(delivery_start)
INDEX(delivery_end)

with these:

INDEX(delivery_start, delivery_end)
INDEX(delivery_end, delivery_start)

The second one will help significantly with the subquery. Then the first may help with the outer query.

(If those don't help, please add SHOW CREATE TABLE, EXPLAIN SELECT ... and table sizes.)

  •  Tags:  
  • Related