What I am trying to accomplish is getting rows from one table that do not match another table based on specific filters. The two tables are relatively huge so I am trying to filter them based on a certain time range.
The steps I went through so far.
- Get the IDs from "T1" for the last 3 days
SELECT
id
FROM T1
WHERE STARTTIME BETWEEN '3 days ago' AND 'now';
Execution time is 4.5s.
- Get the IDs from "T2" for the last 3 days
SELECT
id
FROM T2
WHERE STARTTIME BETWEEN '3 days ago' AND 'now';
Execution time is 2.5s.
- Now I try to use
NOT EXISTSto merge the results from both statements into one
SELECT
CID
FROM T1
WHERE STARTTIME BETWEEN '3 days ago' AND 'now'
AND NOT EXISTS (
SELECT NULL FROM T2
WHERE T1.ID = T2.ID
AND STARTTIME BETWEEN '3 days ago' AND 'now'
);
Execution time is 23s.
I also tried the INNER JOIN logic from this answer thinking it makes sense, but I get no results so I cannot properly evaluate.
Is there a better way to construct this statement that could possibly lead to a faster execution time?
CodePudding user response:
Is there a better way to construct this statement that could possibility lead to a faster execution time?
Your basic responsibility is to write the SQL staement, the basic responsibility of Oracle is to come with an execution plan
If you are not satified (but you should know that a combination of two sources using NOT EXISTS will take longer that the sum of the time to extract the data from the sources) your fist step should be to verify the execution plan (and not try to rewrite the statement).
See some more details how to proceede here
EXPLAIN PLAN SET STATEMENT_ID = 'stmt1' into plan_table FOR
SELECT
PAD
FROM T1
WHERE STARTTIME BETWEEN date'2021-01-11' AND date'2021-01-13'
AND NOT EXISTS (
SELECT NULL FROM T2
WHERE T1.ID = T2.ID
AND STARTTIME BETWEEN date'2021-01-11' AND date'2021-01-13'
);
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'stmt1','ALL'));
This is what you should see
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1999 | 150K| 10175 (1)| 00:00:01 |
|* 1 | HASH JOIN RIGHT ANTI| | 1999 | 150K| 10175 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | T2 | 2002 | 26026 | 4586 (1)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | T1 | 4002 | 250K| 5589 (1)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."ID")
2 - filter("STARTTIME"<=TO_DATE(' 2021-01-13 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "STARTTIME">=TO_DATE(' 2021-01-11 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
3 - filter("STARTTIME"<=TO_DATE(' 2021-01-13 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "STARTTIME">=TO_DATE(' 2021-01-11 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
Note that the hash join (here anti due to the not exists) is the best way to join two large row sources. Note also that the plan does not use indexes. The reason is the same - to access large data you do not want to go over index.
Contrary to the case of low cardinality row sources (OTPL) where you expects to see index access and NESTED LOOPS ANTI.
Some times is Oracle confused (e.g. while seeing stale statistics) and decide to go the NESTED LOOPway even for large data - which leads to long elapsed time.
This should help you at least to decide if you have a problem or not.
CodePudding user response:
Perhaps a simple MINUS operation will accomplish what you are looking for:
select id
from ( select id
from t1
where starttime between '3 days ago' and 'now'
MINUS
select id
from t2
where starttime between '3 days ago' and 'now'
);
for however you actually define starttime between '3 days ago' and 'now'. This literally uses your current queries as is the MINUS operation removes from the first those values which do exist in the second and returns the result. See MINUS demo here.
