I have a table TRANSACTIONS with almost 30 million transactions (13 COLUMNS). How Can I optimize following code? I tried with self join but it seemed to be less effective.
Logic: I want to get last transactions by sender-receiver_2 if receiver_2 exists, else by sender-receiver calculate some statistics (10/30/90 days)
SELECT T.* FROM
(SELECT T.*, row_number() over (partition by T.SENDER, (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END) order by T.DATE_ACCEPT desc) as seqnum
FROM
(
SELECT T.*
,(SELECT COUNT(DISTINCT T2.ID_TRAN)
FROM TRANSACTIONS T2
WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND
T2.DATE_ACCEPT < T.DATE_ACCEPT AND
(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
(CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
AND
T2.SENDER = T.SENDER
) CNT_10
,(SELECT COUNT(DISTINCT T2.ID_TRAN)
FROM TRANSACTIONS T2
WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
T2.DATE_ACCEPT < T.DATE_ACCEPT AND
(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
(CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
AND
T2.SENDER = T.SENDER
) CNT_30
,(SELECT COUNT(DISTINCT T2.ID_TRAN)
FROM TRANSACTIONS T2
WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND
T2.DATE_ACCEPT < T.DATE_ACCEPT AND
(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
(CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
AND
T2.SENDER = T.SENDER
) CNT_90
,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
FROM TRANSACTIONS T2
WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND
T2.DATE_ACCEPT < T.DATE_ACCEPT AND
(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
(CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
AND
T2.SENDER = T.SENDER
GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
) AVG_AMOUNT_10
,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
FROM TRANSACTIONS T2
WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
T2.DATE_ACCEPT < T.DATE_ACCEPT AND
(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
(CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
AND
T2.SENDER = T.SENDER
GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
) AVG_AMOUNT_30
,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
FROM TRANSACTIONS T2
WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND
T2.DATE_ACCEPT < T.DATE_ACCEPT AND
(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
(CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
AND
T2.SENDER = T.SENDER
GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
) AVG_AMOUNT_90
,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
FROM TRANSACTIONS T2
WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND
T2.DATE_ACCEPT < T.DATE_ACCEPT AND
(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
(CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
AND
T2.SENDER = T.SENDER
) MAX_AMOUNT_10
,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
FROM TRANSACTIONS T2
WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
T2.DATE_ACCEPT < T.DATE_ACCEPT AND
(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
(CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
AND
T2.SENDER = T.SENDER
) MAX_AMOUNT_30
,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
FROM TRANSACTIONS T2
WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND
T2.DATE_ACCEPT < T.DATE_ACCEPT AND
(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
(CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
AND
T2.SENDER = T.SENDER
) MAX_AMOUNT_90
FROM TRANSACTIONS T
) T ) T
WHERE T.SEQNUM = 1
Also I created index on (SENDER, DATE_ACCEPT).
CodePudding user response:
The major problem in your query is the CASE in predicates. It invalidates the usage of any index. Therefore you need to use a virtual column:
ALTER TABLE Transactions ADD rec AS (
CASE WHEN RECEIVER_2 IS NULL
THEN RECEIVER ELSE RECEIVER_2 END
);
The second step is to create an index with this column:
CREATE INDEX ix_transactions_sender_rec
ON Transactions(sender, rec, date_accept)
However, the index may not be used due to the query syntax. Replace the CASE syntax with the newly created column rec and also rewrite the greatest per group solution into a self-join. I add the reduced SQL example of how to do it.
select t.*,
(
select count(DISTINCT T2.id_tran)
from transactions T2
where T2.date_accept > T.date_accept - 10
AND T2.date_accept < T.date_accept
AND T2.rec = T.rec
AND T2.sender = T.sender
) CNT_10
from (
select sender, rec, max(date_accept)
from transactions
group sender, rec
) tmax
join transactions t on t.sender = tmax.sender and
t.rec = tmax.rec and
t.date_accept = tmax.date_accept
And if you want your statistical subqueries super fast, than add also other columns used in them:
CREATE INDEX ix_transactions_sender_rec
ON Transactions(sender, rec, date_accept, id_tran, amount)
CodePudding user response:
Having an index on (SENDER, DATE_ACCEPT) will probably help.
And you can simplify & accelarate the query by using one LATERAL JOIN.
It allows to calculate more than 1 COUNT/AVG/MAX.
For example:
SELECT T.*, LT.* FROM ( SELECT SENDER , RECEIVER, RECEIVER_2 , DATE_ACCEPT , AMOUNT, AMOUNT_2 FROM ( SELECT SENDER , RECEIVER, RECEIVER_2 , DATE_ACCEPT , AMOUNT, AMOUNT_2 , ROW_NUMBER() OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT DESC) AS RN FROM TRANSACTIONS ) TRANS WHERE RN = 1 ) T CROSS JOIN LATERAL ( SELECT COUNT(DISTINCT CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND T2.DATE_ACCEPT < T.DATE_ACCEPT THEN T2.ID_TRAN END) AS CNT_10 , COUNT(DISTINCT CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND T2.DATE_ACCEPT < T.DATE_ACCEPT THEN T2.ID_TRAN END) AS CNT_30 , COUNT(DISTINCT CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND T2.DATE_ACCEPT < T.DATE_ACCEPT THEN T2.ID_TRAN END) AS CNT_90 , NVL(AVG( CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND T2.DATE_ACCEPT < T.DATE_ACCEPT THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END END), 0) AS AVG_AMOUNT_10 , NVL(AVG( CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND T2.DATE_ACCEPT < T.DATE_ACCEPT THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END END), 0) AS AVG_AMOUNT_30 , NVL(AVG( CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND T2.DATE_ACCEPT < T.DATE_ACCEPT THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END END), 0) AS AVG_AMOUNT_90 , NVL(MAX( CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND T2.DATE_ACCEPT < T.DATE_ACCEPT THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END END), 0) AS MAX_AMOUNT_10 , NVL(MAX( CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND T2.DATE_ACCEPT < T.DATE_ACCEPT THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END END), 0) AS MAX_AMOUNT_30 , NVL(MAX( CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND T2.DATE_ACCEPT < T.DATE_ACCEPT THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END END), 0) AS MAX_AMOUNT_90 FROM TRANSACTIONS T2 WHERE T2.SENDER = T.SENDER AND T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND NVL(T2.RECEIVER_2, T2.RECEIVER) = NVL(T.RECEIVER_2, T.RECEIVER) ) LT;
| SENDER | RECEIVER | RECEIVER_2 | DATE_ACCEPT | AMOUNT | AMOUNT_2 | CNT_10 | CNT_30 | CNT_90 | AVG_AMOUNT_10 | AVG_AMOUNT_30 | AVG_AMOUNT_90 | MAX_AMOUNT_10 | MAX_AMOUNT_30 | MAX_AMOUNT_90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 30-MAR-21 | 10 | 20 | 1 | 2 | 3 | 11.2 | 21.65 | 45.5 | 11.2 | 32.1 | 93.2 |
Demo on db<>fiddle here
CodePudding user response:
Are you aware of the Analytic Functions Windowing Clause?
I don't get the logic of your query, but I guess it might be possible without any self-joins. Have a look this query, it could be a starting point:
SELECT
COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND CURRENT ROW) AS CNT_10,
COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS CNT_30,
COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND CURRENT ROW) AS CNT_90,
AVG(NVL(T.AMOUNT_2, T.AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS AVG_30,
AVG(NVL2(T.RECEIVER_2, T.AMOUNT_2, T.AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND CURRENT ROW) AS AVG_90
FROM TRANSACTIONS
Note, RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND CURRENT ROW) is equal to RANGE INTERVAL '10' DAY PRECEDING)
Another note, when I run your query on the sample data, then I get
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|ID_TRAN|SENDER|RECEIVER|RECEIVER_2|AMOUNT|AMOUNT_2|DATE_ACCEPT |CNT_10|CNT_30|CNT_90|AVG_AMOUNT_10|AVG_AMOUNT_30|AVG_AMOUNT_90|MAX_AMOUNT_10|MAX_AMOUNT_30|MAX_AMOUNT_90|SEQNUM|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|1 |00010 |22222 |1112 |3000 |1000 |16.04.2021 14:01:00|0 |0 |0 | | | | | | |1 |
|1 |00010 |22222 |2114 |3000 |2000 |16.04.2021 14:01:00|0 |0 |0 | | | | | | |1 |
|2 |01236 |45872 | |4000 | |01.04.2021 22:01:00|0 |0 |0 | | | | | | |1 |
|3 |45872 |00010 | |5000 | |17.04.2021 14:01:00|0 |0 |0 | | | | | | |1 |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
which looks quite pointless.
