Home > Software design >  SQL Oracle script optimization
SQL Oracle script optimization

Time:01-24

I have a table TRANSACTIONS with almost 30 million transactions (13 COLUMNS). How Can I optimize following code? I tried with self join but it seemed to be less effective.

Logic: I want to get last transactions by sender-receiver_2 if receiver_2 exists, else by sender-receiver calculate some statistics (10/30/90 days)

SELECT T.* FROM
(SELECT T.*, row_number() over (partition by T.SENDER, (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END) order by T.DATE_ACCEPT desc) as seqnum 
FROM 
(
SELECT T.*
      ,(SELECT COUNT(DISTINCT T2.ID_TRAN)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10  AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        ) CNT_10
      ,(SELECT COUNT(DISTINCT T2.ID_TRAN)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) CNT_30
      ,(SELECT COUNT(DISTINCT T2.ID_TRAN)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90  AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) CNT_90 
        ,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
             AND
              T2.SENDER = T.SENDER
        GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        ) AVG_AMOUNT_10
      ,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        ) AVG_AMOUNT_30
        ,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        ) AVG_AMOUNT_90
        ,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        ) MAX_AMOUNT_10
        ,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) MAX_AMOUNT_30
        ,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) MAX_AMOUNT_90
FROM TRANSACTIONS T
) T ) T
WHERE T.SEQNUM = 1

Also I created index on (SENDER, DATE_ACCEPT).

Query plan

TABLE EXAMPLE

CodePudding user response:

The major problem in your query is the CASE in predicates. It invalidates the usage of any index. Therefore you need to use a virtual column:

ALTER TABLE Transactions ADD rec AS (
     CASE WHEN RECEIVER_2 IS NULL 
     THEN RECEIVER ELSE RECEIVER_2 END
);

The second step is to create an index with this column:

CREATE INDEX ix_transactions_sender_rec 
    ON Transactions(sender, rec, date_accept)

However, the index may not be used due to the query syntax. Replace the CASE syntax with the newly created column rec and also rewrite the greatest per group solution into a self-join. I add the reduced SQL example of how to do it.

select t.*,
    (
           select count(DISTINCT T2.id_tran)
           from transactions T2
           where T2.date_accept > T.date_accept - 10
                 AND T2.date_accept < T.date_accept
                 AND T2.rec = T.rec
                 AND T2.sender = T.sender
    ) CNT_10
from (
    select sender, rec, max(date_accept)
    from transactions
    group sender, rec
) tmax 
join transactions t on t.sender = tmax.sender and
                       t.rec = tmax.rec and
                       t.date_accept = tmax.date_accept

And if you want your statistical subqueries super fast, than add also other columns used in them:

CREATE INDEX ix_transactions_sender_rec 
    ON Transactions(sender, rec, date_accept, id_tran, amount)

CodePudding user response:

Having an index on (SENDER, DATE_ACCEPT) will probably help.

And you can simplify & accelarate the query by using one LATERAL JOIN.
It allows to calculate more than 1 COUNT/AVG/MAX.

For example:

SELECT T.*, LT.*
FROM (
  SELECT SENDER
    , RECEIVER, RECEIVER_2
    , DATE_ACCEPT
    , AMOUNT, AMOUNT_2 
  FROM (
    SELECT SENDER
    , RECEIVER, RECEIVER_2
    , DATE_ACCEPT
    , AMOUNT, AMOUNT_2 
    , ROW_NUMBER() OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT DESC) AS RN
    FROM TRANSACTIONS
  ) TRANS
  WHERE RN = 1
) T
CROSS JOIN LATERAL (
  SELECT 
    COUNT(DISTINCT
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN T2.ID_TRAN
    END) AS CNT_10
  , COUNT(DISTINCT
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN T2.ID_TRAN
    END) AS CNT_30
  , COUNT(DISTINCT
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN T2.ID_TRAN
    END) AS CNT_90
  , NVL(AVG(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS AVG_AMOUNT_10
  , NVL(AVG(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS AVG_AMOUNT_30
  , NVL(AVG(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS AVG_AMOUNT_90
  , NVL(MAX(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS MAX_AMOUNT_10
  , NVL(MAX(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS MAX_AMOUNT_30
  , NVL(MAX(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS MAX_AMOUNT_90
  FROM TRANSACTIONS T2
  WHERE T2.SENDER = T.SENDER 
    AND T2.DATE_ACCEPT > T.DATE_ACCEPT - 90
    AND NVL(T2.RECEIVER_2, T2.RECEIVER) = NVL(T.RECEIVER_2, T.RECEIVER)
) LT;
SENDER RECEIVER RECEIVER_2 DATE_ACCEPT AMOUNT AMOUNT_2 CNT_10 CNT_30 CNT_90 AVG_AMOUNT_10 AVG_AMOUNT_30 AVG_AMOUNT_90 MAX_AMOUNT_10 MAX_AMOUNT_30 MAX_AMOUNT_90
1 2 3 30-MAR-21 10 20 1 2 3 11.2 21.65 45.5 11.2 32.1 93.2

Demo on db<>fiddle here

CodePudding user response:

Are you aware of the Analytic Functions Windowing Clause?

I don't get the logic of your query, but I guess it might be possible without any self-joins. Have a look this query, it could be a starting point:

SELECT 
    COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND CURRENT ROW) AS CNT_10,
    COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS CNT_30,
    COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND CURRENT ROW) AS CNT_90,
    AVG(NVL(T.AMOUNT_2, T.AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS AVG_30,
    AVG(NVL2(T.RECEIVER_2, T.AMOUNT_2, T.AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND CURRENT ROW) AS AVG_90
FROM TRANSACTIONS

Note, RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND CURRENT ROW) is equal to RANGE INTERVAL '10' DAY PRECEDING)

Another note, when I run your query on the sample data, then I get

 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
|ID_TRAN|SENDER|RECEIVER|RECEIVER_2|AMOUNT|AMOUNT_2|DATE_ACCEPT        |CNT_10|CNT_30|CNT_90|AVG_AMOUNT_10|AVG_AMOUNT_30|AVG_AMOUNT_90|MAX_AMOUNT_10|MAX_AMOUNT_30|MAX_AMOUNT_90|SEQNUM|
 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
|1      |00010 |22222   |1112      |3000  |1000    |16.04.2021 14:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
|1      |00010 |22222   |2114      |3000  |2000    |16.04.2021 14:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
|2      |01236 |45872   |          |4000  |        |01.04.2021 22:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
|3      |45872 |00010   |          |5000  |        |17.04.2021 14:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

which looks quite pointless.

  •  Tags:  
  • Related