I have a table similar to the below. My goal is to remove groups for each date where the status moves to either 'Cancelled' or 'Failed', while retaining groups per day that contain other status changes.
| Group | Status | Date |
|---|---|---|
| A | Pending | 2021-01-01 08:00:00 |
| A | Cancelled | 2021-01-01 13:00:00 |
| A | Pending | 2021-01-02 08:00:00 |
| A | Failed | 2021-01-02 13:00:00 |
| A | Pending | 2021-01-03 08:00:00 |
| A | Pending Settlement | 2021-01-03 13:00:00 |
| A | Pending | 2021-01-04 08:00:00 |
| A | Settled | 2021-01-04 13:00:00 |
| B | Pending | 2021-01-01 08:00:00 |
| B | Cancelled | 2021-01-01 13:00:00 |
| B | Pending | 2021-01-02 08:00:00 |
| B | Failed | 2021-01-02 13:00:00 |
| B | Pending | 2021-01-03 08:00:00 |
| B | Pending Settlement | 2021-01-03 13:00:00 |
| B | Pending | 2021-01-04 08:00:00 |
| B | Settled | 2021-01-04 13:00:00 |
My first attempt was something like:
select GROUP, STATUS, DATE
from TABLE TBL
, (
select GROUP, STATUS, DATE
from TABLE
where STATUS in ('Cancelled','Failed')
) FLAG
where (TBL.GROUP <> FLAG.GROUP and TBL.DATE <> FLAG.DATE)
;
My expected output is shown below EDIT:, however it seems to be taking exceptionally long (>10 mins) even when applying date filters:
| Group | Status | Date |
|---|---|---|
| A | Pending | 2021-01-03 08:00:00 |
| A | Pending Settlement | 2021-01-03 13:00:00 |
| A | Pending | 2021-01-04 08:00:00 |
| A | Settled | 2021-01-04 13:00:00 |
| B | Pending | 2021-01-03 08:00:00 |
| B | Pending Settlement | 2021-01-03 13:00:00 |
| B | Pending | 2021-01-04 08:00:00 |
| B | Settled | 2021-01-04 13:00:00 |
CodePudding user response:
You may use the last_value() window function to get the last value within a group and then apply your filter against it.
SELECT "GROUP",
"STATUS",
"DATE"
FROM (SELECT "GROUP",
"STATUS",
"DATE",
last_value("STATUS") OVER (PARTITION BY "GROUP",
trunc("DATE")
ORDER BY "DATE" ASC) lv
FROM "TABLE") x
WHERE lv NOT IN ('Cancelled',
'Failed');
Edit:
To filter out days where the status was 'Cancelled' or 'Failed' anytime during the day, you can use for example the windowed version of of count() with a CASE expression that gives a non-NULL value when the status is 'Cancelled' or 'Failed', or NULL (the default) otherwise.
SELECT "GROUP",
"STATUS",
"DATE"
FROM (SELECT "GROUP",
"STATUS",
"DATE",
count(CASE
WHEN "STATUS" IN ('Cancelled',
'Failed') THEN
0
END) OVER (PARTITION BY "GROUP",
trunc("DATE")) c
FROM "TABLE") x
WHERE c = 0;
