I have a table as follows:
| ID | ACTIVE_STATUS | DATE |
|---|---|---|
| 45 | TRUE | 2022-06-12 |
| 45 | TRUE | 2022-06-13 |
| 45 | FALSE | 2022-07-01 |
| 36 | TRUE | 2022-08-01 |
| 36 | FALSE | 2022-08-02 |
| 36 | FALSE | 2022-08-14 |
| 36 | TRUE | 2022-08-15 |
| 14 | TRUE | 2022-03-25 |
| 14 | TRUE | 2022-03-28 |
| 14 | TRUE | 2022-03-29 |
I would like to remove rows where, within each ID group, the current ACTIVE_STATUS value is the same as the value in the previous row. In other words, I want to keep only the first row of each ID group plus the rows where ACTIVE_STATUS changes.
For example, for ID 45 the status was TRUE on 2022-06-12 and stayed TRUE until it became FALSE on 2022-07-01, so I would delete the TRUE row dated 2022-06-13, since the status did not change between it and the previous row. The data is currently ordered by DATE within each ID group. I would like the output to look like this:
| ID | ACTIVE_STATUS | DATE |
|---|---|---|
| 45 | TRUE | 2022-06-12 |
| 45 | FALSE | 2022-07-01 |
| 36 | TRUE | 2022-08-01 |
| 36 | FALSE | 2022-08-02 |
| 36 | TRUE | 2022-08-15 |
| 14 | TRUE | 2022-03-25 |
I currently have:
SELECT ID, ACTIVE_STATUS, DATE
FROM MY_TABLE
GROUP BY ID, ACTIVE_STATUS, DATE
ORDER BY DATE;
But I am not sure how to use lag() or a partition to achieve this. Any help would be great!
CodePudding user response:
Here's how you would use lag. You can keep the lag expression in the select list to see what it's doing. Qualify is to window functions what having is to group by: it filters rows on the result of the window function.
select *
from your_table
qualify lag(active_status) over(partition by id order by date) <> active_status or
lag(active_status) over(partition by id order by date) is null
If your team is familiar with null-safe comparison operators, you could use is distinct from, which treats null as an ordinary value so the first row of each ID is still kept, and simplify that to:
select *
from your_table
qualify lag(active_status) over(partition by id order by date) is distinct from active_status
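To see what lag returns before any filtering, and to check the result against the expected output in the question, here is a minimal sketch that runs the same idea in an in-memory SQLite database from Python. Several things here are assumptions and not from the thread: the use of SQLite, storing ACTIVE_STATUS as the text values TRUE and FALSE, the prev_status alias, and the subquery with a where clause, which stands in for qualify because SQLite does not support it.

```python
import sqlite3

# Sample rows copied from the question. Storing the status as text is an
# assumption made for this sketch; the real column may be a boolean.
rows = [
    (45, "TRUE", "2022-06-12"),
    (45, "TRUE", "2022-06-13"),
    (45, "FALSE", "2022-07-01"),
    (36, "TRUE", "2022-08-01"),
    (36, "FALSE", "2022-08-02"),
    (36, "FALSE", "2022-08-14"),
    (36, "TRUE", "2022-08-15"),
    (14, "TRUE", "2022-03-25"),
    (14, "TRUE", "2022-03-28"),
    (14, "TRUE", "2022-03-29"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, active_status TEXT, date TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)

# Step 1: expose the previous status per id, ordered by date.
# prev_status (a hypothetical alias) is NULL on the first row of each id.
with_lag = """
    SELECT id, active_status, date,
           LAG(active_status) OVER (PARTITION BY id ORDER BY date) AS prev_status
    FROM my_table
"""
before_filter = conn.execute(with_lag + " ORDER BY id, date").fetchall()

# Step 2: keep the first row of each id and every row whose status changed.
# The outer WHERE plays the role of QUALIFY in engines that lack it.
kept = conn.execute(
    "SELECT id, active_status, date FROM (" + with_lag + ") "
    "WHERE prev_status IS NULL OR prev_status <> active_status "
    "ORDER BY id, date"
).fetchall()

for row in before_filter:
    print(row)
print(kept)
```

The rows in kept match the desired output from the question, apart from being sorted by ID and date for a stable order. In an engine that supports qualify, the subquery is unnecessary and the queries above can be used directly.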


