| event_id | alert_id | alert_timestamp | value |
|---|---|---|---|
| 1 | x | 2022-10-10 | -2 |
| 1 | x | 2022-10-10 | 4 |
| 1 | x | 2022-10-10 | 5 |
| 2 | z | 2022-09-02 | 3 |
I have a table that has rows that match on event_id, alert_id and alert_timestamp. I want to only keep the single row that has the smallest value in the value column where all other rows are matching on event_id, alert_id and alert_timestamp.
NOTE: this is just one example of matching rows, the table has a mix of different rows than the example given, that match on event_id, alert_id and alert_timestamp
Also if there are single rows i.e. a row that has no other row that matches on event_id, alert_id and alert_timestamp, this should be kept as is.
CodePudding user response:
That's a typical top-1-per-group problem. In Redshift you can use row_number() to solve it:
select event_id, alert_id, alert_timestamp, value
from (
select t.*,
row_number() over(partition by event_id, alert_id, alert_timestamp order by value) rn
from mytable t
) t
where rn = 1
Note that if you really had only 4 columns, then aggregation would be sufficient:
select event_id, alert_id, alert_timestamp, min(value) as value
from mytable
group by event_id, alert_id, alert_timestamp, value
