I'm sure this has been asked over and over, but I can't quite find a simple example that I can fully grok.
I'm trying to deduplicate (do a DISTINCT ON) by one column and COUNT records GROUPed By columns that differ from the column used to deduplicate, but without introducing subqueries.
Let's say I have a table with the following information:
| order_num | date | region | timestamp_updated |
|---|---|---|---|
| 001 | 2021-09-01 | Murica | 2021-09-02T19:00:01Z |
| 001 | 2021-09-01 | Murica | 2021-09-03T19:00:01Z |
| 002 | 2021-09-01 | Yurop | 2021-09-02T19:00:01Z |
| 003 | 2021-09-01 | Yurop | 2021-09-03T19:00:01Z |
| 004 | 2021-09-02 | Yurop | 2021-09-03T19:00:01Z |
I would like to first get unique records with different order_num (keeping the most recently updated) ones AND then count groups or orders by date and region.
De-duplicate (gets rid of the oldest
order_num='001A'):order_num date region timestamp_updated 001 2021-09-01 Murica 2021-09-03T19:00:01Z 002 2021-09-01 Yurop 2021-09-02T19:00:01Z 003 2021-09-01 Yurop 2021-09-03T19:00:01Z 004 2021-09-02 Yurop 2021-09-03T19:00:01Z Then group by and count:
date region count 2021-09-01 Murica 1 2021-09-01 Yurop 2 2021-09-02 Yurop 1
I know how to do those two things separately (distinct on(order_num) order by timestamp_updated desc) to deduplicate and then select count(*) group by date, region ) And even together with subqueries. But I'd like to try to avoid subqueries as much as possible and here's where window functions (seem) to come in handy and I don't know much anything about those.
The closest thing I've been able to get are groups, but they show one record by each order_num. The records are correct, but they are duplicated:
select distinct on (order_num) date, region, count(1)over (
partition by order_num
)
from orders_table
order by order_num, timestamp_updated desc;
That query ^^ shows:
| date | region | count | |
|---|---|---|---|
| 2021-09-01 | Murica | 1 | I think this is the first 001 |
| 2021-09-01 | Murica | 1 | I think this is the second 001 |
| 2021-09-01 | Yurop | 2 | I think this is the first Yurop: 002 |
| 2021-09-01 | Yurop | 2 | I think this is the second Yurop: 003 |
| 2021-09-02 | Yurop | 1 |
CodePudding user response:
You could get the max timestamp_updatedper per order_num, date, region and then aggregate again to get the counts per date, region using window function
select distinct
date,
region,
count(max(timestamp_updated)) over (partition by date, region) as counts
from t
group by order_num, date, region;
