I have a list of countries, and for each country I have a table like this (first column: user, second column: user_type, third column: date):
| user | user_type | date |
|---|---|---|
| user1 | National | 2022-10-01 |
| user1 | National | 2022-10-01 |
| user2 | National | 2022-10-01 |
| user2 | International | 2022-10-01 |
| user3 | National | 2022-10-02 |
| user1 | Unknown | 2022-10-02 |
| user1 | National | 2022-10-03 |
I would like to get the number of distinct users of each type, with one row per day and one column per user type, like this:
| date | National | International | Unknown |
|---|---|---|---|
| 2022-10-01 | 2 | 1 | 0 |
| 2022-10-02 | 1 | 0 | 1 |
| 2022-10-03 | 1 | 0 | 0 |
If I run this query:
SELECT
date, user_type, COUNT(distinct user) as num_users
FROM "country"."table"
WHERE
date between '2022-10-01' AND '2022-10-03'
GROUP BY date, user_type
ORDER BY date, user_type
I get the correct counts, but each date is spread over several rows (one per user type), like this:
| date | user_type | num_users |
|---|---|---|
| 2022-10-01 | National | 2 |
| 2022-10-01 | International | 1 |
| 2022-10-01 | Unknown | 0 |
| 2022-10-02 | National | 1 |
| 2022-10-02 | International | 0 |
| 2022-10-02 | Unknown | 1 |
| 2022-10-03 | National | 1 |
| 2022-10-03 | International | 0 |
| 2022-10-03 | Unknown | 0 |
There is an additional difficulty: globally there are 3 user types ("National", "International" and "Unknown"), but some countries only have "National", or only "National" and "Unknown" (no "International"). I would like the result to show 0 for a user type even if that type does not exist in the country.
The query should be the same for all countries, with only the database name "country" changing.
(P.S.: the query has to run in AWS Athena.)
Any ideas?
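One detail about the GROUP BY query above: it only produces (date, user_type) groups that actually occur in the data, so rows such as `2022-10-02 | International | 0` would not appear in its real output. Below is a sketch of one way to get those zero rows while keeping the long shape. It assumes the three user types are hardcoded, and it uses the sample rows as an inline stand-in for "country"."table"; the CTE names (`events`, `user_types`, `days`, `counts`) are illustrative only.

```sql
-- Sketch: long (date, user_type, num_users) output that includes zero rows.
-- Assumption: the inline VALUES list stands in for "country"."table".
WITH events ("user", user_type, "date") AS (
    VALUES
        ('user1', 'National',      '2022-10-01'),
        ('user1', 'National',      '2022-10-01'),
        ('user2', 'National',      '2022-10-01'),
        ('user2', 'International', '2022-10-01'),
        ('user3', 'National',      '2022-10-02'),
        ('user1', 'Unknown',       '2022-10-02'),
        ('user1', 'National',      '2022-10-03')
),
-- Hardcoded list of every user type, so types absent from a country still appear
user_types (user_type) AS (
    VALUES ('National'), ('International'), ('Unknown')
),
-- Every date that has at least one row
days AS (
    SELECT DISTINCT "date" FROM events
),
-- Distinct users per (date, type) for the groups that do exist
counts AS (
    SELECT "date", user_type, COUNT(DISTINCT "user") AS num_users
    FROM events
    GROUP BY "date", user_type
)
SELECT d."date",
       t.user_type,
       -- An unmatched LEFT JOIN yields NULL; COALESCE turns it into 0
       COALESCE(c.num_users, 0) AS num_users
FROM days d
CROSS JOIN user_types t
LEFT JOIN counts c
       ON c."date" = d."date"
      AND c.user_type = t.user_type
ORDER BY d."date", t.user_type
```

Because the list of days is taken from the data, a day with no rows at all would still be missing; covering that would need a separate calendar of dates.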
Answer:
Conditional aggregation should do it here:
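-- IF(...) returns NULL for rows of other types and COUNT ignores NULLs, so a type with no rows gives 0
SELECT date,
COUNT(DISTINCT IF(user_type = 'National', user, NULL)) AS num_national,
COUNT(DISTINCT IF(user_type = 'International', user, NULL)) AS num_international,
COUNT(DISTINCT IF(user_type = 'Unknown', user, NULL)) AS num_unknown
FROM "country"."table"
WHERE date BETWEEN '2022-10-01' AND '2022-10-03'
GROUP BY date
ORDER BY date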
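To check conditional aggregation against the sample rows from the question, here is a self-contained sketch. The inline VALUES list is an assumption standing in for "country"."table", and it uses `CASE WHEN ... END`, which yields NULL when no branch matches, just like `IF(..., NULL)`. Since the three type names are literals in the query, the output always has all three columns: a country whose table has no 'International' rows simply gets 0 in `num_international`.

```sql
-- Sketch: one row per day, one column per user type, on the question's sample data.
-- Assumption: the inline VALUES list stands in for "country"."table".
WITH events ("user", user_type, "date") AS (
    VALUES
        ('user1', 'National',      '2022-10-01'),
        ('user1', 'National',      '2022-10-01'),
        ('user2', 'National',      '2022-10-01'),
        ('user2', 'International', '2022-10-01'),
        ('user3', 'National',      '2022-10-02'),
        ('user1', 'Unknown',       '2022-10-02'),
        ('user1', 'National',      '2022-10-03')
)
SELECT "date",
       -- CASE without ELSE gives NULL for other types; COUNT(DISTINCT ...) skips NULLs
       COUNT(DISTINCT CASE WHEN user_type = 'National'      THEN "user" END) AS num_national,
       COUNT(DISTINCT CASE WHEN user_type = 'International' THEN "user" END) AS num_international,
       COUNT(DISTINCT CASE WHEN user_type = 'Unknown'       THEN "user" END) AS num_unknown
FROM events
GROUP BY "date"
ORDER BY "date"
-- Expected result:
-- 2022-10-01 | 2 | 1 | 0
-- 2022-10-02 | 1 | 0 | 1
-- 2022-10-03 | 1 | 0 | 0
```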
Answer:
-- COUNT(DISTINCT ...) so that a user with several rows of the same type on one day is counted once
SELECT date,
COUNT(DISTINCT CASE WHEN user_type = 'National' THEN user END) AS National,
COUNT(DISTINCT CASE WHEN user_type = 'International' THEN user END) AS International,
COUNT(DISTINCT CASE WHEN user_type = 'Unknown' THEN user END) AS Unknown
FROM "country"."table"
WHERE date BETWEEN '2022-10-01' AND '2022-10-03'
GROUP BY date
ORDER BY date
I hope this helps!
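Summing a 1/0 flag counts rows, not distinct users, and the sample data contains a duplicate: user1 has two National rows on 2022-10-01. The sketch below compares the two forms side by side on the sample rows; the inline VALUES list is an assumed stand-in for the real table and the column aliases are illustrative.

```sql
-- Sketch: SUM of a flag counts rows; COUNT(DISTINCT ...) counts users.
-- Assumption: the inline VALUES list stands in for "country"."table".
WITH events ("user", user_type, "date") AS (
    VALUES
        ('user1', 'National',      '2022-10-01'),
        ('user1', 'National',      '2022-10-01'),
        ('user2', 'National',      '2022-10-01'),
        ('user2', 'International', '2022-10-01'),
        ('user3', 'National',      '2022-10-02'),
        ('user1', 'Unknown',       '2022-10-02'),
        ('user1', 'National',      '2022-10-03')
)
SELECT "date",
       -- one per matching row, so the duplicated user1 row is counted twice
       SUM(CASE WHEN user_type = 'National' THEN 1 ELSE 0 END)          AS national_rows,
       -- one per distinct matching user
       COUNT(DISTINCT CASE WHEN user_type = 'National' THEN "user" END) AS national_users
FROM events
GROUP BY "date"
ORDER BY "date"
-- Expected result:
-- 2022-10-01 | 3 | 2
-- 2022-10-02 | 1 | 1
-- 2022-10-03 | 1 | 1
```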
