Data source
| User ID | Visit Date |
|---|---|
| 1 | 2020-01-01 12:29:15 |
| 1 | 2020-01-02 12:30:11 |
| 1 | 2020-04-01 12:31:01 |
| 2 | 2020-05-01 12:31:14 |
Problem
I got user's visit data and im trying to get their last visit for each month, have been trying to join the data source with generate_series('2021-01-01'::timestamp, '2021-12-01'::timestamp, interval '1 month') but what i got is only 1 user for each month, do you guys have an idea how to achieve this
Expected Result
| Month | User ID | Visit Date |
|---|---|---|
| 1 | 1 | 2020-01-01 12:29:15 |
| 2 | 1 | null |
| 3 | 1 | null |
| 4 | 1 | 2020-04-01 12:31:01 |
| .... | ||
| 12 | 1 | null |
| 1 | 2 | null |
| ... | ||
| 5 | 2 | 2020-05-01 12:31:14 |
| ... and so on |
and i need advice im trying to do sub query for this result to mark user as retention if he havent visit back like below result, if u guys have better query to do this it will be appreciate
| Month | User ID | Type |
|---|---|---|
| 1 | 1 | FIRST |
| 2 | 1 | RETENTION |
| 3 | 1 | RETENTION |
| 4 | 1 | REACTIVATE |
| .... | ||
| 12 | 1 | null |
| 1 | 2 | null |
| ... | ||
| 5 | 2 | FIRST |
| 6 | 2 | RETENTION |
| 7 | 2 | RETENTION |
| 8 | 2 | RETENTION |
| 9 | 2 | null |
| ... and so on |
CodePudding user response:
You can generate the rows using a cross join. Then you can get the last visit in various ways:
select u.user_id, gs.yyyymm, s.last_visit_date
from (select distinct user_id from source s) u cross join
generate_series('2021-01-01'::timestamp, '2021-12-01'::timestamp, interval '1 month'
) gs(yyyymm) left join lateral
(select max(s.visit_date) as last_visit_date
from source s
where s.user_id = u.user_id and
s.visit_date >= gs.yyyymm and
s.visit_date < gs.yyyymm interval '1 month'
) s
on 1=1;
