I have a table of messages. Each message has a timestamp created_ts, a type (either purchase or support), user_id, and a primary key msg_id. E.g.,
| msg_id | user_id | created_ts | type |
|---|---|---|---|
| 1 | 1 | 1 | purchase |
| 2 | 1 | 2 | support |
| 3 | 2 | 3 | purchase |
| 4 | 2 | 4 | support |
| 5 | 2 | 5 | support |
I'd like to generate a table of support messages, such that each row will contain the details of the support message, and also the details of the last purchase (if such exists), i.e. the latest message of type purchase, that happened before this row's support message. E.g.,
| support_msg_id | user_id | support_created_ts | type | last_purchase_msg_id | last_purchase_created_ts |
|---|---|---|---|---|---|
| 2 | 1 | 2 | support | 1 | 1 |
| 4 | 2 | 4 | support | 3 | 3 |
| 5 | 2 | 5 | support | 3 | 3 |
I've tried the following query:
select * from
(select msg_id as support_msg_id, user_id, created_ts as support_created_ts, type
(select msg_id as last_purchase_msg_id
from messages
where type = 'purchase' and created_ts < support_created_ts
order by created_ts desc
limit 1)
from messages
where type = 'support'
) as supports
inner join
(
select msg_id as last_purchase_msg_id, created_ts as last_purchase_ts
from messages
where type = 'purchase'
) as purchases
on supports.last_purchase_msg_id = purchases.last_purchase_msg_id
However this is too slow.
CodePudding user response:
with p as (
select *, lead(created_ts) over (partition by user_id order by created_ts) as nextts
from messages where type = 'purchase'
)
select *
from messages s
left outer join p on p.user_id = s.user_id
and s.created_ts >= p.created_ts and (s.created_ts < p.nextts or p.nextts is null)
where s.type = 'support'
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=aea646e85dfa35896f8cacfabec439d6
CodePudding user response:
The solution for your problem is double usage of window functions. First use count function to identify bounds of groups where each group starts with purchase and continues with all subsequent supports. Then use max function to get (actually single nonnull) value among each group.
Dbfiddle is for Postgres but uses only standard SQL, I hope it will be usable for Redshift too.
with messages(msg_id, user_id, created_ts, type) as (values
(1, 1, 1, 'purchase'),
(2, 1, 2, 'support'),
(3, 2, 3, 'purchase'),
(4, 2, 4, 'support'),
(5, 2, 5, 'support')
)
, precomputed as (
select m.*
, case m.type when 'purchase' then m.msg_id end as last_purchase_msg_id
, case m.type when 'purchase' then m.created_ts end as last_purchase_created_ts
, count(case m.type when 'purchase' then 1 end) over (order by m.created_ts) as grp
from messages m
)
, lasts as (
select g.msg_id, g.user_id, g.created_ts, g.type
, max(g.last_purchase_msg_id) over (partition by g.grp) as last_purchase_msg_id
, max(g.last_purchase_created_ts) over (partition by g.grp) as last_purchase_created_ts
from precomputed g
)
select * from lasts
where type = 'support'
