Return the latest record which was before the current row-CodePudding

I have a table of messages. Each message has a timestamp created_ts, a type (either purchase or support), user_id, and a primary key msg_id. E.g.,

msg_id	user_id	created_ts	type
1	1	1	purchase
2	1	2	support
3	2	3	purchase
4	2	4	support
5	2	5	support

I'd like to generate a table of support messages, such that each row will contain the details of the support message, and also the details of the last purchase (if such exists), i.e. the latest message of type purchase, that happened before this row's support message. E.g.,

support_msg_id	user_id	support_created_ts	type	last_purchase_msg_id	last_purchase_created_ts
2	1	2	support	1	1
4	2	4	support	3	3
5	2	5	support	3	3

I've tried the following query:

select * from
(select msg_id as support_msg_id, user_id, created_ts as support_created_ts, type
  (select msg_id as last_purchase_msg_id
   from messages
   where type = 'purchase' and created_ts < support_created_ts
 order by created_ts desc
limit 1)
  
 from messages
 where type = 'support'
) as supports
inner join
(
 select msg_id as last_purchase_msg_id, created_ts as last_purchase_ts
   from messages
   where type = 'purchase'
) as purchases
on supports.last_purchase_msg_id = purchases.last_purchase_msg_id

However this is too slow.

CodePudding user response：

with p as (
    select *, lead(created_ts) over (partition by user_id order by created_ts) as nextts
    from messages where type = 'purchase'
)
select *
from messages s
left outer join p on p.user_id = s.user_id
    and s.created_ts >= p.created_ts and (s.created_ts < p.nextts or p.nextts is null)
where s.type = 'support'

https://dbfiddle.uk/?rdbms=postgres_10&fiddle=aea646e85dfa35896f8cacfabec439d6

CodePudding user response：

The solution for your problem is double usage of window functions. First use count function to identify bounds of groups where each group starts with purchase and continues with all subsequent supports. Then use max function to get (actually single nonnull) value among each group.

Dbfiddle is for Postgres but uses only standard SQL, I hope it will be usable for Redshift too.

with messages(msg_id,   user_id,    created_ts,     type) as (values
(1,     1,  1,  'purchase'),
(2,     1,  2,  'support'),
(3,     2,  3,  'purchase'),
(4,     2,  4,  'support'),
(5,     2,  5,  'support')
)
, precomputed as (
  select m.*
       , case m.type when 'purchase' then m.msg_id end as last_purchase_msg_id
       , case m.type when 'purchase' then m.created_ts end as last_purchase_created_ts
       , count(case m.type when 'purchase' then 1 end) over (order by m.created_ts) as grp
  from messages m
)
, lasts as (
  select g.msg_id, g.user_id, g.created_ts, g.type
       , max(g.last_purchase_msg_id) over (partition by g.grp) as last_purchase_msg_id
       , max(g.last_purchase_created_ts) over (partition by g.grp) as last_purchase_created_ts
  from precomputed g
)
select * from lasts
where type = 'support'