I have a table as follows:
create table users as(sent_at date, user_email varchar(128), score int, quota int);
insert into users
select '2022-01-18','[email protected]',50,200
union
select '2022-01-15','[email protected]',34,400;
I'd like to get result set as:
select '2022-01-19','[email protected]',85,100 union
select '2022-01-18','[email protected]',50,200 union
select '2022-01-17','[email protected]',0,200 union --Missing date with 0 score but quota from prior line
select '2022-01-16','[email protected]',0,200 union --Missing date with 0 score but quota from prior line
select '2022-01-15','[email protected]',34,400;
I already have a table which has all the dates, e.g. schema.dates. There are thousands of users in the table for which I need to fill in these gaps. I tried bunch of solutions but nothing is working. How can I achieve this?
CodePudding user response:
This is an attempt. It does work if one user is added; I would need sample data for several users, with sent_at dates on different days to see if it works with many users.
WITH
-- your users table after inserting the two rows you insert,
-- and with a previous row dated '2022-01-19' - don't use in query
users(sent_at,user_email,score,quota) AS (
SELECT DATE '2022-01-19','[email protected]',85,100
UNION ALL SELECT DATE '2022-01-18','[email protected]',50,200
UNION ALL SELECT DATE '2022-01-15','[email protected]',34,400
)
,
-- a cutout of the calendar table you mentioned - don't use in query either
calendar(dt) AS (
SELECT DATE '2022-01-14'
UNION ALL SELECT DATE '2022-01-15'
UNION ALL SELECT DATE '2022-01-16'
UNION ALL SELECT DATE '2022-01-17'
UNION ALL SELECT DATE '2022-01-18'
UNION ALL SELECT DATE '2022-01-19'
UNION ALL SELECT DATE '2022-01-20'
)
-- real query starts here, replace following comma with "WITH" ...
,
-- can be done differently, but I like determining filters early ..
limits AS (
SELECT
MIN(sent_at) AS mindt
, MAX(sent_at) AS maxdt
FROM users
)
,
-- fill the time gaps with a left join from calendar to users table
gaps_filled AS (
SELECT
dt AS sent_at
, user_email
, NVL(score,0) AS score
, quota
FROM calendar
LEFT JOIN users ON calendar.dt=users.sent_at
)
-- use LAST_VALUE( .. IGNORE NULLS) to fill the NULL gaps from earlier rows
-- CROSS JOIN with the in-line limits table to get the filter dates
SELECT
sent_at
, LAST_VALUE(gaps_filled.user_email IGNORE NULLS)
OVER(ORDER BY sent_at DESC) AS user_email
, NVL(score,0) AS score
, LAST_VALUE(quota IGNORE NULLS)
OVER(ORDER BY sent_at DESC) AS quota
FROM gaps_filled
CROSS JOIN limits
WHERE sent_at BETWEEN mindt AND maxdt
ORDER BY user_email, sent_at DESC;
-- out Null display is "(null)".
-- out sent_at | user_email | score | quota
-- out ------------ ---------------- ------- -------
-- out 2022-01-19 | [email protected] | 85 | 100
-- out 2022-01-18 | [email protected] | 50 | 200
-- out 2022-01-17 | [email protected] | 0 | 200
-- out 2022-01-16 | [email protected] | 0 | 200
-- out 2022-01-15 | [email protected] | 34 | 400
CodePudding user response:
Here is a query that would work. Start by cross joining all combinations of dates and users (add filters as needed), then left join the users table and calculate quota using the last_value() function (note that if you are using Snowflake, you must specify "rows between unbounded preceding and current row" as documented here):
with all_dates_users as (
--all combinations of dates and users
select date, user
from dates
cross join (select distinct user_email as user from users)
),
joined as (
--left join users table to the previous
select DU.date, DU.user, U.sent_at, U.user_email, U.score, U.quota
from all_dates_users DU
left join users U on U.sent_at = DU.date and U.user_email = DU.user
)
--calculate quota as previous quota using last_value() function
select date, user, nvl(score, 0) as score, last_value(quota) ignore nulls over (partition by user order by date desc rows between unbounded preceding and current row) as quota
from joined
order by date desc;
