Home > Back-end >  How to fill in missing dates
How to fill in missing dates

Time:01-23

I have a table as follows:

create table users as(sent_at date, user_email varchar(128), score int, quota int);

insert into users 
select '2022-01-18','[email protected]',50,200 
union
select '2022-01-15','[email protected]',34,400;

I'd like to get result set as:

select '2022-01-19','[email protected]',85,100 union
select '2022-01-18','[email protected]',50,200 union
select '2022-01-17','[email protected]',0,200 union        --Missing date with 0 score but quota from prior line
select '2022-01-16','[email protected]',0,200 union        --Missing date with 0 score but quota from prior line
select '2022-01-15','[email protected]',34,400;

I already have a table which has all the dates, e.g. schema.dates. There are thousands of users in the table for which I need to fill in these gaps. I tried bunch of solutions but nothing is working. How can I achieve this?

CodePudding user response:

This is an attempt. It does work if one user is added; I would need sample data for several users, with sent_at dates on different days to see if it works with many users.

WITH
-- your users table after inserting the two rows you insert, 
-- and with a previous row dated '2022-01-19' - don't use in query
users(sent_at,user_email,score,quota) AS (
            SELECT DATE '2022-01-19','[email protected]',85,100
  UNION ALL SELECT DATE '2022-01-18','[email protected]',50,200
  UNION ALL SELECT DATE '2022-01-15','[email protected]',34,400
)
,
-- a cutout of the calendar table you mentioned - don't use in query either
calendar(dt) AS (
            SELECT DATE '2022-01-14'
  UNION ALL SELECT DATE '2022-01-15'
  UNION ALL SELECT DATE '2022-01-16'
  UNION ALL SELECT DATE '2022-01-17'
  UNION ALL SELECT DATE '2022-01-18'
  UNION ALL SELECT DATE '2022-01-19'
  UNION ALL SELECT DATE '2022-01-20'
)
-- real query starts here, replace following comma with "WITH" ...
,
-- can be done differently, but I like determining filters early ..
limits AS (
  SELECT
    MIN(sent_at) AS mindt
  , MAX(sent_at) AS maxdt
  FROM users
)
,
-- fill the time gaps with a left join from calendar to users table
gaps_filled AS (
  SELECT
    dt AS sent_at
  , user_email
  , NVL(score,0) AS score
  , quota
  FROM calendar
  LEFT JOIN users ON calendar.dt=users.sent_at
)
-- use LAST_VALUE( .. IGNORE NULLS) to fill the NULL gaps from earlier rows
-- CROSS JOIN with the in-line limits table to get the filter dates
SELECT
  sent_at
, LAST_VALUE(gaps_filled.user_email IGNORE NULLS) 
     OVER(ORDER BY sent_at DESC) AS user_email 
, NVL(score,0)  AS score
, LAST_VALUE(quota IGNORE NULLS) 
     OVER(ORDER BY sent_at DESC) AS quota 
FROM gaps_filled
CROSS JOIN limits
WHERE sent_at BETWEEN mindt AND maxdt
ORDER BY user_email, sent_at DESC;
-- out Null display is "(null)".
-- out   sent_at   |   user_email   | score | quota 
-- out ------------ ---------------- ------- -------
-- out  2022-01-19 | [email protected] |    85 |   100
-- out  2022-01-18 | [email protected] |    50 |   200
-- out  2022-01-17 | [email protected] |     0 |   200
-- out  2022-01-16 | [email protected] |     0 |   200
-- out  2022-01-15 | [email protected] |    34 |   400

CodePudding user response:

Here is a query that would work. Start by cross joining all combinations of dates and users (add filters as needed), then left join the users table and calculate quota using the last_value() function (note that if you are using Snowflake, you must specify "rows between unbounded preceding and current row" as documented here):

with all_dates_users as (
--all combinations of dates and users
select date, user
from dates
cross join (select distinct user_email as user from users)
),
joined as (
--left join users table to the previous
select DU.date, DU.user, U.sent_at, U.user_email, U.score, U.quota
from all_dates_users DU
left join users U on U.sent_at = DU.date and U.user_email = DU.user
)
--calculate quota as previous quota using last_value() function
select date, user, nvl(score, 0) as score, last_value(quota) ignore nulls over (partition by user order by date desc rows between unbounded preceding and current row) as quota
from joined
order by date desc;
  •  Tags:  
  • Related