I have two date columns in BQ table. pageview_date and edited_date, as well as id column. I need to output the data - row by row - and for each record I want to grab a value from edited_date column that is the latest date in that column BUT is not later than the pageview_date value itself. And if both dates are equal then leave it as is. It also has to correspond with the ids. Data looks like this:
id pageview_date edited_date
A 03/01/22 02/28/22
A 03/01/22 02/02/22
A 03/01/22 02/02/22
B 03/01/22 01/01/22
B 03/01/22 01/01/22
B 03/01/22 01/31/22
C 03/01/22 04/01/22
C 03/01/22 03/25/22
C 03/01/22 03/01/22
Desired output is:
id pageview_date edited_date
A 03/01/22 02/28/22
A 03/01/22 02/28/22
A 03/01/22 02/28/22
B 03/01/22 01/31/22
B 03/01/22 01/31/22
B 03/01/22 01/31/22
C 03/01/22 03/01/22
C 03/01/22 03/01/22
C 03/01/22 03/01/22
CodePudding user response:
An approach is use the MAX window function in the edited_date column partitioned by the id:
with sample as (
select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-02-28') as edited_date
UNION ALL
select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-03-28') as edited_date
UNION ALL
select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-01-28') as edited_date
)
SELECT
id,
pageview_date,
MAX(IF(edited_date <= pageview_date, edited_date, null)) OVER (PARTITION BY id) as new_edited_date
FROM sample
Note if there's no edited_date before an pageview_date, the new_edited_date will be null.
