Im working at SNOWFLAKE database. I have that type of table (pivot product):

I need only one row for each CUSTOMER_ID with lags, somethin like this:

How can I do that? I try:
SELECT
CUSTOMER_ID,
SUM(LAG_1) AS LAG_0_1,
SUM(LAG_2) AS LAG_1_2,
SUM(LAG_3) AS LAG_2_3,
SUM(LAG_4) AS LAG_3_4,
SUM(LAG_5) AS LAG_4_5
FROM TEMP_TABLE
GROUP BY 1
But output is totaly different against my expectations. Thank you.
CodePudding user response:
If lag_1 lag_2 are in effect
,LAG(column,1) OVER (blar blar blar) AS lag_1
,LAG(column,2) OVER (blar blar blar) AS lag_1
and you want to avoid the nulls,
then use IGNORE NULLS to get the most recent row that is not null:
,LAG(column) IGNORE NULLS OVER (blar blar blar) AS lag
OR if you are wanting them if a different ORDER to can do the n-lags but COALESCE the results in the order of preference.
COALESCE(lag5, lag_3, lag_1, lag_4, lag_2) as awesome_lag_order
So if you have the data as you present, and you want the output, you can use FIRST_VALUE (or inverse the ORDER BY and use LAST_VALUE) and then reduce the results with DISTINCT like so (I added an ordering column to make things simpler for me):
SELECT DISTINCT customer_id
,first_value(lag_1) ignore nulls over (partition by customer_id order by order_id desc) AS lag_1
,first_value(lag_2) ignore nulls over (partition by customer_id order by order_id desc) AS lag_2
,first_value(lag_3) ignore nulls over (partition by customer_id order by order_id desc) AS lag_3
,first_value(lag_4) ignore nulls over (partition by customer_id order by order_id desc) AS lag_4
,first_value(lag_5) ignore nulls over (partition by customer_id order by order_id desc) AS lag_5
FROM VALUES
(10001, 1, null, 3, null, null, null),
(10001, 2, null, null, 12, null, null),
(10001, 3, null, null, null, 5, null),
(10001, 4, null, null, null, null, 27),
(10002, 1, null, 7, null, null, null),
(10002, 2, null, null, 3, null, null),
(10002, 3, null, null, null, 4, null),
(10002, 4, null, null, null, null, 12)
v(customer_id, order_id, lag_1, lag_2, lag_3, lag_4, lag_5)
ORDER BY 1,2;
will give:
| CUSTOMER_ID | LAG_1 | LAG_2 | LAG_3 | LAG_4 | LAG_5 |
|---|---|---|---|---|---|
| 10,001 | NULL | 3 | 12 | 5 | 27 |
| 10,002 | NULL | 7 | 3 | 4 | 12 |
FIRST_VALUE allows you to control the order, because if your data had a lower priority values, that where larger:
FROM VALUES
(10001, 0, null, 100, 100, 100, 1000),
(10001, 1, null, 3, null, null, null),
(10001, 2, null, null, 12, null, null),
(10001, 3, null, null, null, 5, null),
(10001, 4, null, null, null, null, 27),
(10002, 1, null, 7, null, null, null),
(10002, 2, null, null, 3, null, null),
(10002, 3, null, null, null, 4, null),
(10002, 4, null, null, null, null, 12)
then using MAX will get that value:
SELECT customer_id
,max(lag_1) AS lag_1
,max(lag_2) AS lag_2
,max(lag_3) AS lag_3
,max(lag_4) AS lag_4
,max(lag_5) AS lag_5
FROM VALUES
(10001, 0, null, 100, 100, 100, 1000),
(10001, 1, null, 3, null, null, null),
(10001, 2, null, null, 12, null, null),
(10001, 3, null, null, null, 5, null),
(10001, 4, null, null, null, null, 27),
(10002, 1, null, 7, null, null, null),
(10002, 2, null, null, 3, null, null),
(10002, 3, null, null, null, 4, null),
(10002, 4, null, null, null, null, 12)
v(customer_id, order_id, lag_1, lag_2, lag_3, lag_4, lag_5)
GROUP BY 1
ORDER BY 1,2;
gives:
| CUSTOMER_ID | LAG_1 | LAG_2 | LAG_3 | LAG_4 | LAG_5 |
|---|---|---|---|---|---|
| 10,001 | 100 | 100 | 100 | 1,000 | |
| 10,002 | 7 | 3 | 4 | 12 |
Which maybe you want that.. it's not clear.
Another option is if you already want to use a GROUP BY clause, and void distinct, is to use ARRAY_AGG and return only the first value of the array:
SELECT customer_id
,get(array_agg(lag_1) WITHIN GROUP (order by order_id desc),0) AS lag_1
,get(array_agg(lag_2) WITHIN GROUP (order by order_id desc),0) AS lag_2
,get(array_agg(lag_3) WITHIN GROUP (order by order_id desc),0) AS lag_3
,get(array_agg(lag_4) WITHIN GROUP (order by order_id desc),0) AS lag_4
,get(array_agg(lag_5) WITHIN GROUP (order by order_id desc),0) AS lag_5
FROM VALUES
(10001, 0, null, 100, 100, 100, 1000),
(10001, 1, null, 3, null, null, null),
(10001, 2, null, null, 12, null, null),
(10001, 3, null, null, null, 5, null),
(10001, 4, null, null, null, null, 27),
(10002, 1, null, 7, null, null, null),
(10002, 2, null, null, 3, null, null),
(10002, 3, null, null, null, 4, null),
(10002, 4, null, null, null, null, 12)
v(customer_id, order_id, lag_1, lag_2, lag_3, lag_4, lag_5)
GROUP BY 1
ORDER BY 1;
gives:
| CUSTOMER_ID | LAG_1 | LAG_2 | LAG_3 | LAG_4 | LAG_5 |
|---|---|---|---|---|---|
| 10,001 | 3 | 12 | 5 | 27 | |
| 10,002 | 7 | 3 | 4 | 12 |
