Home > Back-end >  Summary from pivot table for each ID
Summary from pivot table for each ID

Time:02-01

Im working at SNOWFLAKE database. I have that type of table (pivot product): enter image description here

I need only one row for each CUSTOMER_ID with lags, somethin like this: enter image description here

How can I do that? I try:

SELECT 
CUSTOMER_ID,
SUM(LAG_1) AS LAG_0_1,
SUM(LAG_2) AS LAG_1_2,
SUM(LAG_3) AS LAG_2_3,
SUM(LAG_4) AS LAG_3_4,
SUM(LAG_5) AS LAG_4_5
FROM TEMP_TABLE
GROUP BY 1

But output is totaly different against my expectations. Thank you.

CodePudding user response:

If lag_1 lag_2 are in effect

   ,LAG(column,1) OVER (blar blar blar) AS lag_1
   ,LAG(column,2) OVER (blar blar blar) AS lag_1

and you want to avoid the nulls,

then use IGNORE NULLS to get the most recent row that is not null:

   ,LAG(column) IGNORE NULLS OVER (blar blar blar) AS lag

OR if you are wanting them if a different ORDER to can do the n-lags but COALESCE the results in the order of preference.

    COALESCE(lag5, lag_3, lag_1, lag_4, lag_2) as awesome_lag_order

So if you have the data as you present, and you want the output, you can use FIRST_VALUE (or inverse the ORDER BY and use LAST_VALUE) and then reduce the results with DISTINCT like so (I added an ordering column to make things simpler for me):

SELECT DISTINCT customer_id
    ,first_value(lag_1) ignore nulls over (partition by customer_id order by order_id desc) AS lag_1
    ,first_value(lag_2) ignore nulls over (partition by customer_id order by order_id desc) AS lag_2
    ,first_value(lag_3) ignore nulls over (partition by customer_id order by order_id desc) AS lag_3
    ,first_value(lag_4) ignore nulls over (partition by customer_id order by order_id desc) AS lag_4
    ,first_value(lag_5) ignore nulls over (partition by customer_id order by order_id desc) AS lag_5
FROM VALUES 
    (10001, 1, null, 3, null, null, null),
    (10001, 2, null, null, 12, null, null),
    (10001, 3, null, null, null, 5, null),
    (10001, 4, null, null, null, null, 27),
    (10002, 1, null, 7, null, null, null),
    (10002, 2, null, null, 3, null, null),
    (10002, 3, null, null, null, 4, null),
(10002, 4, null, null, null, null, 12)
v(customer_id, order_id, lag_1, lag_2,  lag_3,  lag_4,  lag_5)
ORDER BY 1,2;

will give:

CUSTOMER_ID LAG_1 LAG_2 LAG_3 LAG_4 LAG_5
10,001 NULL 3 12 5 27
10,002 NULL 7 3 4 12

FIRST_VALUE allows you to control the order, because if your data had a lower priority values, that where larger:

FROM VALUES 
(10001, 0, null, 100, 100, 100, 1000),
(10001, 1, null, 3, null, null, null),
(10001, 2, null, null, 12, null, null),
(10001, 3, null, null, null, 5, null),
(10001, 4, null, null, null, null, 27),
(10002, 1, null, 7, null, null, null),
(10002, 2, null, null, 3, null, null),
(10002, 3, null, null, null, 4, null),
(10002, 4, null, null, null, null, 12)

then using MAX will get that value:

SELECT customer_id
    ,max(lag_1) AS lag_1
    ,max(lag_2) AS lag_2
    ,max(lag_3) AS lag_3
    ,max(lag_4) AS lag_4
    ,max(lag_5) AS lag_5
FROM VALUES 
(10001, 0, null, 100, 100, 100, 1000),
(10001, 1, null, 3, null, null, null),
(10001, 2, null, null, 12, null, null),
(10001, 3, null, null, null, 5, null),
(10001, 4, null, null, null, null, 27),
(10002, 1, null, 7, null, null, null),
(10002, 2, null, null, 3, null, null),
(10002, 3, null, null, null, 4, null),
(10002, 4, null, null, null, null, 12)
v(customer_id, order_id, lag_1, lag_2,  lag_3,  lag_4,  lag_5)
GROUP BY 1
ORDER BY 1,2;

gives:

CUSTOMER_ID LAG_1 LAG_2 LAG_3 LAG_4 LAG_5
10,001 100 100 100 1,000
10,002 7 3 4 12

Which maybe you want that.. it's not clear.

Another option is if you already want to use a GROUP BY clause, and void distinct, is to use ARRAY_AGG and return only the first value of the array:

SELECT customer_id
    ,get(array_agg(lag_1) WITHIN GROUP (order by order_id desc),0) AS lag_1
    ,get(array_agg(lag_2) WITHIN GROUP (order by order_id desc),0) AS lag_2
    ,get(array_agg(lag_3) WITHIN GROUP (order by order_id desc),0) AS lag_3
    ,get(array_agg(lag_4) WITHIN GROUP (order by order_id desc),0) AS lag_4
    ,get(array_agg(lag_5) WITHIN GROUP (order by order_id desc),0) AS lag_5
FROM VALUES 
(10001, 0, null, 100, 100, 100, 1000),
(10001, 1, null, 3, null, null, null),
(10001, 2, null, null, 12, null, null),
(10001, 3, null, null, null, 5, null),
(10001, 4, null, null, null, null, 27),
(10002, 1, null, 7, null, null, null),
(10002, 2, null, null, 3, null, null),
(10002, 3, null, null, null, 4, null),
(10002, 4, null, null, null, null, 12)
v(customer_id, order_id, lag_1, lag_2,  lag_3,  lag_4,  lag_5)
GROUP BY 1
ORDER BY 1;

gives:

CUSTOMER_ID LAG_1 LAG_2 LAG_3 LAG_4 LAG_5
10,001 3 12 5 27
10,002 7 3 4 12
  •  Tags:  
  • Related