We have 2 tables called tbl1 and tbl2. It contains columns such as Visit_ID, Customer ID, and so on. There are instances where a Visit_ID will be associated with multiple Customer IDs.
For example, if customer logs into a website, a unique Visit_ID will be generated for each time he visits the website.
In one visit, multiple customers can login to their accounts and make individual purchases.
There are instances where a visit will be associated to multiple customer IDs. If there are more than 2 instances, append any other retail customer ID instances in this column.
For instances there are visit, which had 200 Customer IDs attached to that visit.
For example, if there are 7 Customer IDs in 1 visit, for Customer 1, it should have the first customer 1. For Customer 2, we will need to display the 2nd customer ID.
For 3rd to 7, all those 5 will be comma separated.
Can someone help how to frame a SQL query using this logic?
SELECT
visit_id,
CUSTOMER_ID,
B.visitpg_nbr::INT AS visitpg_nbr,
row_number()over(order by B.visitpg_nbr) as rank
from
dbname.schema_name.tbl1 A
JOIN dbname.schema_name.tbl2 B
ON B.id_column = A.id_column
WHERE flg_col = '0'
AND CREATE_DT = '2022-01-02'
and visit_id='1'
ORDER BY visitpg_nbr)
SELECT distinct
visit_id,rank,
case when rank=1 then max(CUSTOMER_ID) end as Customer_1,
case when rank=2 then max(CUSTOMER_ID) end as Customer_2 ,
case when rank>2 then CUSTOMER_ID end as Remaining_Customers,
LISTAGG(distinct case when rank>2 then CUSTOMER_ID end, ',')
WITHIN GROUP(ORDER BY case when rank>2 then CUSTOMER_ID end)AS Remaining_Cust
from CTE
GROUP BY 1,2
The query which I tried is incorrect. Kindly help me in forming a correct query using the logic mentioned.
VISIT_ID First_Customer Second_Customer Other_Customers
1 100 101 102,103,104,105,106
2 200 201 202,203,204,205
First Customer_ID should get displayed in the First_Customer column, Second_Customer_Id should get displayed in Second_Customer column.. All the other customer_ids should be displayed in the final column and it should be comma separated.
CodePudding user response:
You should be able to get this with array_agg(), and then choosing the first, second, and subsequent (array_slice()) elements:
with data as (
select *
from snowflake_sample_data.tpch_sf100.orders
where o_custkey between 5411266 and 5411290
)
select o_custkey, arr[0], arr[1], array_to_string(array_slice(arr, 2, 99999), ', ')
from (
select o_custkey, array_agg(o_orderkey) within group(order by o_orderdate) arr
from data
group by 1
);
You might need to get unique ids in case there are many, you can solve that with a subquery before array_agg().
CodePudding user response:
slightly different to Felipe's answer, not sure which would be more performant. I suspect his, but anyways here is another way to try it.
SELECT visit_id, first_customer, second_customer
,array_agg(other_ids) within group (order by order_id) as other_customer
FROM(
SELECT visit_id,
order_id,
first_value(customer_id) over (partition by visit_id order by order_id) as first_customer,
first_value(customer_id) over (partition by visit_id order by order_id) as second_customer,
IFF(row_number() over (partition by visit_id order by order_id) > 2, customer_id, null) as other_ids
FROM VALUES
(1,100, 1),
(1,101, 2),
(1,102, 3),
(1,103, 5),
(1,104, 6),
(1,105, 6),
(1,106, 7),
(2,200, 1),
(2,201, 2),
(2,202, 3),
(2,203, 4)
v(visit_id, customer_id, order_id)
)
GROUP BY 1,2,3
ORDER BY 1,2,3;
| VISIT_ID | FIRST_CUSTOMER | SECOND_CUSTOMER | OTHER_CUSTOMER |
|---|---|---|---|
| 1 | 100 | 100 | [ 102, 103, 104, 105, 106 ] |
| 2 | 200 | 200 | [ 202, 203 ] |

