Need help in forming a SQL query-CodePudding

We have 2 tables called tbl1 and tbl2. It contains columns such as Visit_ID, Customer ID, and so on. There are instances where a Visit_ID will be associated with multiple Customer IDs.

For example, if customer logs into a website, a unique Visit_ID will be generated for each time he visits the website.

In one visit, multiple customers can login to their accounts and make individual purchases.

There are instances where a visit will be associated to multiple customer IDs. If there are more than 2 instances, append any other retail customer ID instances in this column.

For instances there are visit, which had 200 Customer IDs attached to that visit.

For example, if there are 7 Customer IDs in 1 visit, for Customer 1, it should have the first customer 1. For Customer 2, we will need to display the 2nd customer ID.

For 3rd to 7, all those 5 will be comma separated.

Can someone help how to frame a SQL query using this logic?

SELECT
visit_id,
CUSTOMER_ID,
B.visitpg_nbr::INT AS visitpg_nbr,
row_number()over(order by B.visitpg_nbr) as rank
from
dbname.schema_name.tbl1 A
JOIN dbname.schema_name.tbl2 B
ON B.id_column = A.id_column
WHERE flg_col = '0'
AND CREATE_DT = '2022-01-02'
and visit_id='1'
ORDER BY visitpg_nbr)
SELECT distinct
visit_id,rank,
case when rank=1 then max(CUSTOMER_ID) end as Customer_1,
case when rank=2 then max(CUSTOMER_ID) end as Customer_2 ,
case when rank>2 then CUSTOMER_ID end as Remaining_Customers,

 LISTAGG(distinct case when rank>2 then CUSTOMER_ID end, ',')
 WITHIN GROUP(ORDER BY case when rank>2 then CUSTOMER_ID end)AS Remaining_Cust
from CTE
GROUP BY 1,2

The query which I tried is incorrect. Kindly help me in forming a correct query using the logic mentioned.

VISIT_ID    First_Customer Second_Customer  Other_Customers

1            100            101             102,103,104,105,106
2            200            201             202,203,204,205

First Customer_ID should get displayed in the First_Customer column, Second_Customer_Id should get displayed in Second_Customer column.. All the other customer_ids should be displayed in the final column and it should be comma separated.

CodePudding user response：

You should be able to get this with array_agg(), and then choosing the first, second, and subsequent (array_slice()) elements:

with data as (
    select *
    from snowflake_sample_data.tpch_sf100.orders
    where o_custkey between 5411266 and 5411290
)

select o_custkey, arr[0], arr[1], array_to_string(array_slice(arr, 2, 99999), ', ')
from (
    select o_custkey, array_agg(o_orderkey) within group(order by o_orderdate) arr
    from data
    group by 1 
);

You might need to get unique ids in case there are many, you can solve that with a subquery before array_agg().

CodePudding user response：

slightly different to Felipe's answer, not sure which would be more performant. I suspect his, but anyways here is another way to try it.

SELECT visit_id, first_customer, second_customer 
    ,array_agg(other_ids) within group (order by order_id) as other_customer
FROM(
    SELECT visit_id,
        order_id,
        first_value(customer_id) over (partition by visit_id order by order_id) as first_customer,
        first_value(customer_id) over (partition by visit_id order by order_id) as second_customer,
        IFF(row_number() over (partition by visit_id order by order_id) > 2, customer_id, null) as other_ids
    FROM VALUES
        (1,100, 1),
        (1,101, 2),
        (1,102, 3),
        (1,103, 5),
        (1,104, 6),
        (1,105, 6),
        (1,106, 7),
        (2,200, 1),
        (2,201, 2),
        (2,202, 3),
        (2,203, 4)
        v(visit_id, customer_id, order_id)
    )
GROUP BY 1,2,3
ORDER BY 1,2,3;

VISIT_ID	FIRST_CUSTOMER	SECOND_CUSTOMER	OTHER_CUSTOMER
1	100	100	[ 102, 103, 104, 105, 106 ]
2	200	200	[ 202, 203 ]