Select rows where the combination of two columns is unique and we only display rows where the first-CodePudding

I have an order line table that looks like this:

ID	Order ID	Product Reference	Variant
1	1	Banana	Green
2	1	Banana	Yellow
3	2	Apple	Green
4	2	Banana	Brown
5	3	Apple	Red
6	3	Apple	Yellow
7	4	Apple	Yellow
8	4	Banana	Green
9	4	Banana	Yellow
10	4	Pear	Green
11	4	Pear	Green
12	4	Pear	Green

I want to know how often people place an order with a combination of different fruit products. I want to know the orderId for that situation and which productReference was combined in the orders.

I only care about the product, not the variant.

I would imagine the desired output looking like this - a simple table output that gives insight in what product combos are ordered:

Order ID	Product
2	Banana
2	Apple
4	Banana
4	Apple
4	Pear

I just need data output of the combination Banana Apple and Banana Apple Pear happening so I can get more insight in the frequency of how often this happens. We expect most of our customers to only order Apple, Banana or Pear products, but that assumption needs to be verified.

Problem

I kind of get stuck after the first step.

select orderId, productReference, count(*) as amount
from OrderLines
group by orderId, productReference

This outputs:

Order ID	Product Reference	amount
1	Banana	2
2	Apple	1
2	Banana	1
3	Apple	2
4	Apple	1
4	Banana	2
4	Pear	3

I just don't know how to move on from this step to get the data I want.

CodePudding user response：

You can use a window count() over()

select * 
from
(   
    select orderId, productReference, count(*) as amount
       , count(productReference) over(partition by orderId) np
    from OrderLines
    group by orderId, productReference    
) t
where np > 1

CodePudding user response：

You need only the rows where an Order_Id has different products; you can do this many ways.

One way is to aggregate and filter to only rows where the min product <> the max product, then use a correlation to find matching orders:

select distinct t.Order_ID, t.Product_Reference
from t
where exists (
    select *
    from t t2
    where t2.Order_ID = t.Order_ID
    group by order_id
    having Min(Product_Reference) != Max(Product_Reference)
);

See this demo fiddle

CodePudding user response：

You could use STRING_AGG: https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-ver16

Here's an example:

SELECT orderID, STRING_AGG(productReference, ' ') products
FROM  
    ( 
      SELECT DISTINCT orderID, productReference
      FROM orderLines        
    ) order_products
GROUP BY orderID

For each order ID, this pulls out the distinct products, then the STRING_AGG combines them into one field.

Output

orderID	products
1	Banana
2	Apple Banana
3	Apple
4	Apple Banana Pear

SQL fiddle example: http://sqlfiddle.com/#!18/8a677/6