I have an order line table that looks like this:
| ID | Order ID | Product Reference | Variant |
|---|---|---|---|
| 1 | 1 | Banana | Green |
| 2 | 1 | Banana | Yellow |
| 3 | 2 | Apple | Green |
| 4 | 2 | Banana | Brown |
| 5 | 3 | Apple | Red |
| 6 | 3 | Apple | Yellow |
| 7 | 4 | Apple | Yellow |
| 8 | 4 | Banana | Green |
| 9 | 4 | Banana | Yellow |
| 10 | 4 | Pear | Green |
| 11 | 4 | Pear | Green |
| 12 | 4 | Pear | Green |
I want to know how often people place an order with a combination of different fruit products. I want to know the orderId for that situation and which productReference was combined in the orders.
I only care about the product, not the variant.
I would imagine the desired output looking like this - a simple table output that gives insight in what product combos are ordered:
| Order ID | Product |
|---|---|
| 2 | Banana |
| 2 | Apple |
| 4 | Banana |
| 4 | Apple |
| 4 | Pear |
I just need data output of the combination Banana Apple and Banana Apple Pear happening so I can get more insight in the frequency of how often this happens. We expect most of our customers to only order Apple, Banana or Pear products, but that assumption needs to be verified.
Problem
I kind of get stuck after the first step.
select orderId, productReference, count(*) as amount
from OrderLines
group by orderId, productReference
This outputs:
| Order ID | Product Reference | amount |
|---|---|---|
| 1 | Banana | 2 |
| 2 | Apple | 1 |
| 2 | Banana | 1 |
| 3 | Apple | 2 |
| 4 | Apple | 1 |
| 4 | Banana | 2 |
| 4 | Pear | 3 |
I just don't know how to move on from this step to get the data I want.
CodePudding user response:
You can use a window count() over()
select *
from
(
select orderId, productReference, count(*) as amount
, count(productReference) over(partition by orderId) np
from OrderLines
group by orderId, productReference
) t
where np > 1
CodePudding user response:
You need only the rows where an Order_Id has different products; you can do this many ways.
One way is to aggregate and filter to only rows where the min product <> the max product, then use a correlation to find matching orders:
select distinct t.Order_ID, t.Product_Reference
from t
where exists (
select *
from t t2
where t2.Order_ID = t.Order_ID
group by order_id
having Min(Product_Reference) != Max(Product_Reference)
);
See this demo fiddle
CodePudding user response:
You could use STRING_AGG: https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-ver16
Here's an example:
SELECT orderID, STRING_AGG(productReference, ' ') products
FROM
(
SELECT DISTINCT orderID, productReference
FROM orderLines
) order_products
GROUP BY orderID
For each order ID, this pulls out the distinct products, then the STRING_AGG combines them into one field.
Output
| orderID | products |
|---|---|
| 1 | Banana |
| 2 | Apple Banana |
| 3 | Apple |
| 4 | Apple Banana Pear |
SQL fiddle example: http://sqlfiddle.com/#!18/8a677/6
