I have a data set like below:
| transaction_id | store_id | product_id | product_name |
|---|---|---|---|
| 1 | 100 | p001 | product_1 |
| 1 | 100 | p002 | product_2 |
| 1 | 100 | p003 | product_3 |
| 4 | 100 | p002 | product_2 |
| 4 | 100 | p003 | product_3 |
| 5 | 100 | p002 | product_2 |
| 5 | 100 | p003 | product_3 |
| 7 | 100 | p001 | product_1 |
| 7 | 100 | p003 | product_3 |
| 8 | 101 | p002 | product_2 |
| 8 | 101 | p003 | product_3 |
| 9 | 101 | p001 | product_1 |
| 9 | 101 | p002 | product_2 |
| 2 | 101 | p001 | product_1 |
| 3 | 101 | p002 | product_2 |
| 3 | 101 | p003 | product_3 |
| 6 | 101 | p001 | product_1 |
| 6 | 101 | p002 | product_2 |
I am trying to find a query to give output as below.
| store | freq_prod_ids | count_of_transactions |
|---|---|---|
| 100 | p002, p003 | 3 |
| 100 | p001, p003 | 2 |
| 101 | p001, p002 | 2 |
| 101 | p002, p003 | 2 |
This essentially should give the top 2 [N=2] frequently bought product combinations in a single transaction for each store.
Please help to have an SQL query to get this response.
CodePudding user response:
You may try the following which performs a self-join before aggregating based on store and product pairs. The row_number is used to retrieve the top 2 product pairs per store.
SELECT
store_id, freq_prod_ids,count_of_transactions
FROM (
SELECT
t1.store_id,
CONCAT(t1.product_id,', ',t2.product_id) as freq_prod_ids,
COUNT(1) as count_of_transactions,
ROW_NUMBER() OVER (PARTITION BY t1.store_id ORDER BY COUNT(1) DESC) as rn
FROM my_table t1
INNER JOIN my_table t2 on t1.store_id = t2.store_id and
t1.product_id < t2.product_id and
t1.transaction_id = t2.transaction_id
GROUP BY t1.store_id,CONCAT(t1.product_id,', ',t2.product_id)
) t3 WHERE rn <=2
Let me know if this works for you.
