Home > Back-end >  Counting unique combinations of values across multiple columns regardless of order?
Counting unique combinations of values across multiple columns regardless of order?

Time:02-05

I have a table that looks a bit like this:

Customer_ID | Offer_1 | Offer_2 | Offer_3
------------|---------|---------|--------
111         | A01     | 001     | B01
222         | A01     | B01     | 001
333         | A02     | 001     | B01

I want to write a query to figure out how many unique combinations of offers there are in the table, regardless of what order the offers appear in.

So in the example above there are two unique combinations: customers 111 & 222 both have the same three offers so they count as one unique combination, and then customer 333 is the only customer to have the three orders that they have. So the desired output of the query would be 2.

For some additional context:

  • The customer_ID column is in integer format, and all the offer columns are in varchar format.
  • There are 12 offer columns and over 3 million rows in the actual table, with over 100 different values in the offer columns. I simplified the example to better illustrate what I'm trying to do, but any solution needs to scale to this amount of possible combinations.

I can concatenate all of the offer columns together and then run a count distinct statement on the result, but this doesn't account for customers who have the same unique combination of offers but ordered differently (like customers 111 & 222 in the example above).

Does anyone know how to solve this problem please?

CodePudding user response:

One way to do it would be to union all the offers into one column, then use select distinct listagg... to get the combinations of offers. Try this:

with u as
(select Customer_ID, Offer_1 as Offer from table_name union all
 select Customer_ID, Offer_2 as Offer from table_name union all
 select Customer_ID, Offer_3 as Offer from table_name)
select distinct listagg(Offer, ',') within group(order by Offer) from u
group by Customer_ID

Fiddle

CodePudding user response:

Assuming the character / doesn't show up in any of the offer names, you can do:

select count(distinct offer_combo) as distinct_offers
from (
  select listagg(offer, '/') within group (order by offer) as offer_combo
  from (
    select customer_id, offer_1 as offer from t
    union all select customer_id, offer_2 from t
    union all select customer_id, offer_3 from t
  ) x
  group by customer_id
) y

Result:

DISTINCT_OFFERS
---------------
2

See running example at db<>fiddle.

CodePudding user response:

The solution without UNION ALLs. It should have better performance.

/*
WITH MYTAB (Customer_ID, Offer_1, Offer_2, Offer_3) AS
(
VALUES
  (111, 'A01', '001', 'B01')
, (222, 'A01', 'B01', '001')
, (333, 'A02', '001', 'B01')
)
*/
SELECT COUNT (DISTINCT LIST)
FROM
(
SELECT LISTAGG (V.Offer, '|') WITHIN GROUP (ORDER BY V.Offer) LIST 
FROM MYTAB T
CROSS JOIN TABLE (VALUES T.Offer_1, T.Offer_2, T.Offer_3) V (Offer)
GROUP BY T.CUSTOMER_ID
)
  •  Tags:  
  • Related