Home > Software engineering >  How to count distinct combinations of values across multiple columns?
How to count distinct combinations of values across multiple columns?

Time:02-04

I have a table that looks a bit like this (lets call it table1):

column1 | column2 | column3
--------|---------|--------
111     | 222     | 333
111     | 222     | 333 
222     | 111     | 333
222     | 333     | 444

What I want to do is count the number of rows which have the same three values regardless of which columns they are in, i.e. these three rows:

column1 | column2 | column3
--------|---------|--------
111     | 222     | 333
111     | 222     | 333 
222     | 111     | 333

Because each row has 111, 222, and 333 in it, all three of these rows should be counted regardless of the fact that they have those values in different columns.

I can get a count of duplicate rows with this code, but for the example above it would return a count of 2 instead of the desired 3:

select  count(distinct combination)
from
        (
        select  column1||column2||column3 as combination
        from    table1
        )
; 

Does anyone know how to do this please?

CodePudding user response:

This isn't a specific DB2 solution but I tried this on MySQL and I suppose this will work on DB2 as well. Not sure whether it solves your problem 100% but hopefully it helps.

In the where clause of the first select within "concat_union" I specified a concatenated value which is then regarded as the 'original' string. It doesn't matter which combination ('111222333' vs. '222111333' vs. '333111222') you choose as the original - but you have to specify one so you're able to look for duplicates using the union. The union consists of a concatetation of the columns using each possible combination. The last step is then to count the values which look like the original.

with concat_union as (
SELECT CONCAT(column1, column2, column3) as COMB, 'ORIG' as TYPE from table1 
    where CONCAT(column1, column2, column3) = '111222333'
UNION ALL
SELECT CONCAT(column1, column3, column2) as COMB, 'GEN' as TYPE from table1
UNION ALL 
SELECT CONCAT(column2, column1, column3) as COMB, 'GEN' as TYPE from table1 
UNION ALL
SELECT CONCAT(column2, column3, column1) as COMB, 'GEN' as TYPE from table1 
UNION ALL
SELECT CONCAT(column3, column1, column2) as COMB, 'GEN' as TYPE from table1 
UNION ALL
SELECT CONCAT(column3, column2, column1) as COMB, 'GEN' as TYPE from table1
)
select count(COMB) from concat_union where COMB = (select distinct COMB from 
concat_union where TYPE = 'ORIG');

You can try it out here https://sqlize.online/sql/mysql80/7df20f2753dec79e8bfb6a2e4f6445f7/

CodePudding user response:

Try this as is:

WITH 
  ORIG (column1, column2, column3) AS
(
    -- Original table values
    VALUES
      (111, 222, 333)
    , (111, 222, 333)
    , (222, 111, 333)
    , (222, 333, 444)
    --, (222, 444, 333)
)
, ENUM AS
(
    -- Row enumeration
    SELECT ROW_NUMBER () OVER () AS RN_, O.*
    FROM ORIG O
)

SELECT SUM (CNT)
FROM
(
    -- Number of rows for each the same ordered list of column values
    SELECT LST, COUNT (1) CNT
    FROM
    (
        -- Columns to rows 
        -- and get ordered list of column values for each original row
        SELECT E.RN_, LISTAGG (C.V, ',') WITHIN GROUP (ORDER BY C.V) LST
        FROM ENUM E
        CROSS JOIN TABLE (VALUES E.column1, E.column2, E.column3) C (V)
        GROUP BY E.RN_
    )
    GROUP BY LST
    HAVING COUNT (1) > 1
)
  •  Tags:  
  • Related