I have 5 conditions (A-E) and a bunch of patient ID's. My data set is 2 columns: PatientID, Condition.
There are duplicate PatientID's with every new condition:
| PatientID | Condition |
|---|---|
| 456 | C |
| 456 | E |
| 279 | D |
| 123 | A |
| 123 | C |
| 123 | D |
| 187 | D |
| 296 | E |
| 296 | C |
I believe there are 31 different potential combinations (order doesn't matter) of those 5 conditions (ie A, AB, ABC, AC, ACDE, etc)
I want to count how many patients in each combination of conditions. So my results for the above would be - CE: 2 D: 2 ACD: 1
I'm more familiar with Excel but if this is better handled in SQL, I can do it there. I think I need to create a table of all the different combinations (any help on that would be appreciated too) and then do a count from there but I'm not sure if that's the best way.
CodePudding user response:
SQL Server solution
SELECT
conditions,
COUNT(*)
FROM (
SELECT
patientid,
STRING_AGG(condition, '') WITHIN GROUP (ORDER BY condition) conditions
FROM tbl
GROUP BY patientid
) c
GROUP BY conditions
Output
| conditions | patient_count |
|---|---|
| ACD | 1 |
| CE | 2 |
| D | 2 |

