I'm trying to get a distinct count of rows within a window function that has multiple levels of partitioning. Below is a sample of my data.
| PRODUCT_ID | KEY_ID | STORECLUSTER |
|---|---|---|
| 1000078 | 120 | LLNY |
| 1000078 | 202 | LLF |
| 1000078 | 202 | LLNY |
| 1000078 | 202 | LLNY |
I want to look at each PRODUCT_ID and then each unique KEY_ID and determine how many unique STORECLUSTERS there are per KEY_ID. For example PRODUCT_ID 1000078 has two unique KEY_ID's (120 and 202) of which 120 has 1 unique STORECLUSTER and 202 has 2 unqiue STORECLUSTER's. I've tried using a RANK() and DENSE_RANK() but I can't seem to get the partitioning correct. I would like to get a table that looks like this:
| PRODUCT_ID | KEY_ID | STORECLUSTER | STORECLUSTER_COUNT |
|---|---|---|---|
| 1000078 | 120 | LLNY | 1 |
| 1000078 | 202 | LLF | 2 |
| 1000078 | 202 | LLNY | 2 |
| 1000078 | 202 | LLNY | 2 |
CodePudding user response:
Unfortunately, SQL Server does not support COUNT(DISTINCT as a window function.
So you need to nest window functions. I find the simplest and most efficient method is MAX over a DENSE_RANK, but there are others.
The partitioning clause is the equivalent of GROUP BY in a normal aggregate, then the value you are DISTINCTing goes in the ORDER BY of the DENSE_RANK. So you calculate a ranking, while ignoring tied results, then take the maximum rank, per partition.
SELECT
PRODUCT_ID,
KEY_ID,
STORECLUSTER,
STORECLUSTER_COUNT = MAX(rn) OVER (PARTITION BY PRODUCT_ID, KEY_ID)
FROM (
SELECT *,
rn = DENSE_RANK() OVER (PARTITION BY PRODUCT_ID, KEY_ID ORDER BY STORECLUSTER)
FROM YourTable t
) t;
