I have a table with several rows, that looks more or less like this.
---- ------------ -------------
| ID | CustomerID | ProductType |
---- ------------ -------------
| 1 | 1 | A |
| 2 | 1 | A |
| 3 | 1 | B |
| 4 | 1 | B |
| 5 | 1 | B |
| 6 | 2 | A |
| 7 | 2 | A |
---- ------------ -------------
ID is just a auto_increment identity
CustomerID is a unique Id for each customer
ProductType Type of the specific Product
I need to create a select, with two extra columns where the resulting output should look like this:
Desired result:
---- ------------- ---------------- ------------ -------------
| ID | Cluster_ID | Sub_Cluster_ID | CustomerID | ProductType |
---- ------------- ---------------- ------------ -------------
| 1 | 1 | 1 | 1 | A |
| 2 | 1 | 2 | 1 | A |
| 3 | 2 | 1 | 1 | B |
| 4 | 2 | 2 | 1 | B |
| 5 | 2 | 3 | 1 | B |
| 6 | 1 | 1 | 2 | A |
| 7 | 1 | 2 | 2 | A |
---- ------------- ---------------- ------------ -------------
Cluster_ID is a counter for a customer, that only counts up on a new
ProductType(so all rows for a Customer, with the sameProductType, have the sameCluster_ID)
Sub_Cluster_ID is a simple counter, that counts up the entries for aProductTypeand Customer (so all rows for a Customer, with the sameCluster_ID, have a incrementingSub_Cluster_ID)
I tried using ROW_NUMBER, but I could not get it to work
Here some of my failed attempts:
This seems to works for the
Sub_Cluster_ID, but not for theCluster_ID.SELECT [ID] , ROW_NUMBER() OVER (Partition by CustomerID ORDER BY CustomerID, ProductType) Cluster_ID , ROW_NUMBER() OVER (Partition by CustomerID, ProductType ORDER BY CustomerID, ProductType) Sub_Cluster_ID , [CustomerID] , [ProductType] FROM dbo.TableThis seems is almost there, but the
Cluster_IDis still not correctSELECT [ID] , ROW_NUMBER() OVER (Partition by CustomerID ORDER BY CustomerID) Cluster_ID , ROW_NUMBER() OVER (Partition by CustomerID, ProductType ORDER BY CustomerID, ProductType) Sub_Cluster_ID , [CustomerID] , [ProductType] FROM dbo.Table
Almost there:
(but the clusterID is still not matching the needed Output)
---- ------------ ---------------- ------------ -------------
| ID | Cluser_ID | Sub_Cluster_ID | CustomerID | ProductType |
---- ------------ ---------------- ------------ -------------
| 1 | 1 | 1 | 1 | A |
| 2 | 2 | 2 | 1 | A |
| 3 | 3 | 1 | 1 | B |
| 4 | 4 | 2 | 1 | B |
| 5 | 5 | 3 | 1 | B |
| 6 | 1 | 1 | 2 | A |
| 7 | 2 | 2 | 2 | A |
---- ------------ ---------------- ------------ -------------
Is it anyway possible? Or do I need subqueries?
CodePudding user response:
Seems like you need dense_rank() instead of row_number() for Cluster_ID:
select ID,
dense_rank() over(partition by CustomerID order by CustomerID, ProductType) as Cluster_ID,
ROW_NUMBER() OVER (Partition by CustomerID, ProductType ORDER BY CustomerID, ProductType) Sub_Cluster_ID,
CustomerID,
ProductType
from table_name;
CodePudding user response:
You need DENSE_RANK for this.
However, it makes no sense to partition and order by the same columns like in the other answer. So you can either use an arbitrary ordering for the ROW_NUMBER
SELECT
ID,
DENSE_RANK() OVER (PARTITION BY CustomerID ORDER BY ProductType) as Cluster_ID,
ROW_NUMBER() OVER (PARTITION BY CustomerID, ProductType ORDER BY (SELECT 1)) Sub_Cluster_ID,
CustomerID,
ProductType
FROM table_name;
Or you can order by another column such as ID
SELECT
ID,
DENSE_RANK() OVER (PARTITION BY CustomerID ORDER BY ProductType) as Cluster_ID,
ROW_NUMBER() OVER (PARTITION BY CustomerID, ProductType ORDER BY ID) Sub_Cluster_ID,
CustomerID,
ProductType
FROM table_name;
