Home > Blockchain >  Is there a easy way to group/Clusters Entries in one Select
Is there a easy way to group/Clusters Entries in one Select

Time:02-01

I have a table with several rows, that looks more or less like this.

 ---- ------------ ------------- 
| ID | CustomerID | ProductType |
 ---- ------------ ------------- 
| 1  | 1          | A           |
| 2  | 1          | A           |
| 3  | 1          | B           |
| 4  | 1          | B           |
| 5  | 1          | B           |
| 6  | 2          | A           |
| 7  | 2          | A           |
 ---- ------------ ------------- 

ID is just a auto_increment identity
CustomerID is a unique Id for each customer
ProductType Type of the specific Product

I need to create a select, with two extra columns where the resulting output should look like this:

Desired result:

 ---- ------------- ---------------- ------------ ------------- 
| ID | Cluster_ID  | Sub_Cluster_ID | CustomerID | ProductType |
 ---- ------------- ---------------- ------------ ------------- 
| 1  | 1           | 1              | 1          | A           |
| 2  | 1           | 2              | 1          | A           |
| 3  | 2           | 1              | 1          | B           |
| 4  | 2           | 2              | 1          | B           |
| 5  | 2           | 3              | 1          | B           |
| 6  | 1           | 1              | 2          | A           |
| 7  | 1           | 2              | 2          | A           |
 ---- ------------- ---------------- ------------ ------------- 

Cluster_ID is a counter for a customer, that only counts up on a new ProductType (so all rows for a Customer, with the same ProductType, have the same Cluster_ID)
Sub_Cluster_ID is a simple counter, that counts up the entries for a ProductType and Customer (so all rows for a Customer, with the same Cluster_ID, have a incrementing Sub_Cluster_ID)

I tried using ROW_NUMBER, but I could not get it to work

Here some of my failed attempts:

  • This seems to works for the Sub_Cluster_ID, but not for the Cluster_ID.

    SELECT  [ID]
        , ROW_NUMBER() OVER (Partition by CustomerID ORDER BY CustomerID, ProductType) Cluster_ID
        , ROW_NUMBER() OVER (Partition by CustomerID, ProductType ORDER BY CustomerID, ProductType) Sub_Cluster_ID
        , [CustomerID]
        , [ProductType]
    FROM dbo.Table
    
  • This seems is almost there, but the Cluster_ID is still not correct

    SELECT  [ID]
        , ROW_NUMBER() OVER (Partition by CustomerID ORDER BY CustomerID) Cluster_ID
        , ROW_NUMBER() OVER (Partition by CustomerID, ProductType ORDER BY CustomerID, ProductType) Sub_Cluster_ID
        , [CustomerID]
        , [ProductType]
    FROM dbo.Table  
    

Almost there:
(but the clusterID is still not matching the needed Output)

  ---- ------------ ---------------- ------------ ------------- 
 | ID | Cluser_ID  | Sub_Cluster_ID | CustomerID | ProductType |
  ---- ------------ ---------------- ------------ ------------- 
 | 1  | 1          | 1              | 1          | A           |
 | 2  | 2          | 2              | 1          | A           |
 | 3  | 3          | 1              | 1          | B           |
 | 4  | 4          | 2              | 1          | B           |
 | 5  | 5          | 3              | 1          | B           |
 | 6  | 1          | 1              | 2          | A           |
 | 7  | 2          | 2              | 2          | A           |
  ---- ------------ ---------------- ------------ ------------- 

Is it anyway possible? Or do I need subqueries?

CodePudding user response:

Seems like you need dense_rank() instead of row_number() for Cluster_ID:

select ID,
dense_rank() over(partition by CustomerID order by CustomerID, ProductType) as Cluster_ID,
ROW_NUMBER() OVER (Partition by CustomerID, ProductType ORDER BY CustomerID, ProductType) Sub_Cluster_ID,
CustomerID,
ProductType
from table_name;

Fiddle

CodePudding user response:

You need DENSE_RANK for this.

However, it makes no sense to partition and order by the same columns like in the other answer. So you can either use an arbitrary ordering for the ROW_NUMBER

SELECT
  ID,
  DENSE_RANK() OVER (PARTITION BY CustomerID ORDER BY ProductType) as Cluster_ID,
  ROW_NUMBER() OVER (PARTITION BY CustomerID, ProductType ORDER BY (SELECT 1)) Sub_Cluster_ID,
  CustomerID,
  ProductType
FROM table_name;

Or you can order by another column such as ID

SELECT
  ID,
  DENSE_RANK() OVER (PARTITION BY CustomerID ORDER BY ProductType) as Cluster_ID,
  ROW_NUMBER() OVER (PARTITION BY CustomerID, ProductType ORDER BY ID) Sub_Cluster_ID,
  CustomerID,
  ProductType
FROM table_name;

db<>fiddle

  •  Tags:  
  • Related