I have a table with columns Machine, Product and Sources:
| Machine | Product | Sources |
|---|---|---|
| M3 | H | cmdd6 |
| M3 | H | 91 |
| M3 | H | cmdd3 |
| M4 | I | cmdd7 |
| M4 | J | cmdd7 |
| M4 | B | 827 |
| M4 | B | cmdd7 |
In the above table where Machine is M3 the product is same but the Sources column has multiple intake. So the requirement is to remove the duplicate rows where Sources should always be 'cmdd' in ascending order.
For example if there is duplicate with product and sources are different i.e 'cmdd6' or 'cmdd3', then duplicate row should be removed and values would remain with sources 'cmdd3'.
Below is the result table would to like to achieve
| Machine | Product | Sources |
|---|---|---|
| M3 | H | cmdd3 |
| M4 | I | cmdd7 |
| M4 | J | cmdd7 |
| M4 | B | cmdd7 |
Below is the query which I tried to remove duplicates on the values of count >1.
WITH CTE(Machine, Product,Sources, duplicatecount) AS
(
SELECT
Machine, Product, Sources,
ROW_NUMBER() OVER (PARTITION BY Machine, Product
ORDER BY Machine, Sources) AS DuplicateCount
FROM
Concatcleanup
)
DELETE FROM cte
WHERE duplicatecount > 1
Any help is highly appreciated.
CodePudding user response:
You can use one extra crafted field inside the ORDER BY clause ROW_NUMBER window function, to pull "cmdd%"-like values above all the others.
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER(
PARTITION BY Machine, Product
ORDER BY CASE WHEN Sources NOT LIKE 'cmdd%' THEN 1 END,
Sources
) AS DuplicateCount
FROM Concatcleanup
)
DELETE FROM cte
WHERE DuplicateCount > 1;
Check the demo here.
If you want to avoid the deletion, you can use the notation SELECT ... INTO <new_table> FROM ... and use the code for the cte:
SELECT Machine, Product, Sources,
ROW_NUMBER() OVER(
PARTITION BY Machine, Product
ORDER BY CASE WHEN Sources NOT LIKE 'cmdd%' THEN 1 END,
Sources
) AS DuplicateCount
INTO newtab
FROM Concatcleanup;
Check the demo here.
