Home > database >  Remove Duplicate Based on three Columns MYSQL
Remove Duplicate Based on three Columns MYSQL

Time:02-01

Trying to remove duplicate based on three column like query will not find duplicate based on single column it will concatenate three column then remove duplicate rows based on the merge column.

I would appreciate if someone can share an easiest way of achieving this. I know this is not an appropraiote way but tried and its not working

Select concat(col1, col2, col3,) as newCol distinct newcCol from Table2

I know how to remove the table duplicate based on multiple columns using Excel VBA but do not know how to achieve this using mysql

Sub DelDupl()

    Range("A1").CurrentRegion.RemoveDuplicates Columns:=Array(1, 2, 3), Header:=xlYes

End Sub

Table name is Table2 in Mysql

enter image description here

Sample Data

CREATE TABLE Table2(
col1 INT,
col2 varchar(10),
col3 INT,
col4 varchar(10),
col5 varchar(10),
col6 varchar(10),
col7 varchar(10));
INSERT INTO Table2 (col1,col2,col3,col4,col5,col6,col7)
VALUES ('1','A','123456','data1','data1','data1','data1'),
('2','B','78910','data2','data2','data2','data2'),
('3','C','45698','data3','data3','data3','data3'),
('1','A','123456','data1','data1','data1','data1'),
('2','B','78910','data2','data2','data2','data2'),
('3','C','45698','data3','data3','data3','data3'),
('4','D','85969','data5','data5','data5','data5');

CodePudding user response:

The problem is there is no way to establish which to keep so I suggest you set up a staging table with a compound key, load it , truncate your existing table and load it from the staging table

CodePudding user response:

How about this:

SELECT *
FROM table2 AS t
GROUP BY t.col1, t.col2, t.col3;

GROUP BY is normally used for aggregating functions(count, sum, max, etc), but will do the job for your purpose.

Edit: Ok, since you actually need to delete duplicates, thats a bit more complicated, but it's possible. First we need to somehow discriminate duplicate rows, so we will temporary add a primary key. Then execute delete statement, while joining the table on itself to find duplicated rows. And lastly drop the primary key column we added.

  • add primary key column, we need some discrimination for duplicates
ALTER TABLE `table2` ADD COLUMN id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY;
  • join table on itself to get duplicates and delete them
DELETE t1
FROM `table2` t1
INNER JOIN `table2` t2
ON t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t1.col3 = t2.col3
WHERE t1.id < t2.id;
  • drop the primary key column
ALTER TABLE `table2` DROP COLUMN id;

As a note, the join is done on the columns which define duplication(col1, col2, col3) in your case. You can execute all 3 queries at once, just be sure you really need this data gone.

  •  Tags:  
  • Related