整个表
target_table
大概有14000W条记录,有个主键id
,需要去重的字段是mac
.
-
查看重复的记录数
SELECT min( id ) AS mid, mac, count ( * ) FROM target_table GROUP BY mac HAVING count( mac ) > 1;
整个表重复的数据2w+
-
创建临时表
CREATE TABLE dup_tmp ( SELECT min( id ) AS mid, mac, count ( * ) FROM target_table GROUP BY mac HAVING count( mac ) > 1 );
将重复的mac筛选到临时表
-
删除多余的数据保留最小的id记录
DELETE target FROM target_table AS target JOIN dup_tmp AS tmp ON target.mac = tmp.mac WHERE target.id > tmp.mid;
刚开始的时候mac字段没有建索引,删除很慢。建了索引删除就很快了。