导读:在数据库的处理工作,会遇到重复记录的问题,会影响数据的分析结果的准确性,今天我们探讨一下在数据库中对重复记录的操作;
示例数据如下我们可以看到一个name有两条记录,有的是英语成绩不一样(zhangsan),有的是math成绩不一样(wangwu),有的是两个都不一样(zhaoliu),有的是记录完全一致(lisi),针对不同的记录重复情况有不同的处理办法。(原则为保留该科目最高成绩为最终成绩,即删除成绩较低的行,记录相同保留最近一条记录,number更大的,记录都不相同则取数学成绩高的记录)
首先要找出重复的记录
SELECT name FROM repeat_nums GROUP BY name HAVING COUNT(1) >1
1.处理两条记录完全一致的情况
①找出要删除的行
SELECT * FROM repeat_nums
WHERE name in (SELECT name FROM repeat_nums GROUP BY name ,math,english HAVING COUNT(1) >1)
AND number in (SELECT min(number) FROM repeat_nums GROUP BY name HAVING COUNT(1) >1)
② 删除 把 select 改为delete
2.处理有一个字段重复的情况
SELECT * FROM repeat_nums
WHERE CAST(name as VARCHAR ) + CAST(math as VARCHAR)+ CAST(english as VARCHAR)
in (SELECT CAST(name as VARCHAR ) +CAST(min(math) as VARCHAR)+ CAST(min (english) as VARCHAR) FROM repeat_nums GROUP BY name HAVING COUNT(1) >1)
①找出要删除的行
② 删除 把 select 改为delete
3.处理两个字段都不一致的情况(选择数学成绩好的记录)
①找出要删除的行(name)
SELECT name FROM repeat_nums as a
WHERE CAST(math as VARCHAR)+ CAST(english as VARCHAR) not in
(SELECT CAST(min(math) as VARCHAR)+ CAST(min(english) as VARCHAR) FROM repeat_nums GROUP BY name HAVING COUNT(1) >1)
and
CAST(math as VARCHAR)+ CAST(english as VARCHAR) not in
(SELECT CAST(max(math) as VARCHAR)+ CAST(max(english) as VARCHAR) FROM repeat_nums GROUP BY name HAVING COUNT(1) >1)
and
name in (SELECT name FROM repeat_nums GROUP BY name HAVING COUNT(1) >1)
② 删除
DELETE FROM repeat_nums as a WHERE name in
(SELECT name FROM repeat_nums as a
WHERE CAST(math as VARCHAR)+ CAST(english as VARCHAR) not in
(SELECT CAST(min(math) as VARCHAR)+ CAST(min(english) as VARCHAR) FROM repeat_nums GROUP BY name HAVING COUNT(1) >1)
and
CAST(math as VARCHAR)+ CAST(english as VARCHAR) not in
(SELECT CAST(max(math) as VARCHAR)+ CAST(max(english) as VARCHAR) FROM repeat_nums GROUP BY name HAVING COUNT(1) >1)
and
name in (SELECT name FROM repeat_nums GROUP BY name HAVING COUNT(1) >1))
AND MATh in (SELECT min(MATH) FROM repeat_nums GROUP BY name HAVING COUNT(1) >1)