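The examples in this article use MySQL 5.7 as the database environment.
Duplicate data can arise for many reasons: bugs in the system, repeated submissions, changed requirements (content that used to be allowed to repeat must now be unique), and so on. Rather than list every cause, this article works through a concrete example of how to clean up duplicate data.
Prepare the following data in the user table from my other article, “MySQL实战” (MySQL in Practice):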
mysql> select id,username,mobile from t_user;
+-------+----------+-------------+
| id    | username | mobile      |
+-------+----------+-------------+
| 10001 | user1    | 13900000001 |
| 10002 | user2    | NULL        |
| 10003 | user3    | NULL        |
| 10004 | user4    | NULL        |
| 10005 | user5    | NULL        |
| 10006 | user6    | 13900000001 |
+-------+----------+-------------+
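To follow along without the table from the other article, a minimal stand-in can be built as sketched below. The table definition and column types are assumptions: only id, username and mobile appear in this article, and the real t_user table very likely has more columns.

```sql
-- Hypothetical minimal version of t_user; column types are assumptions and
-- only the three columns used in this article are included.
create table t_user (
    id       bigint      not null primary key,
    username varchar(64) not null,
    mobile   varchar(20) null
) engine = InnoDB;

-- Sample rows matching the listing above: user1 and user6 share a number
insert into t_user (id, username, mobile) values
    (10001, 'user1', '13900000001'),
    (10002, 'user2', null),
    (10003, 'user3', null),
    (10004, 'user4', null),
    (10005, 'user5', null),
    (10006, 'user6', '13900000001');
```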
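Now we need to find the rows in the user table whose mobile number is duplicated. Grouping by the mobile column and counting the rows in each group with the aggregate function count() gives the result we need. The condition where mobile is not null matters here: group by puts all NULL values into a single group, so without it the four users who have no number would also be reported as duplicates.
# Find mobile numbers that appear more than once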
mysql> select mobile,count(1) as c from t_user where mobile is not null group by mobile having c > 1;
+-------------+---+
| mobile      | c |
+-------------+---+
| 13900000001 | 2 |
+-------------+---+
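The aggregate query returns only the duplicated numbers themselves. To see which users hold each of them, one option (a sketch, not a step from the original walkthrough) is to join the grouped result back to t_user. On the sample data this should list user1 (10001) and user6 (10006).

```sql
-- List every user whose mobile number is shared with at least one other user
select u.id, u.username, u.mobile
from t_user as u
join (
    select mobile
    from t_user
    where mobile is not null
    group by mobile
    having count(1) > 1
) as d on u.mobile = d.mobile
order by u.mobile, u.id;
```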
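Next, handle the duplicated numbers as the requirement dictates, for example by setting the mobile number to NULL on every record with a larger id, so that only the record with the smallest id keeps it.
Let's refine the SQL above step by step. Since the records with larger ids are the ones to change, we first need the smallest id for each duplicated number:
# min(id)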
mysql> select mobile,count(1) as c,min(id) as min_id from t_user where mobile is not null group by mobile having c > 1;
+-------------+---+--------+
| mobile      | c | min_id |
+-------------+---+--------+
| 13900000001 | 2 |  10001 |
+-------------+---+--------+
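Before running an update against real data, it can help to preview exactly which rows it will touch. A minimal sketch: keep the join and the where condition planned for the update in the next step, but select the rows instead of modifying them. On the sample data only user6 (10006) should appear.

```sql
-- Dry run: rows whose mobile would be set to NULL by the update
select u.id, u.username, u.mobile, a.min_id
from t_user as u
join (
    select mobile, count(1) as c, min(id) as min_id
    from t_user
    where mobile is not null
    group by mobile
    having c > 1
) as a on u.mobile = a.mobile
where u.id > a.min_id;
```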
With the smallest id in hand, join t_user against this query result and run the update:
# update ... where id > ...
mysql> update t_user as u
-> join (
-> select mobile,count(1) as c,min(id) as min_id from t_user where mobile is not null group by mobile having c > 1
-> ) as a on u.mobile = a.mobile
-> set u.mobile = null
-> where u.id > a.min_id;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
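Two notes on this statement. First, the duplicates are computed in a derived table that is joined to t_user, rather than in a where id > (select ... from t_user ...) subquery, because MySQL rejects a subquery that reads from the table being updated (error 1093, “You can't specify target table 't_user' for update in FROM clause”). Second, on real data it is safer to run the fix inside a transaction so it can be undone. The sketch below assumes t_user uses the InnoDB engine; changes to non-transactional tables such as MyISAM cannot be rolled back.

```sql
-- Assumes InnoDB; changes to non-transactional tables cannot be rolled back
start transaction;

-- Same fix as above: clear the number on every duplicate except the lowest id
update t_user as u
join (
    select mobile, min(id) as min_id
    from t_user
    where mobile is not null
    group by mobile
    having count(1) > 1
) as a on u.mobile = a.mobile
set u.mobile = null
where u.id > a.min_id;

-- Check inside the same transaction; an empty set means no duplicates remain
select mobile, count(1) as c
from t_user
where mobile is not null
group by mobile
having c > 1;

-- Keep the change if the check passes; run rollback instead to undo it
commit;
```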
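MySQL reports that the statement succeeded and changed one row. Finally, check whether the result is what we expected.
# Check whether any duplicated mobile numbers remain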
mysql> select mobile,count(1) as c from t_user where mobile is not null group by mobile having c > 1;
Empty set (0.00 sec)
# Verify once more by looking at the data directly
mysql> select id,username,mobile from t_user;
+-------+----------+-------------+
| id    | username | mobile      |
+-------+----------+-------------+
| 10001 | user1    | 13900000001 |
| 10002 | user2    | NULL        |
| 10003 | user3    | NULL        |
| 10004 | user4    | NULL        |
| 10005 | user5    | NULL        |
| 10006 | user6    | NULL        |
+-------+----------+-------------+
6 rows in set (0.00 sec)
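Cleaning up existing rows does not stop new duplicates from appearing. If the requirement is now that mobile numbers be unique, one option (a suggestion beyond the original steps; the index name uk_mobile is hypothetical) is a unique index. In MySQL a unique index allows multiple NULL values, so the users without a number are unaffected. Creating the index fails while duplicates still exist, so it only works after the cleanup.

```sql
-- Hypothetical index name; enforces uniqueness of non-NULL mobile numbers
alter table t_user add unique index uk_mobile (mobile);

-- Expected to fail with ERROR 1062 (duplicate entry) once the index exists,
-- assuming t_user has no other required columns without defaults
insert into t_user (id, username, mobile) values (10007, 'user7', '13900000001');
```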