背景
服务器上安装mysql,收到cpu 100%报警,登上去查看top,很明显是mysql导致cpu达到100%
排查过程
1 但是因为有多个系统连接这个数据库,不知道是哪个数据库导致
一般来说都是因为慢查询,找到慢查询日志查看,看到一些慢查询但是并没有特别高的,怀疑一个程序导致,但是当发现cpu 100%的时候已经将tomcat的调用程序关闭,没有新的mysql进来了
2
mysql> show processlist;
---------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+---------+-----------+------------------+---------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| 2517261 | testUser | 192.168.1.201:57308 | testt | Query | 3856 | Copying to tmp table | SELECT count(0) FROM (SELECT users.id users_id, users.userName, users.loginName, users.uqq, users.um |
| 2517262 | testUser | 192.168.1.201:57312 | testt | Query | 3834 | Copying to tmp table | SELECT count(0) FROM (SELECT users.id users_id, users.userName, users.loginName, users.uqq, users.um |
| 2517265 | testUser | 192.168.1.201:57336 | testt | Query | 3745 | Copying to tmp table | SELECT count(0) FROM (SELECT users.id users_id, users.userName, users.loginName, users.uqq, users.um |
| 2517270 | testUser | 192.168.1.201:57438 | testt | Query | 3716 | Copying to tmp table | SELECT count(0) FROM (SELECT users.id users_id, users.userName, users.loginName, users.uqq, users.um |
| 2517271 | testUser | 192.168.1.201:57470 | testt | Query | 3699 | Copying to tmp table | SELECT count(0) FROM (SELECT users.id users_id, users.userName, users.loginName, users.uqq, users.um |
| 2517272 | testUser | 192.168.1.201:57490 | testt | Query | 3657 | Copying to tmp table | SELECT count(0) FROM (SELECT users.id users_id, users.userName, users.loginName, users.uqq, users.um |
| 2517273 | testUser | 192.168.1.201:57538 | testt | Query | 3655 | Copying to tmp table | SELECT count(0) FROM (SELECT users.id users_id, users.userName, users.loginName, users.uqq, users.um |
| 2517274 | testUser | 192.168.1.201:57542 | testt | Query | 3602 | Copying to tmp table | SELECT count(0) FROM (SELECT users.id users_id, users.userName, users.loginName, users.uqq, users.um |
| 2517279 | testUser | 192.168.1.201:57686 | testt | Query | 3462 | Copying to tmp table | SELECT count(0) FROM (SELECT users.id users_id, users.userName, users.loginName, users.uqq, users.um |
| 2517316 | testUser | 192.168.1.201:58604 | testt | Query | 0 | NULL | show processlist |
| 2517317 | testUser | 192.168.1.201:58610 | testt | Sleep | 2631 | | NULL |
+---------+-----------+------------------+---------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
11 rows in set (0.14 sec)
执行 show processlist;
因为我的用户只有testt的权限,所以只能看到这个权限下的进程,如果你是root可以查看到所有用户的进程。
3 正好怀疑是这个用户连接导致的,所以把这些进程杀掉
mysql> kill 2517262;
Query OK, 0 rows affected (0.04 sec)
mysql> kill 2517265;
Query OK, 0 rows affected (0.04 sec)
mysql> kill 2517270;
Query OK, 0 rows affected (0.04 sec)
mysql> kill 2517271;
Query OK, 0 rows affected (0.04 sec)
mysql> kill 2517272;
Query OK, 0 rows affected (0.04 sec)
mysql> kill 2517273;
Query OK, 0 rows affected (0.04 sec)
mysql> kill 2517274;
Query OK, 0 rows affected (0.04 sec)
kill后慢查询日志中有多个输出
# User@Host: testUser[testUser] @ [192.168.1.201]
# Query_time: 3977.501006 Lock_time: 0.000203 Rows_sent: 0 Rows_examined: 0
use testt;
SET timestamp=1588142826;
SELECT count(0) FROM (SELECT users.id users_id, users.userName, users.loginName, users.uqq, users.umobile, count(con.users_id) conCount, MAX(con.date) date FROM theuser.users users LEFT JOIN shopmm.consignee con ON (con.userName = users.userName OR con.suserName = users.userName) LEFT JOIN shopmm.consigneesub consub ON con.id = consub.con_id WHERE 1 = 1 AND con.static = 4 AND (con.isfrom = 1 OR (con.isfrom = 2 AND con.ishopfrom = 1)) GROUP BY users.loginName) table_count;
# Time: 200429 14:47:19
# User@Host: testUser[testUser] @ [192.168.1.201]
# Query_time: 3968.516205 Lock_time: 0.000118 Rows_sent: 0 Rows_examined: 0
SET timestamp=1588142839;
SELECT count(0) FROM (SELECT users.id users_id, users.userName, users.loginName, users.uqq, users.umobile, count(con.users_id) conCount, MAX(con.date) date FROM theuser.users users LEFT JOIN shopmm.consignee con ON (con.userName = users.userName OR con.suserName = users.userName) LEFT JOIN shopmm.consigneesub consub ON con.id = consub.con_id WHERE 1 = 1 AND con.static = 4 AND (con.isfrom = 1 OR (con.isfrom = 2 AND con.ishopfrom = 1)) GROUP BY users.loginName) table_count;
原来是这条语句导致,让开发优化语句即可
4 查看show processlist时的状态
Copying to tmp table代表临时结果集太大,超过数据库规定的临时内存大小,需要拷贝临时结果集到磁盘上
查看/etc/my.ini配置中tmp_table_size = 64M
所以判定应该时查询出来的数据过大导致消耗cpu过多,我这个都执行了1h还没执行完,后来还是kill进程,cpu才恢复了
参考:
https://blog.csdn.net/bolg_hero/article/details/70576516
https://dev.mysql.com/doc/refman/8.0/en/show-processlist.html