MySQL slow queries saturating the CPU

Symptoms

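A large number of slow SQL queries saturated all 8 CPU cores. The figure shows CPU usage before and after the fix.

[Figure: CPU usage before and after the fix]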

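Basic instance information:

  • Database version: MariaDB 10.0.27
  • Hardware: 8 CPU cores + 16 GB RAM + 500 GB SSD
  • The data volume in the database exceeds the configured InnoDB buffer pool size
  • Table schema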
CREATE TABLE `tablename` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `gid` varchar(36) NOT NULL COMMENT 'global identifier',
  `create_time` int(11) NOT NULL COMMENT 'creation time',
  `update_time` int(11) NOT NULL COMMENT 'update time',
  `biz_type` varchar(16) DEFAULT NULL COMMENT 'business number',
  `biz_chnl` int(11) NOT NULL DEFAULT '-1' COMMENT 'business channel number',
  `user_gid` char(36) NOT NULL COMMENT 'globally unique user identifier',
  `trans_gid` char(36) NOT NULL COMMENT 'unique transaction identifier; message type to gid mapping: 0: credit limit change record gid, 1: loan gid, 2: repayment gid',
  `msg_type` int(11) NOT NULL COMMENT 'message type: 0: user credit limit change, 1: loan status change, 2: repayment status change',
  `msg_status` int(11) NOT NULL COMMENT 'message status: 0: processing, 1: processed successfully, 2: processing failed',
  `msg_content` varchar(4096) NOT NULL COMMENT 'message content in JSON format',
  `return_code` varchar(16) DEFAULT NULL COMMENT 'return status code',
  `return_desc` varchar(4000) DEFAULT NULL COMMENT 'status description',
  `trans_return_code` varchar(16) DEFAULT NULL COMMENT 'transaction status code',
  `trans_return_desc` varchar(255) DEFAULT NULL COMMENT 'transaction result description',
  `retry_next_time` int(11) NOT NULL DEFAULT '0' COMMENT 'next retry time',
  `retry_num` int(11) NOT NULL DEFAULT '0' COMMENT 'retry count',
  `status` int(11) NOT NULL DEFAULT '0' COMMENT 'send status: 0 ready, 1 sent successfully, 2 send failed, 3 result unknown',
  `is_valid` bit(1) NOT NULL DEFAULT b'1' COMMENT 'whether the row is valid',
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_gid` (`gid`),
  KEY `idx_create_time` (`create_time`),
  KEY `idx_update_time` (`update_time`),
  KEY `idx_status` (`status`),
  KEY `idx_bizt_chnl` (`biz_type`,`biz_chnl`),
  KEY `idx_return_trans_code` (`return_code`,`trans_return_code`),
  KEY `idx_trans_gid` (`trans_gid`) USING BTREE,
  KEY `idx_user_gid` (`user_gid`)
) ENGINE=InnoDB AUTO_INCREMENT=9630777 DEFAULT CHARSET=utf8 COMMENT='x x x x'
  • Table row count
MariaDB [sailfish]> select count(*) from tablename;
+----------+
| count(*) |
+----------+
|  9630917 |
+----------+    

Rewriting the SQL

  • The developers' original SQL

MariaDB [sailfish]> SELECT id, gid,create_time,update_time,biz_type,biz_chnl,user_gid,trans_gid,msg_type,msg_status,msg_content,return_code,return_desc,trans_return_code,trans_return_desc,retry_next_time,retry_num,status,is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511333699 ) order by id limit 500;
Empty set (15.40 sec)   
MariaDB [sailfish]> explain SELECT id, gid,create_time,update_time,biz_type,biz_chnl,user_gid,trans_gid,msg_type,msg_status,msg_content,return_code,return_desc,trans_return_code,trans_return_desc,retry_next_time,retry_num,status,is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511333699 ) order by id limit 500;
+------+-------------+--------------------+-------+--------------------------+---------+---------+------+------+-------------+
| id   | select_type | table              | type  | possible_keys            | key     | key_len | ref  | rows | Extra       |
+------+-------------+--------------------+-------+--------------------------+---------+---------+------+------+-------------+
|    1 | SIMPLE      | tablename          | index | idx_status,idx_bizt_chnl | PRIMARY | 4       | NULL | 6578 | Using where |
+------+-------------+--------------------+-------+--------------------------+---------+---------+------+------+-------------+ 
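
One possible reading of this plan, offered as a hedged sketch and not a confirmed diagnosis: with `order by id limit 500` the optimizer walks the PRIMARY key in id order (type `index`), which pays off only if 500 matching rows turn up early. The query returns an empty set, so no row ever satisfies the filter and the walk cannot stop before the end of the table. The toy model below uses made-up in-memory rows (not the real data), and `ordered_scan` is a hypothetical helper that counts how many rows such an ordered scan examines.

```python
def ordered_scan(rows, predicate, limit):
    """Walk rows in primary-key order and stop once `limit` matches are found.

    Returns (matches, rows_examined).
    """
    matches = []
    examined = 0
    for row in rows:
        examined += 1
        if predicate(row):
            matches.append(row)
            if len(matches) == limit:
                break
    return matches, examined


N = 100_000   # toy table size, far smaller than the real 9.6 million rows
LIMIT = 500


def is_failed(row):
    # Stand-in for the real WHERE clause: row is an (id, status) tuple.
    return row[1] == 2


# Case 1: every fourth row has status = 2, so matches are easy to find.
dense_rows = [(i, i % 4) for i in range(1, N + 1)]
dense_matches, dense_examined = ordered_scan(dense_rows, is_failed, LIMIT)

# Case 2: no row has status = 2, like the "Empty set" result above.
empty_rows = [(i, i % 2) for i in range(1, N + 1)]
empty_matches, empty_examined = ordered_scan(empty_rows, is_failed, LIMIT)

print("dense: matches =", len(dense_matches), "rows examined =", dense_examined)
print("empty: matches =", len(empty_matches), "rows examined =", empty_examined)
```

In the dense case the scan stops after a small fraction of the table. In the empty case it examines every row, which is consistent with the 15.40 sec runtime even though the `rows` estimate in the plan is only 6578.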

  • Rewritten SQL with limit 500 removed

MariaDB [sailfish]> SELECT id, gid,create_time,update_time,biz_type,biz_chnl,user_gid,trans_gid,msg_type,msg_status,msg_content,return_code,return_desc,trans_return_code,trans_return_desc,retry_next_time,retry_num,status,is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511333699 ) order by id;
Empty set (1.86 sec) 
MariaDB [sailfish]> explain SELECT id, gid,create_time,update_time,biz_type,biz_chnl,user_gid,trans_gid,msg_type,msg_status,msg_content,return_code,return_desc,trans_return_code,trans_return_desc,retry_next_time,retry_num,status,is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511333699 ) order by id;
+------+-------------+--------------------+------+--------------------------+---------------+---------+-------+--------+----------------------------------------------------+
| id   | select_type | table              | type | possible_keys            | key           | key_len | ref   | rows   | Extra                                              |
+------+-------------+--------------------+------+--------------------------+---------------+---------+-------+--------+----------------------------------------------------+
|    1 | SIMPLE      | tablename          | ref  | idx_status,idx_bizt_chnl | idx_bizt_chnl | 51      | const | 708040 | Using index condition; Using where; Using filesort |
+------+-------------+--------------------+------+--------------------------+---------------+---------+-------+--------+----------------------------------------------------+ 
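  • Rewritten SQL with both order by and limit 500 removed (the first fix; it helped at first, but after a week the query time gradually grew again and the CPU was saturated just as before)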
MariaDB [sailfish]> SELECT id, gid,create_time,update_time,biz_type,biz_chnl,user_gid,trans_gid,msg_type,msg_status,msg_content,return_code,return_desc,trans_return_code,trans_return_desc,retry_next_time,retry_num,status,is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511333699 );
Empty set (2.18 sec)

MariaDB [sailfish]> explain SELECT id, gid,create_time,update_time,biz_type,biz_chnl,user_gid,trans_gid,msg_type,msg_status,msg_content,return_code,return_desc,trans_return_code,trans_return_desc,retry_next_time,retry_num,status,is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511333699 );
+------+-------------+--------------------+------+--------------------------+---------------+---------+-------+--------+------------------------------------+
| id   | select_type | table              | type | possible_keys            | key           | key_len | ref   | rows   | Extra                              |
+------+-------------+--------------------+------+--------------------------+---------------+---------+-------+--------+------------------------------------+
|    1 | SIMPLE      | tablename          | ref  | idx_status,idx_bizt_chnl | idx_bizt_chnl | 51      | const | 708040 | Using index condition; Using where |
+------+-------------+--------------------+------+--------------------------+---------------+---------+-------+--------+------------------------------------+           

Index change

  • Before adding a composite index based on the where conditions
MariaDB [test]> SELECT id, gid, create_time, update_time, biz_type, biz_chnl, user_gid, trans_gid, msg_type, msg_status, msg_content, return_code, return_desc, trans_return_code, trans_return_desc, retry_next_time, retry_num, status, is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511483534 );
Empty set (13.34 sec)

MariaDB [test]> explain SELECT id, gid, create_time, update_time, biz_type, biz_chnl, user_gid, trans_gid, msg_type, msg_status, msg_content, return_code, return_desc, trans_return_code, trans_return_desc, retry_next_time, retry_num, status, is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511483534 );
+------+-------------+--------------------+------+----------------------------------------------+---------------+---------+-------+--------+------------------------------------+
| id   | select_type | table              | type | possible_keys                                | key           | key_len | ref   | rows   | Extra                              |
+------+-------------+--------------------+------+----------------------------------------------+---------------+---------+-------+--------+------------------------------------+
|    1 | SIMPLE      | tablename          | ref  | idx_status,idx_bizt_chnl,idx_retry_next_time | idx_bizt_chnl | 51      | const | 770126 | Using index condition; Using where |
+------+-------------+--------------------+------+----------------------------------------------+---------------+---------+-------+--------+------------------------------------+
1 row in set (0.06 sec)
                                         
  • After adding a composite index based on the where conditions
pt-online-schema-change -uroot -pxxxxxx --charset=utf8 --alter='add index idx_multi(`status`,`biz_type`)' --dry-run --nocheck-replication-filters --recursion-method=none --print D=sailfish,t=tablename

MariaDB [test]> SELECT id, gid, create_time, update_time, biz_type, biz_chnl, user_gid, trans_gid, msg_type, msg_status, msg_content, return_code, return_desc, trans_return_code, trans_return_desc, retry_next_time, retry_num, status, is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511483534 );
Empty set (0.06 sec)

MariaDB [test]> explain SELECT id, gid, create_time, update_time, biz_type, biz_chnl, user_gid, trans_gid, msg_type, msg_status, msg_content, return_code, return_desc, trans_return_code, trans_return_desc, retry_next_time, retry_num, status, is_valid FROM tablename WHERE ( status = 2 AND biz_type = 'swqian' AND retry_next_time < 1511483534 );
+------+-------------+--------------------+------+--------------------------------------------------------+-----------+---------+-------------+------+------------------------------------+
| id   | select_type | table              | type | possible_keys                                          | key       | key_len | ref         | rows | Extra                              |
+------+-------------+--------------------+------+--------------------------------------------------------+-----------+---------+-------------+------+------------------------------------+
|    1 | SIMPLE      | tablename          | ref  | idx_status,idx_bizt_chnl,idx_retry_next_time,idx_multi | idx_multi | 55      | const,const |    1 | Using index condition; Using where |
+------+-------------+--------------------+------+--------------------------------------------------------+-----------+---------+-------------+------+------------------------------------+
1 row in set (0.01 sec)                  
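
The `key_len` values in these plans can be reproduced by hand, which helps confirm how much of each index is actually used. The sketch below assumes the usual MySQL/MariaDB accounting: 4 bytes for an `int`, up to 3 bytes per character for the `utf8` charset, 2 length bytes for a `varchar`, and 1 extra byte when the column is nullable. The helper names are hypothetical and exist only for this illustration.

```python
UTF8_MAX_BYTES_PER_CHAR = 3  # MySQL/MariaDB "utf8" stores at most 3 bytes per char


def int_key_len(nullable):
    """Key length of an int column: 4 bytes, plus 1 if it can be NULL."""
    return 4 + (1 if nullable else 0)


def varchar_key_len(chars, nullable, bytes_per_char=UTF8_MAX_BYTES_PER_CHAR):
    """Key length of a varchar column: data bytes + 2 length bytes (+1 if nullable)."""
    return chars * bytes_per_char + 2 + (1 if nullable else 0)


# Column definitions taken from the CREATE TABLE statement in this article.
biz_type = varchar_key_len(16, nullable=True)   # varchar(16) DEFAULT NULL
biz_chnl = int_key_len(nullable=False)          # int(11) NOT NULL
status = int_key_len(nullable=False)            # int(11) NOT NULL

# idx_bizt_chnl (biz_type, biz_chnl): only the biz_type prefix is used.
key_len_bizt_prefix = biz_type
# idx_bizt_chnl with both columns, for comparison.
key_len_bizt_full = biz_type + biz_chnl
# idx_multi (status, biz_type): both columns matched by equality.
key_len_multi = status + biz_type

print("idx_bizt_chnl, biz_type only:", key_len_bizt_prefix)
print("idx_bizt_chnl, both columns: ", key_len_bizt_full)
print("idx_multi, both columns:     ", key_len_multi)
```

A `key_len` of 51 with `ref` = `const` indicates that the old plan used only the `biz_type` part of `idx_bizt_chnl`, leaving `status` to be filtered across several hundred thousand candidate rows. A `key_len` of 55 with `ref` = `const,const` indicates that `idx_multi` matches both `status` and `biz_type` inside the index. `retry_next_time` is not part of `idx_multi`, so that condition is still checked on the rows the index returns.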

Problems encountered while adding the index

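  • On the first attempt to alter the table with pt-online-schema-change, more than 40 slow queries were running and pt-online-schema-change was blocked. After 2 minutes of waiting progress was poor: the intermediate table held only a few hundred rows. We stopped pt-online-schema-change and dropped the intermediate table and the triggers.
  • We confirmed with the developers that these 40-odd slow SQL queries could be disabled temporarily, and development and operations together commented out the feature.
  • We ran pt-online-schema-change again to alter the table, which took 10 minutes. The feature was then restored and CPU usage stabilized, as shown in the first figure.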

Tools used to locate the problem

  • PMM for monitoring slow SQL, user CPU, and MySQL user statistics
  • MariaDB server audit log
  • ELK for showing the number of slow SQL queries per second
  • Linux: top
  • MySQL statistics: select * from information_schema.processlist where command='query' order by time\G