摘要
- InnoDB 行级锁排查
- MDL 元数据锁排查
- 死锁排查
InnoDB ‘行级锁’排查
当反馈有锁等待产生时,第一步可以通过以下命令诊断:
mysql> select locked_table,waiting_pid,waiting_query,blocking_trx_id,blocking_pid from sys.innodb_lock_waits;
+--------------+-------------+------------------------+-----------------+--------------+
| locked_table | waiting_pid | waiting_query | blocking_trx_id | blocking_pid |
+--------------+-------------+------------------------+-----------------+--------------+
| `test`.`t` | 250423 | insert into t select 1 | 2338 | 250422 |
+--------------+-------------+------------------------+-----------------+--------------+
# MySQL 5.7.x 高版本中语句为
mysql> select locked_table,waiting_pid,waiting_query,blocking_trx_id,blocking_pid from sys.x$innodb_lock_waits;
可以根据查找到的thread ID定位线程正在执行的操作
mysql> select * from information_schema.processlist where Command != 'Sleep';
+--------+------+--------------------+------+------------------+---------+---------------------------------------------------------------+-----------------------------------------------------------------------+
| ID | USER | HOST | DB | COMMAND | TIME | STATE | INFO |
+--------+------+--------------------+------+------------------+---------+---------------------------------------------------------------+-----------------------------------------------------------------------+
| 250403 | root | localhost | sys | Query | 0 | executing | select * from information_schema.processlist where Command != 'Sleep' |
| 64312 | repl | 10.186.65.33:53714 | NULL | Binlog Dump GTID | 2506984 | Master has sent all binlog to slave; waiting for more updates | NULL |
+--------+------+--------------------+------+------------------+---------+---------------------------------------------------------------+-----------------------------------------------------------------------+
sys.innodb_lock_waits表结构
mysql> desc sys.innodb_lock_waits;
+------------------------------+---------------------+------+-----+---------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------------+---------------------+------+-----+---------------------+-------+
| wait_started | datetime | YES | | NULL | |
| wait_age | time | YES | | NULL | |
| wait_age_secs | bigint(21) | YES | | NULL | |
| locked_table | varchar(1024) | NO | | | |
| locked_index | varchar(1024) | YES | | NULL | |
| locked_type | varchar(32) | NO | | | |
| waiting_trx_id | varchar(18) | NO | | | |
| waiting_trx_started | datetime | NO | | 0000-00-00 00:00:00 | |
| waiting_trx_age | time | YES | | NULL | |
| waiting_trx_rows_locked | bigint(21) unsigned | NO | | 0 | |
| waiting_trx_rows_modified | bigint(21) unsigned | NO | | 0 | |
| waiting_pid | bigint(21) unsigned | NO | | 0 | |
| waiting_query | longtext | YES | | NULL | |
| waiting_lock_id | varchar(81) | NO | | | |
| waiting_lock_mode | varchar(32) | NO | | | |
| blocking_trx_id | varchar(18) | NO | | | |
| blocking_pid | bigint(21) unsigned | NO | | 0 | |
| blocking_query | longtext | YES | | NULL | |
| blocking_lock_id | varchar(81) | NO | | | |
| blocking_lock_mode | varchar(32) | NO | | | |
| blocking_trx_started | datetime | NO | | 0000-00-00 00:00:00 | |
| blocking_trx_age | time | YES | | NULL | |
| blocking_trx_rows_locked | bigint(21) unsigned | NO | | 0 | |
| blocking_trx_rows_modified | bigint(21) unsigned | NO | | 0 | |
| sql_kill_blocking_query | varchar(32) | YES | | NULL | |
| sql_kill_blocking_connection | varchar(26) | YES | | NULL | |
+------------------------------+---------------------+------+-----+---------------------+-------+
MySQL MDL(元数据锁)排查
背景
-
官方资料
MySQL使用元数据定来管理对数据库对象的并发访问并确保数据一致性。元数据锁定不仅适用于表,还适用于
Schema
,存储程序
(存储过程,函数,触发器和调度事件)和表空间
-
Performance_Schema.metadata_locks
表记录了元数据锁定信息,这对于查看哪些会话持有锁,被阻塞等待锁等等非常有用- 默认情况下该表没有任何记录,原因参考(MDL锁检测开启)
- 谁持有了锁(显示哪些会话拥有当前元数据锁定)
- 已请求但尚未持有的锁(显示哪些会话正在等待哪些元数据锁定)。
- 被死锁检测器杀死的锁请求
- 超时的锁请求和等待请求会话的锁请求被丢弃
-
MDL锁检测开启
- 元数据锁检测使用
wait/lock/metadata/sql/mdl
工具,默认情况下该工具为禁用
状态,也就导致metadata_locks
该表不会有记录 - 启动MDL锁检测
# 方法一:写入my.cnf配置文件,并启动数据库 [mysqld] performance-schema-instrument='wait/lock/metadata/sql/mdl=ON' # 方法二:在数据库运行时控制元数据锁定检测状态,更新setup_instruments表 UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';
- 关闭MDL锁检测
UPDATE performance_schema.setup_instruments SET ENABLED = 'NO', TIMED = 'NO' WHERE NAME = 'wait/lock/metadata/sql/mdl';
- 元数据锁检测使用
排查过程
MDL场景
事物隔离级别
当前事物隔离级别为RR (repeatable read),可重复读锁的语句
# session 1
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> lock table test write;
Query OK, 0 rows affected (0.00 sec)
# session 2
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> delete from test where a = 5;
## wait...
# session 3
mysql> show processlist;
+----+-------+-----------+--------------------+---------+------+---------------------------------+------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------+-----------+--------------------+---------+------+---------------------------------+------------------------------+
| 2 | ha_op | localhost | NULL | Sleep | 0 | | NULL |
| 3 | ha_op | localhost | NULL | Sleep | 0 | | NULL |
| 9 | ha_op | localhost | NULL | Sleep | 0 | | NULL |
| 13 | ha_op | localhost | NULL | Sleep | 0 | | NULL |
| 39 | ha_op | localhost | NULL | Sleep | 0 | | NULL |
| 60 | root | localhost | test | Sleep | 78 | | NULL |
| 61 | root | localhost | test | Query | 72 | Waiting for table metadata lock | delete from test where a = 5 |
| 63 | root | localhost | performance_schema | Query | 0 | starting | show processlist |
+----+-------+-----------+--------------------+---------+------+---------------------------------+------------------------------+
8 rows in set (0.00 sec)
排查与分析
- 通过engine innodb status可供查看的信息非常少
------------
TRANSACTIONS
------------
···············
---TRANSACTION 421935543116288, not started
mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
··················
- 通过关联performance_schema库下metadata_locks和threads表可以找到对应的线程ID及锁情况
mysql> select m.*,t.processlist_id as pid from performance_schema.metadata_locks m,performance_schema.threads t where m.OWNER_THREAD_ID = t.THREAD_ID and m.OBJECT_SCHEMA != 'performance_schema';
+-------------+---------------+-------------+-----------------------+----------------------+---------------+-------------+--------+-----------------+----------------+------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | SOURCE | OWNER_THREAD_ID | OWNER_EVENT_ID | pid |
+-------------+---------------+-------------+-----------------------+----------------------+---------------+-------------+--------+-----------------+----------------+------+
| SCHEMA | test | NULL | 140458585528672 | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | | 87 | 52 | 60 |
| TABLE | test | test | 140458585457360 | SHARED_NO_READ_WRITE | TRANSACTION | GRANTED | | 87 | 52 | 60 |
| TABLE | test | test | 140459197873008 | SHARED_WRITE | TRANSACTION | PENDING | | 88 | 25 | 61 |
+-------------+---------------+-------------+-----------------------+----------------------+---------------+-------------+--------+-----------------+----------------+------+
3 rows in set (0.00 sec)
# 并且可以通过对应线程id,与sys库中session表中线程id找到其processlist—id、正在执行的状态、以及该线程最后一次执行的语句:
select * from sys.session s,performance_schema.metadata_locks m where s.thd_id = m.THREAD_ID;
-
与InnoDB引擎类似的是如lock table xx write 这样的SQL语句:
- 其首先会对
GLOBAL
级别加INTENTION_EXCLUSIVE锁(意向排它锁
) - 再对
SCHEMA
级别加INTENTION_EXCLUSIVE锁 - 最后对
TABLE
级别加SHARED_NO_READ_WRITE锁
(持有该锁可访问表结构并且读写表数据,并且禁止其它事务读写)
- 其首先会对
-
根据上图反馈信息可知
- LOCK_STATUS处于
PENDING
状态的线程正在等待MDL - 持有
SHARED_NO_READ_WRITE
锁及处于GRANTED
状态的线程持有了对象为test表的表级
MDL锁。
- LOCK_STATUS处于
解决办法
表级
元数据锁可以等待其操作完成,或者unlock table xxx
释放的方式进行解锁,但在特殊情况下也可以通过 kill PID操作进行释放。
死锁排查
死锁场景
事物隔离级别
当前事物隔离级别为RR (repeatable read),可重复读
表结构及表数据
# 表结构信息
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_a` (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=16 DEFAULT CHARSET=utf8mb4
# 表数据
+----+------+------+
| id | a | b |
+----+------+------+
| 1 | 1 | 2 |
| 2 | 2 | 3 |
| 3 | 4 | 5 |
| 4 | 3 | 5 |
| 6 | 6 | 5 |
| 15 | 5 | 10 |
+----+------+------+
死锁操作顺序:
T1 | T2 |
---|---|
begin; | - |
delete from test where a = 5; | begin; |
- | delete from test where a = 5; |
insert into test select null,5,12; | - |
- | ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction |
操作执行顺序从上到下依次执行
事务1 :mysql> start transaction;
事务2: mysql> start transaction ;
事务1:mysql> delete from test where a=2;
事务2:mysql> delete from test where a=2; #wait
事务1:mysql> insert into test select 10,2;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarti ng transaction
innodb引擎死锁日志分析
使用show engine innodb status\G查看死锁日志如下
------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-03-08 09:31:16 0x7f73700a2700
*** (1) TRANSACTION:
TRANSACTION 13420, ACTIVE 30 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 12, OS thread handle 140133777950464, query id 3001 localhost root updating
delete from test where a = 5
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 3685 page no 4 n bits 80 index idx_a of table `test`.`test` trx id 13420 lock_mode X waiting
Record lock, heap no 8 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000005; asc ;;
1: len 4; hex 8000000f; asc ;;
*** (2) TRANSACTION:
TRANSACTION 13415, ACTIVE 40 sec inserting
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 2
MySQL thread id 11, OS thread handle 140133777680128, query id 3002 localhost root executing
insert into test select null,5,12
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 3685 page no 4 n bits 80 index idx_a of table `test`.`test` trx id 13415 lock_mode X locks rec but not gap
Record lock, heap no 8 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000005; asc ;;
1: len 4; hex 8000000f; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 3685 page no 4 n bits 80 index idx_a of table `test`.`test` trx id 13415 lock mode S waiting
Record lock, heap no 8 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000005; asc ;;
1: len 4; hex 8000000f; asc ;;
*** WE ROLL BACK TRANSACTION (1)
engine status日志分析详细过程
*** (1) TRANSACTION: #事务2
TRANSACTION 13420, ACTIVE 30 sec starting index read
事务编号为 13420
,活跃30秒,starting index read
表示事务状态为根据索引读取数据。常见的其他状态:
-
fetching rows
表示事务状态在row_search_for_mysql中被设置,表示正在查找记录。 -
updating or deleting
表示事务已经真正进入了Update/delete的函数逻辑(row_update_for_mysql) -
thread declared inside
InnoDB 说明事务已经进入innodb层。通常而言 不在innodb层的事务大部分是会被回滚的。
mysql tables in use 1, locked 1
说明当前的事务使用一个表。locked 1 表示表上有一个意向表锁,对于DML语句为LOCK_IX
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
LOCK WAIT表示正在等待锁, 2 lock struct(s) 表示trx->trx_locks锁链表的长度为2,每个链表节点代表该事务持有的一个锁结构,包括表锁、记录锁以及auto_inc锁等。这里 2 locks 表示IX锁和lock_mode X(Next-key lock)#InnoDB对于行的查询都是采用了Next-Key Lock的算法,锁定的不是单个值,而是一个范围(GAP)
heap size 1136 表示事务分配的锁堆内存大小,一般没有什么具体的用处。
1 row lock(s) 表示当前事务持有的行记录锁/gap 锁的个数。
delete from ty where a=5
表示事务2在执行的sql ,不过当前的MySQL版本是show engine innodb status 是查看不到完整的事务的sql的,通常显示当前正在等待锁的sql
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 3685 page no 4 n bits 80 index idx_a of table `test`.`test` trx id 13420 lock_mode X waiting
Record lock, heap no 8 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000005; asc ;;
1: len 4; hex 8000000f; asc ;;
RECORD LOCKS 表示记录锁,space id为3685,page号为4 ,n bits 80 表示这个聚集索引记录锁结构上留有80个Bit位
表示事务2 正在等待表 test 上的索引 idx_a
的 X 锁其实该场景是是Next-Key lock
事务1的部分日志和上面分析类似,不再赘述,其它日志分析如下
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 3685 page no 4 n bits 80 index idx_a of table `test`.`test` trx id 13415 lock_mode X locks rec but not gap
Record lock, heap no 8 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000005; asc ;;
1: len 4; hex 8000000f; asc ;;
显示了事务1
持有了lock_mode X locks rec but not gap
该锁模式 ,不过我们从日志里面看不到事务1
执行的 delete from test where a = 5 语句;这点也是造成仅仅根据engine status日志难以分析死锁的问题的根本原因。
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 3685 page no 4 n bits 80 index idx_a of table `test`.`test` trx id 13415 lock mode S waiting
Record lock, heap no 8 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000005; asc ;;
1: len 4; hex 8000000f; asc ;;
表示事物1
的insert语句正在等待lock mode S waiting
*** WE ROLL BACK TRANSACTION (1)
表示innodb引擎执行回滚事物2
的操作
日志中的锁组合模式
上文分析中需要注意的是锁组合模式,类似lock_mode X waiting ,lock_mode X,lock_mode X locks gap before rec insert intention waiting 这是分析死锁的核心重点。如何理解锁组合?
-
首先我们要知道对于MySQL有两种常规锁模式
- LOCK_S(读锁,共享锁)
- LOCK_X(写锁,排它锁)
- 最容易理解的锁模式,读加共享锁,写加排它锁.
-
锁的属性有如下几种
- LOCK_REC_NOT_GAP (锁记录)
- LOCK_GAP (锁记录前的GAP,开区间)
- LOCK_ORDINARY (同时锁记录+记录前的GAP 。即 Next Key锁,前开后闭区间)
- LOCK_INSERT_INTENTION (插入意向锁,其实是特殊的GAP锁)
-
锁的属性可以与锁模式任意组合。例如.
- lock->type_mode 可以是Lock_X 或者Lock_S
- locks gap before rec 表示为gap锁:lock->type_mode & LOCK_GAP
- locks rec but not gap 表示为记录锁,非gap锁:lock->type_mode & LOCK_REC_NOT_GAP
- insert intention 表示为插入意向锁:lock->type_mode & LOCK_INSERT_INTENTION
- waiting 表示锁等待:lock->type_mode & LOCK_WAIT
分析结果
- 通过日志分析可以知道
- 事务1是TRANSACTION 13415,事务2是RANSACTION 13420
- 死锁发生时,事务1处于inserting状态,事务2处于waiting lock的delete状态
- 事务1在执行delete语句时会持有 X Locks rec but not gap(混合模式写锁)
- 事务2在执行delete语句时会进入X lock waiting状态(等待写锁)
- 事务1在执行insert语句时由于
a
字段是唯一索引,要做duplicate-key检测,检查时需要申请S Lock。那么,insert时由于事务2执行delete时申请的X lock还在等待中,事务1申请的S Lock也需要在X Lock waiting后等待,而这时事务2又持有X Lock锁无法释放,如此事务1和事务2陷入了循环等待,死锁出现。
- 如果将上面的死锁操作中的事务1的insert操作换成:update test set b =10 where a=5; 其engine status 日志如下
*** (2) TRANSACTION:
TRANSACTION 13422, ACTIVE 7 sec starting index read
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140133777680128, query id 3010 localhost root updating
update test set b =10 where a=5
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 3685 page no 4 n bits 80 index idx_a of table `test`.`test` trx id 13422 lock_mode X locks rec but not gap
Record lock, heap no 8 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000005; asc ;;
1: len 4; hex 8000000f; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 3685 page no 4 n bits 80 index idx_a of table `test`.`test` trx id 13422 lock_mode X waiting
Record lock, heap no 8 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000005; asc ;;
1: len 4; hex 8000000f; asc ;;
可以发现TRANSACTION 13422(事务1)中的waiting for this lock to be granted 中的锁是X Lock waiting
通过分析可以知道:
- 对于唯一性索引执行insert 操作时会进行duplicate-key检测,并在检测中为了数据不会被改变施加 S Lock锁
死锁相关策略
当MySQL出现死锁,有以下策略
- innodb_lock_wait_timeout 设置超时参数,默认50s,当出现死锁后第一个被锁住的线程过50s后超时退出
- innodb_deadlock_detect 设置死锁检测,默认on开启,发现死锁后,主动回滚死锁链条中某一个事物,让其它事物得以继续执行。
- 最终,减少死锁的主要方向,还是控制访问相同资源的并发事务量
相关技术点
死锁分析需要涉及的技术知识点:
- 业务逻辑sql及执行场景
- 事物隔离级别
- MySQL的锁机制
- 各种锁之间的兼容性
- 最好能了解源码