mysql简单排查&终止死锁线程
INFORMATION_SCHEMA
MySQL把INFORMATION_SCHEMA看作是一个【数据库】
在INFORMATION_SCHEMA中,有若干个只读表。这些只读表实际上是视图而不是基本表,因此,无法看到与之相关的任何文件
在INFORMATION_SCHEMA的这些表中,有3个表可以帮助我们排查死锁
- INNODB_LOCKS:现在获取的锁,但是不含没有获取的锁,而且只是针对INNODB的。
- INNODB_LOCK_WAITS:系统锁等待相关信息,包含了阻塞的一行或者多行的记录,包含锁请求和被阻塞锁请求的锁信息等。
- INNODB_TRX:包含了所有正在执行的事务相关信息(INNODB),而且包含了事务是否被阻塞或者请求锁。
1:查看当前活跃的事务
SELECT * FROM INFORMATION_SCHEMA.INNODB_TRX;
2:查看当前已经获取锁的事务
SELECT * FROM INFORMATION_SCHEMA.INNODB_LOCKS;
3:查看当前等待锁的事务
SELECT * FROM INFORMATION_SCHEMA.INNODB_LOCK_WAITS;
or
SELECT
waiting_trx_id,
waiting_pid,
waiting_query,
blocking_trx_id,
blocking_pid,
blocking_query
FROM sys.innodb_lock_waits;
找到导致死锁的事务后,直接kill related_mysql_thread_id
结束事务
死锁解决方式
- 预防死锁。例如固定持有锁的先后顺序(该方法不具备普适性),缩小锁的影响范围
To reduce the possibility of deadlocks,
- use transactions rather than LOCK TABLES statements; (缩小锁的影响范围)
- keep transactions that insert or update data small enough that they do not stay open for long periods of time; (缩小锁的影响范围)
- when different transactions update multiple tables or large ranges of rows, use the same order of operations (such as SELECT ... FOR UPDATE) in each transaction; (固定持有锁的先后顺序)
- create indexes on the columns used in SELECT ... FOR UPDATE and UPDATE ... WHERE statements. (缩小锁的影响范围)
The possibility of deadlocks is not affected by the isolation level, because the isolation level changes the behavior of read operations, while deadlocks occur because of write operations.
- 处理死锁。必然需要有一方放弃资源占用,可以依据优先级、等待时间等条件选择放弃资源的一方
更多详见官网文档
https://dev.mysql.com/doc/refman/5.7/en/innodb-information-schema-examples.html
解读死锁日志
InnoDB monitor output里关于锁的描述释义如下
记录锁(LOCK_REC_NOT_GAP): lock_mode X locks rec but not gap
间隙锁(LOCK_GAP): lock_mode X locks gap before rec
Next-key 锁(LOCK_ORNIDARY): lock_mode X
插入意向锁(LOCK_INSERT_INTENTION): lock_mode X locks gap before rec insert intention
example1
> show engine innodb status; // InnoDB monitor output reads as following
------------------------
LATEST DETECTED DEADLOCK
------------------------
*** (1) TRANSACTION: #事务1
TRANSACTION 462308399, ACTIVE 33 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 3525577, OS thread handle 0x7f896cc4b700, query id 780039657 localhost root updating
delete from ty where a=5
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 219 page no 4 n bits 72 index `idxa` of table `test`.`ty` trx id 462308399 lock_mode X waiting
*** (2) TRANSACTION: #事务2
TRANSACTION 462308398, ACTIVE 61 sec inserting, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1184, 4 row lock(s), undo log entries 2
MySQL thread id 3525490, OS thread handle 0x7f78eab82700, query id 780039714 localhost root update
insert into ty (a,b) values(2,10)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 219 page no 4 n bits 72 index `idxa` of table `test`.`ty` trx id 462308398 lock_mode X
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 219 page no 4 n bits 72 index `idxa` of table `test`.`ty` trx id 462308398 lock_mode X locks gap before rec insert intention waiting
*** WE ROLL BACK TRANSACTION (1)
日志分析
*** (1) TRANSACTION: #事务1
TRANSACTION 462308399, ACTIVE 33 sec starting index read
事务编号为 462308399 ,活跃33秒,starting index read 表示事务状态为根据索引读取数据。常见的其他状态:
fetching rows 表示事务状态在row_search_for_mysql中被设置,正在查找记录。
updating or deleting 表示事务已经真正进入了Update/delete的函数逻辑(row_update_for_mysql)
thread declared inside InnoDB 说明事务已经进入innodb层。通常而言 不在innodb层的事务大部分是会被回滚的
mysql tables in use 1, 说明当前的事务使用一个表。locked 1 表示表上有一个表锁,对应DML语句为LOCK_IX
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
LOCK WAIT表示正在等待锁, 2 lock struct(s) 表示trx->trx_locks(该事务持有的锁的链表)的长度为2,每个链表节点代表该事务持有的一个锁结构,包括表锁,记录锁以及auto_inc锁等
heap size 360 表示事务分配的锁堆内存大小,一般没有什么具体的用处
delete from ty where a=5
表示事务1当前正在等待锁的sql
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:【这里是重点】
RECORD LOCKS space id 219 page no 4 n bits 72 index idxa
of table test
.ty
trx id 462308399 lock_mode X waiting
RECORD LOCKS 表示记录锁, space id为219, page号4 ,n bits 72表示这个聚集索引记录锁结构上留有72个Bit位
表示事务1 正在等待表 ty 上的 索引idxa 的Next-Key lock
事务2的log类似,
*** (2) HOLDS THE LOCK(S):【这里是重点】
RECORD LOCKS space id 219 page no 4 n bits 72 index idxa
of table test
.ty
trx id 462308398 lock_mode X
显示了事务2已经持有了一个Next-key锁,这正是使得事务1日志里面出现lock_mode X waiting
的原因
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:【这里是重点】
RECORD LOCKS space id 219 page no 4 n bits 72 index idxa
of table test
.ty
trx id 462308398 lock_mode X locks gap before rec insert intention waiting
lock_mode X locks gap before rec insert intention waiting
表示事务2的insert语句正在等待插入意向锁
example2
------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-04-22 18:48:20 0x7f52ef1a0700
*** (1) TRANSACTION:
TRANSACTION 3019177082, ACTIVE 916 sec fetching rows
mysql tables in use 3, locked 3
LOCK WAIT 30208 lock struct(s), heap size 3252432, 899952 row lock(s)
MySQL thread id 28804115, OS thread handle 139989662291712, query id 35898224955 11.111.xxx.xxx xxx_console Searching rows for update
UPDATE `host` SET `is_deleted` = 1, `gmt_modified` = '2022-04-22 18:33:04' WHERE (version < '1650620210' and module_id = 7269 and is_deleted = 0)
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 29 page no 42059 n bits 1000 index IDX_host_module_id of table `db`.`host` trx id 3019177082 lock_mode X // 【事务1尝试更新表host的若干记录,通过where条件在`db`.`host`的索引IDX_host_module_id持有module_id = 7269的next-key lock】
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
...
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 29 page no 30452 n bits 96 index PRIMARY of table `db`.`host` trx id 3019177082 lock_mode X locks rec but not gap waiting // 【事务1在更新时等待`db`.`host`的主键上的记录锁】
Record lock, heap no 26 PHYSICAL RECORD: n_fields 67; compact format; info bits 128
...
*** (2) TRANSACTION:
TRANSACTION 3018902995, ACTIVE 3490 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 28794268, OS thread handle 139980536006400, query id 35890389634 11.134.33.42 db_console updating
UPDATE `host` SET `module_id` = 7261 WHERE `host`.id = xxx.
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 29 page no 30452 n bits 96 index PRIMARY of table `db`.`host` trx id 3018902995 lock_mode X locks rec but not gap // 【事务2更新操作通过where条件获取了主键上的一个记录锁,然后意图修改module_id,这需要在IDX_host_module_id的特定位置上再加一个记录锁】
Record lock, heap no 26 PHYSICAL RECORD: n_fields 67; compact format; info bits 128
...
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 29 page no 42059 n bits 1000 index IDX_host_module_id of table `db`.`host` trx id 3018902995 lock_mode X locks rec but not gap waiting // 【事务1持有的IDX_host_module_id上的next-key lock与事务2想加的记录锁冲突】
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
...
*** WE ROLL BACK TRANSACTION (2)
简单来说就是:
事务1更新数据
hold module_id next-key lock
wait host.primary record lock
事务2更新数据
hold host.primary record lock
wait module_id record lock
各自hold一个索引的锁,去等待另一个索引上的锁
example3 update并发死锁
表all_grade,id\grade\course
其中grade
和course
上建有二级索引
数学成绩59分的每人加1分
update all_grade set grade = grade + 1 where grade = 59 and course = "math"
英语成绩59分的每人加1分
update all_grade set grade = grade + 1 where grade = 59 and course = "english"
以上两条update语句并发时,出现死锁
但理论上,这两条语句应该命中的是不同的记录行,不应该发生锁冲突才对。这的确是一个非常感性的认知
既然冲突了,看一下show engine innodb status
截取了一些关键信息
(1) TRANSACTION: TRANSACTION 5419486286, ACTIVE 1 sec starting index read(**开始扫描索引**)
mysql tables in use 3, locked 3 LOCK WAIT 4 lock struct(s), heap size 1136, 3 row lock(s)
MySQL thread id 51677946, OS thread handle 139987348158208, query id 61483207592 11.134.xxx.xxx all_grade Searching rows for update
UPDATE `all_grade` SET `grade` = `grade` + 1 WHERE (grade = 59 and course = "math")
(1) HOLDS THE LOCK(S): RECORD LOCKS space id 29 page no 32292 n bits 96 index PRIMARY of table `testdb`.`all_grade`
trx id 5419486286 lock_mode X locks rec but not gap Record lock, heap no 6 PHYSICAL RECORD(**持有主键锁**)
(1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 29 page no 31616 n bits 1192 index IDX_grade of table `testdb`.`all_grade`
trx id 5419486286 lock_mode X waiting Record lock, heap no 2 PHYSICAL RECORD
(2) TRANSACTION: TRANSACTION 5419478123, ACTIVE 2 sec fetching rows
mysql tables in use 3, locked 3 LOCK WAIT 34147 lock struct(s), heap size 3678416, 1001924 row lock(s)
MySQL thread id 51678007, OS thread handle 139988848178944, query id 61483169608 11.134.xxx.xxx all_grade Searching rows for update
UPDATE `all_grade` SET `grade` = `grade` + 1 WHERE (grade = 59 and course = "english")
(2) HOLDS THE LOCK(S): RECORD LOCKS space id 29 page no 31616 n bits 1192 index IDX_grade of table `testdb`.`all_grade`
trx id 5419478123 lock_mode X Record lock, heap no 1 PHYSICAL RECORD: n_fields 1;
(2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 29 page no 32292 n bits 96 index PRIMARY of table `testdb`.`all_grade`
trx id 5419478123 lock_mode X locks rec but not gap waiting Record lock, heap no 6 PHYSICAL RECORD
然后通过explain发现,对于update的where使用了两个带索引的字段,mysql使用了indexmerge。这是产生死锁的关键
- t1\t2都在并发扫描course和grade索引
- a. t1扫描并锁course=math索引记录的时候,紧接着把主键索引记录也一起锁住了。而被锁住的主键索引记录中的一部分恰好指向grade=59的数据行
- b. t2扫描并锁grade=59索引记录的时候,紧接着要去把主键索引记录也一起锁住了,但是部分主键索引记录已经被a中t1加锁持有,于是等待PRIMARY
- c. t1扫描并打算锁grade=59索引记录的时候,发现b中t2已经上了锁,于是等待IDX_grade
简单来说就是,
t1扫描course索引并锁定的记录,有一部分是t2扫描grade需要锁定的
t2扫描grade索引并锁定的记录,t1也要锁定一摸一样的记录
使用了多个索引进行搜索的写操作,【当a事务扫描索引1需要锁定的记录 与 b事务扫描索引2需要锁定的记录存在交集】,并发时死锁的概率大大提高
其他人遇到的update并发死锁场景:
https://developer.aliyun.com/article/332485