本文是以拣货系统的一次死锁错误为背景,介绍发现问题到解决问题的过程。
发现死锁问题
查看应用日志
发现有大量的error信息,仔细查看后发现大部分都是死锁错误。由于日志过多,之前的错误日志已经刷没了,但是可以通过mysql status查看最后一次死锁快照。查看Mysql statis
status可以查看到MySQL服务的很多状态,如事务数、文件I/O、日志大小、缓存区等。我们重点关注“LATEST DETECTED DEADLOCK”部分,这部分记录了数据库的最后一次死锁发生的时间和事务快照。建议在发现死锁的时候打开“innodb_print_all_deadlocks”参数,记录所有死锁发生的情况。
(查看status.txt文件 SHOW ENGINE INNODB STATUS)
------------------------
LATEST DETECTED DEADLOCK
------------------------
171130 10:14:44
*** (1) TRANSACTION:
TRANSACTION 25AFDEA1, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 1
MySQL thread id 23859938, OS thread handle 0x7f39202ec700, query id 1639722385 10.187.54.140 o2o_pick_good_rw Updating
UPDATE picking_list_53
SET pick_status = 4,picking_end_time = '2017-11-30 10:14:44',update_pin = 'system'
WHERE id = 54700
AND station_no = '11702453'
AND pick_status in (2,3)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2109 page no 767 n bits 152 index `PRIMARY` of table `o2o_pick_goods`.`picking_list_53` trx id 25AFDEA1 lock_mode X locks rec but not gap waiting
Record lock, heap no 79 PHYSICAL RECORD: n_fields 25; compact format; info bits 0
0: len 8; hex 800000000000d5ac; asc ;;
1: len 6; hex 000025afdea2; asc % ;;
2: len 7; hex 530000013d2705; asc S =' ;;
3: len 8; hex 80001258750b5cad; asc Xu \ ;;
4: len 8; hex 80001258750b5ebf; asc Xu ^ ;;
5: len 8; hex 80001258750b5ec4; asc Xu ^ ;;
6: len 4; hex 80000004; asc ;;
7: len 2; hex 3330; asc 30;;
8: len 4; hex 80000003; asc ;;
9: SQL NULL;
10: len 7; hex 33303531313938; asc 3051198;;
11: len 8; hex 3131373032343533; asc 11702453;;
12: len 4; hex 80000000; asc ;;
13: len 4; hex 80000003; asc ;;
14: len 8; hex 80001258750b5cad; asc Xu \ ;;
15: len 7; hex 33303531313938; asc 3051198;;
16: SQL NULL;
17: len 6; hex 7069636b6572; asc picker;;
18: len 1; hex 80; asc ;;
19: len 4; hex 5a1f6994; asc Z i ;;
20: len 6; hex 333136333133; asc 316313;;
21: len 4; hex 80000000; asc ;;
22: SQL NULL;
23: SQL NULL;
24: len 4; hex 80000002; asc ;;
*** (2) TRANSACTION:
TRANSACTION 25AFDEA2, ACTIVE 0 sec starting index read, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 1
MySQL thread id 23907472, OS thread handle 0x7f2f9da1d700, query id 1639722386 10.187.155.53 o2o_pick_good_rw Updating
UPDATE orderinfo_53
SET pick_status = 4,update_time = '2017-11-30 10:14:44',update_pin
='picker'
WHERE order_id = '728750539000441'
AND pick_id = 54700
AND (pick_status in(3,6) OR (update_pin = 'system' AND pick_status=4))
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2109 page no 767 n bits 144 index `PRIMARY` of table `o2o_pick_goods`.`picking_list_53` trx id 25AFDEA2 lock_mode X locks rec but not gap
Record lock, heap no 79 PHYSICAL RECORD: n_fields 25; compact format; info bits 0
0: len 8; hex 800000000000d5ac; asc ;;
1: len 6; hex 000025afdea2; asc % ;;
2: len 7; hex 530000013d2705; asc S =' ;;
3: len 8; hex 80001258750b5cad; asc Xu \ ;;
4: len 8; hex 80001258750b5ebf; asc Xu ^ ;;
5: len 8; hex 80001258750b5ec4; asc Xu ^ ;;
6: len 4; hex 80000004; asc ;;
7: len 2; hex 3330; asc 30;;
8: len 4; hex 80000003; asc ;;
9: SQL NULL;
10: len 7; hex 33303531313938; asc 3051198;;
11: len 8; hex 3131373032343533; asc 11702453;;
12: len 4; hex 80000000; asc ;;
13: len 4; hex 80000003; asc ;;
14: len 8; hex 80001258750b5cad; asc Xu \ ;;
15: len 7; hex 33303531313938; asc 3051198;;
16: SQL NULL;
17: len 6; hex 7069636b6572; asc picker;;
18: len 1; hex 80; asc ;;
19: len 4; hex 5a1f6994; asc Z i ;;
20: len 6; hex 333136333133; asc 316313;;
21: len 4; hex 80000000; asc ;;
22: SQL NULL;
23: SQL NULL;
24: len 4; hex 80000002; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 4557 page no 18150 n bits 344 index `¶©µ¥±àºÅ` of table `o2o_pick_goods`.`orderinfo_53` trx id 25AFDEA2 lock_mode X locks rec but not gap waiting
Record lock, heap no 38 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 15; hex 373238373530353339303030343431; asc 728750539000441;;
1: len 8; hex 80000000000c3c07; asc < ;;
*** WE ROLL BACK TRANSACTION (2)
死锁产生原因
死锁产生的前提是操作必须在数据库事务中。下面介绍一下数据库事务。
spring事务原理
死锁
Deadlocks in InnoDB
A deadlock is a situation where different transactions are unable to proceed because each holds a lock that the other needs. Because both transactions are waiting for a resource to become available, neither ever release the locks it holds.
演示DEMO
模拟死锁产生(InnoDB)
演示spring事务(InnoDB)
如何避免死锁
保证事务操作对象的顺序一致性。
如事务需要更新A、B两张表,需要保证所有的事务操作都是按照先A后B的顺序,这样就能避免资源的争抢。尽量保证事务最小化。
剥离非事务操作,如查询,逻辑判断,redis操作等。
因为数据库的事务是一个数据库链接,剥离非事务操作的目的是减少数据库链接占用的时间,从而加快数据库事务的处理。
分析当前事务问题
拣货状态:1、待集合 2、已集合 3、拣货中 4、拣货完成 5、已取消
从上面的日志中,我们可以看到,死锁是发生在拣货完成时。拣货完成需要更新拣货单和订单状态。
了解了死锁产生的原因和事务的机制,对于修复问题就容易多了,具体操作如下:
分析系统代码:
- 发现系统有多处更新状态的代码,但是更新顺序不一致,有的先更新订单,有的先更新拣货单。
修改方式:统一方法的更新顺序; - 在更新状态的事务中有非事务操作。
修改方式:查询逻辑提取到事务外,更新Redis提取到事务外;
观察修改前和修改后的代码。
后记
按照上面的逻辑修改后,打开“innodb_print_all_deadlocks”参数,观察1-2周,确定死锁问题已经修复。