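A brief record of a problem I ran into.
1. The problem
Sensitive data in the screenshot has been removed. The part showing the rw-lock wait looks like this: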
--Thread 139672336717568 has waited at row0ins.cc line 2936 for 4.00 seconds the semaphore:
S-lock on RW-latch at 0x7f08b70f7550 created in file dict0dict.cc line 2737
a writer (thread id 139674827130624) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: ffffffffdfffffff
Last time read locked in file row0ins.cc line 2936
Last time write locked in file /export/zhg/mysql-5.7.24/storage/innobase/btr/btr0bulk.cc line 53
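When many of these blocks pile up in the error log or in the SHOW ENGINE INNODB STATUS output, it helps to pull out the few fields that matter: who is waiting, where, in which mode, and where the latch was last write locked. The following is a minimal sketch, assuming the text has exactly the shape of the excerpt above. The function name `parse_semaphore_wait` and the dictionary keys are made up for illustration; other InnoDB versions may format these lines differently.

```python
import posixpath
import re

# Sample text copied from the log excerpt above.
SAMPLE = """\
--Thread 139672336717568 has waited at row0ins.cc line 2936 for 4.00 seconds the semaphore:
S-lock on RW-latch at 0x7f08b70f7550 created in file dict0dict.cc line 2737
a writer (thread id 139674827130624) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: ffffffffdfffffff
Last time read locked in file row0ins.cc line 2936
Last time write locked in file /export/zhg/mysql-5.7.24/storage/innobase/btr/btr0bulk.cc line 53
"""

# Patterns written against the excerpt only (an assumption, not an official format spec).
WAIT_RE = re.compile(r"--Thread (\d+) has waited at (\S+) line (\d+) for ([\d.]+) seconds")
LATCH_RE = re.compile(
    r"\b(SX|S|X)-lock on RW-latch at (0x[0-9a-fA-F]+) created in file (\S+) line (\d+)"
)
WRITER_RE = re.compile(r"a writer \(thread id (\d+)\) has reserved it in mode (\w+)")
LAST_WRITE_RE = re.compile(r"Last time write locked in file (\S+) line (\d+)")


def parse_semaphore_wait(text):
    """Extract waiter, requested mode, holder and source locations from one wait block."""
    info = {}
    m = WAIT_RE.search(text)
    if m:
        info["waiter_thread"] = int(m.group(1))
        info["wait_site"] = f"{m.group(2)}:{m.group(3)}"
        info["waited_seconds"] = float(m.group(4))
    m = LATCH_RE.search(text)
    if m:
        info["requested_mode"] = m.group(1)
        info["latch_address"] = m.group(2)
        info["latch_created_at"] = f"{posixpath.basename(m.group(3))}:{m.group(4)}"
    m = WRITER_RE.search(text)
    if m:
        info["holder_thread"] = int(m.group(1))
        info["holder_mode"] = m.group(2)
    m = LAST_WRITE_RE.search(text)
    if m:
        # Keep only the file name; the build path prefix is not interesting.
        info["last_write_site"] = f"{posixpath.basename(m.group(1))}:{m.group(2)}"
    return info


if __name__ == "__main__":
    for key, value in parse_semaphore_wait(SAMPLE).items():
        print(key, "=", value)
```

Grouping the parsed blocks by `last_write_site` is a quick way to confirm that every waiter is stuck behind the same code path.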
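All of the waits looked like this one. The version is 5.7.24.
2. Rough explanation
After a quick look at the source, three places are involved: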
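- dict0dict.cc line 2737: index->lock provides concurrency control for the index tree. For example, when a pessimistic insert causes a page split, this lock has to be taken (SX).
- row0ins.cc line 2936: an insert has to write a log record for an index that is being built by online DDL, and to do so it must hold the index->lock above (S/SX).
- btr0bulk.cc line 53: this is where tuples are inserted into the new index in bulk mode, which requires holding index->lock (X).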
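From earlier articles we know roughly that during online DDL the sorted data is inserted into the new index in bulk mode, in a loop that inserts one row per iteration.
In theory this is a fairly fast way to build an index. Under heavy load, however, for example when IO is the bottleneck, the bulk build can become slow, and INSERTs and other DML can then be blocked on index->lock for as long as the build holds it.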
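The conflict between the three code paths comes down to latch mode compatibility: S is compatible with S and SX, SX is compatible with S only, and X is compatible with nothing. The toy model below is only a sketch of that rule and is not InnoDB code. The class `ToyIndexLatch`, the owner names, and the non-blocking `try_acquire` are all hypothetical, and the model says nothing about how long the real bulk loader keeps the latch.

```python
# Mode compatibility used by this toy model:
# S goes with S and SX, SX goes with S only, X goes with nothing.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "SX"): True,
    ("S", "X"): False,
    ("SX", "S"): True,
    ("SX", "SX"): False,
    ("SX", "X"): False,
    ("X", "S"): False,
    ("X", "SX"): False,
    ("X", "X"): False,
}


class ToyIndexLatch:
    """Single-threaded stand-in for index->lock (hypothetical, for illustration only)."""

    def __init__(self):
        self.held = []  # list of (owner, mode) pairs currently granted

    def try_acquire(self, owner, mode):
        """Grant `mode` to `owner` only if it is compatible with every current holder."""
        if all(COMPATIBLE[(mode, held_mode)] for _, held_mode in self.held):
            self.held.append((owner, mode))
            return True
        return False

    def release(self, owner):
        """Drop every mode held by `owner`."""
        self.held = [(o, m) for o, m in self.held if o != owner]


def demo():
    latch = ToyIndexLatch()
    # The bulk loader (btr0bulk.cc in the log) holds the latch in X mode.
    assert latch.try_acquire("bulk_loader", "X")
    # A concurrent insert (row0ins.cc in the log) asks for S and cannot get it.
    blocked_during_build = not latch.try_acquire("insert_thread", "S")
    # Once the loader lets go, the same request succeeds.
    latch.release("bulk_loader")
    granted_after_release = latch.try_acquire("insert_thread", "S")
    return blocked_during_build, granted_after_release


if __name__ == "__main__":
    print(demo())
```

In the real server the S request does not fail but waits, which is exactly the semaphore wait reported in the log.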
3. The bug
A later search of the bug database showed that this problem has already been fixed:
In versions before 5.7.28 and 8.0.19, caution should be had with regard to running ALGORITHM=INPLACE and LOCK=NONE DDL on huge tables. Make sure other processes or queries do not use all the Disk I/O, and aim to run DDL on a near idle system instead.
Fixed as of the upcoming 5.7.28, 8.0.19 release, and here's the changelog entry:
A long running ALTER TABLE ... ADD INDEX operation with concurrent
inserts caused semaphore waits.
Looking at the commit, the fix simply removes the locking at btr0bulk.cc line 53, for the following reason:
The indexes on the intermediate tables are built using bulk load insert. A concurrent DML
at this stage does not acquire the index lock. An index lock acquired on the intermediate
table, which is not visible to anyone else, does not block concurrent DMLs.
In fact, heavy load can cause all kinds of problems for online DDL, not just this bug. Either way, even for online DDL the advice stands: "run DDL on a near idle system instead".
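The version boundary quoted from the bug report (fixed in 5.7.28 and 8.0.19) can be turned into a quick check before scheduling an online ADD INDEX. This is a minimal sketch: the helper name `is_affected` is made up, and since the quote only mentions the 5.7 and 8.0 series, any other series is reported as unknown (`None`) rather than guessed.

```python
import re

# First fixed patch release per series, taken from the bug report quoted above.
FIXED_IN = {(5, 7): 28, (8, 0): 19}


def parse_version(version):
    """Turn a string such as '5.7.24' or '8.0.19-log' into a (major, minor, patch) tuple."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in m.groups())


def is_affected(version):
    """Return True/False for the 5.7 and 8.0 series, None when the series is not covered."""
    major, minor, patch = parse_version(version)
    fixed_patch = FIXED_IN.get((major, minor))
    if fixed_patch is None:
        return None  # the bug report does not say anything about this series
    return patch < fixed_patch


if __name__ == "__main__":
    for v in ("5.7.24", "5.7.28", "8.0.18", "8.0.19"):
        print(v, is_affected(v))
```

The version string can be taken from `SELECT VERSION()`. A result of `False` only means this particular fix is present; it does not make a heavily loaded system a good place to run DDL.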