仅记录一个故障的分析过程,如果有误请见谅。
一、问题来源
最近一个朋友在做大表truncte的时候出现了堵塞。SEMAPHORES几乎都是DICT_SYS
对于show processlist显示如下,并且killed标记已经不做响应了。
从时间来看我们发现truncate命令执行的时间最长,那么可能是它造成的问题,但是为什么会造成问题呢。结合pstack简单分析一下
二、查看pstack栈分析
这里先贴出堵塞源头的truncate语句的栈,如下:
Thread 178 (Thread 0x7f5ed0e8d700 (LWP 135097)):
#0 mach_read_from_1 (b=<optimized out>) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/rem/rem0rec.cc:561
#1 rec_get_bit_field_1 (rec=0x7f5eab8394df "931706384825733120", shift=0, mask=7, offs=3) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/include/rem0rec.ic:168
#2 rec_get_status (rec=0x7f5eab8394df "931706384825733120") at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/include/rem0rec.ic:472
#3 rec_get_offsets_func (rec=0x7f5eab8394df "931706384825733120", index=0x7f5d9f04ad88, offsets=0x7f5d71627d08, n_fields=1, heap=0x7f5ed0e88ba8) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/rem/rem0rec.cc:561
#4 0x000000000111895f in btr_search_drop_page_hash_index (block=0x7f5ea7e48570) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/btr/btr0sea.cc:1278
#5 0x000000000111b0e9 in btr_search_drop_page_hash_when_freed (page_id=..., page_size=...) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/btr/btr0sea.cc:1387
#6 0x000000000119f026 in fseg_free_extent (seg_inode=<optimized out>, space=116, page_size=..., page=192256, ahi=<optimized out>, mtr=<optimized out>) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/fsp/fsp0fsp.cc:3798
#7 0x00000000011a317b in fseg_free_step (header=<optimized out>, ahi=true, mtr=0x7f5ed0e893c0) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/fsp/fsp0fsp.cc:3890
#8 0x00000000010f45bd in btr_free_but_not_root (block=0x7f5e9f76e3e8, log_mode=MTR_LOG_NO_REDO) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/btr/btr0btr.cc:1152
#9 0x00000000010f48ec in btr_free_if_exists (page_id=..., page_size=..., index_id=153, mtr=0x7f5ed0e89a00) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/btr/btr0btr.cc:1200
#10 0x000000000114f86d in dict_drop_index_tree (rec=<optimized out>, pcur=<optimized out>, mtr=0x7f5ed0e89a00) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/dict/dict0crea.cc:1160
#11 0x000000000107ab0c in DropIndex::operator() (this=0x7f5ed0e8b210, mtr=0x7f5ed0e89a00, pcur=0x7f5ed0e89ed0) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/row/row0trunc.cc:947
#12 0x0000000001083d2a in for_each<DropIndex> (callback=..., this=<optimized out>) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/row/row0trunc.cc:102
#13 SysIndexIterator::for_each<DropIndex> (this=<optimized out>, callback=...) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/row/row0trunc.cc:160
#14 0x000000000107fd2d in row_truncate_table_for_mysql (table=<optimized out>, trx=0x7f5ec0c00ad0) at /export/home/pb2/build/sb_0-32013917-1545390211.74/mysql-5.7.25/storage/innobase/row/row0trunc.cc:2022
实际上我们的truncate有很多步骤,其中步骤9为
Step-9: Drop all indexes (this include freeing of the pages
associated with them).
这一步中包含3个代价较高的部分,从DEBUG DICT_SYC的加锁和释放来看,5.7.22版本如下:
- 持有DICT_SYS维护AHI,并且持有较早在row_truncate_table_for_mysql函数的1844行持有,释放在2071行的下层函数row_mysql_unlock_data_dictionary中,随后进行了内存维护。
- 不持有DICT_SYS维护innodb buffer主要接口为buf_LRU_remove_pages。
- 不持有DICT_SYS,unlink文件。
如上故障的栈正是在进行AHI的维护,并且在dict_drop_index_tree中开头就对ut_ad(mutex_own(&dict_sys->mutex))进行了断言,也就是说这里肯定是在持有dict_sys锁的情况下进行的,那么我们来看一下到底发生了什么,我贴出大概的执行的步骤如下:
循环每一个index
btr_free_but_not_root
释放所有的非root节点
循环每个extent进行释放
->fseg_free_step
主要释放一个extent
是否开启了AHI
循环extent
->fseg_free_extent
是否开启了AHI,如果是,循环每一个page
->btr_search_drop_page_hash_when_freed
对单个的page进行处理,
首先调用buf_page_get_gen,标记为BUF_PEEK_IF_IN_POOL,看page是否在pool中
/*!< get if in pool, do not make the block young in the LRU list */
如果block不在pool中直接返回,不做处理。
如果block存在与pool中,则
->btr_search_drop_page_hash_index
->循环block中的每行记录 (这个步骤最慢)
->计算每个字段的偏移量
->循环每个字段
并且然后flod放入flod数组中。每行都包含这个值
->根据构建的flod值drop AHI信息
ha_remove_all_nodes_to_page
每次extent的删除进行mtr 提交
稍微总结一下如下,下面做这些的目的是精准的清理掉AHI中的信息:
循环每一个index
循环每个extent
AHI 如果开启,则
循环每一个page
如果page在pool中(注意这里)
循环每一行
循环每一个字段
我们发现这实际上是一个5层循环,代价最高的落在最后的循环每个page中每一行的每个字段上,如果page存在于这正是这个栈带给我们的信息。
这些步骤需要持有DICT_SYS这样一个全局字典结构的保护锁进行,而在很多地方比如如下的函数中都会持有这个锁进行互斥保护:
- >ha_innobase::get_foreign_key_list:获取外键列表
- >ha_innobase::open:每当table cache不在的时候都会调用它取建立。
这2个函数也是本例中等待的会话,并且在进行mutex的带的时候并不会被kill标记唤醒,因此kill是不能生效的。我在查询BUG的发现也有不少人遇到类似的问题如下:
https://bugs.mysql.com/bug.php?id=91977
但是bug的状态一直没有变为closed,在这个BUG的最后也有人问出问题:
因此不确认是否已经改进了。
三、5.7中drop和truncate在内存维护上的区别
其次我们来也简单讨论下truncate在innodb buffer维护上的代价,对于5.7来讲这个地方和drop是有区别的,函数接口 buf_LRU_flush_or_remove_pages 用于确认是否维护 LRU list,其中有三种类型:
/** Algorithm to remove the pages for a tablespace from the buffer pool.
See buf_LRU_flush_or_remove_pages(). */
enum buf_remove_t {
BUF_REMOVE_ALL_NO_WRITE, /*!< Remove all pages from the buffer
pool, don't write or sync to disk */
BUF_REMOVE_FLUSH_NO_WRITE, /*!< Remove only, from the flush list,
don't write or sync to disk */
BUF_REMOVE_FLUSH_WRITE /*!< Flush dirty pages to disk only
don't remove from the buffer pool */
};
- drop为:BUF_REMOVE_FLUSH_NO_WRITE,需要维护flush list,不回写数据
- trunacte为:BUF_REMOVE_ALL_NO_WRITE,需要维护flush list和lru list,不回写数据
因此5.7最好使用drop+create代替trucnate。
四、测试和建议
随后我对一个稍微大的表truncate,测试发现代价如我们描述:
正如我们描述的,我这里实际上没有建立AHI,但是全表扫描了一次,那么大量的page加载到了innodb buffer。这里btr_search_drop_page_hash_when_freed实际就是AHI维护的上层接口,如果有大量的AHI的存在,代价确实客观,并且这个是持有DICT_SYS的操作。
对于AHI的维护方面直到8.0.21当时我看一个增加分区慢的case的时候依旧存在一些问题提交的BUG如下:
https://bugs.mysql.com/bug.php?id=101900
因此建议如下:
- 对于大表的truncate操作,尽量保存在innodb buffer中没有相关page了,自然的淘汰的page也会单页维护AHI。如果page在innodb buffer中不存在,自然不会进入AHI维护。
- 对于大表的truncate操作,5.7还是建议用drop+create来代替。我也见过因为维护innodb buffer带来的故障。
- drop可以先link一个硬链接
- 低峰期进行DDL,任何DDL都是,避免未知的影响,影响太大了。
其他:断点
我主要使用了如下断点
- 线程条件断点:this==0x3424938 为DICT_SYS全局地址
- 函数断点:unlink为文件释放的linux接口
- 函数断点:btr_free_but_not_root为AHI维护的上层接口
- 函数断点:buf_LRU_remove_all_pages为内存维护接口
20 breakpoint keep y <MULTIPLE> thread 42
stop only if this==0x3424938
stop only in thread 42
breakpoint already hit 29 times
20.1 y 0x0000000001998de0 in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int)
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:973
20.2 y 0x000000000199906a in PolicyMutex<TTASEventMutex<BlockMutexPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int)
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:973
20.3 y 0x0000000001a52966 in PolicyMutex<OSTrackMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int)
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:973
21 breakpoint keep y <MULTIPLE> thread 42
stop only if this==0x3424938
stop only in thread 42
breakpoint already hit 29 times
21.1 y 0x0000000001998e83 in PolicyMutex<TTASEventMutex<GenericPolicy> >::exit()
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:947
21.2 y 0x000000000199910d in PolicyMutex<TTASEventMutex<BlockMutexPolicy> >::exit()
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:947
21.3 y 0x0000000001a52859 in PolicyMutex<OSTrackMutex<GenericPolicy> >::exit() at /opt/percona-server-locks-detail-5.7.22/storage/innobase/include/ib0mutex.h:947
22 breakpoint keep y 0x00007ffff6667140 <unlink>
breakpoint already hit 4 times
23 breakpoint keep y 0x0000000001bd815e in btr_free_but_not_root(buf_block_t*, mtr_log_t) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/btr/btr0btr.cc:1158
breakpoint already hit 2 times
24 breakpoint keep y 0x00007ffff6667140 <unlink>
breakpoint already hit 2 times
25 breakpoint keep y 0x0000000001c49722 in buf_LRU_remove_all_pages(buf_pool_t*, ulint) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/buf/buf0lru.cc:786
breakpoint already hit 8 times
5.7 drop栈
#0 btr_free_but_not_root (block=0x7fffb43622c0, log_mode=MTR_LOG_ALL) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/btr/btr0btr.cc:1158
#1 0x0000000001bd8516 in btr_free_if_exists (page_id=..., page_size=..., index_id=7175, mtr=0x7fffec04b450) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/btr/btr0btr.cc:1239
#2 0x0000000001c5f83e in dict_drop_index_tree (rec=0x7fffc6383cbb "", pcur=0x7fff84b0e4b0, mtr=0x7fffec04b450)
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/dict/dict0crea.cc:1162
#3 0x0000000001b429ad in row_upd_clust_step (node=0x7fff84b0ddb0, thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/row/row0upd.cc:2883
#4 0x0000000001b43080 in row_upd (node=0x7fff84b0ddb0, thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/row/row0upd.cc:3042
#5 0x0000000001b4355f in row_upd_step (thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/row/row0upd.cc:3188
#6 0x0000000001a890b1 in que_thr_step (thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/que/que0que.cc:1045
#7 0x0000000001a8938d in que_run_threads_low (thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/que/que0que.cc:1125
#8 0x0000000001a89543 in que_run_threads (thr=0x7fff84b11750) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/que/que0que.cc:1165
#9 0x0000000001a897f9 in que_eval_sql (info=0x7fff8403d680,
sql=0x7fff8403b960 "PROCEDURE DROP_TABLE_PROC () IS\nsys_foreign_id CHAR;\ntable_id CHAR;\nindex_id CHAR;\nforeign_id CHAR;\nspace_id INT;\nfound INT;\nDECLARE CURSOR cur_fk IS\nSELECT ID FROM SYS_FOREIGN\nWHERE FOR_NAME = :table"..., reserve_dict_mutex=0, trx=0x7fffeca7ab60) at /opt/percona-server-locks-detail-5.7.22/storage/innobase/que/que0que.cc:1242
#10 0x0000000001ae5f83 in row_drop_table_for_mysql (name=0x7fffec04c840 "test/test11", trx=0x7fffeca7ab60, drop_db=false, nonatomic=true, handler=0x0)
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/row/row0mysql.cc:5207
#11 0x0000000001984580 in ha_innobase::delete_table (this=0x7fff84006db0, name=0x7fffec04df70 "./test/test11")
at /opt/percona-server-locks-detail-5.7.22/storage/innobase/handler/ha_innodb.cc:13545
#12 0x0000000000f5d1ae in handler::ha_delete_table (this=0x7fff84006db0, name=0x7fffec04df70 "./test/test11") at /opt/percona-server-locks-detail-5.7.22/sql/handler.cc:5240
#13 0x0000000000f566b7 in ha_delete_table (thd=0x7fff84000c00, table_type=0x2e64aa0, path=0x7fffec04df70 "./test/test11", db=0x7fff84006bb8 "test", alias=0x7fff840065e8 "test11",
generate_warning=true) at /opt/percona-server-locks-detail-5.7.22/sql/handler.cc:2763
#14 0x00000000015f6bab in mysql_rm_table_no_locks (thd=0x7fff84000c00, tables=0x7fff84006630, if_exists=false, drop_temporary=false, drop_view=false, dont_log_query=false)
at /opt/percona-server-locks-detail-5.7.22/sql/sql_table.cc:2680
#15 0x00000000015f59b4 in mysql_rm_table (thd=0x7fff84000c00, tables=0x7fff84006630, if_exists=0 '\000', drop_temporary=0 '\000')
at /opt/percona-server-locks-detail-5.7.22/sql/sql_table.cc:2221
由于delete了SYS_INDEXES而调用了dict_drop_index_tree
sql += "DECLARE CURSOR cur_idx IS\n"
"SELECT ID FROM SYS_INDEXES\n"
"WHERE TABLE_ID = table_id\n"
"LOCK IN SHARE MODE;\n";
sql += "found := 1;\n"
"OPEN cur_idx;\n"
"WHILE found = 1 LOOP\n"
" FETCH cur_idx INTO index_id;\n"
" IF (SQL % NOTFOUND) THEN\n"
" found := 0;\n"
" ELSE\n"
" DELETE FROM SYS_FIELDS\n"
" WHERE INDEX_ID = index_id;\n"
" DELETE FROM SYS_INDEXES\n"
" WHERE ID = index_id\n"
" AND TABLE_ID = table_id;\n"
" END IF;\n"
"END LOOP;\n"
"CLOSE cur_idx;\n";
8.0 为BUF_REMOVE_NONE
#0 Fil_shard::space_delete (this=0x7fffe02e7be0, space_id=1448, buf_remove=BUF_REMOVE_NONE) at /newdata/mysql-8.0.23/storage/innobase/fil/fil0fil.cc:4512
#1 0x000000000543c023 in fil_delete_tablespace (space_id=1448, buf_remove=BUF_REMOVE_NONE) at /newdata/mysql-8.0.23/storage/innobase/fil/fil0fil.cc:4665
#2 0x000000000514733b in row_drop_tablespace (space_id=1448, filepath=0xaa81ec0 "./t10/#sql-ib2899-698767554.ibd") at /newdata/mysql-8.0.23/storage/innobase/row/row0mysql.cc:3772
#3 0x00000000050527a9 in Log_DDL::replay_delete_space_log (this=0x7fffe031c790, space_id=1448, file_path=0xaa81ec0 "./t10/#sql-ib2899-698767554.ibd")
at /newdata/mysql-8.0.23/storage/innobase/log/log0ddl.cc:1703
#4 0x00000000050521b3 in Log_DDL::replay (this=0x7fffe031c790, record=...) at /newdata/mysql-8.0.23/storage/innobase/log/log0ddl.cc:1590
#5 0x0000000005051dc2 in Log_DDL::replay_by_thread_id (this=0x7fffe031c790, thread_id=7) at /newdata/mysql-8.0.23/storage/innobase/log/log0ddl.cc:1520
#6 0x00000000050536bb in Log_DDL::post_ddl (this=0x7fffe031c790, thd=0xabe73e0) at /newdata/mysql-8.0.23/storage/innobase/log/log0ddl.cc:1924
#7 0x0000000004f18da4 in innobase_post_ddl (thd=0xabe73e0) at /newdata/mysql-8.0.23/storage/innobase/handler/ha_innodb.cc:4682
#8 0x0000000003eaedc6 in Sql_cmd_truncate_table::cleanup_base (this=0xabbd9c8, thd=0xabe73e0, hton=0xaa39a20) at /newdata/mysql-8.0.23/sql/sql_truncate.cc:401
#9 0x0000000003eaf09c in Sql_cmd_truncate_table::<lambda()>::operator()(void) const (__closure=0x7fffe47cc188) at /newdata/mysql-8.0.23/sql/sql_truncate.cc:494
#10 0x0000000003eb0196 in Scope_guard<Sql_cmd_truncate_table::truncate_base(THD*, TABLE_LIST*)::<lambda()> >::~Scope_guard(void) (this=0x7fffe47cc180, __in_chrg=<optimized out>)
at /newdata/mysql-8.0.23/include/scope_guard.h:41
#11 0x0000000003eaf81f in Sql_cmd_truncate_table::truncate_base (this=0xabbd9c8, thd=0xabe73e0, table_ref=0xabbda00) at /newdata/mysql-8.0.23/sql/sql_truncate.cc:606
#12 0x0000000003eb00e6 in Sql_cmd_truncate_table::execute (this=0xabbd9c8, thd=0xabe73e0) at /newdata/mysql-8.0.23/sql/sql_truncate.cc:746