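My knowledge is limited, so please excuse any mistakes. Source version: MySQL 8.0.21.
While troubleshooting an incident, I suspected that a large volume of deleted rows was making queries slow, but I had never been clear on how the purge threads work. This article does not go deep; it only walks through the workflow, with the following questions in mind:
- Can del flag (delete-marked) records be cleaned up promptly?
- Why does History list length stay above 0, and does that mean del flag records have not been cleaned up?
- What triggers the purge threads?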
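1. Overview of the purge threads
Broadly, the purge threads do the following work:
- Remove records that carry the del flag
- Remove old undo versions
- Truncate undo tablespaces when needed
They consist of one coordinator thread and several worker threads, configured by the following parameter: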
innodb_purge_threads=4
This means 1 coordinator thread and 3 worker threads. The coordinator also acts as a worker.
2. The coordinator thread polls for changes
Call path:
srv_purge_coordinator_thread
->srv_purge_coordinator_suspend
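The check:
(rseg_history_len <= trx_sys->rseg_history_len) {
  // the current history length is greater than or equal to the value from the previous loop
  ret = os_event_wait_time_low(slot->event, SRV_PURGE_MAX_TIMEOUT, sig_count);
  // wait up to 10 ms before continuing, or until woken
The coordinator is woken when a transaction commits or rolls back: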
/* Tell server some activity has happened, since the trx
does changes something. Background utility threads like
master thread, purge thread or page_cleaner thread might
have some work to do. */
srv_active_wake_master_thread();
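Note, however, that if no new transaction commits for a long time, the coordinator can block indefinitely until it is woken, instead of waking up every 10 ms:
if (ret == OS_SYNC_TIME_EXCEEDED) {  // the wait timed out
  if (rseg_history_len == trx_sys->rseg_history_len &&
      trx_sys->rseg_history_len < 5000) {  // history length unchanged since the last round and below 5000
    stop = true;  // wait indefinitely until woken
  }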
}
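The wait decision described above can be modelled in a few lines. This is only a sketch of my reading of the excerpts: `WaitMode`, `choose_wait` and `should_stop_after_timeout` are hypothetical names, not InnoDB identifiers, and the `NoWait` branch for a shrinking history list is an assumption that the excerpts do not show.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the coordinator's wait decision (not InnoDB code).
enum class WaitMode { NoWait, TimedWait10ms, WaitUntilSignalled };

// prev_len: history length returned by the previous purge round.
// cur_len:  current value of trx_sys->rseg_history_len.
// stop:     set when an earlier timed wait expired with nothing to do.
inline WaitMode choose_wait(std::uint64_t prev_len, std::uint64_t cur_len,
                            bool stop) {
  if (stop) return WaitMode::WaitUntilSignalled;
  if (prev_len <= cur_len) return WaitMode::TimedWait10ms;
  return WaitMode::NoWait;  // assumption: history is shrinking, keep purging
}

// After a timed wait expires: switch to the indefinite wait only when the
// history length did not change and the backlog is below 5000.
inline bool should_stop_after_timeout(std::uint64_t prev_len,
                                      std::uint64_t cur_len) {
  return prev_len == cur_len && cur_len < 5000;
}

// Usage example.
inline void wait_decision_demo() {
  assert(choose_wait(10, 10, false) == WaitMode::TimedWait10ms);
  assert(choose_wait(10, 10, true) == WaitMode::WaitUntilSignalled);
  assert(should_stop_after_timeout(100, 100));
  assert(!should_stop_after_timeout(6000, 6000));  // backlog too large
}
```

The second function is why an idle server with a small, stable History list length ends up with a sleeping coordinator rather than one that polls every 10 ms.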
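3. Cloning the oldest read view
There is not much to say about this step: undo is purged relative to the oldest active read view, otherwise purge could remove undo that a running read still needs.
Call path: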
srv_purge_coordinator_thread
->srv_do_purge
->trx_purge
The operation:
trx_sys->mvcc->clone_oldest_view(&purge_sys->view);  // clone the oldest read view
4. Taking undo segments (loosely, transactions) from purge_queue, the queue of purge candidates
Call path:
srv_purge_coordinator_thread
->srv_do_purge
->trx_purge
->trx_purge_attach_undo_recs
->trx_purge_fetch_next_rec
->TrxUndoRsegsIterator::set_next
The operation:
const page_size_t &page_size = purge_sys->rseg_iter->set_next();
Note that this is an iterator over purge_sys->purge_queue, a priority queue implemented with std::priority_queue. The iteration code is:
while (!m_purge_sys->purge_queue->empty()) {  // there are transactions to purge
  if (m_trx_undo_rsegs.get_trx_no() == UINT64_UNDEFINED) {
    m_trx_undo_rsegs = purge_sys->purge_queue->top();
  } else if (purge_sys->purge_queue->top().get_trx_no() ==
             m_trx_undo_rsegs.get_trx_no()) {
    m_trx_undo_rsegs.append(purge_sys->purge_queue->top());  // same trx_no: add this entry to the current group
  } else {
    break;
  }
A transaction enters purge_queue at commit time, in trx_serialisation_number_get:
purge_sys->purge_queue->push(elem);
So at this point we know that a committing transaction may wake the purge coordinator, and that it is also added to purge_queue, the queue of transactions that may need purging.
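The grouping done by the iterator can be illustrated with a plain std::priority_queue. This is a minimal sketch, assuming the queue is ordered so that the smallest trx_no is on top; `Elem`, `PurgeQueue` and `next_group` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a purge_queue element: (trx_no, rollback segment id).
using Elem = std::pair<std::uint64_t, int>;

// Min-heap on trx_no, so the oldest committed transaction is on top.
using PurgeQueue =
    std::priority_queue<Elem, std::vector<Elem>, std::greater<Elem>>;

// Pop every entry that shares the smallest trx_no and return their rseg ids.
// Returns an empty vector when the queue is empty.
inline std::vector<int> next_group(PurgeQueue &q) {
  std::vector<int> rsegs;
  if (q.empty()) return rsegs;
  const std::uint64_t trx_no = q.top().first;
  while (!q.empty() && q.top().first == trx_no) {
    rsegs.push_back(q.top().second);
    q.pop();
  }
  return rsegs;
}

// Usage example: trx_no 5 used two rollback segments, trx_no 7 used one.
inline void next_group_demo() {
  PurgeQueue q;
  q.push({7, 1});
  q.push({5, 2});
  q.push({5, 3});
  const std::vector<int> group = next_group(q);
  assert(group.size() == 2);   // both entries of trx_no 5
  assert(q.top().first == 7);  // trx_no 7 is next
}
```

One transaction can appear more than once in the queue when it wrote to several rollback segments, which is why entries with the same trx_no are collected together.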
5. Checking whether the purge rule is met
Call path:
srv_purge_coordinator_thread
->srv_do_purge
->trx_purge
->trx_purge_attach_undo_recs
->trx_purge_fetch_next_rec
The check:
if (purge_sys->iter.trx_no >= purge_sys->view.low_limit_no()) {
return (nullptr);
}
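This compares the trx no of the transaction to be purged with the low limit no of the oldest read view. If trx_no >= low_limit_no, the undo may still be needed, so nullptr is returned. Otherwise the function returns the next undo record to purge, updates the count of handled pages through an output parameter, and moves on to the next undo segment to purge.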
6. Each batch purges 300 pages by default
This value is controlled by innodb_purge_batch_size, which defaults to 300.
Call path:
srv_purge_coordinator_thread
->srv_do_purge
->trx_purge
->trx_purge_attach_undo_recs
It takes effect here:
for (ulint i = 0; n_pages_handled < batch_size; ++i)
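To make the interaction between the page budget and the read view check from the previous section concrete, here is a simplified model. `UndoLog` and `collect_batch` are hypothetical; the real code counts pages while walking individual undo records, whereas this sketch counts whole logs, so it can overshoot the budget.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical undo log descriptor: commit number and number of undo pages.
struct UndoLog {
  std::uint64_t trx_no;
  std::size_t n_pages;
};

// Collects work for one batch and returns the number of pages handled.
// Stops when the page budget is used up, or when the next log is still
// needed by the oldest read view (trx_no >= low_limit_no).
// `logs` must be sorted by trx_no ascending; `next` is the resume position.
inline std::size_t collect_batch(const std::vector<UndoLog> &logs,
                                 std::size_t &next, std::uint64_t low_limit_no,
                                 std::size_t batch_size) {
  assert(batch_size > 0);
  std::size_t n_pages_handled = 0;
  while (n_pages_handled < batch_size && next < logs.size()) {
    if (logs[next].trx_no >= low_limit_no) break;  // not purgeable yet
    n_pages_handled += logs[next].n_pages;
    ++next;
  }
  return n_pages_handled;
}
```

With logs of 200, 200 and 50 pages below the limit and a budget of 300, the first call handles 400 pages (two logs), the second handles 50, and later calls handle 0 until the read view advances.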
Purging continues until there are no more pages to purge.
Call path:
srv_purge_coordinator_thread
->srv_do_purge
The check:
(!srv_purge_should_exit(n_pages_purged) && n_pages_purged > 0 &&
 purge_sys->state == PURGE_STATE_RUN);
// once everything has been purged, n_pages_purged > 0 no longer holds
return (rseg_history_len);  // return rseg_history_len
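The loop shape implied by this condition is a do/while that repeats batches until one of them purges nothing. The sketch below keeps only the `n_pages_purged > 0` part; the shutdown and purge state checks are left out, and `purge_until_idle` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical driver: calls `purge_batch` until a batch purges nothing and
// returns the total number of pages purged. `purge_batch` stands in for one
// trx_purge() call and returns the pages it handled.
inline std::size_t purge_until_idle(
    const std::function<std::size_t()> &purge_batch) {
  std::size_t total = 0;
  std::size_t n_pages_purged = 0;
  do {
    n_pages_purged = purge_batch();
    total += n_pages_purged;
  } while (n_pages_purged > 0);
  return total;
}

// Usage example: 700 pending pages with a batch size of 300 take four calls
// (300, 300, 100, then 0, which ends the loop).
inline void purge_until_idle_demo() {
  std::size_t remaining = 700;
  std::size_t calls = 0;
  const std::size_t total = purge_until_idle([&] {
    ++calls;
    const std::size_t n = remaining < 300 ? remaining : 300;
    remaining -= n;
    return n;
  });
  assert(total == 700);
  assert(calls == 4);
}
```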
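7. Worker thread processing
Once the work has been dispatched, the worker threads enter the call chain below and clean up del flag records. I did not read this part closely because the calls are fairly complex. What is certain is that parsing the undo record (row_purge_parse_undo_rec) and performing the delete can involve many loops and B-tree positioning operations (btr_cur_search_to_nth_level).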
srv_worker_thread
->srv_task_execute
->que_run_threads
->que_run_threads_low
->que_thr_step
->row_purge_step
->row_purge
->row_purge_record_func
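8. By default, undo history is cleaned once every 128 undo purge batches
This depends on innodb_purge_rseg_truncate_frequency, which defaults to 128. At full load this works out to:
- 300 (undo log pages) * 128 (truncate frequency) = 38,400
After 38,400 undo log pages have been processed, one undo history cleanup runs.
Assignment from the parameter: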
set_rseg_truncate_frequency(
static_cast<ulint>(srv_purge_rseg_truncate_frequency));
The check on the parameter:
ulint rseg_truncate_frequency = ut_min(
    static_cast<ulint>(srv_purge_rseg_truncate_frequency), undo_trunc_freq);  // 128
n_pages_purged = trx_purge(n_use_threads, srv_purge_batch_size,
                           (++count % rseg_truncate_frequency) == 0);  // truncate once every 128 batches
Deciding whether to enter the truncate path:
if (truncate || srv_upgrade_old_undo_found) {  // truncate comes from (++count % rseg_truncate_frequency)
trx_purge_truncate();
}
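Note that count is a static local variable, so each call into the function continues counting from where the previous call left off. Under light load, undo may therefore not be cleaned up promptly:
- Small transactions: if every transaction is small, each one may touch far fewer than 300 undo pages, so a cleanup has to wait for 128 batches, which is roughly 128 such transactions.
- Large transactions: if a transaction is large and has many undo pages, a cleanup runs once more than 300*128 pages have been processed.
This does not mean that del flag records are not purged; it means that the undo history list is not cleaned. This is why History list length is so often non-zero.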
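The counter behaviour can be reproduced with a small class whose member plays the role of the static local count. This is a sketch only; `TruncateScheduler` and `max_pages_between_truncates` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the batch counter: mirrors `++count % frequency == 0`.
class TruncateScheduler {
 public:
  explicit TruncateScheduler(std::size_t frequency) : frequency_(frequency) {
    assert(frequency > 0);
  }

  // Call once per purge batch; returns true when this batch also triggers
  // the undo history cleanup.
  bool on_batch() { return (++count_ % frequency_) == 0; }

 private:
  std::size_t frequency_;
  std::size_t count_ = 0;  // persists across calls, like the static local
};

// Upper bound on undo pages handled between two cleanups at full load.
constexpr std::size_t max_pages_between_truncates(std::size_t batch_size,
                                                  std::size_t frequency) {
  return batch_size * frequency;
}

static_assert(max_pages_between_truncates(300, 128) == 38400,
              "300 pages per batch * 128 batches");
```

The counter advances once per batch regardless of how many pages the batch handled, so a batch that purges one page counts the same as one that purges 300. That is why the cleanup lags under light load.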
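9. Cleaning undo history and undo space
This section only records the workflow and does not describe the functions in depth (limited ability).
Cleaning undo history
Call path: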
srv_purge_coordinator_thread
->srv_do_purge
->trx_purge
->trx_purge_truncate
->trx_purge_truncate_history
->trx_purge_truncate_rseg_history
The cleanup works as follows:
Starting point:
hdr_addr = trx_purge_get_log_from_hist(
flst_get_last(rseg_hdr + TRX_RSEG_HISTORY, &mtr));
Scanning upward (to the previous node):
hdr_addr = prev_hdr_addr;
End condition:
if (undo_trx_no >= limit->trx_no) {  // the scan ends here
/* limit space_id should match the rollback segment
space id to avoid freeing if the page belongs to a
different rollback segment for the same trx_no. */
if (undo_trx_no == limit->trx_no &&
rseg->space_id == limit->undo_rseg_space) {
trx_undo_truncate_start(rseg, hdr_addr.page, hdr_addr.boffset,
limit->undo_no);
}
rseg->unlatch();
mtr_commit(&mtr);
return;
}
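Note that this cleanup cannot go past the trx no of the oldest read view: as soon as the scan reaches an undo log whose trx no is not below the limit, it ends.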
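The scan can be pictured as popping entries from the old end of a list until the limit is reached. The following is a simplified model, assuming the last node of the history list holds the oldest undo log; `truncate_history` is a hypothetical name, and the partial truncation done by trx_undo_truncate_start when trx_no equals the limit is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical history list of one rollback segment, holding trx_no values
// with the oldest entry at the back. Frees every log with
// trx_no < limit_trx_no and returns how many were freed.
inline std::size_t truncate_history(std::deque<std::uint64_t> &history,
                                    std::uint64_t limit_trx_no) {
  std::size_t freed = 0;
  while (!history.empty()) {
    if (history.back() >= limit_trx_no) break;  // end condition
    history.pop_back();
    ++freed;
  }
  return freed;
}

// Usage example: with a limit of 6, logs 1, 3 and 5 are freed; 7 and 9 stay.
inline void truncate_history_demo() {
  std::deque<std::uint64_t> history = {9, 7, 5, 3, 1};
  const std::size_t freed = truncate_history(history, 6);
  assert(freed == 3);
  assert(history.size() == 2);
  assert(history.back() == 7);
}
```

A long-running read view keeps the limit low, so the loop stops early and History list length keeps growing even though the cleanup runs.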
Undo truncate flow
Call path:
srv_purge_coordinator_thread
->srv_do_purge
->trx_purge
->trx_purge_truncate
->trx_purge_truncate_history
->trx_purge_truncate_marked_undo
Before this, there is a check that decides whether to truncate:
trx_purge_mark_undo_for_truncate
->Tablespace::needs_truncation
Tablespace::needs_truncation decides whether to truncate an undo tablespace. Two parameters are involved:
- Role of innodb_undo_log_truncate
if (!srv_undo_log_truncate || m_rsegs == nullptr || m_rsegs->is_empty() ||
    m_rsegs->is_init()) {
  m_rsegs->s_unlock();
  return (false);  // e.g. undo truncate is not enabled, so nothing is truncated
}
- Role of innodb_max_undo_log_size
page_no_t trunc_size = ut_max(
    static_cast<page_no_t>(srv_max_undo_tablespace_size / srv_page_size),
    static_cast<page_no_t>(SRV_UNDO_TABLESPACE_SIZE_IN_PAGES));  // 10MB
if (fil_space_get_size(id()) > trunc_size) {  // the undo tablespace is larger than innodb_max_undo_log_size
  return (true);  // so truncate it
}
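The threshold arithmetic can be checked with a small helper. This is a sketch under assumptions: `truncate_threshold_pages` and `needs_truncation_sketch` are hypothetical names, and the 10 MB floor is expressed here as 10 MB divided by the page size passed in, which may differ from how SRV_UNDO_TABLESPACE_SIZE_IN_PAGES is actually defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed floor of 10 MB, following the comment in the excerpt above.
constexpr std::uint64_t kFloorBytes = 10ULL * 1024 * 1024;

// Size in pages above which an undo tablespace becomes a truncate candidate.
inline std::uint64_t truncate_threshold_pages(std::uint64_t max_undo_log_size,
                                              std::uint64_t page_size) {
  assert(page_size > 0);
  return std::max(max_undo_log_size / page_size, kFloorBytes / page_size);
}

// Simplified decision: only the on/off switch and the size check are kept.
inline bool needs_truncation_sketch(bool undo_log_truncate,
                                    std::uint64_t space_size_pages,
                                    std::uint64_t max_undo_log_size,
                                    std::uint64_t page_size) {
  if (!undo_log_truncate) return false;
  return space_size_pages >
         truncate_threshold_pages(max_undo_log_size, page_size);
}
```

With a 16 KB page size and innodb_max_undo_log_size set to 1 GB (1073741824 bytes), the threshold is 65536 pages. Setting the parameter below 10 MB does not lower the threshold further, because the floor of 640 pages applies.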
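10. Summary
With that, the questions from the beginning are largely answered:
- After a transaction commits, the coordinator thread decides whether its del flag records can be purged and, if so, dispatches them to the worker threads. This is an asynchronous process. When a lot of data has been modified it can be slow, and the purge threads visibly come under heavy load, but the cleanup is still reasonably timely.
- The purge threads always let work accumulate for a while before History list length is cleaned up. With small transactions (each modifying fewer pages than innodb_purge_batch_size), a cleanup happens only once per 128 such transactions. With large transactions, a cleanup happens once the amount modified exceeds innodb_purge_batch_size * innodb_purge_rseg_truncate_frequency. Either way, it is normal for this metric to stay above 0. If it is large, either there is a long-running query or the purge threads are working at full capacity. In the case observed here, thread 9281 was a purge worker thread and the purge threads were in the running state.
- The purge coordinator wakes up on every transaction commit and checks whether there are transactions to purge. If no transaction arrives for a long time, it first waits 10 ms and, once that times out, enters a long blocking wait.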
11. Related breakpoints
(gdb) info b
Num Type Disp Enb Address What
1 breakpoint keep y 0x00000000035a5f75 in main(int, char**) at /opt/mysql/mysql-8.0.21/sql/main.cc:25
breakpoint already hit 1 time
5 breakpoint keep y 0x0000000005030e9e in row_purge_record_func(purge_node_t*, trx_undo_rec_t*, que_thr_t const*, bool, THD*)
at /opt/mysql/mysql-8.0.21/storage/innobase/row/row0purge.cc:1082
7 breakpoint keep y 0x00000000050d5bb0 in trx_purge_fetch_next_rec(trx_id_t*, roll_ptr_t*, ulint*, mem_heap_t*)
at /opt/mysql/mysql-8.0.21/storage/innobase/trx/trx0purge.cc:2022
breakpoint already hit 2 times (this breakpoint matters: it is not hit when there is nothing to purge)
Worker thread stack under heavy load:
(gdb) bt
#0 ha_chain_get_first (table=0x7fff642eec10, fold=14919132254711319759) at /opt/mysql/mysql-8.0.21/storage/innobase/include/ha0ha.ic:95
#1 0x000000000519e235 in ha_search_with_data (table=0x7fff642eec10, fold=14919132254711319759, data=0x7fff338fa218 "") at /opt/mysql/mysql-8.0.21/storage/innobase/include/ha0ha.ic:173
#2 0x000000000519e2da in ha_search_and_delete_if_found (table=0x7fff642eec10, fold=14919132254711319759, data=0x7fff338fa218 "")
at /opt/mysql/mysql-8.0.21/storage/innobase/include/ha0ha.ic:200
#3 0x00000000051a567a in btr_search_update_hash_on_delete (cursor=0x7fff668555c0) at /opt/mysql/mysql-8.0.21/storage/innobase/btr/btr0sea.cc:1724
#4 0x00000000051948bb in btr_cur_pessimistic_delete (err=0x7ffed17f10ec, has_reserved_extents=0, cursor=0x7fff668555c0, flags=0, rollback=false, trx_id=38031, undo_no=0, rec_type=14,
mtr=0x7ffed17f1420, pcur=0x7fff668555c0) at /opt/mysql/mysql-8.0.21/storage/innobase/btr/btr0cur.cc:4833
#5 0x000000000502ed6e in row_purge_remove_clust_if_poss_low (node=0x7fff66855510, mode=65569) at /opt/mysql/mysql-8.0.21/storage/innobase/row/row0purge.cc:204
#6 0x000000000502ee8c in row_purge_remove_clust_if_poss (node=0x7fff66855510) at /opt/mysql/mysql-8.0.21/storage/innobase/row/row0purge.cc:245
#7 0x000000000502feb4 in row_purge_del_mark (node=0x7fff66855510) at /opt/mysql/mysql-8.0.21/storage/innobase/row/row0purge.cc:683
#8 0x0000000005030f92 in row_purge_record_func (node=0x7fff66855510, undo_rec=0x7ffefc431418 "$&N", thr=0x7fff66855328, updated_extern=false, thd=0x7fff0c000a40)
at /opt/mysql/mysql-8.0.21/storage/innobase/row/row0purge.cc:1094
#9 0x0000000005031333 in row_purge (node=0x7fff66855510, undo_rec=0x7ffefc431418 "$&N", thr=0x7fff66855328) at /opt/mysql/mysql-8.0.21/storage/innobase/row/row0purge.cc:1168
#10 0x000000000503164c in row_purge_step (thr=0x7fff66855328) at /opt/mysql/mysql-8.0.21/storage/innobase/row/row0purge.cc:1240
#11 0x0000000004f97868 in que_thr_step (thr=0x7fff66855328) at /opt/mysql/mysql-8.0.21/storage/innobase/que/que0que.cc:915
#12 0x0000000004f97a1b in que_run_threads_low (thr=0x7fff66855328) at /opt/mysql/mysql-8.0.21/storage/innobase/que/que0que.cc:966
#13 0x0000000004f97c52 in que_run_threads (thr=0x7fff66855328) at /opt/mysql/mysql-8.0.21/storage/innobase/que/que0que.cc:1001
#14 0x0000000005082f83 in srv_task_execute () at /opt/mysql/mysql-8.0.21/storage/innobase/srv/srv0srv.cc:2888
#15 0x000000000508311a in srv_worker_thread () at /opt/mysql/mysql-8.0.21/storage/innobase/srv/srv0srv.cc:2927
#16 0x0000000004f18ea3 in std::_Bind<void (*())()>::__call<void>(std::tuple<>&&, std::_Index_tuple<>) (this=0x7ffed17f1c50,
__args=<unknown type in /opt/mysql/mysql3310/bin/mysqld, CU 0xc621f25, DIE 0xc6c1813>) at /usr/local/include/c++/5.5.0/functional:1074
#17 0x0000000004f18e4a in std::_Bind<void (*())()>::operator()<, void>() (this=0x7ffed17f1c50) at /usr/local/include/c++/5.5.0/functional:1133
#18 0x0000000004f18d8a in Runnable::operator()<void (*)()>(void (*&&)()) (this=0xa814df0, f=<unknown type in /opt/mysql/mysql3310/bin/mysqld, CU 0xc621f25, DIE 0xc6c1563>)
at /opt/mysql/mysql-8.0.21/storage/innobase/include/os0thread-create.h:101
#19 0x0000000004f18ccf in std::_Bind_simple<Runnable (void (*)())>::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0xa814de8) at /usr/local/include/c++/5.5.0/functional:1531
#20 0x0000000004f18be9 in std::_Bind_simple<Runnable (void (*)())>::operator()() (this=0xa814de8) at /usr/local/include/c++/5.5.0/functional:1520
#21 0x0000000004f18b88 in std::thread::_Impl<std::_Bind_simple<Runnable (void (*)())> >::_M_run() (this=0xa814dd0) at /usr/local/include/c++/5.5.0/thread:115
#22 0x00007ffff66d6880 in std::execute_native_thread_routine (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
#23 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#24 0x00007ffff5e388dd in clone () from /lib64/libc.so.6