新增复制时间戳
MySQL8在binlog文件中,对每一个事务(注意是事务,不是事件)都记录了两个时间戳:original_commit_timestamp和immediate_commit_timestamp,通过计算这两个时间戳的差来测量复制延时。这种延时的测量是基于事务(transaction)的,而不再是基于事件(event)。
- original_commit_timestamp: 在产生master节点(事务产生的原始节点),事务成功commit(写入binlog)的毫秒数(基于unix epoch time:1970-01-01T00:00:00Z算起)
- immediate_commit_timestamp: 在即时主机,该事务被成功commit的毫秒数(基于unix epoch time:1970-01-01T00:00:00Z算起)
什么是即时主机?在主从拓扑架构中,即时主机可以被理解为应用该binglog的主机,也可简单理解为slave.
同一个事务在同一主从复制架构中,其original_commit_timestamp都是一样的,而immediate_commit_timestamp则因各主机应用(apply)该事务的时间而异。
在主节点(master)上original_commit_timestamp和immediate_commit_timestamp是一致的,同样在slave 的relaylog里面这两个值也是一致的
#180613 9:52:36 server id 1 end_log_pos 263 GTID last_committed=0 sequence_number=1 rbr_only=yes original_committed_timestamp=1528854756197790 immediate_commit_time
stamp=1528854756197790 transaction_length=510242
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
# original_commit_timestamp=1528854756197790 (2018-06-13 09:52:36.197790 CST)
# immediate_commit_timestamp=1528854756197790 (2018-06-13 09:52:36.197790 CST)
/*!80001 SET @@session.original_commit_timestamp=1528854756197790*//*!*/;
SET @@SESSION.GTID_NEXT= '76a5d623-64b0-11e8-9060-5254004332fa:17927'/*!*/;
而在slave的binlog里就不同了
#180613 9:48:52 server id 1 end_log_pos 270 GTID last_committed=0 sequence_number=1 rbr_only=yes original_committed_timestamp=1528854532813094 immediate_commit_time
stamp=1528854476838381 transaction_length=510244
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
# original_commit_timestamp=1528854532813094 (2018-06-13 09:48:52.813094 CST)
# immediate_commit_timestamp=1528854476838381 (2018-06-13 09:47:56.838381 CST)
/*!80001 SET @@session.original_commit_timestamp=1528854532813094*//*!*/;
SET @@SESSION.GTID_NEXT= '76a5d623-64b0-11e8-9060-5254004332fa:11590'/*!*/;
在主从时间一致的情况下,immediate_commit_time会大于等于original_committed_timestamp,上述片段中的slave比主库系统时间慢3分钟,所以导致了immediate_commit_time小于original_committed_timestamp的现象,这是不正常的。这会导致延时复制(deplayed replication)可能会不准确。
但show slave status 产出中的Second_Behind_Master是处理了主从时钟不一致的问题的,slave_IO 在连接master是会获取Master 的timestamp,计算和本机的时差。见下面源码实现( rpl_slave.cc ):
/*
The difference in seconds between the clock of the master and the clock of
the slave (second - first). It must be signed as it may be <0 or >0.
clock_diff_with_master is computed when the I/O thread starts; for this the
I/O thread does a SELECT UNIX_TIMESTAMP() on the master.
"how late the slave is compared to the master" is computed like this:
clock_of_slave - last_timestamp_executed_by_SQL_thread - clock_diff_with_master
*/
long clock_diff_with_master;
......
if (mi->rli->slave_running)
{
/*
Check if SQL thread is at the end of relay log
Checking should be done using two conditions
condition1: compare the log positions and
condition2: compare the file names (to handle rotation case)
*/
if ((mi->get_master_log_pos() == mi->rli->get_group_master_log_pos()) &&
(!strcmp(mi->get_master_log_name(), mi->rli->get_group_master_log_name())))
{
if (mi->slave_running == MYSQL_SLAVE_RUN_CONNECT)
protocol->store(0LL);
else
protocol->store_null();
}
else
{
long time_diff= ((long)(time(0) - mi->rli->last_master_timestamp)
- mi->clock_diff_with_master);
/*
Apparently on some systems time_diff can be <0. Here are possible
reasons related to MySQL:
- the master is itself a slave of another master whose time is ahead.
- somebody used an explicit SET TIMESTAMP on the master.
Possible reason related to granularity-to-second of time functions
(nothing to do with MySQL), which can explain a value of -1:
assume the master's and slave's time are perfectly synchronized, and
that at slave's connection time, when the master's timestamp is read,
it is at the very end of second 1, and (a very short time later) when
the slave's timestamp is read it is at the very beginning of second
2. Then the recorded value for master is 1 and the recorded value for
slave is 2. At SHOW SLAVE STATUS time, assume that the difference
between timestamp of slave and rli->last_master_timestamp is 0
(i.e. they are in the same second), then we get 0-(2-1)=-1 as a result.
This confuses users, so we don't go below 0: hence the max().
last_master_timestamp == 0 (an "impossible" timestamp 1970) is a
special marker to say "consider we have caught up".
*/
protocol->store((longlong)(mi->rli->last_master_timestamp ?
max(0L, time_diff) : 0));
}
在group replication 架构中情况又有所不同
#180606 14:46:08 server id 1 end_log_pos 27261 GTID last_committed=0 sequence_number=17 rbr_only=yes original_committed_timestamp=1528267568948594 immediate_com
mit_timestamp=1528267569283514 transaction_length=1687
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
# original_commit_timestamp=1528267568948594 (2018-06-06 14:46:08.948594 CST)
# immediate_commit_timestamp=1528267569283514 (2018-06-06 14:46:09.283514 CST)
/*!80001 SET @@session.original_commit_timestamp=1528267568948594*//*!*/;
SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:420051'/*!*/;
# at 27261
#180606 14:46:08 server id 1 end_log_pos 27341 Query thread_id=461 exec_time=0 error_code=0
SET TIMESTAMP=1528267568/*!*/;
BEGIN
上述binlog片段来自与group replication架构中的master节点,可以看到immediate_commit_timestamp 是略晚于original_commit_timestamp的。
这是因为original_commit_timestamp是在一个事务准备好commit时产生的,而此时尚未commit,也就是说这个事务已经在主节点上完成执行(还没有最终commit),它的写集(writeset)已经产生,并准备发送给其它节点做冲突校验(还没有发送),此时产生的original_commit_timestamp时间戳。该时间戳会随事务一起广播到其它组内节点(group replication members)。
immediate_commit_timestamp 时间戳则是该事务在master节点上的最终commit时间。
主节点上immediate_commit_timestamp - original_commit_timestamp 恰恰是该事务校验所花费的时间。
在group replication架构中,还有一种事件比较特殊:视图改变事件(View Change Event), 当复制组中有新节点加入,或有节点退出时,会触发view change evnet. 该事件是由个节点自主产生,自主应用的,而非由一个节点产生然后复制广播到其它节点,因此这些事务的original_commit_timestamp 会被设置为0.
复制延时监控
常用的复制延时查看是通过执行show slave status命令,查看Seconds_Behind_Master字段,但这种简单的监控只适用于传统的主从复制,而对于新特性组复制(Group Replication) 则不适用了。
MySQL 对于GR复制模式提供了更加细粒度的监控信息,这些信息通过查看performance_schema库中的一些表可以获得
- replication_connection_status: 当前连接信息,提供当前加入到relay队列中的事务,以及该事务的original_commit_timestamp等信息
- replication_applier_status_by_coordinator: 启用MTS时,当前调度线程的信息,包括上一个已处理事务的GTID,以及original_commit_timestamp等信息
mysql> select * from replication_applier_status_by_coordinator\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
THREAD_ID: 147
SERVICE_STATE: ON
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_PROCESSED_TRANSACTION: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:2523170
LAST_PROCESSED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2018-06-12 10:36:51.884949
LAST_PROCESSED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_PROCESSED_TRANSACTION_START_BUFFER_TIMESTAMP: 2018-06-12 10:38:02.636455
LAST_PROCESSED_TRANSACTION_END_BUFFER_TIMESTAMP: 2018-06-12 10:38:02.636493
PROCESSING_TRANSACTION: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:2523171
PROCESSING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2018-06-12 10:36:51.884960
PROCESSING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
PROCESSING_TRANSACTION_START_BUFFER_TIMESTAMP: 2018-06-12 10:38:02.636495
- replication_applier_status_worker: 当前应用线程的信息,如果启用了MTS,会有多个,其表结构与replication_applier_status_by_coordinator类似
下面我写了几个在MGR架构中常用的监控SQL,在各节点时钟一致的情况下,其结果可靠的。
查看复制落后延时(MTS ON):
select if(PROCESSING_TRANSACTION <>'', timestampdiff(second,LAST_PROCESSED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,current_timestamp()),0) lag_time from performance_schema.replication_applier_status_by_coordinator where SERVICE_STATE='ON' and CHANNEL_NAME='group_replication_applier';
查看复制落后延时(MTS OFF):
select if(APPLYING_TRANSACTION <>'', timestampdiff(second,LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,current_timestamp()),0) lag_time from performance_schema.replication_applier_status_by_worker where SERVICE_STATE='ON' and CHANNEL_NAME='group_replication_applier';
如果查看落后的事务数,而非时间
select Count_transactions_remote_in_applier_queue from performance_schema.replication_group_member_stats where member_id=@@server_uuid;