脑裂
由于网络问题,集群节点失去联系。主从数据不同步;重新平衡选举,产生两个主服务。两套主服务一起运行,导致数据不一致。详细请参考: 《redis 脑裂等极端情况分析》
🔥 文章来源:《redis 脑裂现象》
解决方案
比较简单的方案,进行 redis 设置
// master 至少有 3 个副本连接
min-slaves-to-write 3
// 数据复制和同步的延迟不能超过 10 秒
min-slaves-max-lag 10
redis.conf 相关解析
# It is possible for a master to stop accepting writes if there are less than
# N slaves connected, having a lag less or equal than M seconds.
#
# The N slaves need to be in "online" state.
#
# The lag in seconds, that must be <= the specified value, is calculated from
# the last ping received from the slave, that is usually sent every second.
#
# This option does not GUARANTEE that N replicas will accept the write, but
# will limit the window of exposure for lost writes in case not enough slaves
# are available, to the specified number of seconds.
#
# For example to require at least 3 slaves with a lag <= 10 seconds use:
#
# min-slaves-to-write 3
# min-slaves-max-lag 10
#
# Setting one or the other to 0 disables the feature.
#
# By default min-slaves-to-write is set to 0 (feature disabled) and
# min-slaves-max-lag is set to 10.
redis(3.2.8)代码实现流程(有删减)
#define run_with_period(_ms_) if ((_ms_ <= 1000/server.hz) || !(server.cronloops%((_ms_)/(1000/server.hz))))
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
run_with_period(1000) replicationCron();
}
/* Replication cron function, called 1 time per second. */
// 复制周期执行的函数,每秒调用1次
void replicationCron(void) {
// 更新延迟至log小于min-slaves-max-lag的从服务器数量
refreshGoodSlavesCount();
}
/* This function counts the number of slaves with lag <= min-slaves-max-lag.
* If the option is active, the server will prevent writes if there are not
* enough connected slaves with the specified lag (or less). */
// 更新延迟至log小于min-slaves-max-lag的从服务器数量
void refreshGoodSlavesCount(void) {
listIter li;
listNode *ln;
int good = 0;
// 没设置限制则返回
if (!server.repl_min_slaves_to_write ||
!server.repl_min_slaves_max_lag) return;
listRewind(server.slaves,&li);
// 遍历所有的从节点client
while((ln = listNext(&li))) {
client *slave = ln->value;
// 计算延迟值
time_t lag = server.unixtime - slave->repl_ack_time;
// 计数小于延迟限制的个数
if (slave->replstate == SLAVE_STATE_ONLINE &&
lag <= server.repl_min_slaves_max_lag) good++;
}
server.repl_good_slaves_count = good;
}