3. Sentinel Mechanism
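In "4.1 Unavoidable Topics in Cluster Design" we discussed the single point of failure, and in "4.2 Master-Replica Replication" we described using replicas as backups of the master so that as little data as possible is lost. That raises new questions. If the master goes down, how do we find out? Should an operator promote a replica to master by hand every time it happens? And if the master's address is hard-coded in application code, do we have to redeploy every affected project? None of this is reasonable, and this is the problem the Sentinel mechanism solves.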
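Redis Sentinel has two main tasks:
- Monitoring: it periodically sends heartbeats to the monitored Redis instances to check that they are working properly.
- Automatic failover: when the master goes down, Sentinel promotes a replica to master and gives clients the address of the new master.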
3.1 Sentinel Startup and Initialization
When a Sentinel starts, it performs the following steps:
- Initialize the server
This is not the full Redis server initialization described in "3.1.2.1 Redis Server Startup Process"; for example, Sentinel does not load RDB or AOF files.
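- Replace the ordinary Redis server code with Sentinel-specific code
A Redis server loads a command table at startup, and Sentinel loads a different, much smaller one. Early implementations supported only seven commands: PING, SENTINEL, INFO, SUBSCRIBE, UNSUBSCRIBE, PSUBSCRIBE and PUNSUBSCRIBE; later versions add a few more, such as ROLE, CLIENT and SHUTDOWN. Other code is swapped out as well; for example, the default port becomes 26379 instead of 6379.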
- Initialize the Sentinel state
This step initializes a sentinelState structure:
    /* Main state. */
    struct sentinelState {
        char myid[CONFIG_RUN_ID_SIZE+1]; /* This sentinel ID. */
        uint64_t current_epoch;         /* Current epoch. */
        dict *masters;      /* Dictionary of master sentinelRedisInstances.
                               Key is the instance name, value is the
                               sentinelRedisInstance structure pointer. */
        int tilt;           /* Are we in TILT mode? */
        int running_scripts;    /* Number of scripts in execution right now. */
        mstime_t tilt_start_time;       /* When TITL started. */
        mstime_t previous_time;         /* Last time we ran the time handler. */
        list *scripts_queue;            /* Queue of user scripts to execute. */
        char *announce_ip;  /* IP addr that is gossiped to other sentinels if
                               not NULL. */
        int announce_port;  /* Port that is gossiped to other sentinels if
                               non zero. */
        unsigned long simfailure_flags; /* Failures simulation. */
        int deny_scripts_reconfig; /* Allow SENTINEL SET ... to change script
                                      paths at runtime? */
    } sentinel;
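- Initialize the list of masters monitored by Sentinel from the given configuration file
The masters dictionary in the sentinelState structure above records information about every master this Sentinel monitors. Each key is the name of a monitored master, and each value points to a sentinelRedisInstance structure that holds that instance's information: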
    typedef struct sentinelRedisInstance {
        int flags;      /* See SRI_... defines */
        char *name;     /* Master name from the point of view of this sentinel. */
        char *runid;    /* Run ID of this instance, or unique ID if is a Sentinel.*/
        uint64_t config_epoch;  /* Configuration epoch. */
        sentinelAddr *addr; /* Master host. */
        instanceLink *link; /* Link to the instance, may be shared for Sentinels. */
        mstime_t last_pub_time;   /* Last time we sent hello via Pub/Sub. */
        mstime_t last_hello_time; /* Only used if SRI_SENTINEL is set. Last time
                                     we received a hello from this Sentinel
                                     via Pub/Sub. */
        mstime_t last_master_down_reply_time; /* Time of last reply to
                                                 SENTINEL is-master-down command. */
        mstime_t s_down_since_time; /* Subjectively down since time. */
        mstime_t o_down_since_time; /* Objectively down since time. */
        mstime_t down_after_period; /* Consider it down after that period. */
        mstime_t info_refresh;  /* Time at which we received INFO output from it. */
        dict *renamed_commands;     /* Commands renamed in this instance:
                                       Sentinel will use the alternative commands
                                       mapped on this table to send things like
                                       SLAVEOF, CONFING, INFO, ... */
        /* Role and the first time we observed it.
         * This is useful in order to delay replacing what the instance reports
         * with our own configuration. We need to always wait some time in order
         * to give a chance to the leader to report the new configuration before
         * we do silly things. */
        int role_reported;
        mstime_t role_reported_time;
        mstime_t slave_conf_change_time; /* Last time slave master addr changed. */
        /* Master specific. */
        dict *sentinels;    /* Other sentinels monitoring the same master. */
        dict *slaves;       /* Slaves for this master instance. */
        unsigned int quorum;/* Number of sentinels that need to agree on failure. */
        int parallel_syncs; /* How many slaves to reconfigure at same time. */
        char *auth_pass;    /* Password to use for AUTH against master & replica. */
        char *auth_user;    /* Username for ACLs AUTH against master & replica. */
        /* Slave specific. */
        mstime_t master_link_down_time; /* Slave replication link down time. */
        int slave_priority; /* Slave priority according to its INFO output. */
        mstime_t slave_reconf_sent_time; /* Time at which we sent SLAVE OF <new> */
        struct sentinelRedisInstance *master; /* Master instance if it's slave. */
        char *slave_master_host;    /* Master host as reported by INFO */
        int slave_master_port;      /* Master port as reported by INFO */
        int slave_master_link_status; /* Master link status as reported by INFO */
        unsigned long long slave_repl_offset; /* Slave replication offset. */
        /* Failover */
        char *leader;       /* If this is a master instance, this is the runid of
                               the Sentinel that should perform the failover. If
                               this is a Sentinel, this is the runid of the Sentinel
                               that this Sentinel voted as leader. */
        uint64_t leader_epoch; /* Epoch of the 'leader' field. */
        uint64_t failover_epoch; /* Epoch of the currently started failover. */
        int failover_state; /* See SENTINEL_FAILOVER_STATE_* defines. */
        mstime_t failover_state_change_time;
        mstime_t failover_start_time;   /* Last failover attempt start time. */
        mstime_t failover_timeout;      /* Max time to refresh failover state. */
        mstime_t failover_delay_logged; /* For what failover_start_time value we
                                           logged the failover delay. */
        struct sentinelRedisInstance *promoted_slave; /* Promoted slave instance. */
        /* Scripts executed to notify admin or reconfigure clients: when they
         * are set to NULL no script is executed. */
        char *notification_script;
        char *client_reconfig_script;
        sds info; /* cached INFO output */
    } sentinelRedisInstance;
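- Create network connections to the master
Sentinel opens two asynchronous network connections to each monitored master: a command connection, used to send commands to the master and receive its replies, and a subscription connection, used to subscribe to the `__sentinel__:hello` channel.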
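The step that fills the masters dictionary reads "sentinel monitor <master-name> <ip> <port> <quorum>" lines from the configuration file. The C sketch below shows only how one such line could be parsed into a heavily simplified record. The struct monitoredMaster and the function parse_monitor_line are hypothetical illustrations, not Redis code; the real configuration handling in sentinel.c also processes many other "sentinel ..." directives and builds a full sentinelRedisInstance.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for sentinelRedisInstance:
 * only the fields that a "sentinel monitor" line provides. */
typedef struct {
    char name[64];
    char ip[64];
    int port;
    unsigned int quorum;
} monitoredMaster;

/* Hypothetical parser for "sentinel monitor <name> <ip> <port> <quorum>".
 * Returns 1 on success, 0 if the line is not a valid monitor directive. */
int parse_monitor_line(const char *line, monitoredMaster *out) {
    char kw1[16], kw2[32];
    int port;
    unsigned int quorum;

    assert(line != NULL && out != NULL);
    if (sscanf(line, "%15s %31s %63s %63s %d %u",
               kw1, kw2, out->name, out->ip, &port, &quorum) != 6)
        return 0;                       /* missing or malformed fields */
    if (strcmp(kw1, "sentinel") != 0 || strcmp(kw2, "monitor") != 0)
        return 0;                       /* some other directive */
    if (port <= 0 || port > 65535 || quorum == 0)
        return 0;                       /* out-of-range values */
    out->port = port;
    out->quorum = quorum;
    return 1;
}
```

For example, "sentinel monitor mymaster 127.0.0.1 6379 2" yields a master named mymaster at 127.0.0.1:6379 with quorum 2.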
3.2 Monitoring Process
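By default, Sentinel sends the INFO command to each master once every ten seconds and analyzes the reply to learn the master's current state. Besides information about the master itself, the reply lists the master's replicas, and Sentinel builds a sentinelRedisInstance structure for each replica to hold its information.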
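To make the INFO-based discovery concrete, here is a minimal C sketch that scans a master's INFO reply for the Replication section's "slaveN:ip=...,port=...,state=...,offset=...,lag=..." lines and extracts each replica's address and offset. It assumes that line format; replicaInfo and parse_info_replicas are hypothetical names, and real Sentinel does this parsing, and much more, when it refreshes an instance's INFO.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of one replica as listed in the master's INFO. */
typedef struct {
    char ip[64];
    int port;
    long long offset;
} replicaInfo;

/* Hypothetical helper: collects up to max "slaveN:" entries from an INFO
 * reply and returns how many were found. Other lines are ignored. */
int parse_info_replicas(const char *info, replicaInfo *out, int max) {
    int n = 0;
    const char *line = info;

    assert(info != NULL && out != NULL);
    while (line && *line && n < max) {
        int idx;
        replicaInfo r = {{0}, 0, 0};
        if (sscanf(line, "slave%d:ip=%63[^,],port=%d", &idx, r.ip, &r.port) == 3) {
            /* offset= must be found on this same line */
            const char *off = strstr(line, "offset=");
            const char *eol = strchr(line, '\n');
            if (off && (!eol || off < eol))
                sscanf(off, "offset=%lld", &r.offset);
            out[n++] = r;
        }
        line = strchr(line, '\n');      /* advance to the next line */
        if (line) line++;
    }
    return n;
}
```

Each entry found this way corresponds to one replica for which Sentinel would create (or update) a sentinelRedisInstance.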
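Whenever a new replica is discovered, Sentinel also creates a command connection and a subscription connection to it, and likewise sends it INFO once every ten seconds (once per second while its master is objectively down or a failover is in progress).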
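Besides INFO, Sentinel sends a PUBLISH command to every monitored server (masters and replicas) once every two seconds, publishing a hello message about itself and the master to the hello channel. Every Sentinel subscribed to that server's channel receives the message, which is how Sentinels monitoring the same master discover each other. The sentinels dictionary in the master's sentinelRedisInstance structure stores the other Sentinels found this way. Sentinels also open command connections to each other to exchange information, but they do not open subscription connections to each other.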
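The hello message is a single comma-separated string. As far as we understand the format, it carries eight fields: the Sentinel's IP, port, run ID and current epoch, followed by the master's name, IP, port and configuration epoch. The sketch below formats and parses such a payload; helloMsg, format_hello and parse_hello are hypothetical names, and the field order is an assumption to verify against the sentinel.c of your Redis version.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical decoded form of a hello message (assumed field order). */
typedef struct {
    char sentinel_ip[64];
    int sentinel_port;
    char sentinel_runid[41];            /* 40 hex chars + NUL */
    unsigned long long current_epoch;
    char master_name[64];
    char master_ip[64];
    int master_port;
    unsigned long long master_config_epoch;
} helloMsg;

/* Builds the payload that would be PUBLISHed to __sentinel__:hello.
 * Returns 1 if it fit into buf, 0 otherwise. */
int format_hello(const helloMsg *h, char *buf, size_t len) {
    int n;
    assert(h != NULL && buf != NULL && len > 0);
    n = snprintf(buf, len, "%s,%d,%s,%llu,%s,%s,%d,%llu",
                 h->sentinel_ip, h->sentinel_port, h->sentinel_runid,
                 h->current_epoch, h->master_name, h->master_ip,
                 h->master_port, h->master_config_epoch);
    return n > 0 && (size_t)n < len;
}

/* Parses a payload back; returns 1 only if all eight fields are present. */
int parse_hello(const char *payload, helloMsg *h) {
    assert(payload != NULL && h != NULL);
    return sscanf(payload, "%63[^,],%d,%40[^,],%llu,%63[^,],%63[^,],%d,%llu",
                  h->sentinel_ip, &h->sentinel_port, h->sentinel_runid,
                  &h->current_epoch, h->master_name, h->master_ip,
                  &h->master_port, &h->master_config_epoch) == 8;
}
```

A receiving Sentinel would use the first four fields to add or update the sender in the master's sentinels dictionary, and the last four to notice a newer master configuration.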
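Once per second, Sentinel sends PING to every instance it has a command connection with (masters, replicas and other Sentinels). If an instance does not return a valid reply within down-after-milliseconds, Sentinel marks it subjectively down (SDOWN), meaning this Sentinel alone believes the instance is unavailable. When a master is marked subjectively down, Sentinel asks the other Sentinels monitoring it, with SENTINEL is-master-down-by-addr, whether they also consider it down. If the number of Sentinels that agree, counting itself, reaches the quorum configured for that master, this Sentinel marks the master objectively down (ODOWN); objective down applies only to masters. Being marked down does not mean an instance is no longer monitored: Sentinel keeps pinging it, and once it replies again, Sentinel clears the down state. If the instance was a master that has since been replaced, the old master is reconfigured as a replica of the new master when it comes back.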
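Both down states reduce to simple checks. The sketch below assumes millisecond timestamps, a per-master quorum, and that this Sentinel already holds each peer's answer to SENTINEL is-master-down-by-addr as a 0/1 flag; is_sdown and is_odown are hypothetical helpers, not Redis functions.

```c
#include <assert.h>

typedef long long mstime_t;   /* milliseconds, same name Redis uses */

/* SDOWN: no valid PING reply for longer than down-after-milliseconds. */
int is_sdown(mstime_t now, mstime_t last_valid_reply, mstime_t down_after_ms) {
    return now - last_valid_reply > down_after_ms;
}

/* ODOWN (masters only): count this Sentinel plus every peer that reported
 * the master as down; ODOWN once the count reaches the master's quorum. */
int is_odown(int self_sdown, const int *peer_says_down, int npeers,
             unsigned int quorum) {
    unsigned int votes = 1;            /* this Sentinel's own SDOWN view */

    assert(npeers >= 0);
    if (!self_sdown)
        return 0;                      /* the check starts from our own SDOWN */
    for (int i = 0; i < npeers; i++)
        if (peer_says_down[i])
            votes++;
    return votes >= quorum;
}
```

With quorum 2, one agreeing peer plus this Sentinel's own SDOWN is enough to mark the master objectively down.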
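When a master becomes unavailable, Sentinel performs a failover. A master is usually monitored by several Sentinels, and they should not all run a failover at the same time, so they first elect a leader Sentinel to do it. A Sentinel that has marked the master objectively down starts a new epoch and sends SENTINEL is-master-down-by-addr to the other Sentinels, this time including its own run ID to ask for their vote. Each Sentinel grants at most one vote per epoch, to the first candidate that asks. A candidate that collects votes from a majority of the Sentinels (and at least the quorum) becomes the leader and performs the failover. This election follows the leader-election idea of the Raft algorithm.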
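A minimal sketch of the voting rules just described: each Sentinel remembers the epoch it last voted in and grants at most one vote per epoch, first come first served, and a candidate wins only if its votes reach both a majority of all known Sentinels and the quorum. voteState, request_vote and wins_election are hypothetical names; real Sentinel also tracks its current epoch and randomizes when candidates start asking, which this sketch omits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-Sentinel voting state. */
typedef struct {
    uint64_t leader_epoch;   /* epoch in which we last voted */
    char leader[41];         /* run ID we voted for in that epoch */
} voteState;

/* Handles a vote request. The vote is granted only if we have not voted
 * in this or a later epoch yet; returns the run ID we vote for. */
const char *request_vote(voteState *s, const char *candidate_runid,
                         uint64_t req_epoch) {
    assert(s != NULL && candidate_runid != NULL);
    if (req_epoch > s->leader_epoch) {
        s->leader_epoch = req_epoch;
        strncpy(s->leader, candidate_runid, sizeof s->leader - 1);
        s->leader[sizeof s->leader - 1] = '\0';
    }
    return s->leader;        /* first candidate of this epoch keeps the vote */
}

/* A candidate wins with at least max(quorum, voters/2 + 1) votes,
 * where voters is the number of known Sentinels including itself. */
int wins_election(unsigned int votes, unsigned int voters, unsigned int quorum) {
    unsigned int majority = voters / 2 + 1;
    return votes >= majority && votes >= quorum;
}
```

A split vote simply produces no winner in that epoch; the candidates try again later in a higher epoch.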
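A failover consists of two steps. The first is choosing a new master. Sentinel discards replicas that are down, disconnected, have not replied recently, or have a replica priority of 0. Among the rest it prefers the lowest replica priority, then the largest replication offset, and finally the smallest run ID, and promotes the winner with SLAVEOF NO ONE. Some data may be lost at this point, which cannot be completely avoided (see "4.1 Unavoidable Topics in Cluster Design"). The second step is sending SLAVEOF to the other replicas so that they replicate from the new master.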
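The replica ranking can be written as a comparator plus a filter. In this sketch the healthy flag is a hypothetical stand-in for all of Sentinel's disqualification checks (down, disconnected, stale replies, replication link down too long); candidate, compare_candidates and select_replica are hypothetical names, and run IDs are compared with plain strcmp.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical view of one replica during new-master selection. */
typedef struct {
    char runid[41];
    int healthy;           /* summary of all disqualification checks */
    int priority;          /* replica priority; 0 means never promote */
    long long offset;      /* replication offset */
} candidate;

/* Negative if x is the better choice: lower priority, then larger offset,
 * then lexicographically smaller run ID. */
static int compare_candidates(const candidate *x, const candidate *y) {
    if (x->priority != y->priority)
        return x->priority < y->priority ? -1 : 1;
    if (x->offset != y->offset)
        return x->offset > y->offset ? -1 : 1;
    return strcmp(x->runid, y->runid);
}

/* Returns the index of the replica to promote, or -1 if none qualifies. */
int select_replica(const candidate *all, int n) {
    int best = -1;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (!all[i].healthy || all[i].priority == 0)
            continue;                  /* not eligible for promotion */
        if (best < 0 || compare_candidates(&all[i], &all[best]) < 0)
            best = i;
    }
    return best;
}
```

A linear scan is enough here because only the single best replica is needed, not a full ordering.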
Thought:
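A typical deployment runs three Sentinels. Suppose one Sentinel goes down and, at the same time, the master goes down too. Failover first needs a leader Sentinel, so can the two remaining Sentinels still elect one, or will they split and leave Redis unavailable? Counting the votes answers this. A leader needs votes from a majority of all known Sentinels, including the one that is down: 3/2+1 = 2. The two survivors can therefore still mark the master objectively down (with a quorum of 2) and elect a leader. If they happen to vote for themselves and split 1:1, no leader is elected in that epoch and they retry in a new epoch after a timeout, so a split vote only delays the failover. Failover becomes impossible only when the surviving Sentinels cannot form a majority, for example when two of the three are down; Redis then stays unavailable until enough Sentinels recover.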
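The arithmetic behind this thought can be checked with a small helper. It assumes the rules described in 3.2: reaching objective down needs quorum agreeing Sentinels, and electing a leader needs votes from a majority of all known Sentinels, down ones included. failover_possible is a hypothetical helper, not a Redis function.

```c
#include <assert.h>

/* Hypothetical check: can the Sentinels that are still alive both reach
 * ODOWN (quorum) and elect a leader (majority of all known Sentinels)? */
int failover_possible(unsigned int total_sentinels, unsigned int alive,
                      unsigned int quorum) {
    unsigned int majority = total_sentinels / 2 + 1;

    assert(alive <= total_sentinels);
    return alive >= quorum && alive >= majority;
}
```

With three Sentinels and quorum 2, failover_possible(3, 2, 2) holds while failover_possible(3, 1, 2) does not, matching the reasoning above.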