Redis Fantasy Journey (Part 4): 3. Sentinel

3. Sentinel

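In "4.1 Unavoidable Topics in Cluster Design" we raised the single-point-of-failure problem, and in "4.2 Master-Replica Replication" we used replicas as backups of the master so that as little data as possible is lost. That leaves some open questions. If the master goes down, how do we find out? Does an operator have to promote a replica by hand every time? And if the master's address is hard-coded in application code, does every affected project have to be redeployed? That is clearly unreasonable, which is why Redis provides the Sentinel mechanism.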

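Redis Sentinel has two main jobs:

Monitoring: it periodically sends heartbeat checks to the monitored Redis instances to see whether they are working normally.
Automatic failover: when the master goes down, Sentinel promotes a replica to master and gives clients the address of the new master.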

3.1 Sentinel Startup and Initialization

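When a Sentinel starts, it goes through the following steps:

Initialize the server

Unlike the full startup described in "3.1.2.1 Redis Server Startup Flow", this initialization is only partial. For example, Sentinel does not load RDB or AOF files.

Replace the regular Redis server code with Sentinel code

A Redis server loads a command table at startup. Sentinel loads a different, much smaller table, so it supports only these seven commands: PING, SENTINEL, INFO, SUBSCRIBE, UNSUBSCRIBE, PSUBSCRIBE, and PUNSUBSCRIBE. (The exact list depends on the Redis version; newer releases add a few more, such as ROLE and AUTH.) Besides the command table, some other pieces of code are swapped out as well.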
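To make the idea of a restricted command table concrete, here is a minimal sketch in C. It is not the Redis implementation: the array and the helper `sentinel_accepts` are hypothetical names, and the table holds only the seven commands named above.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* The seven commands the text lists as available in Sentinel mode. */
static const char *const sentinel_commands[] = {
    "PING", "SENTINEL", "INFO", "SUBSCRIBE",
    "UNSUBSCRIBE", "PSUBSCRIBE", "PUNSUBSCRIBE"
};

/* Hypothetical helper: returns 1 if `name` is in the table, 0 otherwise.
 * The comparison is case-insensitive, as Redis command names are. */
int sentinel_accepts(const char *name) {
    size_t n = sizeof(sentinel_commands) / sizeof(sentinel_commands[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(name, sentinel_commands[i]) == 0) return 1;
    }
    return 0;
}
```

With this table, `sentinel_accepts("ping")` returns 1, while a data command such as `sentinel_accepts("SET")` returns 0, which is the effect a client sees when it sends an unsupported command to a Sentinel.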

Initialize the Sentinel state

This step initializes a sentinelState struct, shown below:

/* Main state. */
struct sentinelState {
    char myid[CONFIG_RUN_ID_SIZE+1]; /* This sentinel ID. */
    uint64_t current_epoch;         /* Current epoch. */
    dict *masters;      /* Dictionary of master sentinelRedisInstances.
                           Key is the instance name, value is the
                           sentinelRedisInstance structure pointer. */
    int tilt;           /* Are we in TILT mode? */
    int running_scripts;    /* Number of scripts in execution right now. */
    mstime_t tilt_start_time;       /* When TILT started. */
    mstime_t previous_time;         /* Last time we ran the time handler. */
    list *scripts_queue;            /* Queue of user scripts to execute. */
    char *announce_ip;  /* IP addr that is gossiped to other sentinels if
                           not NULL. */
    int announce_port;  /* Port that is gossiped to other sentinels if
                           non zero. */
    unsigned long simfailure_flags; /* Failures simulation. */
    int deny_scripts_reconfig; /* Allow SENTINEL SET ... to change script
                                  paths at runtime? */
} sentinel;

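Initialize the list of monitored masters from the given configuration file

The masters dictionary in the sentinelState struct above records every master this Sentinel monitors. The key is the master's name, and the value is a pointer to a sentinelRedisInstance struct that holds the instance's information: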

typedef struct sentinelRedisInstance {
    int flags;      /* See SRI_... defines */
    char *name;     /* Master name from the point of view of this sentinel. */
    char *runid;    /* Run ID of this instance, or unique ID if is a Sentinel.*/
    uint64_t config_epoch;  /* Configuration epoch. */
    sentinelAddr *addr; /* Master host. */
    instanceLink *link; /* Link to the instance, may be shared for Sentinels. */
    mstime_t last_pub_time;   /* Last time we sent hello via Pub/Sub. */
    mstime_t last_hello_time; /* Only used if SRI_SENTINEL is set. Last time
                                 we received a hello from this Sentinel
                                 via Pub/Sub. */
    mstime_t last_master_down_reply_time; /* Time of last reply to
                                             SENTINEL is-master-down command. */
    mstime_t s_down_since_time; /* Subjectively down since time. */
    mstime_t o_down_since_time; /* Objectively down since time. */
    mstime_t down_after_period; /* Consider it down after that period. */
    mstime_t info_refresh;  /* Time at which we received INFO output from it. */
    dict *renamed_commands;     /* Commands renamed in this instance:
                                   Sentinel will use the alternative commands
                                   mapped on this table to send things like
                                   SLAVEOF, CONFIG, INFO, ... */

    /* Role and the first time we observed it.
     * This is useful in order to delay replacing what the instance reports
     * with our own configuration. We need to always wait some time in order
     * to give a chance to the leader to report the new configuration before
     * we do silly things. */
    int role_reported;
    mstime_t role_reported_time;
    mstime_t slave_conf_change_time; /* Last time slave master addr changed. */

    /* Master specific. */
    dict *sentinels;    /* Other sentinels monitoring the same master. */
    dict *slaves;       /* Slaves for this master instance. */
    unsigned int quorum;/* Number of sentinels that need to agree on failure. */
    int parallel_syncs; /* How many slaves to reconfigure at same time. */
    char *auth_pass;    /* Password to use for AUTH against master & replica. */
    char *auth_user;    /* Username for ACLs AUTH against master & replica. */

    /* Slave specific. */
    mstime_t master_link_down_time; /* Slave replication link down time. */
    int slave_priority; /* Slave priority according to its INFO output. */
    mstime_t slave_reconf_sent_time; /* Time at which we sent SLAVEOF <new> */
    struct sentinelRedisInstance *master; /* Master instance if it's slave. */
    char *slave_master_host;    /* Master host as reported by INFO */
    int slave_master_port;      /* Master port as reported by INFO */
    int slave_master_link_status; /* Master link status as reported by INFO */
    unsigned long long slave_repl_offset; /* Slave replication offset. */
    /* Failover */
    char *leader;       /* If this is a master instance, this is the runid of
                           the Sentinel that should perform the failover. If
                           this is a Sentinel, this is the runid of the Sentinel
                           that this Sentinel voted as leader. */
    uint64_t leader_epoch; /* Epoch of the 'leader' field. */
    uint64_t failover_epoch; /* Epoch of the currently started failover. */
    int failover_state; /* See SENTINEL_FAILOVER_STATE_* defines. */
    mstime_t failover_state_change_time;
    mstime_t failover_start_time;   /* Last failover attempt start time. */
    mstime_t failover_timeout;      /* Max time to refresh failover state. */
    mstime_t failover_delay_logged; /* For what failover_start_time value we
                                       logged the failover delay. */
    struct sentinelRedisInstance *promoted_slave; /* Promoted slave instance. */
    /* Scripts executed to notify admin or reconfigure clients: when they
     * are set to NULL no script is executed. */
    char *notification_script;
    char *client_reconfig_script;
    sds info; /* cached INFO output */
} sentinelRedisInstance;
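The monitored-master list comes from configuration lines of the form `sentinel monitor <name> <ip> <port> <quorum>`. As a rough sketch of how such a line maps onto the name, address, and quorum fields above, the following C code parses one directive. The struct `monitored_master` and the function `parse_monitor_line` are hypothetical, trimmed-down stand-ins, not the parser Redis uses.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced stand-in for one entry of the masters dictionary. */
typedef struct {
    char name[64];        /* key of the masters dictionary */
    char ip[64];
    int port;
    unsigned int quorum;  /* Sentinels that must agree the master is down */
} monitored_master;

/* Parses "sentinel monitor <name> <ip> <port> <quorum>".
 * Returns 1 on success, 0 if the line is not a valid monitor directive. */
int parse_monitor_line(const char *line, monitored_master *out) {
    memset(out, 0, sizeof(*out));
    int fields = sscanf(line, "sentinel monitor %63s %63s %d %u",
                        out->name, out->ip, &out->port, &out->quorum);
    return fields == 4 && out->port > 0 && out->port <= 65535 &&
           out->quorum > 0;
}
```

For example, passing `"sentinel monitor mymaster 127.0.0.1 6379 2"` fills in the name `mymaster`, port 6379, and quorum 2. The master name `mymaster` is only an illustrative value.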

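Create network connections to the master

Sentinel opens two asynchronous network connections to each monitored master: a command connection, used to send commands and receive replies, and a subscription connection, used to subscribe to the __sentinel__:hello channel.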

3.2 Monitoring Flow

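By default Sentinel sends an INFO command to the master once every ten seconds and parses the reply to learn the master's current state. The reply also lists the master's replicas, and Sentinel creates a sentinelRedisInstance struct for each replica to hold its information.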

Whenever a new replica is discovered, Sentinel also opens a command connection and a subscription connection to it, and sends it INFO once every ten seconds as well.
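Replica discovery relies on the replication section of the master's INFO reply, where each replica appears on a line such as `slave0:ip=127.0.0.1,port=6380,state=online,offset=1234,lag=0`. The sketch below extracts the fields from one such line. The struct `replica_info` and the function `parse_replica_line` are hypothetical, and Redis itself parses the reply differently; this only illustrates what information Sentinel gets from it.

```c
#include <stdio.h>

/* Hypothetical holder for what one "slaveN:" line of INFO tells us. */
typedef struct {
    char ip[64];
    int port;
    char state[16];
    unsigned long long offset;  /* replication offset reported by the master */
} replica_info;

/* Parses one line of the form
 *   slave0:ip=127.0.0.1,port=6380,state=online,offset=1234,lag=0
 * Returns 1 if all fields were read, 0 otherwise. */
int parse_replica_line(const char *line, replica_info *out) {
    int index, lag;  /* read to validate the line, not stored */
    int fields = sscanf(line,
        "slave%d:ip=%63[^,],port=%d,state=%15[^,],offset=%llu,lag=%d",
        &index, out->ip, &out->port, out->state, &out->offset, &lag);
    return fields == 6;
}
```

Each successfully parsed line corresponds to one replica for which Sentinel would create an instance record and open its own connections.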

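In addition to INFO, Sentinel sends a PUBLISH command once every two seconds to every monitored server, posting a message to that server's __sentinel__:hello channel. Every Sentinel subscribed to that server receives the message, and this is also how Sentinels learn of each other's existence. The sentinels dictionary in sentinelRedisInstance is where the other Sentinels are stored. Sentinels open command connections to each other to exchange information, but they do not open subscription connections to each other.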
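The message published on the hello channel is a single comma-separated string. To my understanding it carries eight fields: the Sentinel's IP, port, and run ID, its current epoch, and the master's name, IP, port, and configuration epoch. The sketch below builds such a payload; the function name `format_hello` is hypothetical and the field layout should be treated as an assumption to check against the Redis version in use.

```c
#include <stdio.h>

/* Builds "ip,port,runid,current_epoch,master_name,master_ip,
 * master_port,master_config_epoch" into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_hello(char *buf, size_t len,
                 const char *ip, int port, const char *runid,
                 unsigned long long current_epoch,
                 const char *master_name, const char *master_ip,
                 int master_port, unsigned long long master_config_epoch) {
    int n = snprintf(buf, len, "%s,%d,%s,%llu,%s,%s,%d,%llu",
                     ip, port, runid, current_epoch,
                     master_name, master_ip, master_port,
                     master_config_epoch);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

A Sentinel receiving this string can compare the run ID with its own to ignore its own messages, and add any unknown sender to the sentinels dictionary of the named master.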

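Once per second, Sentinel sends PING to every instance it has a command connection to. If an instance gives no valid reply within down-after-milliseconds, Sentinel marks it as subjectively down (SDOWN), meaning that this Sentinel alone considers the server unavailable. When the instance is a master, Sentinel then asks the other Sentinels that monitor it whether they agree, using SENTINEL is-master-down-by-addr. If enough Sentinels (at least quorum, counting itself) report the master as down, the asking Sentinel marks it as objectively down (ODOWN). Being marked down does not mean the instance is no longer monitored: Sentinel keeps probing it, and if it answers PING again it is marked online again. If it was the master that failed, then once a new master has been chosen, the old master is reconfigured as a replica of the new one when it comes back.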
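The two down states can be expressed as two small predicates. This is a sketch under stated assumptions: the function names are hypothetical, `mstime_t` is taken to be a millisecond timestamp as in the structs above, and the objective check assumes the rule that the Sentinel's own opinion plus the agreeing Sentinels must reach the configured quorum.

```c
typedef long long mstime_t;  /* milliseconds */

/* Subjectively down: no valid reply for longer than down_after_period. */
int is_subjectively_down(mstime_t now, mstime_t last_valid_reply,
                         mstime_t down_after_period) {
    return (now - last_valid_reply) > down_after_period;
}

/* Objectively down: this Sentinel already sees the master as subjectively
 * down, and together with the other Sentinels that agree the count
 * reaches quorum. */
int is_objectively_down(int self_sdown, unsigned int agreeing_others,
                        unsigned int quorum) {
    if (!self_sdown) return 0;
    return (1 + agreeing_others) >= quorum;
}
```

With a 30000 ms period, an instance last heard from at 5000 ms is subjectively down at 40000 ms. With quorum 2, one agreeing peer is enough to make the master objectively down.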

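When the master is objectively down, Sentinel performs a failover. A master is usually monitored by several Sentinels, and they should not all run the failover at once, so the Sentinels first elect a leader to do it. Each Sentinel that sees the master as down sends SENTINEL is-master-down-by-addr to the others to ask for their vote. A Sentinel that collects votes from more than half of the Sentinels (and from at least quorum of them) becomes the leader and carries out the failover. The election is modeled on leader election in the Raft algorithm.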
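The vote-counting rule can be sketched as follows, assuming the rule used by Redis Sentinel as I understand it: a candidate needs an absolute majority of all Sentinels known for that master (itself included), and never fewer votes than the configured quorum. The function names are hypothetical.

```c
/* Votes a candidate must collect to become leader. */
unsigned int votes_needed(unsigned int total_sentinels, unsigned int quorum) {
    unsigned int majority = total_sentinels / 2 + 1;
    return majority > quorum ? majority : quorum;
}

/* Returns 1 if `votes` is enough to win the election, 0 otherwise. */
int wins_election(unsigned int votes, unsigned int total_sentinels,
                  unsigned int quorum) {
    return votes >= votes_needed(total_sentinels, quorum);
}
```

Note that the majority is computed over all known Sentinels, not only the ones currently reachable. With three Sentinels and quorum 2, two votes are enough; with only two Sentinels deployed, a single surviving Sentinel can never win.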

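The failover itself has two steps. First, a new master is chosen from the healthy replicas: candidates are ranked by priority, and among equals the replica with the largest replication offset wins, with the run ID as the final tie-breaker. The chosen replica is promoted with SLAVEOF NO ONE. Some data may be lost here because replication is asynchronous, and this cannot be entirely avoided (see "4.1 Unavoidable Topics in Cluster Design"). Second, the remaining replicas are sent SLAVEOF so that they replicate from the new master.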
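The choice of the new master can be sketched as a sort. The ordering below is an assumption based on my reading of Redis Sentinel: lower priority value first (a priority of 0 means the replica is never promoted), then larger replication offset, then lexicographically smaller run ID. The `candidate` struct and both functions are hypothetical, and the health filtering that happens before ranking is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced view of a replica that passed the health checks. */
typedef struct {
    const char *runid;
    int priority;                    /* lower is preferred; 0 = never promote */
    unsigned long long repl_offset;  /* larger means more data replicated */
} candidate;

/* qsort comparator: best candidate sorts first. */
static int compare_candidates(const void *a, const void *b) {
    const candidate *x = a, *y = b;
    if (x->priority != y->priority)
        return x->priority < y->priority ? -1 : 1;
    if (x->repl_offset != y->repl_offset)
        return x->repl_offset > y->repl_offset ? -1 : 1;
    return strcmp(x->runid, y->runid);
}

/* Sorts the array in place and returns the best promotable replica,
 * or NULL if there is none. */
const candidate *pick_new_master(candidate *c, size_t n) {
    qsort(c, n, sizeof(*c), compare_candidates);
    for (size_t i = 0; i < n; i++) {
        if (c[i].priority != 0) return &c[i];
    }
    return NULL;
}
```

Given replicas with equal priority, the one with the largest offset is picked, which matches the goal of losing as little data as possible.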

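Food for thought:

A typical deployment runs three Sentinels. Suppose one Sentinel goes down at the same moment as the master. Electing a leader requires votes from a majority of all known Sentinels, which is 2 out of 3, so the two survivors can still complete the failover as long as quorum is at most 2. The real trouble starts when a second Sentinel is lost, or when only two Sentinels were deployed in the first place: a single survivor can never reach a majority, no failover happens, and the Redis service stays unavailable. This is why at least three Sentinels, placed on independent machines, are recommended.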
