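Preface
I have been reading up on Redis lately. Until now I had only used Redis without really understanding its internal structure and principles. My company is also considering a distributed cache design for active-active or multi-active scenarios, so I need to study this area myself. (Much of this article comes from the Redis 4.x Cookbook; note, however, that the Redis version used here is 3.2.x.)
Master-Slave Replication
I will not go over standalone Redis here; there are plenty of download, installation, and usage tutorials online. A standalone Redis cannot provide data backup, let alone high availability, so we need to deploy Redis with master-slave replication. (I started two VMs on my local machine to simulate the master and the slave, or even two data centers; their IPs are 192.168.93.101 and 192.168.93.102.)
First, the configuration. For the master (192.168.93.101), make the following changes to redis.conf:
- Remove the line bind 127.0.0.1 so that connections from external IPs are accepted
- Change protected-mode from yes to no to turn off protected mode (for testing only)
Then on the slave (192.168.93.102), make the following changes to redis.conf:
- Apply the same changes as on the master
- Add the line slaveof 192.168.93.101 6379 in the replication section
The second change declares that this Redis instance is a slave of the master at 192.168.93.101:6379.
Now let's start the master and the slave and look at their logs.
First, the master: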
27619:M 26 Dec 11:15:29.485 * DB loaded from disk: 0.000 seconds
27619:M 26 Dec 11:15:29.486 * The server is now ready to accept connections on port 6379
27619:M 26 Dec 11:15:38.115 * Slave 192.168.93.102:6379 asks for synchronization
27619:M 26 Dec 11:15:38.115 * Full resync requested by slave 192.168.93.102:6379
27619:M 26 Dec 11:15:38.115 * Starting BGSAVE for SYNC with target: disk
27619:M 26 Dec 11:15:38.116 * Background saving started by pid 27625
27625:C 26 Dec 11:15:38.117 * DB saved on disk
27625:C 26 Dec 11:15:38.117 * RDB: 6 MB of memory used by copy-on-write
27619:M 26 Dec 11:15:38.205 * Background saving terminated with success
27619:M 26 Dec 11:15:38.205 * Synchronization with slave 192.168.93.102:6379 succeeded
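From the master's log we can see that it first loads data from disk (we have no persisted backup, so ignore this), then listens on port 6379, then receives the slave's synchronization request and starts a full resync. For a full resync the master forks a child process that dumps the current dataset into an RDB snapshot, which is the line Starting BGSAVE for SYNC with target: disk in the log. The child process writes the RDB file to disk, and once the file is complete the master sends it to the slave to finish the synchronization.
Now the slave: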
97094:S 26 Dec 11:15:33.368 * The server is now ready to accept connections on port 6379
97094:S 26 Dec 11:15:33.368 * Connecting to MASTER 192.168.93.101:6379
97094:S 26 Dec 11:15:33.368 * MASTER <-> SLAVE sync started
97094:S 26 Dec 11:15:33.368 * Non blocking connect for SYNC fired the event.
97094:S 26 Dec 11:15:33.369 * Master replied to PING, replication can continue...
97094:S 26 Dec 11:15:33.369 * Partial resynchronization not possible (no cached master)
97094:S 26 Dec 11:15:33.371 * Full resync from master: 4212c5a9b3635107c01e819cb6bf7a01a199e28d:1
97094:S 26 Dec 11:15:33.460 * MASTER <-> SLAVE sync: receiving 77 bytes from master
97094:S 26 Dec 11:15:33.460 * MASTER <-> SLAVE sync: Flushing old data
97094:S 26 Dec 11:15:33.460 * MASTER <-> SLAVE sync: Loading DB in memory
97094:S 26 Dec 11:15:33.460 * MASTER <-> SLAVE sync: Finished with success
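From the slave's log we can see that after starting it tries to connect to the master, and once it gets a reply it starts a full resync and eventually completes it. Next, let's use redis-cli to inspect the master and the slave: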
master:
127.0.0.1:6379> INFO REPLICATION
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.93.102,port=6379,state=online,offset=435,lag=0
master_repl_offset:435
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:434
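Here we see the master's role and the currently connected slaves. One thing to note is the offset: it grows as write events occur on the master, and a slave can compare it with its own offset to see how far behind the master it is. In the output above, the master's and the slave's offsets are both 435, which means the two sides are in sync.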
slave:
127.0.0.1:6379> INFO REPLICATION
# Replication
role:slave
master_host:192.168.93.101
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:435
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
The slave's output is similar to the master's: besides the master's address, it also shows the slave's own replication stream offset.
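To make the offset comparison concrete, here is a minimal Python sketch that parses the text of INFO REPLICATION and reports how far each slave is behind the master. The helper names parse_info and replica_offsets are made up for this illustration, and the input is an excerpt of the master output shown above.

```python
def parse_info(text):
    """Parse the `key:value` lines of INFO output into a dict, skipping comments."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def replica_offsets(master_info):
    """Return {"ip:port": offset} for every slaveN entry in the master's INFO."""
    offsets = {}
    for key, value in master_info.items():
        # Only keys such as slave0, slave1, ... describe connected slaves.
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            offsets[f'{fields["ip"]}:{fields["port"]}'] = int(fields["offset"])
    return offsets


# Excerpt of the master's INFO REPLICATION output from this article.
MASTER_INFO = """\
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.93.102,port=6379,state=online,offset=435,lag=0
master_repl_offset:435
"""

master = parse_info(MASTER_INFO)
for replica, offset in replica_offsets(master).items():
    behind = int(master["master_repl_offset"]) - offset
    print(replica, "is", behind, "bytes behind the master")
```

With the values above the difference is 0, which matches the observation that both sides are at offset 435.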
Let's try a few operations against the master and the slave:
# master
127.0.0.1:6379> set "key1" "value"
OK
# slave
127.0.0.1:6379> get "key1"
"value"
127.0.0.1:6379> set "key1" "value2"
(error) READONLY You can't write against a read only slave.
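As you can see, we can write on the master and read on the slave, while the slave rejects writes because it is read-only (slave_read_only:1).
Now I stop the slave, write to the master in the meantime, and then start the slave again: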
# slave
127.0.0.1:6379> shutdown
not connected> INFO REPLICATION
# master
127.0.0.1:6379> INFO REPLICATION
# Replication
role:master
connected_slaves:0
master_repl_offset:772
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:771
127.0.0.1:6379> set "key2" "value2"
OK
127.0.0.1:6379> set "key3" "value3"
OK
# slave after start up
not connected> INFO REPLICATION
# Replication
role:slave
master_host:192.168.93.101
master_port:6379
master_link_status:up
master_last_io_seconds_ago:8
master_sync_in_progress:0
slave_repl_offset:884
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
127.0.0.1:6379> get "key2"
"value2"
127.0.0.1:6379> get "key3"
"value3"
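In this session we first shut down the slave, after which the replication info on the master no longer shows it. Then we write two keys, start the slave again, check its replication info, and read both keys without any problem. This shows two things:
- A slave going down at runtime does not affect writes on the master
- After restarting, the slave reconnects to the master and replicates again
From the slave's log we can see that the restarted slave performed a full resync: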
98662:S 26 Dec 15:03:40.071 * Connecting to MASTER 192.168.93.101:6379
98662:S 26 Dec 15:03:40.071 * MASTER <-> SLAVE sync started
98662:S 26 Dec 15:03:40.072 * Non blocking connect for SYNC fired the event.
98662:S 26 Dec 15:03:40.072 * Master replied to PING, replication can continue...
98662:S 26 Dec 15:03:40.072 * Partial resynchronization not possible (no cached master)
98662:S 26 Dec 15:03:40.075 * Full resync from master: a39261f7f60964642a4b9f5b20de725263f8b59b:884
98662:S 26 Dec 15:03:40.100 * MASTER <-> SLAVE sync: receiving 120 bytes from master
98662:S 26 Dec 15:03:40.100 * MASTER <-> SLAVE sync: Flushing old data
98662:S 26 Dec 15:03:40.100 * MASTER <-> SLAVE sync: Loading DB in memory
98662:S 26 Dec 15:03:40.100 * MASTER <-> SLAVE sync: Finished with success
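The reason is that the restarted slave has no cached master state, so a partial resync is not possible. Since Redis 4.x, a slave also writes its master's ID and replication offset (a39261f7f60964642a4b9f5b20de725263f8b59b:884 in the log) into its local RDB file, so from Redis 4.x on a restarted slave can perform a partial resync as well.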
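The decision between a partial and a full resync can be sketched roughly as follows. This is a simplified model for illustration, not Redis's actual code: the function name can_partial_resync is made up, and the backlog length of 883 is an assumed value chosen to be consistent with the master offset of 884 seen above. The idea is that the slave must present the ID of the master it previously followed, and the next byte it needs must still be inside the master's replication backlog.

```python
def can_partial_resync(replica_master_id, replica_offset,
                       master_id, backlog_first_byte_offset, backlog_histlen):
    """Simplified model of how a master answers a PSYNC request.

    replica_master_id is None when the slave has no cached master state,
    which is the situation of a restarted Redis 3.2 slave.
    """
    if replica_master_id != master_id:
        return False  # unknown or different history: full resync
    wanted = replica_offset + 1  # first byte the slave still needs
    backlog_end = backlog_first_byte_offset + backlog_histlen
    return backlog_first_byte_offset <= wanted <= backlog_end


MASTER_ID = "a39261f7f60964642a4b9f5b20de725263f8b59b"

# A restarted 3.2 slave has lost its cached master, so it needs a full resync.
print(can_partial_resync(None, 772, MASTER_ID, 2, 883))       # False
# A slave that remembers the master ID and stopped at offset 772 could catch
# up from the backlog, because byte 773 is still buffered.
print(can_partial_resync(MASTER_ID, 772, MASTER_ID, 2, 883))  # True
```

The second case is what Redis 4.x makes possible after a restart, because the ID and offset survive in the RDB file.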
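Now that we understand how master-slave replication works, the next question is: if the master goes down, what can we do to keep the service highly available? This is where Redis Sentinel comes in.
Sentinel
Sentinel is a high-availability feature introduced in Redis 2.8. It builds on the master-slave setup and adds failure detection and automatic master-slave failover.
Continuing with the master-slave setup from the previous section, create a sentinel-master.conf on the master side: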
protected-mode no
port 26379
sentinel monitor mymaster 127.0.0.1 6379 2
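The first line turns off protected mode, the second line makes this Sentinel listen on port 26379, and the third line gives the address of the master that this Sentinel monitors. The trailing 2 is the quorum: at least two Sentinels must agree that the master is unreachable before it is marked as objectively down (odown).
Then start the Sentinel with this command: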
./bin/redis-server ./conf/sentinel-master.conf --sentinel
Let's look at the log output:
15772:X 25 Dec 13:12:47.457 # Sentinel ID is 66bd4c9ac1ee9456997085032d1b4b8da5342471
15772:X 25 Dec 13:12:47.457 # +monitor master mymaster 192.168.93.101 6379 quorum 2
15772:X 25 Dec 13:12:47.457 * +slave slave 192.168.93.102:6379 192.168.93.102 6379 @ mymaster 192.168.93.101 6379
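The log shows that a Sentinel has started, is monitoring the master address, and has discovered our slave.
Next, start two Sentinels on the slave side in the same way. One of the configuration files looks like this: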
protected-mode no
port 26379
sentinel monitor mymaster 192.168.93.101 6379 2
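The first two lines of this file are essentially the same as before (the second Sentinel listens on a different port; I used 26380). What changes is the third line, the monitored master address: the IP must be the master's address, which is easy to understand.
Then we start these two Sentinels. Their log output looks like this: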
88462:X 25 Dec 13:14:00.774 # Sentinel ID is d7365c4e626ccf9ce091b0213515d64d46afb28d
88462:X 25 Dec 13:14:00.774 # +monitor master mymaster 192.168.93.101 6379 quorum 2
88462:X 25 Dec 13:14:00.774 * +slave slave 192.168.93.102:6379 192.168.93.102 6379 @ mymaster 192.168.93.101 6379
88462:X 25 Dec 13:14:01.837 * +sentinel sentinel 9ce2291ac0bfc2fe9973d51b0c610ea1ad5d6690 192.168.93.102 26379 @ mymaster 192.168.93.101 6379
88462:X 25 Dec 13:14:01.839 # +new-epoch 3
88462:X 25 Dec 13:14:02.511 * +sentinel sentinel 66bd4c9ac1ee9456997085032d1b4b8da5342471 192.168.93.101 26379 @ mymaster 192.168.93.101 6379
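In this log, besides monitoring the master and discovering the local slave, the Sentinel also discovers the two other Sentinels. From the IPs we can see that one of them is on the slave side and the other is on the master side.
Looking back at the master-side Sentinel's log, we find two more lines: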
15772:X 25 Dec 13:12:49.028 * +sentinel sentinel 9ce2291ac0bfc2fe9973d51b0c610ea1ad5d6690 192.168.93.102 26379 @ mymaster 192.168.93.101 6379
15772:X 25 Dec 13:14:01.408 * +sentinel sentinel d7365c4e626ccf9ce091b0213515d64d46afb28d 192.168.93.102 26380 @ mymaster 192.168.93.101 6379
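This means the master-side Sentinel has also discovered the two Sentinels on the slave side.
You may wonder why the Sentinels are distributed this way. I will explain later.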
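As for why we need at least three Sentinels: we set the quorum to 2, so with only two Sentinels, as soon as one of them dies, the remaining one cannot confirm on its own that the master is down, and the failover cannot run. In addition, a failover has to be authorized by a majority of all Sentinels, which a single survivor out of two cannot provide. Hence we need at least three.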
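The arithmetic behind this can be written down in a few lines of Python. This is only a sketch of the two thresholds as I understand them (the quorum for marking the master as objectively down, and the majority needed to elect a failover leader); the function names are made up for this illustration.

```python
def can_flag_odown(agreeing_sentinels, quorum):
    """The master is marked objectively down once `quorum` Sentinels agree."""
    return agreeing_sentinels >= quorum


def can_elect_leader(votes, total_sentinels, quorum):
    """A failover leader needs a majority of all Sentinels and at least `quorum` votes."""
    majority = total_sentinels // 2 + 1
    return votes >= max(majority, quorum)


QUORUM = 2
for total in (2, 3):
    alive = total - 1  # one Sentinel has died
    print(total, "sentinels, one down:",
          "odown" if can_flag_odown(alive, QUORUM) else "no odown",
          "/",
          "failover" if can_elect_leader(alive, total, QUORUM) else "no failover")
# 2 sentinels, one down: no odown / no failover
# 3 sentinels, one down: odown / failover
```

With two Sentinels, losing either one blocks the failover; with three, the deployment tolerates the loss of one Sentinel.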
Now let's shut down the master with redis-cli to simulate a master failure. First, the slave's log:
88540:S 25 Dec 13:17:12.685 # Connection with master lost.
88540:S 25 Dec 13:17:12.685 * Caching the disconnected master state.
88540:S 25 Dec 13:17:12.735 * Connecting to MASTER 192.168.93.101:6379
88540:S 25 Dec 13:17:12.735 * MASTER <-> SLAVE sync started
88540:S 25 Dec 13:17:12.735 # Error condition on socket for SYNC: Connection refused
88540:S 25 Dec 13:17:13.737 * Connecting to MASTER 192.168.93.101:6379
88540:S 25 Dec 13:17:13.737 * MASTER <-> SLAVE sync started
88540:S 25 Dec 13:17:13.738 # Error condition on socket for SYNC: Connection refused
...
88540:S 25 Dec 13:17:42.798 # Error condition on socket for SYNC: Connection refused
88540:M 25 Dec 13:17:43.130 * Discarding previously cached master state.
88540:M 25 Dec 13:17:43.130 * MASTER MODE enabled (user request from 'id=5 addr=192.168.93.102:35965 fd=9 name=sentinel-d7365c4e-cmd age=92 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qb
88540:M 25 Dec 13:17:43.132 # CONFIG REWRITE executed with success.
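Notice that for quite a while the slave keeps logging that it cannot connect to the master. After a certain interval (about 30 seconds here, which matches Sentinel's default down-after-milliseconds), the slave first discards the cached master state, is then promoted to master, and finally rewrites and saves its config file. If we look at the new master's redis.conf, the slaveof line is gone, which shows that the instance modified its own conf file at runtime.
Now the Sentinel's log: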
88462:X 25 Dec 13:17:42.809 # +sdown master mymaster 192.168.93.101 6379
88462:X 25 Dec 13:17:42.900 # +odown master mymaster 192.168.93.101 6379 #quorum 3/2
88462:X 25 Dec 13:17:42.900 # +new-epoch 4
88462:X 25 Dec 13:17:42.900 # +try-failover master mymaster 192.168.93.101 6379
88462:X 25 Dec 13:17:42.901 # +vote-for-leader d7365c4e626ccf9ce091b0213515d64d46afb28d 4
88462:X 25 Dec 13:17:42.903 # 9ce2291ac0bfc2fe9973d51b0c610ea1ad5d6690 voted for d7365c4e626ccf9ce091b0213515d64d46afb28d 4
88462:X 25 Dec 13:17:42.904 # 66bd4c9ac1ee9456997085032d1b4b8da5342471 voted for d7365c4e626ccf9ce091b0213515d64d46afb28d 4
88462:X 25 Dec 13:17:42.991 # +elected-leader master mymaster 192.168.93.101 6379
88462:X 25 Dec 13:17:42.991 # +failover-state-select-slave master mymaster 192.168.93.101 6379
88462:X 25 Dec 13:17:43.053 # +selected-slave slave 192.168.93.102:6379 192.168.93.102 6379 @ mymaster 192.168.93.101 6379
88462:X 25 Dec 13:17:43.053 * +failover-state-send-slaveof-noone slave 192.168.93.102:6379 192.168.93.102 6379 @ mymaster 192.168.93.101 6379
88462:X 25 Dec 13:17:43.130 * +failover-state-wait-promotion slave 192.168.93.102:6379 192.168.93.102 6379 @ mymaster 192.168.93.101 6379
88462:X 25 Dec 13:17:43.384 # +promoted-slave slave 192.168.93.102:6379 192.168.93.102 6379 @ mymaster 192.168.93.101 6379
88462:X 25 Dec 13:17:43.384 # +failover-state-reconf-slaves master mymaster 192.168.93.101 6379
88462:X 25 Dec 13:17:43.436 # +failover-end master mymaster 192.168.93.101 6379
88462:X 25 Dec 13:17:43.436 # +switch-master mymaster 192.168.93.101 6379 192.168.93.102 6379
88462:X 25 Dec 13:17:43.436 * +slave slave 192.168.93.101:6379 192.168.93.101 6379 @ mymaster 192.168.93.102 6379
88462:X 25 Dec 13:18:13.448 # +sdown slave 192.168.93.101:6379 192.168.93.101 6379 @ mymaster 192.168.93.102 6379
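The log first marks the old master as down, then elects a leader (here the elected leader is this Sentinel itself, which is why the whole slave-promotion procedure is logged), updates its own config, switches the master over to the former slave, records the old master as a slave, and finally marks that new slave as down. The log is fairly clear and simple. Let's also check whether the Sentinel's conf file has changed: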
sentinel monitor mymaster 192.168.93.102 6379 2
# Generated by CONFIG REWRITE
sentinel known-slave mymaster 192.168.93.101 6379
sentinel known-sentinel mymaster 192.168.93.101 26379 66bd4c9ac1ee9456997085032d1b4b8da5342471
sentinel known-sentinel mymaster 192.168.93.102 26380 d7365c4e626ccf9ce091b0213515d64d46afb28d
sentinel current-epoch 4
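The first change is that the master address monitored by the Sentinel now points to our new master. In addition, the bottom of the config file has a section generated automatically by the config rewrite, which lists the known slaves and the known Sentinels. Note that the failed master is now listed as a known slave.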
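The Sentinel failover log shown earlier follows a regular line format, so the key events can be pulled out with a small script. The following Python sketch assumes the 3.2-style format seen in this article (pid:X, timestamp, a # or * marker, then a +event and its details); the helper names sentinel_events and switched_master are hypothetical.

```python
import re

# pid:X  day month time  marker  +event  details
EVENT_RE = re.compile(
    r"^(?P<pid>\d+):X (?P<time>\d+ \w+ [\d:.]+) [#*] "
    r"(?P<event>\+[\w-]+) (?P<detail>.*)$"
)


def sentinel_events(log_text):
    """Return (time, event, detail) for every +event line in a Sentinel log."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            events.append((match["time"], match["event"], match["detail"]))
    return events


def switched_master(events):
    """Return (name, old_address, new_address) from the +switch-master event, if any."""
    for _, event, detail in events:
        if event == "+switch-master":
            name, old_ip, old_port, new_ip, new_port = detail.split()
            return name, f"{old_ip}:{old_port}", f"{new_ip}:{new_port}"
    return None


# A few lines copied from the failover log in this article.
LOG = """\
88462:X 25 Dec 13:17:42.809 # +sdown master mymaster 192.168.93.101 6379
88462:X 25 Dec 13:17:42.900 # +odown master mymaster 192.168.93.101 6379 #quorum 3/2
88462:X 25 Dec 13:17:42.903 # 9ce2291ac0bfc2fe9973d51b0c610ea1ad5d6690 voted for d7365c4e626ccf9ce091b0213515d64d46afb28d 4
88462:X 25 Dec 13:17:43.436 # +switch-master mymaster 192.168.93.101 6379 192.168.93.102 6379
"""

events = sentinel_events(LOG)
print([event for _, event, _ in events])  # ['+sdown', '+odown', '+switch-master']
print(switched_master(events))
```

The vote line is skipped because it carries no +event tag; the last call returns the master name together with the old and new addresses, which is the same information the rewritten sentinel monitor line reflects.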
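At this point we have simulated a complete scenario of Sentinel providing high availability for a Redis master-slave setup. You may wonder how the Sentinels discover each other. The answer is that each Sentinel periodically publishes its own information on the __sentinel__:hello channel of the master it monitors. When several Sentinels monitor the same master, they learn each other's IP and port this way and can then talk to each other over TCP. We can subscribe to this channel with redis-cli to see it: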
$ ./redis-cli
127.0.0.1:6379> SUBSCRIBE __sentinel__:hello
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "__sentinel__:hello"
3) (integer) 1
1) "message"
2) "__sentinel__:hello"
3) "192.168.93.102,26380,d7365c4e626ccf9ce091b0213515d64d46afb28d,4,mymaster,192.168.93.102,6379,4"
1) "message"
2) "__sentinel__:hello"
3) "192.168.93.101,26379,66bd4c9ac1ee9456997085032d1b4b8da5342471,4,mymaster,192.168.93.102,6379,4"
1) "message"
2) "__sentinel__:hello"
3) "192.168.93.102,26379,9ce2291ac0bfc2fe9973d51b0c610ea1ad5d6690,4,mymaster,192.168.93.102,6379,4"
1) "message"
2) "__sentinel__:hello"
3) "192.168.93.102,26380,d7365c4e626ccf9ce091b0213515d64d46afb28d,4,mymaster,192.168.93.102,6379,4"
1) "message"
2) "__sentinel__:hello"
3) "192.168.93.101,26379,66bd4c9ac1ee9456997085032d1b4b8da5342471,4,mymaster,192.168.93.102,6379,4"
1) "message"
2) "__sentinel__:hello"
3) "192.168.93.102,26379,9ce2291ac0bfc2fe9973d51b0c610ea1ad5d6690,4,mymaster,192.168.93.102,6379,4"
1) "message"
2) "__sentinel__:hello"
3) "192.168.93.102,26380,d7365c4e626ccf9ce091b0213515d64d46afb28d,4,mymaster,192.168.93.102,6379,4"
1) "message"
2) "__sentinel__:hello"
3) "192.168.93.101,26379,66bd4c9ac1ee9456997085032d1b4b8da5342471,4,mymaster,192.168.93.102,6379,4"
1) "message"
2) "__sentinel__:hello"
3) "192.168.93.102,26379,9ce2291ac0bfc2fe9973d51b0c610ea1ad5d6690,4,mymaster,192.168.93.102,6379,4"
1) "message"
2) "__sentinel__:hello"
3) "192.168.93.102,26380,d7365c4e626ccf9ce091b0213515d64d46afb28d,4,mymaster,192.168.93.102,6379,4"
All three Sentinels show up in these messages.
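Each hello payload is a comma-separated record. As far as I can tell, the eight fields are the Sentinel's IP, port, and run ID, its current epoch, and then the monitored master's name, IP, port, and config epoch. Under that assumption, a minimal Python sketch for decoding the messages could look like this (SentinelHello and parse_hello are names made up for this example):

```python
from typing import NamedTuple


class SentinelHello(NamedTuple):
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload):
    """Split one __sentinel__:hello payload into its eight fields."""
    ip, port, runid, epoch, name, m_ip, m_port, m_epoch = payload.split(",")
    return SentinelHello(ip, int(port), runid, int(epoch),
                         name, m_ip, int(m_port), int(m_epoch))


# Payloads copied from the SUBSCRIBE output above.
PAYLOADS = [
    "192.168.93.102,26380,d7365c4e626ccf9ce091b0213515d64d46afb28d,4,mymaster,192.168.93.102,6379,4",
    "192.168.93.101,26379,66bd4c9ac1ee9456997085032d1b4b8da5342471,4,mymaster,192.168.93.102,6379,4",
    "192.168.93.102,26379,9ce2291ac0bfc2fe9973d51b0c610ea1ad5d6690,4,mymaster,192.168.93.102,6379,4",
]

hellos = [parse_hello(p) for p in PAYLOADS]
sentinels = sorted({(h.sentinel_ip, h.sentinel_port) for h in hellos})
print(sentinels)
# [('192.168.93.101', 26379), ('192.168.93.102', 26379), ('192.168.93.102', 26380)]
print({(h.master_ip, h.master_port) for h in hellos})
# {('192.168.93.102', 6379)}
```

All three Sentinels agree that the current master is 192.168.93.102:6379, which is the slave that was promoted during the failover.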
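Considerations for Active-Active Data Centers
In enterprise scenarios we often need to deploy one unified Redis cluster across active-active data centers. For clustering there are proxy-based solutions (codis, twemproxy) as well as the native Redis Cluster. The problem with Redis Cluster is that when one data center goes down, about half of the masters go down with it, and the cluster can no longer work. So I am considering a proxy layer for key slotting and logical sharding, with Redis Sentinel underneath to provide high availability. The question then is how to use Sentinel to complete a failover when one data center goes down. My thinking is as follows:
- For the data center on the master side, there is no hard requirement on the number of Sentinels: even if the other data center goes down, the master is still alive and can still accept writes and replicate to slaves in the local data center (see the master-slave section)
- For the data center on the slave side, if the master-side data center goes down, there must be enough local Sentinels to confirm the master failure and perform the failover, so this side has a hard requirement: it needs at least quorum Sentinels
So my design is close to the one in the Sentinel section: one Sentinel on the master side and two on the slave side. I then simulate a data center outage by shutting down a VM or by setting iptables rules, and observe how both sides behave.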
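To sanity-check a placement before testing it on real machines, the reasoning above can be expressed as a tiny Python helper. This is a sketch under my own assumptions: the function name can_fail_over and the data center labels are made up, and the rule applied is that the surviving Sentinels must reach both the quorum and a majority of all Sentinels.

```python
def can_fail_over(placement, lost_dc, quorum):
    """Can the Sentinels outside `lost_dc` still carry out a failover?

    placement maps a data center name to the number of Sentinels it hosts.
    """
    total = sum(placement.values())
    survivors = total - placement[lost_dc]
    majority = total // 2 + 1
    return survivors >= max(quorum, majority)


# The layout used in this article: one Sentinel next to the master, two next to the slave.
chosen = {"master_dc": 1, "slave_dc": 2}
# The opposite layout, for comparison.
reversed_layout = {"master_dc": 2, "slave_dc": 1}

print(can_fail_over(chosen, "master_dc", quorum=2))           # True
print(can_fail_over(reversed_layout, "master_dc", quorum=2))  # False
print(can_fail_over(chosen, "slave_dc", quorum=2))            # False
```

The last result is not a problem in practice: when the slave-side data center is lost, the master is still running, so no failover is needed.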
First, add the following rule on the master side:
iptables -I INPUT -s 192.168.93.102 -j DROP
After a while, the Sentinel on the slave side logs the following:
99428:X 26 Dec 16:38:01.056 # +sdown master mymaster 192.168.93.101 6379
99428:X 26 Dec 16:38:01.056 # +sdown sentinel 66bd4c9ac1ee9456997085032d1b4b8da5342471 192.168.93.101 26379 @ mymaster 192.168.93.101 6379
99428:X 26 Dec 16:38:01.133 # +odown master mymaster 192.168.93.101 6379 #quorum 2/2
99428:X 26 Dec 16:38:01.133 # +new-epoch 5
99428:X 26 Dec 16:38:01.133 # +try-failover master mymaster 192.168.93.101 6379
99428:X 26 Dec 16:38:01.134 # +vote-for-leader d7365c4e626ccf9ce091b0213515d64d46afb28d 5
99428:X 26 Dec 16:38:01.136 # 9ce2291ac0bfc2fe9973d51b0c610ea1ad5d6690 voted for d7365c4e626ccf9ce091b0213515d64d46afb28d 5
99428:X 26 Dec 16:38:01.205 # +elected-leader master mymaster 192.168.93.101 6379
99428:X 26 Dec 16:38:01.205 # +failover-state-select-slave master mymaster 192.168.93.101 6379
99428:X 26 Dec 16:38:01.306 # +selected-slave slave 192.168.93.102:6379 192.168.93.102 6379 @ mymaster 192.168.93.101 6379
99428:X 26 Dec 16:38:01.306 * +failover-state-send-slaveof-noone slave 192.168.93.102:6379 192.168.93.102 6379 @ mymaster 192.168.93.101 6379
99428:X 26 Dec 16:38:01.406 * +failover-state-wait-promotion slave 192.168.93.102:6379 192.168.93.102 6379 @ mymaster 192.168.93.101 6379
99428:X 26 Dec 16:38:02.151 # +promoted-slave slave 192.168.93.102:6379 192.168.93.102 6379 @ mymaster 192.168.93.101 6379
99428:X 26 Dec 16:38:02.151 # +failover-state-reconf-slaves master mymaster 192.168.93.101 6379
99428:X 26 Dec 16:38:02.204 # +failover-end master mymaster 192.168.93.101 6379
99428:X 26 Dec 16:38:02.204 # +switch-master mymaster 192.168.93.101 6379 192.168.93.102 6379
99428:X 26 Dec 16:38:02.204 * +slave slave 192.168.93.101:6379 192.168.93.101 6379 @ mymaster 192.168.93.102 6379
99428:X 26 Dec 16:38:32.240 # +sdown slave 192.168.93.101:6379 192.168.93.101 6379 @ mymaster 192.168.93.102 6379
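Here the Sentinel first finds that both the master and the master-side Sentinel are down (because the network is unreachable). Then two Sentinels agree that the master is down, which reaches the quorum, so a leader is elected, the failover runs, and the slave is finally promoted. This is much the same as in the Sentinel section. Now the slave's log: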
98662:M 26 Dec 16:38:01.407 # Connection with master lost.
98662:M 26 Dec 16:38:01.407 * Caching the disconnected master state.
98662:M 26 Dec 16:38:01.407 * Discarding previously cached master state.
98662:M 26 Dec 16:38:01.407 * MASTER MODE enabled (user request from 'id=6 addr=192.168.93.102:50662 fd=9 name=sentinel-d7365c4e-cmd age=1137 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
98662:M 26 Dec 16:38:01.411 # CONFIG REWRITE executed with success.
98662:M 26 Dec 16:38:01.448 * 1 changes in 900 seconds. Saving...
98662:M 26 Dec 16:38:01.449 * Background saving started by pid 99556
99556:C 26 Dec 16:38:01.453 * DB saved on disk
99556:C 26 Dec 16:38:01.453 * RDB: 6 MB of memory used by copy-on-write
98662:M 26 Dec 16:38:01.549 * Background saving terminated with success
This part is also similar to the Sentinel section: the slave gets promoted. Now let's check the new master with redis-cli:
127.0.0.1:6379> INFO REPLICATION
# Replication
role:master
connected_slaves:0
master_repl_offset:227866
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
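Its role has indeed been promoted to master.
Now let's look back at the old master. From its point of view, it is the slave-side data center that went down, so it should still be readable and writable, just without a slave. Let's check with redis-cli: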
127.0.0.1:6379> INFO REPLICATION
# Replication
role:master
connected_slaves:0
master_repl_offset:310602
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:310601
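The result confirms my guess.
Conclusion
This concludes my walkthrough and analysis of Redis master-slave replication, Sentinel high availability, and a deployment design for two data centers. I hope you find it useful!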