Redis high-availability options

Common approaches:

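If the application supports sentinel, use sentinel directly (recommended).
If the application does not support sentinel and can only connect to a single node, consider keepalived + sentinel.
Risk: failover takes less than 5 s, and during the switch the redis virtual IP (VIP) is read-only.
Monitor the keepalived process and make sure it stays alive: if keepalived dies on the redis master, the VIP moves to a replica and becomes read-only.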
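If the application does not support sentinel and can only connect to a single node, consider haproxy + sentinel (plus keepalived for VIP failover).
Risk: in testing, failover took about 15 s. During a master/replica switch, haproxy's health check can consider both the old and the new master healthy and distribute traffic across both, which leaves the redis VIP read-only for the duration of the switch.
Note: haproxy 1.5 or later is required (CentOS 7 ships it).
This approach is recommended: run haproxy on all three nodes, with haproxy listening on port 6379, redis on port 6378, and sentinel on port 26379. Connecting to port 6379 on any node then allows both reads and writes.
haproxy uses the balance leastconn (least connections) algorithm so that traffic moves to the new master as quickly as possible.
A script listens for sentinel events and triggers a custom script after a failover. That script calls Consul or etcd, and confd then generates the haproxy config file.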
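Use redis cluster.
Risk: requires cluster support in the client driver; suited to larger workloads.
At least 9 servers per cluster are recommended: 3 masters with 2 replicas each (3 shards).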
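For the haproxy + sentinel option above, the script that reacts to sentinel failover events might look like the following. This is a minimal sketch, assuming the script is registered through sentinel's client-reconfig-script directive, which calls it with `<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>`. The key layout `redis/<name>/master` is a made-up convention, and the publish step is left as a comment because the source does not show how Consul or etcd is called.

```shell
#!/bin/bash
# Sketch of a handler for sentinel's client-reconfig-script hook.
# Sentinel invokes it as:
#   <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>

# Print "key value" for the new master. The key layout is hypothetical.
new_master_kv() {
    local name=$1 to_ip=$6 to_port=$7
    printf 'redis/%s/master %s:%s\n' "$name" "$to_ip" "$to_port"
}

# Only act when called with the seven arguments sentinel supplies.
if [ "$#" -eq 7 ]; then
    kv=$(new_master_kv "$@")
    echo "$kv"
    # Hypothetical publish step: write $kv to Consul or etcd here so that
    # confd can regenerate haproxy.cfg and reload haproxy.
fi
```

Keeping the argument parsing in its own function makes it easy to test the handler by hand before wiring it into sentinel.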

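Version upgrade steps:
Upgrading from 3.x to 4.x, with a switchover time under 1 minute:

old-version redis master --> new-version redis (replica set to writable) --> two new-version replicas
Switch DNS
Restart the application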
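The chained-replication upgrade above can be sketched as a list of redis-cli commands. This is a dry-run helper that only prints the commands; the host names, port 6379, and the final `slaveof no one` step (detaching the new instance once DNS points at it) are assumptions, not taken from the source. Redis 4.x names the writable-replica setting slave-read-only.

```shell
#!/bin/bash
# Print (do not execute) the commands for a chained-replication upgrade.
# Usage: upgrade_plan OLD_MASTER NEW_NODE [NEW_REPLICA...]
upgrade_plan() {
    local old=$1 new=$2
    shift 2
    # The new-version node replicates from the old master and accepts writes.
    echo "redis-cli -h $new slaveof $old 6379"
    echo "redis-cli -h $new config set slave-read-only no"
    # The remaining new-version nodes replicate from the new node.
    local r
    for r in "$@"; do
        echo "redis-cli -h $r slaveof $new 6379"
    done
    # Assumed final step, after DNS has been switched to the new node.
    echo "redis-cli -h $new slaveof no one"
}
```

For example, `upgrade_plan old-master new-1 new-2 new-3` prints five commands to review before running them by hand.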

keepalived + sentinel

/etc/keepalived/keepalived.conf

global_defs {
  router_id LVS_DEVEL
  script_user root
  enable_script_security
}
vrrp_script chk_process {
    script "/bin/bash /etc/keepalived/chk_redis_master.sh"
    interval 1
    weight 20
    timeout 2
    fall 2
    rise 1
}

vrrp_instance VI_1 {
 state BACKUP
 interface eth0
 virtual_router_id 200
 priority 101                      # 101 on master, 99,97 on backup
 #nopreempt                     # keep the default preemptive mode so the highest-priority node immediately takes over as MASTER
 advert_int 1
 authentication {
     auth_type PASS
     auth_pass password
 }
  track_script {
    chk_process
  }
  virtual_ipaddress {
    192.168.10.140/24 dev eth0 label eth0:1     # VIP
  }
  unicast_src_ip 192.168.10.144        # My IP
  unicast_peer {                                  # Peer IP
    192.168.10.145                   
    192.168.10.146
  }

}
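How the vrrp_script weight interacts with priority can be checked with a little arithmetic. In keepalived, a positive weight is added to the node's priority while the script succeeds, and a negative weight is subtracted while it fails. The helper below is only an illustration of that rule; the function name is made up.

```shell
#!/bin/bash
# effective_priority BASE WEIGHT EXIT_CODE
# Positive weight: added while the check script exits 0.
# Negative weight: its magnitude is subtracted while the script fails.
effective_priority() {
    local base=$1 weight=$2 rc=$3
    if [ "$weight" -gt 0 ] && [ "$rc" -eq 0 ]; then
        echo $((base + weight))
    elif [ "$weight" -lt 0 ] && [ "$rc" -ne 0 ]; then
        echo $((base + weight))
    else
        echo "$base"
    fi
}
```

With the values in this config, `effective_priority 97 20 0` gives 117 and `effective_priority 101 20 1` gives 101, so even the lowest-priority node wins the VIP when it is the redis master.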

/etc/keepalived/chk_redis_master.sh

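#!/bin/bash
# Exit 0 only when the local redis instance is the master; any other exit code fails the keepalived check.
#set -x
REDIS_PASS=""    # redis password (avoid the name PWD, which bash uses for the working directory)
ROLE=""
PATH=$PATH:/usr/local/bin:/app/redis/bin

# Read the password (if any) from redis.conf and strip surrounding quotes.
[ -r /etc/redis.conf ] && REDIS_PASS=$(awk '/^requirepass/ {print $2}' /etc/redis.conf | tr -d "\"'")

# Pass -a only when a password is configured; match the role line exactly.
if [ -n "$REDIS_PASS" ]; then
    ROLE=$(redis-cli -a "$REDIS_PASS" info replication | grep '^role:' | tr -d '\r')
else
    ROLE=$(redis-cli info replication | grep '^role:' | tr -d '\r')
fi

echo "$ROLE"

redis_master() {
    exit 0
}

redis_slave() {
    exit 1
}

redis_fail() {
    echo "FATAL: FAIL TO GET REDIS INFO"
    exit 2
}

case $ROLE in
  "role:master")
      redis_master
      ;;
  "role:slave")
      redis_slave
      ;;
  *)
      # empty or unexpected output
      redis_fail
      ;;
esac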

haproxy + sentinel

# /etc/haproxy/haproxy.cfg

defaults
    log 127.0.0.1 local0
    #log 127.0.0.1 local0 debug
    mode tcp
    retries 1
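    option tcplog                   # TCP log format (httplog does not apply in tcp mode)
    option dontlognull              # do not log connections that transfer no data, such as port probes
    option redispatch               # if the chosen server is down, redispatch the connection to another healthy server
    option abortonclose             # drop requests still waiting in the queue once the client has closed its side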
    maxconn 2000000
    timeout connect 3s
    timeout client  6s
    timeout server  6s
listen admin_stats
    bind 0.0.0.0:1080
    mode http
    option httplog
    maxconn 1000
    stats refresh 30s
    stats uri /haproxy.status    # curl '127.0.0.1:1080/haproxy.status;csv'
    stats realm Haproxy

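# This sample binds 6380 and forwards to redis on 6379. For the port layout described earlier (haproxy on 6379, redis on 6378), change the bind and server ports to match.
listen redis
    bind 0.0.0.0:6380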
    mode tcp
    balance leastconn
    retries 1
    option tcplog
    option tcp-check
    #uncomment these lines if redis has requirepass set
    #tcp-check send AUTH\ password\r\n
    #tcp-check expect string +OK
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis-1 192.168.10.244:6379 maxconn 10240 check inter 1s
    server redis-2 192.168.10.245:6379 maxconn 10240 check inter 1s
    server redis-3 192.168.10.246:6379 maxconn 10240 check inter 1s
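Because the tcp-check only passes on the node reporting role:master, exactly one server should be UP outside of a failover. One way to watch this is to read the stats CSV from the URL noted in the config comment (`127.0.0.1:1080/haproxy.status;csv`). The sketch below assumes the standard haproxy CSV layout, where column 1 is the proxy name, column 2 the server name, and column 18 the status; the function name is made up.

```shell
#!/bin/bash
# Read haproxy stats CSV on stdin and print the servers of proxy "redis"
# whose status column (field 18) starts with UP.
up_redis_servers() {
    awk -F, '$1 == "redis" && $2 != "FRONTEND" && $2 != "BACKEND" && $18 ~ /^UP/ { print $2 }'
}
```

Usage: `curl -s '127.0.0.1:1080/haproxy.status;csv' | up_redis_servers`. Two names in the output during a switch would match the risk described earlier, where both the old and the new master pass the health check.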

# /etc/keepalived/keepalived.conf
global_defs {
  router_id LVS_DEVEL
  script_user root
  enable_script_security
}
vrrp_script chk_process {
    script "killall -0 haproxy"
    interval 1
    weight -20
    timeout 2
    fall 2
    rise 1
}
vrrp_script chk_tcp_port {
    script "/usr/bin/timeout 1 /bin/bash -c '</dev/tcp/127.0.0.1/6379'"
    interval 1
    weight -20
    timeout 2
    fall 2
    rise 1
}
vrrp_instance VI_1 {
  state BACKUP
  virtual_router_id 200
  advert_int 1
  authentication {
     auth_type PASS
     auth_pass password
  }
  track_script {
    chk_tcp_port
  }
  interface eth0
  virtual_ipaddress {           # VIP
    10.10.6.30/24 dev eth0 label eth0:1     
  }
  nopreempt                      # non-preemptive mode: it does not matter which node holds MASTER
  priority 101                      # 101 on master, 99,97 on backup
  unicast_src_ip 10.10.6.31       # My IP
  unicast_peer {                                  # Peer IP
    10.10.6.32
    10.10.6.33
  }
}

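Changing a redis instance's port in sentinel mode

Edit the config file of one replica, change the port from 6379 to 6378, and restart redis.
Update the sentinel config on every sentinel in the cluster and remove the old IP + port.
Run sentinel reset <SentinelName> on every sentinel node, where <SentinelName> is the master name that sentinel monitors: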

redis-cli -p 26379 sentinel reset <SentinelName>

After both replicas have been changed, run sentinel failover <SentinelName> to switch roles. The former master becomes a replica and can then be changed the same way.
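Running the reset on every sentinel can be wrapped in a small loop. This is a sketch only: the function name, the DRY_RUN switch, and the host list are assumptions, while the redis-cli command is the one shown above.

```shell
#!/bin/bash
# Run "sentinel reset <name>" on each sentinel host.
# With DRY_RUN=1 the commands are printed instead of executed.
# Usage: reset_all_sentinels MASTER_NAME HOST [HOST...]
reset_all_sentinels() {
    local name=$1
    shift
    local host
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "redis-cli -h $host -p 26379 sentinel reset $name"
        else
            redis-cli -h "$host" -p 26379 sentinel reset "$name"
        fi
    done
}
```

For example, `DRY_RUN=1 reset_all_sentinels <SentinelName> 10.10.6.31 10.10.6.32 10.10.6.33` lists the three commands without touching any sentinel.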

Converting an existing master/replica setup to sentinel

Set up replication first, then run the following on all three sentinels:

redis-cli -p 26379 sentinel monitor <SentinelName> 1.2.3.4 6378 2
redis-cli -p 26379 sentinel set <SentinelName> auth-pass <redis-password>
redis-cli -p 26379 info Sentinel
redis-cli -p 26379 sentinel slaves <SentinelName>

Once the configuration is in place, run a failover several times and check that the switch works.
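A repeated failover test can be scripted along these lines. The sketch uses the sentinel commands get-master-addr-by-name and failover; the function names and the 15-second wait are assumptions, so adjust the wait to the switch time actually observed.

```shell
#!/bin/bash
# Join the two lines printed by "sentinel get-master-addr-by-name" into ip:port.
join_addr() {
    paste -sd: -
}

# Trigger one failover and report whether the master address changed.
# Returns non-zero when the address is unchanged after the wait.
failover_once() {
    local name=$1 before after
    before=$(redis-cli -p 26379 sentinel get-master-addr-by-name "$name" | join_addr)
    redis-cli -p 26379 sentinel failover "$name" >/dev/null
    sleep 15    # assumed wait; not taken from the source
    after=$(redis-cli -p 26379 sentinel get-master-addr-by-name "$name" | join_addr)
    echo "$before -> $after"
    [ "$before" != "$after" ]
}
```

Calling `failover_once <SentinelName>` a few times in a row should print a different master address each time.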

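Quickly removing a node from a sentinel replica set

Stop the redis instance that is being taken offline.
Run sentinel reset <SentinelName> on every sentinel node.
In the config file of the stopped redis instance, remove the "slaveof " settings, so that the instance does not rejoin sentinel if the service is started again.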
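Removing the replication directive from the config file can be done with a one-line filter. This is a sketch: the function name is made up, and it also drops replicaof, the name that newer redis versions use for the same directive.

```shell
#!/bin/bash
# Copy a redis config from stdin to stdout without slaveof/replicaof lines.
strip_replication_directive() {
    sed -E '/^[[:space:]]*(slaveof|replicaof)[[:space:]]/d'
}
```

Usage: `strip_replication_directive < /etc/redis.conf > /tmp/redis.conf.new`, then review the new file before replacing the original.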

https://blog.51cto.com/tianshili/1759289
