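1. Redis cluster setup process

There are already plenty of blog posts covering this, so I won't repeat the steps here.

Setting up Redis 3.0 in cluster mode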

https://www.cnblogs.com/wuxl360/p/5920330.html

Three ways to set up Redis (redis-5.0.0)

https://blog.csdn.net/qq_20597727/article/details/83385737

2. Complete redis.conf configuration file for the Redis cluster

bind "<host address>"
protected-mode no
port 6379
maxmemory 10gb
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
supervised no
pidfile "/redis/6379/data/redis_6379.pid"
loglevel notice
logfile "/redis/6379/log/redis_6379.log"
databases 16
always-show-logo yes
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "dump-<host address>-6379.rdb"
dir "/redis/6379/data"
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
replica-priority 100
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
appendonly no
appendfilename "appendonly-<host address>-6379.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 80
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 256
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 2gb 256mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes

cluster-enabled yes
cluster-node-timeout 15000
cluster-config-file "/redis/6379/data/nodes-6379.conf"
requirepass "<cluster password>"
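
A configuration like the one above can be sanity-checked before the nodes are started. The following is a minimal sketch, not part of the original setup: the function names `parse_redis_conf` and `cluster_config_warnings` are hypothetical, and the set of checks is an assumption about what is worth flagging. One check it includes is that `requirepass` is set without `masterauth`; in Redis, a replica needs `masterauth` to authenticate against a password-protected master.

```python
from typing import Dict, List


def parse_redis_conf(text: str) -> Dict[str, List[str]]:
    """Map each directive name to the list of argument strings it appears with.

    Directives such as `save` may appear several times, so values are lists.
    """
    directives: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        name = parts[0].lower()
        args = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(name, []).append(args)
    return directives


def cluster_config_warnings(text: str) -> List[str]:
    """Return human-readable warnings for a cluster node's redis.conf (illustrative checks only)."""
    conf = parse_redis_conf(text)
    warnings: List[str] = []
    if conf.get("cluster-enabled", ["no"])[-1] != "yes":
        warnings.append("cluster-enabled is not 'yes'")
    if "cluster-config-file" not in conf:
        warnings.append("cluster-config-file is not set")
    if "requirepass" in conf and "masterauth" not in conf:
        warnings.append(
            "requirepass is set but masterauth is not; "
            "replicas may fail to authenticate to their master"
        )
    return warnings
```

Calling `cluster_config_warnings(open(path).read())` on each node's file would print nothing for a file that passes these particular checks; it does not prove the configuration is otherwise correct.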

3. Exception when Flink writes results to the Redis cluster

Here is a problem we ran into during a production rollout. The exception looked like this:

[Screenshot of the exception: 企业微信截图_1562926365879.png]

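This error message is very common, and 80-90% of the posts online say it is most likely caused by dirty data.
During testing we found that the job ran normally at first and threw the exception only when a window fired and emitted its output. From the stack trace we located the JedisClusterPipeline class and found that a NullPointerException was thrown while obtaining the jedis object. At that point the trail went cold and we could not find the cause. We then suspected that the Redis connection pool was configured too small, so that acquiring a connection failed.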

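However, after several rounds of tuning the problem was still there. We then asked a Redis expert from the company's architecture department, who connected to the cluster and checked its health (command: redis-cli -h ip -p 9379 -a password cluster info) and found that the link between the masters and their slave replicas was in the disconnect state. The Redis cluster itself was clearly faulty. When we had tested the cluster earlier, we had only started redis_cli.sh by hand and checked whether it could connect, and took that as proof that the cluster was healthy. That check clearly says nothing about the health of the cluster, and it cost us a lot of time.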
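
The output of the `cluster info` command mentioned above can also be checked by a script instead of by eye. The sketch below assumes the text has already been captured (for example from the redis-cli call shown above); the function names `parse_cluster_info` and `cluster_info_problems` are hypothetical, and which fields to check is a judgment call. It looks at `cluster_state`, `cluster_slots_assigned`, `cluster_slots_pfail` and `cluster_slots_fail`, which are fields reported by CLUSTER INFO.

```python
from typing import Dict, List

TOTAL_SLOTS = "16384"  # a Redis Cluster always has 16384 hash slots


def parse_cluster_info(text: str) -> Dict[str, str]:
    """Parse the 'key:value' lines printed by CLUSTER INFO into a dict."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def cluster_info_problems(text: str) -> List[str]:
    """Return a list of problems found in CLUSTER INFO output (empty if none found)."""
    fields = parse_cluster_info(text)
    problems: List[str] = []
    state = fields.get("cluster_state")
    if state != "ok":
        problems.append(f"cluster_state is {state!r}, expected 'ok'")
    assigned = fields.get("cluster_slots_assigned")
    if assigned != TOTAL_SLOTS:
        problems.append(f"cluster_slots_assigned is {assigned!r}, expected {TOTAL_SLOTS}")
    for key in ("cluster_slots_pfail", "cluster_slots_fail"):
        if fields.get(key, "0") != "0":
            problems.append(f"{key} is {fields[key]}, expected 0")
    return problems
```

An empty result only means these fields look normal. It does not by itself show that every replica is in sync with its master.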

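Summary:

You cannot judge whether a Redis cluster is healthy just by storing and reading a key through the command-line client.
A Redis connection problem can also surface as the exception: Could not forward element to next operator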
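
As one way to go beyond a simple store-and-read test, the per-node view from the `CLUSTER NODES` command can be scanned for broken links. This is a sketch under assumptions: the original troubleshooting used `cluster info`, not `CLUSTER NODES`, and the function name `unhealthy_nodes` is hypothetical. It relies on the documented line layout of CLUSTER NODES, where the third field holds the flags and the eighth field holds the link state (`connected` or `disconnected`).

```python
from typing import List


def unhealthy_nodes(cluster_nodes_text: str) -> List[str]:
    """Return one description per node whose link is down or that is flagged as failing."""
    problems: List[str] = []
    for line in cluster_nodes_text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # not a node line
        address = parts[1]
        flags = parts[2].split(",")
        link_state = parts[7]
        failing = "fail" in flags or "fail?" in flags  # "fail?" marks a suspected failure
        if link_state != "connected" or failing:
            problems.append(
                f"{address}: link-state={link_state}, flags={','.join(flags)}"
            )
    return problems
```

Feeding it the text printed by `redis-cli ... cluster nodes` on each node gives a list that should be empty on a healthy cluster, at least as far as these two fields are concerned.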
