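As we all know, cluster nodes lists the nodes in a cluster. When a machine is taken offline or suffers a hardware failure, it has to be replaced. Afterwards, though, cluster nodes still shows the old, invalid IP. Fortunately, Redis provides the cluster forget xx command to remove it.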
Then one day, a few seconds after I ran cluster forget, the invalid IP showed up in the list again. This time the node state was "handshake", and its nodeId kept changing.
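After some digging, the cause turned out to be that every node in the cluster holds information about that node and keeps trying to reconnect to it. The author of Redis has also commented on exactly this situation: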
There are only two ways this can happen:
1. You fail to send CLUSTER FORGET to all the nodes in the cluster. So eventually there are nodes that still has a clue about this other node, and it will inform the other nodes via gossip. Make sure to send CLUSTER FORGET to every single node in the cluster.
2. Or alternatively, there is an instance running in 10.15.107.150 but you said there is not.
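In other words, cluster forget xx has to be run on every node in the Redis cluster (replicas included) before the invalid node is removed from the node list for good. Doing that solved the problem.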
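
The procedure above can be sketched as a small script. This is only a minimal sketch, not an official tool: it parses the text returned by cluster nodes and builds one redis-cli cluster forget command per remaining node, masters and replicas alike. The function names, the sample IPs and the sample node IDs are made up for illustration, and it assumes the usual "id ip:port@cport flags ..." line layout. Entries flagged fail, handshake or noaddr are skipped, on the assumption that they cannot be reached anyway. The Redis documentation also says a forgotten node is put on a ban list for 60 seconds, so the commands should all be sent within that window.

```python
def parse_cluster_nodes(text):
    """Parse the output of `cluster nodes` into a list of dicts."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a complete node line
        addr = fields[1].split("@")[0]  # drop the cluster bus port, if present
        host, _, port = addr.rpartition(":")
        nodes.append({
            "id": fields[0],
            "host": host,
            "port": port,
            "flags": fields[2].split(","),
        })
    return nodes


def forget_commands(text, stale_id):
    """Build one `cluster forget` command per node that should receive it."""
    unreachable = {"fail", "handshake", "noaddr"}  # assumption: skip these entries
    commands = []
    for node in parse_cluster_nodes(text):
        if node["id"] == stale_id:
            continue  # never send the command to the node being forgotten
        if unreachable & set(node["flags"]):
            continue
        commands.append([
            "redis-cli", "-h", node["host"], "-p", node["port"],
            "cluster", "forget", stale_id,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical sample; in practice use the real output of `cluster nodes`.
    sample = (
        "aaa 10.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-8191\n"
        "bbb 10.0.0.2:6379@16379 slave aaa 0 1 1 connected\n"
        "ccc 10.0.0.9:6379@16379 master,fail - 1 1 2 disconnected\n"
    )
    for cmd in forget_commands(sample, "ccc"):
        print(" ".join(cmd))
```

Printing the commands rather than running them makes it easy to review the list first; each one could then be passed to subprocess.run, all within the 60-second window.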