最近项目中部分服务迁移到go,为了方便监控服务,采用了etcd。启动go服务的时候,去etcd获取一个租约,然后每隔一段时间,就去续约。go的etcd包是通过返回一个chan,通过chan可以获取到续约的情况。在项目中是通过select监听chan,如果失败则打印一条日志。因为需要监听的不只续约,还有键的变化,所以失败的时候并不会退出。
for {
select {
case aliveResp := <-aliveChan:
if aliveResp == nil {
log.Println("keep alive end")
}
case watchResp := <-watchChan:
do somthing()
}
在某个时刻,突然这个方法在逛打日志,由于go的chan关闭了,会一直返回nil,也就是这里陷入了死循环。而chan关闭的原因无非就是etcd服务挂了,或者处于那个时刻刚好处于选举,没有leader,响应不了。
查看了etcd服务,发现服务是正常运行的,那么原因就是第二个了。这个时候需要去查看etcd的日志确认一下。在线上etcd服务总共有三个节点,分别是102,105,104。下面我描述一下当时的场景。在时刻0的时候,102是leader,其他两个节点是follower,处于任期1。在时刻1的时候,105节点突然断开了网络连接,由于收不到leader的心跳,节点开始新的选举。节点总共选举了三次,因为连接断开了,所以收不到其他节点的投票。在时刻4的时候,102节点收到了三个投票请求,由于节点还处于任期内,所以拒绝了三个投票。在时刻5,104节点也拒绝了三个投票,此时,102节点因为收到了更高的任期4,所以变为follower。所以当前时刻,整个etcd服务并没有leader,也就是会拒绝所有请求,等到新的leader选举完成。而这个时刻,刚好go的服务去续约,因为etcd无法服务导致chan关闭,所以出现了开头的问题。通过go服务日志的时间与etcd服务日志的时间,可以确认这个猜想。
知道了问题,解决方案也很简单,在chan关闭后,通过定时去重新租约,获取新的chan。代码如下。
for {
select {
case aliveResp := <-aliveChan:
if aliveResp == nil {
log.Println("keep alive end")
var err error
_playRegister.leaseResp, _playRegister.aliveChan, err = serviceRegister(_playRegister.client, 5, _playRegister.statusKey)
if err != nil {
time.Sleep(1 * time.Second)
} else {
aliveChan = _playRegister.aliveChan
}
}
case watchResp := <-watchChan:
do somthing()
}
任期内无法投票的原因,可以参见下面源码。
case m.Term > r.Term:
// 消息的Term大于节点当前的Term
lead := m.From
if m.Type == pb.MsgVote || m.Type == pb.MsgPreVote {
// 如果收到的是投票类消息
// 当context为campaignTransfer时表示强制要求进行竞选
force := bytes.Equal(m.Context, []byte(campaignTransfer))
// 是否在租约期以内
inLease := r.checkQuorum && r.lead != None && r.electionElapsed < r.electionTimeout
if !force && inLease {
// 如果非强制,而且又在租约期以内,就不做任何处理
// 非强制又在租约期内可以忽略选举消息,见论文的4.2.3,这是为了阻止已经离开集群的节点再次发起投请求
// If a server receives a RequestVote request within the minimum election timeout
// of hearing from a current leader, it does not update its term or grant its vote
r.logger.Infof("%x [logterm: %d, index: %d, vote: %x] ignored %s from %x [logterm: %d, index: %d] at term %d: lease is not expired (remaining ticks: %d)",
r.id, r.raftLog.lastTerm(), r.raftLog.lastIndex(), r.Vote, m.Type, m.From, m.LogTerm, m.Index, r.Term, r.electionTimeout-r.electionElapsed)
return nil
}
// 否则将lead置为空
lead = None
}
————————————————
版权声明:本文为CSDN博主「跨链技术践行者」的原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/shangsongwww/java/article/details/90287901
详细的日志在这里
[105]3d7c0fd6445ef729
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 is starting a new election at term 8
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 became candidate at term 9
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 received MsgVoteResp from 3d7c0fd6445ef729 at term 9
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to 6404974f67850bd9 at term 9
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to de352282a8b96da5 at term 9
raft2020/03/31 18:40:16 INFO: raft.node: 3d7c0fd6445ef729 lost leader 6404974f67850bd9 at term 9
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 is starting a new election at term 9
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 became candidate at term 10
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 received MsgVoteResp from 3d7c0fd6445ef729 at term 10
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to 6404974f67850bd9 at term 10
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to de352282a8b96da5 at term 10
2020-03-31 18:40:19.499809 W | rafthttp: lost the TCP streaming connection with peer 6404974f67850bd9 (stream MsgApp v2 reader)
2020-03-31 18:40:19.503224 E | rafthttp: failed to read 6404974f67850bd9 on stream MsgApp v2 (read tcp 192.168.10.105:51776->192.168.10.102:2380: i/o timeout)
2020-03-31 18:40:19.503242 I | rafthttp: peer 6404974f67850bd9 became inactive (message send to peer failed)
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 is starting a new election at term 10
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 became candidate at term 11
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 received MsgVoteResp from 3d7c0fd6445ef729 at term 11
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to 6404974f67850bd9 at term 11
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to de352282a8b96da5 at term 11
2020-03-31 18:40:19.658495 W | rafthttp: lost the TCP streaming connection with peer de352282a8b96da5 (stream MsgApp v2 reader)
2020-03-31 18:40:19.658557 E | rafthttp: failed to read de352282a8b96da5 on stream MsgApp v2 (read tcp 192.168.10.105:19098->192.168.10.104:2380: i/o timeout)
2020-03-31 18:40:19.658567 I | rafthttp: peer de352282a8b96da5 became inactive (message send to peer failed)
2020-03-31 18:40:19.658656 W | rafthttp: lost the TCP streaming connection with peer de352282a8b96da5 (stream Message reader)
2020-03-31 18:40:19.909333 I | rafthttp: peer 6404974f67850bd9 became active
2020-03-31 18:40:19.909379 I | rafthttp: established a TCP streaming connection with peer 6404974f67850bd9 (stream MsgApp v2 reader)
2020-03-31 18:40:19.969068 I | rafthttp: peer de352282a8b96da5 became active
2020-03-31 18:40:19.969132 I | rafthttp: established a TCP streaming connection with peer de352282a8b96da5 (stream Message reader)
2020-03-31 18:40:19.969233 I | rafthttp: established a TCP streaming connection with peer de352282a8b96da5 (stream MsgApp v2 reader)
2020-03-31 18:40:20.509461 W | rafthttp: lost the TCP streaming connection with peer 6404974f67850bd9 (stream Message reader)
2020-03-31 18:40:20.509505 E | rafthttp: failed to read 6404974f67850bd9 on stream Message (read tcp 192.168.10.105:51774->192.168.10.102:2380: i/o timeout)
2020-03-31 18:40:20.509514 I | rafthttp: peer 6404974f67850bd9 became inactive (message send to peer failed)
2020-03-31 18:40:20.510346 I | rafthttp: peer 6404974f67850bd9 became active
2020-03-31 18:40:20.510361 I | rafthttp: established a TCP streaming connection with peer 6404974f67850bd9 (stream Message reader)
raft2020/03/31 18:40:21 INFO: 3d7c0fd6445ef729 [term: 11] received a MsgVote message with higher term from 6404974f67850bd9 [term: 12]
raft2020/03/31 18:40:21 INFO: 3d7c0fd6445ef729 became follower at term 12
raft2020/03/31 18:40:21 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977, vote: 0] cast MsgVote for 6404974f67850bd9 [logterm: 8, index: 978] at term 12
raft2020/03/31 18:40:21 INFO: raft.node: 3d7c0fd6445ef729 elected leader 6404974f67850bd9 at term 12
2020-03-31 18:40:21.583620 W | rafthttp: health check for peer de352282a8b96da5 could not connect: dial tcp 192.168.10.104:2380: connect: network is unreachable
[102]6404974f67850bd9
2020-03-31 18:40:19.910296 W | rafthttp: closed an existing TCP streaming connection with peer 3d7c0fd6445ef729 (stream MsgApp v2 writer)
2020-03-31 18:40:19.910324 I | rafthttp: established a TCP streaming connection with peer 3d7c0fd6445ef729 (stream MsgApp v2 writer)
raft2020/03/31 18:40:19 INFO: 6404974f67850bd9 [logterm: 8, index: 977, vote: 6404974f67850bd9] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 4)
raft2020/03/31 18:40:19 INFO: 6404974f67850bd9 [logterm: 8, index: 977, vote: 6404974f67850bd9] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 4)
raft2020/03/31 18:40:19 INFO: 6404974f67850bd9 [logterm: 8, index: 977, vote: 6404974f67850bd9] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 4)
raft2020/03/31 18:40:20 INFO: 6404974f67850bd9 [term: 8] received a MsgAppResp message with higher term from 3d7c0fd6445ef729 [term: 11]
raft2020/03/31 18:40:20 INFO: 6404974f67850bd9 became follower at term 11
raft2020/03/31 18:40:20 INFO: raft.node: 6404974f67850bd9 lost leader 6404974f67850bd9 at term 11
raft2020/03/31 18:40:20 INFO: 6404974f67850bd9 [term: 11] ignored a MsgAppResp message with lower term from de352282a8b96da5 [term: 8]
2020-03-31 18:40:20.511031 W | rafthttp: closed an existing TCP streaming connection with peer 3d7c0fd6445ef729 (stream Message writer)
2020-03-31 18:40:20.511054 I | rafthttp: established a TCP streaming connection with peer 3d7c0fd6445ef729 (stream Message writer)
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 is starting a new election at term 11
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 became candidate at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 received MsgVoteResp from 6404974f67850bd9 at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 [logterm: 8, index: 978] sent MsgVote request to 3d7c0fd6445ef729 at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 [logterm: 8, index: 978] sent MsgVote request to de352282a8b96da5 at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 received MsgVoteResp from 3d7c0fd6445ef729 at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 has received 2 MsgVoteResp votes and 0 vote rejections
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 became leader at term 12
raft2020/03/31 18:40:21 INFO: raft.node: 6404974f67850bd9 elected leader 6404974f67850bd9 at term 12
[104]de352282a8b96da5
2020-03-31 18:40:19.978326 W | rafthttp: closed an existing TCP streaming connection with peer 3d7c0fd6445ef729 (stream Message writer)
2020-03-31 18:40:19.983180 I | rafthttp: established a TCP streaming connection with peer 3d7c0fd6445ef729 (stream Message writer)
2020-03-31 18:40:19.983202 W | rafthttp: closed an existing TCP streaming connection with peer 3d7c0fd6445ef729 (stream MsgApp v2 writer)
2020-03-31 18:40:19.983213 I | rafthttp: established a TCP streaming connection with peer 3d7c0fd6445ef729 (stream MsgApp v2 writer)
raft2020/03/31 18:40:20 INFO: de352282a8b96da5 [logterm: 8, index: 978, vote: 0] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 6)
raft2020/03/31 18:40:20 INFO: de352282a8b96da5 [logterm: 8, index: 978, vote: 0] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 6)
raft2020/03/31 18:40:20 INFO: de352282a8b96da5 [logterm: 8, index: 978, vote: 0] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 6)
raft2020/03/31 18:40:21 INFO: de352282a8b96da5 [term: 8] received a MsgVote message with higher term from 6404974f67850bd9 [term: 12]
raft2020/03/31 18:40:21 INFO: de352282a8b96da5 became follower at term 12
raft2020/03/31 18:40:21 INFO: de352282a8b96da5 [logterm: 8, index: 978, vote: 0] cast MsgVote for 6404974f67850bd9 [logterm: 8, index: 978] at term 12
raft2020/03/31 18:40:21 INFO: raft.node: de352282a8b96da5 lost leader 6404974f67850bd9 at term 12
raft2020/03/31 18:40:21 INFO: raft.node: de352282a8b96da5 elected leader 6404974f67850bd9 at term 12