一次etcd踩坑记

最近项目中部分服务迁移到go,为了方便监控服务,采用了etcd。启动go服务的时候,去etcd获取一个租约,然后每隔一段时间,就去续约。go的etcd包是通过返回一个chan,通过chan可以获取到续约的情况。在项目中是通过select监听chan,如果失败则打印一条日志。因为需要监听的不只续约,还有键的变化,所以失败的时候并不会退出。

for {
    select {
    case aliveResp := <-aliveChan:
        if aliveResp == nil {
            log.Println("keep alive end")
        }
    case watchResp := <-watchChan:
        do somthing()
    }

在某个时刻,突然这个方法在逛打日志,由于go的chan关闭了,会一直返回nil,也就是这里陷入了死循环。而chan关闭的原因无非就是etcd服务挂了,或者处于那个时刻刚好处于选举,没有leader,响应不了。

查看了etcd服务,发现服务是正常运行的,那么原因就是第二个了。这个时候需要去查看etcd的日志确认一下。在线上etcd服务总共有三个节点,分别是102,105,104。下面我描述一下当时的场景。在时刻0的时候,102是leader,其他两个节点是follower,处于任期1。在时刻1的时候,105节点突然断开了网络连接,由于收不到leader的心跳,节点开始新的选举。节点总共选举了三次,因为连接断开了,所以收不到其他节点的投票。在时刻4的时候,102节点收到了三个投票请求,由于节点还处于任期内,所以拒绝了三个投票。在时刻5,104节点也拒绝了三个投票,此时,102节点因为收到了更高的任期4,所以变为follower。所以当前时刻,整个etcd服务并没有leader,也就是会拒绝所有请求,等到新的leader选举完成。而这个时刻,刚好go的服务去续约,因为etcd无法服务导致chan关闭,所以出现了开头的问题。通过go服务日志的时间与etcd服务日志的时间,可以确认这个猜想。

知道了问题,解决方案也很简单,在chan关闭后,通过定时去重新租约,获取新的chan。代码如下。

for {
    select {
    case aliveResp := <-aliveChan:
        if aliveResp == nil {
            log.Println("keep alive end")
            var err error
            _playRegister.leaseResp, _playRegister.aliveChan, err = serviceRegister(_playRegister.client, 5, _playRegister.statusKey)
            if err != nil {
                time.Sleep(1 * time.Second)
            } else {
                aliveChan = _playRegister.aliveChan
            }
        }
    case watchResp := <-watchChan:
        do somthing()
    }

任期内无法投票的原因,可以参见下面源码。


case m.Term > r.Term:
    // 消息的Term大于节点当前的Term
    lead := m.From
    if m.Type == pb.MsgVote || m.Type == pb.MsgPreVote {
        // 如果收到的是投票类消息
        // 当context为campaignTransfer时表示强制要求进行竞选
        force := bytes.Equal(m.Context, []byte(campaignTransfer))
        // 是否在租约期以内
        inLease := r.checkQuorum && r.lead != None && r.electionElapsed < r.electionTimeout
        if !force && inLease {
            // 如果非强制,而且又在租约期以内,就不做任何处理
            // 非强制又在租约期内可以忽略选举消息,见论文的4.2.3,这是为了阻止已经离开集群的节点再次发起投请求
            // If a server receives a RequestVote request within the minimum election timeout
            // of hearing from a current leader, it does not update its term or grant its vote
            r.logger.Infof("%x [logterm: %d, index: %d, vote: %x] ignored %s from %x [logterm: %d, index: %d] at term %d: lease is not expired (remaining ticks: %d)",
            r.id, r.raftLog.lastTerm(), r.raftLog.lastIndex(), r.Vote, m.Type, m.From, m.LogTerm, m.Index, r.Term, r.electionTimeout-r.electionElapsed)
            return nil
        }
        // 否则将lead置为空
        lead = None
    }
————————————————
版权声明:本文为CSDN博主「跨链技术践行者」的原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/shangsongwww/java/article/details/90287901

详细的日志在这里

[105]3d7c0fd6445ef729
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 is starting a new election at term 8
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 became candidate at term 9
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 received MsgVoteResp from 3d7c0fd6445ef729 at term 9
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to 6404974f67850bd9 at term 9
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to de352282a8b96da5 at term 9
raft2020/03/31 18:40:16 INFO: raft.node: 3d7c0fd6445ef729 lost leader 6404974f67850bd9 at term 9
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 is starting a new election at term 9
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 became candidate at term 10
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 received MsgVoteResp from 3d7c0fd6445ef729 at term 10
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to 6404974f67850bd9 at term 10
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to de352282a8b96da5 at term 10
2020-03-31 18:40:19.499809 W | rafthttp: lost the TCP streaming connection with peer 6404974f67850bd9 (stream MsgApp v2 reader)
2020-03-31 18:40:19.503224 E | rafthttp: failed to read 6404974f67850bd9 on stream MsgApp v2 (read tcp 192.168.10.105:51776->192.168.10.102:2380: i/o timeout)
2020-03-31 18:40:19.503242 I | rafthttp: peer 6404974f67850bd9 became inactive (message send to peer failed)
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 is starting a new election at term 10
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 became candidate at term 11
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 received MsgVoteResp from 3d7c0fd6445ef729 at term 11
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to 6404974f67850bd9 at term 11
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to de352282a8b96da5 at term 11
2020-03-31 18:40:19.658495 W | rafthttp: lost the TCP streaming connection with peer de352282a8b96da5 (stream MsgApp v2 reader)
2020-03-31 18:40:19.658557 E | rafthttp: failed to read de352282a8b96da5 on stream MsgApp v2 (read tcp 192.168.10.105:19098->192.168.10.104:2380: i/o timeout)
2020-03-31 18:40:19.658567 I | rafthttp: peer de352282a8b96da5 became inactive (message send to peer failed)
2020-03-31 18:40:19.658656 W | rafthttp: lost the TCP streaming connection with peer de352282a8b96da5 (stream Message reader)
2020-03-31 18:40:19.909333 I | rafthttp: peer 6404974f67850bd9 became active
2020-03-31 18:40:19.909379 I | rafthttp: established a TCP streaming connection with peer 6404974f67850bd9 (stream MsgApp v2 reader)
2020-03-31 18:40:19.969068 I | rafthttp: peer de352282a8b96da5 became active
2020-03-31 18:40:19.969132 I | rafthttp: established a TCP streaming connection with peer de352282a8b96da5 (stream Message reader)
2020-03-31 18:40:19.969233 I | rafthttp: established a TCP streaming connection with peer de352282a8b96da5 (stream MsgApp v2 reader)
2020-03-31 18:40:20.509461 W | rafthttp: lost the TCP streaming connection with peer 6404974f67850bd9 (stream Message reader)
2020-03-31 18:40:20.509505 E | rafthttp: failed to read 6404974f67850bd9 on stream Message (read tcp 192.168.10.105:51774->192.168.10.102:2380: i/o timeout)
2020-03-31 18:40:20.509514 I | rafthttp: peer 6404974f67850bd9 became inactive (message send to peer failed)
2020-03-31 18:40:20.510346 I | rafthttp: peer 6404974f67850bd9 became active
2020-03-31 18:40:20.510361 I | rafthttp: established a TCP streaming connection with peer 6404974f67850bd9 (stream Message reader)
raft2020/03/31 18:40:21 INFO: 3d7c0fd6445ef729 [term: 11] received a MsgVote message with higher term from 6404974f67850bd9 [term: 12]
raft2020/03/31 18:40:21 INFO: 3d7c0fd6445ef729 became follower at term 12
raft2020/03/31 18:40:21 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977, vote: 0] cast MsgVote for 6404974f67850bd9 [logterm: 8, index: 978] at term 12
raft2020/03/31 18:40:21 INFO: raft.node: 3d7c0fd6445ef729 elected leader 6404974f67850bd9 at term 12
2020-03-31 18:40:21.583620 W | rafthttp: health check for peer de352282a8b96da5 could not connect: dial tcp 192.168.10.104:2380: connect: network is unreachable

[102]6404974f67850bd9
2020-03-31 18:40:19.910296 W | rafthttp: closed an existing TCP streaming connection with peer 3d7c0fd6445ef729 (stream MsgApp v2 writer)
2020-03-31 18:40:19.910324 I | rafthttp: established a TCP streaming connection with peer 3d7c0fd6445ef729 (stream MsgApp v2 writer)
raft2020/03/31 18:40:19 INFO: 6404974f67850bd9 [logterm: 8, index: 977, vote: 6404974f67850bd9] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 4)
raft2020/03/31 18:40:19 INFO: 6404974f67850bd9 [logterm: 8, index: 977, vote: 6404974f67850bd9] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 4)
raft2020/03/31 18:40:19 INFO: 6404974f67850bd9 [logterm: 8, index: 977, vote: 6404974f67850bd9] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 4)
raft2020/03/31 18:40:20 INFO: 6404974f67850bd9 [term: 8] received a MsgAppResp message with higher term from 3d7c0fd6445ef729 [term: 11]
raft2020/03/31 18:40:20 INFO: 6404974f67850bd9 became follower at term 11
raft2020/03/31 18:40:20 INFO: raft.node: 6404974f67850bd9 lost leader 6404974f67850bd9 at term 11
raft2020/03/31 18:40:20 INFO: 6404974f67850bd9 [term: 11] ignored a MsgAppResp message with lower term from de352282a8b96da5 [term: 8]
2020-03-31 18:40:20.511031 W | rafthttp: closed an existing TCP streaming connection with peer 3d7c0fd6445ef729 (stream Message writer)
2020-03-31 18:40:20.511054 I | rafthttp: established a TCP streaming connection with peer 3d7c0fd6445ef729 (stream Message writer)
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 is starting a new election at term 11
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 became candidate at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 received MsgVoteResp from 6404974f67850bd9 at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 [logterm: 8, index: 978] sent MsgVote request to 3d7c0fd6445ef729 at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 [logterm: 8, index: 978] sent MsgVote request to de352282a8b96da5 at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 received MsgVoteResp from 3d7c0fd6445ef729 at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 has received 2 MsgVoteResp votes and 0 vote rejections
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 became leader at term 12
raft2020/03/31 18:40:21 INFO: raft.node: 6404974f67850bd9 elected leader 6404974f67850bd9 at term 12

[104]de352282a8b96da5
2020-03-31 18:40:19.978326 W | rafthttp: closed an existing TCP streaming connection with peer 3d7c0fd6445ef729 (stream Message writer)
2020-03-31 18:40:19.983180 I | rafthttp: established a TCP streaming connection with peer 3d7c0fd6445ef729 (stream Message writer)
2020-03-31 18:40:19.983202 W | rafthttp: closed an existing TCP streaming connection with peer 3d7c0fd6445ef729 (stream MsgApp v2 writer)
2020-03-31 18:40:19.983213 I | rafthttp: established a TCP streaming connection with peer 3d7c0fd6445ef729 (stream MsgApp v2 writer)
raft2020/03/31 18:40:20 INFO: de352282a8b96da5 [logterm: 8, index: 978, vote: 0] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 6)
raft2020/03/31 18:40:20 INFO: de352282a8b96da5 [logterm: 8, index: 978, vote: 0] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 6)
raft2020/03/31 18:40:20 INFO: de352282a8b96da5 [logterm: 8, index: 978, vote: 0] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 6)
raft2020/03/31 18:40:21 INFO: de352282a8b96da5 [term: 8] received a MsgVote message with higher term from 6404974f67850bd9 [term: 12]
raft2020/03/31 18:40:21 INFO: de352282a8b96da5 became follower at term 12
raft2020/03/31 18:40:21 INFO: de352282a8b96da5 [logterm: 8, index: 978, vote: 0] cast MsgVote for 6404974f67850bd9 [logterm: 8, index: 978] at term 12
raft2020/03/31 18:40:21 INFO: raft.node: de352282a8b96da5 lost leader 6404974f67850bd9 at term 12
raft2020/03/31 18:40:21 INFO: raft.node: de352282a8b96da5 elected leader 6404974f67850bd9 at term 12
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 212,383评论 6 493
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 90,522评论 3 385
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 157,852评论 0 348
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 56,621评论 1 284
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 65,741评论 6 386
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 49,929评论 1 290
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,076评论 3 410
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 37,803评论 0 268
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,265评论 1 303
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 36,582评论 2 327
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 38,716评论 1 341
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,395评论 4 333
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,039评论 3 316
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 30,798评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,027评论 1 266
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 46,488评论 2 361
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 43,612评论 2 350

推荐阅读更多精彩内容

  • 寻找一种易于理解的一致性算法(扩展版) 摘要 Raft 是一种为了管理复制日志的一致性算法。它提供了和 Paxos...
    yflau阅读 951评论 0 1
  • 大家好,我叫郭佳轩。今天早上我早早就起来了,我先开始吃饭,吃完饭以后我先写语文作业。语文作业就是两篇课文的生字,写...
    棉花糖小丸子阅读 128评论 0 0
  • 列表 不同于数组,列表的元素是不能改变的,列表是链表。 同一个列表的所有元素都必须是相同类型的。元素类型为T的列表...
    liqing151阅读 205评论 0 0
  • 编程语言是 Java,代码托管在我的 GitHub 上,包括测试用例。欢迎各种批评指正! 题目 —— Revers...
    秋名山菜车手阅读 499评论 1 1
  • 今天,头脑里突然蹦出这个关键词,因此,想用它来写文章了,就写写自己的感受吧。 看到这个标题,想起来一句话:凡事积极...
    悦清儿阅读 177评论 0 0