kubectl TLS handshake timeout

This problem bothered me for two days. Along the way I went through plenty of Stack Overflow/Google/Baidu results (all the same boilerplate advice: not enough memory, upgrade, reinstall, and so on). Today I finally worked it out, so I am recording it here in the hope of giving anyone who hits the same thing another angle to try.

First, the concrete problem: after the cluster was set up, kubectl version failed with \color{red}{net/http: TLS handshake timeout}. Adding --v 9 for verbose logging showed that the client side was fine and that the TCP dial to the server succeeded, but the TLS handshake then timed out.

[root@***-24-69-3 ~]# kubectl version --v 9
I0511 09:49:55.099313 2329027 loader.go:372] Config loaded from file:  /etc/kubernetes/admin.conf
I0511 09:49:55.099762 2329027 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.23.6 (linux/amd64) kubernetes/ad33385" 'https://***.24.69.222:6443/version?timeout=32s'
I0511 09:49:55.100226 2329027 round_trippers.go:510] HTTP Trace: Dial to tcp:***.24.69.222:6443 succeed
I0511 09:50:05.100639 2329027 round_trippers.go:570] HTTP Statistics: DNSLookup 0 ms Dial 0 ms TLSHandshake 10000 ms Duration 10000 ms
I0511 09:50:05.100654 2329027 round_trippers.go:577] Response Headers:
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:49:13Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
I0511 09:50:05.100728 2329027 helpers.go:237] Connection error: Get https://***.24.69.222:6443/version?timeout=32s: net/http: TLS handshake timeout
F0511 09:50:05.100742 2329027 helpers.go:118] Unable to connect to the server: net/http: TLS handshake timeout
goroutine 1 [running]:
k8s.io/kubernetes/vendor/k8s.io/klog/v2.stacks(0x1)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1038 +0x8a
k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).output(0x3080040, 0x3, 0x0, 0xc00053a230, 0x2, {0x25f2ec7, 0x10}, 0xc00010c400, 0x0)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:987 +0x5fd
k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).printDepth(0xc0004386c0, 0x40, 0x0, {0x0, 0x0}, 0x2a, {0xc00011eb20, 0x1, 0x1})
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:735 +0x1ae
k8s.io/kubernetes/vendor/k8s.io/klog/v2.FatalDepth(...)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1518
k8s.io/kubernetes/vendor/k8s.io/kubectl/pkg/cmd/util.fatal({0xc0004386c0, 0x40}, 0xc0001fa120)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:96 +0xc5
k8s.io/kubernetes/vendor/k8s.io/kubectl/pkg/cmd/util.checkErr({0x1fed760, 0xc0001fa120}, 0x1e797d0)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:191 +0x7d7
k8s.io/kubernetes/vendor/k8s.io/kubectl/pkg/cmd/util.CheckErr(...)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:118
k8s.io/kubernetes/vendor/k8s.io/kubectl/pkg/cmd/version.NewCmdVersion.func1(0xc000aecf00, {0xc00043b460, 0x0, 0x2})
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/kubectl/pkg/cmd/version/version.go:79 +0xd1
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute(0xc000aecf00, {0xc00043b420, 0x2, 0x2})
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:860 +0x5f8
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc000395680)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:974 +0x3bc
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute(...)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/vendor/k8s.io/component-base/cli.run(0xc000395680)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/cli/run.go:146 +0x325
k8s.io/kubernetes/vendor/k8s.io/component-base/cli.RunNoErrOutput(...)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/cli/run.go:84
main.main()
        _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubectl/kubectl.go:30 +0x1e

goroutine 6 [chan receive]:
k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).flushDaemon(0x0)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1181 +0x6a
created by k8s.io/kubernetes/vendor/k8s.io/klog/v2.init.0
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:420 +0xfb

goroutine 51 [select]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x1febb40, 0xc000568000}, 0x1, 0xc000138360)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:167 +0x13b
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x12a05f200, 0x0, 0x0, 0x0)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(...)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Forever(0x0, 0x0)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:81 +0x28
created by k8s.io/kubernetes/vendor/k8s.io/component-base/logs.InitLogs
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/logs/logs.go:179 +0x85

# Running the same request directly with curl shows the exact same problem
[root@***-24-69-3 ~]# curl -v -k -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.23.6 (linux/amd64) kubernetes/ad33385" 'https://***.24.69.222:6443/version?timeout=32s'
* About to connect() to ***.24.69.222 port 6443 (#0)
*   Trying ***.24.69.222...
* Connected to ***.24.69.222 (***.24.69.222) port 6443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb

I then restored the faulty nodes with a rather crude approach, keeping one of them aside for troubleshooting.

Observing from a healthy node, everything on the faulty node is in fact running normally, so we can be fairly sure the CNI plugin, Flannel, is serving fine (one more optional check is sketched after the output below). That brings us back to kubectl's HTTPS problem.

[root@***-24-69-2 ~]# kubectl get no
NAME                              STATUS   ROLES                  AGE     VERSION
***-24-69-2.***    Ready    control-plane,master   4d19h   v1.23.6
***-24-69-3.***    Ready    control-plane,master   4d19h   v1.23.6
***-24-69-30.***   Ready    <none>                 35m     v1.23.6
***-24-69-31.***   Ready    <none>                 15m     v1.23.6
***-24-69-32.***   Ready    <none>                 30s     v1.23.6
***-24-69-4.***    Ready    control-plane,master   4d19h   v1.23.6
***-24-69-5.***    Ready    <none>                 23h     v1.23.6
***-24-69-6.***    Ready    <none>                 22h     v1.23.6

# On 24-69-3, flannel looks normal
[root@***-24-69-3 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 38:68:dd:4f:e7:58 brd ff:ff:ff:ff:ff:ff
3: em2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 38:68:dd:4f:e7:58 brd ff:ff:ff:ff:ff:ff
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 38:68:dd:4f:e7:58 brd ff:ff:ff:ff:ff:ff
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:9b:fa:a8:00 brd ff:ff:ff:ff:ff:ff
    inet ***.17.0.1/16 brd ***.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
13: bond0.169@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 38:68:dd:4f:e7:58 brd ff:ff:ff:ff:ff:ff
    inet ***.24.69.3/24 brd ***.24.69.255 scope global bond0.169
       valid_lft forever preferred_lft forever
14: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 6a:89:1a:8a:92:1f brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
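
As one more optional check, the Flannel pods can be listed from a healthy node. This is a sketch of my own rather than part of the original session; the namespace and label are assumptions and depend on how the Flannel manifest was applied.

# Hypothetical extra check from a healthy node: confirm the Flannel pod scheduled
# on the faulty node is Running (adjust -n / -l to match your Flannel manifest)
kubectl get pods -n kube-system -l app=flannel -o wide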

Here it helps to review how HTTPS sets up its secure channel (the summary below borrows from 《图解HTTP》; a command-line way to watch these steps is sketched after the list).

  1. The client sends a Client Hello message to begin SSL communication with the server; the message lists the SSL/TLS versions and the cipher suites (encryption algorithms and key lengths) the client supports.

  2. If the server can speak SSL, it replies with a Server Hello message, which likewise contains the chosen SSL/TLS version and cipher suite.

  3. The server then sends a Certificate message, handing its public-key certificate to the client.

  4. The server sends a Server Hello Done message to tell the client that this phase of the handshake is over.

[Figure: first phase of the SSL handshake]
  1. After this first phase of the SSL handshake, the client responds with a Client Key Exchange message. It contains a random string called the pre-master secret, encrypted with the server's public key; this secret matters a great deal, because it becomes the shared key for the rest of the communication.

  2. The client then sends a Change Cipher Spec message, telling the server that everything after this message will be encrypted with keys derived from the pre-master secret.

  3. The client follows with a Finished message containing a digest of all the handshake messages so far; whether the server can decrypt it correctly decides whether the negotiation has succeeded.

  4. The server likewise sends a Change Cipher Spec message.

  5. The server likewise sends a Finished message.

  6. Once client and server have exchanged Finished messages, the SSL connection is established and the channel is protected by SSL; application-layer communication, i.e. the HTTP request, follows.

[Figure: Client Key Exchange and handshake completion]
  1. Application-layer communication continues and the HTTP response is sent.

  2. Finally the client closes the connection with a close_notify alert, after which the TCP connection is torn down with the usual four-way handshake.
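
To see where this exchange stops, the handshake can be driven by hand with openssl. A minimal sketch: the VIP and port come from the cluster above, and -state just makes openssl print each handshake step.

# On a healthy node this prints the handshake states and the server certificate;
# on the faulty node it should stall right after the ClientHello is written.
openssl s_client -state -connect ***.24.69.222:6443 </dev/null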

At this point it is clear that when the client sends its Client Hello to start the SSL exchange, it never gets the server's Server Hello in reply. What is odd is that the server-side IP ***.24.69.222 can be pinged without any problem. To clarify: ***.24.69.222 is a virtual IP hosted on one of the master nodes, and the other, healthy nodes complete HTTPS communication through this same VIP just fine. At this point the problem is actually fairly obvious (especially to anyone who deployed a highly available cluster with keepalived + LVS as the load balancer). A way to watch the handshake stall on the wire is sketched after the ping output below.

[root@***-24-69-3 bin]# ping ***.24.69.222
PING ***.24.69.222 (***.24.69.222) 56(84) bytes of data.
64 bytes from ***.24.69.222: icmp_seq=1 ttl=64 time=0.140 ms
64 bytes from ***.24.69.222: icmp_seq=2 ttl=64 time=0.151 ms
64 bytes from ***.24.69.222: icmp_seq=3 ttl=64 time=0.084 ms
^C
--- ***.24.69.222 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.084/0.125/0.151/0.029 ms
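
To see the stalled handshake on the wire, the VIP traffic can be captured while kubectl version is re-run. A sketch, assuming tcpdump is available; the interface name bond0.169 is taken from the ip a output above.

# On the faulty node: capture API-server traffic, then re-run kubectl version in another shell.
# With a broken path you would expect the ClientHello to go out and no ServerHello to come back.
tcpdump -nn -i bond0.169 host ***.24.69.222 and tcp port 6443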

Here we need some background on IP addresses, hardware addresses, and ARP:

  • The physical address is used by the data link and physical layers, while the IP address is used by the network layer and the layers above it; it is a logical address (logical because it is implemented in software).

  • When data is sent, it travels down from the upper layers to the lower ones before going onto the link. Once an IP datagram is handed to the data link layer, it is encapsulated into a MAC frame. The source and destination addresses a MAC frame uses in transit are both hardware addresses, and both are written into the MAC frame header.

Which brings us to ARP, the Address Resolution Protocol:

  • When we know the IP address of a machine (a host or a router) and need to find its corresponding hardware address, ARP is the protocol that solves exactly this problem.

  • ARP keeps a table in the host's ARP cache that maps IP addresses to hardware addresses, and this table is updated dynamically (entries are added, and stale ones time out); a way to query the wire directly, bypassing this cache, is sketched right after this list.
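
Besides reading the local cache, the owner of an IP can be asked for directly on the wire, bypassing the cache entirely. A sketch, assuming arping (iputils) is installed; the interface again comes from the ip a output above.

# Broadcast ARP requests for the VIP and print the MAC address that actually answers
arping -I bond0.169 -c 3 ***.24.69.222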

Now let's look at the MAC address recorded in the ARP cache and see whether anything is wrong there.

# Check the ARP cache on the faulty node
[root@***-24-69-3 bin]# arp -e            
Address                  HWtype  HWaddress           Flags Mask            Iface
***.24.69.222            ether   b4:05:5d:7d:89:3a   C                     bond0.169
......

# Check the corresponding NIC on the node that holds .222
# And there it is: the two MAC addresses do not match
[root@***-24-69-2 ~]# ip a
......
5: bond0.169@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 38:68:dd:4f:e7:e8 brd ff:ff:ff:ff:ff:ff
    inet ***.24.69.2/24 brd ***.24.69.255 scope global bond0.169
       valid_lft forever preferred_lft forever
    inet ***.24.69.222/24 brd ***.24.69.255 scope global secondary bond0.169:1
       valid_lft forever preferred_lft forever

The problem is located, so let's act on it ( \color{red}{the concrete fix follows} ).

[root@***-24-69-3 ~]# arp -d ***.24.69.222
[root@***-24-69-3 ~]# ping ***.24.69.222
[root@***-24-69-3 ~]# arp -e            
Address                  HWtype  HWaddress           Flags Mask            Iface
***.24.69.222            ether   38:68:dd:4f:e7:e8   C                     bond0.169
[root@***-24-69-3 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:49:13Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:43:11Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}

OK, fixed: the Server Version is coming back normally again.
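
One closing note of my own for the keepalived + LVS setups mentioned earlier (not part of the original fix): when the VIP moves between masters, keepalived normally broadcasts gratuitous ARP so that neighbors refresh their caches; if that announcement gets lost, you end up with exactly the kind of stale entry seen above. The announcement can also be sent by hand from whichever node currently holds the VIP, as a sketch:

# Hypothetical manual re-announcement, run on the node that owns the VIP (iputils arping);
# -U sends unsolicited (gratuitous) ARP so that neighbors update their caches.
arping -U -I bond0.169 -c 3 ***.24.69.222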

References:

《计算机网络(第6版)》, 谢希仁

《图解HTTP》, 上野宣
