k8s: replacing a control plane node (including etcd)

If the node you are replacing is the first control plane node, be sure to follow the procedure below.
In a kubespray deployment, the node listed first in the inventory is assumed to perform cluster initialization, both for etcd and for kubeadm. So when replacing the first node, you must first move a healthy node into the first position in the inventory.

Reference: https://github.com/kubernetes-sigs/kubespray/blob/master/docs/nodes.md#3-edit-cluster-info-configmap-in-kube-public-namespace

### 1) Change control plane nodes order in inventory

from

```ini
[kube_control_plane]
node-1
node-2
node-3
```

to

```ini
[kube_control_plane]
node-2
node-3
node-1
```

### 2) Remove old first control plane node from cluster

With the old node still in the inventory, run remove-node.yml. You need to pass -e node=node-1 to the playbook to limit the execution to the node being removed. If the node you want to remove is not online, you should add reset_nodes=false and allow_ungraceful_removal=true to your extra-vars.
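For example (the inventory path below is an assumption; adjust it to your layout):

```sh
# Remove the old first control plane node while it is still in the inventory
ansible-playbook -i inventory/mycluster/hosts.yaml -b remove-node.yml -e node=node-1

# If node-1 is already offline, allow ungraceful removal
ansible-playbook -i inventory/mycluster/hosts.yaml -b remove-node.yml \
  -e node=node-1 -e reset_nodes=false -e allow_ungraceful_removal=true
```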

### 3) Edit cluster-info configmap in kube-public namespace

kubectl edit cm -n kube-public cluster-info

Replace the IP of the old kube_control_plane node with the IP of a live kube_control_plane node (the server field). Also update the certificate-authority-data field if you changed certs.
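To inspect what joining nodes will actually read, you can dump the kubeconfig embedded in the ConfigMap (the server and certificate-authority-data fields live under its kubeconfig key):

```sh
# Print the kubeconfig stored in the cluster-info ConfigMap
kubectl -n kube-public get cm cluster-info -o jsonpath='{.data.kubeconfig}'
```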

### 4) Add new control plane node

Update inventory (if needed)

Run cluster.yml with --limit=kube_control_plane
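For example (inventory path assumed as above):

```sh
# Re-run cluster.yml restricted to the control plane group
ansible-playbook -i inventory/mycluster/hosts.yaml -b cluster.yml --limit=kube_control_plane
```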


Pay special attention to step 3: switch cluster-info from pointing at master-1 to pointing at master-2.


<         certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM1ekNDQWMrZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeU1Ea3lOREV6TURVMU1sb1hEVE15TURreU1URXpNRFUxTWxvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBSjk1Cmw4NXhaUU56akI3VE9sY3FmUSs5OHU3cWZWYkFCMmx1c2NXVGR6NEF4N3hBME1id2U4REJTd1BXUnpEbURUczkKYWJha056VGpEeDUralJlUldaU29EdmZHNDhTaTVUeFdybDRROVdYNGpjMXhUQjJCTDNWTklqUFFBTUxuK0hOaAozVkQ3VjJaYkJLaDRySUpIaEZlVERDV3U1S3kweUtGYnFqS2gvUXZDbUJ1QlJkZlVaQkdha1pFbVZYOWlKd3YvClZlazRkb2pyb3Q4emNRajhGazVQd0RUeE0zREc5My8zS3MySnd3RTBJOWhkZTlBdDlPZTRzdmtuUmgyOTdlb0QKMXltZjRmc1YzU2IxbGFSbG82MnpTblRkWjJXWmhXVHFGK0ZsQ1pNTGs2d3M4dkE4VWdwTFl4U2w5N0tEV0srVgpteVEzeHB3ZTR2TjUxdWpYZU84Q0F3RUFBYU5DTUVBd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZNNzExekJHNy9MNGFoVk5hZjBqNVVUYVRpdlBNQTBHQ1NxR1NJYjMKRFFFQkN3VUFBNElCQVFDV3VsMUo5aHVLUldTSmJoVWU3STdiNVBzMXJRWnNucFZmalRrYkdUdXJ4bTZScWZ2QQpuc0x5NTNiM0swSnRYeFJTK0pTeWFtR000Zzcxck9MRGx1SkJJcFVoVzR4VU9SU0duNDM1cmI1TjNRekZ5RnJsCkwxVCt1YytEY0pFUFg0T092SUlvSzhMbTlNaW1FNXJBcW9JWFpLcVZDZ25UWGN0QlpTOUFZOW1NWUVoaHphOCsKWXJjWTZjMjJZSWIxb0oxMVlDbndiVUZkNm9VVW1YYXUxV3Y0MnFJNHRxYlNMQ0VxRTlZNERlREdBdkpGNFZqawp1cG9sYjQzVzVIWE1wb0gyOGxhejVadE5aMzRqbS83RTM0ZkhJZHAzOVZoWE1LVk9pTlhXKysvaWN3ckVEUit2CjRMelZGTzYyR1dnUDNiRWVreW5hN1F1Rmh5OWs5b0dRVlJWSQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
<         server: https://10.120.33.146:6443
---
>         certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM1ekNDQWMrZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeE1EZ3hNakEyTVRreU0xb1hEVE14TURneE1EQTJNVGt5TTFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTUVOCmxrNll5ZWtwMSs0RHQzS1BtajVhSHZZZ2J4Q0krSWpVSmV0TVp2UmNzRXR3TmU1YU9yRWZFZXdOK3NyeFRJUVkKSEJHSkxIbUMzZ0VzQjNjdmdIWDZZaGlKWVZMYVZBVi9aN1puK0J3cFllbVVKaWFkNXBxMTZvUDl4dXlZZHpaZgpybCtYajdMai9HdGFYQXNrNmZSS2hzTXVyMmlBMmpBTkwzRG8yZEdHRUtleVNIQVFBaEZqTEErSk9SdElKZEYzCjZyWndsdDNOM2MweFRHU0Y5OGJqMFl5MDR4cG1qVXp1cHVQUWovOEVTT1JaUkhxS0FoRHJLeU5vWDBHbzhSY1AKKzNoa3dvTnVZbit0dTg3Mzk5dG1lUU5DMXpKQkpZemFVQTAxMkRiSWk5bzltcnNTZklpSEtQTjlqeGlFU1BhTAphWTJUV0FWeGt2UzQxY1V1M2hNQ0F3RUFBYU5DTUVBd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZFaGRIc3BySThhUVJhV3J0NWhaZFBocTJtazRNQTBHQ1NxR1NJYjMKRFFFQkN3VUFBNElCQVFBbVAyaStUdGQ3czVKdEIvWkFoMVBFWkV2V0NLTXpWcHdYaW1NUzlhOWxldjdlRFp0VApzSEdYMzhSeUhDNGdKb2N3S1VXamZ5YlBFUnJkTTY1cUN2SXVONW9nQmphZU1iYjRNTUpnM0d4cE45a3RvaU9PCktsa1hKblVHZm83MkpCNTBTSnpJdGthbHFPelhENkgzbzUxQTNYbHp6MUZENTdhRERFZEkxMUZJY2ozTk4vVkoKaVRzSHZyaVd4MGtDK0V1eXhYWE9ma1p3VEkrSjFnMWx2NkZPYW9ZcWZhYVpVQ3cyTmFLc1dMTG9FT2FiNG15TgptV25pQ1M2Q2h6K2xBa2Q5N0w5ck12WmRKZWxlMEJNWmZXSGZTbzJRSlRvc0dMdDdWY2YrVlRmSE9vQlRBNGlXCmpwLzVINVVZdmJrQUV1SmpVV1hCYTZLNTR5N3JJdEhBeUVidwotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
>         server: https://10.120.34.53:6443

# Copy these values directly from cat /etc/kubernetes/admin.conf and just change the management IP
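A quick way to pull both fields from a surviving control plane node (a sketch; admin.conf is itself a kubeconfig, so they can be grepped out directly):

```sh
# Extract the CA data and apiserver endpoint to paste into cluster-info
grep -E 'certificate-authority-data|server:' /etc/kubernetes/admin.conf
```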

If this entry is not changed, master-1 fails to rejoin the cluster after cluster.yml is re-run, with the following error:

[root@pro-k8s-master-1 ~]# cat /etc/kubernetes/kubeadm-controlplane.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: 10.120.34.53:6443
    token: m49jmj.fv3zqgm57tnwgtor
    unsafeSkipCAVerification: true
  timeout: 5m0s
  tlsBootstrapToken: m49jmj.fv3zqgm57tnwgtor
controlPlane:
  localAPIEndpoint:
    advertiseAddress: 10.120.33.146
    bindPort: 6443
  certificateKey: c8d4ef0b01aa6e54caed3d6fd2a1da2a7ada69b3833aeadaf3bac8a81cd01cfa
nodeRegistration:
  name: pro-k8s-master-1
  criSocket: /var/run/dockershim.sock
[root@pro-k8s-master-1 ~]# /usr/local/bin/kubeadm join --config /etc/kubernetes/kubeadm-controlplane.yaml --ignore-preflight-errors=all
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get "https://10.120.33.146:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp 10.120.33.146:6443: connect: connection refused
To see the stack trace of this error execute with --v=5 or higher

This is because a joining node reads the cluster CA verification data and the kube-apiserver endpoint from the cluster-info ConfigMap in kube-public.

There are two node-join scenarios.
Both depend on two configuration items that need attention (a sketch of the init invocation follows below):

  1. The cluster-info ConfigMap in kube-public, which stores the cluster CA verification data and the kube-apiserver endpoint. If the endpoint is an LB VIP, nothing needs to change; if it is a node IP, it must be updated here, otherwise nodes cannot join.

  2. The first control plane node must pass --upload-certs to kubeadm init.

When kubeadm init runs, it generates the CA cert, key, and other certificate files; with this flag set, the cert files are automatically synced to /etc/kubernetes/ssl on a new node when it joins.
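A minimal sketch of that init invocation (the config file path follows kubespray's usual layout and is an assumption here):

```sh
# --upload-certs encrypts the control plane certificates and stores them in
# the kubeadm-certs Secret in kube-system, so joining control plane nodes
# can fetch them with --certificate-key instead of manual copying.
kubeadm init --config /etc/kubernetes/kubeadm-config.yaml --upload-certs
```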

  1. A regular worker node joins

  2. A control plane node joins:
kubeadm join 10.120.34.53:6443 --token 5r6s2m.g7wimq9aoist154w     --discovery-token-ca-cert-hash sha256:eadb3051b0ea751f058de8805e3c2569769ae8346a889acb835b542b22840d58     --control-plane --certificate-key 6ed53f56e64f12c0cb7a3024203e2a99e95aaaacc52b58b8100a025ff577257e

# A control plane join must specify --control-plane

Replacing the third node

After the third node is shut down, etcd loses its leader for a period that was observed to exceed 15 seconds:

[root@pro-k8s-master-2 ~]# hostname=`hostname`
[root@pro-k8s-master-2 ~]# export ETCDCTL_API=3
[root@pro-k8s-master-2 ~]# export ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-$hostname.pem
[root@pro-k8s-master-2 ~]# export ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-$hostname-key.pem
[root@pro-k8s-master-2 ~]# export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
[root@pro-k8s-master-2 ~]# export ETCDCTL_ENDPOINTS="https://10.120.33.146:2379,https://10.120.34.53:2379,https://10.120.35.101:2379"
[root@pro-k8s-master-2 ~]# # confirm
[root@pro-k8s-master-2 ~]# etcdctl member list
89b51fbdfb2a9906, started, etcd2, https://10.120.34.53:2380, https://10.120.34.53:2379, false
8fc45151ffa61a8e, started, etcd1, https://10.120.33.146:2380, https://10.120.33.146:2379, false
d2fccd85a33c58f9, started, etcd3, https://10.120.35.101:2380, https://10.120.35.101:2379, false
[root@pro-k8s-master-2 ~]# etcdctl endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.120.33.146:2379 | 8fc45151ffa61a8e |  3.4.13 |  100 MB |     false |      false |      1561 |  284432261 |          284432261 |        |
|  https://10.120.34.53:2379 | 89b51fbdfb2a9906 |  3.4.13 |  100 MB |      true |      false |      1561 |  284432262 |          284432261 |        |
| https://10.120.35.101:2379 | d2fccd85a33c58f9 |  3.4.13 |  100 MB |     false |      false |      1561 |  284432262 |          284432262 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@pro-k8s-master-2 ~]#
[root@pro-k8s-master-2 ~]# logout
Connection to pro-k8s-master-2 closed.
[root@deployer ~]# ssh pro-k8s-master-2
Last login: Wed Sep 28 10:47:26 2022 from 10.120.33.122
[root@pro-k8s-master-2 ~]#
[root@pro-k8s-master-2 ~]#
[root@pro-k8s-master-2 ~]#
[root@pro-k8s-master-2 ~]# etcdctl endpoint status -w table
{"level":"warn","ts":"2022-09-28T10:49:44.482Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection closed"}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@pro-k8s-master-2 ~]# etcdctl endpoint status -w table
{"level":"warn","ts":"2022-09-28T10:50:09.362Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection closed"}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@pro-k8s-master-2 ~]#
[root@pro-k8s-master-2 ~]#
[root@pro-k8s-master-2 ~]# hostname=`hostname`
[root@pro-k8s-master-2 ~]# export ETCDCTL_API=3
[root@pro-k8s-master-2 ~]# export ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-$hostname.pem
[root@pro-k8s-master-2 ~]# export ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-$hostname-key.pem
[root@pro-k8s-master-2 ~]# export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
[root@pro-k8s-master-2 ~]# export ETCDCTL_ENDPOINTS="https://10.120.33.146:2379,https://10.120.34.53:2379,https://10.120.35.101:2379"
[root@pro-k8s-master-2 ~]# # confirm
[root@pro-k8s-master-2 ~]# etcdctl member list
89b51fbdfb2a9906, started, etcd2, https://10.120.34.53:2380, https://10.120.34.53:2379, false
8fc45151ffa61a8e, started, etcd1, https://10.120.33.146:2380, https://10.120.33.146:2379, false
d2fccd85a33c58f9, started, etcd3, https://10.120.35.101:2380, https://10.120.35.101:2379, false
[root@pro-k8s-master-2 ~]# etcdctl endpoint status -w table
{"level":"warn","ts":"2022-09-28T10:50:27.214Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://10.120.35.101:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint https://10.120.35.101:2379 (context deadline exceeded)
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.120.33.146:2379 | 8fc45151ffa61a8e |  3.4.13 |  100 MB |     false |      false |      1561 |  284434365 |          284434365 |        |
|  https://10.120.34.53:2379 | 89b51fbdfb2a9906 |  3.4.13 |  100 MB |      true |      false |      1561 |  284434365 |          284434365 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@pro-k8s-master-2 ~]#

Before proceeding, confirm that the etcd leader has recovered; if it has not, wait instead of replacing the node immediately.
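A small polling sketch, reusing the ETCDCTL_* variables exported above (matching " true " in the IS LEADER column is a heuristic):

```sh
# Wait until one endpoint reports IS LEADER = true in the status table
until etcdctl endpoint status -w table 2>/dev/null | grep -q ' true '; do
  echo "no etcd leader yet, waiting..."
  sleep 5
done
echo "etcd leader recovered; safe to continue with the replacement"
```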
