A follow-up to the previous post on the k8s cluster going down

Re-joining a master node after it has been removed from the cluster

In the previous post, while working on the master nodes, one master node was removed from the cluster by running kubeadm reset on it; it was removed at the time to determine whether the cluster itself was the cause of the problem.

I. Recovery steps:

    1. Regenerate the token and compute the CA cert hash with the following commands:
[root@k8s-test-master03 ~]#  kubeadm token create 
in7k87.9iwm83473t1vehpk
[root@k8s-test-master03 ~]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
dc40145602096b410d8b60c26fb4767115cb9ee6cb097f887c728993bcaf6dee
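Note that the --certificate-key used in the join command further below must also still be valid: kubeadm only keeps the uploaded control-plane certificates (the "kubeadm-certs" Secret) for about two hours. If it has expired, it can be regenerated on a healthy master. A minimal sketch, assuming kubeadm v1.16 or newer (the printed key will differ from the one used in this post):

    # re-upload the control-plane certificates and print a fresh certificate key
    kubeadm init phase upload-certs --upload-certs
    # print a ready-made join command; append --control-plane --certificate-key <key> to join as a master
    kubeadm token create --print-join-command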

    2. Run kubeadm reset on the already-removed node 3 (k8s-test-master03):
[root@k8s-test-master03 ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0630 15:47:30.957297   21465 reset.go:96] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get node registration: failed to get node name from kubelet config: open /etc/kubernetes/kubelet.conf: no such file or directory
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0630 15:47:34.091929   21465 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
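As the output above notes, kubeadm reset does not touch iptables or IPVS rules. If they need to be cleared as well, a rough sketch (assuming iptables and ipvsadm are installed and nothing else on this host depends on those rules):

    # flush the iptables rules left behind by kube-proxy
    iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
    # clear IPVS virtual servers if kube-proxy ran in IPVS mode
    ipvsadm --clear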
  • Run the join on the node
    kubeadm join 10.2.135.250:8443 --token in7k87.9iwm83473t1vehpk --discovery-token-ca-cert-hash sha256:dc40145602096b410d8b60c26fb4767115cb9ee6cb097f887c728993bcaf6dee --control-plane --certificate-key df65c6f2cc3a0846a3c5f7459a92050e3093ee55a03fd06dcec94e2eca0129eb
    The token and hash generated above are used here:
[root@k8s-test-master03 ~]# kubeadm join 10.2.135.250:8443 --token in7k87.9iwm83473t1vehpk     --discovery-token-ca-cert-hash sha256:dc40145602096b410d8b60c26fb4767115cb9ee6cb097f887c728993bcaf6dee     --control-plane --certificate-key df65c6f2cc3a0846a3c5f7459a92050e3093ee55a03fd06dcec94e2eca0129eb
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-test-master03 localhost] and IPs [10.2.135.105 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-test-master03 localhost] and IPs [10.2.135.105 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-test-master03 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.k8s.cluster.local] and IPs [10.4.128.1 10.2.135.105 10.2.135.250]
[certs] Generating "front-proxy-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://10.2.135.105:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher

The key message "error execution phase check-etcd" suggests that the failure happens while the node tries to join etcd, which is what prevents the master from rejoining the original Kubernetes cluster.
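Before changing anything, it is worth confirming that the surviving etcd members themselves are healthy, i.e. that the join fails because of a stale member rather than because etcd is down. A minimal sketch run from k8s-test-master01, assuming the stacked-etcd certificate paths used later in this post:

    kubectl -n kube-system exec etcd-k8s-test-master01 -- sh -c \
      'ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
         --cacert=/etc/kubernetes/pki/etcd/ca.crt \
         --cert=/etc/kubernetes/pki/etcd/server.crt \
         --key=/etc/kubernetes/pki/etcd/server.key endpoint health'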

II. Analyzing the problem

  • Check the cluster node list
    First, check the existing nodes with the kubectl tool:
[root@k8s-test-master01 ~]# kubectl get nodes
NAME                STATUS   ROLES    AGE    VERSION
k8s-test-master01   Ready    master   255d   v1.16.2
k8s-test-master02   Ready    master   255d   v1.16.2
k8s-test-node01     Ready    <none>   95d    v1.16.2
k8s-test-node02     Ready    <none>   255d   v1.16.2
k8s-test-node03     Ready    <none>   255d   v1.16.2
k8s-test-node04     Ready    <none>   73d    v1.16.2

As expected, the k8s-test-master03 node is no longer in the node list.

  • Check the kubeadm configuration
    Next, look at the kubeadm configuration stored in the cluster:
[root@k8s-test-master01 ~]# kubectl describe configmaps kubeadm-config -n kube-system

The output is as follows:

Name:         kubeadm-config
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
ClusterConfiguration:
----
apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 10.2.135.250:8443
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.16.2
networking:
  dnsDomain: k8s.cluster.local
  podSubnet: 10.4.192.0/18
  serviceSubnet: 10.4.128.0/18
scheduler: {}

ClusterStatus:
----
apiEndpoints:
  k8s-test-master01:
    advertiseAddress: 10.2.135.103
    bindPort: 6443
  k8s-test-master02:
    advertiseAddress: 10.2.135.104
    bindPort: 6443
  k8s-test-master03:
    advertiseAddress: 10.2.135.105
    bindPort: 6443
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterStatus

Events:  <none>

As you can see, the k8s-test-master03 entry still exists in the kubeadm configuration (ClusterStatus), which indicates that etcd still holds information about k8s-test-master03.

  • Root cause and solution
    Because the cluster was built with kubeadm using stacked etcd (the etcd instances run as containers alongside the master nodes), every master node hosts an etcd member. When a master node is removed with kubeadm reset, its etcd member is not removed from the etcd cluster; the stale member stays in the member list, which is why the check-etcd phase of a later join fails.
    Therefore, we need to go into etcd and delete the stale member information manually.

III. Fixing the problem

1. Get the etcd pod list
First, list the etcd pods in the cluster:

[root@k8s-test-master01 ~]# kubectl get pods -n kube-system|grep etcd
etcd-k8s-test-master01                      1/1     Running   0          255d
etcd-k8s-test-master02                      1/1     Running   0          255d

2. Exec into an etcd container and remove the stale member
Pick either of the two etcd pods above and open a shell inside it with kubectl:

[root@k8s-test-master01 ~]# kubectl exec -it etcd-k8s-test-master01 sh -n kube-system

Once inside the container, run the following steps:

# export ETCDCTL_API=3
# alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
# etcdctl member list
501e067f89adf844, started, k8s-test-master02, https://10.2.135.104:2380, https://10.2.135.104:2379
5a2eb770e1acc51c, started, k8s-test-master01, https://10.2.135.103:2380, https://10.2.135.103:2379
609d044994869d03, started, k8s-test-master03, https://10.2.135.105:2380, https://10.2.135.105:2379
# etcdctl member remove 609d044994869d03
Member 609d044994869d03 removed from cluster b40c215cd54426ac
#  etcdctl member list
501e067f89adf844, started, k8s-test-master02, https://10.2.135.104:2380, https://10.2.135.104:2379
5a2eb770e1acc51c, started, k8s-test-master01, https://10.2.135.103:2380, https://10.2.135.103:2379
# exit
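For reference, the same removal can also be done in a single non-interactive command. A sketch under the same assumptions (stacked-etcd certificate paths and the member ID taken from "etcdctl member list"):

    kubectl -n kube-system exec etcd-k8s-test-master01 -- sh -c \
      'ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
         --cacert=/etc/kubernetes/pki/etcd/ca.crt \
         --cert=/etc/kubernetes/pki/etcd/server.crt \
         --key=/etc/kubernetes/pki/etcd/server.key member remove 609d044994869d03'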

3. Run kubeadm reset on k8s-test-master03 again

[root@k8s-test-master03 ~]# ip addr|grep 10.
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.2.135.105/24 brd 10.2.135.255 scope global ens160
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN qlen 1000
[root@k8s-test-master03 ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0630 16:02:10.154744    5030 reset.go:96] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get node registration: failed to get corresponding node: nodes "k8s-test-master03" not found
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0630 16:02:12.130860    5030 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
[root@k8s-test-master03 ~]# cd /etc/kubernetes/
[root@k8s-test-master03 kubernetes]# ls
manifests  pki
[root@k8s-test-master03 kubernetes]# mv manifests manifests_bak

4. Join again

[root@k8s-test-master03 ~]# kubeadm join 10.2.135.250:8443 --token in7k87.9iwm83473t1vehpk     --discovery-token-ca-cert-hash sha256:dc40145602096b410d8b60c26fb4767115cb9ee6cb097f887c728993bcaf6dee     --control-plane --certificate-key df65c6f2cc3a0846a3c5f7459a92050e3093ee55a03fd06dcec94e2eca0129eb
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-test-master03 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.k8s.cluster.local] and IPs [10.4.128.1 10.2.135.105 10.2.135.250]
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-test-master03 localhost] and IPs [10.2.135.105 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-test-master03 localhost] and IPs [10.2.135.105 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.16" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[mark-control-plane] Marking the node k8s-test-master03 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8s-test-master03 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

        mkdir -p $HOME/.kube
        sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
        sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.
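Following that advice, the result can be verified from any master node: k8s-test-master03 should show up in the node list as a master again, and a new etcd-k8s-test-master03 pod should be Running.

    kubectl get nodes
    kubectl -n kube-system get pods | grep etcd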
  • Notes:
    1. After the first reset, /etc/kubernetes/manifests had not been cleaned out (rm -rf /etc/kubernetes/manifests), so the node still failed to come up: the leftover etcd manifest kept etcd from starting, and without etcd the apiserver could not start either. After removing (here: renaming) the manifests directory, running reset again, and rejoining, it succeeded.
    2. Mismatched kubeadm, kubelet, and kubectl versions also prevented the node from joining the cluster; they were later reinstalled at the same version as the cluster. Otherwise, an error like the following is reported:
[root@k8s-test-master03 ~]# kubeadm join 10.2.135.250:8443 --token in7k87.9iwm83473t1vehpk     --discovery-token-ca-cert-hash sha256:dc40145602096b410d8b60c26fb4767115cb9ee6cb097f887c728993bcaf6dee     --control-plane --certificate-key afd2a7d1ef6316fcad8adccb9caf6a93741909660140f5700664fd9d063cbd4f
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: this version of kubeadm only supports deploying clusters with the control plane version >= 1.17.0. Current version: v1.16.2

k8s-test-master03 had v1.18.2 installed, while the original cluster was running v1.16.2 (see the version-pinning sketch after these notes).
    3. There was also a yum problem, fixed by rebuilding the rpm database: rpm --rebuilddb
    4. Clean up unused images: docker system prune -a
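To avoid the version mismatch in note 2, the packages on the node being re-added can be pinned to the cluster's version before joining. A rough sketch for a yum-based host (package names follow the upstream Kubernetes yum repo; the exact "-0" release suffix may differ depending on the repository in use):

    # check what is currently installed
    kubeadm version -o short; kubelet --version; kubectl version --client --short
    # reinstall at the cluster's version (v1.16.2 in this post)
    yum remove -y kubeadm kubelet kubectl
    yum install -y kubeadm-1.16.2-0 kubelet-1.16.2-0 kubectl-1.16.2-0
    systemctl enable kubelet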
