K8s HA Cluster
[toc]
Reference
K8s HA cluster architecture and deployment
Architecture Highlights
Cluster datastore
Use an internal (stacked) etcd cluster.
Apiserver high availability
Expose the apiserver to the worker nodes through a VIP.
The VIP can be implemented with keepalived + haproxy or keepalived + nginx.
This test uses keepalived + haproxy.
Procedure
Deploy the etcd cluster
Use an internal (stacked) etcd cluster
That is, every Kubernetes master node runs its own etcd instance and these instances form a cluster. There is no need to install and deploy etcd by hand: etcd runs as pods and kubeadm automatically assembles them into a cluster.
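Once the control-plane nodes have joined, the stacked etcd membership can be confirmed from any master (a sketch; the pod name etcd-k8s1 and the certificate paths are the kubeadm defaults and may differ in your cluster):
kubectl -n kube-system exec etcd-k8s1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list
# expect one member per control-plane node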
Apiserver High Availability
Software overview
Keepalived
Keepalived is a high-availability tool based on the VRRP protocol. There is one master server and one or more backup servers; the same service configuration is deployed on all of them and a single VIP address is used to serve clients. When the master server fails, the VIP automatically floats to a backup server.
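To see which node currently holds the VIP, check the VRRP interface (a sketch; ens160 and 10.203.1.85 are the interface and VIP used later in this setup):
ip addr show ens160 | grep 10.203.1.85    # prints the VIP only on the node that currently owns it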
Nginx
A web server listening on port 80 by default; it is used as the backend service behind the VIP in the keepalived + nginx setup below.
HAProxy
A TCP/HTTP load balancer; in the keepalived + haproxy setup it forwards the VIP's frontend port to the kube-apiservers on the master nodes.
keepalived + nginx Architecture
(Architecture diagram omitted: the original image is unavailable.)
Nginx configuration
Install the package on all nodes:
apt install -y nginx
On every node, rewrite the default HTML file, which is the content nginx serves to clients. To make the test easy to follow, each of the 3 nodes should contain different content; here we write the node's hostname:
echo k8s1 > /var/www/html/index.nginx-debian.html
Check the nginx service:
root@k8s2:/etc/keepalived# curl 10.203.1.82:80
k8s1
Keepalived configuration
Install the package on all nodes:
apt install -y keepalived
Write the configuration file. The master node's configuration is shown here; on the backup nodes, change router_id, state and priority:
root@k8s1:~# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id k8s1    # should be unique within the network
}
vrrp_script chk_nginx {
    script "/etc/keepalived/nginx_check.sh"    # script that periodically checks whether nginx is running
    interval 2    # run the script every 2 seconds
    weight -5     # priority change caused by the check: a failure (non-zero exit code) lowers the priority by 5
    fall 2        # 2 consecutive failures are required before the check is considered failed; the priority (1-255) is then reduced by weight
    rise 1        # 1 successful check counts as recovery, without changing the priority
}
vrrp_instance VI_1 {
    # Role of this keepalived instance. The node specified here is not necessarily the one that
    # becomes MASTER; that is decided by priority at runtime. The other node is BACKUP.
    state MASTER
    interface ens160              # NIC used for VRRP traffic
    virtual_router_id 200         # virtual router ID (1-255); must be identical on master and backup
    # mcast_src_ip 192.168.79.191 #
    priority 100                  # priority: the higher the number, the higher the priority; MASTER must be higher than BACKUP
    nopreempt
    advert_int 1                  # interval in seconds between MASTER/BACKUP synchronization checks
    authentication {
        auth_type PASS
        auth_pass 2222
    }
    # Checks to track. Note: this block must not be placed immediately after the vrrp_script
    # block (a pitfall hit during testing), otherwise the nginx check will not take effect!
    track_script {
        chk_nginx    # references the script named in the vrrp_script block above; it is run
                     # periodically to adjust the priority and eventually trigger a failover
    }
    virtual_ipaddress {    # VRRP HA virtual address; for multiple VIPs, add one per line
        10.203.1.85
    }
}
On every node, create the nginx_check.sh script. It checks for an nginx process; if none is found it tries to start nginx once, and if that also fails it stops keepalived:
#!/bin/bash
counter=`ps -C nginx --no-heading|wc -l`
echo "$counter"
if [ "${counter}" = 0 ]; then
    /etc/init.d/nginx start
    sleep 2
    counter=`ps -C nginx --no-heading|wc -l`
    if [ "${counter}" = 0 ]; then
        /etc/init.d/keepalived stop
    fi
fi
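The script can be exercised by hand before wiring it into keepalived (a sketch):
bash /etc/keepalived/nginx_check.sh    # prints the nginx process count; restarts nginx if it is down,
                                       # and stops keepalived only if the restart also fails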
Make nginx_check.sh executable:
chmod +x /etc/keepalived/nginx_check.sh
Start the keepalived service:
systemctl daemon-reload
service keepalived start
Testing
Access 10.203.1.85:
root@k8smaster:~# curl 10.203.1.85
k8s1
Manually stop the nginx service on the master node:
/etc/init.d/nginx stop
Access 10.203.1.85 again. Because the nginx_check.sh script restarts the nginx service, the master is still k8s1:
root@k8smaster:~# curl 10.203.1.85
k8s1
Shut down the k8s1 node and access 10.203.1.85 again; the VIP has now migrated to the k8s2 node:
root@k8smaster:~# curl 10.203.1.85
k8s2
Power the k8s1 node back on and access 10.203.1.85 again; the VIP returns to the k8s1 node:
root@k8smaster:~# curl 10.203.1.85
k8s1
keepalived + haproxy Architecture
How it works
1. Master nodes receive commands through the apiserver.
2. haproxy has two relevant sections:
frontend:
bind :8443
backend
master1:apiserver
master2:apiserver
The goal is to forward the frontend port to the apiservers.
3. keepalived creates the VIP and provides failover.
4. As a result, kubectl commands can be sent to VIP:8443 and are forwarded to a working apiserver; as long as any master node survives, the whole cluster can be controlled. (A quick check of this endpoint is sketched after this list.)
5. Worker nodes are made highly available by Kubernetes itself: if a Deployment runs on a worker node and that node goes down, Kubernetes automatically re-runs the Deployment on another worker node.
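After the keepalived and haproxy configuration below is in place and the cluster has been initialized, the endpoint can be sanity-checked from any machine (a sketch; -k skips verification because the serving certificate is signed by the cluster CA, which is not in the system trust store, and /healthz is normally readable without authentication):
curl -k https://10.203.1.85:8443/healthz    # expect "ok" once an apiserver is reachable through haproxy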
Keepalived configuration
Install keepalived on all nodes:
apt install -y keepalived
Write the configuration file. The master node's configuration is shown here; on the backup nodes, change router_id, state and priority:
root@k8s1:~# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id k8s1    # should be unique within the network
}
vrrp_script chk_api {
    script "/etc/keepalived/check_apiserver.sh"    # script that periodically checks whether the apiserver is healthy
    interval 2    # run the script every 2 seconds
    weight -5     # priority change caused by the check: a failure (non-zero exit code) lowers the priority by 5
    fall 2        # 2 consecutive failures are required before the check is considered failed; the priority (1-255) is then reduced by weight
    rise 1        # 1 successful check counts as recovery, without changing the priority
}
vrrp_instance VI_1 {
    # Role of this keepalived instance. The node specified here is not necessarily the one that
    # becomes MASTER; that is decided by priority at runtime. The other node is BACKUP.
    state MASTER
    interface ens160              # NIC used for VRRP traffic
    virtual_router_id 200         # virtual router ID (1-255); must be identical on master and backup
    # mcast_src_ip 192.168.79.191 #
    priority 100                  # priority: the higher the number, the higher the priority; MASTER must be higher than BACKUP
    nopreempt
    advert_int 1                  # interval in seconds between MASTER/BACKUP synchronization checks
    authentication {
        auth_type PASS
        auth_pass 2222
    }
    # Checks to track. Note: this block must not be placed immediately after the vrrp_script
    # block (a pitfall hit during testing), otherwise the apiserver check will not take effect!
    track_script {
        chk_api    # references the script named in the vrrp_script block above; it is run
                   # periodically to adjust the priority and eventually trigger a failover
    }
    virtual_ipaddress {    # VRRP HA virtual address; for multiple VIPs, add one per line
        10.203.1.85
    }
}
On every node, create the check_apiserver.sh script. It probes the apiserver; if the apiserver is unreachable, the script exits with an error, the keepalived check fails, and the VIP floats to another node:
#!/bin/sh
errorExit() {
    echo "*** $*" 1>&2
    exit 1
}
curl --silent --max-time 2 --insecure https://localhost:6443/ -o /dev/null || errorExit "Error GET https://localhost:6443/"
if ip addr | grep -q 10.203.1.85; then
    curl --silent --max-time 2 --insecure https://10.203.1.85:6443/ -o /dev/null || errorExit "Error GET https://10.203.1.85:6443/"
fi
Make check_apiserver.sh executable:
chmod +x /etc/keepalived/check_apiserver.sh
Start the keepalived service:
systemctl daemon-reload
service keepalived start
HAProxy configuration
Install haproxy on all nodes:
apt install -y haproxy
Edit the configuration file /etc/haproxy/haproxy.cfg; all 3 nodes use the same configuration:
# /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
log /dev/log local0
log /dev/log local1 notice
daemon
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 1
timeout http-request 10s
timeout queue 20s
timeout connect 5s
timeout client 20s
timeout server 20s
timeout http-keep-alive 10s
timeout check 10s
#---------------------------------------------------------------------
# apiserver frontend which proxys to the masters
#---------------------------------------------------------------------
frontend apiserver
bind *:8443
mode tcp
option tcplog
default_backend apiserver
#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
option httpchk GET /healthz
http-check expect status 200
mode tcp
option ssl-hello-chk
balance roundrobin
server k8s1 10.203.1.82:6443 check
server k8s2 10.203.1.83:6443 check
server k8s3 10.203.1.84:6443 check
# [...]
Restart the service:
systemctl restart haproxy
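To confirm haproxy is listening on the frontend port and can reach an apiserver (a sketch; run the curl only after at least one control-plane node has been initialized):
ss -tlnp | grep 8443                       # haproxy should be bound to *:8443
curl -k https://localhost:8443/healthz     # goes through haproxy to a backend apiserver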
Deploy the Kubernetes HA Cluster
Install docker, kubeadm, kubectl and kubelet on all nodes
Docker
curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
Kubernetes components
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt update
apt install -y kubelet kubeadm kubectl
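Optionally hold the package versions so that an unattended apt upgrade does not move the cluster components later (a common practice, not part of the original steps):
apt-mark hold kubelet kubeadm kubectl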
Master Nodes
Initialize the first master node
Run the following on any one node; --control-plane-endpoint 10.203.1.85:8443 is the VIP plus the haproxy frontend port:
kubeadm init --image-repository registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --control-plane-endpoint 10.203.1.85:8443 --upload-certs
The output is as follows:
root@k8s1:~# kubeadm init --image-repository registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --control-plane-endpoint 10.203.1.85:8443 --upload-certs
[init] Using Kubernetes version: v1.20.5
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.5. Latest validated version: 19.03
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.203.1.82 10.203.1.85]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s1 localhost] and IPs [10.203.1.82 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s1 localhost] and IPs [10.203.1.82 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 79.519175 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
a30f2492d765a17e244ffc650f09ead393397f7f1d05efbe1c7525eb9c5f721b
[mark-control-plane] Marking the node k8s1 as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node k8s1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: p42ggz.1lc9jebaqoag8ca6
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join 10.203.1.85:8443 --token p42ggz.1lc9jebaqoag8ca6 \
--discovery-token-ca-cert-hash sha256:45297952d1b812be3c4ef88bf8060f5583e7a292e414f1eb82f0aa8bdcd71a3f \
--control-plane --certificate-key a30f2492d765a17e244ffc650f09ead393397f7f1d05efbe1c7525eb9c5f721b
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.203.1.85:8443 --token p42ggz.1lc9jebaqoag8ca6 \
--discovery-token-ca-cert-hash sha256:45297952d1b812be3c4ef88bf8060f5583e7a292e414f1eb82f0aa8bdcd71a3f
Run the following command on the other two master nodes:
kubeadm join 10.203.1.85:8443 --token p42ggz.1lc9jebaqoag8ca6 \
--discovery-token-ca-cert-hash sha256:45297952d1b812be3c4ef88bf8060f5583e7a292e414f1eb82f0aa8bdcd71a3f \
--control-plane --certificate-key a30f2492d765a17e244ffc650f09ead393397f7f1d05efbe1c7525eb9c5f721b
The output is as follows:
root@k8s2:~# kubeadm join 10.203.1.85:8443 --token p42ggz.1lc9jebaqoag8ca6 \
> --discovery-token-ca-cert-hash sha256:45297952d1b812be3c4ef88bf8060f5583e7a292e414f1eb82f0aa8bdcd71a3f \
> --control-plane --certificate-key a30f2492d765a17e244ffc650f09ead393397f7f1d05efbe1c7525eb9c5f721b
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.5. Latest validated version: 19.03
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s2 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.203.1.83 10.203.1.85]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s2 localhost] and IPs [10.203.1.83 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s2 localhost] and IPs [10.203.1.83 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[mark-control-plane] Marking the node k8s2 as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node k8s2 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
This node has joined the cluster and a new control plane instance was created:
* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.
To start administering your cluster from this node, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Run 'kubectl get nodes' to see this node join the cluster.
Worker Nodes
Run the following command on the two worker nodes:
kubeadm join 10.203.1.85:8443 --token p42ggz.1lc9jebaqoag8ca6 \
> --discovery-token-ca-cert-hash sha256:45297952d1b812be3c4ef88bf8060f5583e7a292e414f1eb82f0aa8bdcd71a3f
Output:
root@k8s4:~# kubeadm join 10.203.1.85:8443 --token p42ggz.1lc9jebaqoag8ca6 \
> --discovery-token-ca-cert-hash sha256:45297952d1b812be3c4ef88bf8060f5583e7a292e414f1eb82f0aa8bdcd71a3f
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.5. Latest validated version: 19.03
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Configure the kubectl tool
Run the following on a master node:
mkdir -p /root/.kube && \
cp /etc/kubernetes/admin.conf /root/.kube/config
If another machine on the LAN has kubectl installed, it can be set up to manage this cluster as follows:
mkdir -p /root/.kube
Create the config file by copying the contents of /etc/kubernetes/admin.conf from a Kubernetes master node into it.
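For example, with SSH access to a master node (a sketch; 10.203.1.82 is k8s1's address in this setup):
mkdir -p /root/.kube
scp root@10.203.1.82:/etc/kubernetes/admin.conf /root/.kube/config
kubectl get nodes    # should now list the cluster nodes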
Deploy the flannel network
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
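The network pods should reach the Running state on every node before workloads are scheduled (a sketch):
kubectl get pods -n kube-system -o wide | grep flannel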
Fix the ComponentStatus
On every master node, edit the controller-manager and scheduler manifests and comment out the default --port=0 line so that both services report healthy, as shown below (a scripted alternative is sketched after the manifests):
vi /etc/kubernetes/manifests/kube-controller-manager.yaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
component: kube-controller-manager
tier: control-plane
name: kube-controller-manager
namespace: kube-system
spec:
containers:
- command:
- kube-controller-manager
- --allocate-node-cidrs=true
- --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
- --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
- --bind-address=127.0.0.1
- --client-ca-file=/etc/kubernetes/pki/ca.crt
- --cluster-cidr=10.244.0.0/16
- --cluster-name=kubernetes
- --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
- --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
- --controllers=*,bootstrapsigner,tokencleaner
- --kubeconfig=/etc/kubernetes/controller-manager.conf
- --leader-elect=true
# - --port=0
- --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
- --root-ca-file=/etc/kubernetes/pki/ca.crt
- --service-account-private-key-file=/etc/kubernetes/pki/sa.key
- --service-cluster-ip-range=10.96.0.0/12
- --use-service-account-credentials=true
image: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.20.5
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
vi /etc/kubernetes/manifests/kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
component: kube-scheduler
tier: control-plane
name: kube-scheduler
namespace: kube-system
spec:
containers:
- command:
- kube-scheduler
- --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
- --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
- --bind-address=127.0.0.1
- --kubeconfig=/etc/kubernetes/scheduler.conf
- --leader-elect=true
#- --port=0
image: registry.aliyuncs.com/google_containers/kube-scheduler:v1.20.5
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
path: /healthz
port: 10259
scheme: HTTPS
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 15
name: kube-scheduler
resources:
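Equivalently, the change can be scripted on each master; the kubelet picks up edits to static pod manifests automatically (a sketch):
sed -i 's/- --port=0/# - --port=0/' \
  /etc/kubernetes/manifests/kube-controller-manager.yaml \
  /etc/kubernetes/manifests/kube-scheduler.yaml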
Check the status
Nodes
root@k8s1:~# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s1 Ready control-plane,master 4d19h v1.20.5
k8s2 Ready control-plane,master 4d19h v1.20.5
k8s3 Ready control-plane,master 4d19h v1.20.5
k8s4 Ready <none> 4d2h v1.20.5
k8s5 Ready <none> 4d2h v1.20.5
ComponentStatus
root@k8s1:~# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true"}
Testing
Master HA test
Test steps
Create a resource on one master node
Any resource will do, for example a Deployment.
Shut down the node
Check the created resource with kubectl get; once it has been created successfully, shut down that master node.
Check the status
kubectl get node now shows one master node in the NotReady state.
The resource created earlier is still in a normal state and can still be deleted, edited, and so on.
Restart the node
A short while after the node is restarted, kubectl get node shows it back in the Ready state. (A minimal command sequence for this test is sketched below.)
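A minimal run of these steps (a sketch; test-nginx is a hypothetical Deployment name):
kubectl create deployment test-nginx --image=nginx
kubectl get deployment test-nginx    # confirm it was created
# shut down the master node you were working on, then from another master:
kubectl get nodes                    # the stopped node shows NotReady
kubectl get deployment test-nginx    # the resource can still be viewed, edited or deleted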
Resource HA test
Test steps
Create a Deployment resource
Check where the resource runs
Use kubectl get all -o wide to see which worker node the pods run on.
Shut down the node
Shut down the worker node the pods run on; the pods are moved to another worker node.
Restart the node
After the node is restarted, the pods do not fail back; the node simply returns to the Ready state.
Kubernetes HA Cluster & LINSTOR Test
Architecture
(Architecture diagram omitted: the original image is unavailable.)
Notes
LINSTOR deployment
Install DRBD + LINSTOR on the worker nodes of the Kubernetes HA cluster built above and join them to the LINSTOR cluster; this test adds one extra LINSTOR diskful node. Then create the LINSTOR CSI services in the Kubernetes cluster and create a storage class.
Test plan
- Create a persistent volume claim of the LINSTOR type
- Create a Deployment that uses this PVC
- Check which worker node the pod runs on, shut down that node, and check whether the pod moves to another worker node and whether the data is preserved
Deploy the LINSTOR CSI Service
Install the software
Run on both worker nodes:
apt install software-properties-common
add-apt-repository ppa:linbit/linbit-drbd9-stack
apt update
apt install drbd-utils drbd-dkms lvm2
modprobe drbd
echo drbd > /etc/modules-load.d/drbd.conf
apt install linstor-controller linstor-satellite linstor-client
# What the commands do:
# install software-properties-common, which is required for the add-apt-repository command
# add the DRBD9 PPA
# update the apt sources
# install DRBD9 and related packages
# load the DRBD9 module
# load DRBD9 at boot
# install the LINSTOR packages
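To confirm the kernel module built by drbd-dkms is actually loaded (a sketch):
lsmod | grep drbd
modinfo drbd | grep -w version    # should report a 9.x version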
Join the worker nodes to the LINSTOR cluster
root@k8s5:~# linstor n c k8s5 10.203.1.96
SUCCESS:
Description:
New node 'k8s5' registered.
Details:
Node 'k8s5' UUID is: 20d41b64-f6b4-4712-88df-c151c8f00e37
SUCCESS:
Description:
Node 'k8s5' authenticated
Details:
Supported storage providers: [diskless, lvm, lvm_thin, file, file_thin, openflex_target]
Supported resource layers : [drbd, luks, cache, storage]
Unsupported storage providers:
ZFS: 'cat /sys/module/zfs/version' returned with exit code 1
ZFS_THIN: 'cat /sys/module/zfs/version' returned with exit code 1
SPDK: IO exception occured when running 'rpc.py get_spdk_version': Cannot run program "rpc.py": error=2, No such file or directory
Unsupported resource layers:
NVME: IO exception occured when running 'nvme version': Cannot run program "nvme": error=2, No such file or directory
WRITECACHE: 'modprobe dm-writecache' returned with exit code 1
OPENFLEX: IO exception occured when running 'nvme version': Cannot run program "nvme": error=2, No such file or directory
root@k8s5:~# linstor n c k8s4 10.203.1.95
SUCCESS:
Description:
New node 'k8s4' registered.
Details:
Node 'k8s4' UUID is: db78f129-a23d-4245-a744-534a5365925e
SUCCESS:
Description:
Node 'k8s4' authenticated
Details:
Supported storage providers: [diskless, lvm, lvm_thin, file, file_thin, openflex_target]
Supported resource layers : [drbd, luks, cache, storage]
Unsupported storage providers:
ZFS: 'cat /sys/module/zfs/version' returned with exit code 1
ZFS_THIN: 'cat /sys/module/zfs/version' returned with exit code 1
SPDK: IO exception occured when running 'rpc.py get_spdk_version': Cannot run program "rpc.py": error=2, No such file or directory
Unsupported resource layers:
NVME: IO exception occured when running 'nvme version': Cannot run program "nvme": error=2, No such file or directory
WRITECACHE: 'modprobe dm-writecache' returned with exit code 1
OPENFLEX: IO exception occured when running 'nvme version': Cannot run program "nvme": error=2, No such file or directory
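After both nodes are registered, the cluster members and the storage pools referenced later (e.g. poola on the diskful node) can be listed (a sketch):
linstor node list
linstor storage-pool list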
Deploy the LINSTOR CSI service
Apply the following yaml file (change LINSTOR_IP in the yaml to the actual LINSTOR controller IP):
---
kind: StatefulSet
apiVersion: apps/v1
metadata:
name: linstor-csi-controller
namespace: kube-system
spec:
serviceName: "linstor-csi"
replicas: 1
selector:
matchLabels:
app: linstor-csi-controller
role: linstor-csi
template:
metadata:
labels:
app: linstor-csi-controller
role: linstor-csi
spec:
priorityClassName: system-cluster-critical
serviceAccount: linstor-csi-controller-sa
containers:
- name: csi-provisioner
image: teym88/csi-provisioner:v1.5.0
args:
- "--csi-address=$(ADDRESS)"
- "--v=5"
- "--feature-gates=Topology=true"
- "--timeout=120s"
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
imagePullPolicy: "Always"
volumeMounts:
- name: socket-dir
mountPath: /var/lib/csi/sockets/pluginproxy/
- name: csi-attacher
image: teym88/csi-attacher:v2.1.1
args:
- "--v=5"
- "--csi-address=$(ADDRESS)"
- "--timeout=120s"
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
imagePullPolicy: "Always"
volumeMounts:
- name: socket-dir
mountPath: /var/lib/csi/sockets/pluginproxy/
- name: csi-resizer
image: teym88/csi-resizer:v0.5.0
args:
- "--v=5"
- "--csi-address=$(ADDRESS)"
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
imagePullPolicy: "Always"
volumeMounts:
- mountPath: /var/lib/csi/sockets/pluginproxy/
name: socket-dir
- name: csi-snapshotter
image: teym88/csi-snapshotter:v2.0.1
args:
- "-csi-address=$(ADDRESS)"
- "-timeout=120s"
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
imagePullPolicy: Always
volumeMounts:
- name: socket-dir
mountPath: /var/lib/csi/sockets/pluginproxy/
- name: linstor-csi-plugin
image: teym88/piraeus-csi:v0.11.0
args:
- "--csi-endpoint=$(CSI_ENDPOINT)"
- "--node=$(KUBE_NODE_NAME)"
- "--linstor-endpoint=$(LINSTOR_IP)"
- "--log-level=debug"
env:
- name: CSI_ENDPOINT
value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
- name: KUBE_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: LINSTOR_IP
value: "http://10.203.1.81:3370"
imagePullPolicy: "Always"
volumeMounts:
- name: socket-dir
mountPath: /var/lib/csi/sockets/pluginproxy/
volumes:
- name: socket-dir
emptyDir: {}
---
kind: ServiceAccount
apiVersion: v1
metadata:
name: linstor-csi-controller-sa
namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: linstor-csi-provisioner-role
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["list", "watch", "create", "update", "patch"]
- apiGroups: ["snapshot.storage.k8s.io"]
resources: ["volumesnapshots"]
verbs: ["get", "list"]
- apiGroups: ["snapshot.storage.k8s.io"]
resources: ["volumesnapshotcontents"]
verbs: ["get", "list"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: linstor-csi-provisioner-binding
subjects:
- kind: ServiceAccount
name: linstor-csi-controller-sa
namespace: kube-system
roleRef:
kind: ClusterRole
name: linstor-csi-provisioner-role
apiGroup: rbac.authorization.k8s.io
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: linstor-csi-attacher-role
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["get", "list", "watch"]
- apiGroups: ["storage.k8s.io"]
resources: ["csinodes"]
verbs: ["get", "list", "watch"]
- apiGroups: ["storage.k8s.io"]
resources: ["volumeattachments"]
verbs: ["get", "list", "watch", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: linstor-csi-attacher-binding
subjects:
- kind: ServiceAccount
name: linstor-csi-controller-sa
namespace: kube-system
roleRef:
kind: ClusterRole
name: linstor-csi-attacher-role
apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: linstor-csi-resizer-role
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "patch"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["persistentvolumeclaims/status"]
verbs: ["patch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["list", "watch", "create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: linstor-csi-resizer-binding
subjects:
- kind: ServiceAccount
name: linstor-csi-controller-sa
namespace: kube-system
roleRef:
kind: ClusterRole
name: linstor-csi-resizer-role
apiGroup: rbac.authorization.k8s.io
---
kind: DaemonSet
apiVersion: apps/v1
metadata:
name: linstor-csi-node
namespace: kube-system
spec:
selector:
matchLabels:
app: linstor-csi-node
role: linstor-csi
template:
metadata:
labels:
app: linstor-csi-node
role: linstor-csi
spec:
priorityClassName: system-node-critical
serviceAccount: linstor-csi-node-sa
containers:
- name: csi-node-driver-registrar
image: teym88/csi-node-driver-registrar:v1.2.0
args:
- "--v=5"
- "--csi-address=$(ADDRESS)"
- "--kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)"
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "rm -rf /registration/linstor.csi.linbit.com /registration/linstor.csi.linbit.com-reg.sock"]
env:
- name: ADDRESS
value: /csi/csi.sock
- name: DRIVER_REG_SOCK_PATH
value: /var/lib/kubelet/plugins/linstor.csi.linbit.com/csi.sock
- name: KUBE_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
volumeMounts:
- name: plugin-dir
mountPath: /csi/
- name: registration-dir
mountPath: /registration/
- name: linstor-csi-plugin
image: teym88/piraeus-csi:v0.11.0
args:
- "--csi-endpoint=$(CSI_ENDPOINT)"
- "--node=$(KUBE_NODE_NAME)"
- "--linstor-endpoint=$(LINSTOR_IP)"
- "--log-level=debug"
env:
- name: CSI_ENDPOINT
value: unix:///csi/csi.sock
- name: KUBE_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: LINSTOR_IP
value: "http://10.203.1.81:3370"
imagePullPolicy: "Always"
securityContext:
privileged: true
capabilities:
add: ["SYS_ADMIN"]
allowPrivilegeEscalation: true
volumeMounts:
- name: plugin-dir
mountPath: /csi
- name: pods-mount-dir
mountPath: /var/lib/kubelet
mountPropagation: "Bidirectional"
- name: device-dir
mountPath: /dev
volumes:
- name: registration-dir
hostPath:
path: /var/lib/kubelet/plugins_registry/
type: DirectoryOrCreate
- name: plugin-dir
hostPath:
path: /var/lib/kubelet/plugins/linstor.csi.linbit.com/
type: DirectoryOrCreate
- name: pods-mount-dir
hostPath:
path: /var/lib/kubelet
type: Directory
- name: device-dir
hostPath:
path: /dev
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: linstor-csi-node-sa
namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: linstor-csi-driver-registrar-role
namespace: kube-system
rules:
- apiGroups: [""]
resources: ["events"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
name: linstor.csi.linbit.com
spec:
attachRequired: true
podInfoOnMount: true
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: linstor-csi-driver-registrar-binding
subjects:
- kind: ServiceAccount
name: linstor-csi-node-sa
namespace: kube-system
roleRef:
kind: ClusterRole
name: linstor-csi-driver-registrar-role
apiGroup: rbac.authorization.k8s.io
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: linstor-csi-snapshotter-role
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["list", "watch", "create", "update", "patch"]
- apiGroups: ["snapshot.storage.k8s.io"]
resources: ["volumesnapshotclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: ["snapshot.storage.k8s.io"]
resources: ["volumesnapshotcontents"]
verbs: ["create", "get", "list", "watch", "update", "delete"]
- apiGroups: ["snapshot.storage.k8s.io"]
resources: ["volumesnapshotcontents/status"]
verbs: ["update"]
- apiGroups: ["snapshot.storage.k8s.io"]
resources: ["volumesnapshots"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["apiextensions.k8s.io"]
resources: ["customresourcedefinitions"]
verbs: ["create", "list", "watch", "delete"]
- apiGroups: ["snapshot.storage.k8s.io"]
resources: ["volumesnapshots/status"]
verbs: ["update"]
Check the service status
Run the following command:
kubectl get all -A | grep linstor
If the status looks like the following, the LINSTOR CSI service has been deployed successfully:
root@ubuntu:~/k8sYaml/linstor# kubectl get all -A | grep linstor
kube-system pod/linstor-csi-controller-0 5/5 Running 0 179m
kube-system pod/linstor-csi-node-6gl45 2/2 Running 0 179m
kube-system pod/linstor-csi-node-6s969 2/2 Running 0 179m
kube-system daemonset.apps/linstor-csi-node 2 2 2 2 2 <none> 179m
kube-system statefulset.apps/linstor-csi-controller 1/1 179m
Test
Create the LINSTOR storage class
Apply the following yaml file (if there are two diskful nodes, autoPlace can be set to 2, and so on; storagePool is the name of the storage pool already created on the diskful node):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: linstor
provisioner: linstor.csi.linbit.com
parameters:
autoPlace: "1"
storagePool: "poola"
After applying, check the SC status:
root@ubuntu:~/k8sYaml/linstor# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
linstor linstor.csi.linbit.com Delete Immediate false 9s
Create a persistent volume claim
Apply the following yaml file:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: fs-pvc5g
spec:
storageClassName: linstor
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
Check the PVC and PV:
root@ubuntu:~/k8sYaml/linstor# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
fs-pvc5g Bound pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811 5Gi RWO linstor 56m
root@ubuntu:~/k8sYaml/linstor# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811 5Gi RWO Delete Bound default/fs-pvc5g linstor 56m
root@ubuntu:~/k8sYaml/linstor#
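On the LINSTOR side, the new PV shows up as a DRBD resource named after the PV (a sketch):
linstor resource list | grep pvc-64c11eac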
Create a Deployment that uses this PVC
Apply the following yaml file; it uses nginx as the image and names the Deployment ng1:
apiVersion: apps/v1
kind: Deployment
metadata:
name: ng1
spec:
replicas: 2
strategy:
type: Recreate
selector:
matchLabels:
run: ng1
template:
metadata:
labels:
run: ng1
spec:
containers:
- name: ng1
image: nginx
ports:
- containerPort: 80
volumeMounts:
- mountPath: /usr/share/nginx/html
name: linstor-volume
volumes:
- name: linstor-volume
persistentVolumeClaim:
claimName: fs-pvc5g
Check the status:
root@ubuntu:~/k8sYaml/linstor# kubectl get all -A -o wide | grep ng1
default pod/ng1-84794695b7-2pg4v 1/1 Running 0 58m 10.244.3.17 k8s4 <none> <none>
default pod/ng1-84794695b7-pt4xx 1/1 Running 0 58m 10.244.3.16 k8s4 <none> <none>
default service/ng1 NodePort 10.110.200.0 <none> 80:31075/TCP 57m run=ng1
default deployment.apps/ng1 2/2 2 2 58m ng1 nginx run=ng1
default replicaset.apps/ng1-84794695b7 2 2 2 58m ng1 nginx pod-template-hash=84794695b7,run=ng1
The pods are in the Running state and run on the k8s4 node.
Check the DRBD resource status on k8s4:
root@k8s4:~# drbdadm status
pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811 role:Primary
disk:Diskless
ubuntu role:Secondary
peer-disk:UpToDate
Check where this volume is mounted on the host:
root@k8s4:~# df -h | grep pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811
/dev/drbd1007 4.9G 21M 4.6G 1% /var/lib/kubelet/pods/bc325e27-06f0-4ab6-895f-e8b66e19aa2d/volumes/kubernetes.io~csi/pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811/mount
Go to this path and add an index.html file, which the nginx service will serve; its content is "File from drbd res":
root@k8s4:~# cat /var/lib/kubelet/pods/bc325e27-06f0-4ab6-895f-e8b66e19aa2d/volumes/kubernetes.io~csi/pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811/mount/index.html
File from drbd res
Expose this Deployment as a service:
root@ubuntu:~/k8sYaml/linstor# kubectl expose deployment ng1 --port=80 --type=NodePort
service/ng1 exposed
root@ubuntu:~/k8sYaml/linstor# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5d19h
loadbalancer-service LoadBalancer 10.100.64.28 <pending> 80:32416/TCP 3d23h
ng1 NodePort 10.110.200.0 <none> 80:31075/TCP 4s
Access the service from another machine on the LAN:
root@k8smaster:~# curl 10.203.1.85:31075
File from drbd res
k8s4 node failover test
The pod is currently running on k8s4, so shut that node down and observe whether the pod moves to the k8s5 node with the data intact.
Shut down k8s4:
root@k8s4:~# shutdown now
Observations
The service is interrupted for a while, which is expected: the DRBD resource is not in dual-primary mode, and it takes some time to recreate the pod on the k8s5 node.
root@k8smaster:~# curl 10.203.1.85:31075
curl: (7) Failed to connect to 10.203.1.85 port 31075: Connection refused
The pods on the worker nodes change as the Kubernetes cluster tries to create new pods on the k8s5 node:
root@ubuntu:~/k8sYaml/linstor# kubectl get pod -A -o wide | grep ng1
default ng1-84794695b7-2pg4v 1/1 Terminating 0 79m 10.244.3.17 k8s4 <none> <none>
default ng1-84794695b7-hg72j 0/1 ContainerCreating 0 2m5s <none> k8s5 <none> <none>
default ng1-84794695b7-mghwl 0/1 ContainerCreating 0 2m5s <none> k8s5 <none> <none>
default ng1-84794695b7-pt4xx 1/1 Terminating 0 79m 10.244.3.16 k8s4 <none> <none>
After watching for a while, the pods on the k8s5 node remain stuck in the creating state; check the details:
root@ubuntu:~/k8sYaml/linstor# kubectl describe pv pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811
Name: pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: linstor.csi.linbit.com
Finalizers: [kubernetes.io/pv-protection external-attacher/linstor-csi-linbit-com]
StorageClass: linstor
Status: Bound
Claim: default/fs-pvc5g
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 5Gi
Node Affinity:
Required Terms:
Term 0: linbit.com/hostname in [ubuntu]
Term 1: linbit.com/sp-DfltDisklessStorPool in [true]
Message:
Source:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: linstor.csi.linbit.com
FSType: ext4
VolumeHandle: pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811
ReadOnly: false
VolumeAttributes: storage.kubernetes.io/csiProvisionerIdentity=1617072579350-8081-linstor.csi.linbit.com
Events: <none>
root@ubuntu:~/k8sYaml/linstor# kubectl describe pod ng1-84794695b7-hg72j
Name: ng1-84794695b7-hg72j
Namespace: default
Priority: 0
Node: k8s5/10.203.1.96
Start Time: Tue, 30 Mar 2021 14:36:59 +0800
Labels: pod-template-hash=84794695b7
run=ng1
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/ng1-84794695b7
Containers:
ng1:
Container ID:
Image: nginx
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/usr/share/nginx/html from linstor-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qbmdj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
linstor-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: fs-pvc5g
ReadOnly: false
default-token-qbmdj:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qbmdj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned default/ng1-84794695b7-hg72j to k8s5
Warning FailedAttachVolume 14m attachdetach-controller Multi-Attach error for volume "pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811" Volume is already used by pod(s) ng1-84794695b7-pt4xx, ng1-84794695b7-2pg4v
Warning FailedMount 5m33s kubelet Unable to attach or mount volumes: unmounted volumes=[linstor-volume], unattached volumes=[default-token-qbmdj linstor-volume]: timed out waiting for the condition
Warning FailedMount 63s (x5 over 12m) kubelet Unable to attach or mount volumes: unmounted volumes=[linstor-volume], unattached volumes=[linstor-volume default-token-qbmdj]: timed out waiting for the condition
The error below shows that the volume is reported as still in use. Because the k8s4 node is shut down, its pods cannot be fully deleted, so the new pods stay stuck in this state:
Warning FailedAttachVolume 14m attachdetach-controller Multi-Attach error for volume "pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811" Volume is already used by pod(s) ng1-84794695b7-pt4xx, ng1-84794695b7-2pg4v
k8s4 node failback test
Start the k8s4 node
Observations
Now that k8s4 has been restarted, the old pods can be deleted normally, so the pods on k8s5 move from the creating state to Running:
root@ubuntu:~/k8sYaml/linstor# kubectl get pod -A -o wide | grep ng1
default ng1-84794695b7-hg72j 1/1 Running 0 23m 10.244.4.28 k8s5 <none> <none>
default ng1-84794695b7-mghwl 1/1 Running 0 23m 10.244.4.27 k8s5 <none> <none>
The nginx service recovers and the content is unchanged:
root@k8smaster:~# curl 10.203.1.85:31075
File from drbd res
Problem
With the k8s4 node failed over, the pod cannot reach the Running state on the k8s5 node and the service is unavailable.
Retest
Reason
Because the previous test left the service unavailable, we need to find out whether there is a way for the service to recover automatically after the interruption.
Check the PVC state from the previous test
The yaml file:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: fs-pvc5g
spec:
storageClassName: linstor
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
Describe
root@ubuntu:~/k8sYaml/linstor# kubectl describe pvc fs-pvc5g
Name: fs-pvc5g
Namespace: default
StorageClass: linstor
Status: Bound
Volume: pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: linstor.csi.linbit.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 5Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: ng1-84794695b7-hg72j
ng1-84794695b7-mghwl
Events: <none>
The PVC has an accessModes attribute with the value ReadWriteOnce, which may be related to this behavior.
Look up PVC accessModes
The documented access modes are as follows (a quick way to inspect a claim's mode is sketched after the list):
1 ReadWriteOnce - the volume can be mounted read-write by a single node
2 ReadOnlyMany - the volume can be mounted read-only by many nodes
3 ReadWriteMany - the volume can be mounted read-write by many nodes
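The mode of an existing claim can be read directly with kubectl custom columns (a sketch):
kubectl get pvc -o custom-columns=NAME:.metadata.name,MODES:.spec.accessModes,SC:.spec.storageClassName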
Create a new PVC with accessModes set to ReadWriteMany and test again.
Create the PVC
Apply the following yaml file:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: fs-pvc1g
spec:
storageClassName: linstor
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
Create a Deployment that uses this PVC
Apply the following yaml file; it again uses the nginx image and names the Deployment ng2:
apiVersion: apps/v1
kind: Deployment
metadata:
name: ng2
spec:
replicas: 2
strategy:
type: Recreate
selector:
matchLabels:
run: ng2
template:
metadata:
labels:
run: ng2
spec:
containers:
- name: ng2
image: nginx
ports:
- containerPort: 80
volumeMounts:
- mountPath: /usr/share/nginx/html
name: linstor-volume
volumes:
- name: linstor-volume
persistentVolumeClaim:
claimName: fs-pvc1g
Check the status:
root@ubuntu:~/k8sYaml/linstor# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
frontend 1/1 Running 0 28h 10.244.4.10 k8s5 <none> <none>
ng1-84794695b7-hg72j 1/1 Running 0 32m 10.244.4.28 k8s5 <none> <none>
ng1-84794695b7-mghwl 1/1 Running 0 32m 10.244.4.27 k8s5 <none> <none>
ng2-56fb7f7bdf-p7lwh 1/1 Running 0 35s 10.244.3.19 k8s4 <none> <none>
ng2-56fb7f7bdf-zln7l 1/1 Running 0 35s 10.244.3.20 k8s4 <none> <none>
nginx-deployment-59586cc59f-k69xv 1/1 Running 0 4d4h 10.244.4.7 k8s5 <none> <none>
nginx-deployment-59586cc59f-nptx9 1/1 Running 0 4d4h 10.244.4.6 k8s5 <none> <none>
nginx-deployment-59586cc59f-tpkfc 1/1 Running 0 4d4h 10.244.4.8 k8s5 <none> <none>
The ng2 pods are running on the k8s4 node.
Expose the service:
root@ubuntu:~/k8sYaml/linstor# kubectl expose deployment ng2 --port=80 --type=NodePort
service/ng2 exposed
root@ubuntu:~/k8sYaml/linstor# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5d21h
loadbalancer-service LoadBalancer 10.100.64.28 <pending> 80:32416/TCP 4d1h
ng1 NodePort 10.110.200.0 <none> 80:31075/TCP 109m
ng2 NodePort 10.109.98.75 <none> 80:31012/TCP 4s
ng2-service NodePort 10.106.127.4 <none> 80:30001/TCP 4d1h
On the k8s4 node, add an index.html file to the volume:
root@k8s4:~# cd /var/lib/kubelet/pods/a9539bf6-41b9-4d3c-827d-54b30135de5d/volumes/kubernetes.io~csi/pvc-b9a41c03-d9eb-43c8-942b-a0dabba62e69/mount
root@k8s4:/var/lib/kubelet/pods/a9539bf6-41b9-4d3c-827d-54b30135de5d/volumes/kubernetes.io~csi/pvc-b9a41c03-d9eb-43c8-942b-a0dabba62e69/mount# vi index.html
With the following content:
file from rwx drbd res
Access the service from another node:
root@k8smaster:~# curl 10.203.1.85:31012
file from rwx drbd res
Shut down the k8s4 node
Check the status
The service is interrupted:
root@k8smaster:~# curl 10.203.1.85:31012
curl: (7) Failed to connect to 10.203.1.85 port 31012: Connection refused
After waiting a while, although the pods on k8s4 are again not fully deleted, this time two new pods reach the Running state on the k8s5 node:
root@ubuntu:~/k8sYaml/linstor# kubectl get pod -o wide | grep ng2
ng2-56fb7f7bdf-75wz7 1/1 Running 0 21m 10.244.4.30 k8s5 <none> <none>
ng2-56fb7f7bdf-lk6n6 1/1 Running 0 21m 10.244.4.29 k8s5 <none> <none>
ng2-56fb7f7bdf-p7lwh 1/1 Terminating 0 34m 10.244.3.19 k8s4 <none> <none>
ng2-56fb7f7bdf-zln7l 1/1 Terminating 0 34m 10.244.3.20 k8s4 <none> <none>
Access the service again; it has recovered:
root@k8smaster:~# curl 10.203.1.85:31012
file from rwx drbd res
Start the k8s4 node
The pods on the k8s4 node are deleted normally; the test passes.
root@ubuntu:~/k8sYaml/linstor# kubectl get pod -o wide | grep ng2
ng2-56fb7f7bdf-75wz7 1/1 Running 0 25m 10.244.4.30 k8s5 <none> <none>
ng2-56fb7f7bdf-lk6n6 1/1 Running 0 25m 10.244.4.29 k8s5 <none> <none>
ng2-56fb7f7bdf-p7lwh 0/1 Terminating 0 38m <none> k8s4 <none> <none>
ng2-56fb7f7bdf-zln7l 0/1 Terminating 0 38m <none> k8s4 <none> <none>
root@ubuntu:~/k8sYaml/linstor# kubectl get pod -o wide | grep ng2
ng2-56fb7f7bdf-75wz7 1/1 Running 0 25m 10.244.4.30 k8s5 <none> <none>
ng2-56fb7f7bdf-lk6n6 1/1 Running 0 25m 10.244.4.29 k8s5 <none> <none>