k8s HA Cluster


Reference

K8s HA Cluster Architecture and Deployment


Architecture Highlights

Cluster Datastore

Use an internal (stacked) etcd cluster

Apiserver High Availability

Expose the apiserver to the worker nodes through a VIP
The VIP can be implemented with keepalived + haproxy or keepalived + nginx

This test uses a keepalived + haproxy setup

Implementation

Deploy the etcd cluster

Use an internal (stacked) etcd cluster

That is, every k8s master node runs its own etcd service and together they form a cluster. There is no need to install and deploy an etcd cluster manually: etcd runs as pods, and Kubernetes (kubeadm) automatically assembles them into a cluster
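
Once more than one control-plane node has joined, the stacked etcd members can be listed from any master. A minimal sketch, assuming the kubeadm defaults for the etcd pod name (etcd-k8s1) and certificate paths:

kubectl -n kube-system get pods -l component=etcd -o wide
# pod name follows the kubeadm convention etcd-<node-name>; adjust for your node
kubectl -n kube-system exec etcd-k8s1 -- etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member list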

Apiserver High Availability

Software Overview

keepalived

Keepalived is a high-availability tool based on the VRRP protocol. A Keepalived setup has one master server and one or more backup servers running the same service configuration and serving clients through a single VIP address; when the master server fails, the VIP automatically floats to a backup server
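
To see which node currently holds the VIP, check the VRRP interface on each node (ens160 and 10.203.1.85 as used in this test):

ip addr show ens160 | grep 10.203.1.85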

Nginx

A web service that listens on port 80 by default

Haproxy

Configure the Services

keepalived + nginx architecture


Nginx Configuration

Install the package on all nodes

apt install -y nginx

Rewrite the html file on all nodes. This html file is the content nginx serves to clients; to make the test easy to follow, the three nodes should each be given different content, and here the node hostname is written

echo k8s1 > /var/www/html/index.nginx-debian.html 

Check the nginx service

root@k8s2:/etc/keepalived# curl 10.203.1.82:80 
k8s1

Keepalived Configuration

Install the package on all nodes

apt install -y keepalived

Write the configuration file. The Master node configuration is shown here; on the backup nodes, router_id, state and priority should be adjusted

root@k8s1:~# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
 
global_defs {
   router_id k8s1   # should be unique within a network
}
 
vrrp_script chk_nginx {
    script "/etc/keepalived/nginx_check.sh" # script that periodically checks whether nginx is running
    interval 2   # run the script every 2 seconds
    weight -5    # priority change caused by the check result: on failure (non-zero exit) the priority drops by 5
    fall 2       # only after 2 consecutive failures is the check considered failed; the priority is then lowered by weight (1-255)
    rise 1       # 1 successful check counts as recovered, but the priority is not changed
}
 
 
 
vrrp_instance VI_1 {
    # role of this keepalived instance; the node set here is not necessarily the final MASTER,
    # the actual role is adjusted by priority; the other node is BACKUP
    state MASTER
    interface ens160        # network interface used for VRRP traffic
    virtual_router_id 200  # virtual router id (1-255), must be identical on master and backup
    # mcast_src_ip 192.168.79.191  #
    priority 100  # priority: the higher the number, the higher the priority; MASTER must be higher than BACKUP
    nopreempt
    advert_int 1   # interval, in seconds, between synchronization checks (VRRP advertisements) between MASTER and BACKUP
    authentication {
        auth_type PASS
        auth_pass 2222
    }
    # checks to track. Note: this block must not be placed immediately after the vrrp_script block
    # (a pitfall hit during testing), otherwise the nginx check does not take effect!
    track_script {
        chk_nginx    # references the VRRP script, i.e. the name defined in the vrrp_script section.
                     # it is run periodically to adjust the priority and eventually trigger a failover.
    }
 
    virtual_ipaddress { # VRRP HA virtual address; if there are multiple VIPs, add one per line
        10.203.1.85
    }
}

Write the nginx_check.sh script on all nodes. The script checks for the nginx process; if the process is not there, it tries to start nginx once, and if that fails it kills keepalived

#!/bin/bash
counter=`ps -C nginx --no-heading|wc -l`
echo "$counter"
if [ "${counter}" = 0 ]; then
    /etc/init.d/nginx start
    sleep 2
    counter=`ps -C nginx --no-heading|wc -l`
    if [ "${counter}" = 0 ]; then
        /etc/init.d/keepalived stop
    fi
fi

Make the nginx_check.sh script executable

chmod +x /etc/keepalived/nginx_check.sh

Start the keepalived service

systemctl daemon-reload
service keepalived start

Test

Access 10.203.1.85

root@k8smaster:~# curl 10.203.1.85      
k8s1

Manually stop the nginx service on the Master node

/etc/init.d/nginx stop

Access 10.203.1.85 again. Because the nginx_check.sh script restarts the nginx service, the master is still k8s1

root@k8smaster:~# curl 10.203.1.85      
k8s1

Shut down the k8s1 node and access 10.203.1.85 again; the VIP has migrated to the k8s2 node

root@k8smaster:~# curl 10.203.1.85      
k8s2

Start the k8s1 node and access 10.203.1.85 again; the VIP returns to the k8s1 node

root@k8smaster:~# curl 10.203.1.85      
k8s1

keepalived + haproxy architecture

Workflow

1. The master nodes receive commands through the apiserver.

2. haproxy has two relevant sections:
frontend:
   bind :8080
backend
    master1:apiserver
    master2:apiserver
The goal is to forward traffic arriving on the frontend port to the apiservers (this test uses frontend port 8443).

3. keepalived creates the VIP and provides failover.

4. As a result, kubectl commands can be sent to VIP:frontend-port (10.203.1.85:8443 in this test) and are forwarded to an apiserver, so the whole cluster stays manageable as long as any master node is alive.

5. Worker nodes are made highly available by the Kubernetes cluster itself: for example, if a Deployment runs on a worker node and that node goes down, Kubernetes automatically runs the Deployment's pods on another worker node.
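
Once keepalived and haproxy are in place (configured in the following sections) and the cluster has been initialized, a quick sanity check that the VIP endpoint reaches an apiserver (kubeadm allows anonymous access to /healthz by default; -k skips certificate verification):

curl -k https://10.203.1.85:8443/healthz
# expected output: ok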

Keepalived Configuration

Install keepalived on all nodes

apt install -y keepalived

Write the configuration file. The Master node configuration is shown here; on the backup nodes, router_id, state and priority should be adjusted

root@k8s1:~# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
 
global_defs {
   router_id k8s1   # should be unique within a network
}
 
vrrp_script chk_api {
    script "/etc/keepalived/check_apiserver.sh" # script that periodically checks whether the apiserver is running
    interval 2   # run the script every 2 seconds
    weight -5    # priority change caused by the check result: on failure (non-zero exit) the priority drops by 5
    fall 2       # only after 2 consecutive failures is the check considered failed; the priority is then lowered by weight (1-255)
    rise 1       # 1 successful check counts as recovered, but the priority is not changed
}
 
 
 
vrrp_instance VI_1 {
    # role of this keepalived instance; the node set here is not necessarily the final MASTER,
    # the actual role is adjusted by priority; the other node is BACKUP
    state MASTER
    interface ens160        # network interface used for VRRP traffic
    virtual_router_id 200  # virtual router id (1-255), must be identical on master and backup
    # mcast_src_ip 192.168.79.191  #
    priority 100  # priority: the higher the number, the higher the priority; MASTER must be higher than BACKUP
    nopreempt
    advert_int 1   # interval, in seconds, between synchronization checks (VRRP advertisements) between MASTER and BACKUP
    authentication {
        auth_type PASS
        auth_pass 2222
    }
    # checks to track. Note: this block must not be placed immediately after the vrrp_script block
    # (a pitfall hit during testing), otherwise the apiserver check does not take effect!
    track_script {
        chk_api    # references the VRRP script, i.e. the name defined in the vrrp_script section.
                   # it is run periodically to adjust the priority and eventually trigger a failover.
    }
 
    virtual_ipaddress { # VRRP HA virtual address; if there are multiple VIPs, add one per line
        10.203.1.85
    }
}

Write the check_apiserver.sh script on all nodes. The script checks the apiserver; if the apiserver is not reachable, it kills keepalived so that the VIP floats to another node

#!/bin/sh

errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

curl --silent --max-time 2 --insecure https://localhost:6443/ -o /dev/null || errorExit "Error GET https://localhost:6443/"
if ip addr | grep -q 10.203.1.85; then
    curl --silent --max-time 2 --insecure https://10.203.1.85:6443/ -o /dev/null || errorExit "Error GET https://10.203.1.85:6443/"
fi

Make the check_apiserver.sh script executable

chmod +x /etc/keepalived/check_apiserver.sh

Start the keepalived service

systemctl daemon-reload
service keepalived start

Haproxy Configuration

Install haproxy on all nodes

apt install -y haproxy

Edit the configuration file /etc/haproxy/haproxy.cfg; the configuration is identical on all 3 nodes

# /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log /dev/log local0
    log /dev/log local1 notice
    daemon

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 1
    timeout http-request    10s
    timeout queue           20s
    timeout connect         5s
    timeout client          20s
    timeout server          20s
    timeout http-keep-alive 10s
    timeout check           10s

#---------------------------------------------------------------------
# apiserver frontend which proxies to the masters
#---------------------------------------------------------------------
frontend apiserver
    bind *:8443
    mode tcp
    option tcplog
    default_backend apiserver

#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    balance     roundrobin
        server k8s1 10.203.1.82:6443 check
        server k8s2 10.203.1.83:6443 check
        server k8s3 10.203.1.84:6443 check
        # [...]

Restart the service

systemctl restart haproxy

Deploy the Kubernetes HA Cluster

Install docker, kubeadm, kubectl and kubelet on all nodes

Docker

curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun

Kubernetes components

curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt update
apt install -y kubelet kubeadm kubectl

Master Nodes

Initialize the first master node

Run the following command on any one of the master nodes; --control-plane-endpoint 10.203.1.85:8443 is the VIP plus the haproxy frontend port

kubeadm init --image-repository registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --control-plane-endpoint 10.203.1.85:8443 --upload-certs

The output is as follows

root@k8s1:~# kubeadm init --image-repository registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --control-plane-endpoint 10.203.1.85:8443 --upload-certs
[init] Using Kubernetes version: v1.20.5
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.5. Latest validated version: 19.03
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.203.1.82 10.203.1.85]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s1 localhost] and IPs [10.203.1.82 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s1 localhost] and IPs [10.203.1.82 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 79.519175 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
a30f2492d765a17e244ffc650f09ead393397f7f1d05efbe1c7525eb9c5f721b
[mark-control-plane] Marking the node k8s1 as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node k8s1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: p42ggz.1lc9jebaqoag8ca6
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join 10.203.1.85:8443 --token p42ggz.1lc9jebaqoag8ca6 \
    --discovery-token-ca-cert-hash sha256:45297952d1b812be3c4ef88bf8060f5583e7a292e414f1eb82f0aa8bdcd71a3f \
    --control-plane --certificate-key a30f2492d765a17e244ffc650f09ead393397f7f1d05efbe1c7525eb9c5f721b

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.203.1.85:8443 --token p42ggz.1lc9jebaqoag8ca6 \
    --discovery-token-ca-cert-hash sha256:45297952d1b812be3c4ef88bf8060f5583e7a292e414f1eb82f0aa8bdcd71a3f

Run the following command on the other two master nodes

kubeadm join 10.203.1.85:8443 --token p42ggz.1lc9jebaqoag8ca6 \
    --discovery-token-ca-cert-hash sha256:45297952d1b812be3c4ef88bf8060f5583e7a292e414f1eb82f0aa8bdcd71a3f \
    --control-plane --certificate-key a30f2492d765a17e244ffc650f09ead393397f7f1d05efbe1c7525eb9c5f721b

The output is as follows

root@k8s2:~# kubeadm join 10.203.1.85:8443 --token p42ggz.1lc9jebaqoag8ca6 \
>     --discovery-token-ca-cert-hash sha256:45297952d1b812be3c4ef88bf8060f5583e7a292e414f1eb82f0aa8bdcd71a3f \
>     --control-plane --certificate-key a30f2492d765a17e244ffc650f09ead393397f7f1d05efbe1c7525eb9c5f721b
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.5. Latest validated version: 19.03
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s2 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.203.1.83 10.203.1.85]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s2 localhost] and IPs [10.203.1.83 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s2 localhost] and IPs [10.203.1.83 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[mark-control-plane] Marking the node k8s2 as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node k8s2 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

        mkdir -p $HOME/.kube
        sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
        sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.

Worker Nodes

Run the following command on the two worker nodes

kubeadm join 10.203.1.85:8443 --token p42ggz.1lc9jebaqoag8ca6 \
>     --discovery-token-ca-cert-hash sha256:45297952d1b812be3c4ef88bf8060f5583e7a292e414f1eb82f0aa8bdcd71a3f

Output

root@k8s4:~# kubeadm join 10.203.1.85:8443 --token p42ggz.1lc9jebaqoag8ca6 \
>     --discovery-token-ca-cert-hash sha256:45297952d1b812be3c4ef88bf8060f5583e7a292e414f1eb82f0aa8bdcd71a3f
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.5. Latest validated version: 19.03
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

Configure the kubectl tool

Run the following commands on a master node

mkdir -p /root/.kube && \
cp /etc/kubernetes/admin.conf /root/.kube/config

If another machine on the LAN has the kubectl tool installed, it can be configured to manage this cluster with the following steps

mkdir -p /root/.kube

Create the config file by copying into it the contents of /etc/kubernetes/admin.conf from a Kubernetes master node
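
For example, copying it over ssh (a sketch assuming root ssh access to k8s1 at 10.203.1.82):

scp root@10.203.1.82:/etc/kubernetes/admin.conf /root/.kube/config   # requires ssh access to the master
kubectl get nodes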

Deploy the flannel network

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Fix the ComponentStatus (cs)

On all master nodes, edit the controller-manager and scheduler manifests and comment out the default port setting (- --port=0) so that the components report as healthy

vi /etc/kubernetes/manifests/kube-controller-manager.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-cidr=10.244.0.0/16
    - --cluster-name=kubernetes
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
      # - --port=0
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12
    - --use-service-account-credentials=true
    image: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.20.5
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1

vi /etc/kubernetes/manifests/kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
      #- --port=0
    image: registry.aliyuncs.com/google_containers/kube-scheduler:v1.20.5
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
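
These are static pod manifests, so the kubelet on each master picks up the change and recreates the pods automatically; a quick check that the control-plane pods came back up (tier=control-plane is the label kubeadm puts on these pods, as shown in the manifests above):

kubectl -n kube-system get pods -l tier=control-plane -o wide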

View Status

Nodes

root@k8s1:~# kubectl get node
NAME   STATUS   ROLES                  AGE     VERSION
k8s1   Ready    control-plane,master   4d19h   v1.20.5
k8s2   Ready    control-plane,master   4d19h   v1.20.5
k8s3   Ready    control-plane,master   4d19h   v1.20.5
k8s4   Ready    <none>                 4d2h    v1.20.5
k8s5   Ready    <none>                 4d2h    v1.20.5

ComponentStatus

root@k8s1:~# kubectl get cs 
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok                  
controller-manager   Healthy   ok                  
etcd-0               Healthy   {"health":"true"}   

Test

Master HA Test

Test steps

Create a resource on one master node

Any resource will do, for example a Deployment

Shut down the node

Use kubectl get to check the resource just created; once it has been created successfully, shut down this master node

Check the status

At this point, kubectl get node shows one master node in the NotReady state
The resource created earlier is still in a normal state and can still be deleted, edited, and so on

Restart the node

A short while after the node is restarted, kubectl get node shows it back in the Ready state
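
A minimal sketch of the sequence above, run from a machine whose kubectl points at the VIP (the deployment name test-ha is an arbitrary example):

kubectl create deployment test-ha --image=nginx --replicas=2
kubectl get deployment test-ha
# shut down one master (e.g. k8s1), then:
kubectl get node                      # k8s1 shows NotReady
kubectl get deployment test-ha        # still manageable through the VIP
kubectl scale deployment test-ha --replicas=3
# power k8s1 back on, then:
kubectl get node                      # k8s1 returns to Ready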

Resource HA Test

Test steps

Create a Deployment resource

Check how the resource is running

Use kubectl get all -o wide to see which worker node the pods are running on

Shut down the node

Shut down the worker node the pods were seen running on; the pods are moved to another worker node

Restart the node

After the node is restarted, the pods do not fail back; the node simply becomes Ready again
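
Again as a hedged sketch (the deployment name test-app is arbitrary):

kubectl create deployment test-app --image=nginx --replicas=2
kubectl get pods -o wide              # note the worker node, e.g. k8s4
# shut down that worker node, then watch the pods move:
kubectl get pods -o wide --watch      # new pods come up on the other worker
# after restarting the node:
kubectl get nodes                     # the node returns to Ready; pods stay where they are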

Kubernetes HA Cluster & LINSTOR Test

Architecture


Description

LINSTOR Deployment

Install DRBD + LINSTOR on the worker nodes of the Kubernetes HA cluster built earlier and join them to the LINSTOR cluster; this test adds one more LINSTOR diskful node. In the Kubernetes cluster, create the LINSTOR CSI related services and create a storage class.

Test Plan

  1. Create a persistent volume claim of the LINSTOR type
  2. Create a deployment that uses this pvc
  3. Check which worker node the Pod runs on, shut down that node, and check whether the pod moves to another worker node and whether the data is preserved

Deploy the LINSTOR CSI Service

Install the software

Run the following on the two worker nodes

apt install software-properties-common
add-apt-repository ppa:linbit/linbit-drbd9-stack
apt update
apt install drbd-utils drbd-dkms lvm2
modprobe drbd
echo drbd > /etc/modules-load.d/drbd.conf
apt install linstor-controller linstor-satellite  linstor-client
# What the commands above do:
# install the software-properties-common tool, needed before the second command can be run
# add the DRBD9 PPA repository
# update the apt package index
# install DRBD9 and the related packages
# load the DRBD9 kernel module
# load DRBD9 automatically at boot
# install the LINSTOR packages

Join the worker nodes to the LINSTOR cluster

root@k8s5:~# linstor n c k8s5 10.203.1.96
SUCCESS:
Description:
    New node 'k8s5' registered.
Details:
    Node 'k8s5' UUID is: 20d41b64-f6b4-4712-88df-c151c8f00e37
SUCCESS:
Description:
    Node 'k8s5' authenticated
Details:
    Supported storage providers: [diskless, lvm, lvm_thin, file, file_thin, openflex_target]
    Supported resource layers  : [drbd, luks, cache, storage]
    Unsupported storage providers:
        ZFS: 'cat /sys/module/zfs/version' returned with exit code 1
        ZFS_THIN: 'cat /sys/module/zfs/version' returned with exit code 1
        SPDK: IO exception occured when running 'rpc.py get_spdk_version': Cannot run program "rpc.py": error=2, No such file or directory
    
    Unsupported resource layers:
        NVME: IO exception occured when running 'nvme version': Cannot run program "nvme": error=2, No such file or directory
        WRITECACHE: 'modprobe dm-writecache' returned with exit code 1
        OPENFLEX: IO exception occured when running 'nvme version': Cannot run program "nvme": error=2, No such file or directory
        
root@k8s5:~# linstor n c k8s4 10.203.1.95
SUCCESS:
Description:
    New node 'k8s4' registered.
Details:
    Node 'k8s4' UUID is: db78f129-a23d-4245-a744-534a5365925e
SUCCESS:
Description:
    Node 'k8s4' authenticated
Details:
    Supported storage providers: [diskless, lvm, lvm_thin, file, file_thin, openflex_target]
    Supported resource layers  : [drbd, luks, cache, storage]
    Unsupported storage providers:
        ZFS: 'cat /sys/module/zfs/version' returned with exit code 1
        ZFS_THIN: 'cat /sys/module/zfs/version' returned with exit code 1
        SPDK: IO exception occured when running 'rpc.py get_spdk_version': Cannot run program "rpc.py": error=2, No such file or directory
    
    Unsupported resource layers:
        NVME: IO exception occured when running 'nvme version': Cannot run program "nvme": error=2, No such file or directory
        WRITECACHE: 'modprobe dm-writecache' returned with exit code 1
        OPENFLEX: IO exception occured when running 'nvme version': Cannot run program "nvme": error=2, No such file or directory

Deploy the LINSTOR CSI services

Apply the following yaml (LINSTOR_IP in the yaml must be changed to the actual IP of the LINSTOR controller)

---
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: linstor-csi-controller
  namespace: kube-system
spec:
  serviceName: "linstor-csi"
  replicas: 1
  selector:
    matchLabels:
      app: linstor-csi-controller
      role: linstor-csi
  template:
    metadata:
      labels:
        app: linstor-csi-controller
        role: linstor-csi
    spec:
      priorityClassName: system-cluster-critical
      serviceAccount: linstor-csi-controller-sa
      containers:
        - name: csi-provisioner
          image: teym88/csi-provisioner:v1.5.0
          args:
            - "--csi-address=$(ADDRESS)"
            - "--v=5"
            - "--feature-gates=Topology=true"
            - "--timeout=120s"
          env:
            - name: ADDRESS
              value: /var/lib/csi/sockets/pluginproxy/csi.sock
          imagePullPolicy: "Always"
          volumeMounts:
            - name: socket-dir
              mountPath: /var/lib/csi/sockets/pluginproxy/
        - name: csi-attacher
          image: teym88/csi-attacher:v2.1.1
          args:
            - "--v=5"
            - "--csi-address=$(ADDRESS)"
            - "--timeout=120s"
          env:
            - name: ADDRESS
              value: /var/lib/csi/sockets/pluginproxy/csi.sock
          imagePullPolicy: "Always"
          volumeMounts:
            - name: socket-dir
              mountPath: /var/lib/csi/sockets/pluginproxy/
        - name: csi-resizer
          image: teym88/csi-resizer:v0.5.0
          args:
          - "--v=5"
          - "--csi-address=$(ADDRESS)"
          env:
          - name: ADDRESS
            value: /var/lib/csi/sockets/pluginproxy/csi.sock
          imagePullPolicy: "Always"
          volumeMounts:
          - mountPath: /var/lib/csi/sockets/pluginproxy/
            name: socket-dir
        - name: csi-snapshotter
          image: teym88/csi-snapshotter:v2.0.1
          args:
            - "-csi-address=$(ADDRESS)"
            - "-timeout=120s"
          env:
            - name: ADDRESS
              value: /var/lib/csi/sockets/pluginproxy/csi.sock
          imagePullPolicy: Always
          volumeMounts:
            - name: socket-dir
              mountPath: /var/lib/csi/sockets/pluginproxy/
        - name: linstor-csi-plugin
          image: teym88/piraeus-csi:v0.11.0
          args:
            - "--csi-endpoint=$(CSI_ENDPOINT)"
            - "--node=$(KUBE_NODE_NAME)"
            - "--linstor-endpoint=$(LINSTOR_IP)"
            - "--log-level=debug"
          env:
            - name: CSI_ENDPOINT
              value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: LINSTOR_IP
              value: "http://10.203.1.81:3370"
          imagePullPolicy: "Always"
          volumeMounts:
            - name: socket-dir
              mountPath: /var/lib/csi/sockets/pluginproxy/
      volumes:
        - name: socket-dir
          emptyDir: {}
---

kind: ServiceAccount
apiVersion: v1
metadata:
  name: linstor-csi-controller-sa
  namespace: kube-system

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: linstor-csi-provisioner-role
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshots"]
    verbs: ["get", "list"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotcontents"]
    verbs: ["get", "list"]

---

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: linstor-csi-provisioner-binding
subjects:
  - kind: ServiceAccount
    name: linstor-csi-controller-sa
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: linstor-csi-provisioner-role
  apiGroup: rbac.authorization.k8s.io

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: linstor-csi-attacher-role
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["csinodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments"]
    verbs: ["get", "list", "watch", "update", "patch"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: linstor-csi-attacher-binding
subjects:
  - kind: ServiceAccount
    name: linstor-csi-controller-sa
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: linstor-csi-attacher-role
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: linstor-csi-resizer-role
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "patch"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims/status"]
    verbs: ["patch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: linstor-csi-resizer-binding
subjects:
  - kind: ServiceAccount
    name: linstor-csi-controller-sa
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: linstor-csi-resizer-role
  apiGroup: rbac.authorization.k8s.io

---

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: linstor-csi-node
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: linstor-csi-node
      role: linstor-csi
  template:
    metadata:
      labels:
        app: linstor-csi-node
        role: linstor-csi
    spec:
      priorityClassName: system-node-critical
      serviceAccount: linstor-csi-node-sa
      containers:
        - name: csi-node-driver-registrar
          image: teym88/csi-node-driver-registrar:v1.2.0
          args:
            - "--v=5"
            - "--csi-address=$(ADDRESS)"
            - "--kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)"
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "rm -rf /registration/linstor.csi.linbit.com /registration/linstor.csi.linbit.com-reg.sock"]
          env:
            - name: ADDRESS
              value: /csi/csi.sock
            - name: DRIVER_REG_SOCK_PATH
              value: /var/lib/kubelet/plugins/linstor.csi.linbit.com/csi.sock
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: plugin-dir
              mountPath: /csi/
            - name: registration-dir
              mountPath: /registration/
        - name: linstor-csi-plugin
          image: teym88/piraeus-csi:v0.11.0
          args:
            - "--csi-endpoint=$(CSI_ENDPOINT)"
            - "--node=$(KUBE_NODE_NAME)"
            - "--linstor-endpoint=$(LINSTOR_IP)"
            - "--log-level=debug"
          env:
            - name: CSI_ENDPOINT
              value: unix:///csi/csi.sock
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: LINSTOR_IP
              value: "http://10.203.1.81:3370"
          imagePullPolicy: "Always"
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
          volumeMounts:
            - name: plugin-dir
              mountPath: /csi
            - name: pods-mount-dir
              mountPath: /var/lib/kubelet
              mountPropagation: "Bidirectional"
            - name: device-dir
              mountPath: /dev
      volumes:
        - name: registration-dir
          hostPath:
            path: /var/lib/kubelet/plugins_registry/
            type: DirectoryOrCreate
        - name: plugin-dir
          hostPath:
            path: /var/lib/kubelet/plugins/linstor.csi.linbit.com/
            type: DirectoryOrCreate
        - name: pods-mount-dir
          hostPath:
            path: /var/lib/kubelet
            type: Directory
        - name: device-dir
          hostPath:
            path: /dev
---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: linstor-csi-node-sa
  namespace: kube-system

---

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: linstor-csi-driver-registrar-role
  namespace: kube-system
rules:
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]

---

apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
  name: linstor.csi.linbit.com
spec:
  attachRequired: true
  podInfoOnMount: true

---

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: linstor-csi-driver-registrar-binding
subjects:
  - kind: ServiceAccount
    name: linstor-csi-node-sa
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: linstor-csi-driver-registrar-role
  apiGroup: rbac.authorization.k8s.io

---

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: linstor-csi-snapshotter-role
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotcontents"]
    verbs: ["create", "get", "list", "watch", "update", "delete"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotcontents/status"]
    verbs: ["update"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshots"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["create", "list", "watch", "delete"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshots/status"]
    verbs: ["update"]

Check the service status

Run the following command

kubectl get all -A | grep linstor

If the output looks like the following, the LINSTOR CSI services have been deployed successfully

root@ubuntu:~/k8sYaml/linstor# kubectl get all -A | grep linstor
kube-system            pod/linstor-csi-controller-0                     5/5     Running   0          179m
kube-system            pod/linstor-csi-node-6gl45                       2/2     Running   0          179m
kube-system            pod/linstor-csi-node-6s969                       2/2     Running   0          179m
kube-system   daemonset.apps/linstor-csi-node   2         2         2       2            2           <none>                   179m
kube-system   statefulset.apps/linstor-csi-controller   1/1     179m

Test

Create the LINSTOR storage class

Apply the following yaml (if there are two diskful nodes, autoPlace can be set to 2, and so on; storagePool is the name of a storage pool created beforehand on the diskful node)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "1"
  storagePool: "poola"
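
The storage pool "poola" referenced above was created beforehand on the diskful node. A hedged sketch of how such an LVM-backed pool might be created (the volume group name vg0 is hypothetical; ubuntu is the diskful node in this test):

linstor storage-pool create lvm ubuntu poola vg0   # vg0 is a placeholder volume group name
linstor storage-pool list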

After applying, check the SC status

root@ubuntu:~/k8sYaml/linstor# kubectl get sc
NAME      PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
linstor   linstor.csi.linbit.com   Delete          Immediate           false                  9s

Create a persistent volume claim

Apply the following yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fs-pvc5g
spec:
  storageClassName: linstor
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Check the pvc and pv

root@ubuntu:~/k8sYaml/linstor# kubectl get pvc
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
fs-pvc5g   Bound    pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811   5Gi        RWO            linstor        56m
root@ubuntu:~/k8sYaml/linstor# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811   5Gi        RWO            Delete           Bound    default/fs-pvc5g   linstor                 56m
root@ubuntu:~/k8sYaml/linstor# 

Create a Deployment that uses this pvc

Apply the following yaml; nginx is used as the image and the name is ng1

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ng1
spec:
  replicas: 2
  strategy:
    type: Recreate
  selector:
    matchLabels:
      run: ng1
  template:
    metadata:
      labels:
        run: ng1
    spec:
      containers:
      - name: ng1
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /usr/share/nginx/html      
          name: linstor-volume
      volumes:
      - name: linstor-volume
        persistentVolumeClaim:
          claimName: fs-pvc5g

Check the status

root@ubuntu:~/k8sYaml/linstor# kubectl get all -A -o wide | grep ng1
default                pod/ng1-84794695b7-2pg4v                         1/1     Running   0          58m     10.244.3.17   k8s4   <none>           <none>
default                pod/ng1-84794695b7-pt4xx                         1/1     Running   0          58m     10.244.3.16   k8s4   <none>           <none>
default                service/ng1                         NodePort       10.110.200.0     <none>        80:31075/TCP             57m     run=ng1
default                deployment.apps/ng1                         2/2     2            2           58m     ng1                         nginx                                                   run=ng1
default                replicaset.apps/ng1-84794695b7                         2         2         2       58m     ng1                         nginx                                                   pod-template-hash=84794695b7,run=ng1

The Pods are already Running and are scheduled on the k8s4 node.

Check the DRBD resource status on k8s4

root@k8s4:~# drbdadm status
pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811 role:Primary
  disk:Diskless
  ubuntu role:Secondary
    peer-disk:UpToDate

Check where this volume is mounted on the host system

root@k8s4:~# df -h | grep pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811
/dev/drbd1007   4.9G   21M  4.6G   1% /var/lib/kubelet/pods/bc325e27-06f0-4ab6-895f-e8b66e19aa2d/volumes/kubernetes.io~csi/pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811/mount

Go into this path and add the index.html file that the nginx service will display, with the content File from drbd res
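
For example, writing directly into the mount path shown above:

echo "File from drbd res" > /var/lib/kubelet/pods/bc325e27-06f0-4ab6-895f-e8b66e19aa2d/volumes/kubernetes.io~csi/pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811/mount/index.html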

root@k8s4:~# cat /var/lib/kubelet/pods/bc325e27-06f0-4ab6-895f-e8b66e19aa2d/volumes/kubernetes.io~csi/pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811/mount/index.html 
File from drbd res

Expose this Deployment as a service

root@ubuntu:~/k8sYaml/linstor# kubectl expose deployment ng1 --port=80 --type=NodePort   
service/ng1 exposed
root@ubuntu:~/k8sYaml/linstor# kubectl get svc
NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes             ClusterIP      10.96.0.1      <none>        443/TCP        5d19h
loadbalancer-service   LoadBalancer   10.100.64.28   <pending>     80:32416/TCP   3d23h
ng1                    NodePort       10.110.200.0   <none>        80:31075/TCP   4s

Access the service from another system on the LAN

root@k8smaster:~# curl 10.203.1.85:31075      
File from drbd res

k8s4 node failover test

Since the Pods are currently running on k8s4, shut down this node and observe whether the Pods move to the k8s5 node and whether the data stays intact

Shut down k8s4

root@k8s4:~# shutdown now

Observation

The service is temporarily interrupted at this point, which is expected: the DRBD resource is not in dual-primary mode, and it takes some time to recreate the Pods on the k8s5 node

root@k8smaster:~# curl 10.203.1.85:31075      
curl: (7) Failed to connect to 10.203.1.85 port 31075: Connection refused

The pod placement on the worker nodes changes; the Kubernetes cluster tries to create new Pods on the k8s5 node

root@ubuntu:~/k8sYaml/linstor# kubectl get pod -A -o wide | grep ng1
default                ng1-84794695b7-2pg4v                         1/1     Terminating         0          79m     10.244.3.17   k8s4   <none>           <none>
default                ng1-84794695b7-hg72j                         0/1     ContainerCreating   0          2m5s    <none>        k8s5   <none>           <none>
default                ng1-84794695b7-mghwl                         0/1     ContainerCreating   0          2m5s    <none>        k8s5   <none>           <none>
default                ng1-84794695b7-pt4xx                         1/1     Terminating         0          79m     10.244.3.16   k8s4   <none>           <none>

After watching for a while, the Pods on the k8s5 node stay stuck in the creating state; check the details

root@ubuntu:~/k8sYaml/linstor# kubectl describe pv pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811
Name:              pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811
Labels:            <none>
Annotations:       pv.kubernetes.io/provisioned-by: linstor.csi.linbit.com
Finalizers:        [kubernetes.io/pv-protection external-attacher/linstor-csi-linbit-com]
StorageClass:      linstor
Status:            Bound
Claim:             default/fs-pvc5g
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          5Gi
Node Affinity:     
  Required Terms:  
    Term 0:        linbit.com/hostname in [ubuntu]
    Term 1:        linbit.com/sp-DfltDisklessStorPool in [true]
Message:           
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            linstor.csi.linbit.com
    FSType:            ext4
    VolumeHandle:      pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811
    ReadOnly:          false
    VolumeAttributes:      storage.kubernetes.io/csiProvisionerIdentity=1617072579350-8081-linstor.csi.linbit.com
Events:                <none>
root@ubuntu:~/k8sYaml/linstor# kubectl describe pod ng1-84794695b7-hg72j
Name:           ng1-84794695b7-hg72j
Namespace:      default
Priority:       0
Node:           k8s5/10.203.1.96
Start Time:     Tue, 30 Mar 2021 14:36:59 +0800
Labels:         pod-template-hash=84794695b7
                run=ng1
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/ng1-84794695b7
Containers:
  ng1:
    Container ID:   
    Image:          nginx
    Image ID:       
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /usr/share/nginx/html from linstor-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qbmdj (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  linstor-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  fs-pvc5g
    ReadOnly:   false
  default-token-qbmdj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qbmdj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason              Age                From                     Message
  ----     ------              ----               ----                     -------
  Normal   Scheduled           14m                default-scheduler        Successfully assigned default/ng1-84794695b7-hg72j to k8s5
  Warning  FailedAttachVolume  14m                attachdetach-controller  Multi-Attach error for volume "pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811" Volume is already used by pod(s) ng1-84794695b7-pt4xx, ng1-84794695b7-2pg4v
  Warning  FailedMount         5m33s              kubelet                  Unable to attach or mount volumes: unmounted volumes=[linstor-volume], unattached volumes=[default-token-qbmdj linstor-volume]: timed out waiting for the condition
  Warning  FailedMount         63s (x5 over 12m)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[linstor-volume], unattached volumes=[linstor-volume default-token-qbmdj]: timed out waiting for the condition

The following error appears: the volume is already in use (mounted). Because the k8s4 node is shut down, its old Pods cannot be deleted successfully, so the new Pods remain stuck in this state

Warning  FailedAttachVolume  14m                attachdetach-controller  Multi-Attach error for volume "pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811" Volume is already used by pod(s) ng1-84794695b7-pt4xx, ng1-84794695b7-2pg4v

k8s4 node failback test

Start the k8s4 node

Observation

Now that k8s4 has been restarted, the old pods can be deleted normally, so the pods on k8s5 go from creating to Running

root@ubuntu:~/k8sYaml/linstor# kubectl get pod -A -o wide | grep ng1    
default                ng1-84794695b7-hg72j                         1/1     Running   0          23m     10.244.4.28   k8s5   <none>           <none>
default                ng1-84794695b7-mghwl                         1/1     Running   0          23m     10.244.4.27   k8s5   <none>           <none>

The nginx service recovers and its content is unchanged

root@k8smaster:~# curl 10.203.1.85:31075      
File from drbd res

Problem

In the k8s4 node failover case, the pods cannot reach Running on the k8s5 node and the service is unavailable

Second Test

Reason

Because the previous test leaves the service unavailable, we need to find out whether there is a way for the service to recover automatically after the interruption

Check the PVC status from the previous test

yaml file

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fs-pvc5g
spec:
  storageClassName: linstor
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Describe

root@ubuntu:~/k8sYaml/linstor# kubectl describe pvc fs-pvc5g
Name:          fs-pvc5g
Namespace:     default
StorageClass:  linstor
Status:        Bound
Volume:        pvc-64c11eac-5ac8-4f22-ab22-6cd5f87d2811
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: linstor.csi.linbit.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      5Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       ng1-84794695b7-hg72j
               ng1-84794695b7-mghwl
Events:        <none>

The pvc has an accessModes attribute whose value is ReadWriteOnce, which may be related to this behaviour

Look up pvc accessModes

The documentation gives the following

1 ReadWriteOnce - the volume can be mounted read-write by a single node
2 ReadOnlyMany  - the volume can be mounted read-only by many nodes
3 ReadWriteMany - the volume can be mounted read-write by many nodes

Create a new pvc with accessModes set to ReadWriteMany and test again

Create the pvc

Apply the following yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fs-pvc1g
spec:
  storageClassName: linstor
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

Create a Deployment that uses this PVC

Apply the following yaml, again with the nginx image; the name is ng2

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ng2
spec:
  replicas: 2
  strategy:
    type: Recreate
  selector:
    matchLabels:
      run: ng2
  template:
    metadata:
      labels:
        run: ng2
    spec:
      containers:
      - name: ng2
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /usr/share/nginx/html      
          name: linstor-volume
      volumes:
      - name: linstor-volume
        persistentVolumeClaim:
          claimName: fs-pvc1g

Check the status

root@ubuntu:~/k8sYaml/linstor# kubectl get pod -o wide
NAME                                READY   STATUS    RESTARTS   AGE    IP            NODE   NOMINATED NODE   READINESS GATES
frontend                            1/1     Running   0          28h    10.244.4.10   k8s5   <none>           <none>
ng1-84794695b7-hg72j                1/1     Running   0          32m    10.244.4.28   k8s5   <none>           <none>
ng1-84794695b7-mghwl                1/1     Running   0          32m    10.244.4.27   k8s5   <none>           <none>
ng2-56fb7f7bdf-p7lwh                1/1     Running   0          35s    10.244.3.19   k8s4   <none>           <none>
ng2-56fb7f7bdf-zln7l                1/1     Running   0          35s    10.244.3.20   k8s4   <none>           <none>
nginx-deployment-59586cc59f-k69xv   1/1     Running   0          4d4h   10.244.4.7    k8s5   <none>           <none>
nginx-deployment-59586cc59f-nptx9   1/1     Running   0          4d4h   10.244.4.6    k8s5   <none>           <none>
nginx-deployment-59586cc59f-tpkfc   1/1     Running   0          4d4h   10.244.4.8    k8s5   <none>           <none>

The ng2 pods are running on the k8s4 node

Expose the service

root@ubuntu:~/k8sYaml/linstor# kubectl expose deployment ng2 --port=80 --type=NodePort       
service/ng2 exposed
root@ubuntu:~/k8sYaml/linstor# kubectl get svc
NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes             ClusterIP      10.96.0.1      <none>        443/TCP        5d21h
loadbalancer-service   LoadBalancer   10.100.64.28   <pending>     80:32416/TCP   4d1h
ng1                    NodePort       10.110.200.0   <none>        80:31075/TCP   109m
ng2                    NodePort       10.109.98.75   <none>        80:31012/TCP   4s
ng2-service            NodePort       10.106.127.4   <none>        80:30001/TCP   4d1h

On the k8s4 node, add an index.html file to the volume

root@k8s4:~# cd /var/lib/kubelet/pods/a9539bf6-41b9-4d3c-827d-54b30135de5d/volumes/kubernetes.io~csi/pvc-b9a41c03-d9eb-43c8-942b-a0dabba62e69/mount
root@k8s4:/var/lib/kubelet/pods/a9539bf6-41b9-4d3c-827d-54b30135de5d/volumes/kubernetes.io~csi/pvc-b9a41c03-d9eb-43c8-942b-a0dabba62e69/mount# vi index.html

The content is as follows

file from rwx drbd res

Access the service from another node

root@k8smaster:~# curl 10.203.1.85:31012
file from rwx drbd res

Shut down the k8s4 node

Check the status

The service is interrupted

root@k8smaster:~# curl 10.203.1.85:31012
curl: (7) Failed to connect to 10.203.1.85 port 31012: Connection refused

After waiting a while, the Pod status shows that although the pods on the k8s4 node are again not fully deleted, two new pods are now Running on the k8s5 node

root@ubuntu:~/k8sYaml/linstor# kubectl get pod -o wide | grep ng2
ng2-56fb7f7bdf-75wz7                1/1     Running       0          21m    10.244.4.30   k8s5   <none>           <none>
ng2-56fb7f7bdf-lk6n6                1/1     Running       0          21m    10.244.4.29   k8s5   <none>           <none>
ng2-56fb7f7bdf-p7lwh                1/1     Terminating   0          34m    10.244.3.19   k8s4   <none>           <none>
ng2-56fb7f7bdf-zln7l                1/1     Terminating   0          34m    10.244.3.20   k8s4   <none>           <none>

Access the service again; it has recovered

root@k8smaster:~# curl 10.203.1.85:31012
file from rwx drbd res

Start the k8s4 node

The pods on the k8s4 node are deleted normally; the test passes

root@ubuntu:~/k8sYaml/linstor# kubectl get pod -o wide | grep ng2
ng2-56fb7f7bdf-75wz7                1/1     Running       0          25m    10.244.4.30   k8s5   <none>           <none>
ng2-56fb7f7bdf-lk6n6                1/1     Running       0          25m    10.244.4.29   k8s5   <none>           <none>
ng2-56fb7f7bdf-p7lwh                0/1     Terminating   0          38m    <none>        k8s4   <none>           <none>
ng2-56fb7f7bdf-zln7l                0/1     Terminating   0          38m    <none>        k8s4   <none>           <none>
root@ubuntu:~/k8sYaml/linstor# kubectl get pod -o wide | grep ng2
ng2-56fb7f7bdf-75wz7                1/1     Running   0          25m    10.244.4.30   k8s5   <none>           <none>
ng2-56fb7f7bdf-lk6n6                1/1     Running   0          25m    10.244.4.29   k8s5   <none>           <none>