Problems encountered while deploying with kubeadm, and how to solve them:
https://www.jianshu.com/p/7ccf7769c3a9
K8s cluster - CNI network plugin:
https://www.jianshu.com/p/1b1d6ab82e2e
1. Initial server setup (all three machines)
Environment: CentOS 7.6
For easier management, rename the instances to k8s-master01-15 / k8s-node01-16 / k8s-node02-17 (15/16/17 are the last octet of each private IP; choose your own naming rule if you prefer). Then verify that the three servers can ping each other over the private network.
Modify the hostnames
# On k8s-master01-15:
hostnamectl set-hostname k8s-master01-15
# On k8s-node01-16:
hostnamectl set-hostname k8s-node01-16
# On k8s-node02-17:
hostnamectl set-hostname k8s-node02-17
Configure the /etc/hosts file
A production cluster would use its own DNS server to map hostnames to IPs; for simplicity we use the hosts file here. Add the same three lines to /etc/hosts on all three servers:
cat << EOF >> /etc/hosts
172.23.199.15 k8s-master01-15
172.23.199.16 k8s-node01-16
172.23.199.17 k8s-node02-17
EOF
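With the hosts entries in place, cross-node connectivity can be verified with a short loop (a sketch; the hostnames assume the /etc/hosts entries above):

```shell
# Ping each cluster node once by hostname; resolution relies on /etc/hosts.
check_nodes() {
  for host in k8s-master01-15 k8s-node01-16 k8s-node02-17; do
    if ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
      echo "$host reachable"
    else
      echo "$host UNREACHABLE"
    fi
  done
}
check_nodes
```

Run this on each of the three machines; every host should report reachable before you continue.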
Prerequisites (all nodes)
1) Install dependency packages
yum install -y conntrack ipvsadm ipset jq iptables curl sysstat libseccomp wget vim net-tools git epel-release telnet tree nmap lrzsz dos2unix bind-utils
2) Disable SELinux, firewalld, and NetworkManager
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
setenforce 0
systemctl disable firewalld && systemctl stop firewalld
systemctl stop NetworkManager
systemctl disable NetworkManager
chkconfig NetworkManager off
3) Install iptables and flush the rules
yum -y install iptables-services && systemctl start iptables && systemctl enable iptables && iptables -F && service iptables save
4) Disable the swap partition (by default the kubelet refuses to start with swap enabled, and swapping container memory hurts performance)
swapoff -a && sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
5) Tune kernel parameters for K8s
# Write the configuration file (sysctl does not support trailing comments, so each comment goes on its own line)
cat > /data/kubernetes.conf <<EOF
# Let bridged traffic pass through iptables/ip6tables (required by most CNI plugins)
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
net.ipv4.tcp_tw_recycle=0
# Avoid swap; only swap when the system is about to OOM
vm.swappiness=0
# Do not check whether enough physical memory is available before allocating
vm.overcommit_memory=1
# Do not panic on OOM; let the OOM killer handle it
vm.panic_on_oom=0
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=1048576
fs.file-max=52706963
fs.nr_open=52706963
# Disable IPv6
net.ipv6.conf.all.disable_ipv6=1
net.netfilter.nf_conntrack_max=2310720
EOF
6) Configure the yum repositories
curl -o /etc/yum.repos.d/centos-7.repo http://mirrors.aliyun.com/repo/Centos-7.repo
curl -o /etc/yum.repos.d/docker-ce.repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum clean all && yum makecache
7) Apply the configuration
# The net.bridge.* keys require the br_netfilter module (also loaded in step 11)
modprobe br_netfilter
cp /data/kubernetes.conf /etc/sysctl.d/kubernetes.conf
sysctl -p /etc/sysctl.d/kubernetes.conf
8) Set the system time zone (skip if it is already correct)
# Set the time zone to Asia/Shanghai
timedatectl set-timezone Asia/Shanghai
# Keep the hardware clock in UTC
timedatectl set-local-rtc 0
# Restart services that depend on the system time
systemctl restart rsyslog
systemctl restart crond
9) Stop services the system does not need (if present)
systemctl stop postfix && systemctl disable postfix
10) Configure the logging system
Use systemd-journald for logging instead of rsyslogd.
Create the log directories
# Directory for persistent log storage
mkdir -p /var/log/journal
mkdir -p /etc/systemd/journald.conf.d
Write the configuration file
cat > /etc/systemd/journald.conf.d/99-prophet.conf <<EOF
[Journal]
# Persist logs to disk
Storage=persistent
# Compress archived logs
Compress=yes
SyncIntervalSec=5m
RateLimitInterval=30s
RateLimitBurst=1000
# Cap total disk usage at 10G
SystemMaxUse=10G
# Cap individual journal files at 200M
SystemMaxFileSize=200M
# Keep logs for 2 weeks
MaxRetentionSec=2week
# Do not forward logs to syslog
ForwardToSyslog=no
EOF
Restart the logging service
systemctl restart systemd-journald
11) Prerequisites for kube-proxy's IPVS mode
# Load the br_netfilter module
modprobe br_netfilter
# Write the module-loading script
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
# Make it executable, load the modules, and verify
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
12) Install Docker
# Install dependencies
yum install -y yum-utils device-mapper-persistent-data lvm2
# Add the Alibaba mirror repo
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Install containerd.io
yum install -y https://download.docker.com/linux/centos/7/x86_64/stable/Packages/containerd.io-1.2.6-3.3.el7.x86_64.rpm
# List the available docker-ce-cli versions
yum list docker-ce-cli --showduplicates | sort -r
# Install Docker
yum install -y docker-ce-19.03.8 docker-ce-cli-19.03.8
# Verify the installation
docker --version
# Create the /etc/docker directory
mkdir -p /etc/docker
# Configure daemon.json; find your mirror accelerator address in the Alibaba Cloud console under "Container Registry" → "Image Accelerator"
cat > /etc/docker/daemon.json <<EOF
{
"registry-mirrors": ["https://q4xjzq29.mirror.aliyuncs.com"],
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
# Create the drop-in directory
mkdir -p /etc/systemd/system/docker.service.d
# Reload systemd and restart Docker
systemctl daemon-reload
systemctl enable docker && systemctl restart docker && systemctl status docker
Install kubeadm (master/worker setup)
Download kubeadm (all three servers)
# Add the Alibaba mirror repo
cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# Install kubelet, kubeadm, and kubectl
yum install -y kubelet-1.20.11 kubectl-1.20.11 kubeadm-1.20.11
# systemctl's enable/disable/mask subcommands accept --now, which starts (or stops) the service at the same time
systemctl enable --now kubelet
# Check the installed version
kubelet --version
How to uninstall the K8s components
# Before uninstalling, run kubeadm reset to clear the cluster state
echo y | kubeadm reset
# Remove the management components
yum erase -y kubelet kubectl kubeadm kubernetes-cni
Download the required images (all three servers)
Normally you could go straight to the init step, which downloads the required component images itself. Those images are hosted on k8s.gcr.io, which is blocked in mainland China, so they must be fetched in advance from a mirror registry instead.
kubeadm init performs the following steps:
[init]: initialize with the specified version
[preflight]: run pre-init checks and download the required Docker images
[kubelet-start]: generate the kubelet config file /var/lib/kubelet/config.yaml; without it the kubelet cannot start, so the kubelet actually fails to start before init runs
[certificates]: generate the certificates Kubernetes uses, stored in /etc/kubernetes/pki
[kubeconfig]: generate the kubeconfig files in /etc/kubernetes, which the components use to talk to each other
[control-plane]: install the master components from the YAML files in /etc/kubernetes/manifests
[etcd]: install the etcd service from /etc/kubernetes/manifests/etcd.yaml
[wait-control-plane]: wait for the master components deployed by control-plane to start
[apiclient]: check the health of the master components
[uploadconfig]: upload the configuration
[kubelet]: configure the kubelet via a ConfigMap
[patchnode]: record CNI information on the Node object via annotations
[mark-control-plane]: label the current node with the master role and taint it unschedulable, so pods are not scheduled onto the master by default
[bootstrap-token]: generate the token; note it down, since kubeadm join uses it when adding nodes to the cluster
[addons]: install the CoreDNS and kube-proxy add-ons
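Each bracketed phase above can also be run on its own via `kubeadm init phase`, which is handy when only one step needs redoing (a sketch; `kubeadm-config.yaml` is the config file generated later in this guide):

```shell
CFG=kubeadm-config.yaml

if command -v kubeadm > /dev/null 2>&1; then
  # List every init phase this kubeadm build supports
  kubeadm init phase --help
  # Re-run only the preflight checks
  kubeadm init phase preflight --config="$CFG"
  # Regenerate only the certificates under /etc/kubernetes/pki
  kubeadm init phase certs all --config="$CFG"
fi
```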
List the images that need to be downloaded
kubeadm config images list
# Output: these are K8s's required components, but because k8s.gcr.io is blocked they cannot be pulled directly with docker pull
k8s.gcr.io/kube-apiserver:v1.20.15
k8s.gcr.io/kube-controller-manager:v1.20.15
k8s.gcr.io/kube-scheduler:v1.20.15
k8s.gcr.io/kube-proxy:v1.20.15
k8s.gcr.io/pause:3.2
k8s.gcr.io/etcd:3.4.13-0
k8s.gcr.io/coredns:1.7.0
Write the pull script
cat > /data/script/pull_k8s_images.sh << "EOF"
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail
## Versions to download
KUBE_VERSION=v1.20.15
KUBE_PAUSE_VERSION=3.2
ETCD_VERSION=3.4.13-0
DNS_VERSION=1.7.0
## The original (blocked) registry
GCR_URL=k8s.gcr.io
## The mirror registry to pull from (gotok8s also works)
DOCKERHUB_URL=registry.cn-hangzhou.aliyuncs.com/google_containers
## Image list
images=(
kube-proxy:${KUBE_VERSION}
kube-scheduler:${KUBE_VERSION}
kube-controller-manager:${KUBE_VERSION}
kube-apiserver:${KUBE_VERSION}
pause:${KUBE_PAUSE_VERSION}
etcd:${ETCD_VERSION}
coredns:${DNS_VERSION}
)
## Pull each image from the mirror, retag it to the k8s.gcr.io name that init expects, then remove the mirror tag
for imageName in ${images[@]} ; do
  docker pull $DOCKERHUB_URL/$imageName
  docker tag $DOCKERHUB_URL/$imageName $GCR_URL/$imageName
  docker rmi $DOCKERHUB_URL/$imageName
done
EOF
Push the script to node[1:2]
scp /data/script/pull_k8s_images.sh root@<node-ip>:/data/script/
Grant execute permission
chmod +x /data/script/pull_k8s_images.sh
Run the script
bash /data/script/pull_k8s_images.sh
Check the downloaded images
[root@k8s-master01-15 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy v1.20.1 e3f6fcd87756 14 months ago 118MB
k8s.gcr.io/kube-controller-manager v1.20.1 2893d78e47dc 14 months ago 116MB
k8s.gcr.io/kube-apiserver v1.20.1 75c7f7112080 14 months ago 122MB
k8s.gcr.io/kube-scheduler v1.20.1 4aa0b4397bbb 14 months ago 46.4MB
k8s.gcr.io/etcd 3.4.13-0 0369cf4303ff 18 months ago 253MB
k8s.gcr.io/coredns 1.7.0 bfe3a36ebd25 20 months ago 45.2MB
k8s.gcr.io/pause 3.2 80d28bedfe5d 2 years ago 683kB
Pulling directly from k8s.gcr.io fails with a timeout like the following (ignore this if you never see it)
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.18.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
# i.e. pull the image from the mirror repo first, then retag it to the expected name; the script above automates the whole process. Note that from CoreDNS 1.8.0 the upstream image path changed to coredns/coredns:
docker tag k8s.gcr.io/coredns:1.8.0 k8s.gcr.io/coredns/coredns:v1.8.0
docker rmi k8s.gcr.io/coredns:1.8.0
Initialize the master node (only the master server needs initializing)
# Generate the default init configuration
kubeadm config print init-defaults > kubeadm-config.yaml
Edit the file
vim kubeadm-config.yaml
# The items to modify are called out below
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.23.199.15   # this machine's IP
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-master01-15             # this machine's hostname
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}   # for an HA cluster, also set controlPlaneEndpoint to the virtual IP and haproxy port; skip otherwise
dns:
  type: CoreDNS
etcd:
  # For single-node etcd, keep the two lines below and drop the external section
  # local:
  #   dataDir: /var/lib/etcd
  # Here an external etcd cluster is used (optional)
  # Deployment guide: https://www.jianshu.com/p/fbec19c20454
  external:
    # etcd server addresses
    endpoints:
    - https://172.23.199.15:2379
    - https://172.23.199.16:2379
    - https://172.23.199.17:2379
    # CA certificate generated when the etcd cluster was built
    caFile: /root/TLS/etcd/ca.pem
    # client certificate generated when the etcd cluster was built
    certFile: /root/TLS/etcd/server.pem
    # client key generated when the etcd cluster was built
    keyFile: /root/TLS/etcd/server-key.pem
imageRepository: registry.aliyuncs.com/google_containers   # adjust the image registry to your own situation
kind: ClusterConfiguration
kubernetesVersion: v1.20.15   # match the version installed earlier (check with kubeadm version)
networking:
  dnsDomain: cluster.local
  podSubnet: "10.244.0.0/16"  # add the pod subnet; this fixed value is fine
  serviceSubnet: 10.96.0.0/12
scheduler: {}
# Append the section below as-is
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
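Once the cluster is up, you can confirm that kube-proxy really switched to IPVS mode by listing the kernel's virtual server table (a sketch; ipvsadm was installed with the dependency packages earlier):

```shell
# The proxier mode set in the KubeProxyConfiguration above
PROXY_MODE=ipvs

# In ipvs mode kube-proxy programs kernel IPVS tables, so services from
# the 10.96.0.0/12 range should appear here as virtual servers:
if command -v ipvsadm > /dev/null 2>&1; then
  ipvsadm -Ln
fi
echo "expected kube-proxy mode: $PROXY_MODE"
```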
Deploying an external etcd cluster for k8s (optional)
Guide: https://www.jianshu.com/p/fbec19c20454
Run the init command
kubeadm init --config=kubeadm-config.yaml | tee kubeadm-init.log
# Output on success
....
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join <master-ip>:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:873f80617875dc39a23eced3464c7069689236d460b60692586e7898bf8a254a
If init fails
Troubleshoot from the error message; the usual causes are mistakes in kubeadm-config.yaml: a version mismatch, an unchanged IP address, stray whitespace, and so on.
After fixing the file, rerunning init directly may still fail with "port already in use" or "file already exists" errors:
[root@k8s-node01-15 ~]# kubeadm init --config=kubeadm-config.yaml | tee kubeadm-init.log
W0801 20:05:00.768809 44882 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.6
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING FileExisting-tc]: tc not found in system path
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
The likely cause is a previous init that partially succeeded and was never rolled back; run kubeadm reset first to return the machine to its pre-init state.
[root@k8s-node01-15 ~]# kubeadm reset # or skip the [y/N] prompt with: echo y | kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0801 20:15:00.630170 52554 reset.go:99] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: Get https://<master-ip>:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s: context deadline exceeded
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0801 20:15:00.534409 52554 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
After resetting, rerun the init command above; its output tells you whether it succeeded.
After init succeeds
The tail of the output (also saved in kubeadm-init.log) lists the follow-up steps:
# Run on the master
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
export KUBECONFIG=/etc/kubernetes/admin.conf
# Push to the node{1..X} machines; create /root/.kube on the node first if it does not exist
scp /etc/kubernetes/admin.conf root@$nodeX:/root/.kube/config
Check the nodes; the status shows NotReady
[root@k8s-master01-15 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01-15 NotReady master 20m v1.20.11
Note: the status is NotReady because the CNI network plugin has not been installed yet.
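To bring the nodes to Ready, install a CNI plugin (see the article linked at the top). As one option, flannel's default pod CIDR matches the podSubnet 10.244.0.0/16 set in kubeadm-config.yaml; the manifest URL below is an assumption and may move between flannel releases:

```shell
# Assumed manifest location; check the flannel-io/flannel repo for the current path
FLANNEL_MANIFEST=https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

if command -v kubectl > /dev/null 2>&1; then
  kubectl apply -f "$FLANNEL_MANIFEST"
  # Once the flannel pods are Running, the nodes flip to Ready
  kubectl get pods -n kube-system
  kubectl get nodes
fi
```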
Join the worker nodes to the master (run on the worker servers)
The init output on the master also contains the join command; run it on both worker servers:
kubeadm join <master-ip>:6443 --token XXXXXXXXXXXXXXXXXXX --discovery-token-ca-cert-hash sha256:XXXXXXXXXXXXXXXXXXX
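The token from kubeadm-init.log expires after the ttl set in kubeadm-config.yaml (24h above). If it has expired, a fresh join command can be printed on the master, and the join verified from there:

```shell
# Token ttl from kubeadm-config.yaml; joins fail with a token error after this
TOKEN_TTL=24h0m0s

if command -v kubeadm > /dev/null 2>&1; then
  # Prints a ready-to-copy join command with a new token and the CA cert hash
  kubeadm token create --print-join-command
fi

if command -v kubectl > /dev/null 2>&1; then
  # On the master: the workers should appear here after joining
  # (they stay NotReady until the CNI plugin is installed)
  kubectl get nodes
fi
```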