System environment
OS: CentOS 7.4
Kernel: 4.4.176-1.el7.elrepo.x86_64
Kubernetes environment
kubeadm: 1.13.4
kubelet: 1.13.4
kubectl: 1.13.4
docker: docker-ce-18.09.0-3.el7.x86_64
masters: 3
Upgrading the kernel
CentOS 7.4 ships with a 3.10 kernel by default; we need to upgrade it to 4.4.
- Enable the ELRepo repository on CentOS 7:
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
- List the available kernel packages:
yum --disablerepo="*" --enablerepo="elrepo-kernel" list available
ml (mainline) is the latest kernel; lt is the long-term support kernel.
- Install the lt version:
yum --enablerepo=elrepo-kernel install kernel-lt -y
- Set the default kernel in GRUB:
Open /etc/default/grub and set GRUB_DEFAULT=0, which makes the first entry on the GRUB menu the default kernel.
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=0
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto net.ifnames=0 idle=halt console=tty0 console=ttyS0,115200n8"
GRUB_DISABLE_RECOVERY="true"
- Next, run the following command to regenerate the GRUB configuration:
grub2-mkconfig -o /boot/grub2/grub.cfg
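Before rebooting, you can confirm that the newly installed kernel is the first menu entry (index 0); a quick check against the standard GRUB2 layout, not shown in the original:
awk -F\' '$1=="menuentry " {print $2}' /boot/grub2/grub.cfg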
Reboot the system and check the kernel version:
# reboot
# uname -sr
Linux 4.4.176-1.el7.elrepo.x86_64
Installing docker-ce
Initialize the system configuration
- Download and configure the Aliyun EPEL repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
- Disable the swap partition
My systems were provisioned without swap, so there is nothing to disable here; if yours does have swap, see the sketch after the free output below.
# free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8G        168M        7.2G        396K        492M        7.4G
Swap:            0B          0B          0B
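A minimal sketch of disabling swap permanently, in case your hosts have it enabled (standard commands, not part of the original):
swapoff -a  # turn off all swap devices immediately
sed -i '/^[^#].*\sswap\s/s/^/#/' /etc/fstab  # comment out swap entries so it stays off after reboot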
- Install base packages
yum install -y \
conntrack-tools \
psmisc \
nfs-utils \
jq \
socat \
bash-completion \
rsync \
ipset \
ipvsadm \
lsof \
tree \
telnet
- conntrack-tools # required for ipvs mode
- psmisc # provides the killall command, which the keepalived health-check script uses
- nfs-utils # needed to mount NFS shares (required for NFS-backed PVs)
- jq # lightweight JSON processor; used when querying docker images
- socat # used for port forwarding
- bash-completion # bash command completion; takes effect after logging in again
- rsync # file synchronization tool
- ipset
- ipvsadm
- lsof
- tree
- telnet
- Configure NTP
Clock drift between nodes can cause all sorts of problems, such as etcd peer synchronization failures and api-server trouble.
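If ntpd is not installed yet, install it first (standard CentOS package names; the original assumes they are already present):
yum install -y ntp ntpdate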
systemctl restart ntpd.service && systemctl enable ntpd.service
- Have ntp sync every 10 minutes via cron:
# cat /etc/cron.d/ntp
*/10 * * * * root ntpdate -u 100.100.3.2
On non-Aliyun servers, replace the server address with one of:
ntp1.aliyun.com
ntp2.aliyun.com
ntp3.aliyun.com
ntp4.aliyun.com
ntp5.aliyun.com
ntp6.aliyun.com
ntp7.aliyun.com
Check whether NTP has synchronized:
# ntpstat
synchronised to NTP server (100.100.5.1) at stratum 3
time correct to within 10 ms
polling server every 128 s
- Configure kernel modules to load automatically at boot:
# cat /etc/modules-load.d/10-kubeadm.conf
br_netfilter
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
nf_conntrack
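The file above only takes effect on the next boot; to load the modules immediately and verify (standard commands, not in the original):
for m in br_netfilter ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4 nf_conntrack; do modprobe $m; done
lsmod | egrep 'ip_vs|nf_conntrack|br_netfilter'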
- Disable SELinux:
# cat /etc/sysconfig/selinux
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
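Editing the file only takes effect after a reboot; to stop enforcement immediately as well:
setenforce 0  # permissive for the current boot
getenforce    # should now report Permissive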
- Set CentOS ulimits
# cat /etc/security/limits.d/30-kubeadm.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
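The original shows no sysctl settings, but kubeadm's preflight checks require bridge netfilter and IP forwarding to be enabled; a minimal sketch (the file name 10-kubeadm.conf is my own choice):
# cat /etc/sysctl.d/10-kubeadm.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
# sysctl --system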
- Configure the docker daemon
# cat /etc/docker/daemon.json
{
"registry-mirrors": ["http://xxxxxxx.m.daocloud.io"],
"max-concurrent-downloads": 10,
"log-driver": "json-file",
"log-level": "warn",
"log-opts": {
"max-size": "100m",
"max-file": "3"
},
"data-root": "/var/lib/docker"
}
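The original does not show the docker-ce installation step itself; assuming docker-ce 18.09.0 from the environment list is installed, restart docker so daemon.json takes effect, then verify:
systemctl restart docker && systemctl enable docker
docker info | grep -E 'Logging Driver|Docker Root Dir'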
- Configure the Aliyun Kubernetes repo
# cat /etc/yum.repos.d/sz-kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
- Install the Kubernetes packages
yum install kubeadm-1.13.4 kubelet-1.13.4 kubectl-1.13.4 -y
- Enable kubelet at boot
systemctl enable kubelet.service
- Enable kubectl auto-completion (note the append operator, >>, so ~/.bashrc is not overwritten):
echo 'source <(kubectl completion bash)' >> ~/.bashrc
Deploying the Kubernetes cluster with 3 masters
- Determine docker's Cgroup Driver
$ docker info|grep 'Cgroup Driver'
Cgroup Driver: cgroupfs
By default, kubeadm also initializes the kubelet with the cgroupfs driver. If docker reports systemd instead, either change the kubeadm init configuration or edit docker's daemon.json so the two match. After deployment you can do a quick check with:
$ ps -ef |grep cgroup
xiaomai 5909 3736 0 09:23 pts/0 00:00:00 grep --color=auto cgroup
root 24546 1 3 3月05 ? 09:12:08 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --hostname-override=10.1.65.65 --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.1
Edit the kubeadm init configuration (kubeadm-init.conf):
apiVersion: kubeadm.k8s.io/v1beta1
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 72h
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: {local listen address}
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: {node name, as shown by kubectl get nodes}
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta1
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: "{preferably a domain name here, proxied by haproxy}:9443"
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.13.4
networking:
  dnsDomain: cluster.local
  podSubnet: "10.244.0.0/16"  # 10.244.0.0/16 because we use flannel for the pod network
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
#pod-infra-container-image: registry.aliyuncs.com/google_containers/pause:3.1
maxPods: 90
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  SupportIPVSProxyMode: true
metricsBindAddress: 0.0.0.0:10249
mode: ipvs
- You can print kubeadm's default init configuration with:
kubeadm config print init-defaults
The output looks like this:
apiVersion: kubeadm.k8s.io/v1beta1
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: prod-master001
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta1
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: ""
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.13.0
networking:
  dnsDomain: cluster.local
  podSubnet: ""
  serviceSubnet: 10.96.0.0/12
scheduler: {}
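Optionally, pre-pull the control-plane images before initializing, so the init step itself goes faster (kubeadm supports this; the step is not shown in the original):
kubeadm config images pull --config kubeadm-init.conf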
- Initialize the first master node:
kubeadm init --config kubeadm-init.conf
.............
.............
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 22.011339 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "10.1.65.78" as an annotation
[mark-control-plane] Marking the node 10.1.65.78 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node 10.1.65.78 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: abcdef.0123456789abcdef
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join xxxxxxxxxxxxx:9443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:d6dfc0881eb2f7f62ce6d8270b82eb52e25bf1f24c3d88976b34da0ad83935c0
- Prepare to initialize the second and third master nodes:
You can refer to the official documentation.
On master-1, copy the certificates to the other two nodes (passwordless SSH to them must be set up first; see the sketch after the script):
# cat copy-certs.sh
#!/bin/bash
USER=root # customizable
CONTROL_PLANE_IPS="10.1.65.77 10.1.65.76"
#CONTROL_PLANE_IPS="10.1.65.77"
for host in ${CONTROL_PLANE_IPS}; do
scp /etc/kubernetes/pki/ca.crt "${USER}"@$host:
scp /etc/kubernetes/pki/ca.key "${USER}"@$host:
scp /etc/kubernetes/pki/sa.key "${USER}"@$host:
scp /etc/kubernetes/pki/sa.pub "${USER}"@$host:
scp /etc/kubernetes/pki/front-proxy-ca.crt "${USER}"@$host:
scp /etc/kubernetes/pki/front-proxy-ca.key "${USER}"@$host:
scp /etc/kubernetes/pki/etcd/ca.crt "${USER}"@$host:etcd-ca.crt
scp /etc/kubernetes/pki/etcd/ca.key "${USER}"@$host:etcd-ca.key
scp /etc/kubernetes/admin.conf "${USER}"@$host:
done
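A minimal sketch of the passwordless SSH setup the script relies on (standard OpenSSH commands; IPs taken from the script above):
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa  # on master-1, generate a key pair without a passphrase
ssh-copy-id root@10.1.65.77               # install the public key on master-2
ssh-copy-id root@10.1.65.76               # install the public key on master-3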
Run the following script on master-2 and master-3:
#!/bin/bash
# Move the copied multi-master certificates into place
USER=root # customizable
mkdir -p /etc/kubernetes/pki/etcd
mv /${USER}/ca.crt /etc/kubernetes/pki/
mv /${USER}/ca.key /etc/kubernetes/pki/
mv /${USER}/sa.pub /etc/kubernetes/pki/
mv /${USER}/sa.key /etc/kubernetes/pki/
mv /${USER}/front-proxy-ca.crt /etc/kubernetes/pki/
mv /${USER}/front-proxy-ca.key /etc/kubernetes/pki/
mv /${USER}/etcd-ca.crt /etc/kubernetes/pki/etcd/ca.crt
mv /${USER}/etcd-ca.key /etc/kubernetes/pki/etcd/ca.key
mv /${USER}/admin.conf /etc/kubernetes/admin.conf
On master-2, run:
kubeadm join xxxxxxxxxxxxx:9443 \
--token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:d6dfc0881eb2f7f62ce6d8270b82eb52e25bf1f24c3d88976b34da0ad83935c0 \
--node-name 10.1.65.77 \
--experimental-control-plane
On master-3, run:
kubeadm join xxxxxxxxxxxxx:9443 \
--token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:d6dfc0881eb2f7f62ce6d8270b82eb52e25bf1f24c3d88976b34da0ad83935c0 \
--node-name 10.1.65.76 \
--experimental-control-plane
- Deploy the flannel network plugin from master-1
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
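You can then watch flannel and the other kube-system pods come up (a routine check, not part of the original):
kubectl get pods -n kube-system -o wide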
- Check node status
Run:
kubectl get nodes -o wide
Output:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
10.1.65.76 Ready master 52m v1.13.4 10.1.65.76 <none> CentOS Linux 7 (Core) 4.20.13-1.el7.elrepo.x86_64 docker://18.9.0
10.1.65.77 Ready master 53m v1.13.4 10.1.65.77 <none> CentOS Linux 7 (Core) 4.20.13-1.el7.elrepo.x86_64 docker://18.9.0
10.1.65.78 Ready master 56m v1.13.4 10.1.65.78 <none> CentOS Linux 7 (Core) 4.20.13-1.el7.elrepo.x86_64 docker://18.9.0
Check the health of the etcd cluster:
docker run --rm -it \
--net host \
-v /etc/kubernetes:/etc/kubernetes registry.aliyuncs.com/google_containers/etcd:3.2.24 etcdctl \
--cert-file /etc/kubernetes/pki/etcd/peer.crt \
--key-file /etc/kubernetes/pki/etcd/peer.key \
--ca-file /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://10.1.65.78:2379 cluster-health
member 62f2a35688e5547d is healthy: got healthy result from https://10.1.65.77:2379
member b84d28b4d55e3d72 is healthy: got healthy result from https://10.1.65.78:2379
member cae2610c789ed2c3 is healthy: got healthy result from https://10.1.65.76:2379
cluster is healthy
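Since kube-proxy was configured in ipvs mode, you can also quickly confirm the virtual servers it created (ipvsadm was installed earlier; this check is not in the original):
ipvsadm -Ln  # lists the ipvs virtual servers, e.g. the kubernetes service VIP on 10.96.0.1:443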
At this point the master deployment is complete.
- Deploy worker nodes:
Before joining a node, complete the same kernel upgrade and system initialization as on the masters; if you have the time, turn the procedure into an ansible playbook to save yourself the labor. Once the prerequisites are in place, the rest is simple: write the following kubeadm-join.yaml and run a single command.
apiVersion: kubeadm.k8s.io/v1beta1
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
    apiServerEndpoint: {API server address}:9443
    token: abcdef.0123456789abcdef
    unsafeSkipCAVerification: true
  timeout: 5m0s
  tlsBootstrapToken: abcdef.0123456789abcdef
kind: JoinConfiguration
nodeRegistration:
  name: 10.1.65.85
  criSocket: /var/run/dockershim.sock
---
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta1
imageRepository: registry.aliyuncs.com/google_containers
apiServer:
  timeoutForControlPlane: 4m0s
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
#pod-infra-container-image: registry.aliyuncs.com/google_containers/pause:3.1
maxPods: 90
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  SupportIPVSProxyMode: true
mode: ipvs
Run the join command:
kubeadm join --config kubeadm-join.yaml
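Back on one of the masters, confirm that the new node has registered (a routine check, not shown in the original):
kubectl get nodes -o wide | grep 10.1.65.85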