1. Overview
Deploy Kubernetes (k8s) on Ubuntu 20.04.
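2. Comparison of k8s installation tools
Because of the inherent internal complexity of k8s, deploying with kubeadm is cumbersome, although it offers considerable flexibility.
This section compares several deployment tools aimed at small and medium clusters: MicroK8s, minikube, and k3s.
MicroK8s was chosen in the end. It is easier to install on Ubuntu and, unlike minikube, which by default runs the cluster inside a VM or container, it runs directly on the host, so GPU support is also simpler.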
- MicroK8s, maintained by Canonical (the company behind Ubuntu)
https://github.com/canonical/microk8s
https://ubuntu.com/kubernetes/install
- minikube, maintained by the Kubernetes project
https://github.com/kubernetes/minikube
https://minikube.sigs.k8s.io/docs/
- k3s, maintained by Rancher
https://github.com/rancher/k3s
https://github.com/k3s-io/k3s
- MicroK8s vs K3s vs minikube
https://microk8s.io/compare
3. Installation steps
3.1. Install microk8s via snap
sudo snap install microk8s --classic --channel=1.24/stable
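During installation some k8s images have to be downloaded. If the network gets in the way, see section 4.
- The k8s versions available through snap are listed below; choose one as needed.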
$ snap info microk8s
...
channels:
1.31/stable: v1.31.3 2024-12-03 (7449) 168MB classic
1.31/candidate: v1.31.4 2024-12-17 (7514) 168MB classic
1.31/beta: v1.31.4 2024-12-17 (7514) 168MB classic
1.31/edge: v1.31.4 2024-12-10 (7514) 168MB classic
latest/stable: v1.32.0 2025-01-12 (7537) 171MB classic
latest/candidate: v1.32.0 2024-12-14 (7548) 171MB classic
latest/beta: v1.32.0 2024-12-14 (7548) 171MB classic
latest/edge: v1.32.0 2025-01-14 (7583) 171MB classic
1.32-strict/stable: v1.32.0 2024-12-12 (7549) 171MB -
1.32-strict/candidate: v1.32.0 2024-12-12 (7549) 171MB -
1.32-strict/beta: v1.32.0 2024-12-12 (7549) 171MB -
1.32-strict/edge: v1.32.0 2024-12-12 (7559) 171MB -
1.32/stable: v1.32.0 2024-12-12 (7537) 171MB classic
1.32/candidate: v1.32.0 2024-12-12 (7537) 171MB classic
1.32/beta: v1.32.0 2024-12-12 (7537) 171MB classic
1.32/edge: v1.32.0 2025-01-14 (7584) 171MB classic
...
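The channels are also described in the official documentation: snap microk8s channels
https://microk8s.io/docs/setting-snap-channel
- To switch to a different k8s version (Kubernetes upgrades are supported one minor version at a time, so step through intermediate channels rather than skipping them):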
sudo snap refresh microk8s --channel=1.26/stable
- Alternatively, reinstall microk8s:
sudo snap remove microk8s
sudo snap install microk8s --classic --channel=1.24/stable
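To pick a channel without scanning the full listing, the output of snap info can be filtered for the stable channels. The sketch below is one possible way to do that. The function names stable_channels and install_microk8s are hypothetical helpers, not part of snap or MicroK8s, and the awk filter assumes the channel lines keep the "track/risk: version date ..." layout shown in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative only.

# Read `snap info microk8s` output on stdin and print "<channel> <version>"
# for every stable channel, e.g. "1.31/stable v1.31.3".
stable_channels() {
  awk '$1 ~ /\/stable:$/ { sub(/:$/, "", $1); print $1, $2 }'
}

# Install MicroK8s from the given channel and wait until it reports ready.
install_microk8s() {
  local channel="${1:?usage: install_microk8s <channel>}"
  sudo snap install microk8s --classic --channel="$channel"
  # --wait-ready blocks until the node reports that it is running.
  sudo microk8s status --wait-ready
}

# Usage (not executed here):
#   snap info microk8s | stable_channels
#   install_microk8s 1.24/stable
```

Keeping the filter separate from the install step makes it easy to review the available channels before anything is changed on the host.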
3.2. Enable the required addons
microk8s enable dns ingress dashboard
microk8s enable gpu registry
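After enabling the addons it can help to wait for the cluster to settle and then check whether the node advertises a GPU to the scheduler. This is a minimal sketch, assuming the gpu addon exposes the device under the usual NVIDIA device plugin resource name nvidia.com/gpu. The function name gpu_capacity is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each node with its allocatable GPU count.
# Assumption: GPUs are advertised as the extended resource "nvidia.com/gpu".
# The GPU column shows <none> when no GPU is advertised.
gpu_capacity() {
  microk8s kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}

# Usage (not executed here):
#   microk8s status --wait-ready
#   gpu_capacity
```

The GPU operator pods can take several minutes to finish their validation jobs, so an empty GPU column right after enabling the addon does not necessarily indicate a failure.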
3.3. Check the running state
Common commands
# Start and stop
microk8s start
microk8s stop
sudo snap restart microk8s
# Check service status
microk8s status
# If k8s fails to start, look for the cause (a system service that fails to start may require changing some system parameters)
microk8s inspect
sudo journalctl -u snap.microk8s.daemon-kubelite.service -r
# List the resources running in k8s, such as pods, deployments, and services
microk8s kubectl get all --all-namespaces
# Find out why a pod failed to start; focus on the Events section (a failed image pull is a common cause)
microk8s kubectl describe pod --all-namespaces
# List the containerd images and containers used by k8s via ctr
microk8s ctr i ls
microk8s ctr c ls
# Node management
microk8s add-node
microk8s remove-node
microk8s join
microk8s leave
microk8s kubectl get nodes
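When many pods are running, describing all of them produces a lot of output. The sketch below narrows the search first: it filters the pod listing down to pods that are neither Running nor Completed, then prints only the Events section for one pod. The function names unhealthy_pods and pod_events are hypothetical, and the filter assumes the default column order NAMESPACE NAME READY STATUS RESTARTS AGE.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative only.

# Read `kubectl get pods --all-namespaces --no-headers` output on stdin and
# print "<namespace>/<pod> <status>" for pods that are not Running or Completed.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Print only the Events section of a single pod, which is usually where
# image pull errors show up.
pod_events() {
  local ns="${1:?usage: pod_events <namespace> <pod>}"
  local pod="${2:?usage: pod_events <namespace> <pod>}"
  microk8s kubectl describe pod -n "$ns" "$pod" | sed -n '/^Events:/,$p'
}

# Usage (not executed here):
#   microk8s kubectl get pods --all-namespaces --no-headers | unhealthy_pods
#   pod_events kube-system coredns-66bcf65bb8-zptkw
```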
Command Example
$ microk8s status
microk8s is running
high-availability: no
datastore master nodes: 127.0.0.1:19001
datastore standby nodes: none
addons:
enabled:
dashboard # (core) The Kubernetes dashboard
dns # (core) CoreDNS
gpu # (core) Automatic enablement of Nvidia CUDA
ha-cluster # (core) Configure high availability on the current node
helm3 # (core) Helm 3 - Kubernetes package manager
hostpath-storage # (core) Storage class; allocates storage from host directory
ingress # (core) Ingress controller for external access
metrics-server # (core) K8s Metrics Server for API access to service metrics
registry # (core) Private image registry exposed on localhost:32000
storage # (core) Alias to hostpath-storage add-on, deprecated
disabled:
community # (core) The community addons repository
helm # (core) Helm 2 - the package manager for Kubernetes
host-access # (core) Allow Pods connecting to Host services smoothly
mayastor # (core) OpenEBS MayaStor
metallb # (core) Loadbalancer for your Kubernetes cluster
prometheus # (core) Prometheus operator for monitoring and logging
rbac # (core) Role-Based Access Control for authorisation
$ microk8s kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/dashboard-metrics-scraper-6b6f796c8d-knzss 1/1 Running 0 98m
kube-system pod/kubernetes-dashboard-765646474b-66bbr 1/1 Running 0 98m
kube-system pod/calico-node-tfh5v 1/1 Running 0 146m
kube-system pod/calico-kube-controllers-869f4694cd-mddbp 1/1 Running 0 146m
kube-system pod/coredns-66bcf65bb8-zptkw 1/1 Running 0 96m
kube-system pod/metrics-server-5f8f64cb86-vpx8p 1/1 Running 0 99m
kube-system pod/hostpath-provisioner-78cb89d65b-7wmtq 1/1 Running 0 95m
ingress pod/nginx-ingress-microk8s-controller-5f7xx 1/1 Running 0 81m
gpu-operator-resources pod/gpu-operator-node-feature-discovery-worker-lgk7f 1/1 Running 0 80m
gpu-operator-resources pod/gpu-operator-node-feature-discovery-master-84c7c7c6cf-cjdl7 1/1 Running 0 80m
gpu-operator-resources pod/gpu-operator-6d7dc7cfc-h8mk9 1/1 Running 0 80m
container-registry pod/registry-f69889b8c-fs8fs 1/1 Running 0 95m
kube-system pod/hostpath-provisioner-yfzy-nf5468m5-v56hn 0/1 Completed 0 94m
gpu-operator-resources pod/nvidia-container-toolkit-daemonset-75l2w 1/1 Running 0 74m
gpu-operator-resources pod/nvidia-cuda-validator-6phzn 0/1 Completed 0 71m
gpu-operator-resources pod/nvidia-device-plugin-daemonset-grmb8 1/1 Running 0 74m
gpu-operator-resources pod/nvidia-device-plugin-validator-wndtw 0/1 Completed 0 69m
gpu-operator-resources pod/nvidia-operator-validator-v4jlf 1/1 Running 0 74m
gpu-operator-resources pod/nvidia-dcgm-exporter-h5nbk 1/1 Running 0 74m
gpu-operator-resources pod/gpu-feature-discovery-s6vxq 1/1 Running 0 74m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 146m
kube-system service/metrics-server ClusterIP 10.152.183.241 <none> 443/TCP 99m
kube-system service/kubernetes-dashboard ClusterIP 10.152.183.186 <none> 443/TCP 99m
kube-system service/dashboard-metrics-scraper ClusterIP 10.152.183.212 <none> 8000/TCP 99m
kube-system service/kube-dns ClusterIP 10.152.183.10 <none> 53/UDP,53/TCP,9153/TCP 96m
container-registry service/registry NodePort 10.152.183.37 <none> 5000:32000/TCP 95m
gpu-operator-resources service/gpu-operator-node-feature-discovery-master ClusterIP 10.152.183.218 <none> 8080/TCP 80m
gpu-operator-resources service/gpu-operator ClusterIP 10.152.183.105 <none> 8080/TCP 74m
gpu-operator-resources service/nvidia-dcgm-exporter ClusterIP 10.152.183.13 <none> 9400/TCP 74m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/calico-node 1 1 1 1 1 kubernetes.io/os=linux 146m
ingress daemonset.apps/nginx-ingress-microk8s-controller 1 1 1 1 1 <none> 81m
gpu-operator-resources daemonset.apps/gpu-operator-node-feature-discovery-worker 1 1 1 1 1 <none> 80m
gpu-operator-resources daemonset.apps/nvidia-mig-manager 0 0 0 0 0 nvidia.com/gpu.deploy.mig-manager=true 74m
gpu-operator-resources daemonset.apps/nvidia-container-toolkit-daemonset 1 1 1 1 1 nvidia.com/gpu.deploy.container-toolkit=true 74m
gpu-operator-resources daemonset.apps/nvidia-device-plugin-daemonset 1 1 1 1 1 nvidia.com/gpu.deploy.device-plugin=true 74m
gpu-operator-resources daemonset.apps/nvidia-operator-validator 1 1 1 1 1 nvidia.com/gpu.deploy.operator-validator=true 74m
gpu-operator-resources daemonset.apps/nvidia-dcgm-exporter 1 1 1 1 1 nvidia.com/gpu.deploy.dcgm-exporter=true 74m
gpu-operator-resources daemonset.apps/gpu-feature-discovery 1 1 1 1 1 nvidia.com/gpu.deploy.gpu-feature-discovery=true 74m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/calico-kube-controllers 1/1 1 1 146m
kube-system deployment.apps/metrics-server 1/1 1 1 99m
kube-system deployment.apps/dashboard-metrics-scraper 1/1 1 1 99m
kube-system deployment.apps/kubernetes-dashboard 1/1 1 1 99m
kube-system deployment.apps/coredns 1/1 1 1 96m
kube-system deployment.apps/hostpath-provisioner 1/1 1 1 95m
gpu-operator-resources deployment.apps/gpu-operator-node-feature-discovery-master 1/1 1 1 80m
gpu-operator-resources deployment.apps/gpu-operator 1/1 1 1 80m
container-registry deployment.apps/registry 1/1 1 1 95m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/calico-kube-controllers-869f4694cd 1 1 1 146m
kube-system replicaset.apps/metrics-server-5f8f64cb86 1 1 1 99m
kube-system replicaset.apps/dashboard-metrics-scraper-6b6f796c8d 1 1 1 98m
kube-system replicaset.apps/kubernetes-dashboard-765646474b 1 1 1 98m
kube-system replicaset.apps/coredns-66bcf65bb8 1 1 1 96m
kube-system replicaset.apps/hostpath-provisioner-78cb89d65b 1 1 1 95m
gpu-operator-resources replicaset.apps/gpu-operator-node-feature-discovery-master-84c7c7c6cf 1 1 1 80m
gpu-operator-resources replicaset.apps/gpu-operator-6d7dc7cfc 1 1 1 80m
container-registry replicaset.apps/registry-f69889b8c 1 1 1 95m
$ microk8s inspect
Inspecting system
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-kubelite is running
Service snap.microk8s.daemon-k8s-dqlite is running
Service snap.microk8s.daemon-apiserver-kicker is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Inspecting dqlite
Inspect dqlite
WARNING: IPtables FORWARD policy is DROP. Consider enabling traffic forwarding with: sudo iptables -P FORWARD ACCEPT
The change can be made persistent with: sudo apt-get install iptables-persistent
Building the report tarball
Report tarball is at /var/snap/microk8s/5872/inspection-report-20250115_194011.tar.gz
$ snap services
Service Startup Current Notes
microk8s.daemon-apiserver enabled inactive -
microk8s.daemon-apiserver-kicker enabled active -
microk8s.daemon-cluster-agent enabled active -
microk8s.daemon-containerd enabled active -
microk8s.daemon-control-plane-kicker enabled inactive -
microk8s.daemon-controller-manager enabled inactive -
microk8s.daemon-etcd enabled inactive -
microk8s.daemon-flanneld enabled inactive -
microk8s.daemon-k8s-dqlite enabled active -
microk8s.daemon-kubelet enabled inactive -
microk8s.daemon-kubelite enabled active -
microk8s.daemon-proxy enabled inactive -
microk8s.daemon-scheduler enabled inactive -
microk8s.daemon-traefik enabled inactive -
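4. Network proxy settings
For various reasons, images from Docker Hub and other overseas registries cannot be downloaded from within mainland China, so a network proxy is needed.
Instead of deploying clash or a VPN service on the server, requests to port 7890 on the server (the k8s host) are forwarded to the local machine.
- Forward 127.0.0.1:7890 on the remote server gpu6 to 127.0.0.1:7890 on the local machine (run this on the local machine):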
ssh -N -R 127.0.0.1:7890:127.0.0.1:7890 gpu6
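- Run in the current ssh session on the server (this may not help, because it only affects commands started from this shell, not the snapd or containerd services):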
export https_proxy=http://127.0.0.1:7890 http_proxy=http://127.0.0.1:7890 all_proxy=socks5://127.0.0.1:7890
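- Append the following to /var/snap/microk8s/current/args/containerd-env, then restart microk8s (for example with sudo snap restart microk8s) so that containerd picks up the change: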
HTTPS_PROXY=http://127.0.0.1:7890
NO_PROXY=10.0.0.0/8,192.168.0.0/16,127.0.0.1,172.16.0.0/16,.svc,localhost
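Appending by hand is easy to repeat by accident, which leaves duplicate entries in the file. The sketch below shows one way to make the edit idempotent. The function name append_proxy_env is hypothetical, and the proxy address and NO_PROXY list are simply the values used in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append proxy settings to a containerd-env style file
# only when they are not already present, so re-running it is harmless.
append_proxy_env() {
  local env_file="${1:?usage: append_proxy_env <file> [proxy-url]}"
  local proxy="${2:-http://127.0.0.1:7890}"
  local bypass="10.0.0.0/8,192.168.0.0/16,127.0.0.1,172.16.0.0/16,.svc,localhost"
  grep -q '^HTTPS_PROXY=' "$env_file" 2>/dev/null || echo "HTTPS_PROXY=$proxy" >> "$env_file"
  grep -q '^NO_PROXY=' "$env_file" 2>/dev/null || echo "NO_PROXY=$bypass" >> "$env_file"
}

# Usage (not executed here; run as root because the file belongs to the snap):
#   append_proxy_env /var/snap/microk8s/current/args/containerd-env
#   snap restart microk8s
```

The helper never rewrites an existing HTTPS_PROXY or NO_PROXY line, so a value that was set earlier has to be changed by hand.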
5. Accessing the dashboard
5.1. Via the dashboard-proxy command
microk8s dashboard-proxy
5.2. Create a service of type NodePort
apiVersion: v1
kind: Service
metadata:
  name: my-kubernetes-dashboard-svc
  namespace: kube-system
spec:
  selector:
    k8s-app: kubernetes-dashboard
  ports:
    - protocol: TCP
      port: 8443
      targetPort: 8443
      nodePort: 31212
  type: NodePort
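- Open https://dev1.iipharma.cn:31212 in Firefox.
Alternatively, modify the cluster's default kubernetes-dashboard service (it is of type ClusterIP by default, so it can only be reached from inside the cluster).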
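Both options in 5.2 can be carried out from the command line. The sketch below is illustrative only: the function names are hypothetical, and the manifest file name is whatever the NodePort service definition was saved as. Patching the built-in service without setting nodePort lets Kubernetes pick a free port from the NodePort range.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative only.

# Apply a manifest file containing the NodePort service and show the result.
apply_dashboard_nodeport() {
  local manifest="${1:?usage: apply_dashboard_nodeport <manifest.yaml>}"
  microk8s kubectl apply -f "$manifest"
  microk8s kubectl get service -n kube-system my-kubernetes-dashboard-svc
}

# Alternative: switch the built-in kubernetes-dashboard service from
# ClusterIP to NodePort in place.
patch_dashboard_to_nodeport() {
  microk8s kubectl patch service kubernetes-dashboard -n kube-system \
    -p '{"spec": {"type": "NodePort"}}'
}

# Usage (not executed here):
#   apply_dashboard_nodeport dashboard-nodeport.yaml
#   patch_dashboard_to_nodeport
#   microk8s kubectl get service -n kube-system kubernetes-dashboard
```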
5.3. Via kubectl port-forward
microk8s kubectl port-forward -n kube-system --address 0.0.0.0 service/kubernetes-dashboard 10443:443
- Open https://dev1.iipharma.cn:10443 in Firefox.
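5.4. Create a Dashboard Ingress
An Ingress only configures routing; to change which ports are listened on at the cluster entry point, the ingress controller itself has to be modified.
This approach is more involved, and my attempt did not succeed.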
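For reference, an Ingress for the dashboard might look roughly like the untested sketch below. It does not come from the setup described here and rests on several assumptions: that the MicroK8s ingress addon registers the ingress class public, that the controller honours the nginx backend-protocol annotation, and that access on the controller's standard HTTPS port is acceptable instead of a custom port. The function name and the Ingress name dashboard-ingress are hypothetical.

```shell
#!/usr/bin/env bash
# Untested sketch: route a host name to the dashboard through the ingress addon.
# Assumptions: ingress class "public"; dashboard service serves HTTPS on port 443.
create_dashboard_ingress() {
  local host="${1:?usage: create_dashboard_ingress <hostname>}"
  microk8s kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dashboard-ingress
  namespace: kube-system
  annotations:
    # The dashboard serves TLS, so the controller must use HTTPS to reach it.
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: public
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kubernetes-dashboard
                port:
                  number: 443
EOF
}

# Usage (not executed here):
#   create_dashboard_ingress dev1.iipharma.cn
```

If the ingress class differs on a given MicroK8s release, microk8s kubectl get ingressclass lists the names that are actually registered.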
6. Miscellaneous
6.1. Configure kubectl
Manage the k8s cluster with the locally installed kubectl instead of calling microk8s kubectl directly.
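cd $HOME
# -p avoids an error if ~/.kube already exists
mkdir -p .kube
cd .kube
# Note: this overwrites any existing ~/.kube/config
microk8s config > config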
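The generated file contains cluster credentials, so it is worth restricting its permissions. The sketch below wraps the steps above in a function and adds that restriction. The function name write_kubeconfig is hypothetical, and the usage lines assume a standalone kubectl is already installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the MicroK8s kubeconfig to <dir>/config
# (default: ~/.kube/config) and make it readable by the owner only.
# Note: an existing config file in that directory is overwritten.
write_kubeconfig() {
  local dir="${1:-$HOME/.kube}"
  mkdir -p "$dir"
  microk8s config > "$dir/config"
  chmod 600 "$dir/config"
}

# Usage (not executed here):
#   write_kubeconfig
#   kubectl get nodes
```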
6.2. Configure ctr
Manage the cluster's containerd with the locally installed ctr instead of calling microk8s ctr directly.
- Specify the containerd address with -a:
$ ctr -a /var/snap/microk8s/common/run/containerd.sock ns ls
NAME LABELS
k8s.io
$ ctr -a /var/snap/microk8s/common/run/containerd.sock -n k8s.io c ls
CONTAINER IMAGE RUNTIME
002c3a51e352a0794cb898ffd7a428e0ea268f104bdc491a9e6e048ef89f75cb k8s.gcr.io/pause:3.1 io.containerd.runc.v2
033983d9aa9a7e7ab8cf16d52f029c3a3301af541cc0b3b82f5257c41e458c51 k8s.gcr.io/metrics-server/metrics-server:v0.5.2 io.containerd.runc.v2
072a72b3de932103cbc3ace07505efd0bdf1db5c0efc3a95fcce16ab231cdc23 nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.11.0 io.containerd.runc.v2
...
- Alternatively, set an environment variable:
export CONTAINERD_ADDRESS=/var/snap/microk8s/common/run/containerd.sock
$ ctr -n k8s.io i ls
...