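Our company runs Rancher as a single-node Docker deployment and uses Rancher to operate our k8s clusters. One day the Rancher UI would not open, and none of the deployed containers were usable either. I went straight to the server and found that the rancher container still existed, but trying to exec into it failed with cannot exec in a stopped state: unknown. Viewing the Rancher logs still worked, though: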
E0712 15:47:03.730752 6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.ProjectCatalog: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/projectcatalogs?allowWatchBookmarks=true&resourceVersion=155367341&timeout=30m0s&timeoutSeconds=574: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:03.730790 6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.Catalog: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/catalogs?allowWatchBookmarks=true&resourceVersion=155367339&timeout=30m0s&timeoutSeconds=404: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:02.947639 6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.KontainerDriver: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/kontainerdrivers?allowWatchBookmarks=true&resourceVersion=155367345&timeout=30m0s&timeoutSeconds=481: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:03.730823 6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.Pipeline: Get https://127.0.0.1:6443/apis/project.cattle.io/v3/watch/pipelines?allowWatchBookmarks=true&resourceVersion=155367348&timeout=30m0s&timeoutSeconds=568: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:03.730842 6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v1.Namespace: Get https://127.0.0.1:6443/api/v1/watch/namespaces?allowWatchBookmarks=true&resourceVersion=155367325&timeout=30m0s&timeoutSeconds=449: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:02.947667 6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.RKEK8sSystemImage: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/rkek8ssystemimages?allowWatchBookmarks=true&resourceVersion=155367347&timeout=30m0s&timeoutSeconds=504: dial tcp 127.0.0.1:6443: connect: connection refused
2021/07/12 15:47:03 [FATAL] k3s exited with: exit status 255
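When the log is long, the two signals in the output above (the repeated connection refused errors and the final [FATAL] line) can be pulled out with a small filter. This is only a sketch: the function name summarize_rancher_log is made up for illustration, and the container name rancher is assumed from the restart command used later in this post.

```shell
# Reads Rancher log text on stdin and prints a two-line summary:
# how many "connection refused" lines there were, and the last [FATAL] line.
# (summarize_rancher_log is a hypothetical helper name.)
summarize_rancher_log() {
  awk '
    /connect: connection refused/ { refused++ }
    /\[FATAL\]/                   { fatal = $0 }
    END {
      printf "connection refused lines: %d\n", refused
      if (fatal != "") print "last fatal: " fatal
    }
  '
}

# Possible usage, assuming the container is named "rancher":
# docker logs --tail 200 rancher 2>&1 | summarize_rancher_log
```

docker logs writes the container's stderr to its own stderr, hence the 2>&1 in the usage line.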
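The log shows k3s exited with: exit status 255 together with 127.0.0.1:6443: connect: connection refused. Since 6443 is the kube-apiserver port, I guessed the problem was in the k8s cluster. I then searched for exit status 255 and found this on GitHub: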
Source: https://github.com/rancher/rancher/issues/22841
There is a reply further down in that issue (I read it using Chrome's automatic translation).
Another post, together with one of the replies under it, looked quite similar to my situation:
Source: https://forums.cnrancher.com/q_988.html
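My guess was that k3s had crashed, so I rebooted the machine. After the reboot k3s was running normally, but Rancher had not started, so I restarted the rancher Docker container: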
docker restart rancher
It turned out that port 443 was already in use, so I looked up the process occupying it:
netstat -tunlp|grep 443
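One caveat with grep 443 is that it also matches any line containing 6443 (the kube-apiserver port seen in the log), so the output can be noisy. A stricter filter compares the port part of the local address exactly. This is a sketch that assumes the usual net-tools netstat column layout, where the fourth column is the local address; listeners_on_port is a hypothetical helper name.

```shell
# Reads `netstat -tunlp` output on stdin and prints only the sockets whose
# local port equals the given port exactly (so 443 does not match 6443).
# Assumes column 4 is "Local Address", e.g. 0.0.0.0:443 or :::443.
listeners_on_port() {
  awk -v port="$1" '
    $1 ~ /^(tcp|udp)/ {
      n = split($4, parts, ":")      # the port is the last ":"-separated field
      if (parts[n] == port) print
    }
  '
}

# Possible usage:
# netstat -tunlp | listeners_on_port 443
```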
It was nginx, but nginx had never been installed on this machine, so I used the PID to find where this nginx was running from:
cd /proc/92922/cwd
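Besides cwd, a few other entries under /proc/<pid> can help identify an unfamiliar process on a Linux host. The sketch below prints the executable path, the working directory, and the cgroup membership; for a process running inside a Docker container, the cgroup paths seen from the host often include the container ID, though that depends on the cgroup driver and version. describe_pid is a hypothetical helper name.

```shell
# Prints where a process is running from, using the Linux /proc filesystem.
# Reading another user's process usually requires root.
describe_pid() {
  pid="$1"
  echo "exe: $(readlink "/proc/$pid/exe")"   # path of the running binary
  echo "cwd: $(readlink "/proc/$pid/cwd")"   # current working directory
  # For containerised processes these paths often contain the container ID.
  cat "/proc/$pid/cgroup" 2>/dev/null
}

# Possible usage with the PID found above:
# describe_pid 92922
```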
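There was an nginx configuration there. Opening nginx.conf showed a lot of ingress-controller configuration, so I guessed that this nginx belonged to the ingress-controller container and inspected that container: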
docker inspect ingress_controller
Ports mapped by ingress_controller
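The same question can also be asked the other way round: which running container publishes a given host port? A rough sketch, assuming the Ports column of docker ps uses its usual host:port->container_port/proto form; containers_publishing_port is a hypothetical helper name.

```shell
# Reads lines of the form "<name> <ports>" on stdin and prints the names of
# containers that publish the given host port.
# Matching on ":<port>->" means 443 does not match 6443.
containers_publishing_port() {
  awk -v needle=":$1->" 'index($0, needle) > 0 { print $1 }'
}

# Possible usage:
# docker ps --format '{{.Names}} {{.Ports}}' | containers_publishing_port 443
```

For a single known container, docker port ingress_controller lists its mappings more compactly than the full docker inspect output.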
It was indeed holding port 443. So I stopped ingress-controller first, then started rancher, and then restarted ingress_controller:
docker stop ingress_controller
docker restart rancher
docker restart ingress_controller
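The three commands above can be wrapped in a small function so that the order is not mixed up next time. This is just a sketch: free_port_then_restart and run are made-up names, and the DRY_RUN switch is an added convenience that prints the commands instead of executing them.

```shell
# Runs a command, or only prints it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stops the container holding the port, restarts the one that needs it,
# then brings the first one back. Stops at the first failing step.
free_port_then_restart() {
  blocker="$1"   # container currently holding the port
  target="$2"    # container that failed to start
  run docker stop "$blocker" &&
  run docker restart "$target" &&
  run docker restart "$blocker"
}

# Possible usage with the container names from this post:
# DRY_RUN=1 free_port_then_restart ingress_controller rancher
```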
Problem solved.