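Kubernetes Pod Status Monitoring

Dashboard

Monitored items:

  • Pods that are not Running, across all k8s clusters: in particular, pods in CrashLoopBackOff and pods stuck in ContainerCreating. Grafana can be used to alert on these.
  • Pod restart counts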


    k8s-pod-status.png
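
The alerting idea in the monitored items above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Grafana evaluates alerts: the `WaitingPod` record, the `should_alert` and `pods_to_alert` helpers, and the 300-second grace period are all hypothetical. The sketch treats any waiting reason other than ContainerCreating as alert-worthy at once, and ContainerCreating only when the pod has been in it longer than the grace period, since ContainerCreating is a normal transient state during startup.

```python
from dataclasses import dataclass

# Hypothetical grace period: how long a pod may sit in ContainerCreating
# before it counts as "stuck". Tune this for your clusters.
CREATING_GRACE_SECONDS = 300


@dataclass(frozen=True)
class WaitingPod:
    """One pod with a container in the waiting state (hypothetical record)."""
    namespace: str
    pod: str
    reason: str             # e.g. "CrashLoopBackOff", "ContainerCreating"
    seconds_waiting: float  # how long the reason has been observed


def should_alert(p: WaitingPod, grace: float = CREATING_GRACE_SECONDS) -> bool:
    """Return True when the waiting pod deserves an alert."""
    if p.reason == "ContainerCreating":
        # Transient during normal startup; alert only when it persists.
        return p.seconds_waiting > grace
    # Any other waiting reason (CrashLoopBackOff, ImagePullBackOff, ...)
    # is treated as a problem immediately.
    return True


def pods_to_alert(pods):
    """Filter a list of WaitingPod records down to those that should alert."""
    return [p for p in pods if should_alert(p)]
```

In Grafana itself, a similar effect is usually obtained by giving the alert rule a pending ("For") duration rather than by tracking durations by hand.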

Tech Stack

Prometheus + Grafana

PromQL

The Grafana data source is Prometheus. Each PromQL query below corresponds to the panel with the same title in the screenshot above.

  • Container Waiting Reason
(sum(kube_pod_container_status_waiting_reason{reason!="ContainerCreating",namespace=~"$namespace",pod=~"$pod"} ) by (reason,namespace,pod) >0) 
*on(pod) group_right(reason) sum(kube_pod_info) by (pod,node,host_ip,pod_ip,namespace) 
or
(sum(kube_pod_container_status_waiting_reason{reason="ContainerCreating",namespace=~"$namespace",pod=~"$pod"} ) by (reason,namespace,pod) >0) 
-on(pod) group_right(reason) sum(kube_pod_info) by (pod,node,host_ip,pod_ip,namespace) 
  • Pod restarts (last 5m)
(sum(kube_pod_container_status_restarts_total{namespace=~"$namespace",pod=~"$pod"}) by(namespace,pod) *on(pod) group_right() sum(kube_pod_info) by (pod,node,host_ip,pod_ip,namespace) 
-sum(kube_pod_container_status_restarts_total{namespace=~"$namespace",pod=~"$pod"} offset 5m) by(namespace,pod) *on(pod) group_right() sum(kube_pod_info) by (pod,node,host_ip,pod_ip,namespace))
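
To check these queries outside Grafana, one option is to send them to the Prometheus HTTP API (`/api/v1/query`). The sketch below is a hedged example, not part of the original dashboard: the `render_query`, `parse_vector`, and `instant_query` helpers and the `http://localhost:9090` address are assumptions. Because `$namespace` and `$pod` are Grafana template variables that Prometheus itself does not understand, the sketch first substitutes them with plain regexes.

```python
import json
import urllib.parse
import urllib.request

# First operand of the restart query above; the Grafana variables are
# kept as placeholders and filled in by render_query().
RESTARTS_QUERY = (
    'sum(kube_pod_container_status_restarts_total'
    '{namespace=~"$namespace",pod=~"$pod"}) by (namespace,pod)'
)


def render_query(template: str, namespace: str = ".*", pod: str = ".*") -> str:
    """Replace the Grafana variables $namespace and $pod with regexes."""
    return template.replace("$namespace", namespace).replace("$pod", pod)


def parse_vector(payload: dict) -> list:
    """Turn an instant-query response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample looks like {"metric": {...}, "value": [timestamp, "1"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def instant_query(query: str, base_url: str = "http://localhost:9090") -> list:
    """Run an instant query against Prometheus (base_url is an assumption)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))
```

For example, `instant_query(render_query(RESTARTS_QUERY, namespace="kube-system"))` would return one `(labels, value)` pair per pod, provided a Prometheus server with kube-state-metrics data is reachable at that address.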