Manually Deploying the Prometheus Monitoring System on Kubernetes

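Preface: the configurations in the "Hands-on example" sections of this guide can be split up and applied piece by piece as needed.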

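I. Environment Preparation

Kubernetes cluster

Make sure a Kubernetes cluster (version ≥ 1.20) is deployed and that kubectl is configured to access it.

Image registry

Confirm that the image harbor.fq.com/prometheus/node-exporter:v1.8.2 and the Prometheus images are available in the private registry.

Namespace

The basic manifests use the default namespace (the basic Node Exporter DaemonSet uses kube-system), while the hands-on examples use monitoring. If you switch namespaces, update the namespace field in every YAML file, including the subject of the ClusterRoleBinding.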
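Switching namespaces by hand is easy to get half right, because the namespace appears both in metadata.namespace of namespaced objects and in the subjects list of the ClusterRoleBinding. The following Python sketch illustrates the idea on manifests that have already been loaded into dictionaries (for example with a YAML parser). The function name set_namespace and the CLUSTER_SCOPED set are illustrative assumptions, not part of any Kubernetes tooling.

```python
import copy

# Kinds used in this guide that are cluster-scoped and therefore
# must not receive a metadata.namespace field.
CLUSTER_SCOPED = {"Namespace", "ClusterRole", "ClusterRoleBinding"}


def set_namespace(manifest: dict, namespace: str) -> dict:
    """Return a copy of `manifest` that targets `namespace`."""
    result = copy.deepcopy(manifest)
    if result.get("kind") not in CLUSTER_SCOPED:
        result.setdefault("metadata", {})["namespace"] = namespace
    # A ClusterRoleBinding is cluster-scoped, but its ServiceAccount
    # subjects still name the namespace the account lives in.
    for subject in result.get("subjects", []):
        if subject.get("kind") == "ServiceAccount":
            subject["namespace"] = namespace
    return result


service_account = {
    "apiVersion": "v1",
    "kind": "ServiceAccount",
    "metadata": {"name": "prometheus"},
}
binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "prometheus"},
    "subjects": [
        {"kind": "ServiceAccount", "name": "prometheus", "namespace": "default"}
    ],
}

moved_account = set_namespace(service_account, "monitoring")
moved_binding = set_namespace(binding, "monitoring")
```

The originals are left untouched, so the result can be compared with the input before anything is written back to disk.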

II. Create RBAC Permissions

Goal: grant Prometheus permission to access the Kubernetes API.

1. Create the ServiceAccount

# prometheus-serviceaccount.yaml

apiVersion: v1

kind: ServiceAccount

metadata:

  name: prometheus

secrets:

- name: prometheus-token

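Explanation

The prometheus ServiceAccount is the identity Prometheus uses to authenticate.

The secrets field references a Secret (prometheus-token) that stores the access credential. Since Kubernetes 1.24 this Secret is no longer generated automatically, which is why step 4 creates it explicitly.

2. Create the ClusterRole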

# prometheus-clusterrole.yaml

apiVersion: rbac.authorization.k8s.io/v1

kind: ClusterRole

metadata:

  name: prometheus

rules:

  - apiGroups: [""]

    resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"]

    verbs: ["get", "list", "watch"]

  - apiGroups: [""]

    resources: ["configmaps"]

    verbs: ["get"]

  - nonResourceURLs: ["/metrics"]

    verbs: ["get"]

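Explanation

Grants Prometheus read access (get, list, watch) to nodes, services, endpoints, and pods, plus get on configmaps.

Allows reading the /metrics endpoint (a non-resource URL).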
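To make the effect of these rules concrete, the sketch below models them in Python and answers "is this request allowed?" questions. It is a deliberately simplified model of RBAC matching (it ignores resourceNames and wildcard paths in nonResourceURLs), and the helper names allows_resource and allows_url are made up for this illustration.

```python
# Rules copied from prometheus-clusterrole.yaml above.
RULES = [
    {
        "apiGroups": [""],
        "resources": ["nodes", "nodes/proxy", "nodes/metrics",
                      "services", "endpoints", "pods"],
        "verbs": ["get", "list", "watch"],
    },
    {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get"]},
    {"nonResourceURLs": ["/metrics"], "verbs": ["get"]},
]


def _matches(allowed: list, value: str) -> bool:
    """A list matches if it names the value or contains the "*" wildcard."""
    return "*" in allowed or value in allowed


def allows_resource(rules: list, api_group: str, resource: str, verb: str) -> bool:
    """True if any rule grants `verb` on `resource` in `api_group`."""
    return any(
        _matches(rule.get("apiGroups", []), api_group)
        and _matches(rule.get("resources", []), resource)
        and _matches(rule.get("verbs", []), verb)
        for rule in rules
    )


def allows_url(rules: list, url: str, verb: str) -> bool:
    """True if any rule grants `verb` on the non-resource URL `url`."""
    return any(
        url in rule.get("nonResourceURLs", [])
        and _matches(rule.get("verbs", []), verb)
        for rule in rules
    )


print(allows_resource(RULES, "", "pods", "list"))                        # True
print(allows_resource(RULES, "", "configmaps", "list"))                  # False
print(allows_resource(RULES, "networking.k8s.io", "ingresses", "list"))  # False
print(allows_url(RULES, "/metrics", "get"))                              # True
```

The third check shows that this basic role does not cover Ingress objects, so a scrape job that discovers ingresses would need an additional rule for the networking.k8s.io API group.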

3. Create the ClusterRoleBinding

# prometheus-clusterrolebinding.yaml

apiVersion: rbac.authorization.k8s.io/v1

kind: ClusterRoleBinding

metadata:

  name: prometheus

subjects:

- kind: ServiceAccount

  name: prometheus

  namespace: default

roleRef:

  kind: ClusterRole

  name: prometheus

  apiGroup: rbac.authorization.k8s.io

Explanation

Binds the prometheus ClusterRole to the prometheus ServiceAccount so that the permissions take effect.
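A binding only takes effect when its subject matches the ServiceAccount exactly, namespace included. The API server identifies a ServiceAccount by a user name of the form system:serviceaccount:<namespace>:<name>, which is also the name that shows up in "forbidden" errors. The Python sketch below, with hypothetical helper names, checks a binding (as a dictionary) against the account Prometheus actually runs as.

```python
def service_account_username(namespace: str, name: str) -> str:
    """User name the API server assigns to a ServiceAccount."""
    return f"system:serviceaccount:{namespace}:{name}"


def binding_covers(binding: dict, namespace: str, name: str) -> bool:
    """True if a (Cluster)RoleBinding lists the given ServiceAccount."""
    return any(
        subject.get("kind") == "ServiceAccount"
        and subject.get("name") == name
        and subject.get("namespace") == namespace
        for subject in binding.get("subjects", [])
    )


# Same content as prometheus-clusterrolebinding.yaml above.
binding = {
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "prometheus"},
    "subjects": [
        {"kind": "ServiceAccount", "name": "prometheus", "namespace": "default"}
    ],
    "roleRef": {
        "kind": "ClusterRole",
        "name": "prometheus",
        "apiGroup": "rbac.authorization.k8s.io",
    },
}

print(service_account_username("default", "prometheus"))
print(binding_covers(binding, "default", "prometheus"))
print(binding_covers(binding, "monitoring", "prometheus"))
```

The last call returns False: if the Deployment runs in monitoring while the binding still names default, Prometheus has no permissions at all.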

4. Generate the ServiceAccount Token

# prometheus-token.yaml

apiVersion: v1

kind: Secret

metadata:

  name: prometheus-token

  annotations:

    kubernetes.io/service-account.name: prometheus

type: kubernetes.io/service-account-token

Apply the RBAC configuration

kubectl apply -f prometheus-serviceaccount.yaml

kubectl apply -f prometheus-clusterrole.yaml

kubectl apply -f prometheus-clusterrolebinding.yaml

kubectl apply -f prometheus-token.yaml

☆ Hands-on example

cat prometheus-rabc0227.yaml

---

# 1. Create the monitoring namespace

apiVersion: v1

kind: Namespace

metadata:

  name: monitoring

---

# 2. Create the ServiceAccount used by Prometheus

apiVersion: v1

kind: ServiceAccount

metadata:

  name: prometheus

  namespace: monitoring

---

# 3. Create the ClusterRole that defines Prometheus's permissions

apiVersion: rbac.authorization.k8s.io/v1

kind: ClusterRole

metadata:

  name: prometheus

rules:

- apiGroups: [""]

  resources:

  - nodes

  - nodes/metrics

  - services

  - endpoints

  - pods

  verbs: ["get", "list", "watch"]

- apiGroups: [""]

  resources:

  - configmaps

  verbs: ["get"]

- apiGroups: [""]

  resources:

  - nodes/proxy

  verbs: ["get", "list", "watch"]

- apiGroups: ["networking.k8s.io"]

  resources:

  - ingresses

  verbs: ["get", "list", "watch"]

- nonResourceURLs: ["/metrics"]

  verbs: ["get"]

---

# 4. Bind the ClusterRole to the ServiceAccount

apiVersion: rbac.authorization.k8s.io/v1

kind: ClusterRoleBinding

metadata:

  name: prometheus

roleRef:

  apiGroup: rbac.authorization.k8s.io

  kind: ClusterRole

  name: prometheus

subjects:

- kind: ServiceAccount

  name: prometheus

  namespace: monitoring

---

III. Deploy Node Exporter

Goal: deploy Node Exporter on every node to collect node resource metrics.

# node-exporter-daemonset.yml

apiVersion: apps/v1

kind: DaemonSet

metadata:

  name: node-exporter

  namespace: kube-system

spec:

  selector:

    matchLabels:

      app: node-exporter

  template:

    metadata:

      labels:

        app: node-exporter

    spec:

      hostNetwork: true

      containers:

        - name: node-exporter

          image: harbor.fq.com/prometheus/node-exporter:v1.8.2

          args:

            - --path.rootfs=/host

          volumeMounts:

            - name: rootfs

              mountPath: /host

      volumes:

        - name: rootfs

          hostPath:

            path: /

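Explanation

The DaemonSet ensures that one Node Exporter Pod runs on each schedulable node.

hostNetwork: true makes the Pod use the node's network, so the metrics are exposed directly on the node IP.

The hostPath volume mounts the host root filesystem so that node-level data can be collected.

Deployment command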

kubectl apply -f node-exporter-daemonset.yml

☆ Hands-on example

cat node-exporter-daemonset.yml

apiVersion: apps/v1

kind: DaemonSet

metadata:

  name: node-exporter

  namespace: monitoring  # Use the "monitoring" namespace

  labels:

    k8s-app: node-exporter

spec:

  selector:

    matchLabels:

      k8s-app: node-exporter

  template:

    metadata:

      labels:

        k8s-app: node-exporter

      annotations:

        prometheus.io/scrape: "true"  # Allow Prometheus to scrape this Pod

        prometheus.io/port: "9100"    # Node Exporter port

    spec:

      hostNetwork: true  # Let the Pod use the host network

      hostPID: true      # Let the Pod see the host PID namespace

      tolerations:

      - effect: NoSchedule  # Allow scheduling onto tainted nodes

        operator: Exists

      - effect: NoExecute

        operator: Exists

      securityContext:

        runAsNonRoot: true  # Avoid running as root

        runAsUser: 65534    # Run as the nobody user

      containers:

      - name: node-exporter

        image: harbor.fq.com/prometheus/node-exporter:v1.8.2  # Replace with a trusted image address

        args:

        - --path.rootfs=/host/root  # Path of the host root filesystem

        - --path.procfs=/host/proc  # Path of the host procfs

        - --path.sysfs=/host/sys    # Path of the host sysfs

        - --no-collector.wifi       # Disable the WiFi collector

        - --no-collector.hwmon      # Disable the hardware monitoring collector

        ports:

        - containerPort: 9100

          protocol: TCP

        resources:  # Resource requests and limits

          requests:

            memory: "30Mi"

            cpu: "100m"

          limits:

            memory: "50Mi"

            cpu: "200m"

        volumeMounts:  # Mount host directories

        - name: proc

          mountPath: /host/proc

          readOnly: true

        - name: sys

          mountPath: /host/sys

          readOnly: true

        - name: rootfs

          mountPath: /host/root

          readOnly: true

      volumes:

      - name: proc

        hostPath:

          path: /proc

      - name: sys

        hostPath:

          path: /sys

      - name: rootfs

        hostPath:

          path: /

---

apiVersion: v1

kind: Service

metadata:

  name: node-exporter

  namespace: monitoring

  labels:

    k8s-app: node-exporter

  annotations:

    prometheus.io/scrape: 'true'  # Allow Prometheus to scrape

    prometheus.io/port: '9100'    # Scrape port

spec:

  selector:

    k8s-app: node-exporter

  ports:

  - name: metrics

    port: 9100

    protocol: TCP

    targetPort: 9100

  type: ClusterIP  # Reachable only inside the cluster

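IV. Deploy Prometheus

Goal: deploy the main Prometheus service and configure scrape rules and persistent storage.

1. Create the persistent volume (PV/PVC)

Depending on the cluster's storage type (e.g. NFS, Local PV, cloud storage), create a PVC and mount it into Prometheus.

Example (adjust to your environment):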

# prometheus-pvc.yaml

apiVersion: v1

kind: PersistentVolumeClaim

metadata:

  name: prometheus-data

spec:

  accessModes:

    - ReadWriteOnce

  resources:

    requests:

      storage: 50Gi
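How large the claim should be depends on retention and ingestion rate. The Prometheus storage documentation gives a rough capacity formula: needed disk space = retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The Python sketch below applies that formula. The ingestion rate of 10,000 samples per second is a made-up figure for illustration, and 15 days is Prometheus's default retention.

```python
SECONDS_PER_DAY = 86_400
GIB = 1024 ** 3


def estimate_tsdb_bytes(retention_days: float,
                        samples_per_second: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough TSDB size: retention * ingestion rate * bytes per sample."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


# Assumed workload: 15 days retention, 10,000 samples/s, 2 bytes per sample.
needed = estimate_tsdb_bytes(retention_days=15, samples_per_second=10_000)
print(f"{needed / GIB:.1f} GiB")  # 24.1 GiB
```

Under these assumptions a 50Gi claim leaves roughly a factor of two of headroom. On a running server, the real ingestion rate can be read with a query such as rate(prometheus_tsdb_head_samples_appended_total[5m]).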

2. Create the Prometheus Deployment

# prometheus-deployment.yaml

apiVersion: apps/v1

kind: Deployment

metadata:

  name: prometheus

spec:

  replicas: 1

  selector:

    matchLabels:

      app: prometheus

  template:

    metadata:

      labels:

        app: prometheus

    spec:

      serviceAccountName: prometheus

      containers:

        - name: prometheus

          image: prom/prometheus:v2.42.0

          args:

            - "--config.file=/etc/prometheus/prometheus.yml"

          ports:

            - containerPort: 9090

          volumeMounts:

            - name: config-volume

              mountPath: /etc/prometheus

            - name: data-volume

              mountPath: /prometheus

      volumes:

        - name: config-volume

          configMap:

            name: prometheus-config

        - name: data-volume

          persistentVolumeClaim:

            claimName: prometheus-data

☆ Hands-on example

apiVersion: apps/v1

kind: Deployment

metadata:

  name: prometheus

  namespace: monitoring  # Target namespace

  labels:

    app: prometheus

spec:

  replicas: 1  # Production setups typically run 1 instance; use remote storage to improve availability

  selector:

    matchLabels:

      app: prometheus

  template:

    metadata:

      labels:

        app: prometheus

    spec:

      serviceAccountName: prometheus  # ServiceAccount that carries the RBAC permissions

      containers:

      - name: prometheus

        image: harbor.fq.com/prometheus/prometheus:v3.1.0  # Image from the private registry

        args:

        - --config.file=/etc/prometheus/prometheus.yml  # Prometheus configuration file

        - --storage.tsdb.path=/prometheus  # Where TSDB data is stored

        - --web.console.templates=/etc/prometheus/consoles

        - --web.console.libraries=/etc/prometheus/console_libraries

        ports:

        - containerPort: 9090  # Prometheus web UI port

        resources:  # Limit CPU and memory to prevent resource exhaustion

          requests:

            cpu: "500m"

            memory: "1Gi"

          limits:

            cpu: "1"

            memory: "2Gi"

        volumeMounts:

        - name: prometheus-config

          mountPath: /etc/prometheus  # Mount point for the configuration files

        - name: prometheus-storage

          mountPath: /prometheus  # TSDB data path

        - name: file-sd

          mountPath: /apps/prometheus/file-sd.yaml  # Path of the file-based service discovery file

          subPath: file-sd.yaml  # Mount only the file, not the whole directory

      volumes:

      - name: prometheus-config

        configMap:

          name: prometheus-config  # Prometheus configuration from the ConfigMap

      - name: prometheus-storage

        # persistentVolumeClaim:  # Use a PVC for persistent storage in production

        #  claimName: prometheus-pvc

        emptyDir: {}  # An empty directory is acceptable for test environments

      - name: file-sd

        hostPath:

          path: /root/file-sd.yaml  # Service discovery file on the host

          type: File

---

apiVersion: v1

kind: Service

metadata:

  name: prometheus

  namespace: monitoring

  labels:

    app: prometheus

spec:

  type: NodePort  # In production, prefer a LoadBalancer or Ingress

  ports:

  - port: 9090

    targetPort: 9090

    nodePort: 30090  # Access the web UI through this NodePort

  selector:

    app: prometheus

3. Create the Prometheus ConfigMap

# prometheus-configmap.yaml

apiVersion: v1

kind: ConfigMap

metadata:

  name: prometheus-config

data:

  prometheus.yml: |

    global:

      scrape_interval: 15s

      evaluation_interval: 15s

    alerting:

      alertmanagers:

        - static_configs:

            - targets: ['alertmanager:9093']

    rule_files:

      - '/etc/prometheus/alert_rules.yml'

    scrape_configs:

      - job_name: 'prometheus'

        static_configs:

          - targets: ['localhost:9090']

      - job_name: 'node-exporter'

        static_configs:

          - targets: ['node-exporter:9100']

      - job_name: 'cadvisor'

        static_configs:

          - targets: ['cadvisor:8080']

      - job_name: 'pushgateway'

        static_configs:

          - targets: ['pushgateway:9091']

      - job_name: 'node-linux'

        static_configs:

          - targets: ['10.255.209.40:9100']

      - job_name: 'kubernetes-apiservers'

        kubernetes_sd_configs:

          - role: endpoints  # In-cluster: the mounted ServiceAccount credentials are used automatically

        tls_config:

          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

        scheme: https

        relabel_configs:

          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]

            action: keep

            regex: default;kubernetes;https

      - job_name: 'kubernetes-nodes'

        kubernetes_sd_configs:

          - role: node

        # Node Exporter serves plain HTTP on 9100, so no TLS or bearer token settings are needed here

        relabel_configs:

          - source_labels: [__address__]

            regex: '(.*):10250'  # Default kubelet port of a Kubernetes node

            replacement: '${1}:9100'  # Node Exporter listening port

            target_label: __address__

            action: replace

          - action: labelmap

            regex: __meta_kubernetes_node_label_(.+)

      - job_name: 'kubernetes-pods'

        kubernetes_sd_configs:

          - role: pod

            namespaces:

              names:

                - kube-system

                - default

        tls_config:

          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

          insecure_skip_verify: true

        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

        scheme: https

        relabel_configs:

          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]

            action: keep

            regex: true

          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]

            action: replace

            target_label: __metrics_path__

            regex: (.+)

          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]

            action: replace

            target_label: __scheme__

            regex: (.+)

          - source_labels: [__meta_kubernetes_pod_ip]

            action: replace

            target_label: __address__

            regex: (.+)

            replacement: ${1}:9090

      - job_name: 'kubernetes-service-endpoints'

        kubernetes_sd_configs:

          - role: endpoints  # In-cluster: the mounted ServiceAccount credentials are used automatically

        tls_config:

          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

          insecure_skip_verify: true

        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

        scheme: https

        relabel_configs:

          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]

            action: keep

            regex: true

          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]

            action: replace

            target_label: __scheme__

            regex: (https?)

          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]

            action: replace

            target_label: __metrics_path__

            regex: (.+)

          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]

            action: replace

            target_label: __address__

            regex: ([^:]+)(?::\d+)?;(\d+)

            replacement: $1:$2

          - action: labelmap

            regex: __meta_kubernetes_service_label_(.+)

          - source_labels: [__meta_kubernetes_namespace]

            action: replace

            target_label: kubernetes_namespace

          - source_labels: [__meta_kubernetes_service_name]

            action: replace

            target_label: kubernetes_service_name

Apply the configuration

kubectl apply -f prometheus-pvc.yaml

kubectl apply -f prometheus-configmap.yaml

kubectl apply -f prometheus-deployment.yaml
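The address rewrites in relabel_configs are easier to reason about once the matching rules are spelled out: Prometheus joins the values of source_labels with ";", requires the regex to match the whole joined string, and leaves the target untouched when there is no match. The Python sketch below is a simplified emulation of action: replace written for this guide (relabel_replace is a hypothetical helper, and Python's re module stands in for the RE2 syntax Prometheus uses). The IP addresses are invented sample values.

```python
import re


def relabel_replace(labels: dict, source_labels: list, regex: str,
                    replacement: str, target_label: str,
                    separator: str = ";") -> dict:
    """Simplified emulation of a relabel rule with action: replace."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)  # the regex is anchored at both ends
    if match is None:
        return dict(labels)              # no match: labels stay as they are
    # Translate Prometheus-style $1 / ${1} references to Python's \g<1>.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    updated = dict(labels)
    updated[target_label] = match.expand(template)
    return updated


PORT = "__meta_kubernetes_service_annotation_prometheus_io_port"
PORT_REGEX = r"([^:]+)(?::\d+)?;(\d+)"

# kubernetes-nodes: swap the kubelet port for the Node Exporter port.
node = relabel_replace({"__address__": "10.0.0.5:10250"},
                       ["__address__"], r"(.*):10250", "${1}:9100", "__address__")

# kubernetes-service-endpoints: take the port from the Service annotation.
endpoint = relabel_replace({"__address__": "10.244.1.7:8080", PORT: "9100"},
                           ["__address__", PORT], PORT_REGEX, "$1:$2", "__address__")

# With only the port annotation as source label the joined value is "9100",
# the regex cannot match, and the address stays unchanged.
unchanged = relabel_replace({"__address__": "10.244.1.7:8080", PORT: "9100"},
                            [PORT], PORT_REGEX, "$1:$2", "__address__")

print(node["__address__"], endpoint["__address__"], unchanged["__address__"])
```

The third case is a common pitfall: a port-rewrite rule whose regex expects "address;port" needs __address__ listed first in source_labels.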

☆ Hands-on example

cat prometheus-configmap0227.yaml

apiVersion: v1

kind: ConfigMap

metadata:

  name: prometheus-config

  namespace: monitoring

data:

  prometheus.yml: |

    global:

      scrape_interval: 15s

      evaluation_interval: 15s

      scrape_timeout: 10s  # Timeout so that a scrape cannot hang indefinitely

    alerting:

      alertmanagers:

        - static_configs:

            - targets: ['alertmanager:9093']

    rule_files:

      - '/etc/prometheus/alert_rules.yml'

    scrape_configs:

      # Scrape Prometheus's own metrics

      - job_name: 'prometheus'

        static_configs:

          - targets: ['localhost:9090']

      # Scrape Node Exporter metrics

      - job_name: 'node-exporter'

        static_configs:

          - targets: ['node-exporter:9100']

      # Scrape cAdvisor metrics

      - job_name: 'cadvisor'

        static_configs:

          - targets: ['cadvisor:8080']

      # Scrape Pushgateway metrics

      - job_name: 'pushgateway'

        static_configs:

          - targets: ['pushgateway:9091']

      # Scrape Node Exporter metrics from a specific node

      - job_name: 'node-linux'

        static_configs:

          - targets: ['10.255.209.40:9100']

      # Scrape Kubernetes API server metrics

      - job_name: 'kubernetes-apiservers'

        kubernetes_sd_configs:

          - role: endpoints

        scheme: https

        tls_config:

          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

          insecure_skip_verify: true  # Disable in production and configure the correct CA certificate

        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

        relabel_configs:

          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]

            action: keep

            regex: default;kubernetes;https

      # Scrape Kubernetes node metrics (through Node Exporter)

      - job_name: 'kubernetes-nodes'

        kubernetes_sd_configs:

          - role: node

        relabel_configs:

          - source_labels: [__address__]

            regex: '(.*):10250'

            replacement: '${1}:9100'  # Replace the kubelet port with the Node Exporter port

            target_label: __address__

          - action: labelmap

            regex: __meta_kubernetes_node_label_(.+)

      # Scrape Kubernetes Pod metrics

      - job_name: 'kubernetes-pods'

        kubernetes_sd_configs:

          - role: pod

        relabel_configs:

          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]

            action: keep

            regex: true

          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]

            action: replace

            target_label: __metrics_path__

            regex: (.+)

          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]

            action: replace

            target_label: __address__

            regex: ([^:]+)(?::\d+)?;(\d+)

            replacement: $1:$2

          - action: labelmap

            regex: __meta_kubernetes_pod_label_(.+)

          - source_labels: [__meta_kubernetes_namespace]

            action: replace

            target_label: kubernetes_namespace

          - source_labels: [__meta_kubernetes_pod_name]

            action: replace

            target_label: kubernetes_pod_name

      # Scrape Kubernetes Service endpoint metrics

      - job_name: 'kubernetes-service-endpoints'

        kubernetes_sd_configs:

          - role: endpoints

        #scheme: https

        #tls_config:

        #  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

        #  insecure_skip_verify: true  # Disable in production and configure the correct CA certificate

        #bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

        relabel_configs:

          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]

            action: keep

            regex: true

          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]

            action: replace

            target_label: __scheme__

            regex: (https?)

          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]

            action: replace

            target_label: __metrics_path__

            regex: (.+)

          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]

            action: replace

            target_label: __address__

            regex: ([^:]+)(?::\d+)?;(\d+)

            replacement: $1:$2

          - action: labelmap

            regex: __meta_kubernetes_service_label_(.+)

          - source_labels: [__meta_kubernetes_namespace]

            action: replace

            target_label: kubernetes_namespace

          - source_labels: [__meta_kubernetes_service_name]

            action: replace

            target_label: kubernetes_service_name

      - job_name: 'kubernetes-nginx-endpoints'  # Job name

        kubernetes_sd_configs:

          - role: endpoints  # Discover Kubernetes Endpoints automatically

        relabel_configs:

          # Only scrape Services annotated with `prometheus.io/scrape: "true"`

          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]

            action: keep

            regex: true

          # Set the scrape scheme (http or https)

          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]

            action: replace

            target_label: __scheme__

            regex: (https?)

          # Set the metrics path (defaults to /metrics)

          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]

            action: replace

            target_label: __metrics_path__

            regex: (.+)

          # Set the scrape address and port

          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]

            action: replace

            target_label: __address__

            regex: ([^:]+)(?::\d+)?;(\d+)

            replacement: $1:$2

          # Map Kubernetes labels to Prometheus labels

          - action: labelmap

            regex: __meta_kubernetes_service_label_(.+)

          # Add the Kubernetes namespace label

          - source_labels: [__meta_kubernetes_namespace]

            action: replace

            target_label: kubernetes_namespace

          # Add the Kubernetes Service name label

          - source_labels: [__meta_kubernetes_service_name]

            action: replace

            target_label: kubernetes_service_name

          # Add the Kubernetes Pod name label

          - source_labels: [__meta_kubernetes_pod_name]

            action: replace

            target_label: kubernetes_pod_name

          # Add the Kubernetes node name label

          - source_labels: [__meta_kubernetes_pod_node_name]

            action: replace

            target_label: kubernetes_node_name

        # To scrape HTTPS endpoints, uncomment the following settings

        # scheme: https

        # tls_config:

        #  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

        #  insecure_skip_verify: true  # Disable in production and configure the correct CA certificate

        # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      - job_name: 'kube-state-metrics'

        kubernetes_sd_configs:

          - role: endpoints

            namespaces:

              names:

                - kube-system

                - monitoring

                - default

        relabel_configs:

          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]

            action: keep

            regex: kube-state-metrics

          - source_labels: [__meta_kubernetes_endpoint_port_name]

            action: keep

            regex: http-metrics

        metrics_path: /metrics

        scheme: http

      - job_name: "file_sd"

        file_sd_configs:

        - files:

          - /apps/prometheus/file-sd.yaml

          refresh_interval: 1m

      - job_name: 'redis'

        kubernetes_sd_configs:

          - role: endpoints  # Discover services from Kubernetes Endpoints

        relabel_configs:

          # Only scrape Services annotated with `prometheus.io/scrape: "true"`

          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]

            action: keep

            regex: true

          # Keep only endpoints backed by a Pod whose name contains "redis"

          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]

            action: keep

            regex: Pod;(.*redis.*)

          # Rewrite the target address to the Pod IP and the Redis Exporter port (9121)

          - source_labels: [__meta_kubernetes_pod_ip]

            action: replace

            target_label: __address__

            replacement: $1:9121

          # Add the app label of the Kubernetes Service

          - source_labels: [__meta_kubernetes_service_label_app]

            action: replace

            target_label: app

          # Add the Kubernetes namespace label

          - source_labels: [__meta_kubernetes_namespace]

            action: replace

            target_label: namespace

          # Add the Kubernetes Service name label

          - source_labels: [__meta_kubernetes_service_name]

            action: replace

            target_label: service

          # Add the Kubernetes Pod name label

          - source_labels: [__meta_kubernetes_pod_name]

            action: replace

            target_label: pod

          # Add the Kubernetes node name label

          - source_labels: [__meta_kubernetes_pod_node_name]

            action: replace

            target_label: node

          # Add the instance label (distinguishes individual Redis instances)

          - source_labels: [__meta_kubernetes_pod_ip]

            action: replace

            target_label: instance

      - job_name: 'mysql'

        kubernetes_sd_configs:

          - role: endpoints  # Discover services from Kubernetes Endpoints

        relabel_configs:

          # Only scrape Services annotated with `prometheus.io/scrape: "true"`

          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]

            action: keep

            regex: true

          # Keep only endpoints backed by a Pod whose name contains "mysql-exporter"

          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]

            action: keep

            regex: Pod;(.*mysql-exporter.*)

          # Rewrite the target address to the Pod IP and the MySQL Exporter port (9104)

          - source_labels: [__meta_kubernetes_pod_ip]

            action: replace

            target_label: __address__

            replacement: $1:9104

          # Add the app label of the Kubernetes Service

          - source_labels: [__meta_kubernetes_service_label_app]

            action: replace

            target_label: app

          # Add the Kubernetes namespace label

          - source_labels: [__meta_kubernetes_namespace]

            action: replace

            target_label: namespace

          # Add the Kubernetes Service name label

          - source_labels: [__meta_kubernetes_service_name]

            action: replace

            target_label: service

          # Add the Kubernetes Pod name label

          - source_labels: [__meta_kubernetes_pod_name]

            action: replace

            target_label: pod

          # Add the Kubernetes node name label

          - source_labels: [__meta_kubernetes_pod_node_name]

            action: replace

            target_label: node

          # Add the instance label (distinguishes individual MySQL instances)

          - source_labels: [__meta_kubernetes_pod_ip]

            action: replace

            target_label: instance
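The redis and mysql jobs both rely on chained action: keep rules: a discovered target survives only if every keep rule matches it, and a label that is missing counts as an empty string. The Python sketch below is a simplified emulation of that filtering written for this guide (keep is a hypothetical helper, Python's re module stands in for RE2, and the three targets are invented sample data).

```python
import re

SCRAPE = "__meta_kubernetes_service_annotation_prometheus_io_scrape"
KIND = "__meta_kubernetes_endpoint_address_target_kind"
NAME = "__meta_kubernetes_endpoint_address_target_name"


def keep(targets: list, source_labels: list, regex: str,
         separator: str = ";") -> list:
    """Simplified emulation of a relabel rule with action: keep."""
    pattern = re.compile(regex)
    kept = []
    for labels in targets:
        joined = separator.join(labels.get(name, "") for name in source_labels)
        if pattern.fullmatch(joined):  # the regex must match the whole value
            kept.append(labels)
    return kept


discovered = [
    {SCRAPE: "true", KIND: "Pod", NAME: "redis-7c9d"},
    {SCRAPE: "true", KIND: "Pod", NAME: "mysql-exporter-5f6b"},
    {KIND: "Pod", NAME: "redis-cache-0"},  # Service without the scrape annotation
]

annotated = keep(discovered, [SCRAPE], r"true")
redis_targets = keep(annotated, [KIND, NAME], r"Pod;(.*redis.*)")
mysql_targets = keep(annotated, [KIND, NAME], r"Pod;(.*mysql-exporter.*)")

print([t[NAME] for t in redis_targets])  # ['redis-7c9d']
print([t[NAME] for t in mysql_targets])  # ['mysql-exporter-5f6b']
```

Because the name filter is a substring match, a Pod called redis-cache-0 would also be picked up by the redis job as soon as its Service carries the scrape annotation; tighten the regex if that is not intended.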

4. Expose the Prometheus service

# prometheus-service.yaml

apiVersion: v1

kind: Service

metadata:

  name: prometheus

spec:

  type: NodePort

  ports:

    - port: 9090

      targetPort: 9090

      nodePort: 30090

  selector:

    app: prometheus

Apply the service

kubectl apply -f prometheus-service.yaml

V. Verify the Deployment

Check Pod status

kubectl get pods -l app=prometheus -n default

kubectl get pods -n kube-system -l app=node-exporter

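Expected output: all Pods are in the Running state.

Access the Prometheus UI

Open http://<NodeIP>:30090 in a browser to reach the Prometheus console.

On the Status > Targets page, confirm that the kubernetes-nodes and kubernetes-pods jobs are UP.

Run the query up{job="kubernetes-nodes"} to verify that metrics are being scraped.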
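The same check can be scripted against the Prometheus HTTP API, whose /api/v1/targets endpoint returns the active targets together with their health. The Python sketch below counts targets per job and health state. The sample payload is hand-written in the shape of the API response rather than real output, and fetch_targets is only defined here, not called, because it needs a reachable server.

```python
import json
import urllib.request
from collections import Counter


def summarize_targets(payload: dict) -> dict:
    """Count active targets per (job, health) from a /api/v1/targets response."""
    counts = Counter()
    for target in payload["data"]["activeTargets"]:
        job = target["labels"].get("job", "")
        counts[(job, target["health"])] += 1
    return dict(counts)


def fetch_targets(base_url: str) -> dict:
    """Fetch the target list, e.g. with base_url = "http://<NodeIP>:30090"."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets") as response:
        return json.load(response)


# Hand-written sample in the shape of the API response (not real output).
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "kubernetes-nodes", "instance": "10.0.0.5:9100"},
             "health": "up"},
            {"labels": {"job": "kubernetes-nodes", "instance": "10.0.0.6:9100"},
             "health": "down"},
            {"labels": {"job": "kubernetes-pods", "instance": "10.244.1.7:9090"},
             "health": "up"},
        ]
    },
}

summary = summarize_targets(sample)
for (job, health), count in sorted(summary.items()):
    print(job, health, count)
```

Any entry whose health is not "up" points at a target worth inspecting on the Targets page, where the last scrape error is shown.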

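VI. Troubleshooting Common Problems

Permission problems

Example error: Failed to list *v1.Pod: forbidden

Fix: check that the ClusterRoleBinding references the correct ServiceAccount name and namespace.

Node Exporter does not start

Check that the DaemonSet has a Pod on every node and that the image was pulled without errors.

Prometheus cannot scrape metrics

Check that scrape_configs in the Prometheus configuration points at the correct port (Node Exporter listens on 9100 by default).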

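Verify network connectivity from inside the Prometheus Pod: kubectl exec -it <prometheus-pod> -- wget -qO- http://<NodeIP>:9100/metrics (use curl instead if your image provides it; the BusyBox-based official image ships wget).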

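VII. Next Steps

Configure Alertmanager: add alerting rules and integrate Alertmanager to deliver alert notifications.

Optimize persistent storage: use a highly available storage solution (e.g. Ceph, Longhorn) to keep the data safe.

Monitoring dashboards: deploy Grafana, add Prometheus as a data source, and build dashboards.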
