Monitoring RabbitMQ with Prometheus

1. Deploy Prometheus

Deploy Prometheus with the kube-prometheus project.

2. Monitor RabbitMQ

2.1. rabbitmq-exporter

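Make sure the RabbitMQ service exposes monitoring metrics. rabbitmq-exporter reads them from the RabbitMQ management API and converts them to the Prometheus format.

2.1.1. Deploy rabbitmq-exporter to collect metrics

rabbitmq-exporter is the endpoint that Prometheus scrapes; it collects the RabbitMQ metrics and serves them over HTTP.

vim rabbitmq-exporter.yaml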

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-exporter
  namespace: monitoring
  labels:
    app: rabbitmq-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rabbitmq-exporter
  template:
    metadata:
      labels:
        app: rabbitmq-exporter
    spec:
      containers:
        - name: rabbitmq-exporter
          image: kbudde/rabbitmq-exporter:latest
          ports:
            - containerPort: 9419 # port on which rabbitmq-exporter serves metrics
          env:
            - name: RABBIT_URL
              value: "http://10.105.106.145:15672" # replace with the RabbitMQ management API address
            - name: RABBIT_USER
              value: "test" # replace with the actual username
            - name: RABBIT_PASSWORD
              value: "test@123" # replace with the actual password
            - name: PUBLISH_PORT
              value: "9419"
---
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-exporter
  namespace: monitoring
  labels:
    app: rabbitmq-exporter
spec:
  selector:
    app: rabbitmq-exporter
  ports:
    - name: metrics
      protocol: TCP
      port: 9419
      targetPort: 9419
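
Test that the exporter serves metrics (run this from a pod in the monitoring namespace, where the Service name rabbitmq-exporter resolves):

curl http://rabbitmq-exporter:9419/metrics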
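
The Deployment above writes the RabbitMQ username and password directly into the manifest. A minimal sketch of keeping them in a Kubernetes Secret instead is shown below. The Secret name rabbitmq-exporter-credentials and the key names username and password are hypothetical choices for illustration, not part of the original setup; the values are the same placeholders used above.

```yaml
# Hypothetical Secret holding the exporter's RabbitMQ credentials.
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-exporter-credentials # assumed name, pick your own
  namespace: monitoring
type: Opaque
stringData:
  username: "test"     # replace with the actual username
  password: "test@123" # replace with the actual password
---
# Fragment only, not a complete resource: in the Deployment's container
# spec, swap the literal RABBIT_USER / RABBIT_PASSWORD values for
# references to the Secret above.
env:
  - name: RABBIT_USER
    valueFrom:
      secretKeyRef:
        name: rabbitmq-exporter-credentials
        key: username
  - name: RABBIT_PASSWORD
    valueFrom:
      secretKeyRef:
        name: rabbitmq-exporter-credentials
        key: password
```

With this arrangement the Deployment manifest can be committed or shared without exposing the password, and the credentials can be rotated by updating the Secret and restarting the exporter pod.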

2.1.2. Configure the ServiceMonitor and PrometheusRule

Prometheus Operator uses ServiceMonitor and PrometheusRule resources to configure scrape targets and alerting rules.

vim rabbitmq-ServiceMonitor-prometheusrules.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rabbitmq-exporter
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: rabbitmq-exporter
  namespaceSelector:
    matchNames:
      - monitoring
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
      scheme: http
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: rabbitmq-rules
  namespace: monitoring
spec:
  groups:
    - name: rabbitmq.rules
      rules:
        # Alert on message backlog in RabbitMQ queues
        - alert: RabbitMQQueueMessageLag
          expr: |
            sum(rabbitmq_queue_messages_ready) by (queue) > 10000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: RabbitMQ queue {{ $labels.queue }} has a message backlog
            description: RabbitMQ queue {{ $labels.queue }} has more than 10000 ready messages, currently {{ $value }}.

        # Alert on low disk space on RabbitMQ nodes
        - alert: RabbitMQDiskSpaceLow
          expr: rabbitmq_node_disk_free_alarm == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: RabbitMQ node is low on disk space
            description: The RabbitMQ node is low on disk space, so messages may fail to be stored.
        # Alert on high memory usage on RabbitMQ nodes
        - alert: RabbitMQMemoryUsageHigh
          expr: |
            (rabbitmq_node_mem_used / rabbitmq_node_mem_limit) * 100 > 80
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: RabbitMQ memory usage is too high
            description: RabbitMQ memory usage exceeds 80%, currently {{ $value }}%.

        # Alert on the number of RabbitMQ connections
        - alert: RabbitMQConnectionCountHigh
          expr: |
            rabbitmq_connections > 200
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: RabbitMQ has too many connections
            description: The RabbitMQ connection count exceeds 200, which may cause performance problems; the current count is {{ $value }}.

        # Alert on the number of RabbitMQ channels
        - alert: RabbitMQTooManyChannels
          expr: rabbitmq_channels > 500
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: RabbitMQ has too many channels
            description: The RabbitMQ channel count exceeds 500, which may degrade performance.

        # Alert on unacknowledged messages in RabbitMQ queues
        - alert: RabbitMQUnacknowledgedMessagesHigh
          expr: |
            sum(rabbitmq_queue_messages_unacknowledged) by (queue) > 10000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: RabbitMQ queue {{ $labels.queue }} has too many unacknowledged messages
            description: RabbitMQ queue {{ $labels.queue }} has more than 10000 unacknowledged messages, currently {{ $value }}.

        # Alert when RabbitMQ is down
        - alert: RabbitMQDown
          expr: rabbitmq_up == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: RabbitMQ service is unavailable
            description: The RabbitMQ service has not been running normally for more than 1 minute.
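
Whether Prometheus actually loads the ServiceMonitor and PrometheusRule above depends on the selectors in the Prometheus custom resource. As a rough sketch, recent kube-prometheus manifests are assumed to define that resource with empty selectors like the fragment below, which match every ServiceMonitor and PrometheusRule regardless of labels. This is an assumption about the installed version, not something stated in the original setup; older releases and Helm-based installs may instead require specific labels (such as the release: prometheus label set on the ServiceMonitor above). The actual values can be checked with kubectl -n monitoring get prometheus -o yaml.

```yaml
# Assumed shape of the Prometheus object created by kube-prometheus
# (fragment only; verify against your own cluster before relying on it).
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  serviceMonitorSelector: {}          # empty selector: match all ServiceMonitors
  serviceMonitorNamespaceSelector: {} # empty selector: look in all namespaces
  ruleSelector: {}                    # empty selector: match all PrometheusRules
  ruleNamespaceSelector: {}           # empty selector: look in all namespaces
```

If the selectors in your cluster are not empty, the labels under metadata.labels of the ServiceMonitor and PrometheusRule need to match them, otherwise the target and the alerts will not appear in Prometheus.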

2.2. Import a RabbitMQ dashboard

Dashboard templates: https://grafana.com/grafana/dashboards/?search=rabbitmq
