1. Deploy Prometheus
Prometheus is deployed with the kube-prometheus project.
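The deployment step can be sketched as the shell function below. It follows the quickstart sequence from the kube-prometheus repository (CRDs and namespace first, then the remaining manifests). The function name install_kube_prometheus and the local clone path are illustrative assumptions, and kubectl is assumed to already point at the target cluster.

```shell
# Sketch: install kube-prometheus from a local clone of the upstream repository.
# install_kube_prometheus and the default clone path are hypothetical names.
install_kube_prometheus() {
  repo_dir=${1:-kube-prometheus}
  # Clone only when the directory is not there yet
  [ -d "$repo_dir" ] || git clone https://github.com/prometheus-operator/kube-prometheus.git "$repo_dir" || return 1
  # CRDs and the monitoring namespace go in first
  kubectl apply --server-side -f "$repo_dir/manifests/setup" || return 1
  # Wait until the CRDs are registered before creating custom resources
  kubectl wait --for condition=Established --all CustomResourceDefinition --namespace=monitoring || return 1
  # Operator, Prometheus, Alertmanager, Grafana and the default rules
  kubectl apply -f "$repo_dir/manifests/"
}
```

After calling install_kube_prometheus, `kubectl -n monitoring get pods` should eventually show the operator, Prometheus, Alertmanager and Grafana pods running.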
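2. Monitor RabbitMQ
2.1. rabbitmq-exporter
Make sure the RabbitMQ management plugin is enabled and its HTTP API (port 15672 by default) is reachable from the cluster. The exporter reads RabbitMQ statistics from that API and converts them to the Prometheus exposition format.
2.1.1. Deploy rabbitmq-exporter to collect metrics
rabbitmq-exporter sits between RabbitMQ and Prometheus: it queries RabbitMQ for its metrics and serves them on an HTTP endpoint that Prometheus scrapes.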
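Before deploying the exporter, it can help to confirm that the management API answers for the account the exporter will use. The sketch below is one way to do that. The function name mgmt_api_ok is hypothetical, and /api/overview is used here simply as a lightweight endpoint of the RabbitMQ management HTTP API. The account typically needs at least the monitoring tag.

```shell
# Sketch: succeeds when the management API returns HTTP 200 for the given credentials.
# mgmt_api_ok is a hypothetical helper name.
mgmt_api_ok() {
  url=$1
  user=$2
  pass=$3
  # -w prints only the HTTP status code; the response body is discarded
  code=$(curl -s -o /dev/null -w '%{http_code}' -u "$user:$pass" "$url/api/overview")
  [ "$code" = "200" ]
}
```

Example call, using the placeholder address and credentials from the manifest in this section: `mgmt_api_ok http://10.105.106.145:15672 test 'test@123' && echo reachable`.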
vim rabbitmq-exporter.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-exporter
  namespace: monitoring
  labels:
    app: rabbitmq-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rabbitmq-exporter
  template:
    metadata:
      labels:
        app: rabbitmq-exporter
    spec:
      containers:
        - name: rabbitmq-exporter
          image: kbudde/rabbitmq-exporter:latest
          ports:
            - containerPort: 9419 # port on which the exporter serves metrics
          env:
            - name: RABBIT_URL
              value: "http://10.105.106.145:15672" # replace with the RabbitMQ management API address
            - name: RABBIT_USER
              value: "test" # replace with the real user name
            - name: RABBIT_PASSWORD
              value: "test@123" # replace with the real password
            - name: PUBLISH_PORT
              value: "9419"
---
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-exporter
  namespace: monitoring
  labels:
    app: rabbitmq-exporter
spec:
  selector:
    app: rabbitmq-exporter
  ports:
    - name: metrics
      protocol: TCP
      port: 9419
      targetPort: 9419
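The manifest above puts the RabbitMQ user name and password directly in the Deployment. A common alternative, sketched here as an assumption rather than as part of the original setup, is to keep them in a Secret and reference it from the container environment. The Secret name rabbitmq-exporter-credentials and its key names are hypothetical.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-exporter-credentials   # hypothetical name
  namespace: monitoring
type: Opaque
stringData:
  user: test            # replace with the real user name
  password: "test@123"  # replace with the real password
---
# Fragment only (not a complete resource): these entries would replace the
# literal RABBIT_USER / RABBIT_PASSWORD values in the Deployment's container spec.
env:
  - name: RABBIT_USER
    valueFrom:
      secretKeyRef:
        name: rabbitmq-exporter-credentials
        key: user
  - name: RABBIT_PASSWORD
    valueFrom:
      secretKeyRef:
        name: rabbitmq-exporter-credentials
        key: password
```

This keeps the credentials out of the Deployment manifest, so the manifest can be shared or versioned without exposing them.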
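- Apply the manifest, then test the metrics endpoint from a pod inside the cluster (the Service is named rabbitmq-exporter in the monitoring namespace):
kubectl apply -f rabbitmq-exporter.yaml
curl http://rabbitmq-exporter.monitoring.svc:9419/metrics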
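The raw /metrics output is long, so a quick check that the series needed by the alert rules are present can be useful. The helper below is a minimal sketch. The name missing_metrics is hypothetical, and it only checks that a metric name appears at the start of a sample line, not that its value is sensible.

```shell
# Sketch: read Prometheus text exposition on stdin and print every metric name
# given as an argument that does not appear as a sample line.
# missing_metrics is a hypothetical helper name.
missing_metrics() {
  input=$(cat)
  for m in "$@"; do
    # A sample line starts with the metric name followed by "{" or a space
    printf '%s\n' "$input" | grep -q "^${m}[{ ]" || echo "$m"
  done
}
```

Example call from a pod inside the cluster, using the Service name from the manifest (rabbitmq-exporter in the monitoring namespace): `curl -s http://rabbitmq-exporter.monitoring.svc:9419/metrics | missing_metrics rabbitmq_up rabbitmq_connections rabbitmq_channels rabbitmq_queue_messages_ready`. Empty output means every listed metric was found.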
2.1.2 Configure the ServiceMonitor and PrometheusRule
Prometheus Operator uses a ServiceMonitor to define scrape targets and a PrometheusRule to define alerting rules.
vim rabbitmq-ServiceMonitor-prometheusrules.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rabbitmq-exporter
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: rabbitmq-exporter
  namespaceSelector:
    matchNames:
      - monitoring
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
      scheme: http
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: rabbitmq-rules
  namespace: monitoring
spec:
  groups:
    - name: rabbitmq.rules
      rules:
        # Message backlog in a RabbitMQ queue
        - alert: RabbitMQQueueMessageLag
          expr: |
            sum(rabbitmq_queue_messages_ready) by (queue) > 10000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: RabbitMQ queue {{ $labels.queue }} has a message backlog
            description: RabbitMQ queue {{ $labels.queue }} has more than 10000 ready messages, currently {{ $value }}.
        # Free disk space on a RabbitMQ node
        - alert: RabbitMQDiskSpaceLow
          expr: rabbitmq_node_disk_free_alarm == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: RabbitMQ node is low on disk space
            description: The RabbitMQ node is low on disk space, so messages may fail to be stored.
        # Memory usage of a RabbitMQ node
        - alert: RabbitMQMemoryUsageHigh
          expr: |
            (rabbitmq_node_mem_used / rabbitmq_node_mem_limit) * 100 > 80
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: RabbitMQ memory usage is too high
            description: RabbitMQ memory usage is above 80%, currently {{ $value }}%.
        # Number of RabbitMQ connections
        - alert: RabbitMQConnectionCountHigh
          expr: |
            rabbitmq_connections > 200
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: RabbitMQ has too many connections
            description: RabbitMQ has more than 200 connections, which may cause performance problems, currently {{ $value }}.
        # Number of RabbitMQ channels
        - alert: RabbitMQTooManyChannels
          expr: rabbitmq_channels > 500
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: RabbitMQ has too many channels
            description: RabbitMQ has more than 500 channels, which may degrade performance.
        # Unacknowledged messages in a RabbitMQ queue
        - alert: RabbitMQUnacknowledgedMessagesHigh
          expr: |
            sum(rabbitmq_queue_messages_unacknowledged) by (queue) > 10000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: RabbitMQ queue {{ $labels.queue }} has too many unacknowledged messages
            description: RabbitMQ queue {{ $labels.queue }} has more than 10000 unacknowledged messages, currently {{ $value }}.
        # Availability of the RabbitMQ node
        - alert: RabbitMQDown
          expr: rabbitmq_up == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: RabbitMQ is unavailable
            description: RabbitMQ has not been running normally for more than 1 minute.
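Whether Prometheus actually loads these two resources depends on the serviceMonitorSelector and ruleSelector fields of the Prometheus custom resource (named k8s in kube-prometheus). Recent kube-prometheus manifests leave both selectors empty, so any ServiceMonitor or PrometheusRule in a permitted namespace is picked up. Some older releases, as far as I recall, selected rules by label instead. If `kubectl -n monitoring get prometheus k8s -o yaml` shows a non-empty ruleSelector, the PrometheusRule metadata needs matching labels. The fragment below is a sketch under that assumption, and the label values must be copied from your own selector.

```yaml
# Fragment: metadata of the PrometheusRule with selector labels added.
metadata:
  name: rabbitmq-rules
  namespace: monitoring
  labels:
    prometheus: k8s     # assumption: must match spec.ruleSelector of the Prometheus resource
    role: alert-rules   # assumption: must match spec.ruleSelector of the Prometheus resource
```

After `kubectl apply -f rabbitmq-ServiceMonitor-prometheusrules.yaml`, the rabbitmq-exporter target should appear on the Targets page of the Prometheus UI and the rabbitmq.rules group on its Alerts page.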
2.2 Import a RabbitMQ dashboard
Dashboard templates: https://grafana.com/grafana/dashboards/?search=rabbitmq
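To import a template, Grafana has to be reachable from a browser. With kube-prometheus, Grafana runs as the grafana Service in the monitoring namespace on port 3000. The sketch below forwards it to a local port. The function name grafana_port_forward is hypothetical.

```shell
# Sketch: forward the kube-prometheus Grafana Service to a local port (default 3000).
# grafana_port_forward is a hypothetical helper name.
grafana_port_forward() {
  kubectl --namespace monitoring port-forward svc/grafana "${1:-3000}:3000"
}
```

With the forward running, open http://localhost:3000, go to the dashboard import page, and enter the ID or JSON of the chosen template. Pick a dashboard built for the kbudde/rabbitmq_exporter metric names. Dashboards written for RabbitMQ's built-in rabbitmq_prometheus plugin use different metric names and will show empty panels.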