Monitoring RabbitMQ with Prometheus + Alertmanager and Configuring WeChat Work Alerts

Installing Alertmanager

Download: https://prometheus.io/download/
After downloading, upload the package to the machine where the Prometheus service runs.


Extract the alertmanager package

tar -zxvf alertmanager-0.21.0.linux-amd64.tar.gz -C /data
mv /data/alertmanager-0.21.0.linux-amd64 /data/alertmanager
Go into the extracted alertmanager directory and edit alertmanager.yml to configure alert delivery. The contents of alertmanager.yml are as follows:
cat alertmanager.yml 
global:
  resolve_timeout: 5m  # an alert is considered resolved if it has not been updated for 5 minutes
route:                 
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'wechat'
receivers:
- name: 'wechat'
  wechat_configs:
  - corp_id: 'XXX'
    to_party: 'XX'
    agent_id: '1000011'
    api_secret: 'secret'
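
Before relying on this file, it helps to validate it and then start Alertmanager with it. The following is a minimal sketch, not the author's exact procedure: the AM_HOME path is taken from the mv step above, the function name start_alertmanager and the log file name are hypothetical, and amtool is assumed to sit next to the alertmanager binary as it does in the release tarball.

```shell
# Assumed install directory, taken from the mv step above; override if yours differs.
AM_HOME="${AM_HOME:-/data/alertmanager}"

# Validate alertmanager.yml with amtool, then start Alertmanager in the
# background only if the check passes. (Hypothetical helper for illustration.)
start_alertmanager() {
  "$AM_HOME/amtool" check-config "$AM_HOME/alertmanager.yml" || return 1
  nohup "$AM_HOME/alertmanager" \
    --config.file="$AM_HOME/alertmanager.yml" \
    --storage.path="$AM_HOME/data" \
    > "$AM_HOME/alertmanager.log" 2>&1 &
}
```

Calling start_alertmanager returns non-zero without starting anything when the config check fails, so a typo in the WeChat receiver is caught before alerts start flowing.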

Installing rabbitmq_exporter

Download: https://github.com/kbudde/rabbitmq_exporter/releases
Extract and run:

tar -zxvf rabbitmq_exporter-version
cd rabbitmq_exporter-version
RABBIT_USER=USER RABBIT_PASSWORD=PASSWORD OUTPUT_FORMAT=json PUBLISH_PORT=9099 RABBIT_URL=http://XXX:15672 nohup ./rabbitmq_exporter &

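RABBIT_USER: user name for the RabbitMQ management plugin
RABBIT_PASSWORD: password for that user
OUTPUT_FORMAT: log output format of the exporter (JSON here)
PUBLISH_PORT: port the exporter listens on
RABBIT_URL: URL of the RabbitMQ management plugin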
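
To confirm the exporter is actually serving data before wiring it into Prometheus, one option is to fetch its /metrics page and look at rabbitmq_up, the same metric the alert rule later in this post relies on. This is a rough sketch under assumptions: the helper names exporter_is_up and check_exporter are hypothetical, and the default host localhost and port 9099 simply mirror the PUBLISH_PORT used above.

```shell
# Read Prometheus text-format metrics on stdin; succeed only when a
# rabbitmq_up sample is present and every such sample equals 1.
exporter_is_up() {
  awk '/^rabbitmq_up[ {]/ { found = 1; if ($NF != 1) bad = 1 }
       END { exit !(found && !bad) }'
}

# Fetch /metrics from the exporter and check it.
# $1 = host (assumed default: localhost), $2 = port (assumed default: 9099).
check_exporter() {
  curl -sf "http://${1:-localhost}:${2:-9099}/metrics" | exporter_is_up
}
```

For example, `check_exporter 47.241.2.144 && echo ok` prints ok only when the exporter answers and reports that RabbitMQ is reachable.
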
Once rabbitmq_exporter is running, add a RabbitMQ job under scrape_configs in prometheus.yml:

  - job_name: 'RabbitMQ'
    static_configs:
    - targets: ['47.241.2.144:9099']
      labels:
        instance: RabbitMQ-47.241.2.144
    - targets: ['47.101.150.234:9099']
      labels:
        instance: RabbitMQ-47.101.150.234

Two nodes are monitored here.

Configuring alert rules

Reference the rule file in prometheus.yml:

rule_files:
  - "rule.yml"
cat /data/prometheus/rule.yml
groups:
- name: Rabbitmq
  rules:
  - alert: Rabbitmq-down
    expr: rabbitmq_up{job='RabbitMQ'} != 1
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} is Down ! ! !"
      value: '{{ $value }}'
      summary:  "The host node is down"
- name: Rabbitmq disk free limit 
  rules:
  - alert: Rabbitmq disk free limit   status
    expr: rabbitmq_node_disk_free{job='RabbitMQ'} / 1024 / 1024  <= rabbitmq_node_disk_free_limit{job='RabbitMQ'} / 1024 / 1024 + 200
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} the rmq free disk is too low ! ! !"
      value: '{{ $value }} MB'
      summary:  "The rmq free disk too low"
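
The disk rule converts both metrics from bytes to MB and fires once free space is within 200 MB of the configured limit. The sketch below reproduces that arithmetic outside Prometheus so the threshold can be sanity-checked by hand; disk_alert_fires is a hypothetical helper, and the byte values in the usage note are illustrative, not measurements from the author's nodes.

```shell
# Mirror of the rule expression:
#   free / 1024 / 1024 <= limit / 1024 / 1024 + 200
# $1 = free disk space in bytes, $2 = disk free limit in bytes.
# Exit status 0 means the alert condition holds.
disk_alert_fires() {
  awk -v free="$1" -v limit="$2" \
    'BEGIN { exit !((free / 1024 / 1024) <= (limit / 1024 / 1024 + 200)) }'
}
```

With a limit of 50000000 bytes (about 47.7 MB), `disk_alert_fires 209715200 50000000` succeeds because 200 MB of free space is below the 247.7 MB threshold, while `disk_alert_fires 1073741824 50000000` fails because 1024 MB is well above it.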

Add further rules for any other items you need to monitor.
Complete prometheus.yml

cat /data/prometheus/prometheus.yml 
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - 192.168.1.178:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rule.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'RabbitMQ'
    static_configs:
    - targets: ['xx.xx.xx.xx:9099']
      labels:
        instance: RabbitMQ-47.241.2.144
    - targets: ['xx.xx.xx.xx:9099']
      labels:
        instance: RabbitMQ-47.101.150.234
  - job_name: 'Linux'
    static_configs:
    - targets: ['xx.xx.xx.xx:9100']
      labels:
        instance: Linux
  - job_name: 'alertmanager'
    static_configs:
    - targets: ['xx.xx.xx.xx:9093']
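
After editing prometheus.yml and rule.yml, the configuration can be checked and reloaded without a full restart. This is a sketch under assumptions: PROM_HOME is taken from the cat path above, promtool is assumed to live in the same directory (it ships in the Prometheus tarball), and reload_prometheus is a hypothetical helper name. It uses SIGHUP, which Prometheus treats as a request to reload its configuration.

```shell
# Assumed Prometheus directory, taken from the cat command above.
PROM_HOME="${PROM_HOME:-/data/prometheus}"

# Check prometheus.yml (promtool also checks the rule files it references),
# then ask the running Prometheus process to reload its configuration.
reload_prometheus() {
  "$PROM_HOME/promtool" check config "$PROM_HOME/prometheus.yml" || return 1
  pkill -HUP -x prometheus
}
```

If the check fails, the function returns non-zero and the running Prometheus keeps its previous configuration.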

Finally, test the setup and check the result.
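
One way to exercise the WeChat receiver without stopping RabbitMQ is to post a hand-made alert to Alertmanager's v2 API. The sketch below assumes Alertmanager is reachable at the address given in the alerting section (192.168.1.178:9093); the helper names, the alert name WechatTest, and the label values are all hypothetical.

```shell
# Build a one-alert JSON payload. $1 = alert name, $2 = instance label.
build_test_alert() {
  printf '[{"labels":{"alertname":"%s","status":"High","instance":"%s"},"annotations":{"summary":"manual test alert"}}]\n' "$1" "$2"
}

# Post the payload to Alertmanager. $1 = host:port of Alertmanager,
# $2 = alert name (optional), $3 = instance label (optional).
send_test_alert() {
  build_test_alert "${2:-WechatTest}" "${3:-test-instance}" |
    curl -sf -X POST -H 'Content-Type: application/json' \
      --data @- "http://$1/api/v2/alerts"
}
```

Running `send_test_alert 192.168.1.178:9093` should produce a WeChat Work message after roughly the group_wait of 10s configured above, provided the corp_id, agent_id, and api_secret are valid.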

