I. Alertmanager
1.1 Create the Alertmanager configuration file
vim /root/alertmanager/config.yml
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
  smtp_hello: localhost
  smtp_require_tls: true
  pagerduty_url: 'https://events.pagerduty.com/v2/enqueue'
  opsgenie_api_url: 'https://api.opsgenie.com/'
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_corp_id: wxe11111111111ca  # WeChat Work corp (enterprise) ID
  victorops_api_url: 'https://alert.victorops.com/integrations/generic/20131114/alert/'
route:
  receiver: zhangsan  # must match a name under receivers below
  group_by:
    - groupLabel  # grouping label; customizable, must match a label set in the alert rules
  continue: false
  group_wait: 30s
  group_interval: 3m
  repeat_interval: 3m
receivers:
  - name: zhangsan
    wechat_configs:
      - send_resolved: true
        http_config:
          follow_redirects: true
        api_secret: <secret>  # secret generated when the WeChat Work application is created
        corp_id: wxe11111111111ca
        message: '{{ template "wechat.default.message" . }}'
        api_url: https://qyapi.weixin.qq.com/cgi-bin/
        to_user: zhangsan  # a single user; use '@all' (quoted) to send to every member
        to_party: '{{ template "wechat.default.to_party" . }}'
        to_tag: '{{ template "wechat.default.to_tag" . }}'
        agent_id: "1000296"  # agent ID of the WeChat Work application
        message_type: text
templates:
  - /apps/srv/alertmanager/templates/*.tmpl  # path to the alert templates
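To make the effect of group_by more concrete, the following Python sketch mimics, in a much simplified form, how alerts that share the same value of the groupLabel label end up in one notification group. The function name group_alerts and the sample alerts are assumptions made for illustration; this is not Alertmanager's own code, and it ignores the group_wait, group_interval and repeat_interval timing.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the labels listed in group_by.

    Simplified illustration only; the real Alertmanager also applies
    group_wait, group_interval and repeat_interval timing per group.
    """
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert.get("labels", {})
        # Missing labels are treated as an empty string in this sketch.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical sample alerts, for illustration only.
sample_alerts = [
    {"labels": {"alertname": "A", "groupLabel": "passportHttpCode"}},
    {"labels": {"alertname": "B", "groupLabel": "passportHttpCode"}},
    {"labels": {"alertname": "C", "groupLabel": "other"}},
]

grouped = group_alerts(sample_alerts, ["groupLabel"])
for key, members in grouped.items():
    print(key, [a["labels"]["alertname"] for a in members])
```

Alerts A and B share a group and would be delivered together, while C forms its own group.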
1.2 Alert template
A sample test template; customize it as needed.
{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
@Alert 【{{ len .Alerts.Firing }}】
{{ range .Alerts.Firing }}
<pre>
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Time: {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
</pre>
{{ end }}
{{ end -}}
{{ end }}
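The template's time expression adds 28800e9 nanoseconds (8 hours) to StartsAt and then formats it with the Go layout "2006-01-02 15:04:05", which shifts a UTC timestamp to UTC+8. The Python sketch below reproduces the same arithmetic so the result can be checked by hand; the function name format_local and the sample timestamp are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# 28800e9 nanoseconds, the value passed to .StartsAt.Add in the template.
OFFSET_NS = 28800e9


def format_local(starts_at_utc):
    """Shift a UTC timestamp by the template's offset and format it.

    "%Y-%m-%d %H:%M:%S" is the strftime equivalent of the Go layout
    "2006-01-02 15:04:05".
    """
    offset = timedelta(microseconds=OFFSET_NS / 1000)
    return (starts_at_utc + offset).strftime("%Y-%m-%d %H:%M:%S")


# Hypothetical alert start time, for illustration only.
starts_at = datetime(2024, 1, 1, 20, 30, 0, tzinfo=timezone.utc)
print(timedelta(microseconds=OFFSET_NS / 1000))  # 8:00:00
print(format_local(starts_at))  # 2024-01-02 04:30:00
```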
1.3 Build and run Alertmanager
$ docker search alertmanager
$ docker pull docker.io/prom/alertmanager:latest
$ docker run -d -p 9093:9093 -v /root/alertmanager/config.yml:/etc/alertmanager/config.yml docker.io/prom/alertmanager:latest --config.file=/etc/alertmanager/config.yml
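Once the container is running, one possible way to exercise the whole notification path without waiting for Prometheus is to push a test alert to Alertmanager's v2 HTTP API (POST /api/v2/alerts). The sketch below is a minimal example using only the Python standard library; the alert name ManualTestAlert, the annotation texts, and the address http://127.0.0.1:9093 are assumptions, so adjust them to your environment.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(group_label):
    """Build a one-element alert list in the Alertmanager v2 API format."""
    return [
        {
            "labels": {
                "alertname": "ManualTestAlert",  # hypothetical name
                "groupLabel": group_label,
            },
            "annotations": {
                "summary": "manual test alert",
                "description": "sent by hand to check WeChat delivery",
            },
            # RFC 3339 timestamp in UTC.
            "startsAt": datetime.now(timezone.utc).isoformat(),
        }
    ]


def post_alerts(base_url, alerts):
    """POST alerts to <base_url>/api/v2/alerts and return the HTTP status."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


payload = build_test_alert("passportHttpCode")
print(json.dumps(payload, indent=2))
# Uncomment once Alertmanager is reachable; the address is an assumption.
# print(post_alerts("http://127.0.0.1:9093", payload))
```

With the route configured above, the test alert should show up on the Alerts page and be sent to the WeChat receiver after group_wait has elapsed.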
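Once the container is up, visit ip:port/#/alerts. If the Alerts page loads, Alertmanager has been set up successfully. Note that the docker run command above mounts only config.yml; for the custom template to be loaded, the directory listed under templates must also be available inside the container.
II. Prometheus configuration changes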
2.1 Modify the Prometheus configuration file
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 172.21.135.17:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/*.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'alertmanager'  # scrape job that monitors Alertmanager
    static_configs:
      - targets:
          - ip:9093  # ip:port of the machine running Alertmanager
2.2 Add the alerting rules
test_rules.yml
groups:
  - name: passportHttpCode
    rules:
      - alert: XX service HTTP response code
        expr: code{service="1"} != 200
        for: 1m
        labels:
          type: httpCode
          object: overall
          title: XX service HTTP response code
          groupLabel: passportHttpCode
        annotations:
          summary: XX service HTTP response code is abnormal
          description: "Current number of abnormal response codes: {{ printf \"%.2f\" $value }}\nTrend: http://XXX"
Restart Prometheus so that the configuration takes effect, then visit ip:9090/graph. If the rules were loaded correctly, they are listed under the Rules menu.
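Besides checking the web UI, the loaded rules can also be read from the Prometheus HTTP API (GET /api/v1/rules). The following Python sketch lists the alerting rules and their current state; the address http://127.0.0.1:9090, the alert name ExampleAlert, and the hand-written sample response (trimmed to the fields used here) are assumptions for illustration.

```python
import json
import urllib.request


def fetch_rules(base_url):
    """GET <base_url>/api/v1/rules and return the decoded JSON body."""
    url = base_url.rstrip("/") + "/api/v1/rules"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def alerting_rules(rules_response):
    """Return (group, alert name, state) for every alerting rule."""
    found = []
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("type") == "alerting":
                found.append(
                    (group.get("name"), rule.get("name"), rule.get("state"))
                )
    return found


# Hand-written sample, trimmed to the fields this sketch reads.
sample = {
    "status": "success",
    "data": {
        "groups": [
            {
                "name": "passportHttpCode",
                "rules": [
                    {
                        "name": "ExampleAlert",  # hypothetical alert name
                        "query": 'code{service="1"} != 200',
                        "state": "inactive",
                        "type": "alerting",
                    }
                ],
            }
        ]
    },
}

print(alerting_rules(sample))
# Uncomment once Prometheus is reachable; the address is an assumption.
# print(alerting_rules(fetch_rules("http://127.0.0.1:9090")))
```

A rule in state inactive has been loaded but its expression currently matches nothing; pending means the expression matches but the for duration has not yet elapsed, and firing means the alert is being sent to Alertmanager.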