Local testing
- Start Alertmanager and Prometheus on your own machine
./prometheus --config.file=prometheus.yml
./alertmanager --config.file alertmanager.yml
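Once both processes are up, a quick sanity check over HTTP (a sketch assuming the default ports 9090 and 9093):
curl http://localhost:9090/-/ready
curl http://localhost:9093/-/ready
curl http://localhost:9090/api/v1/rules    # confirms first_rules.yml was loaded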
Attached configs (validation commands follow after them):
- prometheus.yml
global:
  scrape_timeout: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093
rule_files:
  - "first_rules.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'export'
    static_configs:
    - targets: ['10.211.55.14:9100','10.211.55.15:9100']
  - job_name: 'alertmanager'
    static_configs:
    - targets: ['localhost:9093']
- first_rules.yml
groups:
- name: test-rule
  rules:
  - alert: HighCPU
    expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{job="export",mode="idle"}[5m])) * 100) > 0.1
    for: 1m
    labels:
      severity: warning
    annotations:
      #summary: High CPU
      #console: Thank you
      summary: "{{$labels.instance}}: Too many clients detected, {{$labels.job}} xixi"
      description: "{{$labels.instance}}: Client num is above 80% (current value is: {{ $value }})"
- alertmanager.yml
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'm13095004177@163.com'
  smtp_auth_username: 'm13095004177@163.com'
  smtp_auth_password: 'wzl19971123'
  smtp_require_tls: false
route:
  receiver: mail
receivers:
- name: 'mail'
  email_configs:
  - to: 'baihuashu97@foxmail.com'
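Both projects ship validation tools next to the server binaries, so the three configs above can be checked before starting (promtool and amtool are assumed to sit in the same directories; newer amtool releases use "amtool config check" instead):
./promtool check config prometheus.yml
./promtool check rules first_rules.yml
./amtool check-config alertmanager.yml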
- Start node-exporter on the two VMs
./node_exporter
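To confirm both exporters are actually serving metrics (the HighCPU rule relies on node_cpu_seconds_total, so node_exporter 0.16+ is assumed):
curl -s http://10.211.55.14:9100/metrics | grep node_cpu_seconds_total | head
curl -s http://10.211.55.15:9100/metrics | grep node_cpu_seconds_total | head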
Not sure why, but the alert emails stopped going out later. Could it be the 4h (default repeat_interval) issue??
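To narrow down whether the failure is on the SMTP side or the Prometheus side, an alert can be pushed straight into Alertmanager; if no mail arrives even then, the email config is the problem, not the rule. A sketch using the v1 API (newer Alertmanager releases only expose /api/v2/alerts):
curl -XPOST http://localhost:9093/api/v1/alerts -d '[{"labels":{"alertname":"TestMail","severity":"warning"}}]'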
Staging environment testing
- Alerting rules
View all rules (most of them ship with prometheus-operator)
kubectl get PrometheusRule -n prometheus
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: prometheus-operator
    chart: prometheus-operator-6.6.1
    heritage: Tiller
    release: prometheus-operator
  name: new-rule
  namespace: prometheus
spec:
  groups:
  - name: new.rules
    rules:
    - alert: MemoryInsufficient
      annotations:
        summary: memory is exhausted
        description: 'host:{{$labels.node_name}} Address:{{$labels.instance}}: Memory Usage is above 90% (current value is: {{ $value }})'
      expr: |
        (node_memory_MemTotal_bytes-(node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes))/node_memory_MemTotal_bytes*100>90
      for: 3m
      labels:
        severity: critical
    - alert: DiskStorageInsufficient
      annotations:
        summary: disk storage is exhausted
        description: 'host:{{$labels.node_name}} Address:{{$labels.instance}}: Disk Storage Usage is above 90% (current value is: {{ $value }})'
      expr: |
        (node_filesystem_size_bytes{mountpoint="/"}-node_filesystem_free_bytes{mountpoint="/"})/node_filesystem_size_bytes{mountpoint="/"}*100>90
      for: 3m
      labels:
        severity: critical
    - alert: ChoerodonServiceDown
      annotations:
        summary: Choerodon Service unavailable
        description: '{{$labels.pod_name}} is unavailable'
      expr: |
        up{job="kubernetes-pod-choerodon"}==0
      for: 3m
      labels:
        severity: critical
    - alert: NodeDown
      annotations:
        summary: A node is unavailable
        description: 'host:{{$labels.node}} Address:{{$labels.instance}} is unavailable'
      expr: |
        up{job="node-exporter"}==0
      for: 3m
      labels:
        severity: critical
Apply the alerting rules
kubectl apply -f prometheus-testRules.yaml
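To confirm the operator picked the rule up, list it and then check the /rules page of the Prometheus UI; the service name below is an assumption based on the release name prometheus-operator:
kubectl get prometheusrule new-rule -n prometheus
kubectl port-forward -n prometheus svc/prometheus-operator-prometheus 9090:9090
# then open http://localhost:9090/rules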
- Alertmanager email configuration
The Alertmanager configuration is packaged inside a Secret
kubectl get Secret -n prometheus
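The config currently shipped by the operator can be decoded straight out of the default Secret (a sketch; the data key is assumed to be alertmanager.yaml):
kubectl get secret alertmanager-prometheus-operator-alertmanager -n prometheus \
  -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d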
alertmanager.yaml configuration file:
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'jugglee@163.com'
  smtp_auth_username: 'jugglee@163.com'
  smtp_auth_password: 'admin123'
  smtp_require_tls: false
route:
  receiver: default
  routes:
  - receiver: mail
    match:
      alertname: NodeDown
  - receiver: mail
    match:
      alertname: ChoerodonServiceDown
  - receiver: mail
    match:
      alertname: MemoryInsufficient
receivers:
- name: 'default'
  email_configs:
  - to: 'm13095004177@qq.com'
- name: 'mail'
  email_configs:
  - to: '986916990@qq.com'
My approach: generate a Secret from this configuration file, then replace the base64 payload in the default alertmanager-prometheus-operator-alertmanager Secret with the new value.
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n prometheus
Compare the two Secrets and swap the base64 value over.
kubectl edit Secret alertmanager-main -n prometheus
kubectl edit Secret alertmanager-prometheus-operator-alertmanager -n prometheus
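Instead of hand-editing the base64, the operator's Secret can also be regenerated in place from the file (a sketch; --dry-run=client needs a newer kubectl, older versions use plain --dry-run, and the key produced by --from-file is assumed to be the alertmanager.yaml the operator expects):
kubectl create secret generic alertmanager-prometheus-operator-alertmanager \
  --from-file=alertmanager.yaml -n prometheus \
  --dry-run=client -o yaml | kubectl apply -f -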
Extension:
If a webhook is also needed:
receivers:
- name: 'web.hook'
  email_configs:
  - to: 'sxxx@yy.com.com'
  webhook_configs:
  - url: 'http://localhost:8060/dingtalk/webhook1/send'
    send_resolved: false
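A receiver only fires if some route points at it, so web.hook also needs an entry in the route tree; a minimal sketch (the severity match here is just an example):
route:
  receiver: default
  routes:
  - receiver: 'web.hook'
    match:
      severity: critical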