1. Deploy redis-exporter to scrape metrics
vim redis-exporter.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-exporter
  namespace: monitoring
  labels:
    app: redis-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-exporter
  template:
    metadata:
      labels:
        app: redis-exporter
    spec:
      containers:
        - name: redis-exporter
          image: bitnami/redis-exporter:latest
          ports:
            - containerPort: 9121
          env:
            - name: REDIS_ADDR
              value: "redis-service.redis.svc.cluster.local:6379" # Redis address
            - name: REDIS_PASSWORD
              value: "password" # Redis password, set directly here
          resources:
            limits:
              memory: "64Mi"
              cpu: "200m"
---
apiVersion: v1
kind: Service
metadata:
  name: redis-exporter
  namespace: monitoring
  labels:
    app: redis-exporter
spec:
  ports:
    - port: 9121
      targetPort: 9121
      name: metrics
      protocol: TCP
  selector:
    app: redis-exporter
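Setting the password as a literal value in the Deployment leaves it readable by anyone who can view that Deployment. A minimal sketch of one alternative is to keep the value in a Secret and reference it with secretKeyRef. The Secret name redis-exporter-auth and the key redis-password are assumptions made for illustration and are not part of the original setup.

```yaml
# Hypothetical Secret; the name and key are illustrative assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: redis-exporter-auth
  namespace: monitoring
type: Opaque
stringData:
  redis-password: "password" # replace with the real Redis password
---
# Fragment only (not a complete manifest): this env entry would replace
# the literal REDIS_PASSWORD value in the Deployment's container spec.
env:
  - name: REDIS_PASSWORD
    valueFrom:
      secretKeyRef:
        name: redis-exporter-auth
        key: redis-password
```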
- Test the scrape:
The IP below is the ClusterIP of the redis-exporter Service deployed above.
curl http://10.104.13.103:9121/metrics
2. Configure the ServiceMonitor and PrometheusRule
vim redis-ServiceMonitor-prometheusrules.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-exporter
  namespace: monitoring
  labels:
    release: prometheus # make sure this label matches your Prometheus configuration
spec:
  selector:
    matchLabels:
      app: redis-exporter # must match the labels on the redis-exporter Service
  namespaceSelector:
    matchNames:
      - monitoring # namespace where redis-exporter runs
  endpoints:
    - port: metrics # name of the redis-exporter Service port
      interval: 15s
      path: /metrics
      scheme: http
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: redis-rule
  namespace: monitoring
spec:
  groups:
    - name: redis-exporter
      rules:
        - alert: RedisDown
          expr: redis_up == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Redis down (instance {{ $labels.instance }})
            description: "Redis instance is down\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisDisconnectedSlaves
          expr: count without (instance, job) (redis_connected_slaves) - sum without (instance, job) (redis_connected_slaves) - 1 > 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Redis disconnected slaves (instance {{ $labels.instance }})
            description: "Redis is not replicating to all of its replicas. Check the Redis replication status.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisReplicationBroken
          expr: delta(redis_connected_slaves[1m]) < 0
          for: 10m
          labels:
            severity: page
          annotations:
            summary: Redis replication broken (instance {{ $labels.instance }})
            description: "Redis instance lost a replica\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisClusterFlapping
          expr: changes(redis_connected_slaves[1m]) > 1
          for: 5m
          labels:
            severity: page
          annotations:
            summary: Redis cluster flapping (instance {{ $labels.instance }})
            description: "Changes detected in Redis replica connections. This can happen when a replica loses its connection to the master and then reconnects (flapping).\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisMissingBackup
          expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Redis missing backup (instance {{ $labels.instance }})
            description: "Redis has not been backed up for more than 24 hours\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisOutOfSystemMemory
          expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90 and on(instance) redis_memory_max_bytes > 0
          for: 5m
          labels:
            severity: page
          annotations:
            summary: Redis out of system memory (instance {{ $labels.instance }})
            description: "Redis is using more than 90% of system memory\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisOutOfConfiguredMaxmemory
          expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 90
          for: 5m
          labels:
            severity: page
          annotations:
            summary: Redis out of configured maxmemory (instance {{ $labels.instance }})
            description: "Redis is using more than 90% of its configured maxmemory\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisTooManyConnections
          expr: redis_connected_clients > 1000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: Redis too many connections (instance {{ $labels.instance }})
            description: "Redis instance has too many connections (more than 1000)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisNotEnoughConnections
          expr: redis_connected_clients < 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: Redis not enough connections (instance {{ $labels.instance }})
            description: "Redis instance has too few client connections (fewer than 1)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisRejectedConnections
          expr: increase(redis_rejected_connections_total[1m]) > 0
          for: 5m
          labels:
            severity: page
          annotations:
            summary: Redis rejected connections (instance {{ $labels.instance }})
            description: "Redis rejected some connection requests\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
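One gap worth noting: redis_up == 0 can only fire while the exporter itself is running and reporting that Redis is unreachable. If the exporter pod disappears, the redis_up series disappears with it and the rule stays silent. A sketch of an additional rule that covers this case follows. The alert name is an illustrative assumption, and the job label value assumes the Prometheus Operator default, where the job label falls back to the Service name (redis-exporter) when jobLabel is not set on the ServiceMonitor.

```yaml
# Extra entry for the `rules:` list of the PrometheusRule; indent it to match
# the other alerts. Alert name and wording are illustrative assumptions.
- alert: RedisExporterMissing
  expr: absent(up{job="redis-exporter"} == 1)
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Redis exporter is not being scraped
    description: "No healthy redis-exporter target has been scraped for 5 minutes, so the other Redis alerts cannot fire."
```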
- The redis-exporter target now appears in Prometheus and is being scraped successfully.
- The alerting rules have been loaded as well.
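If the target or the rules do not show up, a common cause is that the Prometheus custom resource selects ServiceMonitors and PrometheusRules by label and the objects do not carry a matching one. The fragment below is a sketch of what such selectors can look like, assuming Prometheus is managed by the Prometheus Operator; the resource name prometheus is hypothetical, and the actual values should be read from the cluster (for example with kubectl -n monitoring get prometheus -o yaml).

```yaml
# Illustrative fragment of a Prometheus custom resource; the name is an assumption.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  # Only ServiceMonitors carrying this label are picked up.
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
  # Only PrometheusRules carrying this label are loaded.
  ruleSelector:
    matchLabels:
      release: prometheus
```

Note that the redis-rule PrometheusRule in step 2 has no labels. With a ruleSelector like the one sketched here, it would be skipped until release: prometheus is added under its metadata.labels.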
3. Import the dashboard into Grafana
https://grafana.com/grafana/dashboards/18345-redis-overview/
Dashboard ID: 18345
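Importing by ID through the Grafana UI is a manual step. If Grafana was installed as part of the kube-prometheus-stack Helm chart (an assumption; the original does not say how Grafana was deployed), the same dashboard can be declared in the Helm values so that it survives redeployments. The sketch below uses the Grafana chart's dashboardProviders and dashboards keys; the revision number and the data source name Prometheus are assumptions to adjust for your environment.

```yaml
# Helm values fragment for kube-prometheus-stack (Grafana subchart).
grafana:
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: default
          orgId: 1
          folder: ""
          type: file
          disableDeletion: false
          editable: true
          options:
            path: /var/lib/grafana/dashboards/default
  dashboards:
    default:
      redis-overview:
        gnetId: 18345          # dashboard ID from grafana.com
        revision: 1            # assumption: pin whichever revision you tested
        datasource: Prometheus # assumption: name of your Prometheus data source
```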