Monitoring Redis with Prometheus

1. Deploy redis-exporter to collect Redis metrics

vim redis-exporter.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-exporter
  namespace: monitoring
  labels:
    app: redis-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-exporter
  template:
    metadata:
      labels:
        app: redis-exporter
    spec:
      containers:
        - name: redis-exporter
          image: bitnami/redis-exporter:latest
          ports:
            - containerPort: 9121
          env:
            - name: REDIS_ADDR
              value: "redis-service.redis.svc.cluster.local:6379"   # Redis address (host:port)
            - name: REDIS_PASSWORD
              value: "password"  # Redis password, set directly here in plaintext (replace with your own)
          resources:
            limits:
              memory: "64Mi"
              cpu: "200m"
---
apiVersion: v1
kind: Service
metadata:
  name: redis-exporter
  namespace: monitoring
  labels:
    app: redis-exporter
spec:
  ports:
    - port: 9121
      targetPort: 9121
      name: metrics
      protocol: TCP
  selector:
    app: redis-exporter
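The Deployment above keeps the Redis password in plaintext inside the manifest. A common alternative is to store it in a Kubernetes Secret. The bash sketch below prints such a Secret manifest; the Secret name redis-exporter-auth and the key password are hypothetical choices for illustration, not part of the original setup.

```shell
#!/usr/bin/env bash
# Sketch: print a Kubernetes Secret manifest holding the Redis password,
# so it does not have to sit in plaintext in redis-exporter.yaml.
# The Secret name and the key "password" are hypothetical examples.
redis_secret_manifest() {
  local name="$1" namespace="$2" password="$3" encoded
  # Values under .data must be base64-encoded; strip any line wrapping base64 adds
  encoded=$(printf '%s' "$password" | base64 | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
  namespace: ${namespace}
type: Opaque
data:
  password: ${encoded}
EOF
}

# Usage: pipe the generated manifest straight into kubectl
# redis_secret_manifest redis-exporter-auth monitoring 'your-password' | kubectl apply -f -
# Equivalent without the helper:
# kubectl create secret generic redis-exporter-auth -n monitoring --from-literal=password='your-password'
```

Once the Secret exists, the REDIS_PASSWORD entry in the Deployment would reference it via valueFrom.secretKeyRef (name: redis-exporter-auth, key: password) instead of a literal value.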
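  • Apply the manifest and test that metrics can be scraped:
    kubectl apply -f redis-exporter.yaml
    The IP below is the ClusterIP of the deployed redis-exporter Service (look it up with kubectl get svc -n monitoring redis-exporter):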
curl http://10.104.13.103:9121/metrics
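A successful curl only proves the exporter process is running; whether it can actually reach and authenticate to Redis is reported by the redis_up metric (1 = connected, 0 = not), the same metric the RedisDown rule uses. A minimal sketch of a check that reads the metrics text on stdin, assuming bash and awk are available:

```shell
#!/usr/bin/env bash
# Sketch: read Prometheus text-format metrics on stdin and report whether
# redis-exporter is connected to Redis (redis_up 1) or not.
check_redis_up() {
  local value
  # Take the value from the first line whose metric name is exactly redis_up;
  # "# HELP" / "# TYPE" comment lines are skipped because their first field is "#"
  value=$(awk '$1 == "redis_up" { print $2; exit }')
  if [ "$value" = "1" ]; then
    echo "redis_up=1: exporter is connected to Redis"
    return 0
  fi
  echo "redis_up=${value:-missing}: exporter cannot reach Redis (check REDIS_ADDR / REDIS_PASSWORD)"
  return 1
}

# Usage (replace the IP with your Service's ClusterIP):
# curl -s http://10.104.13.103:9121/metrics | check_redis_up
```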

2. Configure the ServiceMonitor and PrometheusRule

vim redis-ServiceMonitor-prometheusrules.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-exporter
  namespace: monitoring
  labels:
    release: prometheus  # make sure this label matches the serviceMonitorSelector in your Prometheus config
spec:
  selector:
    matchLabels:
      app: redis-exporter  # must match the labels on the redis-exporter Service
  namespaceSelector:
    matchNames:
      - monitoring  # namespace where redis-exporter is deployed
  endpoints:
    - port: metrics  # name of the redis-exporter Service port
      interval: 15s
      path: /metrics
      scheme: http
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: redis-rule
  namespace: monitoring
spec:
  groups:
    - name: redis-exporter
      rules:
        - alert: RedisDown
          expr: redis_up == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Redis down (instance {{ $labels.instance }})
            description: "Redis instance is down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: RedisDisconnectedSlaves
          expr: count without (instance, job) (redis_connected_slaves) - sum without (instance, job) (redis_connected_slaves) - 1 > 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Redis disconnected slaves (instance {{ $labels.instance }})
            description: "Redis is not replicating to all of its slaves. Check the Redis replication status.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: RedisReplicationBroken
          expr: delta(redis_connected_slaves[1m]) < 0
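          for: 0m  # delta() over a 1m window stays negative only briefly after a drop, so a long "for" would almost never fire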
          labels:
            severity: page
          annotations:
            summary: Redis replication broken (instance {{ $labels.instance }})
            description: "Redis instance lost a slave\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: RedisClusterFlapping
          expr: changes(redis_connected_slaves[1m]) > 1
          for: 5m
          labels:
            severity: page
          annotations:
            summary: Redis cluster flapping (instance {{ $labels.instance }})
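            description: "Changes detected in Redis replica connections. This can happen when replicas lose their connection to the master and then reconnect (i.e. flapping).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"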
        - alert: RedisMissingBackup
          expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Redis missing backup (instance {{ $labels.instance }})
            description: "Redis has not been backed up for more than 24 hours\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: RedisOutOfSystemMemory
          expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
          for: 5m
          labels:
            severity: page
          annotations:
            summary: Redis out of system memory (instance {{ $labels.instance }})
            description: "Redis is using more than 90% of system memory\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: RedisOutOfConfiguredMaxmemory
          expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 90 and on(instance) redis_memory_max_bytes > 0
          for: 5m
          labels:
            severity: page
          annotations:
            summary: Redis out of configured maxmemory (instance {{ $labels.instance }})
            description: "Redis is using more than 90% of its configured maxmemory\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: RedisTooManyConnections
          expr: redis_connected_clients > 1000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: Redis too many connections (instance {{ $labels.instance }})
            description: "Redis instance has too many client connections (more than 1000)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: RedisNotEnoughConnections
          expr: redis_connected_clients < 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: Redis not enough connections (instance {{ $labels.instance }})
            description: "Redis instance has too few client connections (expected at least 1)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: RedisRejectedConnections
          expr: increase(redis_rejected_connections_total[1m]) > 0
          for: 5m
          labels:
            severity: page
          annotations:
            summary: Redis rejected connections (instance {{ $labels.instance }})
            description: "Redis has rejected some connection requests\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
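The thresholds in RedisMissingBackup and RedisOutOfConfiguredMaxmemory can be checked by hand. The bash sketch below restates those two conditions with integer shell arithmetic, assuming you plug in raw values read from the exporter's /metrics output; it only illustrates the threshold logic, since Prometheus evaluates the PromQL expressions itself. Note that a maxmemory of 0 means "no limit" in Redis; in PromQL the used/max ratio then becomes +Inf and would always exceed 90, which is why this condition is only meaningful together with redis_memory_max_bytes > 0.

```shell
#!/usr/bin/env bash
# Sketch: offline restatement of two alert conditions from the PrometheusRule.
# Inputs are plain integers, e.g. values copied from the /metrics output.

# RedisMissingBackup: the last RDB save is more than 24h old.
# 60 * 60 * 24 = 86400 seconds.
backup_is_stale() {
  local now="$1" last_save="$2"
  (( now - last_save > 60 * 60 * 24 ))
}

# RedisOutOfConfiguredMaxmemory: used / max * 100 > 90, evaluated only when a
# maxmemory limit is configured (max > 0; 0 means unlimited in Redis).
maxmemory_over_90() {
  local used="$1" max="$2"
  (( max > 0 )) || return 1
  # Compare used*100 with 90*max instead of dividing, to avoid integer rounding
  (( used * 100 > 90 * max ))
}

# Usage (timestamp taken from redis_rdb_last_save_timestamp_seconds):
# backup_is_stale "$(date +%s)" 1700000000 && echo "last backup is older than 24h"
```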
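  • After applying the file (kubectl apply -f redis-ServiceMonitor-prometheusrules.yaml), you can see that Redis is now being monitored successfully:
[screenshot]
  • The alert rules have been loaded as well:
[screenshot]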

3. Import a Grafana dashboard

https://grafana.com/grafana/dashboards/18345-redis-overview/
Dashboard ID: 18345

[screenshot]
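© Copyright belongs to the author. For reprints or content collaboration, please contact the author.
Platform notice: The content of this article (including any images or videos) was uploaded and published by the author and represents only the author's own views. Jianshu is an information publishing platform and provides only information storage services.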
