After taking over the platform, I kept learning about service outages from user feedback, so I decided to build a simple alerting setup that tracks system health from the entry nginx's access logs.
The ELK stack felt too heavy, and I plan to round out monitoring together with the Spring Cloud components during a later refactor, so I settled on Prometheus + Grafana.
- Prometheus aggregates, filters, and otherwise processes the collected data.
- Grafana handles visualization and sends the alert notifications.
Since Prometheus cannot read nginx logs directly, another open-source tool, prometheus-nginxlog-exporter, is used to tail the nginx log and expose the data as metrics for Prometheus to scrape.
prometheus-nginxlog-exporter
Installation:
- Download the rpm package
wget https://github.com/martin-helmich/prometheus-nginxlog-exporter/releases/download/v1.9.2/prometheus-nginxlog-exporter_1.9.2_linux_amd64.rpm
- Install the rpm
yum localinstall prometheus-nginxlog-exporter_1.9.2_linux_amd64.rpm
- Add a dedicated nginx log. This keeps the log the exporter reads easy to parse and independent of the main access log.
nginx.conf:
#add prometheus log_format
log_format prometheus '[$time_local] $request_method "$request" '
'$body_bytes_sent $status $request_time $upstream_response_time';
......
server {
......
# add prometheus access_log
access_log /var/log/nginx/access.log.prometheus prometheus;
......
}
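To see how the fields of this format line up, here is a hypothetical log line written by the "prometheus" log_format above (the request line is made up for illustration). Since the quoted $request spans three space-separated fields, $status lands in field 8:

```shell
# Hypothetical log line produced by the "prometheus" log_format above
line='[10/Oct/2023:13:55:36 +0800] GET "GET /NiuTrans.Translate/run HTTP/1.1" 512 200 0.012 0.010'
# [$time_local] occupies fields 1-2, $request_method field 3, the quoted
# $request fields 4-6, then $body_bytes_sent $status $request_time $upstream_response_time
echo "$line" | awk '{print "status=" $8, "request_time=" $9}'
# → status=200 request_time=0.012
```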
- Write the configuration file
prometheus.hcl
listen {
port = ****
metrics_endpoint = "/metrics"
}
namespace "yun" {
format = "[$time_local] $request_method \"$request\" $body_bytes_sent $status $request_time $upstream_response_time"
source {
files = [
"/var/log/nginx/access.log.prometheus"
]
}
labels {
app = "yun-platform"
environment = "pro"
foo = "bar"
}
# add a server label to distinguish the different backend services.
relabel "server" {
from = "request"
split = 2
separator = "/"
match "(NiuTrans.*)" {
replacement = "$1"
}
}
}
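The relabel block above derives the server label from the request line: with separator "/" and split = 2, the exporter takes the second "/"-separated segment of $request and keeps it only if it matches (NiuTrans.*). The same split can be mimicked with awk (the request line is a made-up example):

```shell
# A made-up request line; split = 2 with separator "/" selects the 2nd segment
request='GET /NiuTrans.Translate/run HTTP/1.1'
echo "$request" | awk -F'/' '{print $2}'
# → NiuTrans.Translate
```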
- Start prometheus-nginxlog-exporter
nohup prometheus-nginxlog-exporter -config-file prometheus.hcl &
Now open the configured address in a browser and the exposed metrics are available:
Prometheus
Installation
- Download a matching package from the official site:
https://prometheus.io/download/#prometheus
- Edit the configuration file
prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["ip:port"]
The ["ip:port"] configured here is the address exposed by prometheus-nginxlog-exporter in the previous step.
- Start
nohup ./prometheus --config.file=prometheus.yml &
Visit the default port 9090 and you can see that Prometheus is running.
You can now query the data from the search box on the home page.
The query syntax is documented on the official site; here I query the error rate over the last 5 minutes.
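For reference, an error-rate query along these lines can be written in PromQL. The metric name below assumes the exporter's default response counter prefixed with the "yun" namespace from the HCL config; verify the exact name on your /metrics page before using it:

```
sum(rate(yun_http_response_count_total{status=~"5.."}[5m]))
/
sum(rate(yun_http_response_count_total[5m]))
```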
Grafana
Installation
For Grafana I simply used the more familiar docker-compose.
With docker and docker-compose ready:
- Create a grafana directory
mkdir grafana
- Edit the configuration file docker-compose.yaml
version: '3.8'
services:
  grafana:
    image: grafana/grafana-enterprise
    container_name: grafana
    restart: unless-stopped
    user: '0'
    ports:
      - '3000:3000'
    volumes:
      - '$PWD/data:/var/lib/grafana'
- Start
docker-compose up -d
Visit port 3000 to reach Grafana. The default username and password are both admin; after logging in you can change them to your own.
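Before a panel can query anything, Grafana needs Prometheus registered as a data source. This can be done in the UI (Configuration → Data sources → Add data source → Prometheus), or declaratively via a provisioning file; a minimal sketch, where the url is a placeholder for your Prometheus address (port 9090 above):

```yaml
# e.g. /etc/grafana/provisioning/datasources/prometheus.yaml (mount it into the container)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://ip:9090
```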
Click the plus icon, choose Dashboard, then Add a new panel to open the panel editor.
Enter the query you need in the Query section,
and configure the alert details in the Alert section.
With that, the alerts start arriving.