After taking over the platform, I kept learning about service outages from user feedback, so I decided to build a simple alerting setup that tracks system health from the entry nginx's access logs.
The ELK stack felt too heavy, and I plan to round out monitoring together with the Spring Cloud components during a later refactor, so I settled on Prometheus + Grafana.
- Prometheus aggregates, filters, and otherwise processes the collected data.
- Grafana handles visualization and sends the alert notifications.
Since Prometheus cannot read nginx logs directly, another open-source tool, prometheus-nginxlog-exporter, is used to tail the nginx log and expose the data as metrics for Prometheus to scrape.
prometheus-nginxlog-exporter
Installation:
- Download the rpm package
wget https://github.com/martin-helmich/prometheus-nginxlog-exporter/releases/download/v1.9.2/prometheus-nginxlog-exporter_1.9.2_linux_amd64.rpm
- Install the rpm
yum localinstall prometheus-nginxlog-exporter_1.9.2_linux_amd64.rpm
- Add a dedicated nginx log. This keeps the log the exporter reads easy to parse and independent of the main access log.
nginx.conf:
#add prometheus log_format
log_format prometheus '[$time_local] $request_method "$request" '
'$body_bytes_sent $status $request_time $upstream_response_time';
......
server {
......
# add prometheus access_log
access_log /var/log/nginx/access.log.prometheus prometheus;
......
}
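To see how the fields of this format line up, here is a hypothetical log line written by the "prometheus" log_format above (the request line is made up for illustration). Since the quoted $request spans three space-separated fields, $status lands in field 8:

```shell
# Hypothetical log line produced by the "prometheus" log_format above
line='[10/Oct/2023:13:55:36 +0800] GET "GET /NiuTrans.Translate/run HTTP/1.1" 512 200 0.012 0.010'
# [$time_local] occupies fields 1-2, $request_method field 3, the quoted
# $request fields 4-6, then $body_bytes_sent $status $request_time $upstream_response_time
echo "$line" | awk '{print "status=" $8, "request_time=" $9}'
# → status=200 request_time=0.012
```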
- Write the configuration file
prometheus.hcl
listen {
port = ****
metrics_endpoint = "/metrics"
}
namespace "yun" {
format = "[$time_local] $request_method \"$request\" $body_bytes_sent $status $request_time $upstream_response_time"
source {
files = [
"/var/log/nginx/access.log.prometheus"
]
}
labels {
app = "yun-platform"
environment = "pro"
foo = "bar"
}
# add a server label to distinguish the different backend services.
relabel "server" {
from = "request"
split = 2
separator = "/"
match "(NiuTrans.*)" {
replacement = "$1"
}
}
}
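The relabel block above derives the server label from the request line: with separator "/" and split = 2, the exporter takes the second "/"-separated segment of $request and keeps it only if it matches (NiuTrans.*). The same split can be mimicked with awk (the request line is a made-up example):

```shell
# A made-up request line; split = 2 with separator "/" selects the 2nd segment
request='GET /NiuTrans.Translate/run HTTP/1.1'
echo "$request" | awk -F'/' '{print $2}'
# → NiuTrans.Translate
```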
- Start prometheus-nginxlog-exporter
nohup prometheus-nginxlog-exporter -config-file prometheus.hcl &
Now open the configured address in a browser and the exposed metrics are available:
Prometheus
Installation
- Download a matching package from the official site:
https://prometheus.io/download/#prometheus
- Edit the configuration file
prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["ip:port"]
The ["ip:port"] configured here is the address exposed by prometheus-nginxlog-exporter in the previous step.
- Start
nohup ./prometheus --config.file=prometheus.yml &
Visit the default port 9090 and you can see that Prometheus is running.
You can now query the data from the search box on the home page.
The query syntax is documented on the official site; here I query the error rate over the last 5 minutes.
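For reference, an error-rate query along these lines can be written in PromQL. The metric name below assumes the exporter's default response counter prefixed with the "yun" namespace from the HCL config; verify the exact name on your /metrics page before using it:

```
sum(rate(yun_http_response_count_total{status=~"5.."}[5m]))
/
sum(rate(yun_http_response_count_total[5m]))
```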
Grafana
Installation
For Grafana I simply used the more familiar docker-compose.
With docker and docker-compose ready:
- Create a grafana directory
mkdir grafana
- Edit the configuration file docker-compose.yaml
version: '3.8'
services:
  grafana:
    image: grafana/grafana-enterprise
    container_name: grafana
    restart: unless-stopped
    user: '0'
    ports:
      - '3000:3000'
    volumes:
      - '$PWD/data:/var/lib/grafana'
- Start
docker-compose up -d
Visit port 3000 to reach Grafana. The default username and password are both admin; after logging in you can change them to your own.
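Before a panel can query anything, Grafana needs Prometheus registered as a data source. This can be done in the UI (Configuration → Data sources → Add data source → Prometheus), or declaratively via a provisioning file; a minimal sketch, where the url is a placeholder for your Prometheus address (port 9090 above):

```yaml
# e.g. /etc/grafana/provisioning/datasources/prometheus.yaml (mount it into the container)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://ip:9090
```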
Click the plus icon, choose Dashboard, then Add a new panel to open the panel editor.
Enter the query you need in the Query section,
and configure the alert details in the Alert section.
With that, the alerts start arriving.