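Prometheus is an open-source systems monitoring and alerting toolkit, originally built at SoundCloud.
Features:
- A multi-dimensional data model (time series identified by a metric name and key/value pairs)
- A flexible query language (PromQL)
- No dependency on distributed storage
- Time series are collected with a pull model over HTTP
- Pushing time series is supported through an intermediary gateway
- Monitoring targets are discovered through service discovery or static configuration
- Multiple modes of graphing and dashboarding are supported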
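To make the data model and the query interface more concrete, here is a minimal sketch that builds a label selector and sends it to the HTTP query API. It assumes a Prometheus server at `localhost:9090` and that `curl` is installed; `selector` and `prom_query` are hypothetical helper names used only for illustration.

```shell
# Assumption: a Prometheus server is reachable at this address.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Build a selector such as up{job="prometheus"} from a metric name
# and optional key=value pairs (hypothetical helper).
selector() {
    local metric="$1"; shift
    local labels="" pair
    for pair in "$@"; do
        # Append key="value", comma-separated after the first pair.
        labels="${labels:+${labels},}${pair%%=*}=\"${pair#*=}\""
    done
    if [ -n "$labels" ]; then
        printf '%s{%s}\n' "$metric" "$labels"
    else
        printf '%s\n' "$metric"
    fi
}

# Run an instant query against /api/v1/query; the response is JSON.
prom_query() {
    curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}
```

With a running server, `prom_query "$(selector up job=prometheus)"` should return the current value of the `up` series for the `prometheus` job.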
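Components:
- Prometheus server: the main program, responsible for scraping, storing, aggregating, and querying time series data.
- Alertmanager: handles alerting.
- Pushgateway: receives metrics pushed by clients; the Prometheus server then scrapes them from it at the configured interval.
- *_exporter: ready-made integrations that expose metrics from various existing systems.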
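The Pushgateway flow described above can be sketched as follows. This assumes a Pushgateway running on its default port 9091 and that `curl` is installed; the helper names and the example job and metric names are hypothetical.

```shell
# Assumption: a Pushgateway listens on its default port 9091.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://localhost:9091}"

# Build the push endpoint for a job (the grouping key is the job name only).
push_url() {
    printf '%s/metrics/job/%s\n' "$PUSHGATEWAY_URL" "$1"
}

# Push one sample in the text exposition format: "<name> <value>".
push_metric() {
    local job="$1" name="$2" value="$3"
    printf '%s %s\n' "$name" "$value" |
        curl -s --data-binary @- "$(push_url "$job")"
}
```

For example, `push_metric batch_job records_processed 42` would store the sample in the gateway, and the Prometheus server picks it up on its next scrape of the gateway.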
Architecture: (diagram not included)
Deploying Prometheus
1. Download the package prometheus-2.4.0.linux-amd64.tar.gz
https://github.com/prometheus/prometheus/releases/tag/v2.4.0
2. Unpack it
tar -xvf prometheus-2.4.0.linux-amd64.tar.gz -C /opt/
mv /opt/prometheus-2.4.0.linux-amd64 /opt/prometheus
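Steps 1 and 2 can be scripted. The following is a minimal sketch, assuming `wget` is available and that the release archive follows the usual GitHub download URL pattern; the function names are hypothetical helpers, not part of Prometheus.

```shell
# Version and install prefix; adjust as needed.
VERSION="2.4.0"
PREFIX="/opt"

# Name of the release archive for a given version (linux-amd64 build).
archive_name() {
    printf 'prometheus-%s.linux-amd64.tar.gz\n' "$1"
}

# Download URL for a given version (assumed GitHub release URL pattern).
release_url() {
    printf 'https://github.com/prometheus/prometheus/releases/download/v%s/%s\n' \
        "$1" "$(archive_name "$1")"
}

# Download, unpack into PREFIX, and rename the directory to PREFIX/prometheus.
install_prometheus() {
    local tarball
    tarball="$(archive_name "$VERSION")"
    wget -q "$(release_url "$VERSION")" -O "/tmp/${tarball}" || return 1
    tar -xf "/tmp/${tarball}" -C "$PREFIX" || return 1
    mv "${PREFIX}/${tarball%.tar.gz}" "${PREFIX}/prometheus"
}
```

Running `install_prometheus` as root should leave the binaries in `/opt/prometheus`, which is the layout the later steps rely on.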
3. Configure prometheus.yml (the default configuration)
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
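To monitor something other than Prometheus itself, another entry goes under `scrape_configs`. A minimal sketch that prints such an entry and validates the file with `promtool` (shipped in the same tarball) is shown here; the job name `node` and target `localhost:9100` used in the usage note are assumptions for illustration, and `scrape_job` and `check_config` are hypothetical helpers.

```shell
# Assumption: Prometheus was unpacked to /opt/prometheus as in step 2.
PROM_HOME="${PROM_HOME:-/opt/prometheus}"

# Print a scrape job entry, indented to sit under scrape_configs.
# Usage: scrape_job <job_name> <host:port>
scrape_job() {
    cat <<EOF
  - job_name: '$1'
    static_configs:
      - targets: ['$2']
EOF
}

# Validate a configuration file before (re)starting the server.
check_config() {
    "${PROM_HOME}/promtool" check config "$1"
}
```

Because `scrape_configs` is the last section of the default file, `scrape_job node localhost:9100 >> /opt/prometheus/prometheus.yml` followed by `check_config /opt/prometheus/prometheus.yml` should be enough, provided the indentation matches the existing entries.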
4. Start it in the background with nohup
cd /opt/prometheus
nohup ./prometheus --config.file=prometheus.yml & # makes the service process hard to manage
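5. Start it with a service script (/etc/init.d/prometheus)
Many tutorials write the service startup for systemd on CentOS 7; the script below is a SysV init script instead.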
#!/bin/bash
#
# Comments to support chkconfig
# chkconfig: 2345 98 02
# description: prometheus service script
#
# Source function library.
. /etc/init.d/functions
### Default variables
prog_name="prometheus"
config_file="/opt/${prog_name}/${prog_name}.yml"
prog_path="/opt/${prog_name}/${prog_name}"
data_path="/opt/${prog_name}/data"
pidfile="/var/run/${prog_name}.pid"
prog_logs="/var/log/${prog_name}.log"
# Startup options: listen on localhost:9090 and enable configuration hot reload
options="--web.listen-address=localhost:9090 --config.file=${config_file} --web.enable-lifecycle --storage.tsdb.path=${data_path}"
DESC="Prometheus Server"
# Check if requirements are met
[ -x "${prog_path}" ] || exit 1
RETVAL=0
start(){
    action $"Starting $DESC..." su -s /bin/sh -c "nohup $prog_path $options >> $prog_logs 2>&1 &" 2> /dev/null
    RETVAL=$?
    PID=$(pidof ${prog_path})
    [ ! -z "${PID}" ] && echo ${PID} > ${pidfile}
    echo
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$prog_name
    return $RETVAL
}
stop(){
    echo -n $"Shutting down $prog_name: "
    killproc -p ${pidfile}
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/$prog_name
    return $RETVAL
}
restart() {
    stop
    start
}
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    status)
        status $prog_path
        RETVAL=$?
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart|status}"
        RETVAL=1
        ;;
esac
exit $RETVAL
# Make the script executable
chmod +x /etc/init.d/prometheus
# Start the service automatically at boot
chkconfig prometheus on
6. Verify that the service started correctly
Open http://localhost:9090/ in a browser; the Prometheus graph page should appear.
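The same check can be done from the command line, which is handy in scripts. This is a minimal sketch that polls a URL until it answers successfully; it assumes `curl` is installed, and `wait_until_up` is a hypothetical helper name.

```shell
# Poll a URL until it answers with an HTTP success status, or give up.
# Usage: wait_until_up <url> [attempts] [delay_seconds]
wait_until_up() {
    local url="$1" attempts="${2:-10}" delay="${3:-1}" i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl fail on HTTP errors; the body is discarded.
        if curl -fs -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_until_up http://localhost:9090/-/healthy 30 1 && echo "Prometheus is up"` waits up to roughly 30 seconds for the server's health endpoint to respond.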
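7. Hot-reloading the configuration
There are two ways to reload the configuration:
- kill -HUP pid
- curl -X POST http://localhost:9090/-/reload
Since version 2.0, the HTTP reload endpoint is disabled by default. To enable it, start Prometheus with the flag
--web.enable-lifecycle
Reloading with SIGHUP works without this flag.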
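It is usually safer to validate the configuration before triggering a reload. A minimal sketch, assuming the install path `/opt/prometheus` from the earlier steps, a server started with `--web.enable-lifecycle`, and `curl` installed; `reload_prometheus` is a hypothetical helper name.

```shell
# Assumptions: install directory and listen address from the earlier steps.
PROM_HOME="${PROM_HOME:-/opt/prometheus}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Validate the configuration with promtool, then ask the server to reload it.
reload_prometheus() {
    "${PROM_HOME}/promtool" check config "${PROM_HOME}/prometheus.yml" || {
        echo "config check failed; not reloading" >&2
        return 1
    }
    # Requires the server to run with --web.enable-lifecycle.
    curl -fsS -X POST "${PROM_URL}/-/reload"
}
```

If the lifecycle endpoint is not enabled, the last line can be replaced with `kill -HUP "$(cat /var/run/prometheus.pid)"`, using the PID file written by the init script.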
Deploying Grafana
1. Install the RPM package
https://grafana.com/grafana/download
2. Start the service
service grafana-server start
3. Open http://localhost:3000; the default username and password are admin/admin.
The Grafana configuration will be posted later.