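1 Collecting node metrics with Prometheus federation
1.1 Deploying the Prometheus server and the federation nodes (same procedure for both)
Prometheus server: 172.16.111.118
Prometheus federation node 1: 172.16.111.210
Prometheus federation node 2: 172.16.111.49
1.1.1 Download and install
Download page: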
https://github.com/prometheus/prometheus/releases
cd /usr/local/src
wget https://github.com/prometheus/prometheus/releases/download/v3.0.1/prometheus-3.0.1.linux-amd64.tar.gz
tar -xvf prometheus-3.0.1.linux-amd64.tar.gz
mkdir -p /opt/soft
mv prometheus-3.0.1.linux-amd64 /opt/soft/prometheus
1.1.2 Configure the Prometheus systemd service
cat > /usr/lib/systemd/system/prometheus.service <<EOF
[Unit]
Description="prometheus"
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
ExecStart=/opt/soft/prometheus/prometheus --config.file=/opt/soft/prometheus/prometheus.yml --storage.tsdb.path=/opt/soft/prometheus/data --web.enable-lifecycle --web.enable-remote-write-receiver --query.lookback-delta=2m --web.enable-admin-api
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=prometheus
[Install]
WantedBy=multi-user.target
EOF
1.1.3 Start the service
systemctl daemon-reload
systemctl enable prometheus.service
systemctl start prometheus.service
1.2 Deploying node_exporter
Target hosts: 172.16.111.210, 172.16.111.49
1.2.1 Download and install
Download the binary package:
Download page: https://github.com/prometheus/node_exporter/releases
mkdir /opt/soft -p
cd /usr/local/src
wget https://github.com/prometheus/node_exporter/releases/download/v1.4.0/node_exporter-1.4.0.linux-amd64.tar.gz
tar -xvf node_exporter-1.4.0.linux-amd64.tar.gz
mv node_exporter-1.4.0.linux-amd64 /opt/soft/node_exporter
1.2.2 Configure the systemd service
cat > /usr/lib/systemd/system/node_exporter.service << EOF
[Unit]
Description=node_exporter
After=network.target
[Service]
Type=simple
User=root
ExecStart=/opt/soft/node_exporter/node_exporter --web.listen-address=:9100
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
1.2.3 Start the node_exporter service
systemctl daemon-reload
systemctl enable node_exporter.service
systemctl start node_exporter.service
1.2.4 Verify that the service is running
systemctl status node_exporter.service
netstat -ntlp|grep 9100
1.2.5 View the node_exporter metrics
URL: http://172.16.111.210:9100/metrics
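1.2.6 Common metrics
node_boot_time_seconds Unix timestamp of the last system boot (uptime is the current time minus this value)
node_cpu_seconds_total CPU time consumed, in seconds, per CPU and mode
node_disk* disk I/O
node_filesystem* filesystem usage
node_load1 1-minute system load average
node_memory* memory usage
node_network* network traffic
go_* Go runtime metrics of node_exporter itself
process_* metrics about the node_exporter process itself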
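To illustrate how the boot-time metric is meant to be read, here is a minimal sketch that derives uptime from the /metrics text output. The function name uptime_seconds is made up for this example, and the current time is passed in as an argument so the result is reproducible.

```shell
#!/usr/bin/env bash
# uptime_seconds NOW
# Reads node_exporter text exposition on stdin and prints the uptime in seconds,
# computed as NOW (Unix epoch seconds) minus node_boot_time_seconds.
# The function name is hypothetical; it is not part of node_exporter.
uptime_seconds() {
  local now=$1
  awk -v now="$now" '$1 == "node_boot_time_seconds" { printf "%d\n", now - $2 }'
}

# Example against a live exporter (address taken from section 1.2.5):
#   curl -s http://172.16.111.210:9100/metrics | uptime_seconds "$(date +%s)"
```

The same calculation in PromQL would be time() - node_boot_time_seconds.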
1.3 Scrape configuration on the federation nodes
Federation node 1 monitors node1
vim /opt/soft/prometheus/prometheus.yml
........
  - job_name: "上海城建供应链金融综合服务平台"
    static_configs:
      - targets: ["172.16.111.210:9100","172.16.111.118:9100"]
        labels:
          zone: "sz"
          project: "shcj"
Federation node 2 monitors node2
vim /opt/soft/prometheus/prometheus.yml
..............
  - job_name: "瑞轩供应链智汇审单"
    static_configs:
      - targets: ["172.16.111.49:9100"]
        labels:
          zone: "sz"
          project: "rxaidoc"
Verify
1.4 Scraping the federation nodes from the server
Scrape configuration
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "上海城建供应链金融综合服务平台"
    scrape_interval: 10s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
        - '{__name__=~"node.*"}'
    static_configs:
      - targets:
        - '172.16.111.210:9090'
  - job_name: "瑞轩供应链智汇审单"
    scrape_interval: 10s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
        - '{__name__=~"node.*"}'
    static_configs:
      - targets:
        - '172.16.111.49:9090'
  ## Prometheus in the k8s cluster
  - job_name: "晶科能源生产k8s"
    scrape_interval: 10s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
        - '{__name__=~"node.*"}'
    static_configs:
      - targets:
        - '10.0.0.11:9090'
Verify target status
Query the metrics data
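The federation scrape can also be reproduced by hand, which helps when a target shows as up but returns no series. The sketch below issues the same kind of /federate request with match[] selectors through curl. The function name federate_sample is hypothetical; the addresses and selectors are the ones used in the configuration above.

```shell
#!/usr/bin/env bash
# federate_sample TARGET SELECTOR...
# Fetches the series matching each selector from TARGET's /federate endpoint,
# i.e. the request a federating server sends with its match[] params.
# The function name is hypothetical.
federate_sample() {
  local target=$1; shift
  local args=() selector
  for selector in "$@"; do
    # curl URL-encodes the selector, so braces and quotes can be passed verbatim.
    args+=(--data-urlencode "match[]=${selector}")
  done
  curl -fsS -G "http://${target}/federate" "${args[@]}"
}

# Example:
#   federate_sample 172.16.111.210:9090 '{__name__=~"node.*"}' | head
```

An empty response usually means that no series on the federation node match the selectors.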
1.5 Deploying Grafana from the binary tarball
Download: https://grafana.com/grafana/download
Mirror in mainland China: https://mirrors.tuna.tsinghua.edu.cn/grafana/
Installation guide: https://grafana.com/docs/grafana/latest/setup-grafana/installation/
1.5.1 Download and install
cd /usr/local/src
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-11.2.0.linux-amd64.tar.gz
tar xvf grafana-enterprise-11.2.0.linux-amd64.tar.gz
mv grafana-v11.2.0 /opt/soft/grafana
1.5.2 Start the service
cd /opt/soft/grafana
nohup ./bin/grafana-server &> output.log &
echo 'cd /opt/soft/grafana && nohup ./bin/grafana-server > output.log 2>&1 &' >> /etc/rc.local
1.5.3 Log in to the web UI
URL: http://172.16.111.118:3000
Default username/password: admin/admin
Add a data source
Select Prometheus
Set the data source name and the URL of the Prometheus server
1.5.4 Import dashboards
11074
8919
2 Prometheus local storage and single-node VictoriaMetrics remote storage
2.1 Prometheus local storage
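Prometheus stores time series very efficiently. At roughly 3.5 bytes per sample, one million time series scraped every 30 seconds and retained for 60 days need about 600 GB of disk (retention seconds × samples per second × bytes per sample). The often-quoted figure of 200-odd GB for this workload corresponds to the 1-2 bytes per sample that the Prometheus storage documentation gives for the current TSDB.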
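The sizing figure can be checked with the usual rule of thumb: disk ≈ series × samples per series over the retention period × bytes per sample. A minimal sketch, assuming a constant scrape interval and no churn. The function name estimate_tsdb_bytes is hypothetical, and bytes per sample is passed in tenths because bash only does integer arithmetic.

```shell
#!/usr/bin/env bash
# estimate_tsdb_bytes SERIES INTERVAL_SECONDS RETENTION_DAYS BYTES_PER_SAMPLE_X10
# Prints the estimated on-disk size in bytes.
# BYTES_PER_SAMPLE_X10 is the per-sample size in tenths of a byte (35 = 3.5 bytes).
estimate_tsdb_bytes() {
  local series=$1 interval=$2 days=$3 bytes_x10=$4
  local samples_per_series=$(( days * 86400 / interval ))
  echo $(( series * samples_per_series * bytes_x10 / 10 ))
}

estimate_tsdb_bytes 1000000 30 60 35   # 3.5 bytes/sample -> 604800000000 (about 605 GB)
estimate_tsdb_bytes 1000000 30 60 13   # 1.3 bytes/sample -> 224640000000 (about 225 GB)
```

The result is only a planning estimate; the real bytes-per-sample value depends on how compressible the data is.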
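2.1.1 Local storage overview
By default, Prometheus stores scraped data in its local TSDB. The default path is the data directory under the working directory of the Prometheus process (set explicitly here with --storage.tsdb.path). Incoming samples are first appended to the write-ahead log (WAL) and held in memory; after 2 hours the in-memory data is persisted as a new block. Newly scraped samples keep going to memory and are persisted as another block 2 hours later, and so on.
2.1.2 Blocks
Each block is a subdirectory of the data directory whose name starts with 01.
2.1.3 Block characteristics
Prometheus compacts and merges historical blocks and deletes expired ones, so the number of blocks decreases as compaction proceeds. Compaction involves three things: it runs periodically, it merges small blocks into larger ones, and it removes expired blocks.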
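Each block consists of four parts:
~# tree /opt/soft/prometheus/data/01JPJ3C67DY3K9HFR4NR5HWFPQ
/opt/soft/prometheus/data/01JPJ3C67DY3K9HFR4NR5HWFPQ
├── chunks
│   └── 000001 # chunk segment files, up to 512MB each; larger data is split across several files
├── index # index file; its lookup tables are used to locate time series data
├── meta.json # block metadata: sample count, time range of the data, compaction history
└── tombstones # deletion markers: records what has been deleted or marked for deletion so those samples are excluded when the block is queried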
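2.1.4 Local storage flags
--config.file="prometheus.yml" # configuration file
--web.listen-address="0.0.0.0:9090" # listen address
--storage.tsdb.path="data/" # data storage directory
--storage.tsdb.retention.size= # maximum total size of blocks to retain; units: B, KB, MB, GB, TB, PB, EB; default 0 (no size limit)
--storage.tsdb.retention.time= # how long to retain data, default 15 days
--query.timeout=2m # maximum query timeout
--query.max-concurrency=20 # maximum number of concurrent queries
--web.read-timeout=5m # maximum time before a request read times out and idle connections are closed
--web.max-connections=512 # maximum number of simultaneous connections
--web.enable-lifecycle # enable shutdown and configuration reload over HTTP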
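With --web.enable-lifecycle set, a configuration change can be applied without restarting the service. The sketch below validates the file with promtool (shipped in the same tarball as prometheus) and then posts to the /-/reload endpoint. The function name, the PROMTOOL override variable and the default arguments are assumptions based on the install paths used earlier.

```shell
#!/usr/bin/env bash
# reload_prometheus [CONFIG] [ADDRESS]
# Validates CONFIG with promtool, then asks the running server at ADDRESS to
# reload its configuration through the lifecycle endpoint.
# Function name, PROMTOOL variable and defaults are hypothetical conveniences.
reload_prometheus() {
  local config=${1:-/opt/soft/prometheus/prometheus.yml}
  local address=${2:-localhost:9090}
  local promtool=${PROMTOOL:-/opt/soft/prometheus/promtool}
  # Refuse to reload when the configuration does not validate.
  "$promtool" check config "$config" || return 1
  curl -fsS -X POST "http://${address}/-/reload"
}

# Example:
#   reload_prometheus
```

Checking first means an invalid configuration is caught before the reload request is sent.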
2.2 Single-node VictoriaMetrics remote storage
2.2.1 Download and install
https://github.com/VictoriaMetrics/VictoriaMetrics/releases
mkdir /opt/soft/victoriaMetrics -p
cd /usr/local/src
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.93.1/victoria-metrics-linux-amd64-v1.93.1.tar.gz
tar xvf victoria-metrics-linux-amd64-v1.93.1.tar.gz
mv victoria-metrics-prod /opt/soft/victoriaMetrics/
2.2.2 Configure the systemd service
cat > /usr/lib/systemd/system/victoria-metrics-prod.service << EOF
[Unit]
Description=For Victoria-metrics-prod Service
After=network.target
[Service]
ExecStart=/opt/soft/victoriaMetrics/victoria-metrics-prod -httpListenAddr=0.0.0.0:8428 -storageDataPath=/opt/soft/victoriaMetrics/data -retentionPeriod=3
[Install]
WantedBy=multi-user.target
EOF
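Start the service
systemctl daemon-reload
systemctl enable victoria-metrics-prod.service
systemctl start victoria-metrics-prod.service
systemctl status victoria-metrics-prod.service
Flag reference
-httpListenAddr=0.0.0.0:8428 # listen address and port
-storageDataPath # directory in which VictoriaMetrics stores all its data; defaults to victoria-metrics-data under the directory from which the process is started
-retentionPeriod # retention period for stored data; older data is deleted automatically. The default is 1 month, a bare number is counted in months, and the supported unit suffixes include h (hour), d (day), w (week) and y (year)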
2.2.3 Open the web UI
URL: http://172.16.111.118:8428
2.2.4 Configure Prometheus
# cat /opt/soft/prometheus/prometheus.yml
.......
# Remote storage on single-node VictoriaMetrics
remote_write:
  - url: http://172.16.111.118:8428/api/v1/write
2.2.5 Verify the data in VictoriaMetrics
Open the web UI
Query node_load1
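The same check can be run from the command line, because VictoriaMetrics serves the Prometheus-compatible query API. A minimal sketch, assuming the single-node address configured above; the function name vm_instant_query is hypothetical.

```shell
#!/usr/bin/env bash
# vm_instant_query BASE_URL PROMQL
# Runs an instant query against a Prometheus-compatible /api/v1/query endpoint
# and prints the JSON response. The function name is hypothetical.
vm_instant_query() {
  local base_url=$1 query=$2
  curl -fsS -G "${base_url}/api/v1/query" --data-urlencode "query=${query}"
}

# Single-node example:
#   vm_instant_query http://172.16.111.118:8428 node_load1
```

A response with "status":"success" and a non-empty result list indicates that the remote_write data has arrived.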
2.2.6 Configure the Grafana data source
Point the data source URL at the VictoriaMetrics address
2.2.7 Import the Grafana dashboard
Import dashboard 8919
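3 Prometheus remote storage on a VictoriaMetrics cluster
3.1 Architecture
3.2 Components
3.2.1 vminsert
Write component. vminsert accepts incoming data and spreads it across the backend vmstorage nodes according to a consistent hash of the metric name and all of its labels. Default port: 8480.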
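The idea of routing each series by a hash of its name and labels can be sketched as follows. This is a deliberately simplified stand-in (CRC via cksum, modulo the node count), not the hashing that vminsert implements; the function name pick_storage_node is hypothetical, and the node addresses are the vmstorage ingestion addresses used later in this section.

```shell
#!/usr/bin/env bash
# pick_storage_node SERIES_KEY NODE...
# Maps a series key (metric name plus labels) to one of the given nodes by
# hashing the key with cksum and taking the result modulo the node count.
# Simplified illustration only; vminsert's real algorithm differs.
pick_storage_node() {
  local key=$1; shift
  local nodes=("$@")
  local crc
  crc=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
  echo "${nodes[$(( crc % ${#nodes[@]} ))]}"
}

nodes=(172.16.111.118:8400 172.16.111.210:8400 172.16.111.49:8400)
# The same key always maps to the same node; different keys spread out.
pick_storage_node 'node_load1{instance="172.16.111.210:9100"}' "${nodes[@]}"
pick_storage_node 'node_load1{instance="172.16.111.49:9100"}' "${nodes[@]}"
```

Plain modulo hashing remaps most keys when the node count changes, which is the problem consistent hashing is designed to limit.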
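3.2.2 vmstorage
Stores the raw data and returns the data matching the given label filters for the given time range. Default port: 8482.
3.2.3 vmselect
Query component (read path). It connects to the vmstorage nodes. Default port: 8481.
3.2.4 Other optional components
vmagent
A small but powerful agent that collects metrics from various sources such as node_exporter and stores them in VictoriaMetrics or in any other Prometheus-compatible storage system that supports the remote write protocol. It is intended as a replacement for the Prometheus server.
vmalert
Takes over alerting from the Prometheus server: it uses VictoriaMetrics as the data source, evaluates Prometheus-compatible alerting rules to decide whether the data is abnormal, and sends the resulting notifications to Alertmanager.
vmgateway
A proxy gateway for reading and writing VictoriaMetrics data that provides rate limiting and access control. It is currently an enterprise component.
vmctl
The VictoriaMetrics command-line tool, currently used mainly to migrate data from sources such as Prometheus and OpenTSDB into VictoriaMetrics.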
3.3 Download and install
Cluster hosts
vm1: 172.16.111.118
vm2: 172.16.111.210
vm3: 172.16.111.49
https://github.com/VictoriaMetrics/VictoriaMetrics/releases
https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.93.1/victoria-metrics-linux-amd64-v1.93.1-cluster.tar.gz
mkdir /opt/soft/vmstorage -p
cd /usr/local/src
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.93.1/victoria-metrics-linux-amd64-v1.93.1-cluster.tar.gz
tar xvf victoria-metrics-linux-amd64-v1.93.1-cluster.tar.gz
mv vm* /opt/soft/vmstorage/
3.4 Configure the systemd services
3.4.1 vmstorage-prod
Persists the data. Ports: HTTP API and monitoring 8482, data ingestion 8400, data reads 8401.
cat > /usr/lib/systemd/system/vmstorage.service << EOF
[Unit]
Description=Vmstorage Server
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/tmp
ExecStart=/opt/soft/vmstorage/vmstorage-prod -loggerTimezone=Asia/Shanghai -storageDataPath=/opt/soft/victoriaMetrics-cluster/vmstorage-data -httpListenAddr=:8482 -vminsertAddr=:8400 -vmselectAddr=:8401
[Install]
WantedBy=multi-user.target
EOF
Start the service
systemctl daemon-reload
systemctl enable vmstorage.service
systemctl start vmstorage.service
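Main flags
-httpListenAddr string
Address to listen on for HTTP connections (default ":8482")
-vminsertAddr string
TCP address that accepts connections from vminsert services (default ":8400")
-vmselectAddr string
TCP address that accepts connections from vmselect services (default ":8401")
3.4.2 vminsert-prod
Accepts write requests from outside the cluster; default port 8480
cat > /usr/lib/systemd/system/vminsert.service << EOF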
[Unit]
Description=Vminsert Server
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/tmp
ExecStart=/opt/soft/vmstorage/vminsert-prod -httpListenAddr=:8480 -storageNode=172.16.111.118:8400,172.16.111.210:8400,172.16.111.49:8400
[Install]
WantedBy=multi-user.target
EOF
Start the service
systemctl daemon-reload
systemctl enable vminsert.service
systemctl start vminsert.service
3.4.3 vmselect-prod
Accepts read requests from outside the cluster; default port 8481
cat > /usr/lib/systemd/system/vmselect.service << EOF
[Unit]
Description=Vmselect Server
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/tmp
ExecStart=/opt/soft/vmstorage/vmselect-prod -httpListenAddr=:8481 -storageNode=172.16.111.118:8401,172.16.111.210:8401,172.16.111.49:8401
[Install]
WantedBy=multi-user.target
EOF
Start the service
systemctl daemon-reload
systemctl enable vmselect.service
systemctl start vmselect.service
Verify the services
Run on all three hosts vm1, vm2 and vm3:
# netstat -ntlp |egrep "8482|8400|8401|8480|8481"
tcp 0 0 0.0.0.0:8400 0.0.0.0:* LISTEN 43738/vmstorage-pro
tcp 0 0 0.0.0.0:8401 0.0.0.0:* LISTEN 43738/vmstorage-pro
tcp 0 0 0.0.0.0:8480 0.0.0.0:* LISTEN 46865/vminsert-prod
tcp 0 0 0.0.0.0:8481 0.0.0.0:* LISTEN 45340/vmselect-prod
tcp 0 0 0.0.0.0:8482 0.0.0.0:* LISTEN 43738/vmstorage-pro
The endpoints can also be checked in a browser:
http://172.16.111.118:8480/metrics
http://172.16.111.118:8481/metrics
http://172.16.111.118:8482/metrics
http://172.16.111.210:8480/metrics
http://172.16.111.210:8481/metrics
http://172.16.111.210:8482/metrics
http://172.16.111.49:8480/metrics
http://172.16.111.49:8481/metrics
http://172.16.111.49:8482/metrics
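Instead of opening the nine URLs by hand, the check can be scripted. A minimal sketch, assuming every host runs all three components on the default ports listed above; the function name check_vm_cluster and the 5-second timeout are choices made for this example.

```shell
#!/usr/bin/env bash
# check_vm_cluster HOST...
# Probes the /metrics endpoint of vminsert (8480), vmselect (8481) and
# vmstorage (8482) on every host and prints one OK/FAIL line per endpoint.
# Returns non-zero if any endpoint fails. The function name is hypothetical.
check_vm_cluster() {
  local host port status=0
  for host in "$@"; do
    for port in 8480 8481 8482; do
      if curl -fsS -o /dev/null --max-time 5 "http://${host}:${port}/metrics"; then
        echo "OK   ${host}:${port}"
      else
        echo "FAIL ${host}:${port}"
        status=1
      fi
    done
  done
  return "$status"
}

# Example:
#   check_vm_cluster 172.16.111.118 172.16.111.210 172.16.111.49
```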
3.6 Configure Prometheus
# cat /opt/soft/prometheus/prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Remote storage on single-node VictoriaMetrics
#remote_write:
#  - url: http://172.16.111.118:8428/api/v1/write
## Remote storage on the VictoriaMetrics cluster
## Note: Prometheus sends every sample to each URL listed under remote_write, so with
## three vminsert URLs the same data is written three times. A single URL, or a VIP
## in front of the vminsert nodes, avoids this.
remote_write:
  - url: http://172.16.111.118:8480/insert/0/prometheus
  - url: http://172.16.111.210:8480/insert/0/prometheus
  - url: http://172.16.111.49:8480/insert/0/prometheus
....
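3.7 Configure the Grafana data source
Set the cluster query URL:
http://172.16.111.118:8481/select/0/prometheus (a VIP can be placed in front of the vmselect nodes for high availability)
3.8 Import the Grafana dashboard
Import dashboard 8919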
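3.9 Enable data replication
https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#replication-and-data-safety
By default, vminsert distributes the data across the vmstorage nodes by hash, so each sample is stored on only one node. Passing -replicationFactor=N to vminsert makes it store N copies of every sample on N distinct vmstorage nodes, so the data stays available for queries while up to N-1 vmstorage nodes are down.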
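When choosing N, the replication section of the documentation linked above states, as far as I recall, that the cluster needs at least 2*N-1 vmstorage nodes to keep the replication factor for newly ingested data while N-1 nodes are unavailable. The sketch below only encodes that arithmetic; the function name min_vmstorage_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# min_vmstorage_nodes REPLICATION_FACTOR
# Prints the minimum number of vmstorage nodes (2*N-1) for replication factor N,
# following the rule quoted from the VictoriaMetrics cluster documentation.
min_vmstorage_nodes() {
  local n=$1
  echo $(( 2 * n - 1 ))
}

min_vmstorage_nodes 2   # prints 3
```

By that rule, the three-node cluster built here can run with -replicationFactor=2. The same documentation section also describes a deduplication setting for vmselect when replication is enabled, so it is worth reading before turning replication on.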