Prometheus is something of a phenomenon in cloud-native monitoring. Usually paired with Grafana, it is the go-to monitoring solution for internet companies today.
1. Installing Prometheus
There are two main ways to install it: raw YAML manifests or the Operator. Starting from YAML gives a better grasp of the details (the Operator ultimately generates YAML manifests anyway). A few things need to be considered:
- access permissions (RBAC)
- configuration files
- storage volumes
RBAC-related configuration:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - watch
      - list
  - nonResourceURLs: ["/metrics"]
    verbs:
      - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: smac
  labels:
    app: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: smac
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io
ConfigMaps for the configuration files:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: smac
  labels:
    app: prometheus
data:
  cpu-usage.rule: |
    # omitted for brevity
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-conf
  namespace: smac
  labels:
    app: prometheus
data:
  prometheus.yml: |-
    # omitted for brevity
Storage volume configuration. Using a StorageClass is recommended; the official docs advise against NFS, which can lose data in extreme cases. The configuration is as follows:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: prometheus-pvc
  namespace: smac
  labels:
    app: prometheus
  annotations:
    volume.beta.kubernetes.io/storage-class: "local"
  finalizers:
    - kubernetes.io/pvc-protection
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
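The PVC above asks for a StorageClass named "local". For reference, here is a minimal sketch of such a class using the no-provisioner local-volume pattern; the PV name, host path, and node name are placeholders, not from the original setup, so adjust them to your cluster:

```yaml
# Sketch only: a "local" StorageClass with a manually created PV.
# kubernetes.io/no-provisioner means PVs are not created on demand.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-local-pv   # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local
  local:
    path: /data/prometheus    # must already exist on the chosen node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1      # hypothetical node name
```

WaitForFirstConsumer delays binding until the pod is scheduled, so the pod always lands on the node that actually holds the volume.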
After that come the usual Deployment and Service:
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: smac
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        runAsUser: 0
      containers:
        - name: prometheus
          image: prom/prometheus:v2.29.1
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - mountPath: /prometheus
              name: prometheus-data-volume
            - name: prometheus-conf-volume # note: do not mount with subPath here; it breaks ConfigMap hot reload
              mountPath: /etc/prometheus
            - name: prometheus-rules-volume
              mountPath: /etc/prometheus/rules
          ports:
            - containerPort: 9090
              protocol: TCP
      volumes:
        - name: prometheus-data-volume
          persistentVolumeClaim:
            claimName: prometheus-pvc # must match the PVC name defined above
        - name: prometheus-conf-volume
          configMap:
            name: prometheus-conf
        - name: prometheus-rules-volume
          configMap:
            name: prometheus-rules
---
# Service
kind: Service
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: prometheus
  name: prometheus-service
  namespace: smac
spec:
  ports:
    - port: 9090
      targetPort: 9090
  selector:
    app: prometheus
  type: NodePort
2. Configuration Hot Reload
Next, let's add a job to Prometheus. Edit prometheus.yml in the ConfigMap and append the following:
scrape_configs:
  ...
  - job_name: "demo-service"
    metrics_path: "/actuator/prometheus"
    static_configs:
      - targets: ["10.233.97.135:8080"]
Hmm, the new job doesn't show up. Does that mean a restart is required? Is there a hot-reload option?
After some searching, the conclusion is:
Prometheus supports hot reloading: start it with the --web.enable-lifecycle flag, then trigger a reload with curl -X POST http://localhost:9090/-/reload
So the configuration is adjusted as follows:
containers:
  - name: prometheus
    args:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--web.enable-lifecycle'
After redeploying with this change, editing the ConfigMap and running the command above does the trick. Still, reloading by hand is tedious. Can it be automated? Another round of searching turned up a neat tool: configmap-reload. Let's wire it in:
containers:
  - name: prometheus
    image: prom/prometheus:v2.29.1
    args:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--web.enable-lifecycle'
    ...
  - name: prometheus-configmap-reloader
    image: 'jimmidyson/configmap-reload:v0.3.0'
    args:
      - '--webhook-url=http://localhost:9090/-/reload'
      - '--volume-dir=/etc/prometheus' # volume-dir must match the volume's mountPath exactly
    volumeMounts:
      # note: do not mount with subPath here; it breaks ConfigMap hot reload
      - name: prometheus-conf-volume
        mountPath: /etc/prometheus
With this in place, a second container named prometheus-configmap-reloader shows up in the pod (a sidecar container, not a separate pod). Now, edit the ConfigMap again and, after a short while (roughly 10s; don't ask how I know), the new scrape config takes effect: 'demo-service' appears under Targets.
Note: a ConfigMap mounted via subPath will not be updated automatically.
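Why subPath breaks hot reload: kubelet publishes a ConfigMap volume by writing the files into a fresh hidden directory and atomically re-pointing a `..data` symlink at it, which a directory watcher such as configmap-reload can observe; a subPath mount binds one file from the old directory and never sees the swap. A minimal Python sketch of that mechanism (an illustration only; the real reloader is written in Go):

```python
import os
import tempfile

# Kubelet publishes a ConfigMap volume by writing the files into a fresh
# timestamped directory and atomically re-pointing the ..data symlink.
# A subPath mount binds one file directly, so it never sees the swap.
root = tempfile.mkdtemp()
v1, v2 = os.path.join(root, "..v1"), os.path.join(root, "..v2")
os.mkdir(v1)
os.mkdir(v2)
with open(os.path.join(v1, "prometheus.yml"), "w") as f:
    f.write("scrape_interval: 15s")
with open(os.path.join(v2, "prometheus.yml"), "w") as f:
    f.write("scrape_interval: 30s")

data = os.path.join(root, "..data")
os.symlink(v1, data)
before = os.path.realpath(data)
os.remove(data)       # the real kubelet uses an atomic rename();
os.symlink(v2, data)  # remove + recreate is close enough for a sketch
after = os.path.realpath(data)

# A watcher on the directory (configmap-reload) sees the symlink target
# change, while a copy of the old file (what subPath binds) stays stale.
print(before != after)  # True
```

Reading through the `..data` symlink always yields the new content, which is exactly what a full-directory mount at /etc/prometheus does.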
3. Service Discovery for Jobs
The job added above pins targets to a fixed ip:port, which is clearly impractical in Kubernetes; the targets have to be discovered dynamically. Fortunately, Prometheus already supports this: it polls pod information through the apiserver. With kubernetes_sd_configs, service discovery is available for all kinds of resources; below is an example configuration for pods:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
This is the official example. For the full list of meta labels and what they mean, see the official docs: Prometheus#Configuration#kubernetes_sd_config
My configuration in practice:
scrape_configs:
  - job_name: "demo-service"
    metrics_path: "/actuator/prometheus"
    static_configs:
      - targets: ["demo-service:8080"]
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: demo-service
        action: keep
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: replace
        target_label: application
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
The effect of this configuration: among all pods, keep only those labeled app=demo-service, substitute each pod's address and port for the targets above, and attach an application=demo-service label to the scraped metrics.
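The address rewrite relies on Prometheus joining the source_labels with ';' before applying the (fully anchored) regex. A small Python simulation of that relabel step, where Python's \1:\2 stands in for Prometheus's $1:$2 and the sample values are purely illustrative:

```python
import re

# The relabel rule joins source_labels with ';' and applies an anchored
# regex; fullmatch() mirrors Prometheus's anchoring semantics.
pattern = re.compile(r"([^:]+)(?::\d+)?;(\d+)")

def rewrite_address(address, port_annotation):
    """Simulate the __address__ relabel: keep the host, take the port
    from the prometheus.io/port annotation."""
    m = pattern.fullmatch(f"{address};{port_annotation}")
    return m.expand(r"\1:\2") if m else address

print(rewrite_address("10.233.97.135:8080", "8080"))  # 10.233.97.135:8080
print(rewrite_address("10.233.97.135", "9090"))       # 10.233.97.135:9090
```

The optional (?::\d+)? group swallows any port already present in __address__, so the annotation's port always wins.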
That covers the essentials for everyday use. Beyond this lie the advanced topics: high availability, external storage, hands-on PromQL, and more. Stay tuned!