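Plenty of experts have already written about Prometheus, so I won't repeat them. I only started learning it recently, so here I take a different angle and briefly share how I troubleshot one problem.
Experts, feel free to skip this.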
issue 1
Prometheus can scrape the data, so why doesn't the Grafana dashboard show it? (Spring Boot monitoring)
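Problem description:
Our company needs to monitor 5 services (see the Prometheus targets page):
http://ip:9090/targets
But the dashboard only shows 2 of them.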
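Troubleshooting process
- What we can confirm: at least 2 services work, and every Spring Boot service in the company uses the same io.micrometer setup, so this is not a dependency problem.
- All five services are being scraped, so first look at the actual scraped data.
Open the endpoint directly in a browser
(http://ip:port/actuator/prometheus)
Pick one service that works and one that doesn't,
and open both outputs in a text editor.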
Working service:
# TYPE jvm_gc_max_data_size_bytes gauge
# HELP jvm_gc_max_data_size_bytes Max size of long-lived heap memory pool
jvm_gc_max_data_size_bytes 1.71966464E9
# TYPE jvm_classes_unloaded_classes counter
# HELP jvm_classes_unloaded_classes The total number of classes unloaded since the Java virtual machine has started execution
jvm_classes_unloaded_classes_total 200.0
# TYPE jvm_buffer_count_buffers gauge
# HELP jvm_buffer_count_buffers An estimate of the number of buffers in the pool
jvm_buffer_count_buffers{id="direct"} 12.0
jvm_buffer_count_buffers{id="mapped"} 0.0
# TYPE log4j2_events counter
# HELP log4j2_events Number of fatal level log events
log4j2_events_total{level="warn"} 686.0
log4j2_events_total{level="debug"} 0.0
log4j2_events_total{level="error"} 59010.0
log4j2_events_total{level="trace"} 0.0
log4j2_events_total{level="fatal"} 0.0
log4j2_events_total{level="info"} 3.2804589E7
# TYPE jvm_memory_committed_bytes gauge
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
jvm_memory_committed_bytes{area="heap",id="G1 Survivor Space"} 7340032.0
jvm_memory_committed_bytes{area="heap",id="G1 Old Gen"} 8.76609536E8
jvm_memory_committed_bytes{area="nonheap",id="Metaspace"} 1.55648E8
jvm_memory_committed_bytes{area="heap",id="G1 Eden Space"} 8.35715072E8
jvm_memory_committed_bytes{area="nonheap",id="Code Cache"} 1.26222336E8
jvm_memory_committed_bytes{area="nonheap",id="Compressed Class Space"} 1.933312E7
# TYPE system_cpu_count gauge
# HELP system_cpu_count The number of processors available to the Java virtual machine
system_cpu_count 48.0
Failing service:
# TYPE grpc_client_requests_sent_messages counter
# HELP grpc_client_requests_sent_messages The total number of requests sent
grpc_client_requests_sent_messages_total{method="",methodType="",service="com"} 6793741.0
# TYPE zipkin_reporter_spans counter
# HELP zipkin_reporter_spans Spans reported
zipkin_reporter_spans_total 312568.0
# TYPE zipkin_reporter_messages_bytes counter
# HELP zipkin_reporter_messages_bytes Total bytes of messages reported
zipkin_reporter_messages_bytes_total 2.07240049E8
# TYPE grpc_server_requests_received_messages counter
# HELP grpc_server_requests_received_messages The total number of requests received
grpc_server_requests_received_messages_total{method="",methodType="SERVER_STREAMING",service=""} 0.0
grpc_server_requests_received_messages_total{method="add",methodType="UNARY",service=""} 0.0
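Note: anything company-specific has been removed from the output.
The failing service has no jvm metrics at all. I don't yet understand how the Grafana dashboard turns the scraped data into panels; I'll write a detailed analysis once I have learned more.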
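Comparing the two dumps by eye gets tedious once they are long. Here is a rough sketch of how the comparison could be automated; it is not what I actually ran. It assumes you have saved the /actuator/prometheus output of both services as text, and the function names are my own invention. It collects the metric names from the sample lines and reports which name prefixes the failing service lacks.

```python
import re
from collections import defaultdict

# Matches the metric name at the start of a sample line, e.g.
# 'jvm_buffer_count_buffers{id="direct"} 12.0' -> 'jvm_buffer_count_buffers'
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text):
    """Return the set of metric names found in a Prometheus text-format dump."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# TYPE" / "# HELP" comments
        match = _SAMPLE_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def group_by_prefix(names):
    """Group metric names by the part before the first underscore."""
    groups = defaultdict(set)
    for name in names:
        groups[name.split("_", 1)[0]].add(name)
    return dict(groups)


def missing_prefixes(good_text, bad_text):
    """Prefixes present in the working dump but absent from the failing one."""
    good = group_by_prefix(metric_names(good_text))
    bad = group_by_prefix(metric_names(bad_text))
    return sorted(set(good) - set(bad))


if __name__ == "__main__":
    # Shortened samples taken from the two dumps above.
    good_sample = """\
# TYPE jvm_gc_max_data_size_bytes gauge
jvm_gc_max_data_size_bytes 1.71966464E9
log4j2_events_total{level="warn"} 686.0
system_cpu_count 48.0
"""
    bad_sample = """\
# TYPE zipkin_reporter_spans counter
zipkin_reporter_spans_total 312568.0
grpc_server_requests_received_messages_total{method="add",methodType="UNARY",service=""} 0.0
"""
    print(missing_prefixes(good_sample, bad_sample))  # ['jvm', 'log4j2', 'system']
```

Grouping by prefix is only a heuristic, but it makes a whole missing family such as jvm stand out immediately.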
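- Check jvm_memory_used_bytes
URL:
http://ip:9090/graph
Enter jvm_memory_used_bytes
The jvm series that come back belong to only two services, which confirms that the other three have no jvm_memory_used_bytes data. From this we can infer:
- Spring Boot does expose data
- the data is missing jvm_memory_used_bytes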
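The same check can also be done without the graph page, through the Prometheus HTTP API (/api/v1/query). Below is a minimal sketch under a few assumptions: the instance names in the demo are made up, the helper names are mine, and the demo uses a hand-written sample response in the API's JSON shape rather than a live server.

```python
import json
import urllib.parse
import urllib.request


def fetch_instant_query(base_url, query):
    """Run an instant query against Prometheus and return the raw JSON text."""
    url = base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def instances_with_metric(query_response_json):
    """Return the sorted 'instance' labels that have at least one series."""
    payload = json.loads(query_response_json)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %s" % payload.get("error"))
    result = payload["data"]["result"]
    return sorted({series["metric"].get("instance", "") for series in result})


def targets_without_metric(expected_instances, query_response_json):
    """Expected targets that returned no series for the queried metric."""
    present = set(instances_with_metric(query_response_json))
    return sorted(set(expected_instances) - present)


if __name__ == "__main__":
    # Hypothetical instance names; replace them with your own targets.
    expected = ["svc-a:8080", "svc-b:8080", "svc-c:8080", "svc-d:8080", "svc-e:8080"]
    # Hand-written sample in the shape of an instant-query response.
    sample = json.dumps({
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"__name__": "jvm_memory_used_bytes", "area": "heap",
                            "instance": "svc-a:8080"}, "value": [0, "1.0"]},
                {"metric": {"__name__": "jvm_memory_used_bytes", "area": "nonheap",
                            "instance": "svc-a:8080"}, "value": [0, "2.0"]},
                {"metric": {"__name__": "jvm_memory_used_bytes", "area": "heap",
                            "instance": "svc-b:8080"}, "value": [0, "3.0"]},
            ],
        },
    })
    print(targets_without_metric(expected, sample))
    # ['svc-c:8080', 'svc-d:8080', 'svc-e:8080']
```

Against a real server you would pass the output of fetch_instant_query("http://ip:9090", "jvm_memory_used_bytes") instead of the sample.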
- Check the Spring Boot code, starting with the pom for clues
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
This part is fine. Next, the configuration:
management:
  # https://docs.spring.io/spring-boot/docs/current/reference/html/actuator.html
  endpoints:
    web:
      exposure:
        include: '*'
  endpoint:
    info:
      enabled: true
    metrics:
      enabled: true
    prometheus:
      enabled: true
    health:
      show-details: always
That looks fine too.
Reading through the whole application.yml, the only recent change was adding grpc
grpc:
  server:
    port:
    security:
      enabled: false
  client:
http://ip:port/actuator/metrics on the working service:
{"names":["http.server.requests","jvm.buffer.count","jvm.buffer.memory.used","jvm.buffer.total.capacity","jvm.classes.loaded","jvm.classes.unloaded","jvm.gc.live.data.size","jvm.gc.max.data.size","jvm.gc.memory.allocated","jvm.gc.memory.promoted","jvm.gc.pause","jvm.memory.committed","jvm.memory.max","jvm.memory.used","jvm.threads.daemon","jvm.threads.live","jvm.threads.peak","jvm.threads.states","log4j2.events","process.cpu.usage","process.files.max","process.files.open","process.start.time","process.uptime","spring.data.repository.invocations","system.cpu.count","system.cpu.usage","system.load.average.1m","tomcat.sessions.active.current","tomcat.sessions.active.max","tomcat.sessions.alive.max","tomcat.sessions.created","tomcat.sessions.expired","tomcat.sessions.rejected"]}
http://ip:port/actuator/metrics on the failing service:
{"names":["grpc.client.processing.duration","grpc.client.requests.sent","grpc.client.responses.received","grpc.server.processing.duration","grpc.server.requests.received","grpc.server.responses.sent","http.server.requests","spring.data.repository.invocations","tomcat.sessions.active.current","tomcat.sessions.active.max","tomcat.sessions.alive.max","tomcat.sessions.created","tomcat.sessions.expired","tomcat.sessions.rejected","zipkin.reporter.messages","zipkin.reporter.messages.total","zipkin.reporter.queue.bytes","zipkin.reporter.queue.spans","zipkin.reporter.spans","zipkin.reporter.spans.dropped","zipkin.reporter.spans.total"]}
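My impression is that the data Spring Boot exposes is being affected by grpc. The exact cause is still under investigation.
I will update this post once I understand it.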
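The two /actuator/metrics responses can be compared the same way as the raw dumps. This is a small sketch, again with made-up function names and shortened sample lists, that assumes the responses have been saved as JSON text. It reduces each list to the first dot-separated segment of every name and shows which groups exist on only one side.

```python
import json


def prefixes(metrics_json):
    """First dot-separated segment of every name in an /actuator/metrics response."""
    names = json.loads(metrics_json)["names"]
    return {name.split(".", 1)[0] for name in names}


def prefix_diff(good_json, bad_json):
    """Metric groups that appear on only one of the two services."""
    good, bad = prefixes(good_json), prefixes(bad_json)
    return {
        "only_in_good": sorted(good - bad),
        "only_in_bad": sorted(bad - good),
    }


if __name__ == "__main__":
    # Shortened versions of the two responses above.
    good_sample = json.dumps({"names": [
        "http.server.requests", "jvm.memory.used", "log4j2.events",
        "process.uptime", "system.cpu.count", "tomcat.sessions.created",
    ]})
    bad_sample = json.dumps({"names": [
        "grpc.client.requests.sent", "http.server.requests",
        "tomcat.sessions.created", "zipkin.reporter.spans",
    ]})
    print(prefix_diff(good_sample, bad_sample))
    # {'only_in_good': ['jvm', 'log4j2', 'process', 'system'], 'only_in_bad': ['grpc', 'zipkin']}
```

Applied to the full lists above, it shows that the failing service is missing not only jvm but also the log4j2, process and system groups, while grpc and zipkin exist only on the failing side.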
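- After digging further into the failing services: two of them are configured with grpc, and the third does not use grpc at all, yet its scraped data has no jvm metrics either.
Going through application.yml, the prometheus configuration turned out to be wrong. I won't share the specifics here. If you run into a configuration problem too, I suggest redoing the configuration by following a setup that one of the experts has shared.
Here is a great one:
Spring Boot (19): Monitoring Applications with Spring Boot Actuator
https://blog.csdn.net/ityouknow/article/details/102693719
The endpoints described there are useful for troubleshooting.
This post is only meant to give beginners (myself included) a troubleshooting approach, in the hope that it prompts better ideas from others.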