That Spring Boot Monitoring You Built Looks Really Cool! (Part 2)

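Use cases:

Based on personal experience and practice, they can be summarized as follows:

Monitoring fluctuating values that have a natural (physical) upper bound, such as physical memory, collections, maps, and numeric values.

Monitoring fluctuating values that have a logical upper bound, such as a backlog of messages or a backlog of tasks (in a thread pool). In essence this is still monitoring a collection or a map.

A fairly realistic example: suppose we need to send an SMS or a push notification to a user after login. The message is first put into a blocking queue, and a single thread then consumes the messages and performs the remaining work: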

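```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class GaugeMain {

    private static final MeterRegistry MR = new SimpleMeterRegistry();
    private static final BlockingQueue<Message> QUEUE = new ArrayBlockingQueue<>(500);
    private static BlockingQueue<Message> REAL_QUEUE;

    static {
        // gauge(...) returns the object it was given, so REAL_QUEUE is the same queue,
        // now observed by a gauge that reports its size
        REAL_QUEUE = MR.gauge("messageGauge", QUEUE, Collection::size);
    }

    public static void main(String[] args) throws Exception {
        consume();
        Message message = new Message();
        message.setUserId(1L);
        message.setContent("content");
        REAL_QUEUE.put(message);
    }

    private static void consume() throws Exception {
        new Thread(() -> {
            while (true) {
                try {
                    Message message = REAL_QUEUE.take();
                    // handle message
                    System.out.println(message);
                } catch (InterruptedException e) {
                    // no-op
                }
            }
        }).start();
    }
}
```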

The example code above is poorly written and only demonstrates the usage; do not use it in production.
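To make the mechanism behind this example a little more concrete: a gauge does not store a value. It keeps a reference to the observed object plus a function, and evaluates that function each time it is sampled. The plain-JDK sketch below illustrates the idea. `LazyGauge` is a hypothetical class made up for this illustration, not Micrometer's implementation; the weak reference is an assumption modeled on the fact that a gauge should not keep the observed object alive.

```java
import java.lang.ref.WeakReference;
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.ToDoubleFunction;

// Hypothetical illustration class, not part of Micrometer.
final class LazyGauge<T> {

    // Weak reference: the gauge must not keep the observed object alive.
    private final WeakReference<T> ref;
    private final ToDoubleFunction<T> valueFunction;

    LazyGauge(T observed, ToDoubleFunction<T> valueFunction) {
        this.ref = new WeakReference<>(observed);
        this.valueFunction = valueFunction;
    }

    // Evaluated on every read; NaN once the observed object has been collected.
    double value() {
        T observed = ref.get();
        return observed == null ? Double.NaN : valueFunction.applyAsDouble(observed);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(500);
        LazyGauge<BlockingQueue<String>> gauge = new LazyGauge<>(queue, Collection::size);
        queue.put("a");
        queue.put("b");
        System.out.println(gauge.value()); // prints 2.0
        queue.take();
        System.out.println(gauge.value()); // prints 1.0
    }
}
```

Because the value is computed at read time, the code that produces and consumes messages never has to notify the gauge about changes.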

# TimeGauge

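TimeGauge is a specialized Gauge. Compared with Gauge, its builder takes one extra parameter of type TimeUnit, which specifies the time unit of the value that the ToDoubleFunction reports. A simple usage example: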

```java
import io.micrometer.core.instrument.TimeGauge;
import io.micrometer.core.instrument.search.Search;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TimeGaugeMain {

    private static final SimpleMeterRegistry R = new SimpleMeterRegistry();

    public static void main(String[] args) throws Exception {
        AtomicInteger count = new AtomicInteger();
        TimeGauge.Builder<AtomicInteger> timeGauge =
                TimeGauge.builder("timeGauge", count, TimeUnit.SECONDS, AtomicInteger::get);
        timeGauge.register(R);
        count.addAndGet(10086);
        print();
        count.set(1);
        print();
    }

    // Prints every meter in the registry together with its current measurements
    private static void print() throws Exception {
        Search.in(R).meters().forEach(each -> {
            StringBuilder builder = new StringBuilder();
            builder.append("name:")
                    .append(each.getId().getName())
                    .append(",tags:")
                    .append(each.getId().getTags())
                    .append(",type:").append(each.getId().getType())
                    .append(",value:").append(each.measure());
            System.out.println(builder.toString());
        });
    }
}
// Output
// name:timeGauge,tags:[],type:GAUGE,value:[Measurement{statistic='VALUE', value=10086.0}]
// name:timeGauge,tags:[],type:GAUGE,value:[Measurement{statistic='VALUE', value=1.0}]
```
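The TimeUnit handed to the builder only states the unit in which the function reports its value; the registry then expresses it in its own base time unit. The output above is consistent with a base unit of seconds, which is why 10086 reported in SECONDS stays 10086.0. A plain-JDK sketch of such a conversion follows. The helper `toBaseUnit` is a hypothetical name for this illustration and is not Micrometer's code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class, not part of Micrometer.
final class TimeUnitConversionSketch {

    // Converts a value reported in sourceUnit into a (possibly fractional) amount of baseUnit.
    static double toBaseUnit(double value, TimeUnit sourceUnit, TimeUnit baseUnit) {
        double nanos = value * sourceUnit.toNanos(1);
        return nanos / baseUnit.toNanos(1);
    }

    public static void main(String[] args) {
        // Same unit on both sides: the value is unchanged.
        System.out.println(toBaseUnit(10086, TimeUnit.SECONDS, TimeUnit.SECONDS)); // prints 10086.0
        // Milliseconds expressed in seconds.
        System.out.println(toBaseUnit(1500, TimeUnit.MILLISECONDS, TimeUnit.SECONDS)); // prints 1.5
    }
}
```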

# DistributionSummary

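A Summary tracks the distribution of events. In Micrometer the corresponding class is DistributionSummary. It is used in much the same way as Timer, but the values it records do not depend on a time unit.

A common use case is measuring the payload size of requests hitting a server. A DistributionSummary instance can be created through a MeterRegistry as follows: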

```java
DistributionSummary summary = registry.summary("response.size");
```

Or fluently through the builder:

```java
DistributionSummary summary = DistributionSummary
        .builder("response.size")
        .description("a description of what this summary does") // optional
        .baseUnit("bytes") // optional
        .tags("region", "test") // optional
        .scale(100) // optional
        .register(registry);
```

Many of the DistributionSummary builder parameters relate to scaling and to histogram representation; see the next section.
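As a rough mental model, a summary keeps a count, a running total, and a maximum of the recorded amounts, and a scale factor multiplies each amount before it is recorded. The sketch below uses only the JDK. `SummarySketch` is a hypothetical class for illustration; it leaves out everything related to histograms and is not how Micrometer implements DistributionSummary.

```java
// Hypothetical illustration class, not part of Micrometer.
final class SummarySketch {

    private final double scale;
    private long count;
    private double total;
    private double max;

    SummarySketch(double scale) {
        this.scale = scale;
    }

    // Records one amount after applying the scale; negative amounts are ignored in this sketch.
    synchronized void record(double amount) {
        if (amount < 0) {
            return;
        }
        double scaled = amount * scale;
        count++;
        total += scaled;
        max = Math.max(max, scaled);
    }

    synchronized long count() { return count; }

    synchronized double totalAmount() { return total; }

    synchronized double max() { return max; }

    synchronized double mean() { return count == 0 ? 0 : total / count; }

    public static void main(String[] args) {
        // A ratio in [0, 1] scaled by 100 is recorded as a percentage.
        SummarySketch summary = new SummarySketch(100);
        summary.record(0.25);
        summary.record(0.75);
        System.out.println(summary.count());       // prints 2
        System.out.println(summary.totalAmount()); // prints 100.0
        System.out.println(summary.mean());        // prints 50.0
        System.out.println(summary.max());         // prints 75.0
    }
}
```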

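Use cases:

Based on personal experience and practice, they can be summarized as follows:

1. Measuring recorded values that do not depend on a time unit, such as server payload sizes or cache hit ratios.

A more concrete example: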

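```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.CacheStats;
import com.google.common.cache.LoadingCache;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.concurrent.TimeUnit;

public class DistributionSummaryMain {

    private static final DistributionSummary DS = DistributionSummary.builder("cacheHitPercent")
            .register(new SimpleMeterRegistry());

    private static final LoadingCache<String, String> CACHE = CacheBuilder.newBuilder()
            .maximumSize(1000)
            .recordStats()
            .expireAfterWrite(60, TimeUnit.SECONDS)
            .build(new CacheLoader<String, String>() {
                @Override
                public String load(String s) throws Exception {
                    return selectFromDatabase();
                }
            });

    public static void main(String[] args) throws Exception {
        String key = "doge";
        String value = CACHE.get(key);
        record();
    }

    // Records the current cache hit ratio (hits / requests, two decimals)
    private static void record() throws Exception {
        CacheStats stats = CACHE.stats();
        BigDecimal hitCount = new BigDecimal(stats.hitCount());
        BigDecimal requestCount = new BigDecimal(stats.requestCount());
        DS.record(hitCount.divide(requestCount, 2, RoundingMode.HALF_DOWN).doubleValue());
    }

    // Placeholder standing in for a real database lookup; the original does not show this method
    private static String selectFromDatabase() {
        return "value";
    }
}
```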
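One detail worth noting in the example: dividing by the request count throws an ArithmeticException when the cache has not served any request yet. The example avoids this only because it calls the cache once before recording. A possible guard is sketched below; the helper `hitRatio` is a hypothetical name and returning 0.0 for an unused cache is an assumption, not something the original prescribes.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper for illustration.
final class HitRatioSketch {

    // Returns hits / requests rounded to two decimals, or 0.0 when nothing was requested yet.
    static double hitRatio(long hits, long requests) {
        if (requests <= 0) {
            return 0.0;
        }
        return BigDecimal.valueOf(hits)
                .divide(BigDecimal.valueOf(requests), 2, RoundingMode.HALF_DOWN)
                .doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(hitRatio(0, 0)); // prints 0.0
        System.out.println(hitRatio(1, 3)); // prints 0.33
        System.out.println(hitRatio(2, 3)); // prints 0.67
    }
}
```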

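# Histogram and Percentile Configuration

Histogram and percentile configuration applies to both Summary and Timer. This part is relatively complex and will be added once I have studied it thoroughly.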

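# Integrating Spring Boot, Prometheus, and Grafana

Metrics collected through the Micrometer API in a JVM application live in memory. An additional storage system is therefore needed to persist them, a monitoring system is needed to collect and process them centrally, and some UI tool is needed to display them. Bosses generally only care for cool charts or animations.

The usual storage system is a time-series database; mainstream options include Influx and Datadog. The mainstream monitoring system (mainly for collecting and processing data) is Prometheus. For the UI, Grafana is currently the most widely used.

Prometheus also ships with a built-in time-series database, so a reasonably complete metrics monitoring setup only needs the target JVM application, Prometheus, and Grafana. The rest of this article builds such a setup from scratch on CentOS 7.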

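# Using Micrometer in Spring Boot

The spring-boot-starter-actuator dependency in Spring Boot already integrates Micrometer support. Much of the metrics endpoint's functionality is implemented with Micrometer, and the prometheus endpoint is also enabled by default. In fact, spring-boot-actuator-autoconfigure, which actuator depends on, bundles out-of-the-box integrations for many frameworks.

Among them, the prometheus package integrates Prometheus support, so a project that uses actuator can easily expose a prometheus endpoint and act as the client from which Prometheus collects data. Prometheus (the server software) can scrape the application's Micrometer metrics through this endpoint.

We first bring in spring-boot-starter-actuator and spring-boot-starter-web, and implement a Counter and a Timer as examples. Dependencies: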

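```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-dependencies</artifactId>
            <version>2.1.0.RELEASE</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-aop</artifactId>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.16.22</version>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
        <version>1.1.0</version>
    </dependency>
</dependencies>
```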

Next, write an order endpoint and a message-sending module to simulate sending a message to the user after an order is placed:

```java
// Entities
@Data
public class Message {

    private String orderId;
    private Long userId;
    private String content;
}

@Data
public class Order {

    private String orderId;
    private Long userId;
    private Integer amount;
    // Added so that order.getChannel() below compiles; the original entity omitted this field
    private String channel;
    private LocalDateTime createTime;
}

// Controller and service classes
@RestController
public class OrderController {

    @Autowired
    private OrderService orderService;

    @PostMapping(value = "/order")
    public ResponseEntity<Boolean> createOrder(@RequestBody Order order) {
        return ResponseEntity.ok(orderService.createOrder(order));
    }
}

@Slf4j
@Service
public class OrderService {

    private static final Random R = new Random();

    @Autowired
    private MessageService messageService;

    public Boolean createOrder(Order order) {
        // Simulate placing the order
        try {
            int ms = R.nextInt(50) + 50;
            TimeUnit.MILLISECONDS.sleep(ms);
            log.info("Simulated saving the order in {} ms...", ms);
        } catch (Exception e) {
            // no-op
        }
        // Count the total number of orders, tagged by channel
        Metrics.counter("order.count", "order.channel", order.getChannel()).increment();
        // Send the message
        Message message = new Message();
        message.setContent("Simulated SMS...");
        message.setOrderId(order.getOrderId());
        message.setUserId(order.getUserId());
        messageService.sendMessage(message);
        return true;
    }
}

@Slf4j
@Service
public class MessageService implements InitializingBean {

    private static final BlockingQueue<Message> QUEUE = new ArrayBlockingQueue<>(500);
    private static BlockingQueue<Message> REAL_QUEUE;
    private static final Executor EXECUTOR = Executors.newSingleThreadExecutor();
    private static final Random R = new Random();

    static {
        // Register a gauge on the queue size; gauge(...) returns the queue itself
        REAL_QUEUE = Metrics.gauge("message.gauge",
                Tags.of("message.gauge", "message.queue.size"), QUEUE, Collection::size);
    }

    public void sendMessage(Message message) {
        try {
            REAL_QUEUE.put(message);
        } catch (InterruptedException e) {
            // no-op
        }
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        // Start the single consumer thread once the bean is initialized
        EXECUTOR.execute(() -> {
            while (true) {
                try {
                    Message message = REAL_QUEUE.take();
                    log.info("Simulated sending SMS, orderId:{}, userId:{}, content:{}, took {} ms",
                            message.getOrderId(), message.getUserId(), message.getContent(), R.nextInt(50));
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            }
        });
    }
}

// Aspect class
@Component
@Aspect
public class TimerAspect {

    @Around(value = "execution(* club.throwable.smp.service.*Service.*(..))")
    public Object around(ProceedingJoinPoint joinPoint) throws Throwable {
        Signature signature = joinPoint.getSignature();
        MethodSignature methodSignature = (MethodSignature) signature;
        Method method = methodSignature.getMethod();
        Timer timer = Metrics.timer("method.cost.time", "method.name", method.getName());
        // recordCallable cannot propagate a Throwable, so capture it and rethrow afterwards
        ThrowableHolder holder = new ThrowableHolder();
        Object result = timer.recordCallable(() -> {
            try {
                return joinPoint.proceed();
            } catch (Throwable e) {
                holder.throwable = e;
            }
            return null;
        });
        if (null != holder.throwable) {
            throw holder.throwable;
        }
        return result;
    }

    private static class ThrowableHolder {

        Throwable throwable;
    }
}
```
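What the Timer in the aspect accumulates per method can be pictured as three numbers: the number of calls, the total elapsed time, and the longest single call. The following plain-JDK sketch shows that bookkeeping. `TimerSketch` is a hypothetical class for illustration; Micrometer's real Timer does considerably more.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration class, not part of Micrometer.
final class TimerSketch {

    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();
    private final AtomicLong maxNanos = new AtomicLong();

    // Runs the task and records its duration, whether it returns or throws.
    <T> T recordCallable(Callable<T> task) throws Exception {
        long start = System.nanoTime();
        try {
            return task.call();
        } finally {
            record(System.nanoTime() - start);
        }
    }

    // Records one already measured duration in nanoseconds.
    void record(long nanos) {
        count.increment();
        totalNanos.add(nanos);
        maxNanos.accumulateAndGet(nanos, Math::max);
    }

    long count() { return count.sum(); }

    double totalTime(TimeUnit unit) { return totalNanos.sum() / (double) unit.toNanos(1); }

    double max(TimeUnit unit) { return maxNanos.get() / (double) unit.toNanos(1); }

    public static void main(String[] args) throws Exception {
        TimerSketch timer = new TimerSketch();
        String result = timer.recordCallable(() -> {
            TimeUnit.MILLISECONDS.sleep(20);
            return "done";
        });
        System.out.println(result + ", calls=" + timer.count()
                + ", totalMs=" + timer.totalTime(TimeUnit.MILLISECONDS));
    }
}
```

Recording in a finally block means failed calls are timed as well, which matches the aspect above, where the exception is captured inside the callable and rethrown only after the timing is done.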

The YAML configuration is as follows:

```yaml
server:
  port: 9091
management:
  server:
    port: 10091
  endpoints:
    web:
      exposure:
        include: '*'
      base-path: /management
```

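Be sure to read the official Spring documentation on Actuator closely: from Spring Boot 2.x on, the configuration that controls Web endpoint exposure differs considerably from 1.x.

In short: every endpoint except shutdown is enabled by default. Enabled only means supported, not exposed as a Web endpoint, and an endpoint must be exposed as a Web endpoint before it can be accessed. Endpoint support is enabled or disabled as follows: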

```properties
management.endpoint.${endpointId}.enabled=true/false
```

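The actuator-api documentation lists the features of every supported endpoint (this refers to the official documentation for version 2.1.0.RELEASE; the link may not stay valid). An endpoint that is enabled but not exposed as a Web endpoint cannot be reached at http://{host}:{management.port}/{management.endpoints.web.base-path}/{endpointId}.

The configuration that exposes monitoring endpoints as Web endpoints is: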

```properties
management.endpoints.web.exposure.include=info,health
management.endpoints.web.exposure.exclude=prometheus
```

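management.endpoints.web.exposure.exclude lists the monitoring endpoints that must not be exposed as Web endpoints; separate multiple entries with commas.

management.endpoints.web.exposure.include defaults to just the info and health endpoints. To expose every endpoint, set management.endpoints.web.exposure.include=*. In YAML, remember to wrap the asterisk in single quotes ('').

Exposing all Web monitoring endpoints is fairly risky. If you need to do this in production, first make sure that http://{host}:{management.port} cannot be reached from the public internet. In other words, the management port should be reachable only from the internal network, which also lets the Prometheus server described later collect data through it.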
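The rule described in this section (an endpoint must be enabled and also exposed, and an exclude entry wins over an include entry) can be summarized as a small predicate. This is only a sketch of the decision logic. `isReachableOverWeb` and its parameters are hypothetical names for this illustration, not Spring Boot's code.

```java
import java.util.Set;

// Hypothetical illustration class, not part of Spring Boot.
final class EndpointExposureSketch {

    // True when the endpoint is enabled, not excluded, and included (directly or via "*").
    static boolean isReachableOverWeb(String id, boolean enabled,
                                      Set<String> include, Set<String> exclude) {
        if (!enabled) {
            return false;
        }
        if (exclude.contains("*") || exclude.contains(id)) {
            return false;
        }
        return include.contains("*") || include.contains(id);
    }

    public static void main(String[] args) {
        Set<String> defaults = Set.of("info", "health");
        // With the default include list, prometheus is enabled but not exposed.
        System.out.println(isReachableOverWeb("prometheus", true, defaults, Set.of())); // prints false
        // include=* exposes it.
        System.out.println(isReachableOverWeb("prometheus", true, Set.of("*"), Set.of())); // prints true
        // exclude takes precedence over include.
        System.out.println(isReachableOverWeb("prometheus", true, Set.of("*"), Set.of("prometheus"))); // prints false
    }
}
```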

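# Installing and Configuring Prometheus

The latest Prometheus version at the time of writing is 2.5. Since I have not worked with Docker in depth, this section simply downloads the release archive and unpacks it.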

```shell
wget https://github.com/prometheus/prometheus/releases/download/v2.5.0/prometheus-2.5.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
cd prometheus-*
```

First edit prometheus.yml, the Prometheus configuration file in the unpacked directory. The main change is to the scrape_configs node:

```yaml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    # URL path to scrape metrics from; here, the application's prometheus endpoint
    metrics_path: /management/prometheus
    static_configs:
      # host and port of the target
      - targets: ['localhost:10091']
```
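Prometheus combines the scheme, the target, and metrics_path into the URL it scrapes. Before starting Prometheus it can be handy to confirm that this URL actually answers. The sketch below does that with the JDK's HTTP client (Java 11 or later). `ScrapeTargetCheck` and its methods are hypothetical names for this illustration, and the host and port are the ones from the configuration above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for illustration.
final class ScrapeTargetCheck {

    // Builds the scrape URL as scheme://target + metricsPath.
    static URI scrapeUri(String scheme, String target, String metricsPath) {
        return URI.create(scheme + "://" + target + metricsPath);
    }

    // Fetches the endpoint once and returns the HTTP status code.
    static int fetchStatus(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws InterruptedException {
        URI uri = scrapeUri("http", "localhost:10091", "/management/prometheus");
        try {
            System.out.println(uri + " -> HTTP " + fetchStatus(uri));
        } catch (IOException e) {
            // Typically means the application is not running or the port is wrong.
            System.out.println(uri + " is not reachable: " + e);
        }
    }
}
```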

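This sets the scrape URL to localhost:10091/management/prometheus. Before continuing, remember to start the application from the previous section on the virtual machine. Then start Prometheus: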

```shell
# --storage.tsdb.path=<data storage path>; the default is ./data
./prometheus --config.file=prometheus.yml
```

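The Prometheus application starts on port 9090 by default; a successful start is reported in its log output.

Now open http://${vm_host}:9090/targets to see the jobs Prometheus is currently running.

Open http://${vm_host}:9090/graph to look up the Meters we defined, as well as the JVM and Tomcat Meters that spring-boot-starter-actuator already defines.

First call the application's /order endpoint, then look at the two metrics defined in the application earlier: `order_count_total` and `method_cost_time_seconds_sum`.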
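The metric names seen in Prometheus, such as `order_count_total` and `method_cost_time_seconds_sum`, differ from the names used in the Java code (`order.count`, `method.cost.time`) because the Prometheus registry renames meters: dots become underscores, counters get a `_total` suffix, and timers get a `_seconds` suffix that is followed by `_count`, `_sum`, or `_max`. A simplified sketch of that mapping follows. `prometheusName` is a hypothetical helper, and the real naming convention handles more cases than this.

```java
// Hypothetical illustration class; a simplification of the renaming, not Micrometer's code.
final class PrometheusNameSketch {

    enum MeterType { COUNTER, TIMER, GAUGE }

    // Maps a dotted meter name to the base name that shows up in Prometheus.
    static String prometheusName(String meterName, MeterType type) {
        String base = meterName.replace('.', '_');
        switch (type) {
            case COUNTER:
                return base.endsWith("_total") ? base : base + "_total";
            case TIMER:
                return base.endsWith("_seconds") ? base : base + "_seconds";
            default:
                return base;
        }
    }

    public static void main(String[] args) {
        System.out.println(prometheusName("order.count", MeterType.COUNTER));
        // prints order_count_total
        System.out.println(prometheusName("method.cost.time", MeterType.TIMER) + "_sum");
        // prints method_cost_time_seconds_sum
        System.out.println(prometheusName("message.gauge", MeterType.GAUGE));
        // prints message_gauge
    }
}
```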

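As you can see, the Meter data has been collected and displayed, but it is clearly neither detailed nor cool enough. This is where Grafana's UI comes in to dress things up.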

# Installing and Using Grafana

Grafana is installed as follows:

```shell
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.3.4-1.x86_64.rpm
sudo yum localinstall grafana-5.3.4-1.x86_64.rpm
```

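After installation, start it with the command service grafana-server start. The default port is 3000, so it can be accessed at http://${host}:3000. The initial username and password are both admin, with administrator privileges. Next, add a data source from the Home panel so that Grafana can connect to the Prometheus server and pull metrics from it.

The data source simply needs to point at the Prometheus server's port. After that, you are free to add whatever panels you need. For the order count metric, a Graph panel is a good fit.

When configuring the panel, set the Title under General.

The more important part is the Metrics configuration, where you specify the data source and the Prometheus query.

It is worth reading the official Prometheus documentation and learning a little of its query language, PromQL. A single panel can hold multiple PromQL queries.

The two items above are the basic configuration. The remaining options are mostly auxiliary features for chart display or alerting and are not covered here; the Grafana website explains how to use them. Then call the order endpoint again. After a while the chart updates and displays the data automatically.

Finally, add the Meter for the Timer used in the project so that method execution time can be monitored as well.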
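For a Timer panel, a commonly useful number is the mean execution time over a window. It can be derived from the `_sum` and `_count` series of the timer: the increase of the sum divided by the increase of the count between two scrapes. The sketch below shows only that arithmetic. `MeanLatencySketch` is a hypothetical class and the sample values are invented for illustration.

```java
// Hypothetical illustration class; sample values are made up.
final class MeanLatencySketch {

    // One scrape of a timer: cumulative call count and cumulative seconds spent.
    static final class Sample {
        final double count;
        final double sumSeconds;

        Sample(double count, double sumSeconds) {
            this.count = count;
            this.sumSeconds = sumSeconds;
        }
    }

    // Mean seconds per call between two scrapes; NaN when no call happened in between.
    static double meanSeconds(Sample earlier, Sample later) {
        double calls = later.count - earlier.count;
        if (calls <= 0) {
            return Double.NaN;
        }
        return (later.sumSeconds - earlier.sumSeconds) / calls;
    }

    public static void main(String[] args) {
        Sample earlier = new Sample(10, 0.5);
        Sample later = new Sample(14, 0.75);
        // 4 calls took 0.25 s in total between the scrapes.
        System.out.println(meanSeconds(earlier, later)); // prints 0.0625
    }
}
```

In PromQL the same idea is usually written as a ratio of two rates, for example `rate(method_cost_time_seconds_sum[1m]) / rate(method_cost_time_seconds_count[1m])`; the one-minute window here is an arbitrary choice.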
