手写链路追踪-日志追踪性能分析

1. 前文回顾

前面七篇文章已经让《手写链路追踪》完成了全链路用户信息可视化,并且能监控API性能。
前文1:手写链路追踪
前文2:手写链路追踪优化-自动全局追踪代替局部手动追踪
前文3:手写链路追踪优化-多线程追踪
前文4:手写链路追踪优化-MDC改造
前文5:手写链路追踪优化-自定义用户信息追踪
前文6:手写链路追踪优化-自定义用户信息异步追踪
前文7:手写链路追踪优化-全局API性能监控

前面有朋友跟我说他goland的项目也自带了链路追踪,不过他们一般把链路追踪关了,因为他们觉得他们的这个链路追踪消耗太大了。
我当时文章里面只塞了一人traceId, 消耗小到可以忽略不计,但是后面我又加入了用户信息全链路可视化,这时我每个pattern都需要为用户的三个信息[username, userId, clientIp]判断,当这三个值非空的时候才会打印到日志上,这是为了避免影响其他不相关的线程日志,例如main线程。

用户信息在每个线程里面都有判断,而且日志默认就是阻塞日志,阻塞日志的处理过程跟阻塞线程是同步的,所以日志输出效率是影响api性能的。

所以这时候我不得不开始担心我的全链路用户信息可视化对API性能的影响了。
接下来我们一起来分析一下这个影响有大多,我们能不能接受这个性能损耗。

2.代码调整测试

2.1 把阻塞线程和异步线程的循环打印日志次数增加到10万次

如果只有十次打印,样本处理时间太小,统计日志pattern判断导致的额外损耗还不如时间精度统计误差大

package com.sandwich.logtracing.service.impl;

import com.sandwich.logtracing.service.LoginService;
import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

/**
 * @Author 公众号: IT三明治
 * @Date 2025/8/30
 * @Description:
 */
@Slf4j
@Service
public class LoginServiceImpl implements LoginService {

    @Override
    @Async
    public void asyncLoginSteps(String username) {
        for (int i=1; i<= 100000; i++) {
            log.info("processing async login for user {}, login step {} done", username, i);
        }
    }

    @Override
    public void blockedThreadLoginSteps(String username) {
        for (int i=1; i<= 100000; i++) {
            log.info("processing login for user {}, login step {} done", username, i);
        }
    }
}

2.2 把API请求耗时添加到response header返回给客户端,方便客户端统计耗时

前文我把日志链路追踪的处理全部都移到拦截器了,filter的注册我已经注释,所以这次这个header response的处理我不打算再注册一个新的filter了, 拦截器我试了一下效果也不太好。
这次我用@ControllerAdvice注解重写ResponseBodyAdvice.beforeBodyWrite方法,这样就可以在API的controller完成后,拦截response,给它加上耗时的header数据返回给客户端,代码如下

package com.sandwich.logtracing.advice;

import com.sandwich.logtracing.util.MdcUtils;
import lombok.extern.slf4j.Slf4j;
import org.jetbrains.annotations.NotNull;
import org.springframework.core.MethodParameter;
import org.springframework.http.MediaType;
import org.springframework.http.converter.HttpMessageConverter;
import org.springframework.http.server.ServerHttpRequest;
import org.springframework.http.server.ServerHttpResponse;
import org.springframework.web.bind.annotation.ControllerAdvice;
import org.springframework.web.servlet.mvc.method.annotation.ResponseBodyAdvice;

/**
 * @Author 公众号: IT三明治
 * @Date 2025/9/20
 * @Description:
 */
@Slf4j
@ControllerAdvice
public class ResponseTimeAdvice implements ResponseBodyAdvice<Object> {

    @Override
    public boolean supports(@NotNull MethodParameter returnType, @NotNull Class<? extends HttpMessageConverter<?>> converterType) {
        return true;
    }

    @NotNull
    @Override
    public Object beforeBodyWrite(@NotNull Object body, @NotNull MethodParameter returnType,
                                  @NotNull MediaType selectedContentType,
                                  @NotNull Class<? extends HttpMessageConverter<?>> selectedConverterType,
                                  @NotNull ServerHttpRequest request, ServerHttpResponse response) {

        log.info("beforeBodyWrite===========");
        // calculate request time and set to MDC
        MdcUtils.setRequestTime();
        // add api duration to response header
        response.getHeaders().add("x-request-duration", MdcUtils.getRequestTime());

        return body;
    }
}

由上可以看出来处理过程很简单,说到这里我想重点介绍一下@ControllerAdvice,它对于我们在spring项目中写一些全局捕获处理非常实用。

2.2.1 @ControllerAdvice定义

@ControllerAdvice是Spring MVC框架中的一个特殊注解,它是一个组合注解,用于创建全局性的控制器增强组件。

2.2.2 @ControllerAdvice本质

[@ControllerAdvice](file://org\springframework\web\bind\annotation\ControllerAdvice.java#L98-L239) 本质上是一个 [@Component](file://org\springframework\stereotype\Component.java#L42-L52) 注解,这意味着它会被Spring自动扫描并注册为Bean:

@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@Documented
@Component
public @interface ControllerAdvice {
    // ...
}

2.2.3 主要作用

  • 全局异常处理
    这里适合捕获全局异常,集中处理,避免非已知异常信息直接返回到客户端
@ControllerAdvice
public class GlobalExceptionHandler {
    
    @ExceptionHandler(Exception.class)
    public ResponseEntity<String> handleException(Exception e) {
        return ResponseEntity.status(500).body("Error: " + e.getMessage());
    }
}
  • 全局数据绑定
@ControllerAdvice
public class GlobalBindingAdvice {
    
    @InitBinder
    public void initBinder(WebDataBinder binder) {
        // 为所有控制器方法添加数据绑定规则
        binder.addCustomFormatter(new DateFormatter("yyyy-MM-dd"));
    }
}
  • 全局模型属性
@ControllerAdvice
public class GlobalModelAttributeAdvice {
    
    @ModelAttribute
    public void addGlobalAttributes(Model model) {
        // 为所有控制器方法添加模型属性
        model.addAttribute("appName", "My Application");
    }
}
  • 响应体增强(ResponseBodyAdvice)
@ControllerAdvice
public class ResponseWrapperAdvice implements ResponseBodyAdvice<Object> {
    
    @Override
    public boolean supports(MethodParameter returnType, 
                           Class<? extends HttpMessageConverter<?>> converterType) {
        return true;
    }
    
    @Override
    public Object beforeBodyWrite(Object body, MethodParameter returnType,
                                 MediaType selectedContentType,
                                 Class<? extends HttpMessageConverter<?>> selectedConverterType,
                                 ServerHttpRequest request, ServerHttpResponse response) {
        // 在响应体写入之前进行处理
        return wrapResponse(body);
    }
}

这个响应体增加就是我们当前用到的,我们只是在response增加response header信息

2.2.4 @ControllerAdvice工作原理

  • 组件发现
    Spring启动时会扫描所有带有 [@ControllerAdvice](file://org\springframework\web\bind\annotation\ControllerAdvice.java#L98-L239) 注解的类,并将它们注册为Spring Bean。

  • 功能分发
    根据类中实现的不同接口和注解,[@ControllerAdvice](file://org\springframework\web\bind\annotation\ControllerAdvice.java#L98-L239) 会执行不同的功能:

@ExceptionHandler 方法:处理异常
@InitBinder方法:数据绑定
@ModelAttribute方法:添加模型属性
@ResponseBodyAdvice 接口:响应体处理

2.2.5 在当前项目中的应用

  1. 被Spring自动发现和注册
  2. 对所有控制器的响应进行统一处理
  3. 在每个API响应返回前自动添加处理时间相关的响应头

它的优势如下

  1. 集中管理:避免在每个控制器中重复编写相同逻辑
  2. 无侵入性:对现有控制器代码无影响
  3. 可复用性:一次编写,全局生效
  4. 易于维护:统一修改,全局生效

@ControllerAdvice 是Spring MVC中实现横切关注点(如日志、异常处理、响应包装等)的重要机制。
这里扯远了,以下是我们项目API的处理过程

2.3 拦截器调整

前面把request time set到MDC移到ResponseTimeAdvice.beforeBodyWrite了,所以postHandle就不需要再set一遍了,同时我加了一些“======”的日志,是为了验证完成上文的时序图,确保它的准确性,因为AI给我提供了错误的时序。

package com.sandwich.logtracing.interceptor;

import com.sandwich.logtracing.util.MdcUtils;
import com.sandwich.logtracing.util.RandomStrUtils;
import com.sandwich.logtracing.util.ThreadLocalContext;
import io.micrometer.common.util.StringUtils;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import lombok.extern.slf4j.Slf4j;
import org.jetbrains.annotations.NotNull;
import org.springframework.lang.Nullable;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;
import org.springframework.web.servlet.ModelAndView;

/**
 * @Author 公众号: IT三明治
 * @Date 2025/9/9
 * @Description:
 */
@Slf4j
@Component
public class MDCInterceptor implements HandlerInterceptor {

    @Override
    public boolean preHandle(HttpServletRequest request, @NotNull HttpServletResponse response, @NotNull Object handler) {
        log.info("preHandle===========");
        MdcUtils.setStartTime();
        // 从header中获取用户信息
        String traceId = request.getHeader("x-request-correlation-id");
        //if the request header don't have a trace id,then generate a random one
        if (StringUtils.isBlank(traceId)) {
            traceId = RandomStrUtils.generateRandomString(15);
        }
        MdcUtils.setTraceId(traceId);
        String userId = request.getHeader("X-User-ID");
        if (StringUtils.isNotBlank(userId)) {
            // 添加用户ID到MDC
            MdcUtils.setUserId(userId);
        }
        String username = request.getHeader("X-Username");

        if (StringUtils.isNotBlank(username)) {
            // 添加用户名到MDC
            MdcUtils.setUserName(username);
        }

        // 可以添加其他请求相关信息
        String clientIp = getClientIp( request);
        MdcUtils.setClientIp(clientIp);

        ThreadLocalContext.setTraceId(traceId);
        ThreadLocalContext.setUserId(userId);
        ThreadLocalContext.setUserName(username);
        ThreadLocalContext.setClientIp(clientIp);

        return true;
    }

    @Override
    public void postHandle(@NotNull HttpServletRequest request, @NotNull HttpServletResponse response,
                           @NotNull Object handler, @NotNull ModelAndView modelAndView) {
        log.info("postHandle===========");
    }

    @Override
    public void afterCompletion(@NotNull HttpServletRequest request, @NotNull HttpServletResponse response,
                                @NotNull Object handler, @Nullable Exception ex) {
        log.info("afterCompletion===========");

        log.info("API [{}] process finished, request time [{}]ms, response code [{}]",
                request.getRequestURI(), MdcUtils.getRequestTime(), response.getStatus());
        
        ThreadLocalContext.clearAll();
        // 请求完全结束后清理MDC,防止内存泄漏
        MdcUtils.clearAll();
    }

    private String getClientIp(HttpServletRequest request) {
        String ip = request.getHeader("X-Forwarded-For");
        if (ip == null || ip.isEmpty() || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getHeader("X-Real-IP");
        }
        if (ip == null || ip.isEmpty() || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getRemoteAddr();
        }
        return ip;
    }
}

2.4 调整logback配置

其实我们前面的测试一直打印两份日志,一份是同步打印到标准输出的console, 另外一份是异步打印到 {LOGHOME}/{APP_NAME}-info.log做备份日志。

 <!-- 开发环境配置 -->
    <springProfile name="dev">
        <root level="INFO">
            <appender-ref ref="CONSOLE"/>
           <appender-ref ref="ASYNC_ALL"/>
        </root>
    </springProfile>

我们开发测试过程中一般都是用console日志调试。生产环境分两种应用场景, 一种是有接触应用,就是我们需要登陆物理机或者VM部署,这种情况我们一般用后台启动方式(nohub),这种情况我们不需要关注console日志, 只需要打印一份日志即可,这里是被异步包装器ch.qos.logback.classic.AsyncAppender包装过的日志,它是异步的。如果需要同步日志,请不要用异步包装器包装。

 <!-- 开发环境配置 -->
    <springProfile name="dev">
        <root level="INFO">
            <appender-ref ref="ASYNC_ALL"/>
        </root>
    </springProfile>

一种是无接触应用,例如云环境GCP或者GKE环境,用一个GCE VM或者一个docker image部署一个应用,部署完就不再登录访问, 重启通过reboot vm或者pod来实现的,这种就可以不需要nohub方式启动,这种情况下console日志就非常有用,因为GCP会自动将标前台准输出console的日志更新到Log explorer(Google Cloud Platform 提供的一个强大的日志管理和分析工具。它是Google Cloud Logging(以前称为Stackdriver Logging)的一部分) 。如果用nohub的方式启动应用,我们就需要手动方式设置Stackdriver Logging,把日志输出路径的日志文件同步到Log explorer。两种方式都是可以的。

有点扯远了,只是为了解释一下console日志在生产环境的使用价值。
我这里备份日志用了异步,不太适合用来测试消耗。我这里console的日志没有用异步包装器,它是阻塞日志,适合用来对比加了手写上链路追踪日志和不加两种情况下的性能对比。
所以只我会在测试用注释掉ASYNC_ALL日志,只使用console日志测试。

 <!-- 开发环境配置 -->
    <springProfile name="dev">
        <root level="INFO">
            <appender-ref ref="CONSOLE"/>
<!--            <appender-ref ref="ASYNC_ALL"/>-->
        </root>
    </springProfile>

完整配置如下

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- 属性配置 -->
    <property name="LOG_HOME" value="./logs"/>
    <property name="APP_NAME" value="log-tracing"/>
    <property name="LOG_PATTERN" value="%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n"/>
    <property name="MAX_FILE_SIZE" value="100MB"/>
    <property name="MAX_HISTORY" value="30"/>

    <!-- 控制台输出 -->
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>${LOG_PATTERN}</pattern>
            <charset>UTF-8</charset>
        </encoder>
    </appender>

    <!-- INFO级别日志文件 -->
    <appender name="INFO_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOG_HOME}/${APP_NAME}-info.log</file>
        <filter class="ch.qos.logback.classic.filter.LevelFilter">
            <level>INFO</level>
            <onMatch>ACCEPT</onMatch>
            <onMismatch>DENY</onMismatch>
        </filter>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <fileNamePattern>${LOG_HOME}/${APP_NAME}-info.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
            <maxFileSize>${MAX_FILE_SIZE}</maxFileSize>
            <maxHistory>${MAX_HISTORY}</maxHistory>
            <totalSizeCap>10GB</totalSizeCap>
        </rollingPolicy>
        <encoder>
            <pattern>${LOG_PATTERN}</pattern>
            <charset>UTF-8</charset>
        </encoder>
    </appender>

    <!-- WARN级别日志文件 -->
    <appender name="WARN_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOG_HOME}/${APP_NAME}-warn.log</file>
        <filter class="ch.qos.logback.classic.filter.LevelFilter">
            <level>WARN</level>
            <onMatch>ACCEPT</onMatch>
            <onMismatch>DENY</onMismatch>
        </filter>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <fileNamePattern>${LOG_HOME}/${APP_NAME}-warn.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
            <maxFileSize>${MAX_FILE_SIZE}</maxFileSize>
            <maxHistory>${MAX_HISTORY}</maxHistory>
            <totalSizeCap>5GB</totalSizeCap>
        </rollingPolicy>
        <encoder>
            <pattern>${LOG_PATTERN}</pattern>
            <charset>UTF-8</charset>
        </encoder>
    </appender>

    <!-- ERROR级别日志文件 -->
    <appender name="ERROR_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOG_HOME}/${APP_NAME}-error.log</file>
        <filter class="ch.qos.logback.classic.filter.LevelFilter">
            <level>ERROR</level>
            <onMatch>ACCEPT</onMatch>
            <onMismatch>DENY</onMismatch>
        </filter>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <fileNamePattern>${LOG_HOME}/${APP_NAME}-error.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
            <maxFileSize>${MAX_FILE_SIZE}</maxFileSize>
            <maxHistory>${MAX_HISTORY}</maxHistory>
            <totalSizeCap>5GB</totalSizeCap>
        </rollingPolicy>
        <encoder>
            <pattern>${LOG_PATTERN}</pattern>
            <charset>UTF-8</charset>
        </encoder>
    </appender>

    <!-- 全量日志文件 -->
    <appender name="ALL_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOG_HOME}/${APP_NAME}-all.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <fileNamePattern>${LOG_HOME}/${APP_NAME}-all.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
            <maxFileSize>${MAX_FILE_SIZE}</maxFileSize>
            <maxHistory>${MAX_HISTORY}</maxHistory>
            <totalSizeCap>20GB</totalSizeCap>
        </rollingPolicy>
        <encoder>
            <pattern>${LOG_PATTERN}</pattern>
            <charset>UTF-8</charset>
        </encoder>
    </appender>

    <!-- 异步包装器 -->
    <appender name="ASYNC_INFO" class="ch.qos.logback.classic.AsyncAppender">
        <discardingThreshold>0</discardingThreshold>
        <queueSize>512</queueSize>
        <includeCallerData>false</includeCallerData>
        <appender-ref ref="INFO_FILE"/>
    </appender>

    <appender name="ASYNC_WARN" class="ch.qos.logback.classic.AsyncAppender">
        <discardingThreshold>0</discardingThreshold>
        <queueSize>512</queueSize>
        <includeCallerData>false</includeCallerData>
        <appender-ref ref="WARN_FILE"/>
    </appender>

    <appender name="ASYNC_ERROR" class="ch.qos.logback.classic.AsyncAppender">
        <discardingThreshold>0</discardingThreshold>
        <queueSize>512</queueSize>
        <includeCallerData>false</includeCallerData>
        <appender-ref ref="ERROR_FILE"/>
    </appender>

    <appender name="ASYNC_ALL" class="ch.qos.logback.classic.AsyncAppender">
        <discardingThreshold>0</discardingThreshold>
        <queueSize>512</queueSize>
        <includeCallerData>false</includeCallerData>
        <appender-ref ref="ALL_FILE"/>
    </appender>

    <!-- 开发环境配置 -->
    <springProfile name="dev">
        <root level="INFO">
            <appender-ref ref="CONSOLE"/>
<!--            <appender-ref ref="ASYNC_ALL"/>-->
        </root>
    </springProfile>

    <!-- 测试环境配置 -->
    <springProfile name="test">
        <root level="INFO">
            <appender-ref ref="CONSOLE"/>
            <appender-ref ref="ASYNC_INFO"/>
            <appender-ref ref="ASYNC_WARN"/>
            <appender-ref ref="ASYNC_ERROR"/>
        </root>
    </springProfile>

    <!-- 生产环境配置 -->
    <springProfile name="prod">
        <root level="INFO">
            <appender-ref ref="ASYNC_INFO"/>
            <appender-ref ref="ASYNC_WARN"/>
            <appender-ref ref="ASYNC_ERROR"/>
        </root>
    </springProfile>

    <!-- 默认配置 -->
    <springProfile name="default">
        <root level="INFO">
            <appender-ref ref="CONSOLE"/>
            <appender-ref ref="ASYNC_ALL"/>
        </root>
    </springProfile>

    <!-- 应用特定包的日志级别 -->
    <logger name="com.sandwich.logtracing" level="DEBUG" additivity="false">
        <appender-ref ref="CONSOLE"/>
        <appender-ref ref="ASYNC_ALL"/>
    </logger>

    <!-- Spring框架日志级别 -->
    <logger name="org.springframework" level="INFO"/>

    <!-- 数据库相关日志级别 -->
    <logger name="org.hibernate" level="WARN"/>
    <logger name="org.springframework.jdbc" level="INFO"/>

</configuration>

2.5 调整yml的日志pattern

前文我们证明过yml的pattern优先级更高,所以切换pattern我放在yml测试。
先设置不用手写链路追踪的pattern

spring:
  application:
    name: log-tracing


# 日志配置
logging:
  level:
    com.sandwich.logtracing: DEBUG
    org.springframework: INFO
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n"
#    console: "%d{yyyy-MM-dd HH:mm:ss.SSS} %X{traceId:-} %replace(%X{userId}){'^(?!$)(.+)$','[userId: $1] '}%replace(%X{username}){'^(?!$)(.+)$','[username: $1] '}%replace(%X{clientIp}){'^(?!$)(.+)$','[clientIp: $1] '}[%thread] %-5level %logger{50} - %msg%n"
#    file: "%d{yyyy-MM-dd HH:mm:ss.SSS} %X{traceId:-} %replace(%X{userId}){'^(?!$)(.+)$','[userId: $1] '}%replace(%X{username}){'^(?!$)(.+)$','[username: $1] '}%replace(%X{clientIp}){'^(?!$)(.+)$','[clientIp: $1] '}[%thread] %-5level %logger{50} - %msg%n"
#    file: "%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n"
#  file:
#    name: ./logs/log-tracing.log
  logback:
    rollingpolicy:
      max-file-size: 100MB
      max-history: 30

2.6 修改用于登录的shell客户端(login.sh)

新增一个loginForLoop的function,用来模拟循环多次登录,相当于增加测试样本,取测试平均值,避免因为样本过少,出现较大的误差。
这个function我采用自动请求,自动统计的方式,我们无需过多关注过程,只需要关注shell输出日志结果即可。

#!/bin/bash

# Define the API endpoint
API_URL="http://localhost:8080/test/login"

function generate_random_string() {
    # 使用openssl生成随机字符串(如果已安装)
    if command -v openssl &> /dev/null; then
        openssl rand -base64 20 | tr -dc 'a-zA-Z0-9' | fold -w 15 | head -n 1
    else
        # 使用系统方法生成
        local chars="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        local result=""
        result=$(printf "%s" "${chars:$((RANDOM % ${#chars})):1}"{1..15} | tr -d '\n')
        echo "$result"
    fi
}

function normalLogin() {
    # 生成15位随机字符串作为traceId
    traceId=$(generate_random_string)
    echo "Generated traceId from client side: $traceId"
    response=$(curl -X POST $API_URL \
        -H "Content-Type: application/json" \
        -H "x-request-correlation-id: $traceId" \
        -H "X-User-ID: 123456" \
        -H "X-Username: Sandwich" \
        -d '{"username": "Sandwich", "password": "test"}')
    echo "Response from login API:"
    echo "$response" | python -m json.tool
}

function loginForLoop() {
    echo "===================================="
    echo "Executing loginForLoop function"
    echo "===================================="

    # 初始化统计变量
    local total_duration=0
    local count=0
    local durations=()

    for i in {1..100}
    do
        echo "Request #$i:"
        # 生成15位随机字符串作为traceId
        traceId=$(generate_random_string)
        echo "Generated traceId from client side: $traceId"

        # 使用curl的-w参数获取响应头信息,并将响应保存到临时文件
        response_headers=$(mktemp)
        response_body=$(mktemp)

        curl -X POST $API_URL \
            -H "Content-Type: application/json" \
            -H "x-request-correlation-id: $traceId" \
            -H "X-User-ID: 123456" \
            -H "X-Username: Sandwich" \
            -d '{"username": "Sandwich", "password": "test"}' \
            -w "\n%{http_code}\n%{time_total}s\n" \
            -D "$response_headers" \
            -o "$response_body"

        # 从响应头中提取需要的信息
        request_duration=$(grep -i "x-request-duration" "$response_headers" | cut -d: -f2 | tr -d ' \r\n')
        trace_id_header=$(grep -i "x-trace-id" "$response_headers" | cut -d: -f2 | tr -d ' \r\n')

        echo "Parsed response headers:"
        if [ -n "$request_duration" ]; then
            echo "  X-Request-Duration: $request_duration"
            # 收集持续时间数据
            durations+=($request_duration)
            total_duration=$((total_duration + request_duration))
            count=$((count + 1))
        fi
        if [ -n "$trace_id_header" ]; then
            echo "  X-Trace-Id: $trace_id_header"
        fi

        echo "Response body:"
        if [ -s "$response_body" ]; then
            cat "$response_body" | python -m json.tool 2>/dev/null || cat "$response_body"
        else
            echo "No response body"
        fi

        # 清理临时文件
        rm -f "$response_headers" "$response_body"

        echo "------------------------------------"
        sleep 1
    done

    # 计算并显示统计信息
    if [ $count -gt 0 ]; then
        # 计算平均值
        average_duration=$((total_duration / count))

        # 计算最小值和最大值
        min_duration=${durations[0]}
        max_duration=${durations[0]}

        for duration in "${durations[@]}"; do
            if [ $duration -lt $min_duration ]; then
                min_duration=$duration
            fi
            if [ $duration -gt $max_duration ]; then
                max_duration=$duration
            fi
        done

        echo "===================================="
        echo "Performance Statistics:"
        echo "  Total Requests: $count"
        echo "  Average Duration: ${average_duration}ms"
        echo "  Min Duration: ${min_duration}ms"
        echo "  Max Duration: ${max_duration}ms"
        echo "  Total Duration: ${total_duration}ms"
        echo "===================================="
    else
        echo "No valid duration data collected"
    fi
}

loginForLoop

3.测试验证

3.1 启动项目,执行循环登录请求

Administrator@USER-20230930SH MINGW64 /d/git/java/log-tracing/shell (master)
$ ./login.sh 
====================================
Executing loginForLoop function
====================================
省略 
request #1 
~
request #97
------------------------------------
Request #98:
Generated traceId from client side: ZvzG7eW99C2Fe3n
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   144    0   100  100    44    112     49 --:--:-- --:--:-- --:--:--   161

200
0.888474s
Parsed response headers:
  X-Request-Duration: 866
Response body:
{
    "responseCode": 200,
    "message": "success",
    "data": "Sandwich login success",
    "traceId": "ZvzG7eW99C2Fe3n"
}
------------------------------------
Request #99:
Generated traceId from client side: 2pniImSeJl1MoSE
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   144    0   100  100    44    115     50 --:--:-- --:--:-- --:--:--   165

200
0.869046s
Parsed response headers:
  X-Request-Duration: 847
Response body:
{
    "responseCode": 200,
    "message": "success",
    "data": "Sandwich login success",
    "traceId": "2pniImSeJl1MoSE"
}
------------------------------------
Request #100:
Generated traceId from client side: ES6HQvOV36wbZvG
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   144    0   100  100    44     98     43  0:00:01  0:00:01 --:--:--   141

200
1.015368s
Parsed response headers:
  X-Request-Duration: 1001
Response body:
{
    "responseCode": 200,
    "message": "success",
    "data": "Sandwich login success",
    "traceId": "ES6HQvOV36wbZvG"
}
------------------------------------
====================================
Performance Statistics:
  Total Requests: 100
  Average Duration: 991ms
  Min Duration: 819ms
  Max Duration: 1313ms
  Total Duration: 99119ms
====================================

由以上日志结尾的统计结果可以知道,100次请求的平均低耗时为991ms,最低耗时为819ms, 最高耗时为1313ms

2025-09-20 21:37:09.790 [http-nio-8080-exec-9] INFO  c.s.logtracing.service.impl.LoginServiceImpl - processing login for user Sandwich, login step 99997 done
2025-09-20 21:37:09.790 [http-nio-8080-exec-9] INFO  c.s.logtracing.service.impl.LoginServiceImpl - processing login for user Sandwich, login step 99998 done
2025-09-20 21:37:09.790 [http-nio-8080-exec-9] INFO  c.s.logtracing.service.impl.LoginServiceImpl - processing login for user Sandwich, login step 99999 done
2025-09-20 21:37:09.790 [http-nio-8080-exec-9] INFO  c.s.logtracing.service.impl.LoginServiceImpl - processing login for user Sandwich, login step 100000 done
2025-09-20 21:37:09.790 [http-nio-8080-exec-9] INFO  c.sandwich.logtracing.controller.LoginController - user Sandwich login success  
<!-- ↑ [1]这是API请求结束的日志 -->
2025-09-20 21:37:09.790 [http-nio-8080-exec-9] INFO  com.sandwich.logtracing.advice.ResponseTimeAdvice - beforeBodyWrite===========
<!-- ↑ [2] 这是ResponseTimeAdvice.beforeBodyWrite 处理response header的日志 -->
2025-09-20 21:37:09.791 [http-nio-8080-exec-9] INFO  c.sandwich.logtracing.interceptor.MDCInterceptor - postHandle===========
<!-- ↑ [3] 这是MDCInterceptor.postHandle 方法的日志 -->
2025-09-20 21:37:09.791 [http-nio-8080-exec-9] INFO  c.sandwich.logtracing.interceptor.MDCInterceptor - afterCompletion===========
<!-- ↑ [4] 这是MDCInterceptor.afterCompletion方法的日志 -->
2025-09-20 21:37:09.791 [http-nio-8080-exec-9] INFO  c.sandwich.logtracing.interceptor.MDCInterceptor - API [/test/login] process finished, request time [1001]ms, response code [200]
2025-09-20 21:37:09.791 [http-nio-8080-exec-9] INFO  com.sandwich.logtracing.util.ThreadLocalContext - traceId removed from thread local
2025-09-20 21:37:09.791 [http-nio-8080-exec-9] INFO  com.sandwich.logtracing.util.ThreadLocalContext - user id removed from thread local
2025-09-20 21:37:09.791 [http-nio-8080-exec-9] INFO  com.sandwich.logtracing.util.ThreadLocalContext - user name removed from thread local
2025-09-20 21:37:09.791 [http-nio-8080-exec-9] INFO  com.sandwich.logtracing.util.ThreadLocalContext - client ip removed from thread local
Disconnected from the target VM, address: '127.0.0.1:64206', transport: 'socket'
2025-09-20 21:42:41.866 [SpringApplicationShutdownHook] INFO  o.s.boot.web.embedded.tomcat.GracefulShutdown - Commencing graceful shutdown. Waiting for active requests to complete
2025-09-20 21:42:41.903 [tomcat-shutdown] INFO  o.s.boot.web.embedded.tomcat.GracefulShutdown - Graceful shutdown complete

以上日志验证了前面的时序图正确,beforeBodyWrite在API结束后,postHandle开始前执行,所以在beforeBodyWrite统计API request duration是正确的。

3.2 修改日志pattern,测试手写链路追踪的日志耗时

  • 修改pattern
spring:
  application:
    name: log-tracing


# 日志配置
logging:
  level:
    com.sandwich.logtracing: DEBUG
    org.springframework: INFO
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss.SSS} %X{traceId:-} %replace(%X{userId}){'^(?!$)(.+)$','[userId: $1] '}%replace(%X{username}){'^(?!$)(.+)$','[username: $1] '}%replace(%X{clientIp}){'^(?!$)(.+)$','[clientIp: $1] '}[%thread] %-5level %logger{50} - %msg%n"
#    console: "%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n"
#    file: "%d{yyyy-MM-dd HH:mm:ss.SSS} %X{traceId:-} %replace(%X{userId}){'^(?!$)(.+)$','[userId: $1] '}%replace(%X{username}){'^(?!$)(.+)$','[username: $1] '}%replace(%X{clientIp}){'^(?!$)(.+)$','[clientIp: $1] '}[%thread] %-5level %logger{50} - %msg%n"
#    file: "%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n"
#  file:
#    name: ./logs/log-tracing.log
  logback:
    rollingpolicy:
      max-file-size: 100MB
      max-history: 30
  • 重启项目,再次测试
Administrator@USER-20230930SH MINGW64 /d/git/java/log-tracing/shell (master)
$ ./login.sh 
====================================
Executing loginForLoop function
====================================
省略 
request #1 
~
request #97
------------------------------------
Request #98:
Generated traceId from client side: hGQ8yI251AgqIGh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   144    0   100  100    44     96     42  0:00:01  0:00:01 --:--:--   138

200
1.038238s
Parsed response headers:
  X-Request-Duration: 1017
Response body:
{
    "responseCode": 200,
    "message": "success",
    "data": "Sandwich login success",
    "traceId": "hGQ8yI251AgqIGh"
}
------------------------------------
Request #99:
Generated traceId from client side: EzbgdwE6l0nmuRn
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   144    0   100  100    44     82     36  0:00:01  0:00:01 --:--:--   118

200
1.211156s
Parsed response headers:
  X-Request-Duration: 1194
Response body:
{
    "responseCode": 200,
    "message": "success",
    "data": "Sandwich login success",
    "traceId": "EzbgdwE6l0nmuRn"
}
------------------------------------
Request #100:
Generated traceId from client side: M4lxpOCPitv4Swo
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   144    0   100  100    44     94     41  0:00:01  0:00:01 --:--:--   135

200
1.063418s
Parsed response headers:
  X-Request-Duration: 1043
Response body:
{
    "responseCode": 200,
    "message": "success",
    "data": "Sandwich login success",
    "traceId": "M4lxpOCPitv4Swo"
}
------------------------------------
====================================
Performance Statistics:
  Total Requests: 100
  Average Duration: 1032ms
  Min Duration: 822ms
  Max Duration: 1216ms
  Total Duration: 103259ms
====================================

由以上日志结尾的统计结果可以知道,100次请求的平均低耗时为1032ms,最低耗时为822ms, 最高耗时为1216ms

3.3 比较两次统计结果

以下是没有手写链路追踪100次API请求耗时的统计结果

====================================
Performance Statistics:
  Total Requests: 100
  Average Duration: 991ms
  Min Duration: 819ms
  Max Duration: 1313ms
  Total Duration: 99119ms
====================================

以下是有手写链路追踪日志的100次API请求耗时的统计结果

====================================
Performance Statistics:
  Total Requests: 100
  Average Duration: 1032ms
  Min Duration: 822ms
  Max Duration: 1216ms
  Total Duration: 103259ms
====================================

由两次统计数据可以得出以下结果:

  • 两次耗时结果非常接近
  • 平均耗时和总耗时都是有手写链路追踪日志的更大,增加了链路追踪日志后,平均耗时增加了(1032-991)/991 = 4.1%
  • 最大耗时和最小耗时比较离散,有手写链路追踪日志的API请求Max duration甚至比没有链路追踪日志的Max duration更小,没有太大参考价值

幸好我们样本够大,不然十次八次,可能还看不出来追踪日志有额外的损耗。但是这个损耗影响非常小,每个API请求的主线程(异步线程虽然也是10万次,位是异步线程不阻塞,在此性能测试中不是需要重点考虑的)打印10万次业务日志,100次请求,就是1000万次业务日志,大概100s总耗时,才多出4.1s的额外开销,大多数项目还是可以接受的。

4.总结

本文主要通过技术手段,对日志全链路用户信息可视化(注意:这个时候已经不仅是一个简单的traceId了)日志耗时的统计,对比没有链路追踪日志耗时的统计,算出平均额外开销,大概是4.1%,这个开销倒不至于让人接受不了而关掉它。大概这就是手写的好处,我们手写的话只需要关注我们业务需求,不需要引入重型框架,重型框架需要考虑的是面面俱到,最终导致系统有更大的开销。可能有人好奇为什么我不在client side直接算耗时,因为这个额外消耗本来就很小,我要减少网络传输,拦截器,DispatcherServlet等等其他条件影响,减小误差,request duration只算API业务中纯打日志的耗时。我总是期待有人能给我更多的指点,请把您的好建议留在评论区,您的点赞,爱心和关注都是我更新的强大动力。谢谢批阅。

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容