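resilience4j is a lightweight fault-tolerance library for the Java platform. It supports circuit breaking, rate limiting, retries, and more. Since Netflix Hystrix is no longer actively developed (it is in maintenance mode), we urgently needed a capable fault-tolerance library to protect our environment. resilience4j ships a Spring Boot starter, so integrating it is easy, but there are a few pitfalls, which I record here.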
GitHub: https://github.com/resilience4j/resilience4j
Official documentation: https://resilience4j.readme.io/docs/circuitbreaker
I am using resilience4j 0.16.0, the latest version at the time of writing, together with Spring Boot 2.1.6.
Dependencies
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-aop</artifactId>
    </dependency>
    <dependency>
        <groupId>io.github.resilience4j</groupId>
        <artifactId>resilience4j-spring-boot2</artifactId>
        <version>0.16.0</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>
To configure resilience4j, add the following to application.yml in your Spring project:
resilience4j.circuitbreaker:
  instances:
    backendA:
      registerHealthIndicator: true
      ringBufferSizeInClosedState: 10
      ringBufferSizeInHalfOpenState: 3
      waitDurationInOpenState: 10s
      failureRateThreshold: 50
      eventConsumerBufferSize: 2
    backendB:
      registerHealthIndicator: true
      ringBufferSizeInClosedState: 10
      ringBufferSizeInHalfOpenState: 3
      waitDurationInOpenState: 10s
      failureRateThreshold: 50
      eventConsumerBufferSize: 2
      recordFailurePredicate: io.github.robwin.exception.RecordFailurePredicate
resilience4j.retry:
  instances:
    backendA:
      maxRetryAttempts: 3
      waitDuration: 10s
      enableExponentialBackoff: true
      exponentialBackoffMultiplier: 2
      retryExceptions:
        - org.springframework.web.client.HttpServerErrorException
      ignoreExceptions:
        - com.learn.demo.resilience.ResilienceException
        - java.io.IOException
    backendB:
      maxRetryAttempts: 3
      waitDuration: 10s
      retryExceptions:
        - com.learn.demo.resilience.ResilienceException
      ignoreExceptions:
        - java.io.IOException
resilience4j.bulkhead:
  instances:
    backendA:
      maxConcurrentCall: 10
      maxWaitDuration: 20ms
    backendB:
      maxWaitDuration: 10ms
      maxConcurrentCall: 20
resilience4j.thread-pool-bulkhead:
  instances:
    backendC:
      threadPoolProperties:
        maxThreadPoolSize: 1
        coreThreadPoolSize: 1
        queueCapacity: 1
resilience4j.ratelimiter:
  instances:
    backendA:
      limitForPeriod: 10
      limitRefreshPeriod: 1s
      timeoutDuration: 0
      registerHealthIndicator: true
      eventConsumerBufferSize: 100
    backendB:
      limitForPeriod: 6
      limitRefreshPeriod: 500ms
      timeoutDuration: 3s
server:
  port: 9090
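Note
The resilience4j configuration format changed in 0.16.0, while many articles online are still based on 0.13.0. In addition, IDEA does not recognize the 0.16.0 configuration keys and offers no auto-completion for them, so when using 0.16.0 it is best to follow the configuration shown in the official documentation.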
Circuit breaker
Configuration
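Property | Default | Description |
---|---|---|
failureRateThreshold | 50 | Failure rate threshold, as a percentage. When the failure rate is equal to or greater than this threshold, the CircuitBreaker opens. |
ringBufferSizeInHalfOpenState | 10 | Size of the ring buffer while the CircuitBreaker is half open. After the CircuitBreaker transitions from open to half open, this ring buffer is used to decide whether the backend is healthy again. |
ringBufferSizeInClosedState | 100 | Size of the ring buffer while the CircuitBreaker is closed. The ring buffer must be full before the failure rate is calculated. |
waitDurationInOpenState | 60s | How long the CircuitBreaker stays open before it transitions to half open. |
automaticTransitionFromOpenToHalfOpenEnabled | false | Whether to transition from open to half open automatically. |
recordExceptions | null | Exceptions that are recorded as failures |
ignoreExceptions | null | Exceptions that are ignored |
recordFailure | throwable -> true | Custom predicate that decides whether an exception counts as a failure. By default every exception is a failure. |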
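To make the relationship between ringBufferSizeInClosedState and failureRateThreshold more concrete, here is a minimal plain-Java model of the closed-to-open transition. It is a sketch only and not resilience4j's actual implementation: the class `RingBufferModel` and all of its methods are hypothetical names invented for this illustration, and the half-open state is left out entirely.

```java
// Simplified model for illustration, NOT resilience4j's real implementation.
final class RingBufferModel {
    private final boolean[] failures;          // one slot per recorded call
    private final float failureRateThreshold;  // percentage, e.g. 50
    private int recorded = 0;                  // number of calls recorded so far
    private boolean open = false;

    RingBufferModel(int ringBufferSize, float failureRateThreshold) {
        this.failures = new boolean[ringBufferSize];
        this.failureRateThreshold = failureRateThreshold;
    }

    /** Records one call outcome and opens the breaker once the threshold is reached. */
    void record(boolean failed) {
        if (open) {
            return; // an open breaker rejects calls, so nothing new is recorded
        }
        failures[recorded % failures.length] = failed; // overwrite the oldest slot
        recorded++;
        // The rate is only evaluated once the buffer has been filled completely.
        if (recorded >= failures.length && failureRate() >= failureRateThreshold) {
            open = true;
        }
    }

    /** Failure rate in percent, or -1 while the buffer is not yet full. */
    float failureRate() {
        if (recorded < failures.length) {
            return -1f;
        }
        int failed = 0;
        for (boolean f : failures) {
            if (f) {
                failed++;
            }
        }
        return 100f * failed / failures.length;
    }

    boolean isOpen() {
        return open;
    }
}
```

With a buffer size of 10 and a threshold of 50 (the backendA settings above), nine consecutive failures do not open this model yet, because the buffer is not full. The tenth recorded call is the first point at which the rate is evaluated.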
Annotation
@CircuitBreaker(name = "backendA")
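Code
import io.github.resilience4j.bulkhead.annotation.Bulkhead;
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.ratelimiter.annotation.RateLimiter;
import io.github.resilience4j.retry.annotation.Retry;
import org.springframework.stereotype.Service;
import com.learn.demo.resilience.ResilienceException;

// Class-level annotations apply to every public method of this service.
@Retry(name = "backendA")
@CircuitBreaker(name = "backendA")
@RateLimiter(name = "backendA")
@Service
public class ServiceA {

    // Always succeeds; used to observe normal behavior.
    @Bulkhead(name = "backendA")
    public String sucess() {
        System.out.println("sucess a");
        return "sucess a";
    }

    // Always fails; used to drive the failure rate up and trip the circuit breaker.
    @Bulkhead(name = "backendA")
    public String fail() throws ResilienceException {
        throw new ResilienceException("fail a");
    }
}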
Once repeated failures reach the threshold, further calls throw an exception:
io.github.resilience4j.circuitbreaker.CallNotPermittedException: CircuitBreaker 'backendA' is OPEN and does not permit further calls
The REST call returns:
{
  "timestamp": "2019-07-10T09:45:21.065+0000",
  "status": 500,
  "error": "Internal Server Error",
  "message": "CircuitBreaker 'backendA' is OPEN and does not permit further calls",
  "path": "/serviceA/failed"
}
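In this case you can use Spring AOP to intercept the exception and return a specific HTTP status code or message. The front end can then catch that status code and render a dedicated UI for the open circuit.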
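As a rough sketch of that idea, the mapping from exception to status code can be kept in one small helper. Everything here is an assumption for illustration: the class `ResilienceStatusMapper` is a hypothetical name, and the choice of 503 and 429 is mine, not something resilience4j prescribes. The helper works on class names so that it compiles without any dependencies.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the status codes (503, 429) are assumptions, not library behavior.
final class ResilienceStatusMapper {
    private static final Map<String, Integer> STATUS_BY_EXCEPTION = new HashMap<>();

    static {
        // Circuit breaker is open: the backend is treated as temporarily unavailable.
        STATUS_BY_EXCEPTION.put(
                "io.github.resilience4j.circuitbreaker.CallNotPermittedException", 503);
        // Rate limit reached: the caller sent too many requests.
        STATUS_BY_EXCEPTION.put(
                "io.github.resilience4j.ratelimiter.RequestNotPermitted", 429);
    }

    /** Returns the HTTP status for an exception class name, defaulting to 500. */
    static int statusFor(String exceptionClassName) {
        return STATUS_BY_EXCEPTION.getOrDefault(exceptionClassName, 500);
    }
}
```

In a Spring application, an exception handler (for example a method annotated with @ExceptionHandler in a @ControllerAdvice class, which is a different mechanism from a hand-written AOP aspect) could call `statusFor(e.getClass().getName())` and set the response status accordingly.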
Rate limiter
Configuration
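Property | Default | Description |
---|---|---|
timeoutDuration | 5s | How long a thread waits for a permit |
limitRefreshPeriod | 500ns | Period after which the permits (tokens) are refreshed |
limitForPeriod | 50 | Number of permits (tokens) available in one period |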
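The interplay of limitForPeriod and limitRefreshPeriod can be sketched with a simple fixed-window counter. This is a deliberately simplified model and not the algorithm resilience4j uses internally. The class `FixedWindowLimiter` is a hypothetical name, the current time is passed in as a parameter to keep the example deterministic, and it only covers the non-waiting case that corresponds to timeoutDuration: 0.

```java
// Simplified fixed-window model for illustration, NOT resilience4j's real algorithm.
final class FixedWindowLimiter {
    private final int limitForPeriod;            // permits granted per period
    private final long limitRefreshPeriodNanos;  // length of one period
    private long windowStart;                    // start of the current period
    private int used = 0;                        // permits consumed in this period

    FixedWindowLimiter(int limitForPeriod, long limitRefreshPeriodNanos, long nowNanos) {
        this.limitForPeriod = limitForPeriod;
        this.limitRefreshPeriodNanos = limitRefreshPeriodNanos;
        this.windowStart = nowNanos;
    }

    /** Tries to take one permit at time nowNanos without waiting. */
    synchronized boolean tryAcquire(long nowNanos) {
        long elapsed = nowNanos - windowStart;
        if (elapsed >= limitRefreshPeriodNanos) {
            // Move to the period that contains nowNanos and refill the permits.
            windowStart += (elapsed / limitRefreshPeriodNanos) * limitRefreshPeriodNanos;
            used = 0;
        }
        if (used < limitForPeriod) {
            used++;
            return true;
        }
        return false; // the real library throws RequestNotPermitted at this point
    }
}
```

With the backendA settings (10 permits per 1s), the eleventh call inside the same second is rejected in this model, and the first call of the next second is accepted again.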
Annotation
@RateLimiter(name = "backendA")
Code
Same as above.
When the limit is reached, an exception is thrown:
io.github.resilience4j.ratelimiter.RequestNotPermitted: RateLimiter 'backendA' does not permit further calls
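Note
In my tests, when @RateLimiter and @CircuitBreaker are used together, the circuit breaker has to be configured to ignore RequestNotPermitted. Otherwise rate-limit rejections are counted as failures and can trip the circuit breaker.
ignoreExceptions:
  - io.github.resilience4j.ratelimiter.RequestNotPermitted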
Bulkhead (isolation)
Its main purpose is to limit the number of concurrent calls.
Configuration
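Property | Default | Description |
---|---|---|
maxConcurrentCalls | 25 | Maximum number of concurrent calls |
maxWaitDuration | 0 | How long a call waits to enter when the bulkhead is saturated |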
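The two settings in this table map naturally onto a counting semaphore, which the following minimal sketch uses to model a bulkhead. It is an illustration under my own assumptions rather than the library's code: `SemaphoreBulkheadModel` and its methods are hypothetical names.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Simplified model for illustration, NOT resilience4j's real implementation.
final class SemaphoreBulkheadModel {
    private final Semaphore permits;   // one permit per allowed concurrent call
    private final long maxWaitMillis;  // how long to wait when saturated

    SemaphoreBulkheadModel(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls, true);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Tries to enter the bulkhead, waiting up to maxWaitMillis for a free slot. */
    boolean tryEnter() {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    /** Must be called once for every successful tryEnter(), after the call finishes. */
    void exit() {
        permits.release();
    }

    int availableCalls() {
        return permits.availablePermits();
    }
}
```

A caller would wrap the protected call in `if (bulkhead.tryEnter()) { try { ... } finally { bulkhead.exit(); } }` and treat a `false` result as a rejected call.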
Annotation
@Bulkhead(name = "backendA")
Retry
Configuration
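Property | Default | Description |
---|---|---|
maxAttempts | 3 | Maximum number of attempts |
waitDuration | 500ms | Wait time between attempts |
intervalFunction | numOfAttempts -> waitDuration | Function that computes the wait time after a failure. By default it returns waitDuration, but it can be replaced with a function that grows exponentially. |
retryOnResultPredicate | | Predicate that decides from the result whether to retry |
retryOnExceptionPredicate | | Predicate that decides from the exception whether to retry |
retryExceptions | | List of exceptions that trigger a retry |
ignoreExceptions | | List of exceptions that are ignored |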
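To illustrate what an exponentially growing intervalFunction looks like, here is a minimal sketch that multiplies the initial wait time by a constant factor on every attempt. The class `BackoffModel` and the method `waitMillis` are hypothetical names for this illustration, and the formula is my simplified reading of exponential backoff, not code taken from the library.

```java
// Simplified model for illustration, NOT resilience4j's real implementation.
final class BackoffModel {
    private BackoffModel() {
    }

    /**
     * Wait time before retry number `attempt` (1-based):
     * initialMillis * multiplier^(attempt - 1).
     */
    static long waitMillis(long initialMillis, double multiplier, int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        return (long) (initialMillis * Math.pow(multiplier, attempt - 1));
    }
}
```

With the backendA settings above (waitDuration: 10s, exponentialBackoffMultiplier: 2), this model waits 10s before the first retry and 20s before the second. If the initial call counts as the first of the three attempts, those two retries are all that happen.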
Annotation
@Retry(name = "backendA")
resilience4j gives us an excellent fault-tolerance library that is simple and easy to use.