1. Production alert
[org.apache.dubbo.common.threadpool.support.AbortPolicyWithReport] [rejectedExecution] [NettyServerWorker-13-3] [] [] --- [DUBBO] Thread pool is EXHAUSTED! Thread Name: DubboServerHandler-10.11.99.213:10393, Pool Size: 200 (active: 200, core: 200, max: 200, largest: 200), Task: 5199 (completed: 4999), Executor status:(isShutdown:false, isTerminated:false, isTerminating:false)
2. Dump the thread information
The dump shows a very large number of threads in the WAITING state, for example:
"xxx-sync-pool-1-thread-1353" #2091 prio=5 os_prio=0 tid=0x00007f32d0033000 nid=0x85f waiting on condition [0x00007f329ac3d000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000003b7c24680> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
at cn.xxx.jc.pbets.biz.shared.service.bet.impl.xxxxx.lambda$initEvalCache$54(EvaluationResultServiceImpl.java:829)
at cn.xxx.jc.pbets.biz.shared.service.bet.impl.xxxxx$$Lambda$2953/1878309232.get(Unknown Source)
at cn.xxx.jc.pbets.common.util.DistributedLockUtils.lock(DistributedLockUtils.java:42)
at cn.xxx.jc.pbets.biz.shared.service.bet.impl.xxxxx.initEvalCache(EvaluationResultServiceImpl.java:724)
at cn.xxx.jc.pbets.biz.shared.service.bet.impl.xxxxx.lambda$null$10(EvaluationResultServiceImpl.java:345)
at cn.xxx.jc.pbets.biz.shared.service.bet.impl.xxxxx$$Lambda$2952/1040750602.run(Unknown Source)
at org.apache.skywalking.apm.plugin.wrapper.SwRunnableWrapper.run(SwRunnableWrapper.java:43)
at org.apache.skywalking.apm.plugin.transmittable.thread.local.v2x.wrapper.SwRunnableWrapper.run(SwRunnableWrapper.java:30)
at com.alibaba.ttl.TtlRunnable.run(TtlRunnable.java:60)
at cn.xxx.jc2.common.util.thread.ThreadMdcUtil.lambda$wrap$3(ThreadMdcUtil.java:72)
at cn.xxx.jc2.common.util.thread.ThreadMdcUtil$$Lambda$2598/832064415.run(Unknown Source)
at org.apache.skywalking.apm.toolkit.trace.RunnableWrapper.run$original$VFOzA03Q(RunnableWrapper.java:34)
at org.apache.skywalking.apm.toolkit.trace.RunnableWrapper.run$original$VFOzA03Q$accessor$q4svxsz0(RunnableWrapper.java)
at org.apache.skywalking.apm.toolkit.trace.RunnableWrapper$auxiliary$wVM5ggPv.call(Unknown Source)
at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86)
at org.apache.skywalking.apm.toolkit.trace.RunnableWrapper.run(RunnableWrapper.java)
at org.apache.skywalking.apm.plugin.wrapper.SwRunnableWrapper.run(SwRunnableWrapper.java:43)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
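The post does not say how the dump was taken or how the waiting threads were counted. As a rough illustration only, the same kind of summary can be produced from inside a JVM with the standard `ThreadMXBean` API. The class name `ThreadStateSummary` and the name-prefix filter are assumptions made for this sketch, not part of the original investigation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts live threads per state, filtered by thread-name prefix.
class ThreadStateSummary {
    static Map<Thread.State, Integer> countByState(String namePrefix) {
        Map<Thread.State, Integer> counts = new TreeMap<>();
        // Skip monitor and synchronizer details; only names and states are needed here.
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            // An entry can be null if the thread died while the dump was being taken.
            if (info != null && info.getThreadName().startsWith(namePrefix)) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Calling `ThreadStateSummary.countByState("xxx-sync-pool")` in the affected process would show how many pool threads are in WAITING versus RUNNABLE. A large WAITING count with almost nothing RUNNABLE matches the pattern in the dump above.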
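3. Check the thread pool configuration used along the call chain
The configuration shows a core pool size of 12, a maximum pool size of 50, and a queue capacity of 1000.
Yet the thread names in the dump have already reached 1353. This means the queue filled up to its 1000-task capacity at some point, and threads beyond the core size were created (up to the maximum) and destroyed again, over and over.
The business code does a great deal of asynchronous processing.
We therefore suspected the following: the business code running on the Dubbo threads fans out too many parallel tasks, so more and more tasks end up in the queue waiting to be processed. Even though the Dubbo thread pool has 200 threads, the business code behind all of those Dubbo requests is served by only 12 threads, so a large number of Dubbo requests are left waiting. Wide in, narrow out.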
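The reason only the 12 core threads do the work most of the time is the growth rule of `java.util.concurrent.ThreadPoolExecutor` (visible in the stack trace above). With a bounded queue, it starts threads beyond the core size only after the queue is full. The sketch below reproduces that rule with tasks that block until released. The class and method names are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not taken from the service's code.
class PoolGrowthDemo {
    /**
     * Submits `tasks` tasks that all block, then returns {poolSize, queueSize}
     * observed while every task is still blocked.
     * `tasks` must not exceed max + queueCapacity, or the pool rejects the excess.
     */
    static int[] observe(int core, int max, int queueCapacity, int tasks)
            throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(queueCapacity));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // simulate a task stuck waiting on a downstream call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] { pool.getPoolSize(), pool.getQueue().size() };
        } finally {
            release.countDown(); // unblock all tasks so the pool can drain
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```

With the original settings, `observe(12, 50, 1000, 1012)` reports 12 threads and 1000 queued tasks. Only the 1013th blocked task makes the pool start a 13th thread. Until the backlog reaches 1000, every request is effectively served by 12 threads.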
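4. Optimization
Set the core pool size to 100, the maximum pool size to 150, and the queue capacity to 100. The reasoning is that the vast majority of the asynchronous work is I/O-bound, such as calling other services to fetch data.
After releasing the change and observing production, the problem was resolved.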
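The settings from step 4 can be sketched as a plain `ThreadPoolExecutor`. The post only gives the three sizes, so the 60-second keep-alive, the `AbortPolicy` rejection handler, and the names `SyncPoolConfig` and `newSyncPool` are assumptions for illustration. The `estimateThreads` helper shows the common rule of thumb for I/O-bound pools, threads ≈ cores × (1 + wait time / compute time). It is a generic heuristic, not the calculation used in the post.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical configuration class; only the sizes 100 / 150 / 100 come from the post.
class SyncPoolConfig {
    static ThreadPoolExecutor newSyncPool() {
        return new ThreadPoolExecutor(
                100,                                   // core pool size (from the post)
                150,                                   // maximum pool size (from the post)
                60L, TimeUnit.SECONDS,                 // keep-alive: assumed value
                new LinkedBlockingQueue<>(100),        // queue capacity (from the post)
                new ThreadPoolExecutor.AbortPolicy()); // rejection policy: assumed (JDK default)
    }

    /** Rule-of-thumb sizing for I/O-bound work: cores * (1 + wait / compute), rounded up. */
    static int estimateThreads(int cpuCores, double waitMillis, double computeMillis) {
        return (int) Math.ceil(cpuCores * (1.0 + waitMillis / computeMillis));
    }
}
```

For example, `estimateThreads(8, 90, 10)` gives 80 for a hypothetical 8-core machine whose tasks spend 90 ms waiting on remote calls for every 10 ms of CPU work. The shorter queue also matters: with a capacity of 100 instead of 1000, the pool starts growing toward 150 threads after a much smaller backlog.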