Netty Thread Pool Tuning in Spring Cloud Gateway

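1. Background

Our testers recently load-tested the system and reported a problem: nearly every endpoint performed poorly, even endpoints that only do a primary-key lookup on a small amount of data. While troubleshooting, we bypassed the gateway and load-tested the target server directly by IP, and the results improved markedly. Our initial conclusion was that the gateway was the problem. Searching online turned up one optimization point: the thread pool configuration of netty itself.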

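2. Thread Pool Configuration

To set the number of threads that can work concurrently, set netty's reactor.netty.ioWorkerCount parameter. This parameter cannot be configured directly and has to be set through System.setProperty, so we can create the following configuration class to set it: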

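import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.reactive.ReactorResourceFactory;

@Configuration
public class ReactNettyConfiguration {

    @Bean
    public ReactorResourceFactory reactorClientResourceFactory() {
        // Create a separate select thread group with a single thread
        System.setProperty("reactor.netty.ioSelectCount", "1");

        // Anywhere from 2x to 4x the processor count works for the worker threads; it depends on the workload
        int ioWorkerCount = Math.max(Runtime.getRuntime().availableProcessors() * 3, 4);
        System.setProperty("reactor.netty.ioWorkerCount", String.valueOf(ioWorkerCount));
        return new ReactorResourceFactory();
    }
}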

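I am using reactor-netty-core-1.0.3; the property keys may differ in other versions. You can check the keys defined in LoopResources.

Runtime.getRuntime().availableProcessors() returns the number of logical processors (hardware threads) available to the JVM, that is, the available compute resources, not the number of physical CPU cores. On a CPU that supports hyper-threading, a single physical core presents itself as two logical processors and can execute two threads at the same time.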
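The sizing rule used in the configuration class can be pulled out into a small helper to see what it yields on a given machine. This is only a minimal sketch: the class name WorkerCountSketch and the multiplier parameter are made up for illustration and are not part of reactor-netty.

```java
public class WorkerCountSketch {

    // Hypothetical helper: scale the logical processor count, but never go below 4
    public static int workerCount(int logicalProcessors, int multiplier) {
        return Math.max(logicalProcessors * multiplier, 4);
    }

    public static void main(String[] args) {
        // Logical processors visible to the JVM, not physical cores
        int logical = Runtime.getRuntime().availableProcessors();
        System.out.println("logical processors: " + logical);
        System.out.println("worker threads at 3x: " + workerCount(logical, 3));
    }
}
```

On a machine with a single logical processor the floor of 4 applies; with 8 logical processors and a multiplier of 3 the helper returns 24.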

3. Source Code Analysis

package reactor.netty.resources;

import io.netty.channel.Channel;
import io.netty.channel.EventLoopGroup;
import java.time.Duration;
import java.util.Objects;
import reactor.core.Disposable;
import reactor.core.publisher.Mono;

@FunctionalInterface
public interface LoopResources extends Disposable {
    // Worker thread count. If not configured, the larger of the CPU core count and 4 is used
    int DEFAULT_IO_WORKER_COUNT = Integer.parseInt(System.getProperty("reactor.netty.ioWorkerCount", "" + Math.max(Runtime.getRuntime().availableProcessors(), 4)));
    // Select thread count. Defaults to -1
    int DEFAULT_IO_SELECT_COUNT = Integer.parseInt(System.getProperty("reactor.netty.ioSelectCount", "-1"));
    ....

    // Create a default resource, passing along both thread count parameters
    static LoopResources create(String prefix) {
        if (Objects.requireNonNull(prefix, "prefix").isEmpty()) {
            throw new IllegalArgumentException("Cannot use empty prefix");
        }
        return new DefaultLoopResources(prefix, DEFAULT_IO_SELECT_COUNT, DEFAULT_IO_WORKER_COUNT, true);
    }
    ....
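Because both defaults are read from system properties in field initializers, they are evaluated only once, when the type is first initialized. The sketch below illustrates that general Java behavior with a stand-in class; LazyDefaultsSketch, Defaults, and the property key sketch.ioWorkerCount are hypothetical names for this example, not reactor-netty code.

```java
public class LazyDefaultsSketch {

    // Stand-in for LoopResources: the default is computed once, at class initialization
    static final class Defaults {
        static final int WORKER_COUNT = Integer.parseInt(
                System.getProperty("sketch.ioWorkerCount", "4"));
    }

    // Accessor so that callers do not depend on the nested class
    public static int workerCount() {
        return Defaults.WORKER_COUNT;
    }

    public static void main(String[] args) {
        // Set before the first access to Defaults: the value is picked up
        System.setProperty("sketch.ioWorkerCount", "12");
        System.out.println(workerCount());

        // Set after initialization: the constant keeps the value it read first
        System.setProperty("sketch.ioWorkerCount", "99");
        System.out.println(workerCount());
    }
}
```

Both lines print 12. If the real defaults behave the same way, a System.setProperty call only takes effect when it runs before anything touches LoopResources, so setting the property very early (for example at the start of main, or with a -D JVM argument) would be the safer option.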

Next, let's look at what DefaultLoopResources does.

DefaultLoopResources(String prefix, int selectCount, int workerCount, boolean daemon) {
        this.running = new AtomicBoolean(true);
        this.daemon = daemon;
        this.workerCount = workerCount;
        this.prefix = prefix;

        this.serverLoops = new AtomicReference<>();
        this.clientLoops = new AtomicReference<>();

        this.cacheNativeClientLoops = new AtomicReference<>();
        this.cacheNativeServerLoops = new AtomicReference<>();
        // Nothing is configured by default, so selectCount is necessarily -1
        if (selectCount == -1) {
            this.selectCount = workerCount;
            // serverSelectLoops is not created separately; serverLoops is used directly
            this.serverSelectLoops = this.serverLoops;
            this.cacheNativeSelectLoops = this.cacheNativeServerLoops;
        }
        else {
            this.selectCount = selectCount;
            this.serverSelectLoops = new AtomicReference<>();
            this.cacheNativeSelectLoops = new AtomicReference<>();
        }
    }

    @SuppressWarnings("FutureReturnValueIgnored")
    EventLoopGroup cacheNioSelectLoops() {
        // If the two are the same reference, return the worker group via cacheNioServerLoops
        if (serverSelectLoops == serverLoops) {
            return cacheNioServerLoops();
        }

        EventLoopGroup eventLoopGroup = serverSelectLoops.get();
        if (null == eventLoopGroup) {
            EventLoopGroup newEventLoopGroup = new NioEventLoopGroup(selectCount,
                    threadFactory(this, "select-nio"));
            if (!serverSelectLoops.compareAndSet(null, newEventLoopGroup)) {
                //"FutureReturnValueIgnored" this is deliberate
                newEventLoopGroup.shutdownGracefully(0, 0, TimeUnit.MILLISECONDS);
            }
            eventLoopGroup = cacheNioSelectLoops();
        }
        return eventLoopGroup;
    }

    // This effectively returns the worker group
    @SuppressWarnings("FutureReturnValueIgnored")
    EventLoopGroup cacheNioServerLoops() {
        EventLoopGroup eventLoopGroup = serverLoops.get();
        if (null == eventLoopGroup) {
            EventLoopGroup newEventLoopGroup = new NioEventLoopGroup(workerCount,
                    threadFactory(this, "nio"));
            if (!serverLoops.compareAndSet(null, newEventLoopGroup)) {
                //"FutureReturnValueIgnored" this is deliberate
                newEventLoopGroup.shutdownGracefully(0, 0, TimeUnit.MILLISECONDS);
            }
            eventLoopGroup = cacheNioServerLoops();
        }
        return eventLoopGroup;
    }

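As the code shows, when nothing is configured, netty has no separate select thread group: the select loops and the worker loops are the same group. Considered against the reactor model, this affects processing efficiency. In addition, a default worker count that is at most equal to the number of CPU cores clearly cannot make full use of the hardware resources.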
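The fallback in the constructor can be reduced to a few lines to make the two cases easier to compare. This is a simplified sketch of the branching only, under the assumption that nothing else influences it: SelectFallbackSketch is a hypothetical class, and a plain Object stands in for netty's EventLoopGroup.

```java
import java.util.concurrent.atomic.AtomicReference;

public class SelectFallbackSketch {

    private final int selectCount;
    private final int workerCount;
    // Object stands in for netty's EventLoopGroup in this sketch
    private final AtomicReference<Object> workerLoops = new AtomicReference<>();
    private final AtomicReference<Object> selectLoops;

    public SelectFallbackSketch(int selectCount, int workerCount) {
        this.workerCount = workerCount;
        if (selectCount == -1) {
            // Not configured: the select side reuses the worker holder
            this.selectCount = workerCount;
            this.selectLoops = this.workerLoops;
        } else {
            // Configured: the select side gets its own holder
            this.selectCount = selectCount;
            this.selectLoops = new AtomicReference<>();
        }
    }

    // True when select and worker share the same holder, i.e. no separate select group
    public boolean sharesWorkerGroup() {
        return selectLoops == workerLoops;
    }

    public int selectCount() {
        return selectCount;
    }

    public static void main(String[] args) {
        System.out.println(new SelectFallbackSketch(-1, 8).sharesWorkerGroup());
        System.out.println(new SelectFallbackSketch(1, 8).sharesWorkerGroup());
    }
}
```

With -1 the sketch reports a shared group, and with an explicit select count of 1 it reports a separate one, which mirrors the effect of setting reactor.netty.ioSelectCount to 1 in the configuration class.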

Source: "A small optimization of the netty thread pool in spring cloud gateway" (关于spring cloud gateway中 netty线程池的一点小优化)
