References:
- https://www.rabbitmq.com/networking.html#tuning-for-large-number-of-connections
- https://fasterdata.es.net/network-tuning/
- https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux
- https://psc.edu/index.php/services/networking/68-research/networking/641-tcp-tune
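0. Complete Linux kernel settings for a TCP server handling short-lived connections
Persistent settings
Create a configuration file under /etc/sysctl.d and add the parameters to be tuned. The file name follows the pattern <number>-<appname>.conf. appname can simply be the OS user name the application runs as.
Run the following command to apply the settings, substituting the actual file name.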
/sbin/sysctl -p /etc/sysctl.d/<number>-<appname>.conf
For example:
cat << EOF > /etc/sysctl.d/101-ichat.conf
net.core.somaxconn = 8192
net.ipv4.tcp_max_syn_backlog = 8192
net.core.rmem_default = 4096
net.core.rmem_max = 6291456
net.core.wmem_default = 4096
net.core.wmem_max = 4194304
net.ipv4.tcp_mem = 3084288 4112386 6168576
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_rmem = 4096 87380 6291456
net.ipv4.tcp_fin_timeout=15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_keepalive_time=30
net.ipv4.tcp_keepalive_intvl=10
net.ipv4.tcp_keepalive_probes=4
EOF
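As a quick sanity check, a file like the one above can be parsed and compared with the values the running kernel exposes under /proc/sys. The following is a minimal Python sketch and not part of the original setup. The helper names parse_sysctl_conf, proc_sys_path and read_live_value are hypothetical, and the dot-to-slash mapping assumes keys with no dots inside a single path component (true for every key used here).

```python
from pathlib import PurePosixPath


def parse_sysctl_conf(text):
    """Parse sysctl.d-style text into a {key: value} dict.

    Blank lines and lines starting with '#' or ';' are skipped.
    Whitespace around '=' is ignored, so 'a=1' and 'a = 1' are equivalent.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        # Normalise whitespace inside multi-value settings such as tcp_rmem.
        settings[key.strip()] = " ".join(value.split())
    return settings


def proc_sys_path(key):
    """Map a sysctl key to its /proc/sys path (assumption: no dots inside a component)."""
    return PurePosixPath("/proc/sys") / key.replace(".", "/")


def read_live_value(key):
    """Return the running kernel's value for key, or None if it cannot be read."""
    try:
        with open(str(proc_sys_path(key))) as handle:
            return " ".join(handle.read().split())
    except OSError:
        return None


if __name__ == "__main__":
    sample = """
    net.core.somaxconn = 8192
    net.ipv4.tcp_max_syn_backlog = 8192
    net.ipv4.tcp_rmem = 4096 87380 6291456
    """
    for key, wanted in parse_sysctl_conf(sample).items():
        print(key, "wanted:", wanted, "live:", read_live_value(key))
```

Reading /proc/sys does not require root, so this kind of check can run as the application user after sysctl -p has been applied.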
Each parameter is explained in detail below:
1. Listen queue length
net.core.somaxconn = 8192
somaxconn - INTEGER
Limit of socket listen() backlog, known in userspace as SOMAXCONN.
Defaults to 4096. (Was 128 before linux-5.4)
See also tcp_max_syn_backlog for additional tuning for TCP sockets.
net.ipv4.tcp_max_syn_backlog = 8192
tcp_max_syn_backlog - INTEGER
Maximal number of remembered connection requests (SYN_RECV),
which have not received an acknowledgment from connecting client.
This is a per-listener limit.
The minimal value is 128 for low memory machines, and it will
increase in proportion to the memory of machine.
If server suffers from overload, try increasing this number.
Remember to also check /proc/sys/net/core/somaxconn
A SYN_RECV request socket consumes about 304 bytes of memory.
RabbitMQ recommendation:
Maximum number of remembered connection requests which did not receive an acknowledgment yet from connecting client. Default is 128, max value is 65535. 4096 and 8192 are recommended starting values when optimising for throughput.
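Raising somaxconn only lifts the ceiling. The application still has to request a large backlog in its listen() call, because the kernel caps the requested value at somaxconn. The Python sketch below illustrates that relationship. The function names and the loopback address are assumptions made for illustration, not taken from any particular server.

```python
import socket


def effective_backlog(requested, somaxconn):
    """Backlog the kernel actually uses: listen() values above somaxconn are capped."""
    return min(requested, somaxconn)


def open_listener(host="127.0.0.1", port=0, backlog=8192):
    """Create a listening TCP socket; port 0 lets the OS pick a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)
    return srv


if __name__ == "__main__":
    with open_listener() as srv:
        print("listening on", srv.getsockname())
    # With somaxconn left at 4096, asking for 8192 still yields 4096.
    print(effective_backlog(8192, 4096))
```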
2. TCP read and write buffers
net.core.rmem_default = 4096
rmem_default
The default setting of the socket receive buffer in bytes.
net.core.rmem_max = 6291456
rmem_max
The maximum receive socket buffer size in bytes.
net.core.wmem_default = 4096
wmem_default
The default setting (in bytes) of the socket send buffer.
net.core.wmem_max = 4194304
wmem_max
The maximum send socket buffer size in bytes.
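net.ipv4.tcp_mem = 3084288 4112386 6168576
tcp_mem - vector of 3 INTEGERs: min, pressure, max
Unlike tcp_wmem and tcp_rmem, these three thresholds are counted in memory pages, not bytes, and apply to all TCP sockets together rather than to a single socket.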
net.ipv4.tcp_wmem = 4096 16384 4194304
tcp_wmem - vector of 3 INTEGERs: min, default, max
min: Amount of memory reserved for send buffers for TCP sockets.
Each TCP socket has rights to use it due to fact of its birth.
Default: 4K
default: initial size of send buffer used by TCP sockets. This
value overrides net.core.wmem_default used by other protocols.
It is usually lower than net.core.wmem_default.
Default: 16K
max: Maximal amount of memory allowed for automatically tuned
send buffers for TCP sockets. This value does not override
net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables
automatic tuning of that socket's send buffer size, in which case
this value is ignored.
Default: between 64K and 4MB, depending on RAM size.
net.ipv4.tcp_rmem = 4096 87380 6291456
tcp_rmem - vector of 3 INTEGERs: min, default, max
min: Minimal size of receive buffer used by TCP sockets.
It is guaranteed to each TCP socket, even under moderate memory
pressure.
Default: 4K
default: initial size of receive buffer used by TCP sockets.
This value overrides net.core.rmem_default used by other protocols.
Default: 87380 bytes. This value results in window of 65535 with
default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit
less for default tcp_app_win. See below about these variables.
max: maximal size of receive buffer allowed for automatically
selected receiver buffers for TCP socket. This value does not override
net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables
automatic tuning of that socket's receive buffer size, in which
case this value is ignored.
Default: between 87380B and 6MB, depending on RAM size.
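Because tcp_mem is expressed in pages while the other buffer settings are in bytes, it helps to convert it before comparing it with installed RAM. The following Python sketch does that conversion. The function names are hypothetical, and the 4096-byte page size in the example is an assumption that holds on most x86-64 systems. When no page size is passed, the code asks the running system instead.

```python
import os


def pages_to_bytes(pages, page_size=None):
    """Convert a page count to bytes, using the system page size by default."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return pages * page_size


def describe_tcp_mem(value, page_size=None):
    """Turn a 'min pressure max' tcp_mem string into byte thresholds."""
    low, pressure, high = (int(field) for field in value.split())
    return {
        "min": pages_to_bytes(low, page_size),
        "pressure": pages_to_bytes(pressure, page_size),
        "max": pages_to_bytes(high, page_size),
    }


if __name__ == "__main__":
    thresholds = describe_tcp_mem("3084288 4112386 6168576", page_size=4096)
    for name, size in thresholds.items():
        print(f"{name}: {size / 2**30:.2f} GiB")
```

With 4 KiB pages the values from this document correspond to thresholds of roughly 12, 16 and 24 GiB, so they only make sense on a machine with at least that much memory.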
3. TIME_WAIT tuning
For applications that use short-lived connections, tune TIME_WAIT handling and keepalive.
RabbitMQ recommendation:
net.ipv4.tcp_fin_timeout=15
tcp_fin_timeout - INTEGER
The length of time an orphaned (no longer referenced by any
application) connection will remain in the FIN_WAIT_2 state
before it is aborted at the local end. While a perfectly
valid "receive only" state for an un-orphaned connection, an
orphaned connection in FIN_WAIT_2 state could otherwise wait
forever for the remote to close its end of the connection.
Cf. tcp_max_orphans
Default: 60 seconds
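RabbitMQ explanation:
Lowering this timeout to a value in the 15-30 second range reduces the amount of time closed connections will stay in the TIME_WAIT state.
Note: as the kernel documentation quoted above indicates, tcp_fin_timeout bounds how long orphaned connections stay in FIN_WAIT_2. The TIME_WAIT interval itself is a compile-time constant on Linux (60 seconds) and is not shortened by this setting.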
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_reuse - INTEGER
Enable reuse of TIME-WAIT sockets for new connections when it is
safe from protocol viewpoint.
0 - disable
1 - global enable
2 - enable for loopback traffic only
It should not be changed without advice/request of technical experts.
Default: 2
net.ipv4.tcp_max_tw_buckets is left unchanged.
tcp_max_tw_buckets - INTEGER
Maximal number of timewait sockets held by system simultaneously.
If this number is exceeded time-wait socket is immediately destroyed
and warning is printed. This limit exists only to prevent
simple DoS attacks, you _must_ not lower the limit artificially,
but rather increase it (probably, after increasing installed memory),
if network conditions require more than default value.
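Before and after changing these settings it is useful to know how many sockets are actually in TIME_WAIT. The Python sketch below counts socket states from text in the /proc/net/tcp format, where the fourth column is the state as a hex code (06 is TIME_WAIT). The function name is hypothetical. /proc/net/tcp covers IPv4 only, and IPv6 sockets appear in /proc/net/tcp6 in the same layout.

```python
from collections import Counter

# State codes used in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from /proc/net/tcp-formatted text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Data rows start with a slot number such as "0:"; the header does not.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as handle:
            print(dict(count_tcp_states(handle.read())))
    except OSError as exc:
        print("cannot read /proc/net/tcp:", exc)
```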
4. TCP keepalive tuning
RabbitMQ recommendation:
net.ipv4.tcp_keepalive_time=30
net.ipv4.tcp_keepalive_intvl=10
net.ipv4.tcp_keepalive_probes=4
Linux Kernel Doc:
tcp_keepalive_time - INTEGER
How often TCP sends out keepalive messages when keepalive is enabled.
Default: 2hours.
tcp_keepalive_probes - INTEGER
How many keepalive probes TCP sends out, until it decides that the
connection is broken. Default value: 9.
tcp_keepalive_intvl - INTEGER
How frequently the probes are sent out. Multiplied by
tcp_keepalive_probes it is time to kill not responding connection,
after probes started. Default value: 75sec i.e. connection
will be aborted after ~11 minutes of retries.
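The three keepalive settings combine into a single number: how long a silent, dead peer can go unnoticed. The Python sketch below works through that arithmetic for the kernel defaults and for the values used in this document. The function name is hypothetical, and the formula is the approximation implied by the kernel documentation quoted above (idle time, then probes spaced by the interval).

```python
def dead_peer_detection_seconds(keepalive_time, keepalive_intvl, keepalive_probes):
    """Approximate seconds until an idle connection to a dead peer is dropped.

    keepalive_time   : idle seconds before the first probe is sent
    keepalive_intvl  : seconds between unanswered probes
    keepalive_probes : unanswered probes tolerated before giving up
    """
    return keepalive_time + keepalive_intvl * keepalive_probes


if __name__ == "__main__":
    # Kernel defaults: 2 hours idle, then 9 probes 75 seconds apart.
    print(dead_peer_detection_seconds(7200, 75, 9))  # 7875
    # Values from this document: 30 seconds idle, then 4 probes 10 seconds apart.
    print(dead_peer_detection_seconds(30, 10, 4))    # 70
```

These sysctls only take effect on sockets that have SO_KEEPALIVE enabled; they do not turn keepalive on by themselves.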
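5. TCP options
RabbitMQ makes recommendations for the options below, but no corresponding entries were found in the Linux kernel documentation. They are RabbitMQ configuration keys for per-socket options, not sysctl parameters.
tcp_listen_options.nodelay: whether to disable Nagle's algorithm. Defaults to true, i.e. Nagle's algorithm is disabled.
tcp_listen_options.sndbuf: send buffer size. The OS tunes it automatically.
tcp_listen_options.recbuf: receive buffer size. The OS tunes it automatically.
tcp_listen_options.backlog: maximum length of the queue of connections that have not yet been accepted.
tcp_listen_options.keepalive: whether to enable TCP keepalive. Defaults to false.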
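Applications other than RabbitMQ set the equivalent options per socket with setsockopt(). The Python sketch below shows roughly what nodelay and keepalive correspond to at the socket level. It illustrates the standard socket API and is not RabbitMQ's implementation. The function name apply_tcp_options is hypothetical, and the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT constants are platform-specific, so the code applies them only where Python exposes them.

```python
import socket


def apply_tcp_options(sock, nodelay=True, keepalive=False,
                      idle=None, intvl=None, probes=None):
    """Apply nodelay/keepalive-style options to an existing TCP socket."""
    # TCP_NODELAY=1 disables Nagle's algorithm.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, int(nodelay))
    # SO_KEEPALIVE must be on for the tcp_keepalive_* sysctls to apply.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, int(keepalive))
    # Optional per-socket overrides of the system-wide keepalive timers.
    overrides = (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", intvl), ("TCP_KEEPCNT", probes))
    for name, value in overrides:
        option = getattr(socket, name, None)  # not defined on every platform
        if option is not None and value is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    # SO_SNDBUF / SO_RCVBUF are deliberately left alone so that the kernel
    # keeps auto-tuning buffer sizes within tcp_wmem / tcp_rmem.


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        apply_tcp_options(sock, nodelay=True, keepalive=True, idle=30, intvl=10, probes=4)
        print("nodelay:", sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
        print("keepalive:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
```

Leaving the buffer sizes unset matches the kernel documentation quoted in section 2: setting SO_SNDBUF or SO_RCVBUF explicitly disables automatic tuning for that socket.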