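During an exchange between a client and a server, the client sends a SYN packet to open a connection, but the server receives it and does not reply with a SYN+ACK:
1.
Capture traffic on the same port on both the client and the server with tcpdump, then analyze the captures in Wireshark.
The server received the client's SYN but did not return a SYN+ACK. The client waited out its configured 1s timeout, retransmitted the SYN, and only then was the connection established.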
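The same symptom can be spotted from the client side without a capture: a connect that takes slightly more than 1s suggests the first SYN went unanswered. The following is a minimal sketch, not the tooling used in this investigation. The helper names connect_latency and syn_retransmits are hypothetical, and the assumption that the retransmit timeout starts at 1s and doubles each time reflects common Linux defaults rather than anything measured here.

```python
import socket
import time


def connect_latency(host, port, timeout=10.0):
    """Return the seconds needed to complete a TCP connect to host:port."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start


def syn_retransmits(latency, initial_rto=1.0):
    """Estimate how many SYN retransmissions fit into a connect latency.

    Assumes the retransmit timeout starts at initial_rto and doubles
    after every retransmission (1s, then 2s, then 4s, ...).
    """
    count, elapsed, rto = 0, 0.0, initial_rto
    while elapsed + rto <= latency:
        elapsed += rto
        rto *= 2
        count += 1
    return count
```

Calling syn_retransmits(connect_latency(host, port)) in a loop against the affected service would show how often a connect needed at least one retransmitted SYN.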
2.
Check the network statistics with netstat -s:
netstat -s | grep reject
13126873 packets rejects in established connections because of timestamp
The number of packets rejected because of their timestamp keeps growing.
Look at the kernel's raw counter file:
cat /proc/net/netstat
TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts PruneCalled RcvPruned OfoPruned OutOfWindowIcmps LockDroppedIcmps ArpFilter TW TWRecycled TWKilled PAWSPassive PAWSActive PAWSEstab DelayedACKs DelayedACKLocked DelayedACKLost ListenOverflows ListenDrops TCPPrequeued TCPDirectCopyFromBacklog TCPDirectCopyFromPrequeue TCPPrequeueDropped TCPHPHits TCPHPHitsToUser TCPPureAcks TCPHPAcks TCPRenoRecovery TCPSackRecovery TCPSACKReneging TCPFACKReorder TCPSACKReorder TCPRenoReorder TCPTSReorder TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo TCPLoss TCPLostRetransmit TCPRenoFailures TCPSackFailures TCPLossFailures TCPFastRetrans TCPForwardRetrans TCPSlowStartRetrans TCPTimeouts TCPRenoRecoveryFail TCPSackRecoveryFail TCPSchedulerFailed TCPRcvCollapsed TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv TCPAbortOnData TCPAbortOnClose TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger TCPAbortFailed TCPMemoryPressures TCPSACKDiscard TCPDSACKIgnoredOld TCPDSACKIgnoredNoUndo TCPSpuriousRTOs TCPMD5NotFound TCPMD5Unexpected TCPSackShifted TCPSackMerged TCPSackShiftFallback TCPBacklogDrop TCPMinTTLDrop TCPChallengeACK TCPSYNChallenge BusyPollRxPackets TCPFromZeroWindowAdv TCPToZeroWindowAdv TCPWantZeroWindowAdv
TcpExt: 0 0 28417247 273595 0 0 0 0 0 0 8920817876 886076087 0 0 0 13126873 5837834196 10352662 19315573 128 128 134035850356 3564537962116 77185674789917 335133 153000568912 172198514916 99782109633 204994309264 0 150637 0 8436 455406 0 335321 78327 1591607 50432 11755659 173769 3833 0 29057 579 1551569 449839 80637 78679662 0 170 6 0 19315343 5 1899979 1163 1461621 184288 0 72304 0 309 0 0 19123 544588 209 0 0 10513441 5094638 7108594 82227 0 309118 139877 0 1239 1239 45114
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets
IpExt: 0 0 2 0 0 0 649370286806505 597235907092484 72 0 0 0
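The raw dump is hard to read by eye because each group prints its counter names on one line and the values on the next. As a rough aid, the sketch below pairs the two lines so a counter can be looked up by name. The function name parse_netstat is hypothetical, and the code assumes the file always consists of header/value line pairs sharing the same prefix, as in the output shown here.

```python
def parse_netstat(text):
    """Pair each header line with the value line that follows it.

    Returns a dict such as {"TcpExt": {"PAWSEstab": 13126873, ...}, "IpExt": {...}}.
    """
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: "Prefix: name name ..." then "Prefix: value value ..."
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, nums = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError(f"mismatched lines: {prefix!r} vs {value_prefix!r}")
        counters[prefix] = dict(zip(names.split(), map(int, nums.split())))
    return counters
```

On a Linux host, parse_netstat(open("/proc/net/netstat").read())["TcpExt"]["PAWSEstab"] would return the single value of interest instead of the whole dump.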
The counter that corresponds to this error is PAWSEstab. In the kernel source it is incremented here:
static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
                                  const struct tcphdr *th, int syn_inerr)
{
        struct tcp_sock *tp = tcp_sk(sk);

        /* RFC1323: H1. Apply PAWS check first. */
        if (tcp_fast_parse_options(sock_net(sk), skb, th, tp) &&
            tp->rx_opt.saw_tstamp &&
            tcp_paws_discard(sk, skb)) {
                if (!th->rst) {
                        NET_INC_STATS(sock_net(sk), LINUX_MIB_PAWSESTABREJECTED);
                        if (!tcp_oow_rate_limited(sock_net(sk), skb,
                                                  LINUX_MIB_TCPACKSKIPPEDPAWS,
                                                  &tp->last_oow_ack_time))
                                tcp_send_dupack(sk, skb);
                        goto discard;
                }
                /* Reset is accepted even if it did not pass PAWS. */
        }
        /* ... */
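From this we can see that a segment is rejected when the timestamp it carries is older than the one already recorded for the connection, i.e. it fails the PAWS check.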
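The comparison behind PAWS can be modeled in a few lines. This is a simplified sketch of the RFC 1323 rule, not the kernel's tcp_paws_discard: it leaves out the exceptions the kernel applies (for example for reordered pure ACKs and long-idle connections). The names to_s32 and paws_reject and the window parameter are hypothetical; the one real point is that the difference is taken as a signed 32-bit value so that timestamp wraparound is handled.

```python
def to_s32(x):
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def paws_reject(ts_recent, rcv_tsval, window=0):
    """Return True if a segment's TSval is older than the stored ts_recent.

    ts_recent: last timestamp accepted on this connection.
    rcv_tsval: timestamp carried by the incoming segment.
    window:    assumed tolerance in timestamp ticks (hypothetical parameter).
    """
    return to_s32(ts_recent - rcv_tsval) > window
```

With this model, a segment stamped 900 arriving after one stamped 1000 is rejected, while a segment whose timestamp has wrapped past 2^32 is still treated as newer.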
Next, check the Linux kernel settings:
cat /etc/sysctl.conf
kernel.printk = 5
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_timestamps = 1
net.core.somaxconn = 4096
net.ipv4.tcp_max_tw_buckets = 30000
net.netfilter.nf_conntrack_max = 524288
net.netfilter.nf_conntrack_tcp_timeout_established = 300
net.netfilter.nf_conntrack_max = 524288
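net.ipv4.tcp_timestamps is set to 1, so the TCP timestamp option is enabled. If tcp_tw_recycle were also set to 1, the kernel would apply a strict per-host check: within one minute, timestamps from the same IP address must keep increasing, otherwise the packet is dropped. Here, however, tcp_tw_recycle is not enabled.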
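To make the per-host rule concrete, here is a toy model of it. It is only an illustration of the behavior described above and is not taken from the kernel: the class PerHostTimestampCheck, its method accept_syn, and the constant name are all hypothetical, and timestamp wraparound is ignored for brevity.

```python
PAWS_MSL_SECONDS = 60  # the one-minute window described above


class PerHostTimestampCheck:
    """Toy model: remember the last timestamp seen from each source IP."""

    def __init__(self):
        self._last = {}  # ip -> (tsval, time the entry was recorded)

    def accept_syn(self, ip, tsval, now):
        """Return False if tsval went backwards for this IP within the window."""
        prev = self._last.get(ip)
        if prev is not None:
            prev_tsval, seen_at = prev
            if now - seen_at < PAWS_MSL_SECONDS and tsval < prev_tsval:
                return False  # dropped: older timestamp from the same IP
        self._last[ip] = (tsval, now)
        return True
```

The model shows why such a check is risky behind NAT: several clients with unrelated timestamp clocks share one source IP, so a SYN from the client with the lower clock is dropped.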
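So the rejects in this case really are PAWS failures caused by timestamps. As the next step, the local machine's time was updated manually so that the hosts' clocks are in sync.
After synchronizing, the count reported by netstat -s | grep reject stopped growing.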
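Whether a counter has stopped growing is easier to judge from deltas than from two large absolute numbers. The sketch below samples the TcpExt counters and prints how much the PAWS-related ones changed per interval. The function names read_tcpext, growth and watch are hypothetical, as are the default interval and the choice of counters to watch.

```python
import time


def read_tcpext(path="/proc/net/netstat"):
    """Return the TcpExt counters from the kernel counter file as a dict."""
    with open(path) as f:
        lines = [ln.split() for ln in f if ln.startswith("TcpExt:")]
    names, values = lines[0][1:], lines[1][1:]
    return dict(zip(names, map(int, values)))


def growth(before, after, names=("PAWSEstab", "PAWSPassive", "PAWSActive")):
    """Return how much each named counter increased between two samples."""
    return {n: after[n] - before[n] for n in names if n in before and n in after}


def watch(interval=5.0, rounds=3):
    """Print counter growth once per interval, for a few rounds."""
    prev = read_tcpext()
    for _ in range(rounds):
        time.sleep(interval)
        cur = read_tcpext()
        print(growth(prev, cur))
        prev = cur
```

A run of watch() that prints zeros for PAWSEstab in every round corresponds to the "stopped growing" observation.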
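3.
The service's TCP I/O timeout count was still rising, so the problem remained; only the timestamp (PAWS) rejects had been resolved.
The next suspicion was that the NIC queue could not keep up, so that SYN packets were already being dropped at the NIC queue.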
netstat -s|grep drop
10021478 outgoing packets dropped
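The count keeps growing. Sure enough, the NIC turned out to be single-queue and short on processing capacity. The fix was to switch to a multi-queue NIC, or alternatively to enlarge the ring buffer: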
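ethtool -g eth1             # show the current and maximum ring buffer sizes
ethtool -G eth1 rx <N>      # set the RX ring size; <N> must not exceed the reported maximum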
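Whether an interface is single-queue can also be checked from sysfs, where Linux exposes one rx-N and one tx-N directory per queue under the device's queues directory. The helper below is a small sketch along those lines; count_queues and its sysfs_root parameter are hypothetical names, and eth1 is simply the interface used in this write-up.

```python
import os


def count_queues(dev, sysfs_root="/sys/class/net"):
    """Return (rx_queues, tx_queues) for a network device."""
    entries = os.listdir(os.path.join(sysfs_root, dev, "queues"))
    rx = sum(1 for e in entries if e.startswith("rx-"))
    tx = sum(1 for e in entries if e.startswith("tx-"))
    return rx, tx
```

A result of (1, 1) from count_queues("eth1") would match the single-queue finding above.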