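An RTC server runs over UDP, which brings several difficulties:
- There are many UDP packets, and most of them are small. For example, one video keyframe may be split into dozens of UDP packets, and each Opus packet ranges from a few dozen to a little over a hundred bytes.
- Different protocols must share one port (this is required to support cloud-native platforms such as K8S), so every packet has to be matched to its Session, and the client address may change along the way.
- Latency requirements are strict: each Session must send and receive data immediately and cannot deliberately aggregate packets first. In any short interval a Session handles only one or two packets, so there is little to batch.
- The kernel optimizes UDP less aggressively than TCP, and offers fewer tuning options for it.
- Packets must be encrypted and decrypted, which costs CPU and also forces memory copies.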
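
To make the port-multiplexing point concrete, here is a minimal sketch of how a datagram on a shared UDP port might be classified and matched to a Session. It is not SRS's actual code: `PacketKind`, `classify`, `Session`, `SessionTable`, and `address_key` are hypothetical names, the first-byte ranges follow RFC 7983, the RTCP check on the second byte is the common heuristic based on RFC 5761, and the table assumes IPv4 addresses only.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Packet kinds that may share one UDP port (hypothetical enum for this sketch).
enum class PacketKind { Stun, Dtls, Rtp, Rtcp, Unknown };

// Classify a datagram by its leading bytes, using the RFC 7983 first-byte ranges.
inline PacketKind classify(const uint8_t* data, size_t size) {
    if (data == nullptr || size == 0) return PacketKind::Unknown;
    const uint8_t b0 = data[0];
    if (b0 <= 3) return PacketKind::Stun;               // 0..3: STUN
    if (b0 >= 20 && b0 <= 63) return PacketKind::Dtls;  // 20..63: DTLS
    if (b0 >= 128 && b0 <= 191) {                       // 128..191: RTP or RTCP
        if (size < 2) return PacketKind::Unknown;
        const uint8_t b1 = data[1];
        // RTCP packet types occupy 192..223 in the second byte.
        return (b1 >= 192 && b1 <= 223) ? PacketKind::Rtcp : PacketKind::Rtp;
    }
    return PacketKind::Unknown;
}

// Hypothetical per-client state.
struct Session { int id; };

// Pack an IPv4 address and port into one integer so the lookup is a single hash probe.
inline uint64_t address_key(uint32_t ipv4, uint16_t port) {
    return (static_cast<uint64_t>(ipv4) << 16) | port;
}

// Maps a peer address to its Session and supports moving a Session when the
// client address changes (for example after a network switch).
class SessionTable {
public:
    void bind(uint32_t ipv4, uint16_t port, Session* session) {
        by_addr_[address_key(ipv4, port)] = session;
    }
    Session* find(uint32_t ipv4, uint16_t port) const {
        auto it = by_addr_.find(address_key(ipv4, port));
        return it == by_addr_.end() ? nullptr : it->second;
    }
    // Returns false when no Session is bound to the old address.
    bool rebind(uint32_t old_ip, uint16_t old_port, uint32_t new_ip, uint16_t new_port) {
        auto it = by_addr_.find(address_key(old_ip, old_port));
        if (it == by_addr_.end()) return false;
        Session* session = it->second;
        by_addr_.erase(it);
        by_addr_[address_key(new_ip, new_port)] = session;
        return true;
    }
private:
    std::unordered_map<uint64_t, Session*> by_addr_;
};
```

Since this lookup runs once per received packet, its cost is multiplied by the packet rate, so the sketch keys the table on a plain integer instead of a formatted address string.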
Even so, there is still a lot that can be done. See the links below for details:
- v4.0, 2021-02-28, RTC: Support high performance Zero Copy NACK. 4.0.76
- v4.0, 2021-02-27, RTC: Support Object Cache Pool for performance. 4.0.75
- v4.0, 2021-02-12, RTC: Support High Resolution(about 25ms) Timer. 4.0.72
- v4.0, 2021-02-10, RTC: Improve performance about 700+ streams. 4.0.71
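During optimization, the most important tools were the load-testing tool srs-bench, together with Perf and GCP (the gperftools CPU profiler).
The data from Perf and GCP differ somewhat. For example, at about 67% CPU usage: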
```
top - 14:58:57 up 25 days, 1:58, 4 users, load average: 0.66, 0.76, 0.73
Tasks: 92 total, 2 running, 90 sleeping, 0 stopped, 0 zombie
%Cpu(s): 30.1 us, 5.1 sy, 0.0 ni, 61.8 id, 0.0 wa, 0.0 hi, 3.1 si, 0.0 st
KiB Mem : 8008964 total, 460028 free, 1390824 used, 6158112 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 6311680 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8375 root 0 -20 1120556 992436 4192 R 68.1 12.4 24:14.17 srs
8462 root 20 0 312104 36364 3800 S 1.0 0.5 0:25.25 perf
6745 root 20 0 150332 6664 2380 S 0.7 0.1 0:15.11 dstat
6 root 20 0 0 0 0 S 0.3 0.0 49:03.07 ksoftirqd/0
```
SRS statistics:
```
Hybrid cpu=70.00%,969MB, cid=47984,8, timer=24421,4394,19973, clock=0,45,4,0,0,0,0,0,0,
objs=(pkt:0,raw:0,fua:0,msg:0,oth:401,buf:0,drop:0),
cache=(pkt:20-31w,raw:109113-69w,fua:32227-41w,msg:1-41w,buf:19-34w)
RTC: Server conns=401, rpkts=(47734,rtp:47726,stun:1,rtcp:7),
spkts=(1710,rtp:117,stun:1,rtcp:1592), rtcp=(pli:0,twcc:3982,rr:398),
snk=(39826,a:19913,v:19913,h:0), rnk=(2,2,h:2,m:0),
fid=(id:0,fid:5272,ffid:42461,addr:1,faddr:47734)
```
For comparison, the top 37 functions reported by Perf, which total 60.34%:
```
Overhead Shared Object Symbol
10.13% srs.4.0.77 [.] sha1_block_data_order_avx2
4.37% srs.4.0.77 [.] bitvector_left_shift
2.96% libpthread-2.17.so [.] __recvfrom_nocancel
2.51% libc-2.17.so [.] __memcpy_ssse3
2.51% srs.4.0.77 [.] heap_delete
2.49% srs.4.0.77 [.] SrsHourGlass::cycle
2.39% srs.4.0.77 [.] SrsRtpPacket2::decode
2.19% srs.4.0.77 [.] SrsRtpObjectCacheManager<SrsRtpPacket2>::recycle
2.16% srs.4.0.77 [.] SrsRtpPacket2::recycle_shared_buffer
1.79% [kernel] [k] finish_task_switch
1.71% srs.4.0.77 [.] SrsRtcPublishStream::on_rtp
1.56% [kernel] [k] system_call_after_swapgs
1.56% [kernel] [k] free_hot_cold_page
1.52% srs.4.0.77 [.] srtp_get_stream
1.47% [kernel] [k] copy_user_enhanced_fast_string
1.39% srs.4.0.77 [.] aesni_ctr32_encrypt_blocks
1.33% srs.4.0.77 [.] operator delete[]
1.32% [kernel] [k] _raw_spin_unlock_irqrestore
1.19% srs.4.0.77 [.] SrsRtcRecvTrack::do_check_send_nacks
0.99% srs.4.0.77 [.] OPENSSL_cleanse
0.94% srs.4.0.77 [.] SrsRtpRingBuffer::set
0.93% srs.4.0.77 [.] std::less<unsigned int>::operator()
0.89% srs.4.0.77 [.] srtp_unprotect
0.88% srs.4.0.77 [.] heap_insert
0.85% srs.4.0.77 [.] SrsRtcPublishStream::check_send_nacks
0.85% srs.4.0.77 [.] SrsRtpNackForReceiver::get_nack_seqs
0.83% srs.4.0.77 [.] SrsRtcPublishStream::get_audio_track
0.81% srs.4.0.77 [.] SrsRtcTrackDescription::has_ssrc
0.72% srs.4.0.77 [.] SrsResourceManager::find_by_fast_id
0.69% srs.4.0.77 [.] SrsSharedPtrMessage::count
0.68% srs.4.0.77 [.] EVP_MD_CTX_cleanup
0.67% srs.4.0.77 [.] SrsRtcPublishStream::do_on_rtp_plaintext
0.64% srs.4.0.77 [.] SrsBuffer::require
0.63% libc-2.17.so [.] epoll_ctl
0.61% [kernel] [k] udp_recvmsg
0.60% srs.4.0.77 [.] operator new[]
0.58% srs.4.0.77 [.] SrsUdpMuxListener::cycle
```
The top 37 functions reported by GCP, by contrast, total 69.59%:
```
[root@iZbp12af7ajnkuducj2u8rZ ~]# ./objs/pprof objs/srs gperf.srs.gcp
(pprof) top37
Total: 17795 samples
2397 13.5% 13.5% 2397 13.5% __recvfrom_nocancel
1894 10.6% 24.1% 1894 10.6% sha1_block_data_order_avx2
746 4.2% 28.3% 746 4.2% bitvector_left_shift
501 2.8% 31.1% 511 2.9% heap_delete
485 2.7% 33.8% 2315 13.0% SrsHourGlass::cycle
440 2.5% 36.3% 440 2.5% __GI_epoll_wait
429 2.4% 38.7% 1136 6.4% SrsRtpObjectCacheManager::recycle
424 2.4% 41.1% 424 2.4% __memcpy_ssse3
417 2.3% 43.5% 516 2.9% SrsRtpPacket2::recycle_shared_buffer
373 2.1% 45.6% 1146 6.4% SrsRtpPacket2::decode
321 1.8% 47.4% 321 1.8% __GI_epoll_ctl
287 1.6% 49.0% 4914 27.6% SrsRtcPublishStream::on_rtp
270 1.5% 50.5% 270 1.5% aesni_ctr32_encrypt_blocks
245 1.4% 51.9% 698 3.9% SrsRtcRecvTrack::do_check_send_nacks
218 1.2% 53.1% 218 1.2% srtp_get_stream
200 1.1% 54.2% 1338 7.5% SrsRtpRingBuffer::set
199 1.1% 55.3% 199 1.1% std::less::operator
185 1.0% 56.4% 923 5.2% SrsRtcPublishStream::check_send_nacks
180 1.0% 57.4% 180 1.0% heap_insert
179 1.0% 58.4% 206 1.2% SrsRtpNackForReceiver::get_nack_seqs
175 1.0% 59.4% 175 1.0% __sendto_nocancel
150 0.8% 60.2% 237 1.3% SrsResourceManager::find_by_fast_id
149 0.8% 61.1% 149 0.8% OPENSSL_cleanse
143 0.8% 61.9% 143 0.8% srtp_unprotect
141 0.8% 62.6% 141 0.8% std::vector::size
130 0.7% 63.4% 130 0.7% EVP_MD_CTX_cleanup
127 0.7% 64.1% 264 1.5% SrsRtcPublishStream::get_audio_track
118 0.7% 64.8% 118 0.7% SrsFastCoroutine::pull
118 0.7% 65.4% 118 0.7% SrsRtcTrackDescription::has_ssrc
114 0.6% 66.1% 114 0.6% SrsBuffer::require
113 0.6% 66.7% 3272 18.4% SrsRtcPublishStream::do_on_rtp_plaintext
110 0.6% 67.3% 377 2.1% SrsRtpObjectCacheManager::allocate
106 0.6% 67.9% 8985 50.5% SrsUdpMuxListener::cycle
96 0.5% 68.4% 634 3.6% _st_vp_check_clock
94 0.5% 69.0% 1151 6.5% SrsRtcConnection::notify
84 0.5% 69.4% 84 0.5% PackedCache::KeyMatch (inline)
84 0.5% 69.9% 84 0.5% std::_Rb_tree::_M_begin
```
The differences are shown in the table below:
Rank | Perf overhead | Perf symbol | Rank | GCP flat % | GCP symbol |
---|---|---|---|---|---|
1 | 10.13% | [.] sha1_block_data_order_avx2 | 1 | 13.5% | __recvfrom_nocancel |
2 | 4.37% | [.] bitvector_left_shift | 2 | 10.6% | sha1_block_data_order_avx2 |
3 | 2.96% | [.] __recvfrom_nocancel | 3 | 4.2% | bitvector_left_shift |
4 | 2.51% | [.] __memcpy_ssse3 | 4 | 2.8% | heap_delete |
5 | 2.51% | [.] heap_delete | 5 | 2.7% | SrsHourGlass::cycle |
6 | 2.49% | [.] SrsHourGlass::cycle | 6 | 2.5% | __GI_epoll_wait |
7 | 2.39% | [.] SrsRtpPacket2::decode | 7 | 2.4% | SrsRtpObjectCacheManager::recycle |
8 | 2.19% | [.] SrsRtpObjectCacheManager<SrsRtpPacket2>::recycle | 8 | 2.4% | __memcpy_ssse3 |
9 | 2.16% | [.] SrsRtpPacket2::recycle_shared_buffer | 9 | 2.3% | SrsRtpPacket2::recycle_shared_buffer |
10 | 1.79% | [k] finish_task_switch | 10 | 2.1% | SrsRtpPacket2::decode |
11 | 1.71% | [.] SrsRtcPublishStream::on_rtp | 11 | 1.8% | __GI_epoll_ctl |
12 | 1.56% | [k] system_call_after_swapgs | 12 | 1.6% | SrsRtcPublishStream::on_rtp |
13 | 1.56% | [k] free_hot_cold_page | 13 | 1.5% | aesni_ctr32_encrypt_blocks |
14 | 1.52% | [.] srtp_get_stream | 14 | 1.4% | SrsRtcRecvTrack::do_check_send_nacks |
15 | 1.47% | [k] copy_user_enhanced_fast_string | 15 | 1.2% | srtp_get_stream |
16 | 1.39% | [.] aesni_ctr32_encrypt_blocks | 16 | 1.1% | SrsRtpRingBuffer::set |
17 | 1.33% | [.] operator delete[] | 17 | 1.1% | std::less::operator |
18 | 1.32% | [k] _raw_spin_unlock_irqrestore | 18 | 1.0% | SrsRtcPublishStream::check_send_nacks |
19 | 1.19% | [.] SrsRtcRecvTrack::do_check_send_nacks | 19 | 1.0% | heap_insert |
20 | 0.99% | [.] OPENSSL_cleanse | 20 | 1.0% | SrsRtpNackForReceiver::get_nack_seqs |
21 | 0.94% | [.] SrsRtpRingBuffer::set | 21 | 1.0% | __sendto_nocancel |
22 | 0.93% | [.] std::less<unsigned int>::operator() | 22 | 0.8% | SrsResourceManager::find_by_fast_id |
23 | 0.89% | [.] srtp_unprotect | 23 | 0.8% | OPENSSL_cleanse |
24 | 0.88% | [.] heap_insert | 24 | 0.8% | srtp_unprotect |
25 | 0.85% | [.] SrsRtcPublishStream::check_send_nacks | 25 | 0.8% | std::vector::size |
26 | 0.85% | [.] SrsRtpNackForReceiver::get_nack_seqs | 26 | 0.7% | EVP_MD_CTX_cleanup |
27 | 0.83% | [.] SrsRtcPublishStream::get_audio_track | 27 | 0.7% | SrsRtcPublishStream::get_audio_track |
28 | 0.81% | [.] SrsRtcTrackDescription::has_ssrc | 28 | 0.7% | SrsFastCoroutine::pull |
29 | 0.72% | [.] SrsResourceManager::find_by_fast_id | 29 | 0.7% | SrsRtcTrackDescription::has_ssrc |
30 | 0.69% | [.] SrsSharedPtrMessage::count | 30 | 0.6% | SrsBuffer::require |
31 | 0.68% | [.] EVP_MD_CTX_cleanup | 31 | 0.6% | SrsRtcPublishStream::do_on_rtp_plaintext |
32 | 0.67% | [.] SrsRtcPublishStream::do_on_rtp_plaintext | 32 | 0.6% | SrsRtpObjectCacheManager::allocate |
33 | 0.64% | [.] SrsBuffer::require | 33 | 0.6% | SrsUdpMuxListener::cycle |
34 | 0.63% | [.] epoll_ctl | 34 | 0.5% | _st_vp_check_clock |
35 | 0.61% | [k] udp_recvmsg | 35 | 0.5% | SrsRtcConnection::notify |
36 | 0.60% | [.] operator new[] | 36 | 0.5% | PackedCache::KeyMatch (inline) |
37 | 0.58% | [.] SrsUdpMuxListener::cycle | 37 | 0.5% | std::_Rb_tree::_M_begin |
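
The side-by-side table is easier to read once the two profiles are joined on symbol name and sorted by gap. The following is a minimal sketch of that comparison, not part of SRS or either profiler: `ProfileDelta`, `Profile`, `compare_profiles`, `sample_perf`, and `sample_gcp` are hypothetical names, and the sample values are a handful of rows copied from the listings above. It joins on exact names, so symbols that Perf prints with template arguments (such as `SrsRtpObjectCacheManager<SrsRtpPacket2>::recycle`) would need to be normalized first.

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// One joined row: the self-time share each profiler reports for a symbol.
struct ProfileDelta {
    std::string symbol;
    double perf_pct;
    double gcp_pct;
    double diff;  // gcp_pct - perf_pct
};

typedef std::map<std::string, double> Profile;  // symbol -> percent

// Join two profiles on symbol name and sort by absolute gap, largest first.
inline std::vector<ProfileDelta> compare_profiles(const Profile& perf, const Profile& gcp) {
    std::vector<ProfileDelta> rows;
    for (Profile::const_iterator it = perf.begin(); it != perf.end(); ++it) {
        Profile::const_iterator other = gcp.find(it->first);
        if (other == gcp.end()) continue;  // symbol absent from the other list
        ProfileDelta row = {it->first, it->second, other->second,
                            other->second - it->second};
        rows.push_back(row);
    }
    std::sort(rows.begin(), rows.end(),
              [](const ProfileDelta& a, const ProfileDelta& b) {
                  return std::fabs(a.diff) > std::fabs(b.diff);
              });
    return rows;
}

// A few Perf rows copied from the listing above.
inline Profile sample_perf() {
    Profile p;
    p["sha1_block_data_order_avx2"] = 10.13;
    p["bitvector_left_shift"] = 4.37;
    p["__recvfrom_nocancel"] = 2.96;
    p["SrsUdpMuxListener::cycle"] = 0.58;
    return p;
}

// The matching GCP rows (flat percentage) copied from the listing above.
inline Profile sample_gcp() {
    Profile p;
    p["sha1_block_data_order_avx2"] = 10.6;
    p["bitvector_left_shift"] = 4.2;
    p["__recvfrom_nocancel"] = 13.5;
    p["SrsUdpMuxListener::cycle"] = 0.6;
    return p;
}
```

With these samples the largest gap is `__recvfrom_nocancel`: 2.96% in Perf against 13.5% in GCP. One plausible reason is that Perf attributes kernel-side time to separate `[k]` symbols such as `udp_recvmsg` and `copy_user_enhanced_fast_string`, whereas GCP only sees user-space frames and so charges that time to the system-call wrapper.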