1. Brokers
Single Node:
RabbitMQ 3.3.5 release
Mosquitto 1.4.10 release
New Mqtt
Cluster:
RabbitMQ 3.3.5 release [4,8] nodes
New Mqtt [4] nodes
2. Mqtt in LineWorks server NCS
2.1 Mqtt usage in NCS
There are two kinds of MQTT clients: publishers and subscribers. In LineWorks server usage the two kinds are separated: the API servers, fewer than 20 in number, are the publishers, and the LineWorks clients, about 10,000 in the NCS environment, are the subscribers.
Figure 1 represents mqtt usage in NCS environment.
2.2 Publish count statistics on NCS
date | qos1 total | qos1 max tps | qos0 total | qos0 max tps |
---|---|---|---|---|
20170118 | 4807211 | 630 | 545497 | 42 |
20170116 | 4760298 | 685 | 508954 | 58 |
20170113 | 4761518 | 684 | 534527 | 52 |
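To put the peak figures in context, the daily totals can be converted into an average rate. The following is a minimal sketch, assuming each total covers a full 24-hour day (the table does not state the measurement window); the function and constant names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# (date, qos1 total, qos1 max tps, qos0 total, qos0 max tps), copied from the table above.
DAILY_STATS = [
    ("20170118", 4807211, 630, 545497, 42),
    ("20170116", 4760298, 685, 508954, 58),
    ("20170113", 4761518, 684, 534527, 52),
]


def average_tps(total_messages: int) -> float:
    """Mean messages per second, ASSUMING the total spans a full 24-hour day."""
    return total_messages / SECONDS_PER_DAY


for date, q1_total, q1_max, q0_total, q0_max in DAILY_STATS:
    q1_avg = average_tps(q1_total)
    q0_avg = average_tps(q0_total)
    # The peak-to-average ratio shows how bursty the publish traffic is.
    print(
        f"{date}: qos1 avg {q1_avg:.1f} tps, peak {q1_max} (x{q1_max / q1_avg:.1f}); "
        f"qos0 avg {q0_avg:.1f} tps, peak {q0_max} (x{q0_max / q0_avg:.1f})"
    )
```

Under that assumption the qos1 peak is roughly an order of magnitude above the daily average, which is why the test scenario is sized against the peak rates rather than the totals.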
3. Test scenario
3.1 Focus on LineWorks real environment usage
We designed this test scenario based on the statistics of MQTT usage in the NCS environment. The goal of the benchmark is to evaluate the impact of the number of subscribers on the MQTT server in terms of the delivered throughput (message rate on the subscriber side), the CPU usage of the server, and the time required to transmit a message from a publisher to a subscriber, i.e. the message transmission latency. There should be no limitation caused by the clients (affecting each other) or by the network.
The scalability test starts with a minimum of 10,000 subscribers and tries to reach a maximum of 100,000 subscribers. The publishers send messages at a steady rate that is proportional to the number of subscribers.
The tests do not try to reach the maximum message throughput. The goal is to show how the server scales with the number of publishers, each publishing at a fixed rate.
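The proportional publish rate can be sketched as follows. The ratio of one message per second for every 10 subscribers is inferred from the result table in section 4 (for example, 5k subscribers with a 0.5k publish rate); the publisher count of 20 is a hypothetical value used only for illustration, since the number of test publishers is not stated.

```python
# Inferred from the result table in section 4: 5k subscribers -> 0.5k msg/s, and so on.
SUBSCRIBERS_PER_MSG_PER_SECOND = 10


def target_publish_rate(subscribers: int) -> float:
    """Total messages per second that all publishers together should send."""
    return subscribers / SUBSCRIBERS_PER_MSG_PER_SECOND


def publish_interval(subscribers: int, publishers: int) -> float:
    """Seconds between two messages of one publisher when the load is spread evenly."""
    return publishers / target_publish_rate(subscribers)


HYPOTHETICAL_PUBLISHERS = 20  # assumption, not taken from the test setup

for subscribers in (5_000, 10_000, 40_000, 100_000):
    rate = target_publish_rate(subscribers)
    interval_ms = publish_interval(subscribers, HYPOTHETICAL_PUBLISHERS) * 1000
    print(f"{subscribers} subscribers: {rate:.0f} msg/s, one message every {interval_ms:.1f} ms per publisher")
```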
3.2 Machine information
The machines used for the benchmark were all created on ncloud and have the same configuration, listed in the table below.
OS | Processor | RAM |
---|---|---|
CentOS 7.2 | Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz * 4 | 8 GB |
3.3 Test steps
A test is executed as a sequence of 3 steps:
- During the first step, every thread launches the publishers and the subscribers. Connections are opened.
- The second step starts after a time delay so that every thread can complete the first step. Each thread publishes a message to the topic; since the total number of threads is known, each thread can maintain a countdown and start publishing messages when the countdown reaches zero. Messages are published asynchronously, each message being sent by a different publisher. The message production lasts 5 minutes, which appeared to be long enough to reach a steady state. The second step ends when all the messages have been received by the subscribers, so the test also checks that no message is lost by the server. The delivered throughput, CPU usage and message transmission latency are measured after a warm-up period of 1 minute. The latency is measured for every published message: a timestamp is added to the payload of every message. The latency shown in the results is the average of the latencies of all the messages published after the warm-up period.
- The third step starts after a time delay so that it does not affect the measurements made during the second step. During this step, subscribers unsubscribe and connections are closed.
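The latency measurement described in the second step can be sketched as follows. This is an illustration rather than the benchmark's actual client code: the 8-byte timestamp layout and all function names are assumptions, and the approach only yields meaningful numbers if publisher and subscriber clocks are synchronized (or both run on the same machine).

```python
import struct
from statistics import mean

WARMUP_SECONDS = 60.0          # warm-up period from the test description
_HEADER = struct.Struct("!d")  # assumed layout: 8-byte big-endian send timestamp


def build_payload(size: int, sent_at: float) -> bytes:
    """Return a payload of `size` bytes that starts with the send timestamp."""
    if size < _HEADER.size:
        raise ValueError(f"payload must be at least {_HEADER.size} bytes")
    return _HEADER.pack(sent_at) + b"x" * (size - _HEADER.size)


def sent_timestamp(payload: bytes) -> float:
    """Extract the send timestamp embedded by build_payload."""
    (sent_at,) = _HEADER.unpack_from(payload)
    return sent_at


def average_latency_ms(samples, test_start: float) -> float:
    """Average latency over messages published after the warm-up period.

    `samples` is an iterable of (payload, received_at) pairs.
    """
    latencies = [
        (received_at - sent_timestamp(payload)) * 1000.0
        for payload, received_at in samples
        if sent_timestamp(payload) - test_start >= WARMUP_SECONDS
    ]
    return mean(latencies)


# Synthetic example: one message during warm-up (ignored) and two after it.
start = 1000.0
samples = [
    (build_payload(10, start + 30.0), start + 30.5),
    (build_payload(10, start + 90.0), start + 90.04),
    (build_payload(10, start + 120.0), start + 120.06),
]
print(f"average latency: {average_latency_ms(samples, start):.1f} ms")
```

An 8-byte timestamp fits inside the smallest payload used in the tests (10 bytes), so the same scheme would work for every payload size in the result table.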
4. Test result
broker | subscriber count (k) | publish rate (k tps) | payload size (bytes) | qos | cpu (per node * nodes) | cpu total (%) | latency (ms) | annotation
---|---|---|---|---|---|---|---|---
mosquitto| 5 | 0.5 |10 | 1 | 100% | 100 | 180 | cpu max
rabbitmq| 5 | 0.5 |10 | 1 | 107% | 107 | 200 | connection max
rabbitmq| 5 | 0.5 |1000 | 1 | 109% | 109 | 200 | connection max
rabbitmq| 5 | 0.5 |10000 | 1 | 112% | 112 | 220 | connection max
new mqtt| 5 | 0.5 |10 | 1 | 48.8% | 48.8 | 40 |
new mqtt| 5 | 0.5 |1000 | 1 | 52.4% | 52.4 | 40 |
new mqtt| 5 | 0.5 |10000 | 1 | 57.1% | 57.1 | 50 |
new mqtt| 10 | 1 |10 | 1 | 90.3% | 90.3 | 45 |
new mqtt| 10 | 1 |1000 | 1 | 98.9% | 98.9 | 45 |
new mqtt| 10 | 1 |10000 | 1 | 106.6% | 106.6 | 50 |
rabbitmq| 10 | 1 |10 | 1 | 154% * 4 | 616 | 250 |
rabbitmq| 10 | 1 |1000 | 1 | 160% * 4 | 640 | 250 |
rabbitmq| 10 | 1 |10000 | 1 | 167% * 4 | 668 | 300 |
rabbitmq| 40 | 4 |10 | 1 | 130% *8 | 1040 | 1000 | delay max
new mqtt| 40 | 4 |10 | 1 | 137% * 4 | 548 | 40 |
new mqtt| 40 | 4 |1000 | 1 | 149% * 4 | 596 | 40 |
new mqtt| 40 | 4 |10000 | 1 | 160% * 4 | 640 | 50 |
new mqtt| 100 | 10 |10 | 1 | 198.0% * 4 | 792 | 50 |
new mqtt| 100 | 10 |1000 | 1 | 255.2% * 4 | 1020 | 50 |
new mqtt| 100 | 10 |10000 | 1 | 277% * 4 | 1108 | 60 |
mosquitto| 5 | 0.5 |10 | 0 | 91% | 91 | 37 | cpu max
rabbitmq| 5 | 0.5 |10 | 0 | 78% | 78 | 57 | connection max
rabbitmq| 5 | 0.5 |1000 | 0 | 80% | 80 | 60 | connection max
rabbitmq| 5 | 0.5 |10000 | 0 | 85% | 85 | 60 | connection max
new mqtt| 5 | 0.5 |10 | 0 | 34.2% | 34.2 | 30 |
new mqtt| 5 | 0.5 |1000 | 0 | 38.7% | 38.7 | 30 |
new mqtt| 5 | 0.5 |10000 | 0 | 42.2% | 42.2 | 40 |
new mqtt| 10 | 1 |10 | 0 | 64.9% | 64.9 | 30 |
new mqtt| 10 | 1 |1000 | 0 | 71.6% | 71.6 | 30 |
new mqtt| 10 | 1 |10000 | 0 | 79.7% | 79.7 | 40 |
rabbitmq| 10 | 1 |10 | 0 | 66% * 4 | 264 | 60 |
rabbitmq| 10 | 1 |1000 | 0 | 75% * 4 | 300 | 60 |
rabbitmq| 10 | 1 |10000 | 0 | 84% * 4 | 336 | 60 |
rabbitmq| 40 | 4 |10 | 0 | 112% * 8 | 896 | 1000 | delay max
new mqtt| 40 | 4 |10 | 0 | 102% * 4 | 408 | 35 |
new mqtt| 40 | 4 |1000 | 0 | 111% * 4 | 444 | 40 |
new mqtt| 40 | 4 |10000 | 0 | 118% * 4 | 472 | 50 |
new mqtt| 100 | 10 |10 | 0 | 166% * 4 | 664 | 50 |
new mqtt| 100 | 10 |1000 | 0 | 211.8% * 4 | 847 | 50 |
new mqtt| 100 | 10 |10000 | 0 | 248.7% *4 | 995 | 55 |
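The "cpu total" column appears to be the per-node CPU usage multiplied by the number of nodes (for example, `154% * 4` gives 616). A minimal sketch of that arithmetic, with a hypothetical helper name, is:

```python
import re

# Matches cells such as "48.8%", "154% * 4" or "130% *8".
_CPU_CELL = re.compile(r"^\s*([\d.]+)%\s*(?:\*\s*(\d+))?\s*$")


def total_cpu(cell: str) -> float:
    """Return per-node CPU percent times the node count (1 if no count is given)."""
    match = _CPU_CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cpu cell: {cell!r}")
    per_node = float(match.group(1))
    nodes = int(match.group(2) or 1)
    return per_node * nodes


for cell in ("48.8%", "154% * 4", "130% *8", "255.2% * 4"):
    print(f"{cell!r} -> {total_cpu(cell):.1f}")
```

For cluster rows with a fractional per-node value the table lists whole numbers (255.2% * 4 is recorded as 1020), so small differences from the computed product are expected.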
4.1 qos1 (acknowledgement, not retained)
4.2 qos0 (no acknowledgement, retained)
4.3 Discussion
- mosquitto
  - mosquitto runs only as a single node and uses only one CPU; with 5k subscribers, CPU usage reaches almost 100%.
- rabbitmq
  - With 40k subscribers on 4 nodes, the delay increases.
  - On 1 node, rabbitmq cannot establish more than 6821 subscriber connections:
[irteam@dev-chenzhaoyu1.ncl ~]$ mosquitto_pub -h 10.113.236.145 -p 1884 -t 123 -q 0 -m hahaha
Error: Connection refused
[irteam@test-mqtt-cluster003.ncl ~]$ netstat -ant | grep 1884 | grep EST | wc -l
6849
{file_descriptors,[{total_limit,81820},
{total_used,6821},
{sockets_limit,73636},
{sockets_used,6819}]},
- new mqtt
  - new mqtt uses fewer resources.
  - 4 nodes can hold 100k subscribers (10k tps).
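The file descriptor counters shown for the single rabbitmq node can be compared against their limits with a short script. This is a sketch that parses only the fragment quoted above; the regular expression and function names are illustrative and are not part of any RabbitMQ tooling.

```python
import re

# Fragment copied from the status output quoted in the rabbitmq discussion.
STATUS_FRAGMENT = """
{file_descriptors,[{total_limit,81820},
{total_used,6821},
{sockets_limit,73636},
{sockets_used,6819}]},
"""


def parse_counters(text: str) -> dict:
    """Collect every {name,integer} pair found in the Erlang-term text."""
    return {name: int(value) for name, value in re.findall(r"\{(\w+),(\d+)\}", text)}


def usage_percent(used: int, limit: int) -> float:
    """Share of the limit that is in use, in percent."""
    return 100.0 * used / limit


counters = parse_counters(STATUS_FRAGMENT)
print(f"file descriptors: {usage_percent(counters['total_used'], counters['total_limit']):.1f}% of limit")
print(f"sockets:          {usage_percent(counters['sockets_used'], counters['sockets_limit']):.1f}% of limit")
```

Both counters are below 10% of their limits, which suggests (but does not prove) that the connection ceiling observed on one node is not caused by the reported file descriptor limits.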