Contents
- Tuning server parameters
- TCP/IP parameter settings
- Maximum file descriptors
- Application runtime tuning
- OutOfMemory Killer
- Tuning client parameters
- Server testing
- Netty server
- The Spray server
- Undertow
- Node.js
- Reference documentation
In fact, I have since added several more frameworks; the set now includes Netty, Undertow, Jetty, Spray, Vert.x, Grizzly, and Node.js. The test data can be found in the follow-up article: Performance comparisons of seven WebSocket frameworks.
The famous C10K problem was raised in 2001. That article is one of the defining documents of high-performance server development, addressing the problem of serving 10,000 concurrent connections on a single machine, which was a very challenging goal at the time given the hardware and software of the day. With the rapid development of both since then, 10,000 connections on a single machine has become trivial: any major language can now provide that level of concurrency. So the goal has been raised a hundredfold, to C1000K: one server serving one million connections. Articles describing C1000K implementations appeared as early as 2010 and 2011, so in 2015 reaching C1000K should not be a difficult matter.
This article is a record of my own practice. My goal is to implement a C1000K server with the Spray-WebSocket, Netty, Undertow, and Node.js frameworks respectively, and to see how hard each is to implement and how it performs. The development languages are Scala and JavaScript.
Of course, when talking about performance we also have to consider the number of requests per second (RPS) per connection and the size of each message. In general one assumes some percentage, say 20%, of connections send or receive a message each second. My requirement is simpler: the server only pushes and the clients never actively send; one message is pushed to a million clients every minute. So the test tool establishes 60,000 WebSocket connections per client, with 20 clients in total. I could not really use 20 machines, so I used two AWS c3.2xlarge (8 cores, 16 GB) servers as client machines, running ten clients each. The server broadcasts a message every minute; the message content is simply the server's current time.
Recently, I saw the message push system implemented by 360 with Go. The following is their data:
The 360 push system currently serves more than 50 internal products and thousands of apps on its developer platform. Real-time long connections are at the hundreds-of-millions scale, daily unique connections alone reach an even larger scale, a full broadcast can be completed within one minute, and peak daily deliveries reach the billions level, all on 400 physical machines running more than 3,000 instances distributed across nine independent clusters in nearly 10 IDCs at home and abroad.
The code for the four servers and the client test tool can be downloaded from GitHub. (There are actually more than four frameworks there, including implementations for Netty, Undertow, Jetty, Spray-WebSocket, Vert.x, Grizzly, and Node.js.)
Each server can easily reach 1.2 million simultaneously active WebSocket connections, but they differ in resource usage and broadcast time. 1.2 million is a conservative figure; the servers still handle this many connections comfortably. Next I will test C2000K.
We need to tune some server/client parameters before testing.
Tuning server parameters
Two files usually need to be modified: /etc/sysctl.conf and /etc/security/limits.conf, which configure the TCP/IP parameters and the maximum number of file descriptors respectively.
TCP/IP parameter settings
Modify the /etc/sysctl.conf file and set network parameters.
net.ipv4.tcp_wmem = 4096 87380 4161536
net.ipv4.tcp_rmem = 4096 87380 4161536
net.ipv4.tcp_mem = 786432 2097152 3145728
Adjust the values as required. More parameters are covered in an earlier article: Tuning the Linux TCP/IP protocol stack. Run /sbin/sysctl -p for the changes to take effect immediately.
Maximum file descriptors
The Linux kernel itself has a maximum limit on file descriptors, which you can change as needed:
- Maximum number of open file descriptors system-wide: /proc/sys/fs/file-max
  - Temporary setting: echo 1000000 > /proc/sys/fs/file-max
  - Permanent setting: add fs.file-max = 1000000 to /etc/sysctl.conf
- Maximum number of open file descriptors per process
  - Use ulimit -n to view the current limit, and ulimit -n 1000000 to set it temporarily.
  - To make it permanent, edit /etc/security/limits.conf and add the following lines:
* hard nofile 1000000
* soft nofile 1000000
root hard nofile 1000000
root soft nofile 1000000
Note that the hard limit cannot exceed /proc/sys/fs/nr_open, so sometimes you need to raise nr_open first: echo 2000000 > /proc/sys/fs/nr_open
To view the number of open file descriptors in use, run the following command:
[root@localhost ~]# cat /proc/sys/fs/file-nr
1632 0 1513506
The first value is the number of file descriptors the system has allocated and is using, the second is the number allocated but no longer in use, and the third is file-max.
To sum up:
- The number of open file descriptors for all processes cannot exceed /proc/sys/fs/file-max
- The number of file descriptors opened by a single process cannot exceed the soft limit of nofile in user limit
- The soft limit of nofile cannot exceed its hard limit
- The hard limit of nofile cannot exceed /proc/sys/fs/nr_open
Application runtime tuning
- Java application memory tuning: the server uses a 12 GB heap and the throughput-first (parallel) garbage collector:
JAVA_OPTS="-Xms12G -Xmx12G -Xss1M -XX:+UseParallelGC"
- V8 engine
node --nouse-idle-notification --expose-gc --max-new-space-size=2048 --max-old-space-size=8192 ./webserver.js
OutOfMemory Killer
If the server itself does not have much memory (say 8 GB), the server process may get "Killed" before reaching one million connections. You can confirm this with dmesg:
Out of memory: Kill process 10375 (java) score 59 or sacrifice child
This is the Linux OOM Killer. When it is enabled, each process has three extra files under /proc/&lt;pid&gt; for adjusting its OOM rating. Setting oom_adj to -17 exempts a process from the OOM Killer: echo -17 > /proc/$(pidof java)/oom_adj
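Since the servers here are JVM processes, the same adjustment can also be made from inside the application at startup. A minimal sketch, assuming root privileges and the oom_adj interface shown above:

```scala
import java.io.PrintWriter
import java.lang.management.ManagementFactory

object OomExempt extends App {
  // On HotSpot the runtime MXBean name is "pid@hostname"; parsing it is a
  // common trick on Java 7/8 (ProcessHandle did not exist yet).
  val pid = ManagementFactory.getRuntimeMXBean.getName.split("@")(0)
  // Writing -17 requires root; it marks the process as unkillable by the
  // OOM Killer, equivalent to the echo command above.
  val w = new PrintWriter(s"/proc/$pid/oom_adj")
  try w.print("-17") finally w.close()
}
```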
Tuning client parameters
On a single system, the number of local ports available for connecting to one remote service is limited. A TCP port is a 16-bit integer, so it can only range from 0 to 65535, and 0 to 1023 are reserved, leaving 1024 to 65534, that is, 64511 allocatable ports. In other words, one machine can create at most about 60,000 long connections to a given server from a single client IP. To get more client connections you can use more machines or network interfaces, or virtual IP addresses. For example, the commands below add 19 IP addresses, one used by the server and the other 18 by the clients, giving 18 * 60000 = 1,080,000 connections. The client then binds each socket to one of these addresses, as sketched after the command block.
ifconfig eth0:0 192.168.77.10 netmask 255.255.255.0 up
ifconfig eth0:1 192.168.77.11 netmask 255.255.255.0 up
ifconfig eth0:2 192.168.77.12 netmask 255.255.255.0 up
ifconfig eth0:3 192.168.77.13 netmask 255.255.255.0 up
ifconfig eth0:4 192.168.77.14 netmask 255.255.255.0 up
ifconfig eth0:5 192.168.77.15 netmask 255.255.255.0 up
ifconfig eth0:6 192.168.77.16 netmask 255.255.255.0 up
ifconfig eth0:7 192.168.77.17 netmask 255.255.255.0 up
ifconfig eth0:8 192.168.77.18 netmask 255.255.255.0 up
ifconfig eth0:9 192.168.77.19 netmask 255.255.255.0 up
ifconfig eth0:10 192.168.77.20 netmask 255.255.255.0 up
ifconfig eth0:11 192.168.77.21 netmask 255.255.255.0 up
ifconfig eth0:12 192.168.77.22 netmask 255.255.255.0 up
ifconfig eth0:13 192.168.77.23 netmask 255.255.255.0 up
ifconfig eth0:14 192.168.77.24 netmask 255.255.255.0 up
ifconfig eth0:15 192.168.77.25 netmask 255.255.255.0 up
ifconfig eth0:16 192.168.77.26 netmask 255.255.255.0 up
ifconfig eth0:17 192.168.77.27 netmask 255.255.255.0 up
ifconfig eth0:18 192.168.77.28 netmask 255.255.255.0 up
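Whatever WebSocket client library is used, the underlying trick is to bind each outgoing socket to one of these virtual IPs before connecting, so that every local IP contributes its own pool of ephemeral ports. A minimal plain-socket sketch (the server address and port here are illustrative, not from the original test tool):

```scala
import java.net.{InetSocketAddress, Socket}

object BindToVirtualIp extends App {
  // Bind to one of the virtual client IPs configured above before connecting.
  val socket = new Socket()
  socket.bind(new InetSocketAddress("192.168.77.11", 0)) // port 0: kernel picks one
  socket.connect(new InetSocketAddress("192.168.77.10", 8080)) // illustrative server
  println(s"local address: ${socket.getLocalSocketAddress}")
  socket.close()
}
```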
Modify /etc/sysctl.conf file:
net.ipv4.ip_local_port_range = 1024 65535
Run /sbin/sysctl -p for the change to take effect immediately.
Server testing
In the actual test I used one AWS c3.4xlarge (16 cores, 32 GB memory) as the application server and two AWS c3.2xlarge (8 cores, 16 GB memory) machines as clients. Two machines were more than enough for the test clients: each created ten virtual intranet IP addresses and opened 60,000 WebSocket connections per IP.
The client configuration is as follows. /etc/sysctl.conf:
fs.file-max = 2000000
fs.nr_open = 2000000
net.ipv4.ip_local_port_range = 1024 65535
/etc/security/limits.conf:
* soft nofile 2000000
* hard nofile 2000000
* soft nproc 2000000
* hard nproc 2000000
The server configuration is as follows. /etc/sysctl.conf:
fs.file-max = 2000000
fs.nr_open = 2000000
net.ipv4.ip_local_port_range = 1024 65535
/etc/security/limits.conf:
* soft nofile 2000000
* hard nofile 2000000
* soft nproc 2000000
* hard nproc 2000000
Netty server
- Establishing 1.2 million connections without sending messages is easy; about 14 GB of memory remains free.
[root@colobu ~]# ss -s; free -m
Total: 1200231 (kernel 1200245)
TCP: 1200006 (estab 1200002, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 4
Transport Total IP IPv6
* 1200245 - -
RAW 0 0 0
UDP 1 1 0
TCP 1200006 1200006 0
INET 1200007 1200007 0
FRAG 0 0 0
total used free shared buffers cached
Mem: 30074 15432 14641 0 9 254
-/+ buffers/cache: 15167 14906
Swap: 815 0 815
- Every minute, a message containing the current server time is pushed to all 1.2 million WebSockets. Sending is done from a single thread, and broadcasting to all 1.2 million connections takes about 15 seconds (a sketch of this broadcast loop follows the log below).
02:15:43.307 [pool-1-thread-1] INFO com.colobu.webtest.netty.WebServer$ - send msg to channels for c4453a26-bca6-42b6-b29b-43653767f9fc
02:15:57.190 [pool-1-thread-1] INFO com.colobu.webtest.netty.WebServer$ - sent 1200000 channels for c4453a26-bca6-42b6-b29b-43653767f9fc
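For reference, the heart of such a broadcast in Netty is small. This is a minimal sketch under the assumption that every connection is registered in a ChannelGroup; it is not the exact benchmark code:

```scala
import io.netty.channel.group.DefaultChannelGroup
import io.netty.handler.codec.http.websocketx.TextWebSocketFrame
import io.netty.util.concurrent.GlobalEventExecutor
import java.util.concurrent.{Executors, TimeUnit}

object Broadcaster {
  // All connected channels; handlers add a channel here after the handshake.
  val channels = new DefaultChannelGroup(GlobalEventExecutor.INSTANCE)

  // A single scheduler thread pushes the current time to every channel each
  // minute; the group duplicates the frame for each channel internally.
  val scheduler = Executors.newSingleThreadScheduledExecutor()
  scheduler.scheduleAtFixedRate(new Runnable {
    def run(): Unit =
      channels.writeAndFlush(new TextWebSocketFrame(new java.util.Date().toString))
  }, 1, 1, TimeUnit.MINUTES)
}
```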
CPU usage stays low and network bandwidth peaks at around 10 MB/s.
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
0 0 100 0 0 0| 0 0 | 60B 540B| 0 0 | 224 440
0 0 100 0 0 0| 0 0 | 60B 870B| 0 0 | 192 382
0 0 100 0 0 0| 0 0 | 59k 74k| 0 0 |2306 2166
2 7 87 0 0 4| 0 0 |4998k 6134k| 0 0 | 169k 140k
1 7 87 0 0 5| 0 0 |4996k 6132k| 0 0 | 174k 140k
1 7 87 0 0 5| 0 0 |4972k 6102k| 0 0 | 176k 140k
1 7 87 0 0 5| 0 0 |5095k 6253k| 0 0 | 178k 142k
2 7 87 0 0 5| 0 0 |5238k 6428k| 0 0 | 179k 144k
1 7 87 0 0 5| 0 24k|4611k 5660k| 0 0 | 166k 129k
1 7 87 0 0 5| 0 0 |5083k 6238k| 0 0 | 175k 142k
1 7 87 0 0 5| 0 0 |5277k 6477k| 0 0 | 179k 146k
1 7 87 0 0 5| 0 0 |5297k 6500k| 0 0 | 179k 146k
1 7 87 0 0 5| 0 0 |5383k 6607k| 0 0 | 180k 148k
1 7 87 0 0 5| 0 0 |5504k 6756k| 0 0 | 184k 152k
1 7 87 0 0 5| 0 48k|5584k 6854k| 0 0 | 183k 152k
1 7 87 0 0 5| 0 0 |5585k 6855k| 0 0 | 183k 153k
1 7 87 0 0 5| 0 0 |5589k 6859k| 0 0 | 184k 153k
1 5 91 0 0 3| 0 0 |4073k 4999k| 0 0 | 135k 110k
0 0 100 0 0 0| 0 32k| 60B 390B| 0 0 |4822 424
Clients (20 in total; one is shown here): each maintains 60,000 connections. The end-to-end time from server send to client receipt averages 633 ms, with a very small standard deviation; latency is similar across connections.
Active WebSockets for eb810c24-8565-43ea-bc27-9a0b2c910ca4
count = 60000
WebSocket Errors for eb810c24-8565-43ea-bc27-9a0b2c910ca4
count = 0
-- Histograms ------------------------------------------------------------------
Message latency for eb810c24-8565-43ea-bc27-9a0b2c910ca4
count = 693831
min = 627
max = 735
mean = 633.06
stddev = 9.61
median = 631.00
75% <= 633.00
95% <= 640.00
98% <= 651.00
99% <= 670.00
99.9% <= 735.00
-- Meters ----------------------------------------------------------------------
Message Rate for eb810c24-8565-43ea-bc27-9a0b2c910ca4
count = 693832
mean rate = 32991.37 events/minute
1-minute rate = 60309.26 events/minute
5-minute rate = 53523.45 events/minute
15-minute rate = 31926.26 events/minute
Each client averages about 1,000 messages per second received, for a total of roughly 20,000 RPS. Latency averages 633 ms, with a maximum of 735 ms and a minimum of 627 ms.
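The client reports above match the output format of Dropwizard Metrics' ConsoleReporter. Assuming the server embeds its send time in each message, the client could record latency and rate roughly like this (metric names are illustrative, not necessarily the test tool's actual code):

```scala
import com.codahale.metrics.{ConsoleReporter, MetricRegistry}
import java.util.concurrent.TimeUnit

object ClientMetrics {
  val registry = new MetricRegistry
  val latency  = registry.histogram("Message latency")
  val rate     = registry.meter("Message Rate")

  // Called for every received push; serverSendMillis is parsed from the body.
  def onMessage(serverSendMillis: Long): Unit = {
    latency.update(System.currentTimeMillis() - serverSendMillis)
    rate.mark()
  }

  // Print histograms and meters once a minute, with rates in events/minute.
  ConsoleReporter.forRegistry(registry)
    .convertRatesTo(TimeUnit.MINUTES)
    .build()
    .start(1, TimeUnit.MINUTES)
}
```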
The Spray server
- Establishing 1.2 million connections without sending messages is easy. Memory usage is relatively high, with about 7 GB left free.
[root@colobu ~]# ss -s; free -m
Total: 1200234 (kernel 1200251)
TCP: 1200006 (estab 1200002, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 4
Transport Total IP IPv6
* 1200251 - -
RAW 0 0 0
UDP 1 1 0
TCP 1200006 1200006 0
INET 1200007 1200007 0
FRAG 0 0 0
total used free shared buffers cached
Mem: 30074 22371 7703 0 10 259
-/+ buffers/cache: 22100 7973
Swap: 815 0 815
- Every minute, a message with the current server time is pushed to all 1.2 million WebSockets. CPU usage is high and sending is fast, with bandwidth peaking around 46 MB/s; a broadcast takes about 8 seconds (a sketch of the actor-based fan-out follows the log below).
05/22 04:42:57.569 INFO [ool-2-worker-15] c.c.w.s.WebServer - send msg to workers for 8454e7d8-b8ca-4881-912b-6cdf3e6787bf
05/22 04:43:05.279 INFO [ool-2-worker-15] c.c.w.s.WebServer - sent msg to workers for 8454e7d8-b8ca-4881-912b-6cdf3e6787bf. current workers: 1200000Copy the code
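spray-websocket drives each connection with a dedicated actor, so a broadcast means delivering the message to every per-connection worker, which is consistent with the higher CPU usage and wider latency spread seen here. A minimal sketch of that fan-out (the message types are illustrative, not the article's actual code):

```scala
import akka.actor.{Actor, ActorRef}

case class Register(worker: ActorRef)
case class Unregister(worker: ActorRef)
case class Broadcast(text: String)
case class Push(text: String)

// Tracks the per-connection worker actors and fans a broadcast out to them.
class BroadcastActor extends Actor {
  private var workers = Set.empty[ActorRef]

  def receive = {
    case Register(w)     => workers += w
    case Unregister(w)   => workers -= w
    case Broadcast(text) => workers.foreach(_ ! Push(text)) // each worker writes a frame
  }
}
```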
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
74 9 14 0 0 3| 0 24k|6330k 20M| 0 0 | 20k 1696
70 23 0 0 0 6| 0 64k| 11M 58M| 0 0 | 18k 2526
75 11 6 0 0 7| 0 0 |9362k 66M| 0 0 | 24k 11k
82 4 8 0 0 6| 0 0 | 11M 35M| 0 0 | 24k 10k
85 0 14 0 0 1| 0 0 |8334k 12M| 0 0 | 44k 415
84 0 15 0 0 1| 0 0 |9109k 16M| 0 0 | 36k 425
81 0 19 0 0 0| 0 24k| 919k 858k| 0 0 | 23k 629
76 0 23 0 0 0| 0 0 | 151k 185k| 0 0 | 18k 1075
Clients (20 in total; one is shown here): each maintains 60,000 connections. The end-to-end time from server send to client receipt averages 1412 ms, with a large standard deviation; latency varies greatly across connections.
Active WebSockets for 6674c9d8-24c6-4e77-9fc0-58afabe7436f
count = 60000
WebSocket Errors for 6674c9d8-24c6-4e77-9fc0-58afabe7436f
count = 0
-- Histograms ------------------------------------------------------------------
Message latency for 6674c9d8-24c6-4e77-9fc0-58afabe7436f
count = 454157
min = 716
max = 9297
mean = 1412.77
stddev = 1102.64
median = 991.00
75% <= 1449.00
95% <= 4136.00
98% <= 4951.00
99% <= 5308.00
99.9% <= 8854.00
-- Meters ----------------------------------------------------------------------
Message Rate for 6674c9d8-24c6-4e77-9fc0-58afabe7436f
count = 454244
mean rate = 18821.51 events/minute
1-minute rate = 67705.18 events/minute
5-minute rate = 49917.79 events/minute
15-minute rate = 24355.57 events/minute
Undertow
- Establishing 1.2 million connections without sending messages is easy. Memory usage is lower, with about 11 GB left free.
[root@colobu ~]# ss -s; free -m
Total: 1200234 (kernel 1200240)
TCP: 1200006 (estab 1200002, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 4
Transport Total IP IPv6
* 1200240 - -
RAW 0 0 0
UDP 1 1 0
TCP 1200006 1200006 0
INET 1200007 1200007 0
FRAG 0 0 0
total used free shared buffers cached
Mem: 30074 18497 11576 0 10 286
-/+ buffers/cache: 18200 11873
Swap: 815 0 815
- Every minute, a message with the current server time is pushed to all 1.2 million WebSockets. A broadcast takes about 15 seconds (a sketch follows the log below).
03:19:31.154 [pool-1-thread-1] INFO com.colobu.webtest.undertow.WebServer$ - send msg to channels for d9b450da-2631-42bc-a802-44285f63a62d
03:19:46.755 [pool-1-thread-1] INFO com.colobu.webtest.undertow.WebServer$ - sent 1200000 channels for d9b450da-2631-42bc-a802-44285f63a62d
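Undertow's core API provides WebSockets.sendText for pushing a text frame to a channel. A minimal sketch of the broadcast, under the assumption that the server keeps connected channels in a collection (the benchmark code may differ):

```scala
import io.undertow.websockets.core.{WebSocketChannel, WebSockets}

object UndertowBroadcast {
  // channels would be populated by the connection callback on handshake.
  def broadcast(channels: Iterable[WebSocketChannel], text: String): Unit =
    channels.foreach { ch =>
      // null callback: fire-and-forget, matching a best-effort push
      WebSockets.sendText(text, ch, null)
    }
}
```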
Clients (20 in total; one is shown here): each maintains 60,000 connections. The end-to-end time from server send to client receipt averages 672 ms, with a small standard deviation; latency varies little across connections.
Active WebSockets for b2e95e8d-b17a-4cfa-94d5-e70832034d4d
count = 60000
WebSocket Errors for b2e95e8d-b17a-4cfa-94d5-e70832034d4d
count = 0
-- Histograms ------------------------------------------------------------------
Message latency for b2e95e8d-b17a-4cfa-94d5-e70832034d4d
count = 460800
min = 667
max = 781
mean = 672.12
stddev = 5.90
median = 671.00
75% <= 672.00
95% <= 678.00
98% <= 684.00
99% <= 690.00
99.9% <= 776.00
-- Meters ----------------------------------------------------------------------
Message Rate for b2e95e8d-b17a-4cfa-94d5-e70832034d4d
count = 460813
mean rate = 27065.85 events/minute
1-minute rate = 69271.67 events/minute
5-minute rate = 48641.78 events/minute
15-minute rate = 24128.67 events/minute
Setup Rate for b2e95e8d-b17a-4cfa-94d5-e70832034d4d
Node.js
Node.js is not a framework I am considering; it is listed here for reference only. Its performance is also quite good.
Active WebSockets for 537c7f0d-e58b-4996-b29e-098fe2682dcf
count = 60000
WebSocket Errors for 537c7f0d-e58b-4996-b29e-098fe2682dcf
count = 0
-- Histograms ------------------------------------------------------------------
Message latency for 537c7f0d-e58b-4996-b29e-098fe2682dcf
count = 180000
min = 808
max = 847
mean = 812.10
stddev = 1.95
median = 812.00
75% <= 812.00
95% <= 813.00
98% <= 814.00
99% <= 815.00
99.9% <= 847.00
-- Meters ----------------------------------------------------------------------
Message Rate for 537c7f0d-e58b-4996-b29e-098fe2682dcf
count = 180000
mean rate = 7191.98 events/minute
1-minute rate = 10372.33 events/minute
5-minute rate = 16425.78 events/minute
15-minute rate = 9080.53 events/minute
Reference documentation
- HTTP long connection 2 million attempts and tuning
- Maximum number of open file descriptors for Linux
- The target of 1M concurrent connections was achieved
- Zhihu: How to achieve 3 million long connections in a single server?
- Build the C1000K server
- The secret to multi-millionth concurrency implementation
- C1000k New idea: user-mode TCP/IP stack
- github.com/xiaojiaqi/C…
- 600k concurrent websocket connections on AWS using Node.js
- plumbr.eu/blog/memory…
- it.deepinmind.com/java/2014/0…
- access.redhat.com/documentati…
- www.nateware.com/linux-netwo…
- warmjade.blogspot.jp/2014_03_22_…
- mp.weixin.qq.com/s?__biz=MjM…