1. Foreword

When we deploy or operate network systems, we tend to care more about basic connectivity than about overall performance; and even when we do consider performance, anomalies, or stability, we often find there is no suitable way to test or monitor network health. This article first gives a brief introduction to network performance testing indicators: which indicators to focus on, and how to estimate the expected performance of a system before testing so that the test environment can be planned and deployed sensibly. It then introduces basic network connectivity testing tools and network performance testing tools, and shows how to deploy monitoring during performance tests so that the indicators can be observed more intuitively. This article is not about reinventing the wheel, since good documentation already exists in the community; it is only a brief introduction with a few simple demos, intended to offer some solutions or ideas when you run into problems.

1.1 Network Performance Test Indicators

The purpose of performance testing is to put the system under load against the expected performance targets and verify whether it can reach them without hitting resource bottlenecks. Different services and application scenarios emphasize different indicators. For network performance testing of the underlying system, the following indicators are usually the focus:

  • Availability: Availability is the most basic requirement for a system. Before starting any test, check network connectivity. Common tools include ping, fping, telnet, and curl. Note that ping is implemented on top of ICMP: a successful ping only proves that the network path is reachable, not that a particular port is accessible. To verify port-level connectivity and service availability, use telnet or curl (see the sketch after this list).
  • Bandwidth (bps): the number of bits that can be transmitted per second. Bandwidth usually refers to the maximum achievable rate between nodes in the network. Because the links between nodes are not fully visible, this limit is usually determined by the capacity of the devices that make up the network.
  • Throughput: the bandwidth actually available to applications between two nodes at a given time; it is where network bottlenecks show up. Throughput follows the weakest-link (bucket) effect: even if the client and server each have a 100M Ethernet card, if both are connected through a 10M switch, the 10M switch is the bottleneck. Throughput is bounded by bandwidth, and throughput/bandwidth gives the utilization of the link.
  • Packets per second (PPS): the transmission rate measured in network packets. PPS is usually used to evaluate forwarding capability; forwarding on a Linux server is strongly affected by packet size, so very small packets (e.g. 1-byte payloads) are usually chosen when testing PPS in the system's limit scenario.
  • Delay: the time from sending a request at one end to receiving the response from the remote end. Its exact meaning depends on the scenario: it can be the time to establish a connection (such as the TCP handshake delay) or the round-trip time of a packet (such as RTT).
  • Packet loss rate: the ratio of lost packets to sent packets, usually measured within the throughput range. It is related to congestion on each hop between the client and the server: switches and routers have limited processing capacity, so they drop packets when traffic exceeds what they can handle. TCP/IP automatically retransmits lost packets, and the retransmissions add even more traffic, so congestion tends to push the loss rate higher and higher, much like a traffic jam on a road.
  • Jitter rate: the variation of network delay, caused by adjacent packets of the same application experiencing different delays along the transmission path. Calculation: jitter rate = (delay difference between adjacent packets) / (difference of their sequence numbers).
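
A minimal sketch of the availability checks described in the first bullet (the addresses, port, and URL are placeholders, not from the original environment):

# ICMP reachability: 3 probes, stop after a 5-second deadline
ping -c 3 -w 5 172.16.1.3

# TCP port reachability without sending data, 3-second timeout
nc -vz -w 3 172.16.1.3 22

# service-level check over HTTP: print only the status code
curl -s -o /dev/null -w "%{http_code}\n" http://172.16.1.3:80/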

1.2 Network performance evaluation of the system

Generally, a system's performance targets are clearly defined in the product requirements document, and QA engineers run the test tasks against the deployed environment. For exploratory performance testing, QA engineers usually need to know the reasonable range of the relevant indicators in advance. For example, when testing network performance, bandwidth is directly tied to the physical NIC configuration: once the NIC is fixed, the bandwidth ceiling is fixed (the actual bandwidth is limited by the weakest component in the whole network path). To make a more useful forecast, you can refer to documentation, published performance data of competing products, or empirical analysis. For example, to test the maximum bandwidth of a gateway, assume the gateway node is configured with two 82599 10Gb NICs bonded as bond1; the theoretical maximum bandwidth is then at most 20Gb. Accounting for physical losses, the practical estimate is roughly 15Gb to 18Gb. Assume the client and server are deployed on physical machines of the same specification and a single dedicated server can drive about 8Gb; to saturate the gateway node, at least three dedicated servers are therefore needed on each of the client and server sides. A back-of-the-envelope version of this estimate is sketched below.
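
A rough calculation of the numbers above, as a sketch (the efficiency range and per-host limit are assumptions taken from the example, not measured values):

#!/bin/bash
# two 10Gb NICs bonded together
bond_gbps=20
# assumed efficiency after physical losses: 75% - 90%
low=$(( bond_gbps * 75 / 100 ))    # ~15 Gb/s
high=$(( bond_gbps * 90 / 100 ))   # ~18 Gb/s
echo "estimated gateway bandwidth: ${low}-${high} Gb/s"

# assumed achievable bandwidth per client/server host
per_host_gbps=8
# hosts needed on each side to saturate the upper estimate (round up)
hosts=$(( (high + per_host_gbps - 1) / per_host_gbps ))
echo "hosts needed per side: ${hosts}"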

2. Network connectivity testing tools

After the test plan has been designed and reviewed and the environment deployed, the first thing to do is test service availability. The relevant tools were mentioned above; here the general use of ping is briefly introduced.

2.1.1 ping

ICMP (Internet Control Message Protocol) is a sub-protocol of TCP/IP used to pass control messages between IP hosts and routers. Control messages are messages about the network itself, such as whether the network is connected, whether a host is reachable, and whether a route is available. Although these messages carry no user data, they play an important role in delivering user data. The ping command is implemented on top of ICMP; it can be used to check network connectivity, get a rough sense of network latency, and resolve domain names. By default ping keeps sending ICMP packets until the user stops it manually. The following common parameters control how packets are sent:

  • -c specifies the number of packets to send
  • -w specifies the overall deadline (maximum time to wait), in seconds
  • -I specifies the source interface or source address used to send packets, which is useful when the host has multiple NICs
  • -i specifies the interval between packets, in seconds
  • -s specifies the payload size of the packets to send; the default is 56 bytes of data (64 bytes including the ICMP header)
Pressing Ctrl + \ while ping is running prints a summary of the current statistics: the number of packets sent and received, the loss rate, and so on. For example, a secondary IP is configured on eth0 of the cloud host below, and we want to test its network connectivity:

root@pytest-likailiang-1t3j9c-2:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:b7:b2:ca brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.2/24 brd 172.16.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 172.16.1.3/24 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:feb7:b2ca/64 scope link
       valid_lft forever preferred_lft forever
root@pytest-likailiang-1t3j9c-2:~# ping 192.168.1.70 -I 172.16.1.3 -c 10 -i 2 -w 10 -s 64
PING 192.168.1.70 (192.168.1.70) from 172.16.1.3: 56(84) bytes of data.
64 bytes from 192.168.1.70: icmp_seq=1 ttl=63 time=1.51 ms
64 bytes from 192.168.1.70: icmp_seq=2 ttl=63 time=0.972 ms
2/2 packets, 0% loss, min/avg/ewma/max = 0.972/1.242/1.444/1.512 ms
64 bytes from 192.168.1.70: icmp_seq=3 ttl=63 time=0.766 ms
64 bytes from 192.168.1.70: icmp_seq=4 ttl=63 time=0.716 ms
64 bytes from 192.168.1.70: icmp_seq=5 ttl=63 time=0.681 ms
5/5 packets, 0% loss, min/avg/ewma/max = 0.681/0.929/1.204/1.512 ms
--- 192.168.1.70 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 8010ms
rtt min/avg/max/mdev = 0.681/0.929/1.512/0.309 ms
root@pytest-likailiang-1t3j9c-2:~#
Unlike ping, which waits for a reply or a timeout before probing the next host, fping sends a packet to one host and immediately moves on to the next, which allows it to ping many hosts in parallel. If a host responds, it is marked as alive and removed from the waiting list; if it does not, it is considered unreachable and stays in the list for later retries. fping behaves much like ping, but it can take a range of hosts on the command line or a file containing a list of hosts to ping; two typical invocations are sketched below.
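
A minimal sketch of the two target-selection styles just mentioned (the subnet and the hosts.txt file are placeholders):

# ping every address in a subnet and print only the hosts that respond
fping -a -q -g 172.16.1.0/24

# ping the hosts listed in a file, 3 probes each, with per-host statistics
fping -c 3 -f hosts.txt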

2.1.2 MTR

MTR (short for My Traceroute) is a more powerful network diagnostic tool that combines ping and traceroute in one program. By default MTR probes the path with ICMP packets; the -u option switches it to UDP probes. Compared with traceroute, MTR keeps probing the nodes along the path and reports statistics for each hop, which averages out transient fluctuations and makes the result more accurate.

MTR is available for both Linux and Windows. On Windows you can use WinMTR, a graphical front end for MTR ([WinMTR download](https://sourceforge.net/projects/winmtr/files/)); it needs no installation, just unpack and run it. On Linux you can use the mtr command directly. Installation and command-line usage:

Installing the MTR Tool

apt-get install mtr -y
MTR (My Traceroute) is preinstalled on almost all Linux distributions and combines the functionality of the traceroute and ping commands in a single interactive interface:

root@slaver1:~# mtr -h
usage: mtr [--help] [--version] [-4|-6] [-F FILENAME]
           [--report] [--report-wide] [--displaymode MODE]
           [--xml] [--gtk] [--curses] [--raw] [--csv] [--json] [--split]
           [--no-dns] [--show-ips] [-o FIELDS] [-y IPINFO] [--aslookup]
           [-i INTERVAL] [-c COUNT] [-s PACKETSIZE] [-B BITPATTERN]
           [-Q TOS] [--mpls] [-a ADDRESS] [-f FIRST-TTL] [-m MAX-TTL]
           [-U MAX_UNKNOWN] [--udp] [--tcp] [--sctp] [-P PORT] [-L LOCALPORT]
           [-Z TIMEOUT] [-G GRACEPERIOD] [-M MARK] HOSTNAME
See the man page for details.
A report-mode run and the meaning of each column of the MTR output are shown below:

root@lkl-slaver1:~# mtr -c 2 -r 114.114.114.114
Start: Wed Aug 19 15:35:05 2020
HOST: slaver1                  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- ???                    100.0     2    0.0   0.0   0.0   0.0   0.0
  2.|-- ???                    100.0     2    0.0   0.0   0.0   0.0   0.0
  3.|-- ???                    100.0     2    0.0   0.0   0.0   0.0   0.0
  4.|-- ???                    100.0     2    0.0   0.0   0.0   0.0   0.0
  5.|-- ???                    100.0     2    0.0   0.0   0.0   0.0   0.0
  6.|-- 61.164.31.126           0.0%     2    1.7   2.8   1.7   3.9   1.4
  7.|-- 220.191.200.207         0.0%     2    5.2   5.4   5.2   5.7   0.0
  8.|-- 202.97.76.2             0.0%     2   11.5  11.6  11.5  11.7   0.0
  9.|-- 222.190.59.206          0.0%     2   14.3  16.3  14.3  18.2   2.6
 10.|-- 58.213.224.170          0.0%     2   27.6  22.1  16.5  27.6   7.9
 11.|-- public1.114dns.com      0.0%     2   14.3  14.3  14.3  14.4   0.0

Host:  IP address or domain name of the hop (press n to toggle between them)
Loss%: packet loss rate at this hop
Snt:   number of probe packets sent (default 10, configurable with -c)
Last:  delay of the last probe
Avg:   average probe delay
Best:  minimum probe delay
Wrst:  maximum probe delay
StDev: standard deviation; the larger the value, the less stable the hop
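
If ICMP is filtered somewhere along the path, mtr can probe with TCP or UDP instead; a small sketch with a placeholder destination:

# report mode, 10 probes per hop, TCP probes to port 443
mtr -r -c 10 --tcp -P 443 www.example.com

# the same path probed with UDP
mtr -r -c 10 --udp www.example.com
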
For more information, see the [Tencent Cloud community](https://cloud.tencent.com/developer/information/mtr%E6%B5%8B%E8%AF%95%E5%B7%A5%E5%85%B7).

2.1.3 netcat (nc)

NetCat is a very simple Unix tool that reads and writes data over TCP or UDP network connections. It is available on both Linux and Windows.

The basic form used by Netcat is:

nc [options] destination-address port
Common parameters are described as follows:

-k  Keep listening after the current connection ends
-l  Listen on a port instead of sending data
-n  Do not use DNS resolution
-N  Close the network connection on EOF
-p  Specify the source port
-u  Use UDP to transmit data
-v  (verbose) Show more details
-w  Specify the connection timeout
-z  Do not send data (useful for port scanning)
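
A few common invocations, as a sketch (hosts and ports are placeholders; note that flags differ slightly between netcat variants, e.g. the OpenBSD nc takes the listening port without -p):

# check whether TCP port 22 is open without sending data, 3-second timeout
nc -vz -w 3 172.16.1.3 22

# listen on TCP port 9000 on the server side ...
nc -l -p 9000
# ... and send a file to it from the client side
nc -w 3 172.16.1.3 9000 < /tmp/payload.txt

# probe a UDP port
nc -vzu -w 3 172.16.1.3 53
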
Example references: [Introduction to the netcat network tool on Linux](https://blog.konghy.cn/2020/04/03/linux-natcat/), [A summary of netcat, the "Swiss Army knife"](https://www.freebuf.com/sectool/168661.html).

2.1.4 xxxping

The ping command is an easy-to-use and very common network testing tool based on ICMP. However, for security reasons ICMP is blocked in many network and cloud environments, so it is worth mastering other popular ping-like tools such as tcpping, tcping, psping, hping, and paping. Community documentation for these tools is extensive, so they are not covered here.

3. Network performance testing tools

This section introduces the network performance testing tools iperf and netperf. Related tools such as wrk and pktgen may be covered later.

3.1 iperf

3.1.1 iperf introduction

Iperf (iperf3) is a network performance testing tool built on a client/server (C/S) architecture; it can measure TCP and UDP bandwidth and simulate network failures. These features make it useful for testing the performance of network devices such as routers, firewalls, and switches, and for evaluating system QoS. Iperf comes in Unix/Linux and Windows versions; the Unix/Linux version is updated frequently, while the Windows version lags behind. The Windows port is known as jperf (or xjperf), which builds a better UI and some new functionality on top of iperf.

3.1.2 IPERF installation and usage instructions

For installation from source, see [iperf](https://github.com/esnet/iperf); installing directly from the package manager is recommended:

~# apt-get install -y iperf3
For network performance testing it is recommended to follow the [Alibaba Cloud network performance test method](https://www.alibabacloud.com/help/zh/faq-detail/55757.htm#top), so that results can be compared horizontally against published data; see the references for more information.

3.1.3 iperf examples

The bandwidth test is usually performed in UDP mode, because it reports maximum bandwidth, delay jitter, and packet loss rate at the same time. In the first run, use the theoretical bandwidth of the link as the sending rate; for example, if the theoretical bandwidth between the client and the server is 100 Mbps, test with -b 100M first. Then, based on the results (actual bandwidth, jitter, and loss), use the measured actual bandwidth as the sending rate for the next run; the jitter and loss rate will usually be much better than in the first run. Repeat the test a few times and you will converge on a stable actual bandwidth.

UDP mode

Server side:

iperf -u -s
Client:

iperf -u -c 192.168.1.1 -b 100M -t 60
In UDP mode, send to server 192.168.1.1 at 100 Mbit/s for a test duration of 60 seconds.

iperf -u -c 192.168.1.1 -b 5M -P 30 -t 60
The client starts 30 parallel connection threads to the server, each sending data at 5 Mbps.

iperf -u -c 192.168.1.1 -b 100M -d -t 60
Perform a bidirectional (upstream and downstream) bandwidth test at a sending rate of 100M.

TCP mode

Server side:

iperf -s
Client:

iperf -c 192.168.1.1 -t 60
In TCP mode, test the upload bandwidth from the client to server 192.168.1.1 for 60 seconds.

iperf -c 192.168.1.1 -P 30 -t 60
The client initiates 30 connection threads to the server simultaneously.

iperf -c 192.168.1.1 -d -t 60
Perform upstream and downstream bandwidth tests
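
The examples above use the classic iperf (iperf2) flags. If only iperf3 is installed, as with the apt-get install iperf3 command earlier, the equivalent invocations differ slightly; a sketch assuming the same server address:

# server
iperf3 -s

# TCP test, 60 s, 30 parallel streams
iperf3 -c 192.168.1.1 -t 60 -P 30

# UDP test at 100 Mbit/s
iperf3 -c 192.168.1.1 -u -b 100M -t 60

# iperf3 has no -d dual-test flag; use -R to measure the reverse (download) direction
iperf3 -c 192.168.1.1 -t 60 -R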

# pps
iperf3 -s -p 3000
iperf3 -c server-ip -i 1 -t 2000 -V -p 3000 -b 1000000000 -l 100 -M 89

for i in $(seq 3200 3300); do screen -d -m iperf -s -p $i; done
for i in $(seq 3200 3300); do screen -d -m iperf -c server-ip -i 1 -t 2000 -V -p $i -b 1000000000 -l 100 -M 89; done

3.1.4 References

  • Alibaba Cloud network performance test method: https://www.alibabacloud.com/help/zh/faq-detail/55757.htm#top
  • Tencent Cloud network performance test method: https://cloud.tencent.com/document/product/213/11460
  • iperf command: https://man.linuxde.net/iperf
  • https://github.com/esnet/iperf

3.2 netperf

3.2.1 Netperf introduction

Netperf is a network performance measurement tool for TCP- or UDP-based transmission. It performs bulk data transfer and request/response network performance tests for different application patterns.

Netperf works in client/server mode. The server side is netserver, which listens for connections from clients; the client side is netperf, which initiates tests against the server. A control connection is first established between client and server to exchange the test configuration and results. Once the configuration has been transferred, a separate test connection is established to carry the specific traffic pattern used to measure network performance.

The open-source [netperf](https://hewlettpackard.github.io/netperf/) provided by the community only reports latency percentiles up to two nines (P99), which is not enough for high-precision latency testing. Cloud computing engineers at NetEase Hangzhou therefore made a secondary development that supports P9999 statistics; see the [netperf-9999](https://g.hz.netease.com/CloudQA/netperf-9999) project.

The common netperf command line parameters are as follows:

-H host      Specify the IP address of the remote netserver
-l testlen   Specify the test duration in seconds
-t testname  Specify the type of test to run: TCP_STREAM, UDP_STREAM, TCP_RR, TCP_CRR, UDP_RR
-s size      Set the socket send/receive buffer size of the local system
-S size      Set the socket send/receive buffer size of the remote system
-m size      Set the message size used by the local system when sending
-M size      Set the message size used by the remote system when receiving
-D           Set the TCP_NODELAY option on the sockets of the local and remote systems

The buffer, message size, and TCP_NODELAY options are test-specific and are placed after the "--" separator on the command line.
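
A minimal sketch combining the global options with test-specific options after the "--" separator (the address and sizes are illustrative, not from the original text):

# 60-second TCP bulk-transfer test against a host running netserver;
# -m sets the send message size, -s/-S set the local/remote socket buffers
netperf -H 172.16.0.153 -l 60 -t TCP_STREAM -- -m 1440 -s 256K -S 256K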

3.2.2 Netperf Network Test Mode Description

  • TCP_STREAM: netperf performs a TCP bulk transfer by default, i.e. -t TCP_STREAM. During the test, netperf sends TCP data in bulk to netserver to measure the throughput of the transfer.
  • UDP_STREAM: tests network performance for UDP bulk transfer. Note that the test message size must not be larger than the socket send/receive buffer size, otherwise netperf reports an error.
  • pssh -i -h client-ips -O StrictHostKeyChecking=no 'for node in $(cat server-ips); do for i in $(seq 12010 12015); do screen -d -m netperf -H $node -p $i -t UDP_STREAM -l 100 -- -m 1 -R 1; done; done'
  • TCP_RR: the test consists of multiple TCP request/response transactions over a single established connection. This mode can be used to obtain the maximum bandwidth or PPS of long-lived connections.
  • pssh -i -h client-ips -O StrictHostKeyChecking=no 'for node in $(cat server-ips); do for i in $(seq 12010 12015); do screen -d -m netperf -H $node -p $i -t TCP_RR -l 100 -- -r 1B,1B; done; done'
  • Test network latency and throughput: ./netperf_client_9999 -t TCP_RR -H <server_ip> -l 60 -- -r 1B,1B -o "MIN_LATENCY,MAX_LATENCY,MEAN_LATENCY,P90_LATENCY,P99_LATENCY,P999_LATENCY,P9999_LATENCY,STDDEV_LATENCY,THROUGHPUT,THROUGHPUT_UNITS"
  • TCP_CRR: differs from TCP_RR in that TCP_CRR establishes a new TCP connection for every transaction.
  • pssh -i -h client-ips -O StrictHostKeyChecking=no 'for node in $(cat server-ips); do for i in $(seq 12010 12015); do screen -d -m netperf -H $node -p $i -t TCP_CRR -l 100 -- -r 1B,1B; done; done'
  • UDP_RR: uses UDP packets to carry the request/response transactions; a sketch follows this list.
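
A minimal single-host UDP_RR sketch (the server address is a placeholder):

# 60-second UDP request/response test with 1-byte requests and responses
netperf -H <server_ip> -t UDP_RR -l 60 -- -r 1,1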

3.2.3 Examples

Test the network bandwidth (bps)

Start netserver on the peer machine. netserver is netperf's built-in server program and can be started without any arguments.

root@lkl-stress-test-nginx-017:~#   netserver -p 49999
Starting netserver with host 'IN(6)ADDR_ANY' port '49999' and family AF_UNSPEC
Enter the following command to start Netperf on your computer:

$ netperf -H <remote IP address>

root@lkl-stress-test-nginx-016:~# netperf -H 172.16.0.153
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.0.153 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 8192  65536   65536    10.00    15955.55
As you can see, the measured throughput is about 16 Gbit/s (15955.55 x 10^6 bits/sec). The meanings of the fields in the output are as follows:

![](https://pic2.zhimg.com/v2-3cb591171ce3f9b62ed260e7121a5a69_b.png)

  • PPS
  • PPS is the number of successful packets sent per second
  • The calculation method is as follows: PPS = Number of successful packets sent/Test time
  • You can run the sar command on the server to count the packets actually received; the command below samples the NIC statistics once per second, 300 times:

    sar -n DEV 1 300
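
In the sar report, the rxpck/s and txpck/s columns give the packets received and transmitted per second for each interface; a small sketch that extracts the averages for eth0 (the interface name is an assumption):

sar -n DEV 1 300 | awk '/^Average:/ && $2=="eth0" {print "avg rx pps:", $3, " avg tx pps:", $4}'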

Test delay

Start netServer on the peer machine. Enter the following command to start Netperf on your computer:

root@lkl-stress-test-nginx-016:~# netperf -H 172.16.0.153 -t omni -- -d rr -o "THROUGHPUT_UNITS,MIN_LATENCY,MAX_LATENCY,MEAN_LATENCY"
OMNI Send|Recv TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.0.153 () port 0 AF_INET : demo
Throughput Throughput   Minimum      Maximum      Mean
           Units        Latency      Latency      Latency
                        Microseconds Microseconds Microseconds

22547.84   Trans/s      36           3605         44.23
In request/response test mode, the throughput is 22547.84 transactions per second, with a minimum latency of 36 microseconds, a maximum of 3605 microseconds, and a mean of 44.23 microseconds. In other words, a small number of packets see millisecond-level latency, but most are on the order of 100 microseconds or less.

3.2.4 References

  • Physical line network performance test method: https://help.aliyun.com/document_detail/58625.html
  • Netperf and network performance measurement: https://www.ibm.com/developerworks/cn/linux/l-netperf/index.html
  • https://www.cnblogs.com/xieshengsen/p/6493277.html
  • https://sq.163yun.com/blog/article/190965728772210688
  • PPS limit test of a layer-3 network: https://blog.csdn.net/minxihou/article/details/84930250
  • Netperf and iperf network performance test summary: https://wsgzao.github.io/post/netperf/
  • https://blog.didiyun.com/index.php/2018/12/07/netperf/
  • Network performance test method: https://www.alibabacloud.com/help/zh/faq-detail/55757.htm
  • HTTP performance testing with wrk, a tutorial: https://juejin.cn/post/6844903550288396296

4. Network data monitoring tools

4.1 netdata

Practical effect: the figure below shows netdata monitoring the traffic of the eth0 NIC on a cloud host; the receive and transmit directions are presented much more intuitively than sar statistics.

![](https://pic4.zhimg.com/v2-ca9eb5b50972c11092f2fec73b8d62bf_b.png)


Netdata can integrate fping (introduced in section 2.1.1) to monitor network connectivity. The fping.conf configuration file is as follows:

~# cat fping.conf
fping="/usr/local/bin/fping"
hosts="172.16.1.2 10.10.10.10"    # separate IP addresses or domain names with spaces
update_every=1
ping_every=1000
fping_opts="-b 56 -r 0 -t 5000"
  • Enable fping monitoring by mounting fping.conf into the netdata container and starting it as follows:

docker run -d --name=netdata \
  -p 19999:19999 \
  -v /etc/passwd:/host/etc/passwd:ro \
  -v /etc/group:/host/etc/group:ro \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /root/fping.conf:/etc/netdata/fping.conf:ro \
  --cap-add SYS_PTRACE \
  --security-opt apparmor=unconfined \
  hub.c.163.com/nvsoline2/netdata:fping

![](https://pic2.zhimg.com/v2-faf5673ca57d862cf84201f06d37d35d_b.png)

  • Persisting monitoring data for 30 days
The dbengine memory mode needs to be configured; about 5 GB of disk space can theoretically hold 2,000 data points per second for 30 days.

root@jiande1-dgw-jiande1:~# cat netdata.conf
[global]
    memory mode = dbengine
    page cache size = 32
    dbengine disk space = 4999
Start with the netdata.conf configuration above.

  • Portcheck configuration
Port checks are provided by the go.d.plugin collector (https://learn.netdata.cloud/docs/agent/collectors/go.d.plugin).

~# cat portcheck.conf
update_every: 1
jobs:
  - name: server1
    host: 127.0.0.1
    ports:
      - 22
  - name: server2
    host: 59.111.96.215
    ports:
      - 9009
  - name: nlb-beta-test
    host: 59.111.245.80
    update_every: 1
    ports:
      - 80
go.d.conf, which enables the modules to be monitored:

~# cat go.d.conf 
# netdata go.d.plugin configuration
#
# This file is in YaML format.

# Enable/disable the whole go.d.plugin.
enabled: yes

# Enable/disable default value for all modules.
default_run: yes

# Maximum number of used CPUs. Zero means no limit.
max_procs: 0

# Enable/disable specific g.d.plugin module
# If you want to change any value, you need to uncomment out it first.
# IMPORTANT: Do not remove all spaces, just remove # symbol. There should be a space before module name.
modules:
  example: yes
  nginx: yes
  portcheck: yes
Nginx monitoring demo

~# cat nginx.conf
jobs:
  - name: local
    url: http://10.199.128.66/nginx_status
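
For this collector to work, nginx must expose its status page via the stub_status module; a minimal sketch of the corresponding server-side configuration (the allowed network is an assumption and should match your monitoring hosts):

# inside the server block that serves 10.199.128.66
location /nginx_status {
    stub_status;              # older nginx releases use "stub_status on;"
    allow 10.199.128.0/24;    # assumption: restrict access to the monitoring network
    deny all;
}
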
Docker startup command

docker run -d --name=netdata -p 19999:19999 \
  -v /etc/passwd:/host/etc/passwd:ro \
  -v /etc/group:/host/etc/group:ro \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /root/fping.conf:/etc/netdata/fping.conf:ro \
  -v /root/example.conf:/etc/netdata/go.d/example.conf:ro \
  -v /root/portcheck.conf:/etc/netdata/go.d/portcheck.conf:ro \
  -v /root/nginx.conf:/etc/netdata/go.d/nginx.conf:ro \
  -v /root/go.d.conf:/etc/netdata/go.d.conf:ro \
  -v /root/netdata.conf:/etc/netdata/netdata.conf:ro \
  --cap-add SYS_PTRACE \
  --security-opt apparmor=unconfined \
  hub.c.163.com/nvsoline2/netdata:fping

4.2 References

  • Real-time system performance monitoring tool: https://github.com/netdata/netdata
  • https://www.hi-linux.com/tags/#NetData
  • https://cloud.tencent.com/developer/article/1409664
【Recommended reading】

[NetEase Shufan: An intelligent UI test automation solution](https://zhuanlan.zhihu.com/p/225795549)

[NetEase Shufan: Performance load testing, analysis and tuning practice](https://zhuanlan.zhihu.com/p/212664235)

[NetEase Shufan: The advanced road of testing and managing 10,000+ interfaces](https://zhuanlan.zhihu.com/p/166549356)