Since our microservices are built on .NET Core 3.1 and packaged with Docker, they need an underlying Linux operating system. CentOS is used here because our virtual machines also run CentOS and Docker sits on top of the same OS, which makes problems easier to track down and avoids strange issues caused by differences between operating systems. The Dockerfile for the packaged CentOS image is included below.

Packaging a CentOS Image

There is no official .NET Core image based on CentOS, so if you need one you have to build it yourself. Part of the Dockerfile is shown below as a starting point. The image is based on the official centos:7 image, the time zone is set to Asia/Shanghai, and it runs almost any .NET Core program perfectly, except that libgdiplus and fonts are not installed.

FROM centos:7
# This image provides a .NET Core 3.1 environment you can use to run your .NET
# applications.
ENV HOME=/opt/app-root \
    PATH=/opt/app-root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    DOTNET_APP_PATH=/opt/app-root/app \
    DOTNET_DATA_PATH=/opt/app-root/data \
    DOTNET_DEFAULT_CMD=default-cmd.sh \
    DOTNET_CORE_VERSION=3.1 \
    DOTNET_FRAMEWORK=netcoreapp3.1 \
    # Microsoft's images set this to enable detecting when an app is running in a container.
    DOTNET_RUNNING_IN_CONTAINER=true \
    DOTNET_SSL_CERT_DIR=/opt/app-root/ssl_dir

LABEL io.k8s.description="Platform for running .NET Core 3.1 applications" \
      io.k8s.display-name=".NET Core 3.1" \
      io.openshift.tags="runtime,.net,dotnet,dotnetcore,rh-dotnet31-runtime" \
      io.openshift.expose-services="8080:http" \
      io.openshift.s2i.scripts-url=image:///usr/libexec/s2i \
      io.s2i.scripts-url=image:///usr/libexec/s2i

# Labels consumed by Red Hat build service
LABEL name="dotnet/dotnet-31-runtime-container" \
      com.redhat.component="rh-dotnet31-runtime-container" \
      version="3.1" \
      release="1" \
      architecture="x86_64"

# Don't download/extract docs for nuget packages
ENV NUGET_XMLDOC_MODE=skip
# add chinese timezone
ENV TZ Asia/Shanghai
RUN ln -fs /usr/share/zoneinfo/${TZ} /etc/localtime \
    && echo ${TZ} > /etc/timezone

# Make dotnet command available even when scl is not enabled.
RUN ln -s /opt/app-root/etc/scl_enable_dotnet /usr/bin/dotnet

RUN yum install -y centos-release-dotnet && \
    INSTALL_PKGS="rh-dotnet31-aspnetcore-runtime-3.1 nss_wrapper tar unzip" && \
    yum install -y --setopt=tsflags=nodocs $INSTALL_PKGS && \
    rpm -V $INSTALL_PKGS && \
    yum clean all -y && \
	# yum cache files may still exist (and quite large in size)
    rm -rf /var/cache/yum/*

# Get prefix path and path to scripts rather than hard-code them in scripts
ENV CONTAINER_SCRIPTS_PATH=/opt/app-root \
    ENABLED_COLLECTIONS="rh-dotnet31"

# When bash is started non-interactively, to run a shell script, for example it
# looks for this variable and source the content of this file. This will enable
# the SCL for all scripts without need to do 'scl enable'.ENV BASH_ENV=${CONTAINER_SCRIPTS_PATH}/etc/scl_enable \ ENV=${CONTAINER_SCRIPTS_PATH}/etc/scl_enable \ PROMPT_COMMAND=".  ${CONTAINER_SCRIPTS_PATH}/etc/scl_enable"
# Add default user
RUN mkdir -p ${DOTNET_APP_PATH} ${DOTNET_DATA_PATH} && \
    useradd -u 1001 -r -g 0 -d ${HOME} -s /sbin/nologin \
      -c "Default Application User" default
# Run container by default as user with id 1001 (default); here it is overridden to run as root
USER 0

Now that you have the image, you are free to distribute .NET Core microservices based on CentOS.
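
As a sketch of how this base image might be used (the image tag mycompany/dotnet31-centos-runtime, the service name MyService, and the out/ publish directory are hypothetical, not part of the original Dockerfile), a microservice image could be built on top of it like this:

# Build the runtime image from the Dockerfile above (the tag is just an example)
#   docker build -t mycompany/dotnet31-centos-runtime:3.1 .

# Hypothetical Dockerfile for one microservice
FROM mycompany/dotnet31-centos-runtime:3.1
WORKDIR /opt/app-root/app
# Copy the output of `dotnet publish -c Release -o out`
COPY out/ .
EXPOSE 8080
ENTRYPOINT ["dotnet", "MyService.dll"]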

TCP network performance tuning: terminology

  1. NIC: Network Interface Card, i.e. the network card.
  2. DMA: Direct Memory Access, which here refers to moving data from a network device directly into an area of memory, saving CPU overhead.
  3. Ring Buffer: a circular buffer queue. During system startup the NIC registers itself with the system, which allocates a Ring Buffer queue and a dedicated area of kernel memory to the NIC for storing incoming packets.

Guidelines for performance tuning

First of all, performance tuning is complex: there are many interacting factors to consider, so there is no perfect off-the-shelf solution you can simply copy.

Performance tuning is constrained by:

  • Network and NIC capability
  • Driver features and configuration
  • Hardware configuration of the machine
  • CPU-to-memory architecture
  • Number of CPU cores
  • Kernel version

With so many factors involved, you can see why an operating system still needs so many parameters tuned after it leaves the factory.

How the network card receives packets

  • Ring Buffer

As the name suggests, it is a circular buffer queue, so if it is not large enough newer data can overwrite older data. The NIC is allocated an RX Ring Buffer for receiving data and a TX Ring Buffer for sending data. Packet processing involves both hardware interrupts and software interrupts (SoftIRQs).

  • Interrupts and interrupt handling

A hardware interrupt is the top half of packet handling: when the NIC receives data, it copies the data into a kernel buffer via DMA and notifies the kernel with a hardware interrupt, after which the interrupt handler takes over further processing. Because a hardware interrupt preempts whatever task is currently running and claims resources for itself, it is very expensive. Hardware interrupt handlers therefore keep their own work short and hand the rest off as software interrupts (SoftIRQs), so that the remaining work can be handled more gently.

#Viewing hardware Interrupts
cat /proc/interrupts
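
To see which CPUs are servicing a NIC's interrupts, the IRQ affinity mask can also be inspected (eth0 and the IRQ number 24 below are placeholders; take the real numbers from /proc/interrupts):

# Find the IRQ numbers used by the NIC
grep eth0 /proc/interrupts
# CPU affinity bitmask for one of those IRQs (24 is only an example)
cat /proc/irq/24/smp_affinity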
  • SoftIRQs

SoftIRQs are also a form of interrupt handling; while a SoftIRQ is running it is not preempted by other SoftIRQs on the same CPU. They are the mechanism designed to drain the Ring Buffer.

#Check RX and TX SoftIRQs
cat  /proc/softirqs | grep RX
cat  /proc/softirqs | grep TX
  • NAPI Polling

NAPI (the "new API") is designed around a polling (pull) model: instead of relying on a hardware interrupt for every incoming packet, the driver polls the device for data, which reduces the interrupt load on the system.

  • Network protocol stack

Once data has been received from the NIC into the kernel, subsequent processing is taken over by the different protocol stacks: Ethernet, ICMP, IPv4, IPv6, TCP, UDP, SCTP, and so on. Finally the data is placed in the socket buffer, where the application can read it; at that point the data is moved from kernel space to user space and the kernel's involvement ends.
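
To see data sitting in a socket's kernel buffers before the application reads it, the ss tool from iproute2 can show the queues and socket memory (a rough sketch; the exact fields depend on the iproute2 version):

# Recv-Q / Send-Q: bytes queued in the kernel that the application has not yet
# read / that the peer has not yet acknowledged
ss -t
# -m adds socket memory (skmem) details, -i adds internal TCP info such as rtt and cwnd
ss -tmi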

  • Network diagnostic tool

Netstat: obtains network statistics from the files under /proc/net

cat /proc/net/dev
cat /proc/net/tcp
cat /proc/net/unix

Dropwatch: monitors where the kernel drops packets (it watches kernel memory being freed)
ip: manages and monitors routing and device information
ethtool: displays and changes NIC parameter configuration
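
A few typical invocations of these tools, as a sketch (eth0 is a placeholder interface name):

# ip: per-interface statistics (RX/TX packets, errors, drops)
ip -s link show eth0
# ethtool: driver information and offload/feature settings
ethtool -i eth0
ethtool -k eth0
# dropwatch: interactively watch where the kernel drops packets (kas = resolve kernel symbols)
dropwatch -l kas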

  • TCP parameter tuning and persistence across restarts

Many network parameters are kernel-controlled, so sysctl is used to read and change them. However, changes made with sysctl only apply to the running system and are lost after a reboot, when the kernel defaults are restored. To make them permanent, write the settings to /etc/sysctl.conf.
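
As a minimal sketch, using net.ipv4.tcp_window_scaling (covered later in this article) as the example parameter:

# Read the current value
sysctl net.ipv4.tcp_window_scaling
# Change it for the running kernel only (lost after a reboot)
sysctl -w net.ipv4.tcp_window_scaling=1
# Persist it across reboots
echo "net.ipv4.tcp_window_scaling = 1" >> /etc/sysctl.conf
sysctl -p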

  • Checking for bottlenecks

A suitable tool for checking hardware-level bottlenecks is ethtool.

 ethtool -S eth3
 # Output looks like the following
 tx_scattered: 0
 tx_no_memory: 0
 tx_no_space: 0
 tx_too_big: 0
 tx_busy: 0
 tx_send_full: 0
 rx_comp_busy: 0
 rx_no_memory: 0
 stop_queue: 0
 wake_queue: 0
 vf_rx_packets: 0
 vf_rx_bytes: 0
 vf_tx_packets: 0
 vf_tx_bytes: 0
 vf_tx_dropped: 0
 tx_queue_0_packets: 43783950
 tx_queue_0_bytes: 31094532492
 rx_queue_0_packets: 91969116
 rx_queue_0_bytes: 69733870231
 tx_queue_1_packets: 48375299
 tx_queue_1_bytes: 32469935392
 rx_queue_1_packets: 80277415
 rx_queue_1_bytes: 65208029029
 
 netstat -s
 #Analyze protocol layer errors

Performance tuning

  • SoftIRQ packet loss

If SoftIRQs do not get enough time to run, packets may not be received properly: the NIC buffer overflows and data is lost. In that case the processing budget for SoftIRQs needs to be increased.

#  sysctl net.core.netdev_budget 
net.core.netdev_budget = 300

If the third column of cat /proc/net/softnet_stat keeps increasing noticeably, SoftIRQs are not getting enough CPU time and you can double the budget.

0aa72ed3 00000000 000006a9 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0b70c1b6 00000000 00000636 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
sysctl -w net.core.netdev_budget=600
  • Tuned

tuned is a recommended performance-tuning service that automatically adjusts CPU, IO, kernel, and other settings based on the selected profile, and it is easy to install.

# yum -y install tuned
# service tuned start
# chkconfig tuned on
# tuned-adm list
# tuned-adm profile throughput-performance
Switching to profile 'throughput-performance' 
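
To confirm which profile is in effect afterwards, tuned-adm can report the active profile; the output should look roughly like this:

# tuned-adm active
Current active profile: throughput-performance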
  • The backlog queue

To view the per-CPU backlog statistics for packets received from the NIC, use

cat /proc/net/softnet_stat
# If the 2nd column has data and keeps growing, the backlog queue needs to be enlarged
# The number of rows equals the number of CPU cores
 sysctl -w net.core.netdev_max_backlog=X
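
A quick way to read that 2nd column per CPU as a decimal number (a sketch assuming GNU awk, which provides strtonum; the values in the file are hexadecimal):

# Column 2 of /proc/net/softnet_stat counts packets dropped because the backlog queue was full
awk '{printf "cpu%-3d dropped=%d\n", NR-1, strtonum("0x"$2)}' /proc/net/softnet_stat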
  • Adjust the RX and TX buffer queues
ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:             18811
RX Mini:        0
RX Jumbo:       0
TX:             2560
Current hardware settings:
RX:             9709
RX Mini:        0
RX Jumbo:       0
TX:             170

From the output above, the RX ring can hold up to about 18K descriptors but is currently set to about 9.7K, while the TX ring can hold up to 2560 but is set to only 170. You can enlarge the RX and TX ring sizes:

 ethtool -G eth0 rx 18811 tx 2560
 # The modification can be persisted in /sbin/ifup-local (the values must not exceed the pre-set maximums shown above)
  • Adjust the transmission queue length

The default transmit queue length is 1000. You can check it with the ip -s link command. If packet loss occurs, you can increase the queue length:

# ip link set dev em1 txqueuelen 2000
# ip link
#To persist the change across reboots, add the command to /sbin/ifup-local
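
Both the ring-size change and the transmit-queue-length change can be re-applied at boot from /sbin/ifup-local, which CentOS calls with the interface name as its first argument whenever an interface comes up. A sketch (the interface name and values are only the examples used above; remember to make the file executable):

#!/bin/bash
# /sbin/ifup-local - re-apply NIC tuning when an interface comes up
if [ "$1" = "eth0" ]; then
    /usr/sbin/ethtool -G eth0 rx 18811 tx 2560
    /usr/sbin/ip link set dev eth0 txqueuelen 2000
fi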
  • TCP timestamp

With the timestamps option, the sender places a timestamp value in each segment. The receiver returns this value in its acknowledgement, allowing the sender to compute an RTT for each ACK received (we must say "ACK received" rather than "segment received" because TCP typically acknowledges multiple segments with a single ACK). Many implementations compute only one RTT per window, which is adequate for a window of about eight segments, but larger window sizes require more accurate RTT measurement. So make sure TCP timestamps are enabled, which most systems do by default.

#  sysctl net.ipv4.tcp_timestamps 
#  sysctl -w net.ipv4.tcp_timestamps=1
  • tcp sack

SACK lets the receiver tell the sender which segments have arrived, including segments received out of order, so TCP can retransmit only the segments that were actually lost. Conversely, if your network performance is particularly good and there is very little packet loss, turning SACK off can improve TCP performance.

 sysctl -w net.ipv4.tcp_sack=0
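
Before disabling SACK it is worth confirming that the network really does lose very few packets; one rough check (counter names vary between kernel versions):

# Retransmission-related counters; values that stay near zero suggest a clean network
netstat -s | grep -i retrans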
  • TCP window scaling

It is a high-performance extension of TCP, so we need to make sure it is enabled:

# sysctl net.ipv4.tcp_window_scaling
net.ipv4.tcp_window_scaling = 1
  • Tuning tcp_rmem (socket memory)

tcp_rmem has three adjustable values: minimum, default, and maximum. The default maximum is generally 4 MB:

# sysctl net.ipv4.tcp_rmem
4096 87380 4194304
# sysctl -w net.ipv4.tcp_rmem="16384 349520 16777216"
# sysctl net.core.rmem_max
4194304
# sysctl -w net.core.rmem_max=16777216
  • tcp listen backlog

listen() is called only by servers, and it does two things. (1) When a socket is created it is assumed to be an active socket, i.e. a client socket that will call connect() to initiate a connection; listen() converts an unconnected socket into a passive socket, telling the kernel to accept connection requests directed at it, and moves the socket from the CLOSED state to LISTEN. (2) The backlog argument specifies the maximum number of connections the kernel should queue for this socket; the kernel maintains two queues for a listening socket (the incomplete-connection queue and the completed-connection queue), and their sum must not exceed the backlog. If the application accepts connections slowly or handles a large number of them, you need to increase the backlog.

# sysctl net.core.somaxconn
net.core.somaxconn = 128
# sysctl -w net.core.somaxconn=2048
net.core.somaxconn = 2048
# sysctl net.core.somaxconn
net.core.somaxconn = 2048
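
To check the listen queues at runtime: for listening sockets, ss reports the configured backlog (capped by net.core.somaxconn) in the Send-Q column and the number of connections currently waiting to be accepted in Recv-Q:

# -l listening sockets, -n numeric, -t TCP
ss -lnt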
  • Time_wait optimization

When a TCP connection is closed, the side that actively closes it enters the TIME_WAIT state. For example, when the client closes the connection it sends the final ACK, enters TIME_WAIT, and stays there for 2 MSL before moving to CLOSED; that is the case of a client actively closing the connection. Why is TCP/IP designed this way? Mainly to (1) reliably terminate the TCP full-duplex connection and (2) allow old duplicate segments to expire in the network. During the four-segment close sequence, the final ACK is sent by the end that actively closes the connection (end A). If that ACK is lost, the other end (end B) retransmits its final FIN, so end A must keep the connection state (TIME_WAIT) in order to resend the final ACK. If end A did not stay in TIME_WAIT but went straight to CLOSED, it would respond with an RST segment, which end B would interpret as an error. So to terminate a TCP full-duplex connection cleanly, the loss of any of the four closing segments must be handled, and the actively closing end A must hold the TIME_WAIT state.

#View counts of connections in each TCP state on the current system
# netstat -n|awk '/^tcp/{++S[$NF]}END{for (key in S) print key,S[key]}'
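
On newer systems the same per-state breakdown can be obtained with ss, which is usually faster than netstat on hosts with many connections:

ss -ant | awk 'NR>1 {++s[$1]} END {for (k in s) print k, s[k]}'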

If your system has a large number of connections in TIME_WAIT, you need to adjust the kernel parameters:

# Enable SYN cookies. When the SYN queue overflows, cookies are used to keep the host reachable,
# which helps fend off small-scale SYN flood attacks. Default is 0 (disabled).
net.ipv4.tcp_syncookies = 1
# Enable reuse: allow sockets in TIME_WAIT to be reused for new TCP connections. Default is 0 (disabled).
net.ipv4.tcp_tw_reuse = 1
# Enable fast recycling of TIME_WAIT sockets. Default is 0 (disabled).
net.ipv4.tcp_tw_recycle = 1
# Shorten the FIN timeout from the default to 10 seconds.
net.ipv4.tcp_fin_timeout = 10
# Port range used for outgoing connections. The default range is small (32768 to 61000); change it to 1024 to 65000.
net.ipv4.ip_local_port_range = 1024 65000
# Length of the SYN queue. Raising it from the default of 1024 to 8192 accommodates more connections waiting to be established.
net.ipv4.tcp_max_syn_backlog = 8192
# Interval at which TCP sends keepalive probes when keepalive is enabled. The default is 2 hours; here it is reduced to 20 minutes.
net.ipv4.tcp_keepalive_time = 1200

Conclusion

TCP tuning is complex and there is no one-size-fits-all configuration for every scenario, so the above is only an analysis of tuning for different situations. Some reference optimizations are given below. For a gigabit network, edit /etc/sysctl.conf:

net.ipv4.tcp_mem= 98304 131072 196608
net.ipv4.tcp_window_scaling=1
net.core.wmem_default = 65536
net.core.rmem_default = 65536
net.core.wmem_max=8388608

Run sysctl -p to apply the changes. For a heavily loaded configuration where bandwidth is not the limiting factor:

net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_timestamps=0
net.ipv4.tcp_sack=0
net.ipv4.tcp_rmem=10000000 10000000 10000000
net.ipv4.tcp_wmem=10000000 10000000 10000000
net.ipv4.tcp_mem=10000000 10000000 10000000
net.core.rmem_max=524287
net.core.wmem_max=524287
net.core.rmem_default=524287
net.core.wmem_default=524287
net.core.optmem_max=524287
net.core.netdev_max_backlog=300000