WeChat official account: Operation and Maintenance Development Story; author: Teacher Xia

Overview

The Nginx Ingress Controller implements the Kubernetes Ingress API on top of Nginx. Nginx is widely recognized as a high-performance gateway, but it cannot take full advantage of that performance without some parameter tuning, and the Nginx Ingress is no exception.

Tuning kernel parameters

Let's start by looking at which kernel parameters can be tuned to improve Ingress performance and keep the Ingress performing at its best under high concurrency.

Increase the size of the connection queue

The maximum length of the TCP full-connection queue is min(somaxconn, backlog), i.e., the smaller of the somaxconn kernel parameter and the backlog passed to listen(). In a high-concurrency environment, if the queue is too small it may overflow and connections will fail to be established. To increase the connection queue of the Nginx Ingress you only need to adjust the somaxconn kernel parameter, but the principle behind this is worth sharing. Nginx does not read somaxconn when it listens on a socket; it has its own configuration: the listen directive in nginx.conf accepts a backlog parameter, which determines the size of the connection queue of the port nginx listens on.

server {
    listen 80 backlog=1024;
    ...
}

backlog is the backlog argument of listen(int sockfd, int backlog); Nginx defaults it to 511, and it can be changed in the configuration file. (The Go standard library, by contrast, reads somaxconn directly as the queue size when listening.) That means even if your somaxconn is configured high, the connection queue of the port Nginx listens on is still only 511, which can overflow in high-concurrency scenarios. In the Nginx Ingress, therefore, the Nginx Ingress Controller automatically reads somaxconn and writes it as the backlog argument into the generated nginx.conf: github.com/kubernetes/… In other words, the connection queue size of the Nginx Ingress depends only on somaxconn. It defaults to 4096 in the Nginx Ingress Pod, and 65535 is recommended:

sysctl -w net.core.somaxconn=65535

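To confirm the backlog actually took effect, inspect the listening socket: for sockets in the LISTEN state, ss reports the configured backlog in the Send-Q column. A minimal check, assuming ss (iproute2) is available inside the controller Pod:

# Recv-Q = current accept-queue length, Send-Q = configured backlog
# (should show 65535 after tuning)
ss -lnt 'sport = :80'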

Expand the source port range

As the introduction to and tuning of the TCP three-way handshake and four-way teardown in Linux explains, each client connection occupies a source port. High-concurrency scenarios cause the Nginx Ingress to use a large number of source ports to connect to the upstream. The source port range is controlled by the kernel parameter net.ipv4.ip_local_port_range; if the range is too small in a high-concurrency environment, source ports may be exhausted and some connections will fail. The Nginx Ingress Pod defaults to the range 32768-60999; it is recommended to expand it to 1024-65535:

sysctl -w net.ipv4.ip_local_port_range="1024 65535"

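To judge how close you are to exhausting the range, compare the configured range with the number of sockets currently holding local ports; a rough sketch:

# current ephemeral port range (low high)
cat /proc/sys/net/ipv4/ip_local_port_range
# rough count of TCP sockets currently occupying local ports
ss -tan | tail -n +2 | wc -l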

Reuse TIME_WAIT connections

As the same article explains, connections in the TIME_WAIT state keep their source ports occupied for a long time: a connection stays in TIME_WAIT for 2MSL by default, and when many connections in the network namespace are in this state, new connections may fail to be created. It is recommended to enable TIME_WAIT reuse for the Nginx Ingress, i.e., to allow sockets in TIME_WAIT to be reused for new TCP connections:

sysctl -w net.ipv4.tcp_tw_reuse=1

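Note that tcp_tw_reuse only applies to outbound connections, which matches the Ingress-to-upstream direction, and it relies on TCP timestamps being enabled. A quick check (both should print 1):

sysctl net.ipv4.tcp_timestamps net.ipv4.tcp_tw_reuse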

Also reduce net.ipv4.tcp_fin_timeout and net.netfilter.nf_conntrack_tcp_timeout_time_wait so that the system releases the resources occupied by closing connections as quickly as possible:

sysctl -w net.ipv4.tcp_fin_timeout=15
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30

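To see whether these timeouts are paying off, watch how many sockets sit in TIME_WAIT; for example:

# number of TCP sockets currently in TIME_WAIT in this network namespace
ss -tn state time-wait | tail -n +2 | wc -l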

Limit the number of connections in the TIME_WAIT state

Nginx deployments must pay attention to this value, because it protects the system from failure once all ports are occupied: tcp_max_tw_buckets caps the number of sockets kept in TIME_WAIT, reducing the probability of port exhaustion and buying more time to recover. Since only slightly more than 60,000 ports are usable, set the cap a little below that:

sysctl -w net.ipv4.tcp_max_tw_buckets=55000

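When the cap is exceeded, the kernel immediately destroys new TIME_WAIT sockets and logs a warning, so you can tell whether you are actually hitting it:

# compare the cap with the live TIME_WAIT count
sysctl net.ipv4.tcp_max_tw_buckets
ss -tn state time-wait | tail -n +2 | wc -l
# the kernel logs this message when the cap is exceeded
dmesg | grep -i "time wait bucket table overflow"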

Increase the maximum number of file handles

As a reverse proxy, Nginx holds one connection to the client and one to the upstream server for each request, so in theory Nginx can handle at most half of the system's maximum number of file handles. The system-wide maximum is controlled by the kernel parameter fs.file-max, which defaults to 838860 here; it is recommended to raise it:

sysctl -w fs.file-max=1048576

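To compare current usage against the limit:

# three numbers: allocated handles, allocated-but-unused, system-wide maximum
cat /proc/sys/fs/file-nr
# the per-process limit must also be high enough (see nginx worker_rlimit_nofile);
# fs.file-max only raises the system-wide ceiling
ulimit -n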

Example configuration

Add an initContainer to the Nginx Ingress Controller's Pod to set the kernel parameters:

initContainers:
      - name: setsysctl
        image: busybox
        securityContext:
          privileged: true
        command:
        - sh
        - -c
        - |
          sysctl -w net.core.somaxconn=65535
          sysctl -w net.ipv4.ip_local_port_range="1024 65535"
          sysctl -w net.ipv4.tcp_max_tw_buckets=55000
          sysctl -w net.ipv4.tcp_tw_reuse=1
          sysctl -w fs.file-max=1048576
          sysctl -w net.ipv4.tcp_fin_timeout=15
          sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30

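An alternative to a privileged initContainer is declaring the namespaced net.* parameters in the Pod's securityContext. Note that most of these are treated as unsafe sysctls and must be whitelisted on the kubelet (e.g. --allowed-unsafe-sysctls='net.*'), and node-level parameters such as fs.file-max cannot be set this way. A sketch, assuming the kubelet allows it:

securityContext:
  sysctls:
  - name: net.core.somaxconn
    value: "65535"
  - name: net.ipv4.ip_local_port_range
    value: "1024 65535"
  - name: net.ipv4.tcp_tw_reuse
    value: "1"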

Application layer configuration tuning

In addition to kernel parameters that need to be tuned, some configurations of Nginx itself also need to be tuned. Let’s take a closer look.

Increase the maximum number of requests per keepalive connection

The keepalive_requests directive sets the maximum number of requests that can be served over one keepalive connection; once the maximum is reached, the connection is closed. The default is 100. After a keepalive connection is established, nginx keeps a per-connection counter of the client requests it has received and processed; when the counter reaches keepalive_requests, nginx forcibly closes the long connection, forcing the client to establish a new one.

A quick explanation: at QPS = 10000, clients send 10,000 requests per second (usually over several long-lived connections), and each connection can carry at most 100 requests, so on average Nginx closes about 100 long connections every second. To sustain the QPS, the client has to create about 100 new connections per second. The result is a large number of TIME_WAIT sockets (even though keepalive is in effect between the client and Nginx). In high-QPS scenarios this parameter should therefore be increased, to avoid large numbers of connections being created and thrown away and to reduce TIME_WAIT.

For an intranet Ingress the QPS of a single client can be large; at QPS 10000, for example, Nginx may frequently break keepalive connections with the client and produce a large number of TIME_WAIT connections, which should be avoided. It is therefore recommended to raise the maximum number of requests per keepalive connection between Nginx and clients in such high-concurrency scenarios. The corresponding Nginx Ingress option is keep-alive-requests, which can be set to 10000; reference: kubernetes.github.io/ingress-ngi… Similarly, the number of requests per keepalive connection between Nginx and the upstream is configured with upstream-keepalive-requests; reference: kubernetes.github.io/ingress-ngi…

However, it is not usually necessary to change upstream-keepalive-requests. Setting it too high can cause unbalanced load: Nginx holds keepalive connections to the upstream for too long, connections are rescheduled less often and become "hardened", and traffic ends up unbalanced.
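
For orientation, these two keys map onto plain nginx directives; roughly, the controller renders something like the following into nginx.conf (a sketch, not a file to edit by hand; the upstream name is an assumption):

# http context: requests per client keepalive connection (keep-alive-requests)
keepalive_requests 10000;
# upstream context: requests per upstream keepalive connection
# (upstream-keepalive-requests, nginx >= 1.15.3)
upstream upstream_balancer {
    ...
    keepalive_requests 10000;
}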

Increase the maximum number of idle keepalive connections

Nginx's upstream configuration has a keepalive setting. This is not a keepalive timeout, and not the maximum number of keepalive connections, but the maximum number of idle keepalive connections kept per worker; when this number is exceeded, the least recently used connections are closed.

A quick explanation: suppose an HTTP service behind the gateway acts as the upstream server with a response time of 100 milliseconds. Reaching 10000 QPS requires roughly 1000 HTTP connections between Nginx and the upstream server. Nginx pools connections for this purpose: when a request arrives it takes a connection from the pool, and when the request ends it returns the connection to the pool and marks it idle. Assume the upstream keepalive parameter is set to a small value, such as the common 10.

a. If requests and responses are perfectly uniform and smooth, the 1000 connections are taken from the pool again immediately after being returned, the number of idle connections stays close to zero, and the connection count does not oscillate.

b. In reality requests and responses are not uniform. Look at the connections in 10-millisecond windows (the scenario is 1000 connections + 100 ms response time, 10,000 requests completed per second) and assume responses stay steady while requests fluctuate: the first 10 ms sees 50 requests and the second 150:

  1. In a given 10 milliseconds, 100 requests finish and their connections are returned to the pool, but the requests are not uniform: instead of the expected 100 new requests, only 50 arrive. The pool has reclaimed 100 connections and handed out 50, so 50 connections are now idle.

  2. With keepalive=10, at most 10 idle connections may be kept in the pool, so nginx has to close 40 of the 50 idle connections, keeping only 10.

  3. In the next 10 milliseconds, 150 requests arrive while 100 finish and release their connections. 150 - 100 = 50 connections are short; minus the 10 idle connections left in the pool, nginx has to create 40 new connections to meet demand.

c. Likewise, if responses rather than requests are uneven, the same fluctuation in connection count occurs.

The corresponding Nginx Ingress option is upstream-keepalive-connections, whose default value is 32. High-concurrency scenarios produce large numbers of requests and connections, and in the real world requests do not arrive evenly, so some connections sit idle briefly; closing and recreating them causes TIME_WAIT spikes. In high-concurrency scenarios this can be raised to 1000; reference: kubernetes.github.io/ingress-ngi…
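
In raw nginx terms this option corresponds to the keepalive directive inside the upstream block; a sketch of the equivalent configuration (upstream name assumed):

upstream upstream_balancer {
    ...
    # keep at most 1000 idle connections to the upstream per worker;
    # idle connections beyond this are closed, least recently used first
    keepalive 1000;
}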

Gateway timeout

The ingress nginx establishes TCP connections to the upstream Pods and exchanges data with them, which involves three timeout settings worth tuning. proxy-connect-timeout sets the connection timeout between nginx and the upstream; ingress nginx defaults to 5s, but connections inside the cluster are established quickly, so a smaller value such as 3 seconds is enough. Reference: kubernetes.github.io/ingress-ngi…

proxy-read-timeout sets the timeout for reading a response from the upstream, and proxy-send-timeout the timeout for sending a request to it; ingress nginx defaults both to 60 seconds. After taking the P99.99 of all normal requests, we reduced the read/write timeouts between the gateway and the upstream Pods to 3s, letting Nginx cut off abnormal requests in time. References: kubernetes.github.io/ingress-ngi… and kubernetes.github.io/ingress-ngi…
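
A 3s global read/send timeout can be too aggressive for individual slow endpoints (long polling, large uploads). Those can keep their own limits through per-Ingress annotations instead of the global ConfigMap; a fragment with illustrative values:

metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "3"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "60"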

Increase the maximum number of worker connections

max-worker-connections controls the maximum number of connections each worker process can open; the default is 16384. It is recommended to raise it in high-concurrency environments, for example to 65536, so that nginx can handle more connections. Reference: kubernetes.github.io/ingress-ngi…

Optimize retry mechanism

Nginx provides a default retry mechanism for upstream requests: when the upstream service returns an error or times out, nginx automatically retries the request, by default without limiting the number of attempts. Since both the access-layer nginx and the ingress nginx are nginx at heart, both layers have the default retry mechanism enabled, so an abnormal request can be retried many times and, in the worst case, avalanche the cluster gateways. This is solved together with the access layer: the access-layer nginx uses proxy_next_upstream_tries to limit the number of retries, while the ingress nginx sets proxy-next-upstream="off" to disable the default retry mechanism. Reference: kubernetes.github.io/ingress-ngi…
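
If some routes do need a bounded number of retries rather than none, ingress-nginx also exposes this per Ingress through annotations; a fragment with illustrative values:

metadata:
  annotations:
    # retry on error/timeout, at most 2 tries, within 3 seconds in total
    nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout"
    nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "2"
    nginx.ingress.kubernetes.io/proxy-next-upstream-timeout: "3"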

Enable Brotli compression

Reference: kubernetes.github.io/ingress-ngi…

Compression is the universal trade of time for space: spending CPU time buys a lot of network bandwidth and raises throughput. Brotli is a compression algorithm developed by Google and released in 2015. The common algorithm is gzip (also the ingress-nginx default), and brotli is said to compress 20% to 30% better than gzip. The ingress-nginx defaults are the gzip algorithm at compression level 1; to enable brotli, the following parameters need to be set (a verification example follows the list):

  • enable-brotli: true or false, whether to enable the brotli compression algorithm

  • brotli-level: compression level, from 1 to 11; the default is 4

  • brotli-types: the MIME types brotli compresses on the fly
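
After enabling it, verify that brotli is actually negotiated; assuming curl and a hostname served by the Ingress (replace your-host.example.com):

# expect "content-encoding: br" in the response headers for a matching MIME type
curl -s -o /dev/null -D - -H 'Accept-Encoding: br' https://your-host.example.com/ | grep -i content-encoding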

Example configuration

The Nginx Ingress Controller watches the ConfigMap and reconfigures itself automatically:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-ingress-controller
data:
  keep-alive-requests: "10000"
  upstream-keepalive-connections: "200"
  max-worker-connections: "65536"
  proxy-connect-timeout: "3"
  proxy-read-timeout: "3"
  proxy-send-timeout: "3"
  proxy-next-upstream: "off"
  enable-brotli: "true"
  brotli-level: "6"
  brotli-types: "text/xml image/svg+xml application/x-font-ttf image/vnd.microsoft.icon application/x-font-opentype application/json font/eot application/vnd.ms-fontobject application/javascript font/otf application/xml application/xhtml+xml text/javascript application/x-javascript text/plain application/x-font-truetype application/xml+rss image/x-icon font/opentype text/css image/x-win-bitmap"

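To check that a change actually landed, apply the ConfigMap and dump the rendered configuration from the controller (the namespace and deployment names below are assumptions; adjust them to your installation):

kubectl apply -f configmap.yaml
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- \
  nginx -T | grep -E 'keepalive_requests|worker_connections'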

References

  • Optimizing the concurrency performance of nginx-ingress-controller: cloud.tencent.com/developer/a…

  • Nginx Ingress configuration reference: kubernetes.github.io/ingress-ngi…

  • Tuning NGINX for Performance: www.nginx.com/blog/tuning…

  • ngx_http_upstream_module: nginx.org/en/docs/htt…