Author: Brother Tom. WeChat public account: Micro Technology
Recently we set out to build a gateway. Since inter-system calls are based on the gRPC framework, the gateway's first responsibility is to receive and forward gRPC requests. The system architecture is as follows:
Take a quick look. It does not matter if the architecture diagram is hard to follow because of the customized business background; the core technical points are analyzed separately below.
Why introduce a gateway at all? The request path gains an extra hop, which costs performance, and if the gateway goes down, everything behind it goes down with it!
But that is how it is: you cannot always do whatever you want.
Technical solutions sometimes take a long detour to deal with an unavoidable constraint. That constraint can take many forms:
- It may be a technical requirement, such as monitoring and statistics, which needs an interception layer somewhere above the services for data collection and unified processing
- It may be that the implementation faces serious challenges that the current team's engineering strength cannot solve
- It may be context- or session-related, with one task triggering multiple requests that must all be completed on a single machine
- It may be policy: for data security, you simply have to go through it
The gateway in this article exists for security reasons. Some companies' security restrictions prevent external services from directly accessing internal compute nodes, so a front gateway must be introduced to handle reverse proxying, request routing and forwarding, data communication, call monitoring, and so on.
1. Abstracting the problem and technology selection
The business architecture above may look complex, and readers unfamiliar with the business background can easily get lost. So let us simplify a little and abstract out the specific problem to be solved.
The process has three steps:
1. The client initiates a gRPC call (over HTTP/2), and the request reaches the gRPC gateway
2. Upon receiving the request, the gateway looks up the target-server mapping in the Redis cache, keyed by the identifier agreed upon in the request parameters
3. The gateway forwards the request to the target server, obtains the response, and returns the data along the original path
gRPC transmits data over HTTP/2 only; it supports both plaintext and TLS-encrypted data as well as streaming interaction, taking advantage of HTTP/2's connection multiplexing and streaming features.
Technology selection
1. We first planned to build it with Netty, but since the gRPC proto definitions were not ours, the parsing cost was very high; in addition, the data in the request headers had to be read, which made development difficult, so this became a fallback option.
2. Changing perspective, we looked for a reverse-proxy framework and came back to the mainstream Nginx line. Nginx is written in C, and for forwarding based on a regular load-balancing strategy it poses no problem; however, our routing depends on task-to-resource relationships, which in turn means depending on an external storage system at request time.
Nginx excels at serving static content as a static web server, and we valued its high performance, so we chose OpenResty to add the dynamic part.
OpenResty® is a high-performance web platform based on Nginx and LuaJIT, bundling a large number of excellent Lua libraries, third-party modules, and most of their dependencies. It makes it easy to build dynamic web applications, web services, and dynamic gateways capable of handling extremely high concurrency with good scalability.
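To get a taste of the model, here is a minimal, hypothetical OpenResty snippet: plain Nginx configuration with a Lua block embedded directly in a location (the port and path are illustrative):

```nginx
# Minimal sketch: Lua code embedded in an nginx location via OpenResty
server {
    listen 8080;
    location /hello {
        content_by_lua_block {
            ngx.say("hello from OpenResty")
        }
    }
}
```

This "Lua inside nginx.conf" pattern is exactly what the gateway below relies on.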
2. OpenResty code
```nginx
http {
    include mime.types;
    default_type application/octet-stream;
    access_log logs/access.log main;
    sendfile on;
    keepalive_timeout 120;
    client_max_body_size 3000M;

    server {
        listen 8091 http2;
        location / {
            set $target_url '';
            access_by_lua_block {
                local headers = ngx.req.get_headers(0)
                local jobid = headers["jobid"]
                local redis = require "resty.redis"
                local red = redis:new()
                red:set_timeouts(1000, 1000, 1000)  -- connect/send/read, 1 s each
                local ok, err = red:connect("156.9.1.2", 6379)
                local res, err = red:get(jobid)
                ngx.var.target_url = res
            }
            grpc_pass grpc://$target_url;
        }
    }
}
```
3. Performance load testing
1. Client machine: during the load test, observe the network connections:
Conclusion:
In the concurrent load-testing scenario, requests are forwarded to three gateway servers, and only a few TCP connections on each are in the TIME_WAIT state. This segment of the path clearly achieves connection reuse.
2. gRPC gateway machine: during the load test, observe the network connections:
A large number of connections are in the TIME_WAIT state. By port number they fall into two categories: 6379 and 40928.
[root@tf-gw-64bd9f775c-qvpcx nginx]# netstat -na | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
LISTEN 2
ESTABLISHED 6
TIME_WAIT 27500
Using the Linux shell one-liner above, 27,500 TCP connections on server 172.16.66.46 are in the TIME_WAIT state.
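The one-liner groups netstat's `tcp` rows by their last field (the connection state). A toy run against made-up input shows the counting logic (sample rows are fabricated; `sort` is added for stable output):

```shell
# Feed three fake netstat rows through the same awk program
printf 'tcp 0 0 10.0.0.1:80 10.0.0.2:1234 TIME_WAIT\ntcp 0 0 10.0.0.1:80 10.0.0.2:1235 ESTABLISHED\ntcp 0 0 10.0.0.1:80 10.0.0.2:1236 TIME_WAIT\n' |
  awk '/^tcp/ {++S[$NF]} END {for (a in S) print a, S[a]}' | sort
# prints:
# ESTABLISHED 1
# TIME_WAIT 2
```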
[root@tf-gw-64bd9f775c-qvpcx nginx]# netstat -na | grep 6379 |awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
ESTABLISHED 1
TIME_WAIT 13701
There are 13,701 connections in the TIME_WAIT state to Redis (redis access port 6379)
[root@tf-gw-64bd9f775c-qvpcx nginx]# netstat -na | grep 40928 |awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
ESTABLISHED 2
TIME_WAIT 13671
There are 13,671 connections to the back-end target server in the TIME_WAIT state. The two counts are roughly equal because Redis is queried once for every forwarded request.
Summary of conclusions:
1. Client to gateway: long connections are already maintained, which meets the requirement.
2. gRPC gateway to the Redis cache server: currently short connections, one created per request; the performance overhead is too high and needs separate optimization.
3. gRPC gateway to the target servers: currently short connections, discarded after a single use, wasting HTTP/2's long-connection advantages; this also needs separate optimization.
4. What is TIME_WAIT?
Check TCP connections in TIME_WAIT state:
netstat -anpt | grep TIME_WAIT | wc -l
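On newer systems netstat is deprecated in favor of ss from iproute2; the same count can be obtained with ss's built-in state filter (a sketch, assuming iproute2 is installed):

```shell
# Count sockets currently in TIME_WAIT using ss's state filter
ss -tan state time-wait | tail -n +2 | wc -l
```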
We all know TCP uses a three-way handshake and a four-way wave. What does the four-way wave (connection teardown) actually look like?
1. The actively closing side calls close(); the protocol layer sends a FIN packet, and this side enters FIN_WAIT_1
2. When the passively closing side receives the FIN, its protocol layer replies with an ACK and it enters CLOSE_WAIT; the actively closing side, on receiving that ACK, enters FIN_WAIT_2 and waits for the peer application to call close()
3. Once all its data has been sent, the passively closing side calls close(); its protocol layer sends a FIN to the peer, waits for the ACK, and enters LAST_ACK
4. When the actively closing side receives this FIN, its protocol layer replies with an ACK and it enters TIME_WAIT; on receiving that ACK, the passively closing side enters CLOSED
5. After waiting two Maximum Segment Lifetimes (2MSL), the actively closing side leaves TIME_WAIT and enters CLOSED
How long is 2MSL? It varies: one minute, two minutes, four minutes, or 30 seconds, depending on the implementation. On CentOS 7.6.1810 with kernel 3.10 it is 60 seconds.
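On Linux specifically, the TIME_WAIT duration is not a sysctl at all: it is the compile-time constant TCP_TIMEWAIT_LEN (60 seconds). The tunable that is often confused with it, tcp_fin_timeout, actually controls the FIN_WAIT_2 timeout; you can inspect it like this:

```shell
# FIN_WAIT_2 timeout in seconds (frequently mistaken for the TIME_WAIT length)
cat /proc/sys/net/ipv4/tcp_fin_timeout
```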
Here’s a big picture of the TCP state machine:
Why must there be a TIME_WAIT?
Although both sides have agreed to close the connection and all four teardown packets have been coordinated and sent, it seems the connection could move straight to CLOSED. But the network is unreliable: the actively closing side cannot be sure its final ACK was received by the peer; the ACK may be lost or arrive late. The TIME_WAIT state exists precisely so that a possibly lost ACK can be resent.
Simply put, TIME_WAIT lasts 2MSL to tolerate the packet loss and delay of an unreliable network; this state maximizes the reliability of connection teardown.
Note: A connection cannot be reused until it is in the CLOSED state!
How to optimize excessive TIME_WAIT connections
1. Adjust system kernel parameters
- net.ipv4.tcp_syncookies = 1 — enable SYN cookies: when the SYN wait queue overflows, cookies are used to fend off small-scale SYN flood attacks (default 0, disabled).
- net.ipv4.tcp_tw_reuse = 1 — enable reuse: allow TIME_WAIT sockets to be reused for new outbound TCP connections (default 0).
- net.ipv4.tcp_tw_recycle = 1 — enable fast recycling of TIME_WAIT sockets (default 0). See the NAT warning below; this option was removed entirely in Linux 4.12.
- net.ipv4.tcp_fin_timeout — shorten the default FIN_WAIT_2 timeout.
- net.ipv4.tcp_max_tw_buckets = 5000 — the maximum number of TIME_WAIT sockets the system keeps at once (default 18000). When the count reaches this value, surplus TIME_WAIT connections are cleared immediately and a warning is logged. This rough cleanup means some connections never complete their 2MSL wait, which can cause communication anomalies.
- net.ipv4.tcp_timestamps = 1 (default 1) — timestamps in connect requests from the same source IP must increase monotonically within a 60-second window. When the server enables tcp_tw_recycle it checks the timestamp, and if a packet's timestamp jumps backwards or lags, the server simply does not respond.

Many companies use LVS for load balancing, typically one LVS in front of several back-end servers, in NAT mode: when a request arrives at the LVS, the address data is rewritten before the packet is forwarded to the back end, but the timestamp is not changed. From the back-end server's perspective, the source address of every request is the LVS address and ports get reused, so requests from different clients forwarded through the LVS may be taken for the same connection. Since different clients' clocks are rarely consistent, timestamps appear out of order and the later packets are discarded. The typical symptom is that the client sends a SYN but the server never responds with an ACK; the kernel's drop counters will confirm that packets are being discarded. In that case, use the other optimizations as appropriate:

- net.ipv4.ip_local_port_range = 1024 65535 — widen the range of ports the system may use to establish connections. The side effect is that the system will randomly assign ports between 1025 and 65535 to connections.
- net.ipv4.ip_local_reserved_ports = 7005,8001-8100 — tell the system which ports to reserve for our own services so they are never handed out automatically.
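Collected into /etc/sysctl.conf form, a hedged example of the safer subset looks like this (values are illustrative and must be tuned per environment; tcp_tw_recycle is deliberately omitted since it is dangerous behind NAT and was removed in Linux 4.12):

```
# /etc/sysctl.conf -- illustrative values, tune per environment
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.ip_local_reserved_ports = 7005,8001-8100
```

Apply the changes with `sysctl -p`.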
2. Turn short connections into persistent connections
Short-connection mode: connect -> transfer data -> close connection
Persistent-connection mode: connect -> transfer data -> keep connection alive -> transfer data -> … -> close connection
5. Optimizing the short connections to Redis
In high-concurrency programming, connection pooling must be used to turn short connections into long ones: create a connection, send and receive data, send and receive data again, …, and only then disconnect. This saves a great deal of connection setup and teardown time, and in performance terms it clearly beats short connections.
In OpenResty, persistent connections are supported through the set_keepalive function.
The set_keepalive function takes two arguments:
- the first argument: the maximum idle time of the connection (in milliseconds)
- the second argument: the size of the connection pool
```lua
local res, err = red:get(jobid)
-- return the connection to the pool:
-- max idle time 10 seconds (10000 ms), pool size 40
red:set_keepalive(10000, 40)
```
After reloading the Nginx configuration, rerun the load test.
Conclusion: the number of Redis connections is now kept within roughly 40.
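Putting the pieces together, here is a hedged sketch of the whole pooled lookup inside access_by_lua_block, with the error handling a production version would need (the Redis host, port, timeouts, and status codes are illustrative):

```nginx
access_by_lua_block {
    local redis = require "resty.redis"
    local red = redis:new()
    red:set_timeouts(1000, 1000, 1000)  -- connect/send/read timeouts, 1 s each

    -- connect() transparently reuses an idle pooled connection when one exists
    local ok, err = red:connect("156.9.1.2", 6379)
    if not ok then
        ngx.log(ngx.ERR, "redis connect failed: ", err)
        return ngx.exit(502)
    end

    local res, err = red:get(ngx.req.get_headers()["jobid"])
    if not res or res == ngx.null then
        ngx.log(ngx.ERR, "no route for jobid: ", err)
        return ngx.exit(404)
    end
    ngx.var.target_url = res

    -- instead of close(): hand the connection back to the pool
    -- (max idle time 10000 ms, pool size 40)
    local ok, err = red:set_keepalive(10000, 40)
    if not ok then
        ngx.log(ngx.ERR, "failed to set keepalive: ", err)
    end
}
```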
For other parameter settings, see:
Github.com/openresty/l…
6. Optimizing the short connections to the target servers
Nginx provides an upstream module to control load balancing and content distribution. It offers the following load-balancing algorithms:
- Round robin (default). Requests are distributed to the back-end servers one by one in order; if a back-end server goes down, it is removed automatically.
- weight. Specifies the polling probability; the weight is proportional to the share of traffic, useful when back-end server performance is uneven.
- ip_hash. Requests are assigned by a hash of the client IP, so each visitor sticks to one fixed back-end server, which helps with session affinity.
- fair (third party). Requests are assigned by back-end response time; servers with shorter response times are served first.
- url_hash (third party). Requests are assigned by a hash of the requested URL, so each URL goes to the same back-end server, which is effective when the back ends cache content.
Because upstream provides the keepalive directive, each worker process caches up to the configured number of idle keepalive connections to upstream servers so they can be reused, reducing the performance overhead of frequently creating and destroying TCP connections.
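For a fixed set of back ends, the classic static form would look like this hypothetical fragment; the keepalive directive caches up to 40 idle connections per worker (the addresses and ports are placeholders):

```nginx
upstream grpc_backends {
    server 10.0.0.11:50051;
    server 10.0.0.12:50051;
    keepalive 40;   # idle upstream connections cached per worker process
}

server {
    listen 8091 http2;
    location / {
        grpc_pass grpc://grpc_backends;
    }
}
```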
Disadvantages:
The upstream block does not support dynamic changes, while our target addresses change dynamically: routes are queried in real time according to business rules. To solve this, we introduce OpenResty's balancer_by_lua_block, which extends upstream's functionality with a Lua script.
In nginx.conf, upstream dynamically obtains the destination IP and port and forwards the request; the core code is as follows:
```nginx
upstream grpcservers {
    server 0.0.0.1;   # placeholder; the real peer is chosen in Lua below
    balancer_by_lua_block {
        local balancer = require "ngx.balancer"
        local host = ngx.var.target_ip
        local port = ngx.var.target_port
        local ok, err = balancer.set_current_peer(host, port)
        if not ok then
            ngx.log(ngx.ERR, "failed to set the current peer: ", err)
            return ngx.exit(500)
        end
    }
    keepalive 40;
}
```
After modifying the configuration, restart Nginx and continue the load test. Observe the result:
TCP connections are now generally in the ESTABLISHED state; the TIME_WAIT connections seen before the optimization are almost gone.
[root@tf-gw-64bd9f775c-qvpcx nginx]# netstat -na | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
LISTEN 2
ESTABLISHED 86
TIME_WAIT 242
Final words
This article solved the gRPC request-forwarding problem by building a gateway on OpenResty, which keeps Nginx's high performance while adding OpenResty's dynamic extensibility. We then wrote the Lua code, ran load tests, and kept tuning until the TCP connections on every hop of the path could be reused.