Abstract: This article teaches you how to design a seckill (flash sale) system architecture: from the general e-commerce system architecture to the seckill system, from high-concurrency "black technology" and winning tricks to server hardware optimization — mastering seckill architecture from every angle.

This article is shared from the Huawei Cloud community post "Practice Proves True Knowledge: Decrypting the Strongest Seckill System Architecture on the Whole Network — Not All Seckill Is Seckill!!", by Binghe (Ice River).

E-commerce system architecture

In the field of e-commerce there is a typical seckill scenario. What is a seckill scenario? Simply put, the number of people trying to buy a product far exceeds its inventory, so the product is snapped up in a very short time. The annual June 18 and November 11 promotions and Xiaomi new-product sales are typical seckill business scenarios.

We can simplify the architecture of the e-commerce system as shown in the following figure.

As shown in the figure, we can roughly divide the core of an e-commerce system into a load balancing layer, an application layer, and a persistence layer. Next, we estimate the concurrency each layer can handle.

  • If the load balancing layer uses high-performance Nginx, we can estimate Nginx's maximum concurrency at 100,000+, i.e., on the order of hundreds of thousands.

  • Assuming the application layer uses Tomcat, Tomcat's maximum concurrency can be estimated at about 800, i.e., on the order of hundreds.

  • Assuming the persistence layer uses Redis for the cache and MySQL for the database, MySQL's maximum concurrency can be estimated at about 1,000 (on the order of thousands) and Redis's at about 50,000 (on the order of tens of thousands).

So the load balancing layer, the application layer, and the persistence layer each support a different level of concurrency. What can we usually do to improve the system's overall concurrency and caching capability?

(1) Expand the system

System capacity expansion includes vertical scaling (upgrading machine configuration) and horizontal scaling (adding machines), and it is effective in most scenarios.

(2) Cache

Use a local cache or a centralized cache to reduce network I/O and serve reads from memory. Effective in most scenarios.

(3) Read and write separation

Use read/write separation to divide and conquer and increase the machines' parallel processing capability.

Seckill system features

For a seckill system, we can describe its characteristics from two angles: business and technology.

Business characteristics of the seckill system

Take 12306.cn as an example. During the annual Spring Festival travel rush, 12306.cn's page views are enormous, while on ordinary days its traffic is relatively flat. In other words, every Spring Festival travel rush, 12306.cn's traffic spikes instantaneously.

The Xiaomi seckill system is similar: goods go on sale at 10 a.m. Before 10 o'clock traffic is relatively flat, and at 10 o'clock concurrency spikes instantaneously.

Therefore, the traffic and concurrency of a seckill system can be represented in the following figure.

As the figure shows, the concurrency of a seckill system has an instantaneous spike, also called the traffic spike phenomenon.

We can summarize the features of a seckill system as follows.

(1) Time limit, quantity limit, and price limit

The activity runs within a specified time window; the quantity of goods in the activity is limited; and the price is far below the original price — the item sells in the seckill for much less than usual.

For example, the seckill runs from 10:00 to 10:30 a.m. on a certain day, only 100,000 units of the item are available, and the price is very low — e.g., a "1 yuan purchase" scenario.

Time limits, quantity limits, and price limits can exist separately or in combination.

(2) Activity warm-up

The activity must be configured in advance; before it starts, users can view its details; and before the seckill begins, the activity is heavily promoted.

(3) Short duration

The number of buyers is huge, and the merchandise sells out quickly.

In the system's traffic curve there is a spike: concurrency is extremely high at that moment, and in most seckill scenarios the goods sell out in a very short time.

The technical characteristics of seckill system

We can summarize the technical characteristics of seckill system as follows.

(1) The instantaneous concurrency is very high

A large number of users snap up goods at the same moment, so the instantaneous concurrency peak is very high.

(2) Read more and write less

The page views of the goods are huge; the quantity available for purchase is very small; and the number of inventory queries far exceeds the number of actual purchases.

Traffic-limiting measures are often added to product pages. For example, early seckill systems added verification codes to product pages to smooth the front-end traffic hitting the system; more recent seckill product detail pages prompt users to log in before viewing. These are all measures to limit access to the system.

(3) Simple process

The business process of a seckill system is generally simple; it can usually be summarized as: place the order and deduct inventory.

Three stages of a seckill

Generally, there are three stages from the beginning to the end of the seckill:

  • Preparation phase: also called the system warm-up phase, in which the seckill system's business data is preheated in advance. During this phase users keep refreshing the page to check whether the activity has started, so part of the data can be loaded into Redis for warm-up as users refresh.

  • Seckill phase: the seckill activity itself, which produces instantaneous high-concurrency traffic and puts enormous pressure on system resources. System protection must be in place during this phase.

  • Settlement phase: post-seckill data processing, such as handling data consistency, handling exceptions, and returning unsold goods to the warehouse.

A system that sees heavy traffic only for short periods is not well suited to capacity expansion: even if we scale out, the extra capacity is used only briefly, and most of the time the system handles normal traffic without it. So what else can we do to improve the system's performance?

Second kill system scheme

According to the characteristics of seckill system, we can take the following measures to improve the performance of the system.

(1) Asynchronous decoupling

Break the whole process apart and drive the core flow through queues.

(2) Rate limiting and anti-scalping

Control the overall website traffic and raise the threshold of requests to avoid system resource exhaustion.

(3) Resource control

Control the resource scheduling in the whole process, and make full use of strengths and circumvent weaknesses.

Because the application layer can handle far less concurrency than the cache, in a high-concurrency system we can use OpenResty to access the cache directly from the load balancing layer, avoiding the performance cost of calling the application layer. You can visit openresty.org/cn/ to learn about OpenResty.

If the concurrency is too high at the beginning of the seckill activity, we can put the user’s request into a queue for processing, and pop up the queuing page for the user.

Note: the picture is from Meizu

Seckill system sequence diagram

Many seckill systems and seckill solutions found online are not real seckill systems: they only process requests synchronously, and once concurrency really rises, their performance drops dramatically. Let's first look at the sequence diagram of a seckill system that places orders synchronously.

Synchronous ordering process

1. The user initiates a seckill request

In the synchronous ordering process, the user first initiates a seckill request, and the mall service performs the following operations in sequence to handle it.

(1) Identify whether the verification code is correct

Mall service determines whether the verification code submitted by the user when initiating the seckill request is correct.

(2) Judge whether the activity has ended

Verify that the current seckill activity has ended.

(3) Verify whether the access request is in the blacklist

In the field of e-commerce there is a lot of malicious competition: other merchants may maliciously hit the seckill system through improper means, occupying large amounts of bandwidth and other system resources. A risk-control system with a blacklist mechanism is needed here. For simplicity, a blacklist can also be implemented with interceptors that count access frequency.

(4) Verify whether the real inventory is sufficient

The system needs to verify that the real inventory of the goods is sufficient to cover the inventory allocated to this seckill activity.

(5) Deduct the inventory in the cache

In the seckill service, information such as inventory is stored in the cache. Here we must verify that the inventory allocated to the seckill is sufficient and deduct the purchased quantity from it.

(6) Calculate the price of the second kill

In the second kill activity, the second kill price of the commodity is different from the real price of the commodity, so the second kill price of the commodity needs to be calculated.

Note: In a seckill scenario, if the system involves more complex services, more operations will be involved. Here, I just list some common operations.

2. Submit the order

(1) Order entry

Save the order information submitted by the user to the database.

(2) Deducting real inventory

After the order is put into storage, the quantity of goods placed in this successful order shall be deducted from the real inventory of goods.

If we build a seckill system with the above process, overall performance will not be high, because every business step executes serially when a user initiates a seckill request. When concurrency is too high, we pop up the following queuing page to ask users to wait.

Note: the picture is from Meizu

The queue could last 15 seconds, 30 seconds, or even longer. The problem is that the connection between client and server is not released between the moment the user initiates the seckill request and the moment the server returns a result, which ties up a lot of server resources.

Many online articles on building seckill systems take exactly this approach. Can it produce a working seckill system? Yes — but the concurrency it supports is not high. Some readers may object: "Our company built its seckill system this way, and it has run in production without problems!" My answer: a synchronous ordering process can implement a seckill, but its performance is limited. If your company's synchronous seckill system has had no major problems, it is because its concurrency has simply never reached a high enough level.

Therefore, many so-called seckill systems do carry seckill business but cannot be called real seckill systems: their synchronous ordering process caps the system's concurrent throughput. They have had no major problems in production only because their concurrency has never been high enough to overwhelm the system.

If the seckill systems of 12306, Taobao, Tmall, JD.com, Xiaomi, and other large malls were built this way, those systems would sooner or later be crushed, and their engineers would be blamed, if not fired. So in a seckill system, a synchronous ordering process is not advisable.

This is the whole synchronous ordering process. If the ordering flow were more complex, it would involve even more business operations.

Asynchronous ordering process

1. The user initiates a seckill request

After a user initiates a seckill request, the mall service goes through the following process.

(1) Check whether the verification code is correct

When a user initiates a seckill request, the verification code is sent together with the request. The system checks whether the verification code is valid and correct.

(2) Whether to limit the current

The system determines whether to rate-limit the user's request based on the length of the message queue. Since we put user requests into a message queue where they pile up, we can decide whether to limit a request according to how many requests are currently pending in the queue.

For example, if a seckill sells 1,000 items and there are already 1,000 requests in the message queue, any further seckill requests need not be processed: we can directly return a "sold out" message to the user.

Therefore, by rate-limiting we can process users' requests and release connection resources more quickly.
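As a sketch, the queue-length check described above could look like the following. The class and method names here are hypothetical illustrations, not from the original system; it assumes one queue slot per item on sale.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: once the number of queued seckill requests reaches
 * the stock on sale, later requests are rejected immediately so the caller
 * can answer "sold out" and release the connection.
 */
public class SeckillRateLimiter {

    private final int stockCount;                 // items on sale, e.g. 1000
    private final AtomicInteger queuedRequests = new AtomicInteger(0);

    public SeckillRateLimiter(int stockCount) {
        this.stockCount = stockCount;
    }

    /** Returns true if the request may enter the message queue. */
    public boolean tryEnqueue() {
        // Reserve a slot optimistically; roll back if the queue is "full".
        int queued = queuedRequests.incrementAndGet();
        if (queued > stockCount) {
            queuedRequests.decrementAndGet();
            return false;                         // caller returns "sold out"
        }
        return true;
    }
}
```

The atomic counter stands in for querying the real queue depth; in practice the pending count would come from the message queue's own metrics.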

(3) Send MQ

After the user’s seckill request passes the validation, we can send the user’s request parameters and other information to MQ for asynchronous processing, and at the same time, respond to the user with the result information. In the mall service, there are dedicated asynchronous task processing modules that consume requests in the message queue and process subsequent asynchronous processes.

When a user initiates a seckill request, the asynchronous order process processes fewer business operations than the synchronous order process. It sends the subsequent operations to the asynchronous processing module via MQ for processing, and quickly returns the response result to the user, releasing the request connection.

2. Asynchronous processing

We can asynchronously process the following operations of the order process.

(1) Judge whether the activity has ended

(2) Determine whether the request is on the system blacklist. To guard against malicious competition from rivals in the e-commerce field, a blacklist mechanism can be added to the system to capture malicious requests. This can be done with interceptors that count access frequency.

(3) Deduct the cached inventory of the seckill items.

(4) Generate a seckill Token bound to the current user and the current seckill activity. Only requests that obtain a seckill Token are qualified for the seckill.

By introducing this asynchronous processing mechanism, the system can control exactly how many resources and how many threads handle these tasks.

3. Short polling query results

At this point the client can use short polling to check whether it has obtained seckill qualification. For example, the client can poll the server every 3 seconds to ask whether a seckill Token has been generated for it. On the server side this means checking whether the current user has a seckill Token: if one has been generated, the user is qualified; otherwise the client keeps polling until it times out or the server returns a "sold out" or "not qualified" message.
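The polling loop described above can be sketched as follows. The `Supplier` stands in for the real HTTP call that asks the server for the user's Token; all names here are illustrative assumptions.

```java
import java.util.function.Supplier;

/**
 * Minimal sketch of client-side short polling for the seckill Token.
 */
public class TokenPoller {

    /**
     * Polls until the server returns a Token or maxAttempts is reached.
     * Returns null on timeout, i.e. no seckill qualification.
     */
    public static String pollForToken(Supplier<String> queryServer,
                                      int maxAttempts,
                                      long intervalMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            String token = queryServer.get();   // e.g. GET /seckill/token?userId=...
            if (token != null) {
                return token;                   // qualified for the seckill
            }
            try {
                Thread.sleep(intervalMillis);   // the article suggests ~3 seconds
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;                            // timed out
    }
}
```

Note that returning null on timeout is acceptable here precisely because, as the article argues below, the system does not have to guarantee that every user learns their qualification status.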

When using short polling to query the seckill result, we can still show a queuing prompt on the page, but now the client polls the server every few seconds for its qualification status. Unlike the synchronous ordering process, it does not hold the request connection for a long time.

Some readers may ask: with short polling, could a user keep querying until timeout without ever learning their qualification status? The answer is: possibly! Consider the real business scenario: merchants run seckill activities not to make money on those items, but to boost sales and product visibility and attract more buyers. So we do not have to guarantee that users can query their seckill qualification status 100% of the time.

4. Second kill settlement

(1) Verify the order Token

When the client submits the seckill settlement, it will submit the seckill Token to the server together, and the mall service will verify whether the current seckill Token is valid.

(2) Add to the shopping cart

After verifying that the Token is valid, the mall service adds the item to the shopping cart.

5. Submit your order

(1) Order warehousing

Save the order information submitted by the user to the database.

(2) Delete the Token

After the order is successfully stored, the Token is deleted.

Here's a question to consider: why do we apply asynchronous processing (asynchronous peak shaving) only in the pink part of the asynchronous ordering flow, and not in the rest of the process?

This is because in the asynchronous ordering design — in both product design and interface design — we rate-limit the user's request at the very first stage, when the seckill request is initiated. In other words, the system's rate limiting happens as early as possible. By the time a request passes that stage, the traffic peak has already been smoothed out, and neither the system's concurrency nor its traffic is very high afterwards.

Therefore, the many online articles and posts claiming that a seckill system should use asynchronous peak shaving to rate-limit at order placement are talking nonsense. Order placement comes late in the seckill flow; rate limiting must be handled up front, and limiting traffic in the stages after the seckill step is useless.

High-concurrency "black technology" and winning tricks

Suppose we use Redis as the cache in a seckill system, and suppose Redis can handle about 50,000 concurrent reads and writes, while our mall's seckill business needs to support about 1,000,000 concurrent requests. If all 1,000,000 requests hit Redis, Redis is very likely to fall over. How do we solve this problem? Let's explore.

In a highly concurrent seckill system that caches data in Redis, the concurrency capability of the Redis cache is key, because many of the up-front operations must access Redis. Asynchronous peak shaving is just a basic measure; the key is to ensure Redis's concurrent processing capability.

The key idea for solving this problem is divide and conquer: split the commodity inventory apart.

Break the whole into parts

When we store the inventory of seckill items in Redis, we can split the inventory into shards to improve Redis's read/write concurrency.

For example, suppose the seckill item has id 10001 and an inventory of 1,000 units, stored in Redis as (10001, 1000). We split this inventory into 5 shards of 200 units each; the entries stored in Redis become (10001_0, 200), (10001_1, 200), (10001_2, 200), (10001_3, 200), (10001_4, 200).

After splitting, each shard of inventory is stored under a key composed of the item id plus a numeric suffix. When these keys are hashed, the hash results differ, which means there is a high probability that the shard keys do not land in the same Redis slot. This improves the performance and concurrency of Redis requests.

After splitting the inventory, we also need to store in Redis a mapping from the item id to the shard keys. The mapping's Key is the item id, 10001, and its Value is the set of keys holding the shard inventories: 10001_0, 10001_1, 10001_2, 10001_3, 10001_4. In Redis we can store these values in a List.

When actually processing inventory, we first fetch from Redis all the shard keys for the item, use an AtomicLong to count the current request number, and take that count modulo the number of shard keys. The result is 0, 1, 2, 3, or 4; concatenating the item id in front yields the actual inventory cache key, which we then use to read the corresponding inventory from Redis.
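The key-routing logic above can be sketched like this. The round-robin counter and helper names are assumptions for illustration; only the key format (10001_0 .. 10001_4) comes from the article's example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of the inventory-splitting scheme: build the shard keys for an
 * item and pick one shard per request via requestCount mod shardCount.
 */
public class StockKeyRouter {

    private final AtomicLong requestCounter = new AtomicLong(0);

    /** Builds the shard keys for a goods id, e.g. 10001 -> [10001_0 .. 10001_4]. */
    public List<String> splitKeys(String goodsId, int shards) {
        List<String> keys = new ArrayList<>(shards);
        for (int i = 0; i < shards; i++) {
            keys.add(goodsId + "_" + i);
        }
        return keys;
    }

    /** Picks the shard key for the current request: counter mod shardCount. */
    public String routeKey(String goodsId, int shards) {
        long index = requestCounter.getAndIncrement() % shards;
        return goodsId + "_" + index;
    }
}
```

In production the counter would typically live per application instance; since any shard is as good as another, per-instance round-robin still spreads load across the Redis slots.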

Substitute stealthily: answer from the load balancing layer

In high-concurrency business scenarios, we can use OpenResty (Nginx with a Lua scripting library) to access the cache directly from the load balancing layer.

Consider a scenario: in a seckill, the goods are snapped up in an instant. After that, when users keep initiating seckill requests, if the load balancing layer still forwards every request to the application layer, which in turn queries the cache and the database, the work is pointless — the goods are already sold out, so checking layer by layer through the application adds nothing. And since the application layer's concurrency is only in the hundreds, routing through it lowers the system's overall concurrency.

To solve this, at the load balancing layer we can extract the user ID, item ID, and seckill activity ID carried in the user's request and access the cached inventory directly via technologies such as Lua scripts. If the item's inventory is less than or equal to 0, we return a "sold out" message to the user directly, without any layer-by-layer verification in the application layer. For this architecture, refer to the e-commerce architecture diagram at the beginning of this article (the first figure).

Redis-powered seckill

We can design a Hash data structure in Redis to support the deduction of goods inventory, as shown below.

seckill:goodsStock:${goodsId} {
    totalCount: 200,
    initStatus: 0,
    seckillCount: 0
}

The Hash data structure we designed has three key fields.

  • totalCount: the total number of items participating in the seckill. This value must be loaded into the Redis cache before the seckill begins.

  • initStatus: a boolean flag. Before the seckill starts the value is 0, meaning the seckill has not begun; a scheduled task or a background operation changes it to 1 when the activity starts.

  • seckillCount: the number of items already sold in the seckill. During the activity its upper limit is totalCount; when it reaches totalCount, the seckill is over.

We can use the following code snippet to load the participating goods data into the cache during the seckill warm-up phase.

/**
 * @author binghe
 * @description Loads seckill goods data into the Redis cache during warm-up.
 */
public class SeckillCacheBuilder {

    private static final String GOODS_CACHE = "seckill:goodsStock:";

    private String getCacheKey(String id) {
        return GOODS_CACHE.concat(id);
    }

    public void prepare(String id, int totalCount) {
        String key = getCacheKey(id);
        Map<String, Integer> goods = new HashMap<>();
        goods.put("totalCount", totalCount);
        goods.put("initStatus", 0);
        goods.put("seckillCount", 0);
        redisTemplate.opsForHash().putAll(key, goods);
    }
}

When the seckill starts, our code needs to check whether the cached seckillCount value is less than totalCount, and only then lock the inventory. But these two steps — check and deduct — are not atomic. If multiple machines operate on the same Redis cache in a distributed environment, synchronization problems arise, leading to the serious consequence of "overselling".

In the field of e-commerce, there is a technical term called “oversold”. As the name implies: “oversold” means that the quantity of goods sold is more than the quantity of goods in stock, which is a very serious problem in the field of e-commerce. So how do we solve the oversold problem?

Lua script solves the oversold problem perfectly

How do we solve the synchronization problem when multiple machines are working on Redis at the same time? A good solution is to use Lua scripts. We can use Lua scripts to encapsulate the inventory destocking operation in Redis into an atomic operation, which ensures atomicity and solves synchronization problems in high-concurrency environments.

For example, we could write the following Lua script to perform inventory deduction in Redis.

local resultFlag = "0" 
local n = tonumber(ARGV[1]) 
local key = KEYS[1] 
local goodsInfo = redis.call("HMGET",key,"totalCount","seckillCount") 
local total = tonumber(goodsInfo[1]) 
local alloc = tonumber(goodsInfo[2]) 
if not total then 
    return resultFlag 
end 
if total >= alloc + n  then 
    local ret = redis.call("HINCRBY",key,"seckillCount",n) 
    return tostring(ret) 
end 
return resultFlag

We can use the following Java code to invoke the above Lua script.

public int secKill(String id, int number) {
    String key = getCacheKey(id);
    // 'script' is a RedisScript object wrapping the Lua script above, and
    // 'redisTemplate' is the injected Spring RedisTemplate.
    Object seckillCount = redisTemplate.execute(script, Arrays.asList(key), String.valueOf(number));
    return Integer.valueOf(seckillCount.toString());
}

In this way we ensure the atomicity of the inventory operation during the seckill, effectively avoiding data synchronization problems and thus solving the "oversold" problem.

To handle the high-concurrency, high-traffic business scenarios of a seckill system, besides the business architecture itself we also need to optimize server performance. Next, let's look at how.

Optimizing server performance

The operating system

Here I use CentOS 8; checking the operating system version (for example, via /etc/centos-release) shows:

CentOS Linux release 8.0.1905 (Core) 

For high-concurrency scenarios, we mainly optimize the operating system's network performance. The OS has many parameters related to network protocols, and optimizing server network performance mainly means tuning these system parameters to improve application access performance.

System parameters

In the CentOS operating system, you can run the following command to view all system parameters.

/sbin/sysctl -a

Part of the output is as follows.

There are more than a thousand parameters, so it is impossible to tune all of them for a high-concurrency scenario. We care mostly about the network-related ones. To find them, we first need the top-level categories of operating system parameters, obtained with the following command.

/sbin/sysctl -a|awk -F "." '{print $1}'|sort -k1|uniq

The following information is displayed.

abi
crypto
debug
dev
fs
kernel
net
sunrpc
user
vm

Net types are the net-related operating system parameters that we care about. We can get subtypes under net type, as shown below.

/sbin/sysctl -a|grep "^net."|awk -F "[.| ]" '{print $2}'|sort -k1|uniq

The following information is displayed:

bridge
core
ipv4
ipv6
netfilter
nf_conntrack_max
unix

In Linux, these network-related parameters can be modified in the /etc/sysctl.conf file. If these parameters do not exist in the /etc/sysctl.conf file, you can add them to the /etc/sysctl.conf file by yourself.

Among the subtypes of the NET type, the subtypes we need to focus on are core and ipv4.

Optimize the socket buffer

If the server’s network socket buffer is too small, the application will have to read and write more than once to process the data, which will greatly affect the performance of our program. If the network socket buffer is set large enough, the performance of our program can be improved to a certain extent.

We can get information about the server socket buffer by typing the following command on the server command line.

/sbin/sysctl -a|grep "^net."|grep "[r|w|_]mem[_| ]"

The following information is displayed:

net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.core.wmem_default = 212992
net.core.wmem_max = 212992
net.ipv4.tcp_mem = 43545        58062   87090
net.ipv4.tcp_rmem = 4096        87380   6291456
net.ipv4.tcp_wmem = 4096        16384   4194304
net.ipv4.udp_mem = 87093        116125  174186
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_wmem_min = 4096

Keys containing max, default, and min denote the maximum, default, and minimum values respectively; keys containing mem, rmem, and wmem denote total memory, the receive buffer, and the send buffer respectively.

Note that the rmem and wmem values are in bytes, while the mem values are in pages. A "page" is the operating system's smallest unit of memory management; on Linux the default page size is 4 KB.

How to optimize frequent sending and receiving of large files

How can we optimize the performance of the server if we need to send and receive large files frequently in high concurrency scenarios?

Here, we can modify the system parameters as shown below.

net.core.rmem_default
net.core.rmem_max
net.core.wmem_default
net.core.wmem_max
net.ipv4.tcp_mem
net.ipv4.tcp_rmem
net.ipv4.tcp_wmem

Suppose the system can allocate at most 2 GB of memory for TCP, with a minimum of 256 MB and a pressure threshold of 1.5 GB. With a 4 KB page size, the minimum, pressure, and maximum values of tcp_mem are 65536, 393216, and 524288, in pages.

If the average file packet is 512 KB, each socket read/write buffer should hold at least 2 packets, 4 by default, and at most 10. So the minimum, default, and maximum values of tcp_rmem and tcp_wmem are 1048576, 2097152, and 5242880, in bytes. Accordingly, rmem_default and wmem_default are 2097152, and rmem_max and wmem_max are 5242880.
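The arithmetic behind these values can be checked with a small sketch, assuming the 4 KB page size and 512 KB average packet stated above (the class is illustrative, not part of any tuning tool):

```java
/**
 * Verifies the buffer-sizing arithmetic: tcp_mem values are in pages,
 * tcp_rmem/tcp_wmem values are in bytes.
 */
public class BufferSizing {

    public static final long PAGE = 4 * 1024;       // 4 KB page size
    public static final long PACKET = 512 * 1024;   // 512 KB average packet

    /** Converts a byte count to a page count (for tcp_mem). */
    public static long pages(long bytes) {
        return bytes / PAGE;
    }

    public static void main(String[] args) {
        // tcp_mem: 256 MB min, 1.5 GB pressure, 2 GB max, expressed in pages
        System.out.println(pages(256L * 1024 * 1024));   // min
        System.out.println(pages(1536L * 1024 * 1024));  // pressure
        System.out.println(pages(2048L * 1024 * 1024));  // max

        // tcp_rmem / tcp_wmem: 2, 4 and 10 packets, expressed in bytes
        System.out.println(2 * PACKET);
        System.out.println(4 * PACKET);
        System.out.println(10 * PACKET);
    }
}
```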

Note: these values follow directly from the page-size and packet-size assumptions above.

Also note here: if the buffer is larger than 65535, you also need to set the net.ipv4.tcp_window_scaling parameter to 1.

After the above analysis, we finally got the system tuning parameters as shown below.

net.core.rmem_default = 2097152
net.core.rmem_max = 5242880
net.core.wmem_default = 2097152
net.core.wmem_max = 5242880
net.ipv4.tcp_mem = 65536  393216  524288
net.ipv4.tcp_rmem = 1048576  2097152  5242880
net.ipv4.tcp_wmem = 1048576  2097152  5242880

Optimizing TCP Connections

Anyone with some knowledge of computer networks knows that a TCP connection goes through the "three-way handshake" and "four-way wave", and relies on mechanisms such as slow start, sliding windows, and packet coalescing (Nagle's algorithm) for reliable transmission. These ensure TCP's reliability, but they can sometimes hurt our program's performance.

So how do we optimize TCP connections in high concurrency scenarios?

(1) Disable packet coalescing (Nagle's algorithm)

If users are sensitive to request latency, we need to set the TCP_NODELAY option on the TCP socket to disable Nagle's algorithm so that packets are sent immediately. In this case, we can also set net.ipv4.tcp_syncookies to 1.

(2) Avoid frequent creation and reclamation of connection resources

Creating and reclaiming network connections is very expensive, so we can optimize server performance by closing idle connections and reusing already-allocated connection resources. Resource reuse is a familiar idea: thread pools and database connection pools are exactly the reuse of threads and database connections.

We can use the following parameters to close the server's idle connections and reuse allocated connection resources. (Note that net.ipv4.tcp_tw_recycle is known to cause problems for clients behind NAT and was removed in Linux 4.12, so it applies only to older kernels, if at all.)

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time=1800

(3) Avoid sending data packets repeatedly

TCP supports timeout retransmission: if the sender sends packets and receives no acknowledgment within the specified interval, retransmission is triggered. To avoid re-sending packets that were already delivered successfully, we can set the server's net.ipv4.tcp_sack parameter to 1 to enable selective acknowledgment.

(4) Increase the number of server file descriptors

In Linux, each network connection occupies a file descriptor; the more connections, the more descriptors are used. If the file descriptor limit is set too low, server performance suffers, so we need to raise the server's file descriptor limit.

For example, if fs.file-max = 10240000, the server can open a maximum of 10240000 files.
