The role of connection pooling
Connection pooling means allocating a batch of connections in advance and keeping them in a buffer for reuse, which creates the pooling effect. Take seckill (flash sale) as an example: the seckill interface handles very high concurrency. What happens if it does not use connection pooling?
When introducing KV storage, I mentioned that seckill systems use Redis to cache activity information. If the seckill interface service has only one connection to Redis and the average request to Redis takes 10 ms, how many requests per second can it handle? That's right: 1000 ms ÷ 10 ms = 100, so a single connection can handle at most 100 requests per second.
In the seckill system, besides the activity information, the seckill inventory information is also cached in Redis, and it needs to support more than 10,000 concurrent requests. With the activity and inventory traffic spread across 50 seckill interface nodes, each node may send more than 300 requests per second to Redis on average, far beyond what a single connection can handle.
This brings us to the first problem: a single connection cannot handle high concurrency.
You might ask: since reusing a single connection cannot handle high concurrency, why not create a new connection for every request? It sounds reasonable, but the reality is harsh. First, establishing a connection requires a TCP three-way handshake; if the network latency is 5 ms, the handshake alone takes 15 ms, longer than a single request round trip. Second, if a connection is established for every request, it also has to be closed afterwards to avoid piling up connections on Redis, and closing a connection involves the TCP four-way teardown, which adds yet more time overhead.
So, without connection pooling, the second problem arises: each time a connection is set up and closed, the request latency increases and Redis can be overwhelmed.
In addition, if connections are established and closed frequently under high concurrency, the operating system spends too much CPU allocating and reclaiming system resources, which is the third problem.
So how can we design connection pooling to solve these three problems?
In general, a connection pool has several parameters: the minimum number of connections, the number of idle connections, and the maximum number of connections. Why these three? See below:
The minimum number of connections guarantees that the pool always keeps at least this many connections; if the pool holds fewer, burst traffic may cause performance problems.
The number of idle connections controls how many idle connections the pool keeps. If the pool holds more than this, resources are wasted and the redundant connections should be closed; if it holds fewer, the pool may not absorb burst traffic and new idle connections need to be allocated.
The allocation of idle connections can be driven by a timer. The goal is to make sure enough idle connections are ready by the time the seckill service sends requests to Redis, so that the time and resource cost of establishing connections stays off the request path. Usually an independent thread periodically checks the idle count, for example checking every second whether the number of idle connections has dropped below 2 and, if so, creating a batch of new idle connections (see the sketch after these parameters).
The maximum number of connections caps the total number of connections in the system, so that a flood of connections does not overwhelm Redis.
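To make these parameters concrete, here is a minimal sketch in Go. The ConnPool type, its field names, and the replenish helper are my own illustrative choices, not from any particular Redis client; the idle queue is a buffered channel, an approach discussed in more detail later in this article.

```go
package pool

import (
	"net"
	"time"
)

// ConnPool keeps idle connections in a buffered channel and is governed
// by the three parameters discussed above. All names are illustrative.
type ConnPool struct {
	minConns int                      // connections created up front
	maxIdle  int                      // idle connections kept beyond this are closed
	maxConns int                      // hard cap so Redis is not overwhelmed
	dial     func() (net.Conn, error) // how to create a new connection
	idle     chan net.Conn            // idle-connection queue, capacity = maxIdle
	sem      chan struct{}            // counting semaphore, capacity = maxConns
}

// replenish runs as an independent goroutine: every second it checks
// whether the idle count has dropped below a threshold (2 here) and, if
// so, creates new idle connections ahead of the next traffic burst.
func (p *ConnPool) replenish(stop <-chan struct{}) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			for len(p.idle) < 2 {
				c, err := p.dial()
				if err != nil {
					break // e.g. Redis unreachable; try again on the next tick
				}
				select {
				case p.idle <- c:
				default:
					c.Close() // the pool filled up in the meantime
				}
			}
		}
	}
}
```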
When the seckill service starts, it initializes the connection pool according to the parameters in the configuration file, for example creating the minimum number of connections up front. When the seckill service needs to send a request to Redis, it first tries to obtain a connection from the pool; if none is available, it establishes a new one.
When the number of connections reaches the maximum, the request blocks and waits for another request to return a connection. After the Redis request completes, the seckill service puts the connection back into the pool, and if the number of idle connections already exceeds the configured limit, it simply closes the connection.
In addition, to ensure that multiple instances in the Redis cluster are load balanced, the connections in the connection pool also need to be load balanced.
How do you get connections from the pool and put them back into the pool when you’re done?
A circular queue is usually used to hold idle connections: take a connection from the head of the queue when you need one and put it back at the tail when you are done. In Go, another approach is to use a buffered channel as the queue, which is very simple to implement; I will cover it in detail in the hands-on code section.
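Continuing the sketch above, here is roughly what construction, Get, and Put could look like with a buffered channel as the idle queue and a semaphore channel enforcing the maximum number of connections. This is one possible shape under those assumptions, not a reference implementation.

```go
// NewConnPool pre-creates minConns connections, as described earlier.
// minConns is assumed to be no larger than maxIdle.
func NewConnPool(minConns, maxIdle, maxConns int, dial func() (net.Conn, error)) (*ConnPool, error) {
	p := &ConnPool{
		minConns: minConns,
		maxIdle:  maxIdle,
		maxConns: maxConns,
		dial:     dial,
		idle:     make(chan net.Conn, maxIdle),
		sem:      make(chan struct{}, maxConns),
	}
	for i := 0; i < minConns; i++ {
		c, err := dial()
		if err != nil {
			return nil, err
		}
		p.idle <- c
	}
	return p, nil
}

// Get takes an idle connection from the head of the queue if one is
// available, otherwise dials a new connection. The semaphore makes
// callers block once maxConns connections are already checked out.
func (p *ConnPool) Get() (net.Conn, error) {
	p.sem <- struct{}{} // blocks while the pool is at its maximum
	select {
	case c := <-p.idle:
		return c, nil
	default:
		c, err := p.dial()
		if err != nil {
			<-p.sem // release the slot so other callers can proceed
			return nil, err
		}
		return c, nil
	}
}

// Put returns a connection to the tail of the queue; if the pool already
// holds maxIdle idle connections, the connection is closed instead.
func (p *ConnPool) Put(c net.Conn) {
	select {
	case p.idle <- c:
	default:
		c.Close() // more idle connections than we want to keep
	}
	<-p.sem // wake up any Get call waiting at the maximum
}
```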
The role of the coroutine pool
A coroutine pool is simply the same pooling technique applied to a group of coroutines.
If you are familiar with the Linux kernel, you know that the kernel schedules resources at the granularity of processes, and a thread is essentially a lightweight process. In other words, processes and threads are created and scheduled by the kernel. A coroutine, by contrast, is a unit of execution created by the application itself, such as a goroutine in Go. Coroutines run on top of threads, are scheduled by the application rather than the kernel, and are a lighter unit of execution than threads.
What advantages do coroutines have over processes and threads? To answer that, we need to start with how processes and threads behave under high concurrency.
When an application needs to create a process or thread, it first calls a system function to submit the request to the kernel. The CPU then switches from user mode to kernel mode, the kernel creates the process or thread, and finally the CPU switches back from kernel mode to user mode.
Switching back and forth between user mode and kernel mode requires the operating system to save a lot of context. Specifically, when switching from user mode to kernel mode, the kernel writes the data in the CPU registers to stack memory; when the program returns from kernel mode to user mode, the kernel loads the previously saved register data from stack memory back into the CPU.
When the kernel creates processes and threads, it allocates process control blocks (PCBs), which hold the running state of each process and thread. In other words, processes and threads occupy memory inside the kernel, and allocating the PCBs consumes additional CPU.
In particular, when a parent process creates a child process, the child inherits the parent's memory state. When the child later modifies data in a chunk of that memory, copy-on-write is triggered and the system allocates new memory for the child, which means extra CPU overhead at run time. In addition, each thread has its own stack space; on Linux the default stack size is 8 MB, so even a thread that does nothing reserves 8 MB of memory.
Also, as mentioned earlier, state switching occurs when creating processes and threads, largely because tasks in the kernel run as kernel threads. Kernel threads can be created dynamically and their number can exceed the number of CPU hardware threads, yet a CPU can only run one kernel thread at a time, so context switching between kernel threads is unavoidable.
Context switches are expensive for the CPU. First, the CPU spends extra time saving and loading context during each switch, which lowers its effective utilization. Second, after a switch the new task has to load its own data, so the contents of the CPU's L1 and L2 caches may be invalidated, degrading performance.
In high concurrency scenarios, each creation or destruction of a thread brings a large amount of CPU and memory overhead. To solve these problems, coroutines were born.
In Go, a goroutine starts with a 2 KB stack, much smaller than a thread or a process. Goroutines are created and destroyed in user mode, with no switch between user mode and kernel mode. They are also scheduled entirely by the application in user mode, so no kernel-mode context switch is involved, and switching between goroutines is fast because there is no kernel thread state to handle and little context to save.
Since coroutine creation, switching, and destruction are already so cheap, why do we still need a coroutine pool?
As mentioned earlier, each goroutine starts with about 2 KB of memory. If a goroutine is created for every request, then at 100,000 concurrent requests on a single seckill server that alone is roughly 100,000 × 2 KB ≈ 200 MB of memory allocated and in use, which can create garbage-collection pressure.
In addition, although coroutine creation is fast, it still takes time, such as memory allocation, initialization of the state of the coroutine, and so on. In a high-concurrency scenario such as seckill, adding even 1 millisecond latency per request can impose significant CPU overhead on the service.
So how do we design the coroutine pool?
In Go, there are two common ways to implement a coroutine pool: the preemptive style and the scheduling style.
In a preemptive coroutine pool, all tasks go into one shared channel and are consumed by multiple worker coroutines at the same time. The advantage is that submitting a task is trivial: you simply push it onto the shared channel. The disadvantages are that multiple coroutines consuming one channel contend on its lock, and a single channel can run into capacity problems when the workers are tied up with time-consuming tasks. A minimal sketch of this style follows.
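As a rough illustration of the preemptive style (the Task, Pool, NewPool, and Submit names are mine, not a standard library API):

```go
package workerpool

import "sync"

// Task is one unit of work for the pool to execute.
type Task func()

// Pool is a preemptive coroutine pool: every worker goroutine consumes
// from the same shared channel, so whichever worker is free picks up
// the next task.
type Pool struct {
	tasks chan Task
	wg    sync.WaitGroup
}

// NewPool starts `workers` goroutines that all read from one shared
// channel with the given buffer size.
func NewPool(workers, queueSize int) *Pool {
	p := &Pool{tasks: make(chan Task, queueSize)}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer p.wg.Done()
			for t := range p.tasks { // all workers contend on this one channel
				t()
			}
		}()
	}
	return p
}

// Submit only has to push the task onto the shared channel; it does not
// need to pick a worker. It blocks when the channel buffer is full.
func (p *Pool) Submit(t Task) { p.tasks <- t }

// Close stops accepting new tasks and waits for queued tasks to finish.
func (p *Pool) Close() {
	close(p.tasks)
	p.wg.Wait()
}
```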
In a scheduling coroutine pool, each coroutine has its own channel and consumes only from it. When a task is submitted, a load-balancing algorithm picks which coroutine should execute it, for example the coroutine with the fewest queued tasks, or simply round-robin. A sketch of this style follows as well.
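Here is a sketch of the scheduling style, reusing the Task type from the previous snippet; it dispatches each task to the worker with the fewest queued tasks, though plain round-robin would also match the description above.

```go
// DispatchPool is a scheduling-style coroutine pool: each worker owns a
// private channel and consumes only from it; Submit chooses the worker.
type DispatchPool struct {
	queues []chan Task
	wg     sync.WaitGroup
}

// NewDispatchPool starts one goroutine per private queue.
func NewDispatchPool(workers, queueSize int) *DispatchPool {
	p := &DispatchPool{queues: make([]chan Task, workers)}
	p.wg.Add(workers)
	for i := range p.queues {
		q := make(chan Task, queueSize)
		p.queues[i] = q
		go func() {
			defer p.wg.Done()
			for t := range q { // no contention: this worker's own queue only
				t()
			}
		}()
	}
	return p
}

// Submit load-balances by picking the queue with the fewest pending tasks.
func (p *DispatchPool) Submit(t Task) {
	best := 0
	for i := 1; i < len(p.queues); i++ {
		if len(p.queues[i]) < len(p.queues[best]) {
			best = i
		}
	}
	p.queues[best] <- t
}

// Close closes every private queue and waits for all workers to drain.
func (p *DispatchPool) Close() {
	for _, q := range p.queues {
		close(q)
	}
	p.wg.Wait()
}
```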