1. Microservice traffic limiting
With the popularity of microservices, stability between services becomes increasingly important. Caching, degradation, and traffic limiting are three powerful tools for protecting the stability of a microservice system, and essential parts of the microservice toolkit to understand.
Caching improves access speed and increases system capacity. Degradation temporarily blocks a problematic or performance-impacting non-core service, re-enabling it after the peak passes or the problem is resolved. Some scenarios, however, cannot be solved by caching or degradation, such as scarce resources, database write operations, or frequent complex queries. For these scenarios, we need a means to limit the number of requests: traffic limiting.
For example, suppose we design a function that is about to go online. It consumes some resources and can handle at most 3000 QPS. What should we do if the actual traffic exceeds 3000 QPS?
Therefore, the purpose of traffic limiting is to protect the system by limiting the rate of concurrent access or the number of requests within a time window; once the rate reaches the limit, the system can deny service, make requests wait, or degrade.
To learn how to implement a distributed traffic limiting framework, we first need to understand two basic traffic limiting algorithms.
2. Traffic limiting algorithm
2.1 Leaky bucket algorithm
The idea of the leaky bucket algorithm is simple: water (i.e., requests) flows into the leaky bucket, and the bucket drains at a fixed rate. When the inflow rate is too high, the bucket overflows and the excess requests are rejected. The leaky bucket algorithm can thus enforce a hard limit on the data transmission rate. Schematic diagram (source: internet):
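A minimal sketch of the idea in Java (class and field names are illustrative, not from the original):

```java
// Minimal leaky bucket sketch. Water level = queued requests;
// the bucket leaks (processes requests) at a fixed rate.
public class LeakyBucket {
    private final long capacity;         // max water the bucket can hold
    private final double leakRatePerMs;  // requests drained per millisecond
    private double water = 0;
    private long lastLeakTime = System.currentTimeMillis();

    public LeakyBucket(long capacity, double leakRatePerMs) {
        this.capacity = capacity;
        this.leakRatePerMs = leakRatePerMs;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Drain water according to elapsed time, never below zero.
        water = Math.max(0, water - (now - lastLeakTime) * leakRatePerMs);
        lastLeakTime = now;
        if (water + 1 <= capacity) {
            water += 1;   // the request enters the bucket
            return true;
        }
        return false;     // bucket full: the request "overflows" and is rejected
    }
}
```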
2.2 Token bucket algorithm
The token bucket algorithm works in the opposite direction of the leaky bucket algorithm and is easier to understand. Over time, the system adds tokens to the bucket at a constant interval of 1/QPS (10 ms if QPS = 100), stopping when the bucket is full. Each incoming request takes a token; if no tokens are left, the request blocks or is denied service. Schematic diagram (source: internet):
2.3 Algorithm Selection
The difference between the two is that the leaky bucket algorithm enforces a fixed data transfer rate, while the token bucket algorithm limits the average data transfer rate but allows some degree of burst traffic. The token bucket also has the advantage that the rate is easy to change: when the rate needs to be increased, simply increase the rate at which tokens are put into the bucket. Therefore, the core algorithm of our traffic limiting framework is based on the token bucket algorithm.
3. Local traffic limiting
Given the token bucket principle described above, how do we implement it in code?
To stay lock-free, AtomicLong, the atomic wrapper for long, is recommended. AtomicLong makes it convenient to perform CAS increment and CAS decrement operations on the counter (that is, to put tokens into and take tokens out of the bucket) while avoiding the overhead of thread context switches. The core CAS logic looks like this:
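A minimal sketch of such a CAS loop (class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

public class LocalTokenBucket {
    private final long capacity;
    private final AtomicLong tokens;

    public LocalTokenBucket(long capacity) {
        this.capacity = capacity;
        this.tokens = new AtomicLong(capacity); // start with a full bucket
    }

    // Take one token with a lock-free CAS loop.
    public boolean tryAcquire() {
        for (;;) {
            long current = tokens.get();
            if (current <= 0) return false;     // bucket empty: reject or block
            if (tokens.compareAndSet(current, current - 1)) return true;
            // CAS lost a race with another thread; re-read and retry.
        }
    }

    // Put tokens back, capped at capacity (called by the refill thread below).
    public void refill(long n) {
        for (;;) {
            long current = tokens.get();
            long next = Math.min(capacity, current + n);
            if (tokens.compareAndSet(current, next)) return;
        }
    }
}
```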
According to the token bucket algorithm described above, the bucket also needs a scheduled thread to keep putting tokens in. The code for this part is as follows:
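A sketch of the refill thread, assuming the LocalTokenBucket above and a single-threaded ScheduledExecutorService:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A single scheduled thread drips tokens into the bucket at a fixed rate.
public class TokenRefiller {
    public static void start(LocalTokenBucket bucket, long qps) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        // e.g. QPS=100 -> one token every 10 ms; for QPS > 1000 a real
        // implementation would add tokens in batches instead.
        long intervalMs = Math.max(1, 1000 / qps);
        scheduler.scheduleAtFixedRate(
                () -> bucket.refill(1), intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }
}
```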
4. Overview of distributed traffic limiting
What are the problems to be solved in distributed traffic limiting? I think there are at least the following:
1. **Dynamic rules:** for example, the QPS limit can be modified dynamically, the traffic limiting function can be turned on or off at any time, and the rules can change with the business.
2. **Cluster traffic limiting:** for example, enforcing a unified limit across all instances of a service in a Spring Cloud microservice architecture, to control the downstream traffic hitting the database.
3. **Circuit breaking and degradation:** for example, when a resource in the invocation chain becomes unstable (call timeouts or a rising error ratio), calls to that resource are restricted so that requests fail fast, avoiding cascading errors that would affect other resources.
Several other optional features, such as real-time monitoring, gateway flow control, hotspot parameter limiting, adaptive system-level limiting, blacklist and whitelist control, and annotation support, can be added as straightforward extensions.
5. Distributed traffic limiting solution
For distributed traffic limiting, I list the following three schemes:
1. Redis token bucket
This scheme is the simplest approach to cluster-wide limiting. In local limiting we used the atomic Long class as the token bucket; once the number of instances exceeds one, we can instead treat Redis as a shared memory area and read and write the token count there. Where concurrency control is needed, distributed locking can also be implemented with Redis.
The disadvantage of this scheme is obvious: fetching each token costs at least a millisecond of network overhead, so the concurrency it supports is very limited.
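A minimal sketch of the idea, assuming the Jedis client; the key name and Lua script are illustrative, and a Lua script is used so that check-and-decrement is atomic on the Redis side:

```java
import redis.clients.jedis.Jedis;
import java.util.Collections;

// Scheme 1 sketch: the token count lives in Redis; a refill job (not shown)
// would periodically increment the key, mirroring the local refill thread.
public class RedisTokenBucket {
    private static final String TAKE_TOKEN_LUA =
            "local tokens = tonumber(redis.call('get', KEYS[1]) or '0') " +
            "if tokens > 0 then redis.call('decr', KEYS[1]) return 1 " +
            "else return 0 end";

    private final Jedis jedis;
    private final String key;

    public RedisTokenBucket(Jedis jedis, String key) {
        this.jedis = jedis;
        this.key = key;
    }

    public boolean tryAcquire() {
        // One round trip per token: this network hop is the scheme's bottleneck.
        Object result = jedis.eval(TAKE_TOKEN_LUA,
                Collections.singletonList(key), Collections.emptyList());
        return Long.valueOf(1L).equals(result);
    }
}
```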
2. Uniform distribution of QPS
The idea is to localize cluster-wide limiting as much as possible by splitting the cluster's quota among the instances.
For example, we have two server instances belonging to the same application (the same application.name), the QPS configured in the program is 100, and both instances connect to the same console program. The console divides the QPS by the number of application instances and dynamically sets each instance's QPS to 50. If the two servers have different capacities, the load balancing layer may split traffic unevenly, say 70% to one server and 30% to the other; in that case, the console can apply a weighted QPS allocation strategy instead.
Objectively speaking, this is a workable cluster limiting scheme, but it still has problems. The allocation ratio is based on observed traffic trends; in practice the split may not be exactly 50/50 or 70/30, and the error is uncontrollable. It is easy to hit the awkward situation where a user's consecutive requests to one server are rejected while the other server has plenty of idle quota at that moment.
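A sketch of the console-side allocation arithmetic (method and parameter names are illustrative; weights stand in for server capacity):

```java
// Split a global QPS across instances in proportion to their weights.
public static long[] allocateQps(long globalQps, double[] weights) {
    double totalWeight = 0;
    for (double w : weights) totalWeight += w;
    long[] quotas = new long[weights.length];
    for (int i = 0; i < weights.length; i++) {
        quotas[i] = Math.round(globalQps * weights[i] / totalWeight);
    }
    return quotas; // e.g. globalQps=100, weights={7,3} -> {70, 30}
}
```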
3. Ticket server
This scheme builds on the Redis token bucket scheme. To solve the problem that every token fetch is accompanied by network overhead, it introduces a control layer (the ticket server) that interacts with the Redis token bucket on the clients' behalf. Only when a client's remaining local tokens run low does it fetch from the control layer, and it takes a whole batch of tokens each time.
The idea is similar to array expansion in the Java Collections framework: a threshold is set, and the asynchronous call is triggered only when the threshold is crossed. The rest of the token-taking logic is identical to local limiting. This scheme still has error, but the maximum error is bounded by the size of one token batch.
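A minimal sketch of the client side of this scheme (TicketServerClient is a hypothetical interface standing in for the ticket server RPC, and the fetch is done synchronously here for brevity; a real design would fetch asynchronously and guard against concurrent refills):

```java
import java.util.concurrent.atomic.AtomicLong;

// Scheme 3 sketch: keep a local batch of tokens and ask the ticket server
// for a new batch only when the local count falls below a threshold.
public class BatchTokenClient {
    interface TicketServerClient {
        long fetchBatch(long count); // tokens actually granted by the ticket server
    }

    private final AtomicLong localTokens = new AtomicLong(0);
    private final TicketServerClient server;
    private final long batchSize;
    private final long threshold;

    public BatchTokenClient(TicketServerClient server, long batchSize, long threshold) {
        this.server = server;
        this.batchSize = batchSize;
        this.threshold = threshold;
    }

    public boolean tryAcquire() {
        if (localTokens.get() <= threshold) {
            // Refill from the ticket server; the network hop is amortized
            // over the whole batch instead of paid per token.
            localTokens.addAndGet(server.fetchBatch(batchSize));
        }
        for (;;) {            // same CAS loop as local limiting
            long current = localTokens.get();
            if (current <= 0) return false;
            if (localTokens.compareAndSet(current, current - 1)) return true;
        }
    }
}
```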
6. Open source projects
The above covered three implementation ideas for distributed traffic limiting. Here I recommend a distributed traffic limiting project, SnowJean (Github.com/yueshutong/…).
Reading the project's source code, I observed that it has many clever touches in solving distributed traffic limiting. For example, SnowJean uses the observer pattern to implement dynamic rule configuration, the factory pattern to construct the rate limiters, and the builder pattern to build the limiting rules.
To check the health of client instances, SnowJean combines Redis key expiration with heartbeat packets sent by the clients (each heartbeat pushes the expiration time back). Another nice touch: the project provides QPS visualization based on the front-end ECharts library, as shown below.