Rate limiting concepts
Purpose
- Protect the system from overload by limiting the rate of concurrency/requests.
- Provide a degraded service rather than no service at all.
- When the load is high, prioritize protecting core services.
Rate limiting methods
There are many ways to limit traffic:
- QPS: Limits the number of requests per second
- Concurrency: Avoid resource exhaustion caused by enabling too many threads
- Connection count: limits the number of TCP connections
The rest of this article focuses on QPS limits and does not cover concurrency or connection-count limits.
Rate limiting rules
Traffic limiting policies can be static or dynamic based on service characteristics.
Rules include:
- The rate limiting algorithm
- Parameters such as the traffic limiting threshold
- The corresponding strategy (how requests are handled once the limit is reached)
Static rules
The traffic limiting threshold is determined by load testing the entire service link in advance to learn the service characteristics and the QPS of each API, and to estimate the safe water levels of the whole service cluster and its downstream dependencies (downstream services and middleware such as storage). The threshold for the server cluster is then set based on the load test data.
Metrics to collect during load testing
- The service itself: for compute-heavy services, account for the system resources consumed by the service's own computation
- Caches and databases: services that access storage must account for the load placed on the database
- Message queues: although message queues are generally used for peak shaving, some are used for other purposes such as business transactions, and their load must be considered
- Others: the load of every service and piece of middleware on the downstream link should be taken into account
Points to consider:
- Different APIs need different thresholds
- The threshold must account for the whole link: if a request traverses 10 services, the calculation must take into account the traffic quota each of those 10 services grants this service
- Different traffic limiting policies have different thresholds
- In theory, every change to an API's business logic, and every API added or removed, requires re-running the load test, which is fairly troublesome
- When estimating the threshold, consider the safe water level of the whole service: a threshold that is too high risks overload, while one that is too low wastes resources
Dynamic rules
Two ways to implement dynamic rules:
- Adaptive: the service itself adjusts its traffic limiting rules based on RT, load, QPS, and other indicators
- External notification: a rule server notifies the service of updates through the service's API, or the service polls the rule server
The traffic limiting threshold is calculated dynamically from response times, or from the load of the service and its downstream dependencies as reported by the monitoring system and the service governance center.
Server-side vs. client-side vs. gateway traffic limiting
In most cases traffic limiting is done on the server side, because the number of clients is often unknown. Sometimes it is also done on the client side, or on both sides at once, to prevent a single client from over-consuming the service.
Traffic limiting is generally recommended on the server side.
Server-side traffic limiting
Before processing a request, the service performs the traffic limiting calculation to prevent overload. In addition, different traffic limiting policies can be applied to clients from different services, so that when one caller reaches its threshold, its further requests are rejected without affecting other callers.
Advantages of server-side traffic limiting
- Better control of the overall service load: the server-side threshold does not change as the number of clients grows or shrinks
- Traffic limiting policies with different thresholds can be applied to different upstream services: different callers can be given different quotas, or services can be assigned different tags and limited by tag (see the sketch below)
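A minimal sketch of the per-caller quota idea (not from the original article): one limiter per caller tag, here built on Guava's RateLimiter. The class name, the tags, and the choice to reject unknown callers are illustrative assumptions.
import com.google.common.util.concurrent.RateLimiter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
// Hypothetical per-caller quota registry: each upstream service (identified by a caller tag)
// gets its own limiter with its own QPS quota.
public class PerCallerLimiter {
    private final Map<String, Double> quotas;                 // caller tag -> QPS quota
    private final Map<String, RateLimiter> limiters = new ConcurrentHashMap<>();
    public PerCallerLimiter(Map<String, Double> quotas) {
        this.quotas = quotas;
    }
    // Returns true if the request from this caller may proceed, false if it should be limited.
    public boolean tryAcquire(String callerTag) {
        Double qps = quotas.get(callerTag);
        if (qps == null) {
            return false;                                     // unknown caller: reject (or fall back to a default quota)
        }
        RateLimiter limiter = limiters.computeIfAbsent(callerTag, t -> RateLimiter.create(qps));
        return limiter.tryAcquire();
    }
}
// Example: new PerCallerLimiter(Map.of("serviceA", 100.0, "serviceB", 10.0))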
Disadvantages
- If the server limits only QPS and not the number of connections: maintaining connections also consumes resources, and this can become a bottleneck, especially with short-lived connections, where repeatedly establishing connections consumes a lot of resources
- If the server also limits the number of connections: it is hard to allocate connection quotas across different links or callers, and one caller can easily hold too many connections and starve the others
Client-side traffic limiting
When a client invokes downstream services, the client itself enforces the traffic limiting quota of each downstream service cluster to protect those services from overload.
Advantages
- When the threshold is reached, no request is made to the server, avoiding extra resource consumption on the server, such as establishing connections
Disadvantages
- If the number of clients grows or shrinks, the traffic limiting threshold of every client must be recalculated
- Bugs in client-side limiting, or skewed client-side load balancing, can cause the limit to lose its effect
- Different downstream APIs have different thresholds: if there are many downstream services and each API of each service has its own quota, client-side limiting becomes complicated
Gateway traffic limiting
Requests reach the server through a gateway, and the gateway applies traffic limiting per service and per API.
Advantages
- The gateway can effectively protect the load of the entire cluster. When the number of servers increases or decreases, the gateway can adjust the threshold accordingly
- Configure different traffic limiting quotas and policies for different upstream services
Disadvantages
- The gateway itself consumes resources
- The gateway itself must be highly available
Rate limiting granularity
Traffic limiting granularity can be divided into:
- Service: a uniform traffic limiting policy for all of the service's APIs
- API: each API has its own request link and therefore its own traffic limiting policy (threshold, etc.)
- API parameters: often we only want to limit the most frequently accessed top-K hot data. For example, in flash-sale scenarios, frequent access to one hot item should not trigger limiting that makes other items inaccessible.
Service granularity
The service applies one uniform traffic limiting policy. The advantage is simplicity, but the limit easily loses its effect and fails to protect the service itself and its downstream dependencies.
For example: API1 runs a very complex query, for which the database can safely sustain only 10 TPS, while for API2 the database can sustain 1000 TPS. With service-level granularity the whole service would have to be limited to 10 QPS, which is clearly unreasonable.
API granularity
Different APIs get different traffic limiting policies. This approach is more complex but more reasonable, and protects the service well. Several scenarios need to be considered:
- When APIs are added or removed, the traffic limiting policies must be adjusted accordingly
- When an API's implementation changes, the threshold may need to be re-tuned so that the extra business logic does not overload the service itself or its downstream dependencies
In most cases, API granularity traffic limiting should be implemented to better protect the service itself, downstream services and middleware of the service, and achieve better traffic limiting effect.
Handling requests once the limit is reached
When the traffic limiting threshold is reached, requests can be handled as follows:
- Return an error code: for example quota exhausted, BUSY, or a hint telling the client to back off
- Wait and retry: Be careful not to exceed the request timeout
Rate limiting algorithms
Fixed window algorithm
Maintain a counter per unit of time: each request increments the counter, and once the counter exceeds the threshold, requests enter the limiting flow. When the time window expires, the counter is reset and counting starts over for the next window.
The disadvantage of this method is that the window is fixed, so burst traffic can slip through at the boundary between two windows. Depending on the window size, the problem can be negligible if the window is small enough.
For example, with a 1 s window and a threshold of 4 per second: if a malicious user sends 3 requests in the last 500 ms of one second and 3 more in the first 500 ms of the next second, it has actually sent 6 requests within one second, exceeding the limit, while each window individually stays under the threshold.
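A minimal single-node sketch of the fixed window counter described above; the class and field names are illustrative.
// Fixed window: one counter per window; reset the counter when the window rolls over.
public class FixedWindowLimiter {
    private final long windowMillis;   // window length, e.g. 1000 ms
    private final int threshold;       // max requests allowed per window
    private long windowStart = System.currentTimeMillis();
    private int count = 0;
    public FixedWindowLimiter(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            windowStart = now;         // window expired: start a new window and reset the counter
            count = 0;
        }
        if (count < threshold) {
            count++;
            return true;
        }
        return false;                  // threshold reached: enter the limiting flow
    }
}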
Sliding window algorithm
The time window is subdivided into N small windows, and the window slides in units of one small window. This avoids the burst problem at the boundary between two large windows (although bursts can still occur between small windows).
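A minimal sketch of the subdivided window: the big window's count is the sum of the small buckets that still fall inside it. The bucket layout and names are illustrative assumptions.
// Sliding window: a window of length windowMillis is split into `buckets` small buckets; a request
// is allowed only if the total count of the buckets inside the window is below the threshold.
public class SlidingWindowLimiter {
    private final int threshold;        // max requests within the whole window
    private final long bucketMillis;    // length of one small bucket
    private final long[] bucketStart;   // start time of each small bucket
    private final int[] bucketCount;    // request count of each small bucket
    public SlidingWindowLimiter(long windowMillis, int buckets, int threshold) {
        this.threshold = threshold;
        this.bucketMillis = windowMillis / buckets;
        this.bucketStart = new long[buckets];
        this.bucketCount = new int[buckets];
    }
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        int idx = (int) ((now / bucketMillis) % bucketCount.length);
        long currentBucketStart = now - now % bucketMillis;
        if (bucketStart[idx] != currentBucketStart) {   // this slot holds a stale bucket: reuse it
            bucketStart[idx] = currentBucketStart;
            bucketCount[idx] = 0;
        }
        long windowMillis = bucketMillis * bucketCount.length;
        int total = 0;                                   // sum the buckets that are still inside the window
        for (int i = 0; i < bucketCount.length; i++) {
            if (now - bucketStart[i] < windowMillis) {
                total += bucketCount[i];
            }
        }
        if (total < threshold) {
            bucketCount[idx]++;
            return true;
        }
        return false;
    }
}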
Token bucket algorithm
Token bucket algorithm is one of the most commonly used algorithms in Traffic Shaping and Rate Limiting.
There are three stages:
- Token generation: tokens are added to the bucket at a rate of r per second; if the number of tokens in the bucket reaches the bucket capacity B, new tokens are discarded
- Acquiring a token: a token is taken (removed) from the bucket before a request is processed; the request is processed only once a token has been obtained
- Unable to acquire a token: when there are not enough tokens in the bucket, the request enters the limiting flow (block and wait, with a timeout, or fail immediately)
Characteristics of token bucket
- The most common single-machine traffic limiting algorithm; many open source implementations exist and it is easy to use
- A token bucket can absorb short bursts of traffic. For example, with a bucket capacity of 10, a fill rate of 2/s, and a steady request rate of 2/s, up to 8 spare tokens remain in the bucket to absorb a sudden burst. A minimal sketch follows below.
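A minimal single-node sketch of the token bucket, refilling lazily from the elapsed time; names and the one-token-per-request choice are illustrative.
// Token bucket: tokens are refilled lazily according to the elapsed time, capped at the capacity;
// a request proceeds only if it can take a token from the bucket.
public class TokenBucketLimiter {
    private final double capacity;      // bucket capacity B
    private final double ratePerSec;    // token fill rate r
    private double tokens;              // tokens currently in the bucket
    private long lastRefillNanos = System.nanoTime();
    public TokenBucketLimiter(double capacity, double ratePerSec) {
        this.capacity = capacity;
        this.ratePerSec = ratePerSec;
        this.tokens = capacity;         // start with a full bucket
    }
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) / 1e9 * ratePerSec); // lazy refill
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;              // take one token for this request
            return true;
        }
        return false;                   // no token available: enter the limiting flow
    }
}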
Leaky bucket algorithm
- Variant A: water is poured into a bucket with a hole in the bottom that drains at a constant rate. If water comes in faster than it drains, the bucket eventually fills up and overflows.
- Variant B: similar to A, but with requests: incoming requests are put into a bucket (queue) and taken out at a fixed rate for processing. When the bucket is full, new requests are blocked or rejected.
Characteristics of leaky bucket
- Does not handle burst traffic well: for example, with a bucket capacity of 10 and a processing rate of 2/s, a burst of 8 extra requests can be queued in the bucket (anything beyond that waits or is discarded), but they are still drained at only 2/s rather than processed immediately
- The bucket can be a plain queue or a priority queue; use a priority queue when requests have different priorities
- The timeout of requests placed in the bucket must be considered: if a request has already timed out by the time it would be processed, processing it is pointless and only delays other requests
The difference between a token bucket and a leaky bucket: in a token bucket, tokens are added to the bucket at a fixed rate, an arriving request removes a token, and when the bucket is empty requests are blocked; in a leaky bucket, arriving requests are put into the bucket and taken out at a fixed rate, and when the bucket is full requests are blocked. A token bucket therefore allows bursts, while a leaky bucket does not.
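A minimal sketch of leaky bucket variant A (the water-level form) for comparison; a queue-based variant B would also need a worker draining the queue at a fixed rate. Names are illustrative.
// Leaky bucket (meter form): the water level rises by 1 per accepted request and drains at a
// constant rate; when the level would exceed the capacity, the request overflows and is rejected.
public class LeakyBucketLimiter {
    private final double capacity;        // bucket capacity
    private final double leakRatePerSec;  // constant drain rate
    private double water = 0;             // current water level
    private long lastLeakNanos = System.nanoTime();
    public LeakyBucketLimiter(double capacity, double leakRatePerSec) {
        this.capacity = capacity;
        this.leakRatePerSec = leakRatePerSec;
    }
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        water = Math.max(0, water - (now - lastLeakNanos) / 1e9 * leakRatePerSec); // drain since last call
        lastLeakNanos = now;
        if (water + 1 <= capacity) {
            water += 1;                   // the request fits into the bucket
            return true;
        }
        return false;                     // bucket full: overflow, reject the request
    }
}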
Dynamic rate limiting algorithms
The algorithms above all limit at a constant rate (the token bucket adds tokens at a fixed rate; the fixed and sliding window thresholds are fixed). The biggest problem is that every time the business logic changes, the load test has to be re-run to recalculate the threshold. In TCP congestion control, the sending rate adapts to the network environment; in the same way, a service's traffic limit can be adjusted dynamically based on indicators such as the latency of processed requests or the service's load.
Warm-up (slow start)
Similar to TCP slow start: define a starting threshold and raise it gradually during a defined warm-up period until it reaches the normal value at the end of the period. This suits scenarios where the service itself, or the storage it depends on, needs to warm up.
See Guava's RateLimiter for a reference implementation.
For example, with a token bucket: the limit starts at 4/s, the warm-up period is 3 s, and the normal limit is 7/s. During those 3 seconds the threshold is raised by 1 per second until it reaches 7/s.
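A small example of such a warm-up using Guava's RateLimiter, mirroring the numbers above (stable rate 7/s, 3 s warm-up); note that Guava ramps up along its own warm-up curve rather than in exact +1/s steps.
import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;
public class WarmupDemo {
    public static void main(String[] args) {
        // Stable rate of 7 permits/s, approached gradually over a 3 second warm-up period.
        RateLimiter limiter = RateLimiter.create(7.0, 3, TimeUnit.SECONDS);
        for (int i = 0; i < 30; i++) {
            double waitedSeconds = limiter.acquire();   // blocks until a permit is granted
            System.out.printf("request %d waited %.3f s%n", i, waitedSeconds);
        }
    }
}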
Based on response time
Traffic limiting rules are adjusted in real time according to request response times, which is relatively reasonable: adjust smoothly, raising the threshold when responses are fast and lowering it when they are slow, with a defined maximum safe threshold.
The key is how to quantify "fast" and "slow":
- The maximum threshold is estimated from the safe water level of the service obtained through load testing
- Start with a conservative threshold
- Compare the response time of the current window with that of the previous window; if responses are faster than before, the threshold can be raised
As response times change, the threshold keeps changing, staying within the range [1, max_value]. A sketch of this feedback loop follows.
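A minimal sketch of the feedback loop; the window-average RT input, the plus/minus 10% step size, and all names are illustrative assumptions.
// Hypothetical adaptive threshold controller: called once per time window with that window's
// average response time; nudges the QPS threshold up or down within [1, maxThreshold].
public class AdaptiveThreshold {
    private final int maxThreshold;         // safe upper bound obtained from load testing
    private int threshold;                  // current traffic limiting threshold
    private double lastAvgRtMillis = -1;    // average RT of the previous window
    public AdaptiveThreshold(int initialThreshold, int maxThreshold) {
        this.threshold = initialThreshold;  // start from a conservative value
        this.maxThreshold = maxThreshold;
    }
    public synchronized int onWindowEnd(double avgRtMillis) {
        if (lastAvgRtMillis > 0) {
            if (avgRtMillis < lastAvgRtMillis) {
                // Responses got faster: raise the threshold, but never above the safe maximum.
                threshold = Math.min(maxThreshold, (int) Math.ceil(threshold * 1.1));
            } else if (avgRtMillis > lastAvgRtMillis) {
                // Responses got slower: lower the threshold, but keep it at least 1.
                threshold = Math.max(1, (int) Math.floor(threshold * 0.9));
            }
        }
        lastAvgRtMillis = avgRtMillis;
        return threshold;                   // the limiter uses this value for the next window
    }
}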
Based on the monitoring system
For CPU-bound compute services or I/O-bound storage services, basing the limits on the monitoring system is more reasonable.
Traffic limiting rules change dynamically as monitoring indicators change: system indicators such as CPU, load, memory, and I/O, and service indicators such as request response time. This kind of limiting is relatively complex, and the weighting should follow your own service's design: for I/O-bound services the I/O indicators carry more weight, and for compute-bound services CPU and load carry more weight.
Distributed rate limiting
The biggest problem with single-node traffic limiting is that when service nodes are added or removed, each node's traffic limiting quota should change accordingly. If nodes are added without reducing the quota of the existing nodes, downstream services may become overloaded.
Distributed limiting avoids this problem by limiting each resource globally, for example by taking tickets from a Redis cluster or from a ticket server.
Redis-based rate limiting
Traffic limiting can be implemented on top of Redis's single-threaded, atomic operations, which gives a simple form of distributed limiting. However, Redis itself can easily become the bottleneck, and both its master/replica setup and its cluster mode have the problem of master-node failover.
Scheme 1: fixed window count
Use the resource name plus the time window to limit (the timestamp truncated to the window) as the Redis key, set an expiry slightly longer than the window, and rely on the atomicity of Redis INCRBY to increment the count.
If the time window is one second, then:
- Redis key: resource name + Unix timestamp (in seconds)
- Count: INCRBY key count
- Expiry: EXPIRE key 2
- Check whether the count has reached the threshold
-- KEYS[1]: resource name + timestamp of the current window
-- ARGV[1]: traffic limiting threshold, ARGV[2]: number of permits to acquire
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local acquireCount = tonumber(ARGV[2])
local current = tonumber(redis.call('get', key) or "0")
if current + acquireCount > limit then
    return 0    -- threshold reached: enter the limiting flow
else
    redis.call("incrby", key, acquireCount)
    redis.call("expire", key, "2")    -- expire slightly later than the 1 s window
    return 1    -- request allowed
end
Problems with this scheme:
- A round trip to Redis is required for every check: extra latency
- The Redis key is a hot resource and can easily become a bottleneck
- A Redis master/replica failover can cause the limit to lose its effect
- The service's clock may drift between instances, and the time cannot be obtained inside the Redis Lua script: a script that performs writes must be deterministic, so it cannot use non-deterministic read commands such as TIME, meaning the timestamp has to come from the service side
- It is a fixed window algorithm, so bursts can slip through at window boundaries
Scheme 2: token bucket
-- KEYS[1]: resource key (a hash holding lastTimeMilliSecond, availableTokens, maxLimit, rate;
--          the hash is assumed to have been initialized elsewhere)
-- ARGV[1]: number of tokens to acquire, ARGV[2]: current time in milliseconds (passed in by the caller)
local function acquire(key, acquireTokens, currentTimeMillSecond)
    local rateLimiterInfo = redis.pcall("HMGET", key, "lastTimeMilliSecond", "availableTokens", "maxLimit", "rate")
    local lastTimeMilliSecond = tonumber(rateLimiterInfo[1])
    local availableTokens = tonumber(rateLimiterInfo[2])
    local maxLimit = tonumber(rateLimiterInfo[3])
    local rate = tonumber(rateLimiterInfo[4])
    if availableTokens == nil or maxLimit == nil or rate == nil then
        return -1    -- the bucket has not been initialized for this key
    end
    local currentTokens = availableTokens
    local result = -1
    if lastTimeMilliSecond ~= nil then
        local diffTime = currentTimeMillSecond - lastTimeMilliSecond
        if diffTime > 0 then
            -- Refill tokens for the elapsed time, capped at the bucket capacity.
            local fillTokens = math.floor((diffTime / 1000) * rate)
            currentTokens = math.min(fillTokens + availableTokens, maxLimit)
        end
    end
    if currentTokens - acquireTokens >= 0 then
        -- Enough tokens: consume them and record when the bucket was last refilled.
        result = 1
        redis.pcall("HMSET", key, "lastTimeMilliSecond", currentTimeMillSecond, "availableTokens", currentTokens - acquireTokens)
    end
    return result
end

local key = KEYS[1]
local acquireTokens = tonumber(ARGV[1])
local currentTimeMillSecond = tonumber(ARGV[2])
return acquire(key, acquireTokens, currentTimeMillSecond)
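For illustration, a hedged sketch of calling the token bucket script from a service, assuming the Jedis client; the key name, bucket parameters, and the one-time initialization shown here are illustrative, and the script string stands for the Lua script above.
import redis.clients.jedis.Jedis;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
// Hypothetical caller for the token bucket script above: the hash is initialized with the bucket
// parameters first, and the current time is passed in because the script cannot read it itself.
public class RedisTokenBucketClient {
    public static void main(String[] args) {
        String key = "rate_limit:order_api";            // illustrative resource key
        String script = "...";                          // the Lua token bucket script shown above
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            Map<String, String> init = new HashMap<>();
            init.put("maxLimit", "10");                 // bucket capacity
            init.put("rate", "5");                      // tokens added per second
            init.put("availableTokens", "10");
            init.put("lastTimeMilliSecond", String.valueOf(System.currentTimeMillis()));
            jedis.hmset(key, init);                     // one-time initialization of the bucket
            Object ret = jedis.eval(script,
                    Arrays.asList(key),                                              // KEYS[1]
                    Arrays.asList("1", String.valueOf(System.currentTimeMillis()))); // ARGV[1..2]
            System.out.println(Long.valueOf(1L).equals(ret) ? "allowed" : "limited");
        }
    }
}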
Problems with this scheme:
- A round trip to Redis is required for every check: extra latency
- The Redis key is a hot resource and can easily become a bottleneck
- A Redis master/replica failover can cause the limit to lose its effect
- The service's clock may drift between instances, and the time cannot be obtained inside the Redis Lua script: a script that performs writes must be deterministic, so it cannot use non-deterministic read commands such as TIME, which is why the current time is passed in as an argument
Ticket server
The Redis schemes above essentially use Redis as a ticket server. However, because of Redis's availability problems (master/replica failover, etc.) and the simplicity of the rules it can express, teams that need high availability and complex rules build a dedicated server program as the ticket server, such as Alibaba's open-source Sentinel.
A ticket server typically consists of a number of server processes forming one or more ticket clusters. Before executing, a service requests tickets from the ticket server over RPC; if the request succeeds it may proceed, otherwise it enters the limiting flow. To reduce the latency of the RPC round trips, tickets can be fetched in batches (see the sketch after the list below).
Ticket rules (the traffic limiting algorithms and thresholds) can be stored in a consistent store or a database; the ticket server periodically refreshes them or listens for change notifications. Algorithms and thresholds can also be adjusted dynamically by other services and pushed to the ticket server, or the ticket server can compute them itself based on load.
Ticket server considerations:
- High availability: run the ticket server as a cluster and persist its state to a database
- Load balancing: when services fetch tickets from the ticket server cluster, pay attention to balancing across ticket servers, so that some servers are not left holding a large number of unused tickets while others are exhausted
- Performance: since the ticket server computes and stores everything in memory, performance is unlikely to be the bottleneck
- Consistency: similar to an ID generator, in very demanding scenarios the ticket state can be persisted periodically and recovered after a failure
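As a sketch of the batch-fetching idea only (the interface and names are hypothetical, not Sentinel's API): the client takes permits from a locally cached batch and goes back to the ticket server over RPC only when the batch is exhausted.
// Hypothetical batch-fetching ticket client.
public class BatchTicketClient {
    public interface TicketServer {
        int acquireBatch(String resource, int count);   // RPC stub: returns how many permits were granted
    }
    private final TicketServer server;
    private final String resource;
    private final int batchSize;
    private int localPermits = 0;
    public BatchTicketClient(TicketServer server, String resource, int batchSize) {
        this.server = server;
        this.resource = resource;
        this.batchSize = batchSize;
    }
    public synchronized boolean tryAcquire() {
        if (localPermits == 0) {
            localPermits = Math.max(0, server.acquireBatch(resource, batchSize)); // refill over RPC
        }
        if (localPermits > 0) {
            localPermits--;                              // served from the locally cached batch
            return true;
        }
        return false;                                    // server granted nothing: enter the limiting flow
    }
}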
Difficulties of rate limiting in a microservice architecture
In a microservice architecture, call links are long and the call relationships between services are complex. A small change in an upstream service affects all of its downstream services, and the effects compound along the link.
Flow control rules
In a microservice architecture, fixed flow control rules are not appropriate. Fixed rules are usually derived from load test results, but a change to any node on the link can invalidate them. For example, if the call link is A->B->C->D and the business logic of B changes, the link's earlier load test results may no longer be accurate; keeping the previous threshold can cause flow control to fail.
In a microservice architecture, dynamic rules should be used, so that the service adapts by itself or a rule server notifies the service to change its traffic limiting rules.
Traffic limiting rules per call link
When limiting traffic for a downstream service, the quota of each upstream link must be considered; otherwise a fault in one upstream service can make the whole downstream link unavailable.
For example, link 1 is A->B->C->D and link 2 is D->B->C. If service A has a fault, the traffic calling B increases, and so does the traffic calling C. Service D on link 2 is then caught by the flow control applied at B or C.
Therefore, services B and C should give link 1 and link 2 separate quotas. When link 1 reaches its threshold, only link 1 is limited and link 2 is unaffected.
Consider call link priority
A typical microservice deployment assigns availability weights to different businesses, and high-weight businesses get priority. On complex call links, traffic limiting can therefore be applied per calling service, so that when pressure is too high, the important services are guaranteed first.
For example, link 1 is A->B->C->D, link 2 is D->B->C, and link 1 is more important than link 2. If the load on service C rises and its traffic limiting threshold has to be lowered, the availability of link 2 should be sacrificed to guarantee link 1.
Therefore, links need to be tagged, and service C distinguishes between links based on those tags.
Conclusion
- Traffic limiting algorithms themselves are fairly simple and open source implementations abound; the difficulty lies in choosing the one that suits your service.
- Single-node limiting and distributed limiting each have advantages and disadvantages; choose based on your service's characteristics, scale, and architectural complexity.
- In a microservice architecture, traffic limiting is quite complex, and the various scenarios that can cause the limit to lose its effect must all be considered.
C++ token bucket: github.com/ikenchina/r…