What is rate limiting?
As the name suggests, it restricts traffic, i.e. flow control.
A good analogy is the fuse in an old electrical switchboard: if someone plugs in a device that draws too much power, the fuse blows to protect the appliances from being burned out by the excessive current.
Why use rate limiting?
In theory, a complete externally facing system architecture should, at the design stage, evaluate the system's throughput and control the inlet traffic based on the upstream traffic, flow rate, peak time points, peak QPS, and the load capacity of the system itself.
When traffic exceeds the threshold, the system can deny service, queue requests, or divert traffic to keep the load healthy.
If the system has no rate-limiting policy, it can only passively accept sudden traffic beyond its own capacity; the sub-services then fail one after another, and finally the whole service collapses in an avalanche.
A quick concept review
QPS
QPS (Queries Per Second) measures the volume of queries a server handles within a given period of time. On the Internet, the performance of a DNS server is often measured by its queries per second.
RPS
Requests Per Second is the number of requests clients make per second. As Alibaba Cloud PTS explains the term, RPS is also called QPS in some contexts; when individual "transactions" are not considered separately, it corresponds roughly to the TPS (Transactions Per Second) of LoadRunner/JMeter.
TPS
Transactions Per Second is the number of transactions the server processes per second. A transaction typically consists of one message in and one message out, plus one user database access. (Service TPS = CAPS x average TPS per call)
Rate-limiting granularity
- Cluster rate limiting
- Single-node rate limiting
Cluster rate limiting
There are two modes of cluster rate limiting:
- The gateway layer
- The application layer
The gateway layer
A common gateway-layer design implements global control based on the Nginx Lua module. Here is a simple Lua demo.
local locks = require "resty.lock"

local function limiter()
    -- shared dict for counters
    local limiter = ngx.shared.limiter
    -- limiter lock
    local lock = locks:new("limiter_lock")
    local key = ngx.var.host .. ngx.var.uri
    -- acquire the lock
    local elapsed, err = lock:lock("ngx_limiter:" .. key)
    if not elapsed then
        ngx.log(ngx.ERR, "failed to acquire the lock: ", err)
        return 0
    end
    -- limit max value
    local limit = 5
    -- current value
    local current = limiter:get(key)
    -- rate-limiting check
    if current ~= nil and current + 1 > limit then
        lock:unlock()
        return 0
    end
    if current == nil then
        limiter:set(key, 1, 1) -- initialize (value 1, 1s expiry)
    else
        limiter:incr(key, 1)   -- +1
    end
    lock:unlock()
    return 1
end

ngx.print(limiter())
Learn about Lua-Resty-Lock: github.com/openresty/l…
Nginx.conf
http {
    ...
    lua_shared_dict limiter_lock 10m;
    lua_shared_dict limiter 10m;
}
The application layer
The application layer is usually implemented in business code, based on Redis counters, with a Lua script to guarantee that the Redis operations execute atomically.
local key = "limiter:". KEYS[1]
local limit = tonumber(ARGV[1])
local expire_time = tonumber(ARGV[2])
local is_exists = redis.call("EXISTS", key)
if is_exists == 1 then
if redis.call("INCR", key) > limit then
return 0
else
return 1
end
else
redis.call("SET", key, 1)
redis.call("EXPIRE", key, expire_time)
return 1
end
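For illustration, here is a minimal sketch of how the script above might be invoked from Go, assuming the github.com/go-redis/redis/v8 client; the key name, limit, and window length are example values.

package main

import (
	"context"
	"fmt"

	"github.com/go-redis/redis/v8"
)

// the Lua script above, embedded as a string
const limiterScript = `
local key = "limiter:" .. KEYS[1]
local limit = tonumber(ARGV[1])
local expire_time = tonumber(ARGV[2])
if redis.call("EXISTS", key) == 1 then
    if redis.call("INCR", key) > limit then return 0 else return 1 end
else
    redis.call("SET", key, 1)
    redis.call("EXPIRE", key, expire_time)
    return 1
end`

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	script := redis.NewScript(limiterScript)
	// key: caller identity; args: limit of 1000 requests per 1-second window
	allowed, err := script.Run(ctx, rdb, []string{"GET:/api/order"}, 1000, 1).Int()
	if err != nil {
		panic(err)
	}
	if allowed == 0 {
		fmt.Println("rate limited")
		return
	}
	fmt.Println("request allowed")
}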
Single-node rate limiting
When you cannot fight as a cluster, fight alone. Based on a local cache, simple local counting can implement rate limiting without depending on any storage middleware. From a macro perspective, as long as the gateway-layer load balancing is highly available, traffic differs little between nodes, so it is enough to control the traffic of each single node. A minimal sketch follows.
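Below is a minimal sketch of such single-node counting over a one-second fixed window; the localLimiter type and its Allow method are illustrative names, not part of any library.

package main

import (
	"fmt"
	"sync"
	"time"
)

// localLimiter allows at most `limit` requests within each one-second window,
// using only in-process state (no storage middleware).
type localLimiter struct {
	mu     sync.Mutex
	limit  int64
	count  int64
	window int64 // unix second of the current window
}

func (l *localLimiter) Allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now().Unix()
	if now != l.window { // a new one-second window has started: reset the counter
		l.window = now
		l.count = 0
	}
	if l.count+1 > l.limit {
		return false // over the limit for this window
	}
	l.count++
	return true
}

func main() {
	l := &localLimiter{limit: 200} // e.g. 200 QPS per node
	if l.Allow() {
		fmt.Println("request allowed")
	}
}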
That covers the classification of rate-limiting granularity. The following describes the specific rate-limiting algorithm models.
Rate-limiting models
The demos above implement rate limiting with a simple fixed time window, which cannot absorb an instantaneous burst of traffic that straddles a window boundary.
There are two commonly used model families:
- Time-window models
- Bucket models
Time-window models
There are two time-window models:
- Fixed window model
- Sliding window model
Fixed window model
The code demos for the various granularities above all take this approach.
Take QPS as an example, with a limit of 1000 QPS. We divide the timeline into windows at a fixed interval, and each window has an independent counter. This is the simplest rate-limiting model, but its drawback is obvious: it cannot control a burst of traffic around the window boundary.
As shown in the figure (image source: Internet), 1000 QPS of traffic arrives at both the 900 ms and 1100 ms marks. Each window individually satisfies the limit, but the actual QPS around the boundary reaches 2000 and the service is overloaded.
Sliding window model
As shown in the figure (image source: Internet), to avoid bursts at the window boundary, the sliding window model divides each window into N sub-windows, and each sub-window counts independently. The rate-limiting threshold is then checked against the sum of the counts, e.g. w1 + w2. A minimal sketch follows.
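Here is a minimal sketch of a sliding-window counter with N sub-windows; the type, field, and method names are illustrative, not from any library.

package main

import (
	"fmt"
	"sync"
	"time"
)

// slidingWindow keeps one counter per sub-window and limits the sum of the
// counters inside the most recent full window.
type slidingWindow struct {
	mu       sync.Mutex
	limit    int64         // max requests per full window
	subCount int64         // number of sub-windows, e.g. 10
	subWidth time.Duration // width of one sub-window
	counts   []int64       // ring buffer, one counter per sub-window
	lastTick int64         // absolute index of the last sub-window touched
}

func newSlidingWindow(limit int64, window time.Duration, subCount int64) *slidingWindow {
	return &slidingWindow{
		limit:    limit,
		subCount: subCount,
		subWidth: window / time.Duration(subCount),
		counts:   make([]int64, subCount),
	}
}

func (w *slidingWindow) Allow() bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	tick := time.Now().UnixNano() / int64(w.subWidth)
	// zero every sub-window that rotated out of the window since the last call
	for i := w.lastTick + 1; i <= tick && i-w.lastTick <= w.subCount; i++ {
		w.counts[i%w.subCount] = 0
	}
	w.lastTick = tick
	// sum all sub-window counters and check the threshold
	var total int64
	for _, c := range w.counts {
		total += c
	}
	if total+1 > w.limit {
		return false
	}
	w.counts[tick%w.subCount]++
	return true
}

func main() {
	w := newSlidingWindow(1000, time.Second, 10) // 1000 QPS, 100ms sub-windows
	fmt.Println(w.Allow())
}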
Bucket models
There are also two types of bucket model:
- Token bucket
- Leaky bucket
Token bucket model
The token bucket algorithm puts tokens into the bucket at a constant rate. A request must first fetch a token from the bucket before it is processed; when no token is available, the request is denied. Because unused tokens accumulate, the token bucket allows a certain degree of burst traffic, which matches the bursty nature of real Internet traffic.
Ref: en.wikipedia.org/wiki/Token_…
As shown below (image source: Internet):
There are two common ways to implement the algorithm:
- Ticker: define a ticker that continuously generates tokens and puts them into the bucket. The problem is resource consumption: if you rate-limit along some dimension (per user, per API, and so on), many buckets are created, each with its own ticker, and the overhead becomes unacceptable.
- Lazy fill: define a lazy-fill function that is called before each token fetch. If the current time is later than lastAccessTime, it calculates how many tokens could have been generated in the meantime, adds them to the bucket, and updates the timestamp. This way the computation only happens when a token is fetched.
How the number of tokens in the bucket is calculated
Number of tokens in the bucket = remaining tokens + (time of this fetch - time of the last fetch) / fill interval x tokens added per fill. For example, at 200 QPS (one fill per second, 200 tokens per fill), a fetch 2 seconds after the previous one adds 2 x 200 = 400 tokens, capped at the bucket capacity.
A commonly used token bucket implementation is github.com/juju/rateli… (2K stars).
Multiple ways to populate tokens:
func NewBucket(fillInterval time.Duration, capacity int64) *Bucket
The default token bucket: one token is added to the bucket every fillInterval, and capacity is the maximum capacity of the bucket.
func NewBucketWithQuantum(fillInterval time.Duration, capacity, quantum int64) *Bucket
Same as the default, except that quantum tokens are added each interval instead of 1.
func NewBucketWithRate(rate float64, capacity int64) *Bucket
Fills the bucket at rate tokens per second, up to capacity. For example, with rate 100, about 100 tokens are added every second.
Multiple ways to take tokens:
func (tb *Bucket) Take(count int64) time.Duration {}
func (tb *Bucket) TakeAvailable(count int64) int64 {}
func (tb *Bucket) TakeMaxDuration(count int64, maxWait time.Duration) (time.Duration, bool) {}
func (tb *Bucket) Wait(count int64) {}
func (tb *Bucket) WaitMaxDuration(count int64, maxWait time.Duration) bool {}
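As an illustration, a minimal usage sketch of juju/ratelimit; the rate and capacity values here are arbitrary examples.

package main

import (
	"fmt"
	"time"

	"github.com/juju/ratelimit"
)

func main() {
	// fill 200 tokens per second, allow bursts of up to 1000
	bucket := ratelimit.NewBucketWithRate(200, 1000)

	// non-blocking: take one token if available, otherwise reject
	if bucket.TakeAvailable(1) == 0 {
		fmt.Println("rate limited")
		return
	}
	fmt.Println("request allowed")

	// blocking with a bound: wait at most 50ms for a token
	if ok := bucket.WaitMaxDuration(1, 50*time.Millisecond); !ok {
		fmt.Println("timed out waiting for a token")
	}
}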
Below is a minimalist token bucket I implemented, with the fill rate expressed in QPS.
TokenBucket Demo
Struct definition
// config
type TbConfig struct {
	QPS    int64 // rate limit in QPS, e.g. 200
	MaxCap int64 // maximum bucket capacity, e.g. 1000
}

type TokenBucket struct {
	*TbConfig
	m         sync.Mutex // mutex protecting the fields below
	available int64      // tokens currently available
	lastTime  time.Time  // time of the last token fill
}
Lazy fill
func (tb *TokenBucket) fill() error {
	n := time.Now()
	timeUnit := n.Sub(tb.lastTime).Seconds()
	fillCnt := int64(timeUnit) * tb.QPS // see the description below
	if fillCnt <= 0 {
		return nil
	}
	tb.available += fillCnt
	// cap at the maximum capacity to prevent overflow
	if tb.MaxCap > 0 && tb.available > tb.MaxCap {
		tb.available = tb.MaxCap
	}
	tb.lastTime = n
	return nil
}
Number of tokens in the bucket = currently available tokens (tb.available) + (time of this fill n - time of the last fill tb.lastTime) / fill interval (1 second, since the rate is expressed as QPS) x tokens added per fill (tb.QPS).
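For completeness, a sketch of how fill might be used by a non-blocking Take method; Take is not part of the demo above, and its name and signature are assumptions.

// Take is an illustrative non-blocking helper: it lazily refills the bucket,
// then tries to take count tokens, returning false if there are not enough.
func (tb *TokenBucket) Take(count int64) bool {
	tb.m.Lock()
	defer tb.m.Unlock()
	_ = tb.fill() // top up the bucket before taking
	if tb.available < count {
		return false
	}
	tb.available -= count
	return true
}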
Leaky bucket model
The idea of the leaky bucket algorithm is very simple, as shown in the figure below (image source: Internet). Water (requests) first enters the leaky bucket, and the bucket leaks water at a fixed rate; when the inflow is too fast, the water simply overflows. The leaky bucket therefore constrains the data transmission rate.
Simply put: callers may only consume strictly at the predetermined interval.
Ref: en.wikipedia.org/wiki/Leaky_…
A commonly used leaky bucket implementation is github.com/uber-go/rat… (2.4K stars).
Besides limiting the average transmission rate, many scenarios also require allowing some degree of burst.
The traditional leaky bucket, whose key property is a constant outflow rate, does not handle large bursts of requests well.
For this reason, uber-go modified the leaky bucket slightly and introduced the concept of maximum slack.
If the interval between requests is shorter than the fixed rate requires, the spare time can be banked as a buffer for later requests, keeping the per-second request count on target. But if the interval far exceeds the fixed rate, such a large buffer would accumulate that even a huge burst of later requests could not use it up, and rate limiting would lose its meaning. maxSlack therefore caps this buffer. A usage sketch follows.
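A minimal usage sketch, assuming the go.uber.org/ratelimit package; the rate of 100 per second is an arbitrary example.

package main

import (
	"fmt"
	"time"

	"go.uber.org/ratelimit"
)

func main() {
	rl := ratelimit.New(100) // 100 requests per second, with the default slack

	prev := time.Now()
	for i := 0; i < 5; i++ {
		now := rl.Take() // blocks until the next request may proceed
		fmt.Println(i, now.Sub(prev))
		prev = now
	}

	// disable slack for a strictly even pace
	strict := ratelimit.New(100, ratelimit.WithoutSlack)
	strict.Take()
}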
LeakyBucket Demo
Below, a minimalist non-blocking leaky bucket is implemented.
Struct definition
// config
type LbConfig struct {
	Rate     float64 // rate, e.g. 200 means 200 requests per second
	MaxSlack int64   // maximum slack: the banked buffer is capped at this many requests' worth; 0 (default) means no slack, e.g. 10
}

type LeakyBucket struct {
	*LbConfig
	m          sync.Mutex    // mutex protecting the fields below
	perRequest time.Duration // interval between two requests (1/Rate)
	bufferTime time.Duration // accumulated spare time (slack)
	slackTime  time.Duration // maximum slack time allowed
	lastTime   time.Time     // time of the last admitted request
}
Without slack
That is, requests are admitted strictly at the predetermined interval.
func (lb *LeakyBucket) withoutSlack() error {
	n := time.Now()
	lb.bufferTime = lb.perRequest - n.Sub(lb.lastTime)
	// a positive value means the request arrived too early and must be denied
	if lb.bufferTime > 0 {
		return ErrNotEnoughToken
	} else {
		lb.lastTime = n
	}
	return nil
}
With slack
That is, spare time is banked for later requests.
func (lb *LeakyBucket) withSlack() error {
	n := time.Now()
	// accumulate the spare time (+=) instead of overwriting it
	lb.bufferTime += lb.perRequest - n.Sub(lb.lastTime)
	// a positive value means the request arrived too early and must be denied
	if lb.bufferTime > 0 {
		return ErrNotEnoughToken
	} else {
		lb.lastTime = n
	}
	// cap the banked spare time at the maximum slack allowed
	if lb.bufferTime < lb.slackTime {
		lb.bufferTime = lb.slackTime
	}
	return nil
}
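For completeness, a sketch of how the two methods might be exposed behind a single entry point; Take and the MaxSlack-based dispatch are illustrative, not part of the original demo.

// Take is an illustrative wrapper that picks the slack or no-slack path
// depending on the configuration.
func (lb *LeakyBucket) Take() error {
	lb.m.Lock()
	defer lb.m.Unlock()
	if lb.MaxSlack > 0 {
		return lb.withSlack()
	}
	return lb.withoutSlack()
}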
The Demo source code
The source is available at github.com/xiaoxuz/lim…
Related articles
www.cyhone.com/articles/an…
en.wikipedia.org/wiki/Token_…
en.wikipedia.org/wiki/Leaky_…
Call it a day
Thank you for your support!
The CSDN blog is updated from time to time, while the public account is updated continuously. You are welcome to follow it.