What is rate limiting?
As the name suggests, it restricts traffic, i.e. flow control.
A good analogy is the fuse in an old electrical switchboard: if someone plugs in a device that draws too much power, the fuse blows to protect the appliances from being burned out by the excessive current.
Why use rate limiting?
In theory, a complete externally facing system architecture should, at the design stage, evaluate the system's throughput and control the inlet traffic based on the upstream traffic, flow rate, peak time points, peak QPS, and the load capacity of the system itself.
When traffic exceeds the threshold, the system can deny service, queue requests, or divert traffic to keep the load healthy.
If the system has no rate-limiting policy, it can only passively accept sudden traffic beyond its own capacity; the sub-services then fail one after another, and finally the whole service collapses in an avalanche.
A quick concept review
QPS
QPS (Queries Per Second) measures the volume of queries a server handles within a given period of time. On the Internet, the performance of a DNS server is often measured by its queries per second.
RPS
Requests Per Second is the number of requests clients make per second. As Alibaba Cloud PTS explains the term, RPS is also called QPS in some contexts; when individual "transactions" are not considered separately, it corresponds roughly to the TPS (Transactions Per Second) of LoadRunner/JMeter.
TPS
Transactions Per Second is the number of transactions the server processes per second. A transaction typically consists of one message in and one message out, plus one user database access. (Service TPS = CAPS x average TPS per call)
Rate-limiting granularity
- Cluster rate limiting
- Single-node rate limiting
Cluster rate limiting
There are two modes of cluster rate limiting:
- The gateway layer
- The application layer
The gateway layer
A common gateway-layer design implements global control based on the Nginx Lua module. Here is a simple Lua demo.
local locks = require "resty.lock"

local function limiter()
    -- shared dict for counters
    local limiter = ngx.shared.limiter
    -- limiter lock
    local lock = locks:new("limiter_lock")
    local key = ngx.var.host .. ngx.var.uri
    -- acquire the lock
    local elapsed, err = lock:lock("ngx_limiter:" .. key)
    if not elapsed then
        ngx.log(ngx.ERR, "failed to acquire the lock: ", err)
        return 0
    end
    -- limit max value
    local limit = 5
    -- current value
    local current = limiter:get(key)
    -- rate-limiting check
    if current ~= nil and current + 1 > limit then
        lock:unlock()
        return 0
    end
    if current == nil then
        limiter:set(key, 1, 1) -- initialize (value 1, 1s expiry)
    else
        limiter:incr(key, 1)   -- +1
    end
    lock:unlock()
    return 1
end

ngx.print(limiter())
Learn about Lua-Resty-Lock: github.com/openresty/l…
Nginx.conf
http {
    ...
    lua_shared_dict limiter_lock 10m;
    lua_shared_dict limiter 10m;
}
The application layer
The application layer is usually implemented in business code, based on Redis counters, with a Lua script to guarantee that the Redis operations execute atomically.
local key = "limiter:". KEYS[1]
local limit = tonumber(ARGV[1])
local expire_time = tonumber(ARGV[2])
local is_exists = redis.call("EXISTS", key)
if is_exists == 1 then
if redis.call("INCR", key) > limit then
return 0
else
return 1
end
else
redis.call("SET", key, 1)
redis.call("EXPIRE", key, expire_time)
return 1
end
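For illustration, here is a minimal sketch of how the script above might be invoked from Go, assuming the github.com/go-redis/redis/v8 client; the key name, limit, and window length are example values.

package main

import (
	"context"
	"fmt"

	"github.com/go-redis/redis/v8"
)

// the Lua script above, embedded as a string
const limiterScript = `
local key = "limiter:" .. KEYS[1]
local limit = tonumber(ARGV[1])
local expire_time = tonumber(ARGV[2])
if redis.call("EXISTS", key) == 1 then
    if redis.call("INCR", key) > limit then return 0 else return 1 end
else
    redis.call("SET", key, 1)
    redis.call("EXPIRE", key, expire_time)
    return 1
end`

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	script := redis.NewScript(limiterScript)
	// key: caller identity; args: limit of 1000 requests per 1-second window
	allowed, err := script.Run(ctx, rdb, []string{"GET:/api/order"}, 1000, 1).Int()
	if err != nil {
		panic(err)
	}
	if allowed == 0 {
		fmt.Println("rate limited")
		return
	}
	fmt.Println("request allowed")
}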
Single-node rate limiting
When you cannot fight as a cluster, fight alone. Based on a local cache, simple local counting can implement rate limiting without depending on any storage middleware. From a macro perspective, as long as the gateway-layer load balancing is highly available, traffic differs little between nodes, so it is enough to control the traffic of each single node. A minimal sketch follows.
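Below is a minimal sketch of such single-node counting over a one-second fixed window; the localLimiter type and its Allow method are illustrative names, not part of any library.

package main

import (
	"fmt"
	"sync"
	"time"
)

// localLimiter allows at most `limit` requests within each one-second window,
// using only in-process state (no storage middleware).
type localLimiter struct {
	mu     sync.Mutex
	limit  int64
	count  int64
	window int64 // unix second of the current window
}

func (l *localLimiter) Allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now().Unix()
	if now != l.window { // a new one-second window has started: reset the counter
		l.window = now
		l.count = 0
	}
	if l.count+1 > l.limit {
		return false // over the limit for this window
	}
	l.count++
	return true
}

func main() {
	l := &localLimiter{limit: 200} // e.g. 200 QPS per node
	if l.Allow() {
		fmt.Println("request allowed")
	}
}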
That covers the classification of rate-limiting granularity. The following describes the specific rate-limiting algorithm models.
Rate-limiting models
The demos above implement rate limiting with a simple fixed time window, which cannot absorb an instantaneous burst of traffic that straddles a window boundary.
There are two commonly used model families:
- Time-window models
- Bucket models
Time-window models
There are two time-window models:
- Fixed window model
- Sliding window model
Fixed window model
The code demos for the various granularities above all take this approach.
Take QPS as an example, with a limit of 1000 QPS. We divide the timeline into windows at a fixed interval, and each window has an independent counter. This is the simplest rate-limiting model, but its drawback is obvious: it cannot control a burst of traffic around the window boundary.
As shown in the figure (image source: Internet), 1000 QPS of traffic arrives at both the 900 ms and 1100 ms marks. Each window individually satisfies the limit, but the actual QPS around the boundary reaches 2000 and the service is overloaded.
Sliding window model
As shown in the figure (image source: Internet), to avoid bursts at the window boundary, the sliding window model divides each window into N sub-windows, and each sub-window counts independently. The rate-limiting threshold is then checked against the sum of the counts, e.g. w1 + w2. A minimal sketch follows.
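Here is a minimal sketch of a sliding-window counter with N sub-windows; the type, field, and method names are illustrative, not from any library.

package main

import (
	"fmt"
	"sync"
	"time"
)

// slidingWindow keeps one counter per sub-window and limits the sum of the
// counters inside the most recent full window.
type slidingWindow struct {
	mu       sync.Mutex
	limit    int64         // max requests per full window
	subCount int64         // number of sub-windows, e.g. 10
	subWidth time.Duration // width of one sub-window
	counts   []int64       // ring buffer, one counter per sub-window
	lastTick int64         // absolute index of the last sub-window touched
}

func newSlidingWindow(limit int64, window time.Duration, subCount int64) *slidingWindow {
	return &slidingWindow{
		limit:    limit,
		subCount: subCount,
		subWidth: window / time.Duration(subCount),
		counts:   make([]int64, subCount),
	}
}

func (w *slidingWindow) Allow() bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	tick := time.Now().UnixNano() / int64(w.subWidth)
	// zero every sub-window that rotated out of the window since the last call
	for i := w.lastTick + 1; i <= tick && i-w.lastTick <= w.subCount; i++ {
		w.counts[i%w.subCount] = 0
	}
	w.lastTick = tick
	// sum all sub-window counters and check the threshold
	var total int64
	for _, c := range w.counts {
		total += c
	}
	if total+1 > w.limit {
		return false
	}
	w.counts[tick%w.subCount]++
	return true
}

func main() {
	w := newSlidingWindow(1000, time.Second, 10) // 1000 QPS, 100ms sub-windows
	fmt.Println(w.Allow())
}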
Bucket models
There are also two types of bucket model:
- Token bucket
- Leaky bucket
Token bucket model
The token bucket algorithm puts tokens into the bucket at a constant rate. A request must first fetch a token from the bucket before it is processed; when no token is available, the request is denied. Because unused tokens accumulate, the token bucket allows a certain degree of burst traffic, which matches the bursty nature of real Internet traffic.
Ref: en.wikipedia.org/wiki/Token_…
As shown below (image source: Internet):
There are two common ways to implement the algorithm:
- Ticker: define a ticker that continuously generates tokens and puts them into the bucket. The problem is resource consumption: if you rate-limit along some dimension (per user, per API, and so on), many buckets are created, each with its own ticker, and the overhead becomes unacceptable.
- Lazy fill: define a lazy-fill function that is called before each token fetch. If the current time is later than lastAccessTime, it calculates how many tokens could have been generated in the meantime, adds them to the bucket, and updates the timestamp. This way the computation only happens when a token is fetched.
How the number of tokens in the bucket is calculated
Number of tokens in the bucket = remaining tokens + (time of this fetch - time of the last fetch) / fill interval x tokens added per fill. For example, at 200 QPS (one fill per second, 200 tokens per fill), a fetch 2 seconds after the previous one adds 2 x 200 = 400 tokens, capped at the bucket capacity.
A commonly used token bucket implementation is github.com/juju/rateli… (2K stars).
Multiple ways to populate tokens:
func NewBucket(fillInterval time.Duration, capacity int64) *Bucket
The default token bucket: one token is added to the bucket every fillInterval, and capacity is the maximum capacity of the bucket.
func NewBucketWithQuantum(fillInterval time.Duration, capacity, quantum int64) *Bucket
Same as the default, except that quantum tokens are added each interval instead of 1.
func NewBucketWithRate(rate float64, capacity int64) *Bucket
Fills the bucket at rate tokens per second, up to capacity. For example, with rate 100, about 100 tokens are added every second.
Multiple ways to take tokens:
func (tb *Bucket) Take(count int64) time.Duration {}
func (tb *Bucket) TakeAvailable(count int64) int64 {}
func (tb *Bucket) TakeMaxDuration(count int64, maxWait time.Duration) (time.Duration, bool) {}
func (tb *Bucket) Wait(count int64) {}
func (tb *Bucket) WaitMaxDuration(count int64, maxWait time.Duration) bool {}
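As an illustration, a minimal usage sketch of juju/ratelimit; the rate and capacity values here are arbitrary examples.

package main

import (
	"fmt"
	"time"

	"github.com/juju/ratelimit"
)

func main() {
	// fill 200 tokens per second, allow bursts of up to 1000
	bucket := ratelimit.NewBucketWithRate(200, 1000)

	// non-blocking: take one token if available, otherwise reject
	if bucket.TakeAvailable(1) == 0 {
		fmt.Println("rate limited")
		return
	}
	fmt.Println("request allowed")

	// blocking with a bound: wait at most 50ms for a token
	if ok := bucket.WaitMaxDuration(1, 50*time.Millisecond); !ok {
		fmt.Println("timed out waiting for a token")
	}
}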
Below is a minimalist token bucket I implemented, with the fill rate expressed in QPS.
TokenBucket Demo
Struct definition
// config
type TbConfig struct {
	QPS    int64 // rate limit in QPS, e.g. 200
	MaxCap int64 // maximum bucket capacity, e.g. 1000
}

type TokenBucket struct {
	*TbConfig
	m         sync.Mutex // mutex protecting the fields below
	available int64      // tokens currently available
	lastTime  time.Time  // time of the last token fill
}
Lazy fill
func (tb *TokenBucket) fill() error {
	n := time.Now()
	timeUnit := n.Sub(tb.lastTime).Seconds()
	fillCnt := int64(timeUnit) * tb.QPS // see the description below
	if fillCnt <= 0 {
		return nil
	}
	tb.available += fillCnt
	// cap at the maximum capacity to prevent overflow
	if tb.MaxCap > 0 && tb.available > tb.MaxCap {
		tb.available = tb.MaxCap
	}
	tb.lastTime = n
	return nil
}
Number of tokens in the bucket = currently available tokens (tb.available) + (time of this fill n - time of the last fill tb.lastTime) / fill interval (1 second, since the rate is expressed as QPS) x tokens added per fill (tb.QPS).
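For completeness, a sketch of how fill might be used by a non-blocking Take method; Take is not part of the demo above, and its name and signature are assumptions.

// Take is an illustrative non-blocking helper: it lazily refills the bucket,
// then tries to take count tokens, returning false if there are not enough.
func (tb *TokenBucket) Take(count int64) bool {
	tb.m.Lock()
	defer tb.m.Unlock()
	_ = tb.fill() // top up the bucket before taking
	if tb.available < count {
		return false
	}
	tb.available -= count
	return true
}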
Leaky bucket model
The idea of the leaky bucket algorithm is very simple, as shown in the figure below (image source: Internet). Water (requests) first enters the leaky bucket, and the bucket leaks water at a fixed rate; when the inflow is too fast, the water simply overflows. The leaky bucket therefore constrains the data transmission rate.
Simply put: callers may only consume strictly at the predetermined interval.
Ref: en.wikipedia.org/wiki/Leaky_…
A commonly used leaky bucket implementation is github.com/uber-go/rat… (2.4K stars).
Besides limiting the average transmission rate, many scenarios also require allowing some degree of burst.
The traditional leaky bucket, whose key property is a constant outflow rate, does not handle large bursts of requests well.
For this reason, uber-go modified the leaky bucket slightly and introduced the concept of maximum slack.
If the interval between requests is shorter than the fixed rate requires, the spare time can be banked as a buffer for later requests, keeping the per-second request count on target. But if the interval far exceeds the fixed rate, such a large buffer would accumulate that even a huge burst of later requests could not use it up, and rate limiting would lose its meaning. maxSlack therefore caps this buffer. A usage sketch follows.
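A minimal usage sketch, assuming the go.uber.org/ratelimit package; the rate of 100 per second is an arbitrary example.

package main

import (
	"fmt"
	"time"

	"go.uber.org/ratelimit"
)

func main() {
	rl := ratelimit.New(100) // 100 requests per second, with the default slack

	prev := time.Now()
	for i := 0; i < 5; i++ {
		now := rl.Take() // blocks until the next request may proceed
		fmt.Println(i, now.Sub(prev))
		prev = now
	}

	// disable slack for a strictly even pace
	strict := ratelimit.New(100, ratelimit.WithoutSlack)
	strict.Take()
}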
LeakyBucket Demo
Below, a minimalist non-blocking leaky bucket is implemented.
Struct definition
// config
type LbConfig struct {
	Rate     float64 // rate, e.g. 200 means 200 requests per second
	MaxSlack int64   // maximum slack: the banked buffer is capped at this many requests' worth; 0 (default) means no slack, e.g. 10
}

type LeakyBucket struct {
	*LbConfig
	m          sync.Mutex    // mutex protecting the fields below
	perRequest time.Duration // interval between two requests (1/Rate)
	bufferTime time.Duration // accumulated spare time (slack)
	slackTime  time.Duration // maximum slack time allowed
	lastTime   time.Time     // time of the last admitted request
}
Without slack
That is, requests are admitted strictly at the predetermined interval.
func (lb *LeakyBucket) withoutSlack() error {
	n := time.Now()
	lb.bufferTime = lb.perRequest - n.Sub(lb.lastTime)
	// a positive value means the request arrived too early and must be denied
	if lb.bufferTime > 0 {
		return ErrNotEnoughToken
	} else {
		lb.lastTime = n
	}
	return nil
}
With slack
That is, spare time is banked for later requests.
func (lb *LeakyBucket) withSlack() error {
	n := time.Now()
	// accumulate the spare time (+=) instead of overwriting it
	lb.bufferTime += lb.perRequest - n.Sub(lb.lastTime)
	// a positive value means the request arrived too early and must be denied
	if lb.bufferTime > 0 {
		return ErrNotEnoughToken
	} else {
		lb.lastTime = n
	}
	// cap the banked spare time at the maximum slack allowed
	if lb.bufferTime < lb.slackTime {
		lb.bufferTime = lb.slackTime
	}
	return nil
}
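For completeness, a sketch of how the two methods might be exposed behind a single entry point; Take and the MaxSlack-based dispatch are illustrative, not part of the original demo.

// Take is an illustrative wrapper that picks the slack or no-slack path
// depending on the configuration.
func (lb *LeakyBucket) Take() error {
	lb.m.Lock()
	defer lb.m.Unlock()
	if lb.MaxSlack > 0 {
		return lb.withSlack()
	}
	return lb.withoutSlack()
}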
The Demo source code
The source is available at github.com/xiaoxuz/lim…
Related articles
www.cyhone.com/articles/an…
en.wikipedia.org/wiki/Token_…
en.wikipedia.org/wiki/Leaky_…
Call it a day
Thank you for your support!
The CSDN blog is updated from time to time, while the public account is updated continuously. You are welcome to follow it.