Foreword

When building a high-concurrency system, we may run into very high interface access frequency. To keep the system available and stable, we need rate limiting: you might use a web server such as Nginx to control requests, or reach for one of the popular rate-limiting libraries. Rate limiting is one of the key safeguards in high-concurrency systems, so before designing one, let's look at the common rate-limiting algorithms.

Rate limiting

The purpose of rate limiting is to protect the system by limiting the rate of concurrent access, or the number of requests within a time window. Once the limit is reached, a request can be denied (redirected to an error page, or told the resource is unavailable), queued or made to wait (e.g., flash sales, comments, order placement), or degraded (served fallback or default data).

As shown in the figure:

As the cartoon shows, when traffic surges during a certain period, the service's interfaces may be accessed very frequently. If we do not limit the access frequency, the server may buckle under the pressure, and data may even be lost. Rate limiting is therefore necessary.

A rate-limiting algorithm helps us control how often each interface or function is called. It acts as a kind of fuse, preventing the system from crashing under excessive access frequency or concurrency. When calling some third-party APIs, we may see response headers like these:

X-RateLimit-Limit: 60         // Maximum number of requests allowed in the current window
X-RateLimit-Remaining: 22     // Number of requests still available in the current window
X-RateLimit-Reset: 1612184024 // Unix timestamp at which the window resets

These response headers tell the caller the rate limit enforced on the server side, making the backend's interface access ceiling explicit. Many algorithms exist for rate limiting, each with its own trade-offs; the common strategies are to reject the excess requests or to queue them.
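As a small illustration, a server could populate such headers roughly like this (a sketch only: the helper name setRateLimitHeaders and its parameters are made up for this example; only the header names follow the convention above):

import (
    "net/http"
    "strconv"
    "time"
)

// Hypothetical helper: report the limiter's current state to the
// caller through the conventional X-RateLimit-* response headers.
func setRateLimitHeaders(w http.ResponseWriter, limit, remaining int, reset time.Time) {
    w.Header().Set("X-RateLimit-Limit", strconv.Itoa(limit))
    w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(remaining))
    w.Header().Set("X-RateLimit-Reset", strconv.FormatInt(reset.Unix(), 10))
}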

Generally speaking, the common rate-limiting approaches are:

  • Counter
  • Sliding window
  • Leaky bucket
  • Token bucket

Counter

The counter is the simplest rate-limiting algorithm. Its principle: within a period of time, count the incoming requests and compare the count with a threshold to decide whether to limit; once the time boundary is reached, clear the counter.

It is like boarding a bus: the bus has a fixed number of seats, and once it is full no more passengers may board, otherwise it is overloaded and the driver is fined when caught by the traffic police. If our system is overloaded, the consequence is not a fine; it may simply crash.

  1. Set a variable count in the program. When a request comes in, increment count by 1 and record the request time.
  2. When the next request arrives, check whether count exceeds the configured frequency and whether the current request and the first request fall within 1 minute of each other.
  3. If the number of requests within that minute exceeds the configured frequency, reject subsequent requests.
  4. If the interval between the current request and the first request is longer than the counting cycle and count is still within the limit, reset count.

Code implementation:

package main

import (
    "log"
    "sync"
    "time"
)

type Counter struct {
    rate  int           // Maximum number of requests allowed in one counting cycle
    begin time.Time     // Start time of the current counting cycle
    cycle time.Duration // Length of the counting cycle
    count int           // Number of requests received in the current cycle
    lock  sync.Mutex
}

func (l *Counter) Allow() bool {
    l.lock.Lock()
    defer l.lock.Unlock()

    if l.count == l.rate-1 {
        now := time.Now()
        if now.Sub(l.begin) >= l.cycle {
            // The cycle has elapsed: reset the counter and allow the request
            l.Reset(now)
            return true
        } else {
            // The limit is reached within the cycle: reject
            return false
        }
    } else {
        // The rate limit is not reached yet: increment count
        l.count++
        return true
    }
}

func (l *Counter) Set(r int, cycle time.Duration) {
    l.rate = r
    l.begin = time.Now()
    l.cycle = cycle
    l.count = 0
}

func (l *Counter) Reset(t time.Time) {
    l.begin = t
    l.count = 0
}

func main() {
    var wg sync.WaitGroup
    var lr Counter
    lr.Set(3, time.Second) // Allow at most 3 requests per second
    for i := 0; i < 10; i++ {
        wg.Add(1)
        log.Println("Create request:", i)
        go func(i int) {
            if lr.Allow() {
                log.Println("Respond to request:", i)
            }
            wg.Done()
        }(i)

        time.Sleep(200 * time.Millisecond)
    }
    wg.Wait()
}

Output:

2021/02/01 21:16:12 Create request: 0
2021/02/01 21:16:12 Respond to request: 0
2021/02/01 21:16:12 Create request: 1
2021/02/01 21:16:12 Respond to request: 1
2021/02/01 21:16:12 Create request: 2
2021/02/01 21:16:13 Create request: 3
2021/02/01 21:16:13 Create request: 4
2021/02/01 21:16:13 Create request: 5
2021/02/01 21:16:13 Respond to request: 5
2021/02/01 21:16:13 Create request: 6
2021/02/01 21:16:13 Respond to request: 6
2021/02/01 21:16:13 Create request: 7
2021/02/01 21:16:13 Respond to request: 7
2021/02/01 21:16:14 Create request: 8
2021/02/01 21:16:14 Create request: 9

As you can see, we issue one request every 200 ms, which is clearly above the limit of three requests per second. After running, we find that requests 2, 3, 4, 8 and 9 are discarded, which shows the rate limiting works.

Now here is the problem: suppose an interface /query is allowed at most 200 accesses per minute, and a user sends 200 requests in the last few milliseconds of the 59th second. When the 59th second ends, the Counter is zeroed, and the user sends another 200 requests in the next second. Within roughly one second the system has accepted 400 requests from that user, yet every single one complies with our counting logic. This is the design flaw of the counter approach: around window boundaries the system may absorb a flood of malicious requests and may even be knocked down.

As shown in the diagram below:

The big problem with this simple approach is that it handles the boundaries between time windows poorly.

Sliding window

The sliding window fixes the counter's boundary defect. A sliding window is a flow-control technique; the term originates from the TCP protocol. It divides a fixed window into time slices and slides forward as time passes: requests are counted over a fixed number of slices, and the total is compared against the threshold.

As shown in the figure:

In the figure above, the red dotted frame represents one time window (one minute), which is divided into 6 cells of 10 seconds each. Every 10 seconds the window slides one cell to the right, in the direction of the red arrow. Each cell has its own independent counter: if a request arrives at 0:45, the counter of the fifth cell (0:40~0:50) is incremented. To decide whether to limit, we sum the counters of all cells in the window and compare the total with the configured limit.

So how does the sliding window solve the boundary problem described above? Take a look at the figure below:

When the user sends 200 requests at 0:59, the counter of the sixth cell records +200. One second later the time window has slid to the right, but those 200 requests still fall inside it; so when the user sends more requests, the total exceeds the limit, rate limiting is triggered, and the new requests are rejected.

In fact, the counter is just a sliding window with a single cell. To limit more precisely, we simply divide the window into more cells, though there is no single right number; the cell count determines the precision of the sliding window algorithm (a minimal sketch follows below). And because the concept of time slices still exists, the boundary problem is mitigated rather than fundamentally solved.

  • Related algorithm implementation: github.com/RussellLuo/…
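To make the idea concrete, here is a minimal sliding-window sketch in Go, splitting the window into fixed slots that each carry their own counter. The SlidingWindow type and its layout are our own illustration, not taken from the library linked above:

import (
    "sync"
    "time"
)

type SlidingWindow struct {
    limit    int           // Maximum requests allowed in the whole window
    window   time.Duration // Total window length, e.g. 1 minute
    slot     time.Duration // Length of one slot (cell), e.g. 10 seconds
    counters map[int64]int // Slot index -> request count in that slot
    lock     sync.Mutex
}

func NewSlidingWindow(limit int, window time.Duration, slots int) *SlidingWindow {
    return &SlidingWindow{
        limit:    limit,
        window:   window,
        slot:     window / time.Duration(slots),
        counters: make(map[int64]int),
    }
}

func (s *SlidingWindow) Allow() bool {
    s.lock.Lock()
    defer s.lock.Unlock()

    cur := time.Now().UnixNano() / int64(s.slot) // Index of the current slot
    oldest := cur - int64(s.window/s.slot) + 1   // First slot still inside the window

    total := 0
    for id, c := range s.counters {
        if id < oldest {
            delete(s.counters, id) // Drop slots that have slid out of the window
        } else {
            total += c
        }
    }
    if total >= s.limit {
        return false // The window is full: reject
    }
    s.counters[cur]++
    return true
}

For example, NewSlidingWindow(200, time.Minute, 6) reproduces the six-cell, one-minute window from the figure; the more slots you configure, the closer the behavior gets to a true rolling count, at the cost of a little more bookkeeping.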

Leaky bucket

The Leaky Bucket algorithm uses a bucket of fixed capacity that leaks water at a fixed rate. Think of a tap: when you open it, water drips into the bucket, while a hole at the bottom lets water out at a constant pace. If the tap is opened too far, water flows in faster than it leaks out, the bucket fills up, and the excess overflows.

As shown in the figure:

A bucket of fixed capacity, with water flowing in and out. We cannot predict how much water will flow in, or how fast. But the bucket lets water out at a fixed rate (the processing speed), thereby achieving traffic shaping and flow control.

Code implementation:

import (
    "math"
    "sync"
    "time"
)

type LeakyBucket struct {
    rate       float64 // Fixed leak rate per second (how fast requests drain)
    capacity   float64 // Capacity of the bucket
    water      float64 // Current amount of water in the bucket
    lastLeakMs int64   // Timestamp of the last leak, in milliseconds

    lock sync.Mutex
}

func (l *LeakyBucket) Allow() bool {
    l.lock.Lock()
    defer l.lock.Unlock()

    now := time.Now().UnixNano() / 1e6
    leaked := float64(now-l.lastLeakMs) * l.rate / 1000 // Leak first: water drained since the last call
    l.water = l.water - leaked                          // Remaining water
    l.water = math.Max(0, l.water)                      // The bucket can never hold less than nothing
    l.lastLeakMs = now
    if (l.water + 1) < l.capacity {
        // The bucket is not full: add this request's drop of water
        l.water++
        return true
    } else {
        // The bucket is full: reject the request
        return false
    }
}

func (l *LeakyBucket) Set(r, c float64) {
    l.rate = r
    l.capacity = c
    l.water = 0
    l.lastLeakMs = time.Now().UnixNano() / 1e6
}

The leaky bucket algorithm has the following characteristics:

  • The bucket has a fixed capacity, and water flows out at a constant rate (requests are processed at a fixed speed)
  • If the bucket is empty, no water flows out
  • Water can flow into the bucket at any rate (requests may arrive at any speed)
  • If the inflow exceeds the bucket's capacity, the excess water overflows (new requests are rejected)

Because the bucket drains at a constant rate, the maximum throughput equals that outflow rate, so bursts of traffic cannot pass through.
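A quick usage sketch, assuming the LeakyBucket type above plus the log import (the rate, capacity, and loop are arbitrary numbers for illustration):

func main() {
    var lb LeakyBucket
    lb.Set(2, 5) // Leak 2 drops per second; the bucket holds 5 drops

    for i := 0; i < 10; i++ {
        if lb.Allow() {
            log.Println("Respond to request:", i)
        } else {
            log.Println("Reject request:", i)
        }
        time.Sleep(100 * time.Millisecond) // 10 requests per second flow in
    }
}

Requests flow in at 10 per second while the bucket drains at only 2 per second, so once the bucket fills up, further requests are rejected until enough water has leaked away.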

Token bucket algorithm

The Token Bucket algorithm is one of the most commonly used algorithms in traffic shaping and rate limiting. Typically, it is used to control the amount of data sent to the network while still allowing bursts.

We have a bucket of fixed size that holds tokens. The bucket starts empty, and the system adds tokens to it at a fixed rate until it is full; tokens added beyond that are discarded. When a request comes in, it takes a token out of the bucket; if the bucket is empty, the request is rejected or blocked.

Implementation code:

import (
    "sync"
    "time"
)

type TokenBucket struct {
    rate         int64 // Fixed rate at which tokens are added, tokens/s
    capacity     int64 // Capacity of the bucket
    tokens       int64 // Number of tokens currently in the bucket
    lastTokenSec int64 // Timestamp of the last token refill, in seconds

    lock sync.Mutex
}

func (l *TokenBucket) Allow() bool {
    l.lock.Lock()
    defer l.lock.Unlock()

    now := time.Now().Unix()
    l.tokens = l.tokens + (now-l.lastTokenSec)*l.rate // Refill tokens first
    if l.tokens > l.capacity {
        l.tokens = l.capacity // Extra tokens overflow and are discarded
    }
    l.lastTokenSec = now
    if l.tokens > 0 {
        // Take a token and let the request through
        l.tokens--
        return true
    } else {
        // No tokens left: reject the request
        return false
    }
}

func (l *TokenBucket) Set(r, c int64) {
    l.rate = r
    l.capacity = c
    l.tokens = 0
    l.lastTokenSec = time.Now().Unix()
}

The token bucket has the following characteristics:

  • Tokens are put into the bucket at a constant rate
  • The bucket stores at most B tokens; when the bucket is full, newly added tokens are discarded or rejected
  • If fewer than N tokens remain in the bucket, no token is removed and the request is limited (discarded, or blocked and made to wait)

The token bucket limits the average inflow rate while allowing bursts: as long as tokens remain, requests can be processed, including taking 3 or 4 tokens at once, so a certain amount of burst traffic is tolerated.
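A quick sketch of that burst behavior, assuming the TokenBucket type above plus the log import (the numbers are arbitrary):

func main() {
    var tb TokenBucket
    tb.Set(1, 5) // Add 1 token per second; the bucket holds at most 5

    time.Sleep(3 * time.Second) // Let a few tokens accumulate

    // A burst of 5 back-to-back requests: the first 3 consume the
    // accumulated tokens and pass, the last 2 find the bucket empty.
    for i := 0; i < 5; i++ {
        log.Println("request", i, "allowed:", tb.Allow())
    }
}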

Summary

The token bucket is the one most commonly used in practice. This article introduced the implementations of several common rate-limiting algorithms. Writing articles is not easy; give me a follow so you don't get lost.