
When a system's processing capacity cannot keep up with a sudden surge of external requests, rate limiting must be applied to keep the system from being overwhelmed and crashing.

Goals of rate limiting:

  • Prevent the system from being knocked over by sudden traffic bursts
  • Block malicious requests and attacks
  • Keep the cluster's services running healthily and stably (traffic shaping)
  • Apply fine-grained control over resource (request) quotas in an API economy

1 Rate-limiting metrics

At present, mainstream rate limiters mostly use HPS as the limiting metric.

1.1 TPS

Transactions Per Second (TPS) is the number of transactions completed per second. A transaction spans from sending its first request to receiving the response to its last request; performance is measured by the elapsed time and the number of transactions completed within it.

In practice, however, limiting traffic per transaction is impractical. Completing a transaction in a distributed system requires the cooperation of several services: placing an order in an e-commerce system, for example, involves the order, inventory, account and payment services. Some of them respond asynchronously, so a single transaction may take a long time to complete. Limiting by TPS therefore yields a very coarse time granularity, making it hard to evaluate the system's response performance accurately.

1.2 HPS

Hits Per Second (HPS) is the number of hits the server receives from clients per second, i.e. the total number of page links clicked and forms submitted by users within one second. It is generally proportional to TPS and is one of the most important performance metrics in B/S systems.

If one request completes one transaction, TPS and HPS are equivalent. In distributed scenarios, however, a transaction may require several requests, so the two metrics cannot be treated as interchangeable.

1.3 QPS

QPS (Queries Per Second) is the number of queries a server can answer per second (for a database, the number of SQL queries executed per second). Since "query" does not cover inserts, updates and deletes, QPS alone is not comprehensive enough to serve as a system performance metric.

If there is only one server in the backend, HPS and QPS are equivalent. In distributed scenarios, however, one hit may fan out into queries against several servers: a single checkout hit might, say, trigger queries to the order, inventory and payment databases, so the two metrics diverge.

2 Rate-limiting algorithms

2.1 Fixed Window counter

The idea behind the fixed-window counter algorithm is very simple: treat each unit of time as a window, maintain a counter of the requests that window has received, and reset the counter to zero once the unit of time has passed.

  • When the count is below the limit threshold, the request is allowed and the counter is incremented by 1
  • When the count has reached the limit threshold, the request is rejected
  • When the current time window ends, the counter is reset to zero

Assume the unit time is 1 second and the threshold is 3. Within that second, each request increments the counter by 1; once the counter exceeds 3, all subsequent requests are rejected. When the second ends, the counter is reset to 0 and counting starts over, as the diagram below shows:

The pseudocode is as follows:

    private long windowUnit = 1000;      // window length in ms
    private long lastRequestTime = System.currentTimeMillis(); // start of the current window
    private int counter = 0;             // requests counted in the current window
    private int threshold = 5;           // maximum requests allowed per window

    /** Fixed-window rate-limiting algorithm. */
    boolean fixedWindowsTryAcquire() {
        long currentTime = System.currentTimeMillis();    // current system time
        if (currentTime - lastRequestTime > windowUnit) { // the current window has elapsed
            counter = 0;                                  // reset the count
            lastRequestTime = currentTime;                // open a new window
        }
        if (counter < threshold) { // below the threshold
            counter++;             // count the request
            return true;
        }

        return false;
    }

The boundary problem

However, this algorithm has an obvious boundary problem. Suppose the threshold is 5 requests and the unit time window is 1 s, with 5 requests arriving between 0.8-1.0 s and another 5 between 1.0-1.2 s. Neither window exceeds its threshold, but counted over the 0.8-1.2 s span there are 10 requests, which breaks the defined limit of at most 5 per 1 s of unit time.
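The boundary problem is easy to reproduce with a deterministic simulation. The sketch below is illustrative only: the class name, threshold and millisecond timestamps are invented, and time is injected as a parameter rather than read from the clock so the run is repeatable. Ten requests split across the 0.9 s and 1.1 s marks all pass, even though they fall within a single 0.2 s span.

```java
// Hypothetical demo (not production code): fixed-window counter with injected time.
public class FixedWindowBoundaryDemo {
    static final long WINDOW_MS = 1000; // window length: 1 second
    static final int THRESHOLD = 5;     // at most 5 requests per window
    static long lastRequestTime = 0;    // start of the current window (ms)
    static int counter = 0;             // requests counted in the current window

    public static boolean tryAcquire(long nowMs) {
        if (nowMs - lastRequestTime > WINDOW_MS) { // the window has elapsed
            counter = 0;                           // reset the count
            lastRequestTime = nowMs;               // open a new window
        }
        if (counter < THRESHOLD) {
            counter++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int passed = 0;
        for (int i = 0; i < 5; i++) if (tryAcquire(900)) passed++;  // 5 requests at 0.9 s
        for (int i = 0; i < 5; i++) if (tryAcquire(1100)) passed++; // 5 more at 1.1 s
        System.out.println(passed); // prints 10
    }
}
```

All ten requests are admitted because the counter reset at the window boundary hides the burst that straddles it.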

2.2 Sliding Window counter

The sliding-window counter algorithm solves the fixed window's boundary problem. It divides the unit time into n small periods, records the number of interface accesses in each period, and deletes expired periods as time slides forward.

A diagram explains the sliding window algorithm, as follows:

Assume the unit time is again 1 s. The sliding-window algorithm splits it into five small periods, so the sliding window consists of five cells of 0.2 s each, and every 0.2 s the window slides one cell to the right. Each period has its own counter: if a request arrives at 0.83 s, the counter of the 0.8-1.0 s cell is incremented by 1.

How does the sliding window solve the boundary problem?

Suppose the limit is still 5 requests per 1 s, and 5 requests arrive between 0.8 s and 1.0 s (say at 0.9 s), landing in the yellow cell. After the 1.0 s mark, 5 more requests land in the purple cell. A fixed window would not limit them, but the sliding window moves one cell to the right after every small period. After 1.0 s, the window covers 0.2-1.2 s, and the requests in that span already total 10, exceeding the limit of 5, so the new requests are rejected.

TIPS: the more cells the sliding window is divided into, the smoother it slides and the more accurate the rate-limiting statistics.

The pseudo-code of the sliding window algorithm is as follows:

    /** Sub-window length in seconds (a 1-minute unit time split into 6 sub-windows of 10 s each). */
    private int SUB_CYCLE = 10;

    /** Maximum requests allowed per minute. */
    private int thresholdPerMin = 100;

    /** Counters: key is the start time (in seconds) of a sub-window, value is that sub-window's request count. */
    private final TreeMap<Long, Integer> counters = new TreeMap<>();

    /** Sliding-window rate-limiting algorithm. */
    boolean slidingWindowsTryAcquire() {
        // Start time of the sub-window containing the current moment
        long currentWindowTime = LocalDateTime.now().toEpochSecond(ZoneOffset.UTC) / SUB_CYCLE * SUB_CYCLE;
        int currentWindowNum = countCurrentWindow(currentWindowTime); // total requests in the sliding window

        // The threshold has been reached: reject
        if (currentWindowNum >= thresholdPerMin) {
            return false;
        }

        // Otherwise count the request: +1 in the current sub-window
        counters.merge(currentWindowTime, 1, Integer::sum);
        return true;
    }

    /** Counts the requests inside the sliding window and evicts expired sub-windows. */
    private int countCurrentWindow(long currentWindowTime) {
        // Start of the sliding window: go back (60 / SUB_CYCLE - 1) sub-windows
        long startTime = currentWindowTime - SUB_CYCLE * (60 / SUB_CYCLE - 1);
        int count = 0;

        // Iterate over the stored sub-window counters
        Iterator<Map.Entry<Long, Integer>> iterator = counters.entrySet().iterator();
        while (iterator.hasNext()) {
            Map.Entry<Long, Integer> entry = iterator.next();
            if (entry.getKey() < startTime) {
                // Evict expired sub-window counters
                iterator.remove();
            } else {
                // Sum all counters inside the current window
                count = count + entry.getValue();
            }
        }
        return count;
    }

Although the sliding window solves the fixed window's boundary problem, once the limit is hit, requests are still rejected outright, so some requests are lost, which is not very friendly from a product standpoint. The sliding time window remedies the defect of the plain counter algorithm, but it still has two shortcomings:

  • Traffic beyond the limit must be dropped or handed to degradation logic
  • Flow control is not precise enough: it cannot constrain traffic concentrated within a short burst, nor can it shave peaks and fill valleys

2.3 Leaky Bucket Algorithm

As shown in the figure below, water drops keep dripping into the leaky bucket, and water flows out of the bottom at a constant rate. If water drips in faster than it flows out, the stored water eventually exceeds the bucket's capacity and overflows. The rules are:

  • When a request arrives, put it into the bucket
  • If the bucket is full, the request is rejected
  • The service takes requests out of the bucket and processes them at a constant rate

Each water drop corresponds to a request. The pattern is loose inflow, strict outflow: no matter how many requests arrive or how bursty they are, requests leave the bucket, and are processed by the service, at a fixed rate. Facing a burst, the service therefore runs no faster than usual, which is often not what we want: under sudden traffic we would like to keep the system stable while also improving the user experience by processing requests faster, not pacing the burst like normal traffic. The token bucket is more aggressive in handling bursts.

The pseudocode of the leaky bucket algorithm is as follows:

    /** Requests processed per second (outflow rate). */
    private long rate;

    /** Current amount of water in the bucket. */
    private long currentWater;

    /** Last refresh time (ms). */
    private long refreshTime;

    /** Bucket capacity. */
    private long capacity;

    /** Leaky-bucket rate-limiting algorithm. */
    boolean leakybucketLimitTryAcquire() {
        long currentTime = System.currentTimeMillis();             // current system time
        long outWater = (currentTime - refreshTime) / 1000 * rate; // water leaked = elapsed time * outflow rate
        currentWater = Math.max(0, currentWater - outWater);       // current water = previous water - leaked water
        refreshTime = currentTime;                                 // refresh the timestamp

        // If the bucket is not yet full, accept the request
        if (currentWater < capacity) {
            currentWater++;
            return true;
        }

        // The bucket is full (currentWater >= capacity): reject
        return false;
    }
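To make the loose-inflow, strict-outflow behaviour concrete, here is a hedged simulation: the class name, rates and timestamps are invented for illustration, and time is injected rather than read from the clock. A burst of 10 requests at t = 0 only fills the 5-slot bucket; one second later just 2 more fit, because water leaks at 2 requests per second.

```java
// Hypothetical leaky-bucket simulation with injected time (not production code).
public class LeakyBucketDemo {
    static final long RATE = 2;      // outflow: 2 requests leak per second
    static final long CAPACITY = 5;  // the bucket queues at most 5 requests
    static long currentWater = 0;    // water currently in the bucket
    static long refreshTime = 0;     // last drain time (ms)

    public static boolean tryAcquire(long nowMs) {
        long outWater = (nowMs - refreshTime) / 1000 * RATE; // water leaked since last call
        currentWater = Math.max(0, currentWater - outWater);
        refreshTime = nowMs;
        if (currentWater < CAPACITY) { // room left in the bucket
            currentWater++;
            return true;
        }
        return false; // bucket full: overflow, reject
    }

    public static void main(String[] args) {
        int burst = 0, later = 0;
        for (int i = 0; i < 10; i++) if (tryAcquire(0)) burst++;    // burst at t = 0
        for (int i = 0; i < 10; i++) if (tryAcquire(1000)) later++; // retry at t = 1 s
        System.out.println(burst + " " + later); // prints "5 2"
    }
}
```

Note that, as in the pseudocode above, the integer division drops sub-second leakage, so the demo only uses second-aligned timestamps.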

2.4 Token bucket algorithm

A token bucket works much like a leaky bucket, except that what drips into the bucket at a constant rate are tokens, and a request is processed by the server only after it obtains a token. The token bucket's size is of course limited: once the bucket is full, tokens generated at the constant rate are discarded. The rules:

  • Tokens are placed into the bucket at a constant rate
  • Tokens beyond the bucket's capacity are discarded
  • When a request arrives, it first tries to take a token from the bucket; if it succeeds it is processed, otherwise it is rejected

As you can see, when a token bucket faces sudden traffic, 100 tokens sitting in the bucket can all be taken away at once, rather than consumed at a constant rate as in a leaky bucket. Token buckets therefore perform better under bursty traffic.

Token bucket algorithm pseudo-code implementation is as follows:

    /** Tokens placed into the bucket per second (processing rate). */
    private long putTokenRate;

    /** Last refresh time (ms). */
    private long refreshTime;

    /** Token bucket capacity. */
    private long capacity;

    /** Number of tokens currently in the bucket. */
    private long currentToken = 0L;

    /** Token-bucket rate-limiting algorithm. */
    boolean tokenBucketTryAcquire() {
        long currentTime = System.currentTimeMillis();  // current system time
        long generateToken = (currentTime - refreshTime) / 1000 * putTokenRate; // tokens generated = elapsed time * placement rate
        currentToken = Math.min(capacity, currentToken + generateToken); // current tokens = min(capacity, previous tokens + newly generated)
        refreshTime = currentTime; // refresh the timestamp

        // There is a token in the bucket: take one and allow the request
        if (currentToken > 0) {
            currentToken--; // tokens - 1
            return true;
        }

        return false;
    }
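The token bucket's burst-friendliness can be shown the same way. In this illustrative sketch (class name, rates and timestamps are all invented; time is injected rather than read from the clock), a bucket that starts full hands out all 100 tokens at once, while after the burst only the refill rate of 2 tokens per second is available.

```java
// Hypothetical token-bucket simulation with injected time (not production code).
public class TokenBucketDemo {
    static final long PUT_TOKEN_RATE = 2; // refill: 2 tokens per second
    static final long CAPACITY = 100;     // bucket holds at most 100 tokens
    static long currentToken = CAPACITY;  // assume the bucket starts full
    static long refreshTime = 0;          // last refill time (ms)

    public static boolean tryAcquire(long nowMs) {
        long generated = (nowMs - refreshTime) / 1000 * PUT_TOKEN_RATE; // tokens refilled since last call
        currentToken = Math.min(CAPACITY, currentToken + generated);
        refreshTime = nowMs;
        if (currentToken > 0) { // a token is available: take it
            currentToken--;
            return true;
        }
        return false; // no tokens left: reject
    }

    public static void main(String[] args) {
        int burst = 0, later = 0;
        for (int i = 0; i < 120; i++) if (tryAcquire(0)) burst++;    // burst at t = 0
        for (int i = 0; i < 120; i++) if (tryAcquire(5000)) later++; // retry at t = 5 s
        System.out.println(burst + " " + later); // prints "100 10"
    }
}
```

Unlike the leaky bucket, the whole stored burst capacity is spent immediately; afterwards the limiter degrades to the steady refill rate (5 s * 2 tokens/s = 10 tokens).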

2.5 Distributed Traffic Limiting

The core of counter-based distributed rate limiting is Redis's INCRBY and EXPIRE commands. In general, the counter algorithm is prone to uneven traffic distribution: the instantaneous QPS may exceed what the system can bear.

-- Gets the first key passed in when the script is called (used as a key for limiting traffic)
local key = KEYS[1]
-- Gets the value of the first argument passed in when the script is called.
local limit = tonumber(ARGV[1])
-- Gets the TTL of the rate limiting interval of the counter
local ttl = tonumber(ARGV[2])

-- Gets the current traffic count
local currentLimit = tonumber(redis.call('get', key) or "0")

-- Checks whether the traffic limit would be exceeded
if currentLimit + 1 > limit then
    -- Reject
    return 0
else
    -- Below the limit: count + 1
    redis.call('INCRBY', key, 1)
    -- If the count was 0, this is a new window, so set its expiration to the window TTL
    if currentLimit == 0 then
        redis.call('EXPIRE', key, ttl)
    end
    -- Return (release)
    return 1
end

The logic of this Lua script is straightforward:

  • KEYS[1] supplies the key that identifies the rate-limiting target
  • ARGV[1] supplies the limit parameter, i.e. the rate-limiting threshold
  • ARGV[2] supplies the TTL of the rate-limiting window
  • redis.call('get', key) reads the current count (defaulting to 0) and compares it with limit; if the count would exceed the limit, the request is rejected, otherwise INCRBY adds 1 and the request is allowed (the initialization case must also set the TTL)

There is a problem with the above script: if the key already existed without a TTL (for example because a previous EXPIRE call was lost), the counter never expires, so once the limit is triggered it stays in effect forever. A more robust variant checks the value returned by INCRBY and sets the TTL whenever it equals 1.

The token bucket is also the algorithm used by Guava's RateLimiter. It likewise uses calculation to tie elapsed time to the number of tokens:

-- key
local key = KEYS[1]
-- Maximum number of tokens stored
local max_permits = tonumber(KEYS[2])
-- Number of tokens generated per second
local permits_per_second = tonumber(KEYS[3])
-- Number of tokens requested
local required_permits = tonumber(ARGV[1])

-- Earliest time at which the next request can obtain a token
local next_free_ticket_micros = tonumber(redis.call('hget', key, 'next_free_ticket_micros') or 0)

-- Current time
local time = redis.call('time')
-- time[1] is the Unix time in seconds, time[2] the microseconds elapsed within the current second
local now_micros = tonumber(time[1]) * 1000000 + tonumber(time[2])

-- Query whether getting the token timed out (passed in microseconds)
if (ARGV[2] ~= nil) then
    -- The timeout for retrieving the token
    local timeout_micros = tonumber(ARGV[2])
    local micros_to_wait = next_free_ticket_micros - now_micros
    if (micros_to_wait > timeout_micros) then
        return micros_to_wait
    end
end

-- Number of tokens currently stored
local stored_permits = tonumber(redis.call('hget', key, 'stored_permits') or 0)
-- Token generation interval: how many microseconds it takes to produce one token (1 s = 1,000,000 us)
local stable_interval_micros = 1000000 / permits_per_second

-- Supplementary token
if (now_micros > next_free_ticket_micros) then
    local new_permits = (now_micros - next_free_ticket_micros) / stable_interval_micros
    stored_permits = math.min(max_permits, stored_permits + new_permits)
    -- Updates the next time a token can be fetched after replenishment
    next_free_ticket_micros = now_micros
end

-- Consume tokens
local moment_available = next_free_ticket_micros
-- Tokens taken from storage: min(required_permits, stored_permits)
local stored_permits_to_spend = math.min(required_permits, stored_permits)
local fresh_permits = required_permits - stored_permits_to_spend;
-- If fresh_permits > 0, the stored tokens are not enough and some waiting time is needed
local wait_micros = fresh_permits * stable_interval_micros

-- redis.replicate_commands() makes Redis replicate the script's effects as the individual commands it issued, which permits non-deterministic (e.g. time-based) writes inside Lua scripts
redis.replicate_commands()
-- Remaining stored tokens = stored tokens - tokens taken this time
redis.call('hset', key, 'stored_permits', stored_permits - stored_permits_to_spend)
redis.call('hset', key, 'next_free_ticket_micros', next_free_ticket_micros + wait_micros)
redis.call('expire', key, 10)

-- Returns the length of time to wait
-- 0 (moment_available == now_micros) means enough tokens were stored and no waiting is needed
return moment_available - now_micros