Hello everyone, my name is Alang. I recently needed rate limiting in my work, so this article introduces the common approaches to rate limiting.
The article is continuously updated. You can follow my WeChat public account Program Ape Alang or visit the Unread Code blog to read more. This article is already included at Github.com/niumoo/Java… and a Star is welcome.
Preface
In recent years, with the popularity of microservices, dependencies between services have grown stronger, call relationships have become more complex, and the stability of individual services has become more and more important. When requests surge suddenly, malicious users hammer an interface, or a high request rate puts great pressure on downstream services, we usually rely on caching, rate limiting, circuit breaking and degradation, load balancing, and similar techniques to keep the system stable. Among them, rate limiting is indispensable. This article introduces the basics of rate limiting.
1. Rate limiting
Rate limiting, as the name implies, restricts the number of requests or the level of concurrency. By capping the number of requests within a time window, it keeps the system running normally. If our service has limited resources and processing capacity, we need to limit the requests coming from upstream callers so that the service is not brought down by resource exhaustion.
There are two concepts to understand in rate limiting.
- Threshold: the number of requests allowed in a unit of time. For example, with a QPS limit of 10, at most 10 requests are accepted within one second.
- Rejection policy: how requests beyond the threshold are handled. Common rejection policies are rejecting the request directly and queuing it.
2. Fixed window algorithm
The fixed window algorithm, also called the counter algorithm, is a simple and convenient rate limiting algorithm. Requests within one second are counted with a counter that supports atomic operations. When the count within that second reaches the limit, the rejection policy is triggered. Every second the counter is reset to 0 and counting starts over.
2.1. Code implementation
Here is a simple implementation with a QPS limit of 2. One small optimization: instead of starting a separate thread to reset the counter every second, each call first checks the elapsed time to decide whether the counter should be reset.
import java.time.LocalTime;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * @author https://www.wdbyte.com
 */
public class RateLimiterSimpleWindow {
    // Threshold
    private static Integer QPS = 2;
    // Time window (ms)
    private static long TIME_WINDOWS = 1000;
    // Counter
    private static AtomicInteger REQ_COUNT = new AtomicInteger();

    private static long START_TIME = System.currentTimeMillis();

    public synchronized static boolean tryAcquire() {
        if ((System.currentTimeMillis() - START_TIME) > TIME_WINDOWS) {
            REQ_COUNT.set(0);
            START_TIME = System.currentTimeMillis();
        }
        return REQ_COUNT.incrementAndGet() <= QPS;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10; i++) {
            Thread.sleep(250);
            LocalTime now = LocalTime.now();
            if (!tryAcquire()) {
                System.out.println(now + " Rate limited");
            } else {
                System.out.println(now + " Do something.");
            }
        }
    }
}
Running results:
20:53:43.038922 Do something.
20:53:43.291435 Do something.
20:53:43.543087 Rate limited
20:53:43.796666 Do something.
20:53:44.050855 Do something.
20:53:44.303547 Rate limited
20:53:44.555008 Rate limited
20:53:44.809083 Do something.
20:53:45.063828 Do something.
20:53:45.314433 Rate limited
From the output we can see roughly 3 calls per second, and with the QPS limit of 2 there is on average one rejection per second, which looks fine. But if we think about it more carefully, this simple approach has a problem. Although we limit the QPS to 2, requests can straddle the boundary between two windows: in the last 500 ms of one second plus the first 500 ms of the next second, a span of only 1 second in total, 4 requests can get through.
This can be verified by simply modifying the test code:
// Sleep for 400 ms first so that the requests straddle the window boundary sooner.
Thread.sleep(400);
for (int i = 0; i < 10; i++) {
    Thread.sleep(250);
    LocalTime now = LocalTime.now();
    if (!tryAcquire()) {
        System.out.println(now + " Rate limited");
    } else {
        System.out.println(now + " Do something.");
    }
}
In the output we can see 4 consecutive requests, 250 ms apart, that all pass without being limited:
20:51:17.395087 Do something.
20:51:17.653114 Do something.
20:51:17.903543 Do something.
20:51:18.154104 Rate limited
20:51:18.405497 Do something.
20:51:18.655885 Do something.
20:51:18.906177 Do something.
20:51:19.158113 Rate limited
20:51:19.410512 Do something.
20:51:19.661629 Do something.
3. Sliding window algorithm
We have seen how the fixed window algorithm is implemented and what its problem is; the sliding window algorithm is an improvement on it. Since the fixed window breaks down at the window boundary, why not keep sliding the window forward instead of jumping to the next window all at once?
Below is a schematic of the sliding window.
In the example above the window slides once every 500 ms. The shorter the sliding interval, the smaller the probability of a boundary burst; but as long as there is still a time window, a boundary burst remains possible.
3.1. Code implementation
Based on the sliding window idea above, the following is a simple sliding window rate limiting utility class.
package com.wdbyte.rate.limiter;

import java.time.LocalTime;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sliding window rate limiting utility class
 *
 * @author https://www.wdbyte.com
 */
public class RateLimiterSlidingWindow {
    /** Threshold */
    private int qps = 2;
    /** Total size of the time window (ms) */
    private long windowSize = 1000;
    /** Number of sub-windows the time window is split into */
    private Integer windowCount = 10;
    /** Sub-window array */
    private WindowInfo[] windowArray = new WindowInfo[windowCount];

    public RateLimiterSlidingWindow(int qps) {
        this.qps = qps;
        long currentTimeMillis = System.currentTimeMillis();
        for (int i = 0; i < windowArray.length; i++) {
            windowArray[i] = new WindowInfo(currentTimeMillis, new AtomicInteger(0));
        }
    }

    /**
     * 1. Locate the current sub-window
     * 2. Update the current sub-window count & reset expired sub-windows
     * 3. Check whether the current QPS exceeds the limit
     *
     * @return true if the request is allowed
     */
    public synchronized boolean tryAcquire() {
        long currentTimeMillis = System.currentTimeMillis();
        // 1. Locate the current sub-window
        int currentIndex = (int)(currentTimeMillis % windowSize / (windowSize / windowCount));
        // 2. Update the current sub-window count & reset expired sub-windows
        int sum = 0;
        for (int i = 0; i < windowArray.length; i++) {
            WindowInfo windowInfo = windowArray[i];
            if ((currentTimeMillis - windowInfo.getTime()) > windowSize) {
                windowInfo.getNumber().set(0);
                windowInfo.setTime(currentTimeMillis);
            }
            if (currentIndex == i && windowInfo.getNumber().get() < qps) {
                windowInfo.getNumber().incrementAndGet();
            }
            sum = sum + windowInfo.getNumber().get();
        }
        // 3. Check whether the current QPS exceeds the limit
        return sum <= qps;
    }

    private class WindowInfo {
        // Sub-window start time
        private Long time;
        // Counter
        private AtomicInteger number;

        public WindowInfo(long time, AtomicInteger number) {
            this.time = time;
            this.number = number;
        }
        // get... set...
    }
}
The following is the test case. The QPS is set to 2 and tryAcquire() is called 20 times at 300 ms intervals, so the expected number of successes is about 12.
public static void main(String[] args) throws InterruptedException {
    int qps = 2, count = 20, sleep = 300, success = count * sleep / 1000 * qps;
    System.out.println(String.format("QPS limit: %d, calls: %d, interval: %d ms, expected successes: %d", qps, count, sleep, success));
    success = 0;
    RateLimiterSlidingWindow myRateLimiter = new RateLimiterSlidingWindow(qps);
    for (int i = 0; i < count; i++) {
        Thread.sleep(sleep);
        if (myRateLimiter.tryAcquire()) {
            success++;
            if (success % qps == 0) {
                System.out.println(LocalTime.now() + ": success, ");
            } else {
                System.out.print(LocalTime.now() + ": success, ");
            }
        } else {
            System.out.println(LocalTime.now() + ": fail");
        }
    }
    System.out.println();
    System.out.println("Actual number of successes: " + success);
}
Here are the results of the test.
QPS limit: 2, calls: 20, interval: 300 ms, expected successes: 12
16:04:27.077782: success,
16:04:27.380715: success,
16:04:27.684244: fail
16:04:27.989579: success,
16:04:28.293347: success,
16:04:28.597658: fail
16:04:28.901688: fail
16:04:29.205262: 812188: fail
16:04:30.115316: fail
16:04:30.420596: 725897: success,
16:04:31.028599: fail
16:04:31.331047: fail
16:04:31.634127: 242380: fail
16:04:32.547626: fail
16:04:32.847965: success,
Actual number of successes: 11
4. Sliding log algorithm
The sliding log algorithm is another way to implement rate limiting, and the idea is fairly simple. The basic logic is to record the timestamp of every request; when a new request arrives, we count how many requests fall within the most recent time window and compare that count with the threshold to decide whether the limit has been reached. This approach has no window-boundary burst and limits traffic precisely, but because the timestamp of every request has to be kept, it uses more memory.
4.1. Code implementation
The following is a simple implementation of the sliding log algorithm. Because a true sliding log stores one record per request and can use a lot of memory, the implementation below is not a strict sliding log; it is closer to a sliding window algorithm that splits one second into 1000 sub-windows.
package com.wdbyte.rate.limiter;

import java.time.LocalTime;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

/**
 * Sliding log rate limiting.
 *
 * @author https://www.wdbyte.com
 */
public class RateLimiterSlidingLog {
    /** Threshold */
    private Integer qps = 2;
    /** Request timestamp -> number of requests at that timestamp */
    private TreeMap<Long, Long> treeMap = new TreeMap<>();
    /** Interval for cleaning up old request records, 60 seconds */
    private long clearTime = 60 * 1000;

    public RateLimiterSlidingLog(Integer qps) {
        this.qps = qps;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Clean up old records once the oldest one is more than 60 seconds old
        if (!treeMap.isEmpty() && (now - treeMap.firstKey()) > clearTime) {
            Set<Long> keySet = new HashSet<>(treeMap.subMap(0L, now - 1000).keySet());
            for (Long key : keySet) {
                treeMap.remove(key);
            }
        }
        // Count the requests within the last second
        int sum = 0;
        for (Long value : treeMap.subMap(now - 1000, now).values()) {
            sum += value;
        }
        // If the QPS limit would be exceeded, return false
        if (sum + 1 > qps) {
            return false;
        }
        // Record this request
        if (treeMap.containsKey(now)) {
            treeMap.compute(now, (k, v) -> v + 1);
        } else {
            treeMap.put(now, 1L);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        RateLimiterSlidingLog rateLimiter = new RateLimiterSlidingLog(3);
        for (int i = 0; i < 10; i++) {
            Thread.sleep(250);
            LocalTime now = LocalTime.now();
            if (rateLimiter.tryAcquire()) {
                System.out.println(now + " Do something.");
            } else {
                System.out.println(now + " Rate limited");
            }
        }
    }
}
The threshold QPS is set to 3 in the code, and running it produces the following log:
20:51:17.395087 Do something.
20:51:17.653114 Do something.
20:51:17.903543 Do something.
20:51:18.154104 Rate limited
20:51:18.405497 Do something.
20:51:18.655885 Do something.
20:51:18.906177 Do something.
20:51:19.158113 Rate limited
20:51:19.410512 Do something.
20:51:19.661629 Do something.
5. Leaky bucket algorithm
The leaky bucket algorithm is named after a vivid metaphor, and the producer-consumer model fits it well. Each request is a producer: like a drop of water, it falls into a queue (the bucket). At the bottom of the bucket there is a hole from which water keeps leaking out, that is, a consumer keeps taking requests out of the queue at a constant rate, and this consumption (leak) rate equals the rate limiting threshold. With a QPS of 2, one request is consumed every 1s / 2 = 500 ms. The leaky bucket also has a size, namely the capacity of the queue; when the number of queued requests exceeds this capacity, the rejection policy is triggered.
Below is a schematic diagram of the leaky bucket algorithm.
From this description we can see that in the leaky bucket model consumption always proceeds at a constant rate, which protects the system from being overwhelmed by sudden traffic. That, however, is also the drawback of the leaky bucket: if the QPS is 2 and two requests arrive at the same moment, they cannot be processed at the same time, because only one request is processed every 1s / 2 = 500 ms.
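The article does not include a leaky bucket implementation, so here is a minimal sketch in the same style as the examples above. It is only an illustration under my own assumptions (the class and field names are not from the article): instead of a real queue drained by a background thread, it leaks water lazily on each call and rejects a request once the bucket is full, i.e. the "bucket as meter" variant rather than a queuing one.

import java.time.LocalTime;

/**
 * Minimal leaky bucket sketch (illustrative only, names are assumptions).
 */
public class RateLimiterLeakyBucket {
    /** Bucket capacity: how many requests may wait in the bucket */
    private final long capacity;
    /** Leak interval in ms, i.e. 1000 / QPS */
    private final long leakIntervalMillis;
    /** Current amount of water (pending requests) in the bucket */
    private long water = 0;
    /** Last time water leaked out */
    private long lastLeakTime = System.currentTimeMillis();

    public RateLimiterLeakyBucket(long capacity, long qps) {
        this.capacity = capacity;
        this.leakIntervalMillis = 1000 / qps;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Leak the water that should have drained since the last leak.
        long leaked = (now - lastLeakTime) / leakIntervalMillis;
        if (leaked > 0) {
            water = Math.max(0, water - leaked);
            lastLeakTime += leaked * leakIntervalMillis;
        }
        if (water < capacity) {
            water++;       // the request fits into the bucket
            return true;
        }
        return false;      // bucket is full: trigger the rejection policy
    }

    public static void main(String[] args) throws InterruptedException {
        RateLimiterLeakyBucket limiter = new RateLimiterLeakyBucket(2, 2);
        for (int i = 0; i < 10; i++) {
            Thread.sleep(250);
            System.out.println(LocalTime.now() + " " + limiter.tryAcquire());
        }
    }
}

Because admission is bounded by the constant leak rate plus the bucket capacity, the outflow stays smooth even when the inflow is bursty.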
6. Token bucket algorithm
The token bucket algorithm is another common way to implement rate limiting. RateLimiter, the most commonly used rate limiting utility class in Google's Java library Guava, is a token bucket implementation. The idea behind the token bucket is again similar to a producer-consumer relationship.
The system service, as the producer, adds tokens to the bucket (container) at a fixed rate. For example, with a QPS of 2, one token is added to the bucket every 500 ms; once the number of tokens in the bucket reaches the threshold, no more tokens are added.
Each request, as the consumer, must take a token from the bucket before it can proceed. If no token is available, the rejection policy is triggered, which can be waiting with a timeout or rejecting the request outright; this is how the rate limit is enforced.
The following is a schematic diagram of the token bucket rate limiting algorithm.
A token bucket implementation has the following characteristics (a minimal sketch follows the list).
- 1s / threshold (QPS) = the interval at which tokens are added.
- The bucket capacity equals the rate limiting threshold; once the number of tokens reaches the threshold, no more are added.
- It can absorb traffic bursts: N requests only need to take N tokens from the bucket to proceed.
- There is a warm-up phase: the bucket starts empty and tokens are added one by one at the token interval, so a threshold-sized burst arriving right at startup is rejected because there are not enough tokens yet; tools such as RateLimiter have optimized this.
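To make these characteristics concrete, here is a minimal hand-rolled token bucket sketch. It is illustrative only and is not how Guava's RateLimiter works internally (the names are assumptions): tokens are refilled lazily inside tryAcquire() instead of by a background thread, the bucket starts empty, and its capacity equals the QPS threshold.

/**
 * Minimal token bucket sketch (illustrative only, names are assumptions).
 */
public class RateLimiterTokenBucket {
    /** Bucket capacity, equal to the rate limiting threshold */
    private final long capacity;
    /** Token refill interval in ms: 1000 / QPS */
    private final long refillIntervalMillis;
    /** Tokens currently in the bucket; starts at 0 (the warm-up described above) */
    private long tokens = 0;
    /** Last time tokens were refilled */
    private long lastRefillTime = System.currentTimeMillis();

    public RateLimiterTokenBucket(long qps) {
        this.capacity = qps;
        this.refillIntervalMillis = 1000 / qps;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Add the tokens produced since the last refill, capped at the capacity.
        long newTokens = (now - lastRefillTime) / refillIntervalMillis;
        if (newTokens > 0) {
            tokens = Math.min(capacity, tokens + newTokens);
            lastRefillTime += newTokens * refillIntervalMillis;
        }
        // A request proceeds only if it can take a token; otherwise it is rejected.
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }
}

Since up to a full bucket of tokens can accumulate while the service is idle, a burst of requests no larger than the capacity is admitted at once; this is exactly what distinguishes the token bucket from the leaky bucket.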
6.1. Code implementation
RateLimiter, the rate limiting utility class in Google's Guava library, is a token bucket implementation. We would not implement one by hand in day-to-day development, so let's test with RateLimiter directly.
Add the dependency:
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.0.1-jre</version>
</dependency>
A quick trial of RateLimiter:
// qps 2
RateLimiter rateLimiter = RateLimiter.create(2);
for (int i = 0; i < 10; i++) {
String time = LocalDateTime.now().format(DateTimeFormatter.ISO_LOCAL_TIME);
System.out.println(time + ":" + rateLimiter.tryAcquire());
Thread.sleep(250);
}
The code limits the QPS to 2, which means a token is generated every 500 ms, but the program tries to take a token every 250 ms, so only one out of every two attempts succeeds.
17:19:06.797557:true
17:19:07.061419:false
17:19:07.316283:true
17:19:07.5667446:false
17:19:07.817035:true
17:19:08.072483:false
17:19:08.3263447:true
17:19:08.577661:false
17:19:08.830252:true
17:19:09.085327:false
6.2. Further thoughts
Although we only demonstrated Guava's RateLimiter above, it is worth thinking about how the tokens get added. If tokens were added at a fixed interval, a thread would be needed to add them periodically; with many interfaces and therefore many RateLimiter instances, the number of threads would grow quickly, which is clearly not a good design. Google obviously considered this: RateLimiter instead calculates, each time a token is requested, whether enough tokens are available. It does so by storing the time at which the next token will be generated and, on each call, combining the gap between that time and the current time with the threshold to work out how many tokens have accumulated, then recording the next token generation time for the following call.
Below is an annotated look at the resync() method of SmoothRateLimiter, a subclass of Guava's RateLimiter, where the token calculation logic can be seen.
void resync(long nowMicros) { // current time in microseconds
    // Is the current time later than the time the next token would be generated?
    if (nowMicros > this.nextFreeTicketMicros) {
        // newPermits = (current time - next token generation time) / token generation interval.
        // With a QPS of 2, coolDownIntervalMicros() is 500000.0 microseconds.
        double newPermits = (double)(nowMicros - this.nextFreeTicketMicros) / this.coolDownIntervalMicros();
        // Update the stored token count storedPermits.
        this.storedPermits = Math.min(this.maxPermits, this.storedPermits + newPermits);
        // Update the next token generation time nextFreeTicketMicros.
        this.nextFreeTicketMicros = nowMicros;
    }
}
7. Distributed rate limiting with Redis
Redis is an open source in-memory data store that can be used as a database, cache, message broker, and more. Its commands are executed by a single thread and operate on memory, so it is very fast. Thanks to these properties, Redis makes it very convenient to build a rate limiter.
The following demonstrations are based on a Spring Boot project and require the following dependency.
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
Configure Redis information.
spring:
  redis:
    database: 0
    password:
    port: 6379
    host: 127.0.0.1
    lettuce:
      shutdown-timeout: 100ms
      pool:
        min-idle: 5
        max-idle: 10
        max-active: 8
        max-wait: 1ms
7.1. Fixed window rate limiting
Fixed window rate limiting in Redis is implemented with the INCR command, which is normally used to increment a counter. If we make the current timestamp part of the key, we can count the requests of each second and limit them accordingly.
There are two things to note here.
- When INCR is applied to a key that does not exist, the value after the first increment is always 1.
- The INCR and EXPIRE commands should be executed as a single atomic operation so that every key reliably gets an expiration time; otherwise keys that are never deleted will pile up and eventually exhaust memory.
Because transactions are relatively cumbersome to use in Redis, we achieve atomicity directly with a Lua script. Here is the Lua script.
local count = redis.call("incr",KEYS[1])
if count == 1 then
redis.call('expire',KEYS[1],ARGV[2])
end
if count > tonumber(ARGV[1]) then
return 0
end
return 1
Here is the test code that invokes the Lua script through RedisTemplate in Spring Boot.
/**
 * @author https://www.wdbyte.com
 */
@SpringBootTest
class RedisLuaLimiterByIncr {

    private static String KEY_PREFIX = "limiter_";
    private static String QPS = "4";
    private static String EXPIRE_TIME = "1";

    @Autowired
    private StringRedisTemplate stringRedisTemplate;

    @Test
    public void redisLuaLimiterTests() throws InterruptedException, IOException {
        for (int i = 0; i < 15; i++) {
            Thread.sleep(200);
            System.out.println(LocalTime.now() + " " + acquire("user1"));
        }
    }

    /**
     * Counter (fixed window) rate limiting
     *
     * @param key
     * @return true if the request is allowed
     */
    public boolean acquire(String key) {
        // The current second is appended to the key
        key = KEY_PREFIX + key + System.currentTimeMillis() / 1000;
        DefaultRedisScript<Long> redisScript = new DefaultRedisScript<>();
        redisScript.setResultType(Long.class);
        // The Lua file is stored in the resources directory
        redisScript.setScriptSource(new ResourceScriptSource(new ClassPathResource("limiter.lua")));
        return stringRedisTemplate.execute(redisScript, Arrays.asList(key), QPS, EXPIRE_TIME) == 1;
    }
}
Although the QPS limit in the code is 4, because this implementation uses the current second of the wall clock as the window key, it still suffers from the window-boundary problem. In the output below you can see that around a window change the effective QPS exceeds the limit of 4.
17:38:23.122044 true
17:38:23.695124 true
17:38:23.903220 true
# A time window change happens here, so the requests keep returning true
17:38:24.106206 true
17:38:24.313458 true
17:38:24.519431 true
17:38:24.724446 true
17:38:24.932387 false
17:38:25.137912 true
17:38:25.355595 true
17:38:25.558219 true
17:38:25.765801 true
17:38:25.969426 false
17:38:26.176220 true
17:38:26.381918 true
7.2. Sliding window rate limiting
The test of the INCR-based approach above has shown the problem that fixed windows bring. Part 3 of this article introduced the advantage of the sliding window: it greatly reduces the window-boundary problem. So how do we implement sliding window rate limiting with Redis?
Here a ZSET (sorted set) is used to implement the sliding window. A ZSET has the following characteristics:
- Entries in a ZSET are automatically ordered by score.
- Members in a ZSET cannot be duplicated.
- The size of a ZSET can be obtained conveniently with the ZCARD command.
- A range of entries can be removed conveniently with the ZREMRANGEBYSCORE command.
Based on these four characteristics, we can write a ZSET-based sliding window rate limiting Lua script.
-- KEYS[1]: the rate limiting key
-- ARGV[1]: current timestamp minus the time window size
-- ARGV[2]: current timestamp (used as the score)
-- ARGV[3]: threshold
-- ARGV[4]: unique member value stored with the score
-- 1. Remove the data that falls before the time window
redis.call('zremrangeByScore', KEYS[1], 0, ARGV[1])
-- 2. Count the current number of elements
local res = redis.call('zcard', KEYS[1])
-- 3. Check whether the threshold is exceeded
if (res == nil) or (res < tonumber(ARGV[3])) then
    redis.call('zadd', KEYS[1], ARGV[2], ARGV[4])
    return 1
else
    return 0
end
Here is the test code that invokes the Lua script through RedisTemplate in Spring Boot.
@SpringBootTest
class RedisLuaLimiterByZset {

    private String KEY_PREFIX = "limiter_";
    private String QPS = "4";

    @Autowired
    private StringRedisTemplate stringRedisTemplate;

    @Test
    public void redisLuaLimiterTests() throws InterruptedException, IOException {
        for (int i = 0; i < 15; i++) {
            Thread.sleep(200);
            System.out.println(LocalTime.now() + " " + acquire("user1"));
        }
    }

    /**
     * Sliding window rate limiting
     *
     * @param key
     * @return true if the request is allowed
     */
    public boolean acquire(String key) {
        long now = System.currentTimeMillis();
        key = KEY_PREFIX + key;
        String oldest = String.valueOf(now - 1_000);
        String score = String.valueOf(now);
        String scoreValue = score;
        DefaultRedisScript<Long> redisScript = new DefaultRedisScript<>();
        redisScript.setResultType(Long.class);
        // The Lua file is stored in the resources directory
        redisScript.setScriptSource(new ResourceScriptSource(new ClassPathResource("limiter2.lua")));
        return stringRedisTemplate.execute(redisScript, Arrays.asList(key), oldest, score, QPS, scoreValue) == 1;
    }
}
The QPS limit in the code is 4, and the output below is consistent with it.
17:36:37.150370 true
17:36:37.716341 true
17:36:37.922577 true
17:36:38.127497 true
17:36:38.335879 true
17:36:38.539225 false
17:36:38.745903 true
17:36:38.952491 true
17:36:39.159497 true
17:36:39.365239 true
17:36:39.570572 false
17:36:39.776635 true
17:36:39.982022 true
17:36:40.185614 true
17:36:40.389469 true
This section demonstrated two ways to implement rate limiting with Redis. The leaky bucket and token bucket algorithms can of course also be implemented on top of Redis; they are not demonstrated here, but interested readers can explore them on their own.
8. Summary
This article introduced several ways of implementing rate limiting, mainly the window algorithms and the bucket algorithms, each with its own strengths.
- The window algorithms are simple to implement with clear logic, and the current QPS can be read off directly, but they suffer from the window-boundary problem and, unlike the buckets, have no queue to buffer requests.
- The bucket algorithms are slightly more complex and do not expose the current QPS as directly, but they have their own advantages.
- In the leaky bucket model the consumption rate is constant, which protects the system and shapes the traffic, but it cannot respond quickly to traffic bursts.
- The token bucket can absorb burst traffic, but it ramps up slowly at startup, although common open source tools have optimized this.
Single-node and distributed rate limiting
The code-level window and bucket algorithms shown above are suitable for single-node rate limiting. If distributed rate limiting is needed, the per-instance threshold can be derived from the registry and the load balancer, for example by dividing the global threshold by the number of instances; this sacrifices some accuracy, but it is usable when precision requirements are not strict.
Because the rate limiting state lives on a single Redis instance shared by all callers, Redis-based rate limiting works naturally in a distributed setup. The algorithms above can all be implemented on top of Redis, or you can use open source tools such as Redisson, which already encapsulates Redis-based rate limiting.
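As an example of the off-the-shelf route, the snippet below sketches how Redisson's RRateLimiter can be used. It is a rough illustration based on Redisson's documented API rather than anything from the original article; the key name and rate values are made up, and since the limiter state is stored in Redis, every instance using the same key shares one global limit.

import org.redisson.Redisson;
import org.redisson.api.RRateLimiter;
import org.redisson.api.RateIntervalUnit;
import org.redisson.api.RateType;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonRateLimiterExample {
    public static void main(String[] args) throws InterruptedException {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        RRateLimiter rateLimiter = redisson.getRateLimiter("wdbyte:limiter");
        // RateType.OVERALL: the 4-per-second budget is shared by every client of this key.
        rateLimiter.trySetRate(RateType.OVERALL, 4, 1, RateIntervalUnit.SECONDS);

        for (int i = 0; i < 10; i++) {
            Thread.sleep(200);
            System.out.println(rateLimiter.tryAcquire(1));
        }
        redisson.shutdown();
    }
}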
Other rate limiting tools
Guava's rate limiting utility has already been covered in this article, but it is a single-node tool after all. The open source community also offers many distributed rate limiting tools; Alibaba's Sentinel, for example, is a good choice.
As always, the code for this article is at: github.com/niumoo/Java…
References
Redis INCR: redis.io/commands/in…
Rate Limiting Wikipedia: en.wikipedia.org/wiki/Rate_l…
SpringBoot Redis: www.cnblogs.com/lenve/p/109…
To subscribe to
You can search for the WeChat public account Program Ape Alang or visit the Unread Code blog to read more. This article is already included at Github.com/niumoo/Java… and a Star is welcome.