Introduction
When building high-concurrency systems, there are several ways to protect the system: caching, degradation, and rate limiting. Caching improves access speed and increases the system's processing capacity. Degradation means temporarily shutting off non-core services when system problems affect the core workflow. Rate limiting protects the system by capping the rate of concurrent access; once the limit is reached, the system can deny service (redirect to an error page or report that the resource is unavailable), queue or delay requests (e.g., flash sales, comments, orders), or degrade (return fallback or default data, such as showing a product detail page with inventory available by default).
Common rate-limiting approaches include:
- Limit the total number of concurrent transactions, such as database connection pools, thread pools
- Limit instantaneous concurrency, such as Nginx's limit_conn module
- Limit the average rate within a time window, such as Guava’s RateLimiter, Nginx’s limit_req module
- Limit the remote interface call rate
- Limit the consumption rate of MQ
In terms of application scenarios, rate limiting can be classified into application-level rate limiting, distributed rate limiting, and access-layer rate limiting.
Rate-limiting algorithms
Common rate-limiting algorithms: token bucket, leaky bucket, and counter (a coarse form of rate limiting).
Token bucket algorithm
The token bucket algorithm maintains a bucket that holds a fixed capacity of tokens and adds tokens to it at a fixed rate. The algorithm is described as follows (a minimal sketch in Java follows the list):
- Assuming a limit of 2 r/s, one token is added to the bucket every 500 ms
- The bucket holds at most b tokens; when the bucket is full, newly added tokens are discarded or rejected
- When a packet of n bytes arrives, n tokens are removed from the bucket and the packet is sent to the network
- If fewer than n tokens remain in the bucket, no tokens are removed and the packet is throttled (either discarded or left waiting in a buffer)
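To make the algorithm concrete, here is a minimal single-process token bucket sketch in Java; the TokenBucket class and its parameters are illustrative, not taken from any library:

class TokenBucket {
    private final long capacity;      // maximum number of tokens the bucket can hold (b)
    private final double refillRate;  // tokens added per second
    private double tokens;            // current number of tokens
    private long lastRefillNanos;     // time of the last refill

    TokenBucket(long capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    synchronized boolean tryConsume(int n) {
        refill();
        if (tokens >= n) {   // enough tokens: remove n and let the packet through
            tokens -= n;
            return true;
        }
        return false;        // fewer than n tokens: throttle the packet
    }

    private void refill() {
        long now = System.nanoTime();
        // add tokens for the elapsed time; tokens beyond the capacity are discarded
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) / 1e9 * refillRate);
        lastRefillNanos = now;
    }
}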
Leaky bucket algorithm
Leaky buckets can be used for traffic shaping and flow control. The algorithm is described as follows (a minimal sketch follows the list):
- A leaky bucket has a fixed capacity and drains water droplets at a constant rate
- If the bucket is empty, no water flows out
- Water droplets can flow into the leaky bucket at any rate
- If the incoming water droplets exceed the bucket's capacity, they overflow (are discarded), while the bucket's capacity remains constant
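For comparison, a minimal leaky bucket sketch under the same illustrative assumptions; the bucket drains at a constant rate, and droplets that would overflow are discarded:

class LeakyBucket {
    private final long capacity;   // fixed capacity of the bucket
    private final double leakRate; // water drained per second
    private double water;          // water currently in the bucket
    private long lastLeakNanos;    // time of the last drain

    LeakyBucket(long capacity, double leakRate) {
        this.capacity = capacity;
        this.leakRate = leakRate;
        this.lastLeakNanos = System.nanoTime();
    }

    synchronized boolean tryAdd(int n) {
        leak();
        if (water + n <= capacity) { // room left: the droplets flow in
            water += n;
            return true;
        }
        return false;                // would overflow: discard the droplets
    }

    private void leak() {
        long now = System.nanoTime();
        // drain at a constant rate; if the bucket is empty nothing flows out
        water = Math.max(0, water - (now - lastLeakNanos) / 1e9 * leakRate);
        lastLeakNanos = now;
    }
}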
Counter
Sometimes we use a counter for rate limiting, mainly to cap the total number of concurrent requests, such as the size of a database connection pool, the size of a thread pool, or the number of concurrent flash-sale requests. Rate limiting kicks in when the global total, or the total within a given period, reaches a specified threshold. It is a simple, coarse method based on the total number of requests rather than an average rate.
These are the basic rate-limiting algorithms. The main differences between the token bucket and the leaky bucket are:
- The token bucket adds tokens to the bucket at a constant rate, while the leaky bucket drains requests at a constant rate.
- The token bucket allows a degree of burst (requests can be handled as long as tokens remain), while the main purpose of the leaky bucket is to smooth the outflow rate.
Application-level rate limiting
Limiting the total number of concurrent connections/requests
Every application system has a TPS/QPS threshold. If the threshold is exceeded, the system stops responding to user requests or responds very slowly, so it is best to add overload protection to prevent a flood of requests from overwhelming the system.
For example, Tomcat's Connector configuration includes the following parameters:
acceptCount: if all Tomcat threads are busy responding, new connections are queued; connections beyond the queue size are rejected
maxConnections: the maximum number of simultaneous connections; connections beyond this number are queued
maxThreads: the maximum number of threads Tomcat can start to process requests; if the number of in-flight requests consistently exceeds it, responses slow down or the server may even hang.
In addition, MySQL and Redis have similar configurations to limit the number of connections.
Limiting the total number of resources
Pooling techniques can be used to limit the total number of resources, such as connection pools and thread pools; beyond the pool size, callers can wait or an exception can be thrown, as in the sketch below.
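For instance, a bounded thread pool rejects work once its threads and queue are both full; the sizes here are illustrative:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ThreadPoolExecutor pool = new ThreadPoolExecutor(
        10, 10,                                // a fixed pool of 10 worker threads
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(100),         // at most 100 tasks may wait in the queue
        new ThreadPoolExecutor.AbortPolicy()); // beyond that, submit() throws RejectedExecutionException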
Limiting the total number of concurrent requests on an interface
When an interface may collapse under sudden access or heavy traffic, the total number of concurrent requests on it must be limited.
You can use AtomicLong or Semaphore in Java for this. A sketch with AtomicLong:
try {
    if (atomic.incrementAndGet() > limit) {
        // reject the request
        return;
    }
    // process the request
} finally {
    atomic.decrementAndGet();
}
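The Semaphore variant mentioned above is similar; a minimal sketch (the limit of 100 permits is illustrative):

import java.util.concurrent.Semaphore;

Semaphore permits = new Semaphore(100); // at most 100 requests in flight
if (permits.tryAcquire()) {             // non-blocking: fails fast once the limit is reached
    try {
        // process the request
    } finally {
        permits.release();
    }
} else {
    // reject the request
}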
This suits services that can be degraded or that need overload protection, such as flash-sale services (excess requests can be queued away or simply told the item is out of stock). Open platforms that cap the number of trial requests a user can make against an interface can also use this approach. This style of rate limiting is simple and blunt, with no smoothing, so apply it according to the actual situation.
Limiting the number of requests in a time window on an interface
To limit the number of requests per second/minute/day on an interface, you can use Guava's Cache to store counters, with an expiration time of 2 s (to ensure the count for the current 1 s window is retained). Use the current timestamp, truncated to seconds, as the key for counting and limiting:
LoadingCache<Long, AtomicLong> counter = CacheBuilder.newBuilder()
        .expireAfterWrite(2, TimeUnit.SECONDS)
        .build(new CacheLoader<Long, AtomicLong>() {
            @Override
            public AtomicLong load(Long key) throws Exception {
                return new AtomicLong(0);
            }
        });
long limit = 1000;
while (true) {
    long currentSeconds = System.currentTimeMillis() / 1000;
    if (counter.get(currentSeconds).incrementAndGet() > limit) {
        System.out.println("Rate limited, currentSeconds: " + currentSeconds);
        continue;
    }
    // business processing
}
This approach is also crude and simple, with no smoothing, but it is sufficient for scenarios with per-second or per-minute limits. If a basic service is called by many other systems and the volume of requests is too high, its calls must be rate limited per second/minute.
Smoothing the request rate on an interface
The rate-limiting methods described above can be problematic because they admit instantaneous bursts. In some scenarios, burst requests need to be reshaped into average-rate requests. The token bucket and leaky bucket algorithms introduced earlier meet this requirement.
Guava's RateLimiter provides a token bucket implementation with two flavors: SmoothBursty and SmoothWarmingUp.
SmoothBursty
Burst requests are allowed and subsequent requests are shaped to a fixed rate.
// The bucket capacity is 5 and five tokens are added every second (one token every 200 ms)
RateLimiter limiter = RateLimiter.create(5);
// acquire() consumes one token (more than one can be consumed at a time)
// and returns the time, in seconds, spent waiting for it
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
The result is as follows:
0.0
0.194412
0.196526
0.19883
0.197466
0.19534
The first call gets a token immediately, and each subsequent call waits about 200 ms for its token, so the output reflects a fixed request rate. limiter.acquire() above takes one token at a time; of course, we can take more than one at a time, for example five at once with limiter.acquire(5).
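For example, a sketch showing that acquiring several tokens at once never delays the current request; the cost is charged to the following one (printed values are approximate):

RateLimiter limiter = RateLimiter.create(5);
System.out.println(limiter.acquire(5)); // prints 0.0: the 5-token burst is served immediately
System.out.println(limiter.acquire());  // prints ~1.0: the next request pays for that burst
System.out.println(limiter.acquire());  // prints ~0.2: back to the steady 200 ms pace

RateLimiter also offers a tryAcquire(int permits) overload that returns immediately instead of waiting: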
/**
 * Acquires permits from this {@link RateLimiter} if it can be acquired immediately without
 * delay.
 *
 * <p>This method is equivalent to {@code tryAcquire(permits, 0, anyUnit)}.
 *
 * @param permits the number of permits to acquire
 * @return {@code true} if the permits were acquired, {@code false} otherwise
 * @throws IllegalArgumentException if the requested number of permits is negative or zero
 * @since 14.0
 */
public boolean tryAcquire(int permits) {
  return tryAcquire(permits, 0, MICROSECONDS);
}
Let's look at an example with a sleep in the middle of a burst:
RateLimiter limiter = RateLimiter.create(2);
System.out.println(limiter.acquire());
Thread.sleep(2000L);
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
The execution result is as follows:
0.0
0.0
0.0
0.0
0.499609
0.491548
While the thread sleeps, tokens keep being generated and are stored in the bucket. Since the bucket capacity is 2, at most 2 tokens accumulate during the 2 s sleep. The stored tokens allow subsequent requests to burst.
One might be confused by what RateLimiter.create(2) means: the 2 is the number of permits generated per second, but why is the bucket capacity also 2? That is because SmoothBursty has a maximum-burst-seconds parameter (maxBurstSeconds), and bucket capacity = rate × maximum burst seconds (permitsPerSecond * maxBurstSeconds). maxBurstSeconds defaults to 1, so RateLimiter.create(2) yields a bucket capacity of 2.
static RateLimiter create(double permitsPerSecond, SleepingStopwatch stopwatch) {
  RateLimiter rateLimiter = new SmoothBursty(stopwatch, 1.0 /* maxBurstSeconds */);
  rateLimiter.setRate(permitsPerSecond);
  return rateLimiter;
}
SmoothBursty calculates the time of the next token grant from the average rate and the time of the last grant, and needs the bucket to hold tokens that have gone unused for a while (the number of tokens available for a burst). RateLimiter also provides tryAcquire() methods for non-blocking or timed token consumption:
/**
 * Acquires a permit from this {@link RateLimiter} if it can be acquired immediately without
 * delay.
 *
 * <p>This method is equivalent to {@code tryAcquire(1)}.
 *
 * @return {@code true} if the permit was acquired, {@code false} otherwise
 * @since 14.0
 */
public boolean tryAcquire() {
  return tryAcquire(1, 0, MICROSECONDS);
}

/**
 * Acquires a permit from this {@code RateLimiter} if it can be obtained without exceeding the
 * specified {@code timeout}, or returns {@code false} immediately (without waiting) if the permit
 * would not have been granted before the timeout expired.
 *
 * <p>This method is equivalent to {@code tryAcquire(1, timeout, unit)}.
 *
 * @param timeout the maximum time to wait for the permit. Negative values are treated as zero.
 * @param unit the time unit of the timeout argument
 * @return {@code true} if the permit was acquired, {@code false} otherwise
 * @throws IllegalArgumentException if the requested number of permits is negative or zero
 */
public boolean tryAcquire(long timeout, TimeUnit unit) {
  return tryAcquire(1, timeout, unit);
}
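A typical non-blocking usage sketch (the 100 ms timeout and the handling branches are illustrative):

if (limiter.tryAcquire(100, TimeUnit.MILLISECONDS)) {
    // obtained a permit within 100 ms: handle the request
} else {
    // no permit in time: degrade or reject instead of queueing up
}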
SmoothBursty allows a certain amount of burst, but some scenarios cannot tolerate that: right after a cold start, the system may not be able to keep up. What is needed is a limiter whose rate starts low and gradually approaches the configured fixed rate as the system warms up. RateLimiter provides SmoothWarmingUp to fulfill this requirement.
SmoothWarmingUp
Its factory method is RateLimiter.create(double permitsPerSecond, long warmupPeriod, TimeUnit unit):
// permitsPerSecond: the number of new permits per second
// warmupPeriod: the time it takes to go from the cold-start rate to the average rate
public static RateLimiter create(double permitsPerSecond, long warmupPeriod, TimeUnit unit) {
  checkArgument(warmupPeriod >= 0, "warmupPeriod must not be negative: %s", warmupPeriod);
  return create(permitsPerSecond, warmupPeriod, unit, 3.0, SleepingStopwatch.createFromSystemTimer());
}
Let’s look at an example:
RateLimiter limiter = RateLimiter.create(5, 1000, TimeUnit.MILLISECONDS);
for (int i = 0; i < 5; i++){
System.out.println(limiter.acquire());
}
Thread.sleep(1000L);
for (int i = 0; i < 5; i++){
System.out.println(limiter.acquire());
}
The result is as follows:
0.0
0.516492
0.354163
0.219246
0.195781
0.0
0.366611
0.222908
0.196205
0.196275
The execution result shows the wait times starting relatively large and gradually approaching the average rate (200 ms per token at 5 tokens per second). We will stop there.
Conclusion
This article mainly introduced rate-limiting algorithms and application-level rate limiting. However, if an application is deployed on multiple machines, application-level limiting applies only to a single instance, not to the application as a whole. Distributed rate limiting and access-layer rate limiting are needed to limit traffic across the whole system.
Reference: Core Technology of Website Architecture with 100 Million Traffic