This article is from the official account Java Universe (WeChat: Javagogo)

Original link: mp.weixin.qq.com/s/z6LNroSuY…

By Bing Yue

For business systems, high concurrency means supporting massive user requests, with QPS that can be hundreds of times higher than normal.

If high concurrency is not planned for, a business system that normally runs well can exhibit all kinds of strange problems once the concurrency rises. In e-commerce, for example, this can mean lost orders, incorrect inventory deductions, and overselling.

Traffic limiting is a means of service degradation. As the name suggests, it protects the system by limiting the amount of traffic it accepts.

To configure traffic limiting properly, you need to know the throughput of the system, so traffic limiting is carried out together with capacity planning and load testing.

When external requests approach or reach the system's maximum threshold, traffic limiting is triggered and further measures are taken to degrade the service and keep the system from being overwhelmed. Common degradation strategies include delayed processing, denial of service, and random rejection.

This policy is similar to the thread pool in Java concurrent programming. As we all know, a thread pool can be configured with different rejection policies for when its task queue is full. For example:

  • AbortPolicy, which discards the task and throws an exception

  • DiscardPolicy, discards the task without throwing an exception

  • DiscardOldestPolicy, which discards the oldest queued task, and others (see the sketch after this list)
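For illustration, here is a minimal sketch of plugging one of these policies into a ThreadPoolExecutor; the pool sizes and queue length are arbitrary values chosen for the example:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionPolicyDemo {
    public static void main(String[] args) {
        // 2 core threads, 4 max, and a queue of 10. Once all threads are busy
        // and the queue is full, AbortPolicy makes execute() throw
        // RejectedExecutionException; swap in DiscardPolicy or
        // DiscardOldestPolicy to change that behavior.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(10),
                new ThreadPoolExecutor.AbortPolicy());

        pool.execute(() -> System.out.println("task running"));
        pool.shutdown();
    }
}
```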

The Java thread pool is a small feature in day-to-day development, but the same ideas extend to system design and architecture, letting you transfer and reuse knowledge sensibly.

A key point in any traffic limiting scheme is how to judge whether the current traffic has reached the configured maximum. There are different implementation strategies, and a simple analysis of each follows.

1. Counter method

Generally speaking, we limit traffic by the number of requests per unit time, known as QPS. The most straightforward way to count QPS is to implement a counter.

The counter method is the simplest of the traffic limiting algorithms. Suppose an interface limits the number of accesses within 100 seconds to no more than 10,000. We maintain a counter, and each new request increments it by 1.

At this point:

  • If the counter value is below the limit and less than 100 seconds have elapsed since the current window started, the request is allowed to pass; otherwise it is rejected
  • If the interval has elapsed, the counter is reset to zero

The following code uses AtomicInteger as a counter for reference:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterLimiter {

    private static long startTime = System.currentTimeMillis();
    private static final int interval = 10000; // window size in milliseconds
    private static final int limit = 100;      // max requests per window
    private static AtomicInteger requestCount = new AtomicInteger(0);

    public static boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now < startTime + interval) {
            // Still inside the current window: admit only while under the limit.
            if (requestCount.get() < limit) {
                requestCount.incrementAndGet();
                return true;
            }
            return false;
        } else {
            // The window has elapsed: start a new window and count this request.
            requestCount = new AtomicInteger(1);
            startTime = now;
            return true;
        }
    }
}
```

The counter policy can be extended from a single node to a cluster, which makes it suitable for distributed environments.

Single-node traffic limiting keeps the counter in local memory. To extend it to a cluster, use a separate storage node, such as Redis or Memcached, to hold the cluster-wide count, setting an expiration time equal to the window interval.
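As a sketch of that pattern, a cluster-wide counter can be kept in Redis with an atomic INCR plus a window-length expiry. The example below assumes the Jedis client; the key name, window, and limit are illustrative, and the small race between INCR and EXPIRE is tolerated here (a Lua script would make the pair fully atomic):

```java
import redis.clients.jedis.Jedis;

public class RedisCounterLimiter {

    private static final String KEY = "rate:order";  // illustrative key name
    private static final int WINDOW_SECONDS = 100;   // window length
    private static final long LIMIT = 10_000;        // max requests per window

    private final Jedis jedis = new Jedis("localhost", 6379);

    public boolean tryAcquire() {
        long count = jedis.incr(KEY);          // atomic, shared by every node
        if (count == 1) {
            // First request of a new window: start the expiry clock.
            jedis.expire(KEY, WINDOW_SECONDS);
        }
        return count <= LIMIT;
    }
}
```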

The counter strategy has one major disadvantage: it handles traffic around window boundaries poorly, so the limiting is not smooth.

Imagine a scenario where we limit users to placing no more than 100,000 orders per minute, and 100,000 requests arrive in the last second of one window followed by another 100,000 in the first second of the next. In other words, within the two seconds around the window switch, the system receives 200,000 order requests, which may exceed the system threshold and affect service stability.

An optimization of the counter algorithm that avoids this double-window burst is the sliding window algorithm; interested readers can explore it further, and a minimal sketch follows below.
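This is a minimal sliding-window sketch under the assumption that the window divides evenly into a fixed number of slots; all class and field names here are illustrative. The limiter sums the requests of every slot that still falls inside the window, so a burst cannot double up across a boundary:

```java
public class SlidingWindowLimiter {

    private final int limit;           // max requests per window
    private final int buckets;         // number of slots in the ring
    private final long bucketMillis;   // time span of one slot
    private final int[] counts;        // requests counted per slot
    private final long[] slotStart;    // start time of each slot's current use

    public SlidingWindowLimiter(int limit, long windowMillis, int buckets) {
        this.limit = limit;
        this.buckets = buckets;
        this.bucketMillis = windowMillis / buckets;
        this.counts = new int[buckets];
        this.slotStart = new long[buckets];
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        int idx = (int) ((now / bucketMillis) % buckets);
        long currentSlotStart = now - (now % bucketMillis);
        // The ring reuses slots: clear this one if it holds an older rotation.
        if (slotStart[idx] != currentSlotStart) {
            slotStart[idx] = currentSlotStart;
            counts[idx] = 0;
        }
        // Sum only the slots that are still inside the sliding window.
        long windowMillis = bucketMillis * buckets;
        int total = 0;
        for (int i = 0; i < buckets; i++) {
            if (now - slotStart[i] < windowMillis) {
                total += counts[i];
            }
        }
        if (total < limit) {
            counts[idx]++;
            return true;
        }
        return false;
    }
}
```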

2. Leaky bucket and token bucket algorithms

The leaky bucket and token bucket algorithms are more widely used in practice and are often compared with each other.

The leaky bucket algorithm works just like its namesake. Imagine a bucket of fixed capacity with a hole drilled in the bottom: water leaks out at a constant rate, and by controlling the leak rate we control how quickly requests are processed, which achieves traffic limiting.

The rejection strategy of the leaky bucket algorithm is simple: requests arriving faster than the leak rate accumulate in the bucket until it overflows, and the system simply ignores the overflow traffic.

The leaky bucket algorithm limits the request rate at the exit and does not suffer from the boundary problem of the counter method above; the request curve is always smooth.
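A minimal leaky-bucket sketch with illustrative names and parameters: `water` models the requests queued in the bucket, and it drains continuously over elapsed time rather than via a background thread:

```java
public class LeakyBucketLimiter {

    private final long capacity;           // bucket size
    private final double leakRatePerMs;    // how fast queued requests drain
    private double water = 0;              // current amount in the bucket
    private long lastLeakTime = System.currentTimeMillis();

    public LeakyBucketLimiter(long capacity, double leakRatePerSecond) {
        this.capacity = capacity;
        this.leakRatePerMs = leakRatePerSecond / 1000.0;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Drain the bucket for the time elapsed since the last request.
        water = Math.max(0, water - (now - lastLeakTime) * leakRatePerMs);
        lastLeakTime = now;
        if (water < capacity) {
            water += 1;   // the new request joins the queue in the bucket
            return true;
        }
        return false;     // bucket is full: the request overflows and is dropped
    }
}
```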

One of its core problems is that it filters requests too strictly. As the saying goes, "water that is too clear holds no fish," and the same applies to traffic limiting: if we limit orders to 100,000 per second, must the 100,001st request really be turned away?

In most business scenarios, the answer is no. Even with limiting in place, the system is still expected to absorb a certain amount of burst traffic. This is where the token bucket algorithm comes in.

In the token bucket algorithm, assume we have a bucket of fixed size, with a capacity related to the configured threshold. Tokens are added to the bucket at a fixed rate; once the bucket is full, newly added tokens are simply discarded, so the number of tokens stored never exceeds the bucket size. When a request arrives, it tries to take a token from the bucket; if the bucket is empty, the request is rejected.
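A minimal token-bucket sketch with illustrative names and parameters; as in the leaky bucket sketch above, tokens are refilled lazily from elapsed time instead of by a background thread:

```java
public class TokenBucketLimiter {

    private final long capacity;        // max tokens the bucket can hold
    private final double refillPerMs;   // token refill rate
    private double tokens;              // tokens currently in the bucket
    private long lastRefillTime = System.currentTimeMillis();

    public TokenBucketLimiter(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerMs = refillPerSecond / 1000.0;
        this.tokens = capacity;         // start with a full bucket
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Refill tokens for the elapsed time, never exceeding capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefillTime) * refillPerMs);
        lastRefillTime = now;
        if (tokens >= 1) {
            tokens -= 1;                // consume one token for this request
            return true;
        }
        return false;                   // bucket is empty: reject the request
    }
}
```

Note how a full bucket lets a burst of up to `capacity` requests pass at once; this is exactly the controlled burstiness that the leaky bucket forbids.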

Have you ever used Google's Guava open source toolkit? Guava's RateLimiter utility class is very convenient for traffic limiting, and it is implemented on the token bucket algorithm.

RateLimiter adds tokens to the bucket at a fixed rate, and a thread must obtain a token before proceeding. Its API can be used directly, mainly through acquire and tryAcquire.

acquire blocks until a permit is available, while tryAcquire is non-blocking.

Here is a simple example:

```java
import com.google.common.util.concurrent.RateLimiter;

public class LimiterTest {

    public static void main(String[] args) {
        // permitsPerSecond = 100: at most 100 permits are issued per second
        RateLimiter limiter = RateLimiter.create(100);
        for (int i = 1; i < 200; i++) {
            if (limiter.tryAcquire(1)) {
                System.out.println("Request " + i + " succeeded");
            } else {
                System.out.println("Request " + i + " was rate limited");
            }
        }
    }
}
```

3. Comparison of different traffic limiting algorithms

The counter algorithm is relatively simple to implement and is especially suitable for clusters, but the window boundary case must be considered; the sliding window strategy can be applied as an optimization, depending on the specific traffic limiting scenario.

As for the leaky bucket and token bucket algorithms: the leaky bucket provides strict traffic limiting, while the token bucket additionally allows a certain amount of burst traffic. In real development we rarely need to control traffic that precisely, so the token bucket algorithm is used more often.

If we set the traffic peak to permitsPerSecond = N requests per second, the counter algorithm can let through up to 2N around a window boundary, the leaky bucket algorithm always limits throughput to N, and the token bucket algorithm allows bursts above N, though not as high as 2N.

