**This article was originally published on my public account [Craig Wuji]. I hope you'll follow me, and one day we'll become Coding Kings!**

Preface

Let's start with a classic bit of dialogue.

Erkang: Crape Myrtle, I want to chase you to the ends of the earth.

Crape Myrtle: Right now I feel excited and scared, happy and blissful; I'm just worried…

Erkang: Don't worry. Let's just enjoy this moment. It's a priceless opportunity.

Crape Myrtle: It's just that it has been a long, long time since your last embrace.

Erkang: I like you, so very much.

Crape Myrtle: Me too.

Erkang: What did you say? I didn't hear you!

Crape Myrtle: Me too, me too, me too! I have as much as you have! No, no, I have more than you.

Erkang: You can't have more than me, because I'm already full!

Crape Myrtle: If you are full, then I will overflow!

Without rate limiting, if every user could send as many requests as they wanted, the resulting traffic "spikes" would leave our system just as "excited and scared."

Rate limiting refers to a family of algorithms that restrict how frequently each user can call an API, preventing the API from being overused. This article introduces several classic rate-limiting algorithms; I believe you'll gain something if you read it through patiently. Writing these words isn't easy, so don't forget to "like" and "share"!

  • Counter algorithm

  • Sliding time window algorithm

  • Leaky bucket algorithm

  • Token bucket algorithm

Main Text

01 Counter Algorithm

The counter algorithm is the simplest and crudest to implement.

We can directly limit the number of requests received per second with a counter. For example, suppose the QPS limit is set to 1000. The idea is: start timing from the first request; for each request within the next 1 s, increment the count by one; once the cumulative count reaches 1000, reject all subsequent requests. When the 1 s window ends, reset the count to 0 and start counting again.

Implementing the counter in Java is straightforward, but because requests arrive concurrently, thread safety is a concern. You can use an atomic class such as AtomicLong, whose incrementAndGet method is made thread-safe via CAS (Compare And Swap). Each time a request arrives, increment the counter by one and compare it against the threshold.
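Below is a minimal sketch of this idea, assuming a fixed window that is reset once it expires. The names CounterRateLimiter and tryAcquire are my own illustrative choices, not from the original article:

```java
import java.util.concurrent.atomic.AtomicLong;

// A rough sketch of the counter algorithm described above.
public class CounterRateLimiter {

    private final long limit;         // max requests per window, e.g. 1000
    private final long windowMillis;  // window length, e.g. 1000 ms
    private final AtomicLong counter = new AtomicLong(0);
    private volatile long windowStart = System.currentTimeMillis();

    public CounterRateLimiter(long limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            synchronized (this) {
                // Re-check under the lock so only one thread resets the window.
                if (now - windowStart >= windowMillis) {
                    windowStart = now;
                    counter.set(0);
                }
            }
        }
        // incrementAndGet is thread-safe via CAS; reject once over the limit.
        return counter.incrementAndGet() <= limit;
    }
}
```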

Why do we say the counter implementation is simple and crude?

If all 1,000 requests arrive in the first half of the second, then for the remaining half second every request can only be rejected. We call this the "spike phenomenon."

02 Sliding Time Window Algorithm

To address the spike phenomenon caused by instantaneous bursts of traffic, the sliding time window algorithm came into being.

This algorithm divides time into small segments, similar to the CPU's time-slice design. After each segment elapses, the time window slides one slot to the right, and each segment has its own counter.

When calculating the total number of requests in the whole time window, we sum the counters of all the segments it covers. The finer the segments, the more smoothly the window slides and the more accurate the rate limiting becomes.
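Here is a minimal sketch of such a sliding window, assuming the window is split into equal slots held in a ring of counters. All names are illustrative, and the window length is assumed to be evenly divisible by the slot count:

```java
// A rough sketch of a sliding time window limiter.
public class SlidingWindowRateLimiter {

    private final long limit;        // max requests per full window
    private final int slotCount;     // number of segments, e.g. 10
    private final long slotMillis;   // length of one segment
    private final long[] counts;     // one counter per segment
    private final long[] slotStarts; // start time of the data in each slot

    public SlidingWindowRateLimiter(long limit, int slotCount, long windowMillis) {
        this.limit = limit;
        this.slotCount = slotCount;
        this.slotMillis = windowMillis / slotCount; // assumes divisibility
        this.counts = new long[slotCount];
        this.slotStarts = new long[slotCount];
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        int idx = (int) ((now / slotMillis) % slotCount);
        long slotStart = now - now % slotMillis;
        if (slotStarts[idx] != slotStart) {
            // The slot still holds data from an older revolution: reset it.
            slotStarts[idx] = slotStart;
            counts[idx] = 0;
        }
        long total = 0;
        for (int i = 0; i < slotCount; i++) {
            // Sum only the slots whose data still falls inside the window.
            if (now - slotStarts[i] < slotMillis * slotCount) {
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false; // window is full, reject
        }
        counts[idx]++;
        return true;
    }
}
```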

A sliding window is also commonly used in TCP; I'll cover that when we get to networking.

03 Leaky Bucket Algorithm

As the name suggests, for the leaky bucket algorithm we imagine a leaky bucket, or perhaps a funnel.

When requests come in, it's like water pouring into a funnel: wide at the top, narrow at the bottom. Water accumulates in the funnel and slowly flows out through the small opening at the bottom. No matter how heavy the inflow above, the outflow rate below stays the same.

The leaky bucket algorithm lets the server ignore how fast callers send and receive requests at a constant rate, for example processing one request every 10 milliseconds.

However, the leaky bucket algorithm also has a flaw: the bucket's capacity is always finite. If requests pour in too fast, they truly echo Crape Myrtle's line: "then I will overflow!" Once the bucket is full, incoming requests are discarded.

The algorithm can be implemented in many ways. In Java, for example, a queue can serve as the leaky bucket: requests are put into the queue when they arrive, then periodically taken from the queue and executed through a thread pool, either one at a time or several at once in parallel.
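A minimal sketch of that queue-based approach, assuming a scheduled thread that "leaks" one request per tick to a worker pool (all names and parameter values are illustrative):

```java
import java.util.concurrent.*;

// A rough sketch of the queue-based leaky bucket described above.
public class LeakyBucketRateLimiter {

    private final BlockingQueue<Runnable> bucket;            // the leaky bucket
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final ScheduledExecutorService leaker =
            Executors.newSingleThreadScheduledExecutor();

    public LeakyBucketRateLimiter(int capacity, long leakIntervalMillis) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
        // Leak at a constant rate: take one request per tick, e.g. every 10 ms.
        leaker.scheduleAtFixedRate(() -> {
            Runnable request = bucket.poll();
            if (request != null) {
                workers.submit(request);
            }
        }, leakIntervalMillis, leakIntervalMillis, TimeUnit.MILLISECONDS);
    }

    // Returns false when the bucket is full and the request is discarded.
    public boolean submit(Runnable request) {
        return bucket.offer(request);
    }
}
```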

04 Token Bucket Algorithm

The token bucket algorithm can be seen as an upgrade of the leaky bucket algorithm, which controls the receiving rate through the size of its outlet. On top of this, the token bucket further improves the handling of burst traffic.

The token bucket algorithm also has a bucket, but it stores a number of tokens, which are generated at a fixed rate and placed into the bucket.

Each arriving request must first obtain a token; only after getting one may it call the server. If no token is available, the request either waits or is rejected.

Tokens are generated and placed into the bucket continuously. If requests are sparse and tokens are consumed more slowly than they are produced, tokens pile up in the bucket. As long as the bucket holds available tokens, a sudden burst of requests can each grab a token and execute immediately, which is how the algorithm absorbs bursts.

The token bucket can be implemented with a queue that stores the tokens: a scheduled thread periodically produces tokens into the queue, and each incoming request takes one token from the queue before it proceeds.
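A minimal sketch of this queue-based token bucket, again with illustrative names, might look like this:

```java
import java.util.concurrent.*;

// A rough sketch of the queue-based token bucket described above.
public class TokenBucketRateLimiter {

    private static final Object TOKEN = new Object();
    private final BlockingQueue<Object> tokens;
    private final ScheduledExecutorService producer =
            Executors.newSingleThreadScheduledExecutor();

    public TokenBucketRateLimiter(int capacity, long tokenIntervalMillis) {
        this.tokens = new ArrayBlockingQueue<>(capacity);
        // Produce one token at a fixed rate; offer() silently drops the
        // new token when the bucket is already full.
        producer.scheduleAtFixedRate(
                () -> tokens.offer(TOKEN),
                tokenIntervalMillis, tokenIntervalMillis, TimeUnit.MILLISECONDS);
    }

    // Non-blocking: returns true only if a token was available (reject otherwise).
    public boolean tryAcquire() {
        return tokens.poll() != null;
    }

    // Blocking: waits until a token becomes available.
    public void acquire() throws InterruptedException {
        tokens.take();
    }
}
```

For real projects, you may prefer a battle-tested implementation such as Guava's RateLimiter, which is built on the token bucket idea, rather than rolling your own.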

05 Summary

In this article we introduced several classic rate-limiting algorithms. Note, however, that the discussion was limited to a single machine; in a clustered, distributed environment the implementation ideas and designs of these algorithms change somewhat, and I will introduce those in a later article.

In my opinion, though, no algorithm is inherently superior or inferior; what matters most is finding the one that fits your scenario.

Welcome to follow my public account [Craig Wuji].