Back-end service interfaces have capacity limits. If the incoming QPS or concurrency exceeds what a service can handle, the application may be overwhelmed or even crash. Interface calls are therefore typically protected by traffic limiting, to prevent system failures caused by unexpected request spikes.

Generally speaking, traffic limiting is classified into two types: concurrent traffic limiting and QPS traffic limiting. Concurrent traffic limiting caps the maximum number of requests being handled at the same time, while QPS traffic limiting caps the number of requests within a period of time.

In terms of scope, traffic limiting can be applied to a single machine or to a whole cluster (distributed traffic limiting). The ideas are the same; only the scope differs. This article analyzes single-machine traffic limiting.

Next we look at concurrent traffic limiting and QPS traffic limiting.

Concurrent traffic limiting

Concurrent traffic limiting caps the number of requests in flight at the same time, so setting thread safety aside for a moment, we can implement it with a simple int variable. The pseudocode is as follows:

int maxRequest = 100;
int nowRequest = 0;

public void request() {
    if (nowRequest >= maxRequest) {
        return;
    }
    nowRequest++;
    // Call the interface
    try {
        invokeXXX();
    } finally {
        nowRequest--;
    }
}

Obviously, the above implementation has thread-safety issues; the most direct fix is to lock:

int maxRequest = 100;
int nowRequest = 0;

public void request() {
    if (nowRequest >= maxRequest) {
        return;
    }
    synchronized (this) {
        if (nowRequest >= maxRequest) {
            return;
        }
        nowRequest++;
    }
    // Call the interface
    try {
        invokeXXX();
    } finally {
        synchronized (this) {
            nowRequest--;
        }
    }
}

AtomicInteger can also be used:

int maxRequest = 100;
AtomicInteger nowRequest = new AtomicInteger(0);

public void request() {
    for (;;) {
        int currentReq = nowRequest.get();
        if (currentReq >= maxRequest) {
            return;
        }
        if (nowRequest.compareAndSet(currentReq, currentReq + 1)) {
            break;
        }
    }
    // Call the interface
    try {
        invokeXXX();
    } finally {
        nowRequest.decrementAndGet();
    }
}

Those of you familiar with the JDK concurrency package (java.util.concurrent) will ask why we bother: this is exactly what Semaphore does. That's right; in fact the easiest way is to use a semaphore:

int maxRequest = 100;
Semaphore reqSemaphore = new Semaphore(maxRequest);

public void request() {
    if (!reqSemaphore.tryAcquire()) {
        return;
    }
    // Call the interface
    try {
        invokeXXX();
    } finally {
        reqSemaphore.release();
    }
}

All roads lead to Rome. Concurrent traffic limiting is relatively simple, and a semaphore is generally the best choice.
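
To see the semaphore limiter in action, here is a minimal runnable sketch; the SemaphoreLimiter class name and the simulated invokeXXX() body are assumptions made for the demo:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class SemaphoreLimiter {
    private final Semaphore reqSemaphore = new Semaphore(100);

    public void request() {
        if (!reqSemaphore.tryAcquire()) {
            return; // limited: fail fast instead of queueing
        }
        try {
            invokeXXX();
        } finally {
            reqSemaphore.release();
        }
    }

    private void invokeXXX() {
        // Stand-in for the real back-end call
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SemaphoreLimiter limiter = new SemaphoreLimiter();
        ExecutorService pool = Executors.newFixedThreadPool(200);
        for (int i = 0; i < 1000; i++) {
            pool.submit(limiter::request);
        }
        pool.shutdown();
    }
}

At any instant, at most 100 calls are inside invokeXXX(); the rest return immediately.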

QPS traffic limiting

QPS traffic limiting limits the number of requests within a period of time (usually 1 second).

Counter method

The simplest way is to use an int variable as a counter: before each request, increment the counter by 1; if the count exceeds the threshold and less than 1s has elapsed since the first request in the window, the request is limited.

The pseudocode is as follows:

int maxQps = 100;
int count;
long timeStamp = System.currentTimeMillis();
long interval = 1000;

public synchronized boolean grant() {
    long now = System.currentTimeMillis();
    if (now < timeStamp + interval) {
        count++;
        return count <= maxQps; // <= so that exactly maxQps requests pass per window
    } else {
        // A new window starts: reset the counter
        timeStamp = now;
        count = 1;
        return true;
    }
}

This method is simple to implement, but it has a window-boundary (critical) problem: if 100 requests arrive in the last 500ms of the first second and another 100 arrive in the first 500ms of the second second, the actual QPS within that 1-second span is 200.



The root cause of the counter method's boundary problem is that its measurement granularity is too coarse. This can be solved with the sliding window algorithm.

The sliding window

We use an array of length 10 to represent the requests within 1 second, with each element corresponding to the number of requests in a 100ms slot. A sum variable tracks the total requests in the current 1s. Every 100ms the window slides forward and the expired slot is evicted.

The pseudocode is as follows:

int maxQps = 100;
AtomicInteger[] count = new AtomicInteger[10];
AtomicInteger sum;
volatile int index;

public void init() {
    for (int i = 0; i < count.length; i++) {
        count[i] = new AtomicInteger(0);
    }
    sum = new AtomicInteger(0);
}

public synchronized boolean grant() {
    count[index].incrementAndGet();
    return sum.incrementAndGet() <= maxQps;
}

// Executed every 100ms: slide the window forward and evict the expired slot
public void run() {
    index = (index + 1) % count.length;
    int val = count[index].getAndSet(0);
    sum.addAndGet(-val);
}
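
To drive the window, run() must fire every 100ms. Here is a minimal sketch of the wiring with a ScheduledExecutorService, assuming the fields and methods above live in a hypothetical SlidingWindowLimiter class:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

SlidingWindowLimiter limiter = new SlidingWindowLimiter();
limiter.init();

// Slide the window every 100ms
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(limiter::run, 100, 100, TimeUnit.MILLISECONDS);

// Callers simply consult the limiter before invoking the interface
if (limiter.grant()) {
    invokeXXX();
}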

The finer the sliding window's slots, the higher the accuracy, but also the higher the resource consumption.

Leaky bucket algorithm

The idea of the leaky bucket algorithm is that there is a bucket of fixed size: water (requests) pours into the bucket at an arbitrary rate, while the bucket leaks water at a fixed rate. When the bucket is full, incoming water overflows (requests are rejected).



As Wikipedia notes, the leaky bucket algorithm has two implementations: one as a meter, the other as a queue. Most articles on the web do not distinguish the two and end up conflating the concepts.

As a meter

The first implementation is equivalent to token buckets, but expressed in a different way.

The pseudocode is as follows:

long timeStamp = System.currentTimeMillis();
int bucketSize = 100; // capacity of the bucket
int rate = 10;        // leak rate: how much water drains per ms
int count;            // current amount of water in the bucket

public synchronized boolean grant() {
    long now = System.currentTimeMillis();
    if (now > timeStamp) {
        // Drain the water that has leaked out since the last call
        count = (int) Math.max(0, count - (now - timeStamp) * rate);
        timeStamp = now;
    }

    if (count + 1 <= bucketSize) {
        count++;
        return true;
    } else {
        return false;
    }
}

This implementation allows burst traffic within a short period. For example, when the bucket is empty at the start and 100 requests arrive within 1ms, none of the 100 are limited; but afterwards each ms can accept at most 10 new requests (if another 100 arrive in the next 1ms, 90 of them are limited).

It achieves the same effect as a token bucket.

As a queue

The second implementation uses a queue. When a request arrives, it is enqueued if the queue is not full; otherwise the request is rejected. Requests are pulled from the queue and executed at a constant rate.



The pseudocode is as follows:

Queue<Request> queue = new LinkedBlockingQueue<>(100);
int gap;  // pause between drain rounds, in ms
int rate; // number of requests executed per round

public boolean grant(Request req) {
    // offer() returns false when the queue is full, i.e. the request is limited
    return queue.offer(req);
}

// Executed in a separate thread
void consume() throws InterruptedException {
    while (true) {
        for (int i = 0; i < rate; i++) {
            Request req = queue.poll();
            if (req == null) {
                break;
            }
            req.doRequest();
        }
        Thread.sleep(gap);
    }
}

With this algorithm, requests are executed at a fixed rate, and traffic bursts are not allowed.

For example, if the queue is empty and 100 requests arrive within 1ms, only 10 of them are executed in the first round; the rest must wait in the queue (and new requests are rejected once the queue fills up). Note the difference from the as-a-meter implementation above.

However, the second (queue-based) leaky bucket becomes equivalent to the first (meter-based) one when the bucket size equals the amount of water that flows out per tick. In other words, as a queue is a special case of as a meter: when bucketSize == rate, the request rate is constant and no burst traffic is allowed.

Token bucket algorithm

The idea of the token bucket algorithm is that the bucket holds at most N tokens, and tokens are added to it at a fixed rate. Each request must take a token from the bucket before it is permitted; if the bucket has no tokens, the request is limited.



The token bucket algorithm is equivalent to the as-a-meter leaky bucket implementation above: it limits the average transmission rate while allowing some degree of burst transmission. Pseudocode:

int token;      // current number of tokens in the bucket
int bucketSize; // maximum number of tokens the bucket can hold
int rate;       // tokens added per ms
long timeStamp = System.currentTimeMillis();

public synchronized boolean grant() {
    long now = System.currentTimeMillis();
    if (now > timeStamp) {
        // Refill tokens for the elapsed time, capped at the bucket size
        token = (int) Math.min(bucketSize, token + (now - timeStamp) * rate);
        timeStamp = now;
    }
    if (token > 0) {
        token--;
        return true;
    } else {
        return false;
    }
}

Comparison of the two leaky bucket implementations and the token bucket algorithm

The as-a-meter leaky bucket algorithm is the same as the token bucket algorithm, just viewed from a different angle.

The as-a-queue leaky bucket strictly restricts the data transmission rate, while the token bucket and the as-a-meter leaky bucket limit the average transmission rate while allowing some degree of burst transmission.

The token bucket algorithm is the one most commonly used in industry. For example, RateLimiter in Guava is based on the token bucket algorithm. Of course, different business scenarios have different needs, and the choice should fit the scenario.
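
For reference, a minimal sketch using Guava's RateLimiter; the rate of 100 permits per second is an arbitrary value chosen for illustration:

import com.google.common.util.concurrent.RateLimiter;

RateLimiter limiter = RateLimiter.create(100.0); // 100 permits per second

public void request() {
    if (!limiter.tryAcquire()) {
        return; // limited: fail fast
    }
    invokeXXX();
}

tryAcquire() fails fast; acquire() would instead block until a permit becomes available. Note that RateLimiter smooths the token refill over time rather than refilling in discrete steps like the naive pseudocode above.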

End

This article introduced the common traffic limiting algorithms in back-end systems, each with corresponding pseudocode that should be easy to follow. Note that the pseudocode only conveys the general idea; it glosses over some details and efficiency concerns.