Back-end service interfaces have capacity limits. If the external QPS or concurrency exceeds those limits, the service may break down. Therefore, interface calls are typically protected by traffic limiting to prevent system failures caused by unexpected floods of requests.
Generally speaking, traffic limiting is classified into two types: concurrency limiting and QPS limiting. Concurrency limiting caps the maximum number of requests being handled at the same time; QPS limiting caps the number of requests within a period of time.
In terms of scope, traffic limiting can be single-machine or distributed: the former limits traffic on one machine, the latter across a cluster. The underlying ideas are the same; only the scope differs. This article analyzes single-machine traffic limiting.
Next, let's look at concurrency limiting and QPS limiting.
More articles can be found on my blog: github.com/farmerjohng…
Concurrent traffic limiting
Concurrency limiting caps the number of requests in flight at the same time. Setting thread safety aside for a moment, we can implement it with a simple int variable. The pseudocode is as follows:
int maxRequest = 100;
int nowRequest = 0;

public void request() {
    if (nowRequest >= maxRequest) {
        return;
    }
    nowRequest++;
    // Call the interface
    try {
        invokeXXX();
    } finally {
        nowRequest--;
    }
}
Obviously, the above implementation has thread-safety issues. The most direct fix is locking:
int maxRequest = 100;
int nowRequest = 0;

public void request() {
    if (nowRequest >= maxRequest) {
        return;
    }
    synchronized (this) {
        // Double-check under the lock
        if (nowRequest >= maxRequest) {
            return;
        }
        nowRequest++;
    }
    // Call the interface
    try {
        invokeXXX();
    } finally {
        synchronized (this) {
            nowRequest--;
        }
    }
}
AtomicInteger can also be used:
int maxRequest = 100;
AtomicInteger nowRequest = new AtomicInteger(0);

public void request() {
    for (;;) {
        int currentReq = nowRequest.get();
        if (currentReq >= maxRequest) {
            return;
        }
        // CAS loop: retry until a slot is claimed or the limit is hit
        if (nowRequest.compareAndSet(currentReq, currentReq + 1)) {
            break;
        }
    }
    // Call the interface
    try {
        invokeXXX();
    } finally {
        nowRequest.decrementAndGet();
    }
}
Those of you familiar with the JDK's java.util.concurrent package will ask why bother: this is exactly what Semaphore does. That's right; the simplest way is in fact a semaphore:
int maxRequest = 100;
Semaphore reqSemaphore = new Semaphore(maxRequest);

public void request() {
    if (!reqSemaphore.tryAcquire()) {
        return;
    }
    // Call the interface
    try {
        invokeXXX();
    } finally {
        reqSemaphore.release();
    }
}
All roads lead to Rome. Concurrency limiting is relatively simple; in general, a semaphore does the job well.
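For reference, here is a minimal compilable sketch of the semaphore version (the class name and the invokeXXX() placeholder are mine, for illustration only):

import java.util.concurrent.Semaphore;

public class ConcurrencyLimiter {
    private final Semaphore semaphore;

    public ConcurrencyLimiter(int maxRequest) {
        this.semaphore = new Semaphore(maxRequest);
    }

    public boolean request() {
        // Non-blocking: give up immediately once the limit is reached
        if (!semaphore.tryAcquire()) {
            return false;
        }
        try {
            invokeXXX();
            return true;
        } finally {
            // Always release the permit, even if the call throws
            semaphore.release();
        }
    }

    private void invokeXXX() {
        // Placeholder for the real downstream call
    }
}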
QPS traffic limiting
QPS traffic limiting limits the number of requests within a period of time (usually 1 second).
Counter method
The simplest way is to use an int count variable as the counter: before each request, increment the counter by 1; if the count exceeds the threshold and the time since the first request of the window is less than 1s, the request is limited.
The pseudocode is as follows:
int maxQps = 100;
int count;
long timeStamp = System.currentTimeMillis();
long interval = 1000;

public synchronized boolean grant() {
    long now = System.currentTimeMillis();
    if (now < timeStamp + interval) {
        // Still inside the current 1s window
        count++;
        return count <= maxQps;
    } else {
        // The window has expired: start a new one
        timeStamp = now;
        count = 1;
        return true;
    }
}
This method is simple to implement, but it has a boundary problem: if 100 requests arrive in the last 500ms of one second and another 100 arrive in the first 500ms of the next second, the actual QPS within that one-second span is 200, double the intended limit.
The root cause of the counter method's boundary problem is that its measurement granularity is too coarse. The sliding window algorithm addresses this.
The sliding window
We use an array of length 10 to represent the requests within 1 second; each array element holds the number of requests in one 100ms slice. A sum variable tracks the total number of requests in the current 1s window. Every 100ms the window slides forward by one slice and the expired slice's count is evicted.
The pseudocode is as follows:
int maxQps = 100;
AtomicInteger[] count = new AtomicInteger[10];
long timeStamp = System.currentTimeMillis();
long interval = 1000;
AtomicInteger sum;
volatile int index;

public void init() {
    for (int i = 0; i < count.length; i++) {
        count[i] = new AtomicInteger(0);
    }
    sum = new AtomicInteger(0);
}

public synchronized boolean grant() {
    count[index].incrementAndGet();
    return sum.incrementAndGet() <= maxQps;
}

// Executed every 100ms: slide the window forward and evict the expired slice
public void run() {
    index = (index + 1) % count.length;
    int val = count[index].getAndSet(0);
    sum.addAndGet(-val);
}
The finer the slices of the sliding window, the higher the accuracy, and the higher the resource consumption.
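The pseudocode leaves open how run() gets triggered every 100ms. One way is a ScheduledExecutorService; the sketch below assumes the sliding-window code above is wrapped in a hypothetical SlidingWindowLimiter class:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SlidingWindowDriver {
    public static void main(String[] args) {
        // SlidingWindowLimiter is a hypothetical wrapper of the pseudocode above
        SlidingWindowLimiter limiter = new SlidingWindowLimiter();
        limiter.init();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Slide the window every 100ms, matching the 100ms slices of the count array
        scheduler.scheduleAtFixedRate(limiter::run, 100, 100, TimeUnit.MILLISECONDS);
    }
}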
Leaky bucket algorithm
The idea of the leaky bucket algorithm is that there is a bucket of fixed size: water (requests) pours into the bucket at an arbitrary rate, while the bucket leaks water at a fixed rate. When the bucket is full, incoming water overflows (requests are rejected).
As you can see on Wikipedia, the leaky bucket algorithm has two implementations: one as a meter, the other as a queue. Most articles on the web mention only one of them, or confuse the two concepts.
As a meter
The first implementation is equivalent to token buckets, but expressed in a different way.
The pseudocode is as follows:
long timeStamp = System.currentTimeMillis(); // the last time grant() was called
int bucketSize = 100; // bucket capacity
int rate = 10;        // how much water leaks out per ms
long count;           // the current amount of water in the bucket

public synchronized boolean grant() {
    long now = System.currentTimeMillis();
    if (now > timeStamp) {
        // Leak water for the elapsed time, never dropping below empty
        count = Math.max(0, count - (now - timeStamp) * rate);
        timeStamp = now;
    }
    if (count + 1 <= bucketSize) {
        count++;
        return true;
    } else {
        return false;
    }
}
This implementation allows burst traffic within a period of time. For example, when the bucket is empty at the start, 100 requests arriving within 1ms will all pass. Once the bucket is full, however, at most 10 new requests can be accepted per ms (so if another 100 requests arrive in the following 1ms, 90 of them will be limited).
It achieves the same effect as a token bucket.
As a queue
The second implementation uses a queue: when a request arrives, it is enqueued if the queue is not full; otherwise the new request is rejected. A separate worker pulls requests off the queue and executes them at a constant rate.
The pseudocode is as follows:
Queue<Request> queue = new LinkedBlockingQueue<>(100);
int gap;  // sleep time between rounds, in ms
int rate; // number of requests executed per round

public synchronized boolean grant(Request req) {
    // Enqueue if there is room; otherwise reject the new request
    return queue.offer(req);
}

// Executed by a separate thread
void consume() throws InterruptedException {
    while (true) {
        // Execute at most 'rate' requests per round
        for (int i = 0; i < rate; i++) {
            Request req = queue.poll();
            if (req == null) {
                break;
            }
            req.doRequest();
        }
        Thread.sleep(gap);
    }
}
With this implementation, requests are executed at a fixed rate, and traffic bursts are not allowed.
For example, if the bucket is empty and 100 requests arrive within 1ms, only the first 10 are executed immediately; the rest must wait in the queue, and once the queue is full, new requests are rejected outright. Note the difference from the as-a-meter implementation above.
**However, when the bucket size equals the amount of water that leaks out per tick, the second leaky bucket algorithm is equivalent to the first.** That is, as a queue is a special implementation of as a meter: when bucketSize == rate, the request rate is constant and no burst traffic is allowed.
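For completeness, here is a minimal sketch of wiring up the consumer thread for the queue version (LeakyBucketQueue is a hypothetical class wrapping the fields and methods of the pseudocode above):

public class LeakyBucketDriver {
    public static void main(String[] args) {
        // LeakyBucketQueue is a hypothetical wrapper of the queue pseudocode above
        LeakyBucketQueue bucket = new LeakyBucketQueue();
        Thread consumer = new Thread(() -> {
            try {
                bucket.consume(); // drains up to 'rate' requests every 'gap' ms
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true); // don't keep the JVM alive just for draining
        consumer.start();
    }
}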
Token bucket algorithm
The idea of the token bucket algorithm is that the bucket holds at most N tokens, and tokens are added to the bucket at a fixed rate. Each request must take a token from the bucket before it is permitted; if there are no tokens in the bucket, the request is limited.
The token bucket algorithm is equivalent to the as-a-meter leaky bucket implementation above: it limits the average transmission rate while allowing some degree of burst. Pseudocode:
long token;      // current number of tokens in the bucket
int bucketSize;  // maximum number of tokens the bucket can hold
int rate;        // tokens added per ms
long timeStamp = System.currentTimeMillis();

public synchronized boolean grant() {
    long now = System.currentTimeMillis();
    if (now > timeStamp) {
        // Refill tokens for the elapsed time, capped at the bucket size
        token = Math.min(bucketSize, token + (now - timeStamp) * rate);
        timeStamp = now;
    }
    if (token > 0) {
        token--;
        return true;
    } else {
        return false;
    }
}
Comparing the two leaky bucket implementations and the token bucket algorithm
The as-a-meter leaky bucket is the same as the token bucket algorithm, just viewed from a different angle.
The as-a-queue leaky bucket strictly caps the instantaneous request rate, while the token bucket and the as-a-meter leaky bucket limit the average rate while allowing some degree of burst.
The token bucket algorithm is commonly used in industry; for example, RateLimiter in Guava is based on the token bucket. Of course, different business scenarios have different needs, and the specific choice should fit the scenario.
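As a quick taste ahead of the next article, here is a minimal sketch of using Guava's RateLimiter (the rate of 100 permits per second is an arbitrary example; the demo class and invokeXXX() placeholder are mine):

import com.google.common.util.concurrent.RateLimiter;

public class GuavaRateLimiterDemo {
    // Token-bucket style: issue up to 100 permits per second
    private final RateLimiter limiter = RateLimiter.create(100.0);

    public boolean request() {
        // tryAcquire() returns immediately: true only if a permit is available right now
        if (!limiter.tryAcquire()) {
            return false; // limited
        }
        invokeXXX();
        return true;
    }

    private void invokeXXX() {
        // Placeholder for the real downstream call
    }
}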
End
This article introduced the common traffic-limiting algorithms used in back-end systems, each with corresponding pseudocode, which should make them easy to understand. The pseudocode only sketches the general ideas and glosses over some details and efficiency concerns, so the next article will analyze the source implementation of a commonly used rate-limiting API, Guava's RateLimiter, to give readers a clearer picture of traffic limiting.
References
blog.zhuxingsheng.com/blog/counte…
blog.51cto.com/leyew/86030…
en.wikipedia.org/wiki/Leaky_…