
For a well-designed RPC framework, rate limiting is an essential feature.

In the last article, on service registration and discovery, we looked at one of the core functions of a microservices architecture. In this article, we focus on another core capability of microservices: flow control.

In a microservice system, the overall service is composed of a series of individual microservices. If one service responds abnormally because of unexpected traffic or some other fault, the services that call it are affected as well, setting off a chain reaction that can eventually bring the whole system down. We discussed solutions to this problem in the earlier article on the avalanche effect in microservices architecture; today we take a closer look at one part of that solution: rate limiting.

Introduction

For a microservices architecture, rate limiting is most directly tied to the system's carrying capacity. Every system has a limit to its service capacity. If some service on a request path receives more requests than it can carry, requests along that path can no longer be answered normally; and if, at that point, clients retry the requests that never returned, the sub-service that is already past its maximum load gets hit even harder. Once one sub-service on the path is dragged down, upstream services are affected in turn, the blast radius keeps expanding, other previously healthy services begin to fail as well, and the resulting avalanche effect accelerates the collapse of the entire system.

The way to solve this problem is rate limiting. When the symptoms are detected (the error rate climbs, response time (RT) increases, or the service load exceeds its safety threshold), policies are enabled that simply discard new requests until the service load recovers, keeping the whole system safe and reliable. That is the purpose of rate limiting. The hard part of this mechanism, however, is not choosing which framework or service to use, but whether you can accurately determine the load ceiling of every sub-service in the system, integrate that knowledge, and go one step further to adjust the rate-limiting policy automatically.

Concepts

Before explaining rate limiting, let's first define the upper limit on service requests, also known as the service carrying capacity: the maximum number of requests a service can support within a given period of time. Only once the carrying capacity is quantified and measurable can corresponding measures be taken based on the measured value.

Service carrying capacity refers to the amount of work processed per unit of time. During the morning rush hour, Beijing subway stations do exactly one thing: limit the flow of passengers! The idea is straightforward: keep requests within a certain range over a certain period of time, so that the system is not overwhelmed while its throughput is maximized.

Take my home broadband as an example. The bandwidth is 100 Mbit/s, meaning it can transmit at most 100 megabits of data per second. How does the carrier (Unicom) enforce this ceiling? If I were Unicom, I might consider the following schemes:

  • Transmit as fast as possible, as long as the total number of bits sent in any one second does not exceed 100M bits
  • On average each bit takes 10 ns (0.01 µs) to transmit, so after one bit is sent, the next bit may only be sent once another 10 ns have passed
  • Simply ensure that no more than 60 × 100M bits are transmitted within any 60-second period
  • …

Looking at the schemes above, you will find that enforcing a real limit is not so easy: each scheme rests on a different principle, which means each requires a different code implementation.

For instance, if we adopt scheme 2, it will fall apart in real testing or in production, because QPS is not controlled the way we expect at all: by the time we finish recalculating, far more than 10 ns may already have elapsed. Clearly, to control traffic as accurately as possible, two questions need to be answered first:

  • How is the traffic measured? Should the window be 1 s, 10 s, or 60 s?
  • What should happen when the traffic exceeds the limit? Should a null value or a default value be returned?

Once these two questions are thought through, the implementation scheme is basically settled: when a new request arrives, simply confirm whether the current service still has the capacity to handle it, and if it does not, decide based on the actual business whether to discard the request or return some other value. A minimal interface sketch is shown below.
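
As a purely illustrative sketch (the names RateLimiter, Allow, and HandleRequest are invented for this article, not taken from any library), the C++ interface below makes those two decisions explicit: the measurement window is fixed at construction, and the overflow policy lives at the call site.

#include <string>

class RateLimiter {
 public:
  // Decision 1: how traffic is measured -- here, at most max_requests
  // requests per window of window_sec seconds.
  RateLimiter(int max_requests, int window_sec)
      : max_requests_(max_requests), window_sec_(window_sec) {}
  virtual ~RateLimiter() = default;

  // Reports whether the current request still fits in the window; the
  // concrete algorithms below (counter, sliding window, ...) supply the logic.
  virtual bool Allow() = 0;

 protected:
  int max_requests_;
  int window_sec_;
};

// Decision 2: what to do on overflow -- discard, degrade, or return a
// default value -- is decided by the caller, not by the limiter itself.
std::string HandleRequest(RateLimiter& limiter) {
  if (!limiter.Allow()) {
    return "default response";  // or reject outright, per business needs
  }
  // ... actually process the request ...
  return "real response";
}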

Common approaches

The common rate-limiting approaches are as follows:

  • Counter
  • Sliding window
  • Leaky bucket
  • Token bucket

Next, I will explain each of these four approaches in depth: first the principle, then an implementation, and finally an analysis of its characteristics.

Counter

Determine the method's maximum number of admissions MAX. Each time before entering the method, increment the counter by 1 and compare the result with MAX: if it exceeds MAX, return directly without executing; otherwise, continue execution.

The counter implementation is simple and crude: over a period of time, count the requests, compare the count with the threshold, and clear the counter to 0 when the time boundary is reached.

Take, for example, a time slice of 1 minute (60 seconds) in which at most 1000 requests may be handled. Once that maximum is exceeded, service is denied (the exact handling depends on the actual business scenario).

Principle

The idea of the counter is as follows:

  • Set the maximum number of admissions req_max within a unit time T (e.g. 10 s), and maintain a counter count for that unit time;
  • When a request arrives, determine whether time has moved into the next unit time;
  • If so, reset the counter to 0;
  • If not, increment count and check whether it exceeds the maximum req_max; if it does, deny access.

Implementation

Based on the counter principle above, an implementation looks like this:

#include <ctime>

class CounterController {
 public:
  CounterController(int max, int duration)
      : max_(max), duration_(duration), last_update_time_(time(nullptr)) {}

  bool IsValid() {
    time_t now = time(nullptr);
    if (now < last_update_time_ + duration_) {
      // Still inside the current time slice: count this request.
      ++req_num_;
    } else {
      // A new time slice has started: reset the window and the counter.
      last_update_time_ = now;
      req_num_ = 1;
    }
    // Allow at most max_ requests per time slice.
    return req_num_ <= max_;
  }

 private:
  int max_;                  // maximum requests per time slice
  int duration_;             // time slice length, in seconds
  time_t last_update_time_;  // start time of the current time slice
  int req_num_ = 0;          // requests seen in the current time slice
};

In the implementation code, there are four member variables:

  • max_: the maximum number of requests handled in one time slice
  • duration_: the length of the time slice, in seconds
  • last_update_time_: the time at which the counter was last reset
  • req_num_: the number of requests handled in the current time slice

The counter-based rate-limiting scheme lives in the member function IsValid(), i.e. "is this request valid". The function first checks whether the difference between the current timestamp and the last update exceeds the time slice: if the current time is still within the slice begun at the last update, the request count is incremented and compared against the slice's upper limit; if the slice has expired, the update time and request count are reset. A short usage sketch follows.
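
A minimal usage sketch, assuming the CounterController above; the numbers (1000 requests per 60 s) are just the example values from earlier:

#include <iostream>

int main() {
  CounterController limiter(1000, 60);  // at most 1000 requests per 60 s

  for (int i = 0; i < 1200; ++i) {
    if (limiter.IsValid()) {
      // handle the request normally
    } else {
      std::cout << "request " << i << " rejected\n";  // requests 1000..1199
    }
  }
  return 0;
}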

Characteristics

  • Advantages: simple to implement and easy to understand
  • Disadvantages:
    • The service can be unavailable for a long stretch within a slice. With a 60 s time slice, if the number of handled requests reaches the upper limit in the first second, every request in the remaining 59 s is rejected
    • Around a slice switch, up to twice the limit may get through. Again with a 60 s slice and a 1000-request limit: no requests arrive in the first 59 s, 1000 requests arrive in the 60th second, the slice then switches, and another 1000 requests arrive in the first second of the new slice. That is 2000 requests within 2 seconds (the last second of the old slice plus the first second of the new one), which clearly exceeds the per-slice limit and may crash the system

Sliding window

The counter sliding window algorithm is an improvement on the fixed-window counter algorithm; it solves the problem that, around a window switch, the traffic let through may be up to twice the threshold.

A sliding window divides the fixed time slice into pieces and moves forward as time passes, which neatly avoids the counter's boundary problem. In other words, the counts of a fixed number of movable slots are summed and compared against the threshold, so the number of slots determines the accuracy of the sliding window algorithm.

TCP also uses sliding windows for network flow control; interested readers can look up how the TCP sliding window works.

Counter mode is just a special sliding window whose window consists of a single time slice (one slot).

Principle

On top of the fixed window, the sliding window algorithm divides the timing window into several small windows, each maintaining its own counter. When a request's time is beyond the maximum time of the current window, the window slides forward by one small window: the data in the first small window is discarded, the second small window becomes the first, a new small window is created at the tail of the window, and the new request goes into that new small window. Throughout, the total request count across all small windows in the whole window must not exceed the configured threshold.

Implementation

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <vector>

// Helper assumed by the original snippets: current time in milliseconds.
void GetCurrentTimeMs(uint64_t* now_ms) {
  *now_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now().time_since_epoch()).count();
}

class SlidingWindowController {
 public:
  SlidingWindowController(int window_size, int limit, int split_num)
      : window_size_(window_size), limit_(limit), split_num_(split_num) {
    counters_.resize(split_num);
    GetCurrentTimeMs(&start_time_);
  }

  bool IsValid() {
    uint64_t now_ms = 0;
    GetCurrentTimeMs(&now_ms);
    // Number of sub-windows the window must slide so that `now` falls inside.
    int64_t overshoot =
        static_cast<int64_t>(now_ms - start_time_) - window_size_;
    int window_num = static_cast<int>(
        std::max<int64_t>(overshoot, 0) / (window_size_ / split_num_));
    SlidingWindow(window_num);

    // Sum the counters of all sub-windows and compare with the threshold.
    int count = 0;
    for (int i = 0; i < split_num_; ++i) {
      count += counters_[i];
    }
    if (count >= limit_) {
      return false;
    }
    ++counters_[index_];
    return true;
  }

 private:
  void SlidingWindow(int window_num) {
    if (window_num == 0) {
      return;
    }
    // Slide at most split_num_ sub-windows; anything older is stale anyway.
    int slide_num = std::min(window_num, split_num_);
    for (int i = 0; i < slide_num; ++i) {
      index_ = (index_ + 1) % split_num_;
      counters_[index_] = 0;  // discard the counts of the expired sub-window
    }
    start_time_ += static_cast<uint64_t>(window_num) * (window_size_ / split_num_);
  }

  int window_size_;            // window size, in milliseconds
  int limit_;                  // maximum requests per window
  std::vector<int> counters_;  // one counter per sub-window
  uint64_t start_time_ = 0;    // window start time, in milliseconds
  int index_ = 0;              // index of the current sub-window
  int split_num_;              // number of sub-windows
};
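
A quick usage sketch, assuming the class above; the parameters are illustrative: a 1000 ms window holding at most 100 requests, split into 10 sub-windows of 100 ms each:

int main() {
  // 1000 ms window, at most 100 requests, 10 sub-windows of 100 ms each.
  SlidingWindowController limiter(1000, 100, 10);

  for (int i = 0; i < 300; ++i) {
    if (!limiter.IsValid()) {
      // over the threshold for the current sliding window: reject or degrade
    }
  }
  return 0;
}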

Characteristics

  • It avoids the fixed-window counter's problem of admitting up to twice the threshold around a window switch
  • Its accuracy depends on the granularity of the sub-windows: the finer the split, i.e. the more sub-windows, the smoother the rate-limiting control

Leaky bucket

The idea behind the leaky bucket algorithm is very simple: water (requests) first enters the leaky bucket, and the bucket leaks water at a fixed rate. When water flows in faster than it leaks out, the excess simply overflows.

Water drips into the bucket at an arbitrary rate, leaks out of the bucket at a fixed rate, and the bucket's capacity never changes.

Inflow: water drips into the bucket at an arbitrary rate.

Outflow: water leaks out of the bucket at a fixed rate.

Because the bucket's capacity is fixed, it may overflow if the inflow rate stays above the outflow rate. The overflowing drops, i.e. the excess requests, are denied access, or a service degradation method is invoked directly.

However, many scenarios need to allow some degree of burst transmission in addition to limiting the average data transfer rate. For those, the leaky bucket algorithm may not be suitable, and the token bucket algorithm fits better.

Principle

When a request arrives, it first enters the funnel, which processes requests at a constant rate, smoothing out the flow. When the incoming traffic is too heavy, the funnel reaches its maximum capacity, overflows, and the excess requests are discarded. From the system's point of view, we never know when requests will arrive or at what rate, which leaves the system exposed; but with a layer of funnel-based limiting in front, requests are guaranteed to flow out at a constant rate. The system always sees requests arriving at a smooth rate, and is thereby protected.

Implementation

#include <algorithm>
#include <ctime>

class LeakyBucketController {
 public:
  // `rate` is the leak rate in requests per second; this snippet also uses
  // it as the bucket capacity.
  explicit LeakyBucketController(int rate)
      : rate_(rate), capacity_(rate), last_update_time_(time(nullptr)) {}

  bool IsValid() {
    time_t now = time(nullptr);
    // Water leaked out since the last update, at the fixed rate.
    int out = static_cast<int>(now - last_update_time_) * rate_;
    if (out > 0) {
      last_update_time_ = now;
    }
    water_ = std::max(0, water_ - out);
    // If the bucket is not yet full, accept the request (one more drop).
    if (water_ < capacity_) {
      ++water_;
      return true;
    }
    return false;  // bucket full: the drop overflows and is rejected
  }

 private:
  int rate_;                 // leak rate, in requests per second
  int capacity_;             // bucket capacity
  int water_ = 0;            // current water level (buffered requests)
  time_t last_update_time_;  // last time leakage was applied
};
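
A usage sketch, assuming the class above, showing why bursts are a problem for the leaky bucket; with rate = 10, the bucket both leaks 10 requests per second and holds at most 10:

int main() {
  LeakyBucketController limiter(10);  // leaks 10 requests/s, capacity 10

  int accepted = 0;
  for (int i = 0; i < 100; ++i) {  // a sudden burst of 100 requests
    if (limiter.IsValid()) {
      ++accepted;  // only the first 10 drops fit into the bucket
    }
  }
  // accepted == 10: the other 90 drops overflow and are rejected.
  return 0;
}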

Characteristics

  • The bucket's leak rate is fixed, so even a fluctuating inflow becomes a steady, fixed-rate stream after passing through the funnel, which protects the downstream system

  • It cannot solve the problem of burst traffic. Suppose the funnel rate is 10 per second and the bucket capacity is 50, and suddenly 100 requests arrive: only 50 are accepted, and the other 50 are rejected. You might think that accepting 50 requests in an instant handles the burst, but no: those 50 requests were merely accepted, not immediately processed; the processing rate is still the 10/sec we configured, so the burst was not really served. The token bucket algorithm, discussed next, can handle bursts to a certain extent.

Token bucket

The token bucket algorithm is an improvement on the funnel (leaky bucket) algorithm: it not only limits traffic, but also allows a certain degree of burst.

The token bucket algorithm puts tokens into a bucket at a constant rate. If there is a token in the bucket when a request arrives, the request takes the token directly and is processed. On this principle, it solves the leaky bucket's inability to handle burst traffic.

Principle

In the token bucket algorithm, tokens are put into the bucket at a constant rate. The bucket also has a fixed capacity; once it is full, no more tokens can be added. A request is admitted only by taking a token from the bucket: if it gets one, the token is consumed and the request is processed; otherwise, the request is discarded.

Implementation

#include <cstdint>

// Reuses the GetCurrentTimeMs() helper defined in the sliding window section.
class TokenBucketController {
 public:
  // `num` tokens are generated every `duration` seconds; `num` is also the
  // bucket capacity.
  TokenBucketController(int num, uint64_t duration)
      : duration_(duration),
        rate_(num > 0 ? num / (duration * 1000) : 0),
        limit_(num),
        modulo_(num > 0 ? num % (duration * 1000) : 0) {
    GetCurrentTimeMs(&last_update_time_);
  }

  bool IsValid();

 private:
  void Update();

  const uint64_t duration_;    // token generation period, in seconds
  const uint64_t rate_;        // whole tokens generated per millisecond
  const int limit_;            // bucket capacity
  const uint64_t modulo_;      // sub-millisecond remainder of the rate
  uint64_t last_update_time_;  // last refill time, in milliseconds
  uint64_t loss_ = 0;          // accumulated fractional tokens
  uint64_t counts_ = 0;        // tokens currently in the bucket
};

void TokenBucketController::Update() {
  uint64_t cur_time_ms;
  GetCurrentTimeMs(&cur_time_ms);
  uint64_t time_passed = cur_time_ms - last_update_time_;
  if (time_passed == 0) {
    return;
  }
  if (counts_ == static_cast<uint64_t>(limit_)) {
    // Bucket already full: just advance the clock; extra tokens are dropped.
    last_update_time_ = cur_time_ms;
    return;
  }
  // Whole tokens produced since the last refill.
  uint64_t count_to_add = rate_ * time_passed;
  // Accumulate the fractional part so that low rates still make progress.
  loss_ += modulo_ * time_passed;
  if (loss_ >= duration_ * 1000) {
    count_to_add += loss_ / (duration_ * 1000);
    loss_ %= (duration_ * 1000);
  }
  counts_ += count_to_add;
  if (counts_ > static_cast<uint64_t>(limit_)) {
    counts_ = limit_;  // cap at the bucket capacity
  }
  last_update_time_ = cur_time_ms;
}

bool TokenBucketController::IsValid() {
  if (limit_ < 0) {
    return true;  // a negative limit means no limiting at all
  }
  if (counts_ >= 1) {
    counts_ -= 1;  // consume a token already in the bucket
    return true;
  }
  Update();  // refill, then try once more
  if (counts_ >= 1) {
    counts_ -= 1;
    return true;
  }
  return false;  // no tokens available: reject
}

In implementation, the difference between the token bucket and the leaky bucket is that one controls what goes in and the other controls what goes out. In the IsValid() function, we first check whether the bucket already holds a token and return true if it does; otherwise, we replenish the bucket's tokens (the Update() function) and then check again whether a token is available.
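
A usage sketch of the burst behavior, assuming the class above together with the GetCurrentTimeMs() helper; the values are illustrative (100 tokens per second):

#include <chrono>
#include <thread>

int main() {
  TokenBucketController limiter(100, 1);  // 100 tokens generated per second

  // Let the bucket fill to capacity while the service is idle.
  std::this_thread::sleep_for(std::chrono::seconds(1));

  int accepted = 0;
  for (int i = 0; i < 150; ++i) {  // a burst of 150 requests
    if (limiter.IsValid()) {
      ++accepted;
    }
  }
  // accepted is roughly 100: the tokens saved up during the idle second
  // absorb the burst; after that, requests are held to the refill rate.
  return 0;
}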

Characteristics

The token bucket algorithm is the most widely used in industry: it limits the average call rate while allowing a certain degree of burst.

Conclusion

As a summary, let's briefly compare the four rate-limiting strategies covered in this article.

Counter algorithm: simple to implement and easy to understand, but around a time-slice switch it can easily let through twice the threshold; it can also be seen as a simplified sliding window algorithm (with only one window).

Sliding window algorithm: solves the double-threshold problem of the counter algorithm. Its flow-control precision depends on the number of windows: the more windows there are, the more accurate the control.

Leaky bucket algorithm: water drips into the bucket at an arbitrary rate, and as long as the bucket is not full the request can access the service; it cannot handle burst traffic.

Token bucket algorithm: tokens are generated at a fixed (average) rate and placed in the bucket, and a client only needs to obtain a token from the bucket to access the service. Compared with the leaky bucket algorithm, the token bucket can handle a certain amount of burst traffic.

The token bucket limits traffic by controlling the rate at which tokens are generated.

The leaky bucket limits traffic by controlling the rate at which requests flow out of the bucket.

Put simply: the token bucket controls what goes in; the leaky bucket controls what goes out.

If you want to keep your own system from being overwhelmed, use the token bucket; if you want to keep someone else's system from being overwhelmed, use the leaky bucket algorithm.

Each of the four rate-limiting algorithms above has its own characteristics; choose based on your own scenario, for there is no best algorithm, only the most appropriate one. The token bucket algorithm is generally used to protect your own system: it limits callers' traffic so that your system is not overwhelmed by sudden bursts, and if the system's actual processing capacity is stronger than the configured limit, some burst can be allowed through, letting the actual processing rate exceed the configured rate and making full use of system resources. The funnel algorithm is generally used to protect third-party systems: for example, when your own system must call a third-party interface, a funnel in front of the call limits your traffic and keeps it flowing smoothly to that interface, so the third-party system is not overwhelmed by your calls.

Algorithms are fixed, but the ideas behind them are what is really worth learning, and they can be applied flexibly to real scenarios. Once more: there is no best algorithm, only the most appropriate one.