1 Scenario

In distributed architectures, large projects are split into many separate services, and a well-organized project aggregates resources for the user by calling each service. Because the system is distributed, each microservice is deployed on its own servers, which makes it easy to scale a microservice: add more machines, and a load-balancing algorithm dispatches each incoming request to one of the application servers. But the system's processing capacity is finite. A sustained flood of requests can pressure the system until the service crashes and becomes unavailable, for example when an interface is maliciously attacked with a huge number of requests that knock it out of service. The first idea that comes to mind is service isolation, but this article is not mainly about isolation; it is about another protection, rate limiting, which keeps unplanned requests from overwhelming the system.

2 Rate limiting

Rate limiting is used not only to control traffic but also to control user behavior and prevent spammy requests. For example, a forum such as Tieba must strictly control posting, replying, liking and similar actions, limiting a given action to N times within a specified period of time.

2.1 Rate limiting schemes

  • Application-layer limiting: NGINX, gateways
  • Rate limiting algorithms: token bucket, leaky bucket; a plain counter can also implement rough limiting

2.2 Application-layer rate limiting

  • NGINX

NGINX is familiar to everyone: besides serving as a lightweight web server, it can also do load balancing and rate limiting.

A server receives hundreds of millions of requests every day. How do you control the access frequency of each client?

NGINX combines geo, map and limit_req to limit traffic. See the following configuration:

## IP whitelist: whitelisted addresses map to 0 (not limited)
geo $whiteiplist {
     default 1;
     127.0.0.1 0;
     10.0.0.0/8 0;
}
## Non-whitelisted IPs are keyed by client address; whitelisted IPs get an
## empty key, which limit_req skips
map $whiteiplist $limit {
     1 $binary_remote_addr;
     0 "";
}
## Interface whitelist: this URI is never limited
map $uri $limit2 {
     default $limit;
     /api/sample "";
}
limit_req_status 406;
### Frequency control zones
limit_req_zone $limit2 zone=freq_controll:100m rate=10r/s;
limit_req_zone $limit2 zone=freq_controll_2:100m rate=500r/m;

Use the zones defined above in a location block:

location / {
    limit_req zone=freq_controll burst=5 nodelay;
    limit_req zone=freq_controll_2 burst=10 nodelay;
    error_page 406 = @f406;
}
location @f406 {
    access_log syslog:server=127.0.0.1:12301;
    return 406;
}

The two strange parameters burst and nodelay in limit_req confused me at first, until I understood the logic and algorithm behind them.

The algorithm behind the limit_req module is leaky-bucket based, and the burst parameter exists to absorb sudden load. Suppose the rate is 100r/s and 120 requests reach the server within the same second. With a plain funnel, the 20 extra requests would either be rejected outright or put in a queue. With burst=10 there is effectively room for 100 regular requests plus 10 burst requests, so only the remaining 10 requests are rejected.

  1. Without nodelay, the 10 burst requests are queued and drained at the configured rate, one every 0.01 seconds, taking 0.1 seconds in total: 110 requests are processed in 1.1 seconds. In many cases this wait is unnecessary.

  2. With nodelay, the 10 extra requests are processed immediately, but the burst slots are used up: no new request is accepted until the bucket drains again. 110 requests are processed in 1 second, but afterwards requests are rejected until capacity is replenished.
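To make the arithmetic above concrete, here is a small sketch (class and method names are illustrative, not part of NGINX) that works through the numbers with rate = 100r/s, burst = 10, and 120 simultaneous arrivals:

```java
/** Works through the burst/nodelay arithmetic used in the example above. */
public class BurstMath {

    /** Requests rejected outright when arrivals exceed rate + burst. */
    public static int rejected(int arrivals, int ratePerSec, int burst) {
        return Math.max(0, arrivals - ratePerSec - burst);
    }

    /**
     * Seconds to finish the accepted requests without nodelay:
     * the queued burst requests drain one every 1/rate seconds.
     */
    public static double secondsWithoutNodelay(int ratePerSec, int burst) {
        return 1.0 + (double) burst / ratePerSec;
    }

    public static void main(String[] args) {
        // 120 arrivals, rate 100 r/s, burst 10 -> 10 rejected, 110 accepted
        System.out.println(rejected(120, 100, 10));
        // Without nodelay, the 110 accepted requests take 1.1 seconds
        System.out.println(secondsWithoutNodelay(100, 10));
        // With nodelay, the burst is served immediately: 110 requests in 1 second
    }
}
```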

  • Gateway rate limiting

The gateway here is the gateway facing clients, typified by Zuul and Spring Cloud Gateway. Limiting at this unified entrance prevents high-frequency requests from crashing the services behind it.

The RequestRateLimiter filter in Spring Cloud Gateway can be used for rate limiting. It relies on a RateLimiter implementation to decide whether the current request is allowed to proceed; if not, it returns HTTP status 429 Too Many Requests by default.

  1. Add the dependency to pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis-reactive</artifactId>
</dependency>
  2. Add a configuration class defining the rate-limit key resolvers. There are two policies: one limits by the username request parameter, the other by the client IP address.
import org.springframework.cloud.gateway.filter.ratelimit.KeyResolver;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import reactor.core.publisher.Mono;

@Configuration
public class RedisRateLimiterConfig {

    @Bean
    public KeyResolver userKeyResolver() {
        // Limit by the "username" query parameter
        return exchange -> Mono.just(exchange.getRequest().getQueryParams().getFirst("username"));
    }

    @Bean
    public KeyResolver ipKeyResolver() {
        // Limit by the client IP address
        return exchange -> Mono.just(exchange.getRequest().getRemoteAddress().getHostName());
    }
}
  3. We use Redis for rate limiting, so add the Redis and RequestRateLimiter configuration; here all GET requests are limited by client IP:
server:
  port: 9201
spring:
  redis:
    host: localhost
    password: 123456
    port: 6379
  cloud:
    gateway:
      routes:
        - id: requestratelimiter_route
          uri: http://localhost:8201
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 1 # Tokens added per second (allowed steady rate)
                redis-rate-limiter.burstCapacity: 2 # Maximum bucket size (largest burst allowed)
                key-resolver: "#{@ipKeyResolver}" # Traffic limiting policy, corresponding to the policy Bean
          predicates:
            - Method=GET
logging:
  level:
    org.springframework.cloud.gateway: debug

2.3 Rate limiting algorithms

2.3.1 Counter

The counter is the simplest and easiest of the rate limiting algorithms. Suppose we require that an interface receive no more than 10 requests within one minute. We set a counter at the start and increment it on every request. If the counter exceeds 10 and the interval since the first request is less than one minute, there are too many requests and we reject. If more than one minute has passed since the first request, the window is over and we reset the counter.

A hand-written counter example:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** A simple fixed-window counter limiter. */
public class LimitService {
	private int limitCount = 60; // Maximum requests allowed per window
	private AtomicInteger counter = new AtomicInteger(0); // Requests seen in the current window
	private long start = System.currentTimeMillis(); // Window start time
	private long interval = 60 * 1000; // Window length: 60 seconds, in milliseconds

	public synchronized boolean acquire() {
		long newTime = System.currentTimeMillis();
		if (newTime > start + interval) {
			// A new window begins: reset the counter
			start = newTime;
			counter.set(0);
		}
		return counter.incrementAndGet() <= limitCount;
	}

	static LimitService limitService = new LimitService();

	public static void main(String[] args) {
		ExecutorService pool = Executors.newCachedThreadPool();
		for (int i = 1; i < 100; i++) {
			final int tempI = i;
			pool.execute(() -> {
				if (limitService.acquire()) {
					System.out.println("You are not limited; normal access: " + tempI);
				} else {
					System.out.println("You have been limited: " + tempI);
				}
			});
		}
		pool.shutdown();
	}
}

Problem: a counter suffers from a boundary problem when traffic clusters around the edge of the window. For example, 10 requests arrive in the 59th second and another 10 in the 61st second: that is 20 requests within about 2 seconds, violating our limit of 10 requests per 60 seconds.

We can solve the counter's boundary problem with a sliding window.

2.3.2 Sliding window counter

Sliding window principle: on every access, check whether the total number of requests in the preceding N time slices exceeds the configured threshold, and add 1 to the counter of the current time slice.

Sliding window counting fits many scenarios, such as rate limiting to prevent a system avalanche. Compared with the plain counter, the sliding window is smoother and automatically evens out spikes.

Sliding window example: divide one minute into 6 slots of 10 seconds each, every slot keeping its own independent counter; at most 1000 requests are allowed within any 60-second window.

Consider: why does a sliding window solve the boundary problem under high concurrency? Answer: because the sliding window always counts the requests of the most recent unit of time; as time advances the window slides with it, so the number of requests in any unit of time stays under control.
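The idea can be sketched in plain Java (a minimal, illustrative implementation, not code from the original article), where each slot's counter is cleared as the window slides past it:

```java
/**
 * A minimal sliding-window counter: one minute is split into 6 slots of
 * 10 seconds each, and a request is admitted only while the sum over the
 * last 6 slots stays under the limit.
 */
public class SlidingWindowLimiter {
    private final int slotCount = 6;        // 6 slots of 10 s = a 60 s window
    private final long slotMillis = 10_000; // length of one slot
    private final int limit;                // max requests per window
    private final int[] counts = new int[slotCount];
    private long lastSlotTime = System.currentTimeMillis();
    private int currentSlot = 0;

    public SlidingWindowLimiter(int limit) {
        this.limit = limit;
    }

    public synchronized boolean tryAcquire() {
        return tryAcquire(System.currentTimeMillis());
    }

    /** Time-parameterised variant so the sliding logic is easy to test. */
    public synchronized boolean tryAcquire(long now) {
        // Slide the window forward, clearing slots whose time has expired
        long elapsed = now - lastSlotTime;
        while (elapsed >= slotMillis) {
            currentSlot = (currentSlot + 1) % slotCount;
            counts[currentSlot] = 0; // this slot now represents fresh time
            lastSlotTime += slotMillis;
            elapsed -= slotMillis;
        }
        int total = 0;
        for (int c : counts) total += c;
        if (total >= limit) {
            return false;            // window is full: reject
        }
        counts[currentSlot]++;       // count this request in the current slot
        return true;
    }
}
```

Because a burst just before a slot boundary still occupies the window for the next 60 seconds, the 59th-second/61st-second trick against the plain counter no longer works.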

2.3.3 Token bucket algorithm

The token bucket algorithm keeps a bucket holding a fixed capacity of tokens and adds tokens to the bucket at a fixed rate.

Token bucket principle: assume a limit of 2r/s, so a token is added to the bucket every 500 milliseconds; the bucket stores at most b tokens, and when it is full newly added tokens are discarded. When a packet of n bytes arrives, n tokens are removed from the bucket and the packet is sent to the network; if fewer than n tokens are available, no tokens are removed and the packet is throttled (dropped or buffered).
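Before turning to the library implementation, the principle fits in a few lines of Java (an illustrative sketch with made-up names; time is passed in explicitly so the lazy refill is easy to follow):

```java
/** A minimal token bucket: refill at a fixed rate, capped at capacity. */
public class TokenBucket {
    private final long capacity;      // at most b tokens in the bucket
    private final double refillPerMs; // tokens added per millisecond
    private double tokens;            // current token count
    private long lastRefill;          // timestamp of the last refill

    public TokenBucket(long capacity, double tokensPerSecond, long nowMs) {
        this.capacity = capacity;
        this.refillPerMs = tokensPerSecond / 1000.0;
        this.tokens = capacity;       // start with a full bucket
        this.lastRefill = nowMs;
    }

    /** Try to take n tokens at time nowMs; false means the request is throttled. */
    public synchronized boolean tryAcquire(int n, long nowMs) {
        // Refill lazily based on elapsed time, never exceeding capacity
        tokens = Math.min(capacity, tokens + (nowMs - lastRefill) * refillPerMs);
        lastRefill = nowMs;
        if (tokens < n) {
            return false;             // not enough tokens: throttle
        }
        tokens -= n;
        return true;
    }
}
```

With the 2r/s example above (capacity 2), two requests pass immediately, a third is throttled, and one slot opens up again 500 ms later.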

Example

  • Use Guava's RateLimiter to implement token-bucket limiting

RateLimiter is an implementation class provided by Guava based on the token bucket algorithm. It makes rate limiting straightforward and lets you tune the token generation rate to the actual situation of the system. Typical uses: limiting flash-sale traffic to keep the system from being flooded; limiting the number of calls to an interface or service per unit of time, as some third-party services do per user; and limiting network speed, i.e. how many bytes may be uploaded or downloaded per unit of time.

  1. Add Guava's Maven dependency:
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>25.1-jre</version>
</dependency>
  2. Implement token-bucket limiting with RateLimiter:
@RestController
public class IndexController {
	@Autowired
	private OrderService orderService;

	// 1.0 means one token is generated per second and put into the bucket
	private final RateLimiter rateLimiter = RateLimiter.create(1.0);

	// Place-order request
	@RequestMapping("/order")
	public String order() {
		// 1. Rate-limit check: wait at most 500 milliseconds for a token
		boolean acquired = rateLimiter.tryAcquire(500, TimeUnit.MILLISECONDS);
		if (!acquired) {
			System.out.println("No token within the wait time; give up rather than queue forever!");
			return "No token within the wait time; give up rather than queue forever!";
		}
		// 2. Not limited: call the order interface directly
		boolean isOrderAdded = orderService.addOrder();
		if (isOrderAdded) {
			return "Congratulations, purchase succeeded!";
		}
		return "Purchase failed!";
	}
}

This is simple single-machine rate limiting. It can also be wrapped with an annotation and AOP so that limiting is injected transparently.

  3. Encapsulating RateLimiter

  • Custom annotations
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface ExtRateLimiter {
	double value();

	long timeOut();
}
  • Write the aspect
@Aspect
@Component
@Slf4j
public class RateLimiterAspect {

    /** Caches one RateLimiter per request URI. */
    private static final ConcurrentHashMap<String, RateLimiter> rateLimiterMap = new ConcurrentHashMap<>();

    @Pointcut("@annotation(extRateLimiter)")
    public void pointcut(ExtRateLimiter extRateLimiter) {
    }

    @Around(value = "pointcut(extRateLimiter)", argNames = "proceedingJoinPoint,extRateLimiter")
    public Object doAround(ProceedingJoinPoint proceedingJoinPoint, ExtRateLimiter extRateLimiter) throws Throwable {
        // Read the configured rate and the maximum token wait time from the annotation
        double permitsPerSecond = extRateLimiter.value();
        long timeOut = extRateLimiter.timeOut();
        RateLimiter rateLimiter = getRateLimiter(permitsPerSecond);
        boolean acquired = rateLimiter.tryAcquire(timeOut, TimeUnit.MILLISECONDS);
        if (acquired) {
            return proceedingJoinPoint.proceed();
        }
        // Fallback: reject the request (HttpServletUtil is a project-specific helper)
        HttpServletUtil.write2ExceptionResult("Degraded: the server is busy, please try again later!");
        return null;
    }

    private RateLimiter getRateLimiter(double permitsPerSecond) {
        // One limiter per URI, created on first access
        String requestURI = HttpServletUtil.getRequest().getRequestURI();
        return rateLimiterMap.computeIfAbsent(requestURI, uri -> RateLimiter.create(permitsPerSecond));
    }
}
  • Usage example
@RequestMapping("/myOrder")
@ExtRateLimiter(value = 10.0, timeOut = 500)
public String myOrder() throws InterruptedException {
	System.out.println("myOrder");
	return "SUCCESS";
}

2.3.4 Leaky bucket algorithm

The leaky bucket algorithm, used as a meter, supports traffic shaping and traffic policing.

  • Leaky bucket principle: a leaky bucket with a fixed capacity leaks water drops at a constant, fixed rate. If the bucket is empty, no drops flow out. Water may flow into the bucket at any rate, but if the incoming water exceeds the bucket's capacity, the excess overflows and is discarded; the bucket's capacity itself never changes.

  • Token bucket vs. leaky bucket:
  1. The token bucket adds tokens to the bucket at a fixed rate; whether a request is served depends on whether enough tokens remain, and when the token count drops to zero new requests are rejected. The leaky bucket lets requests out at a constant rate while the inflow rate is arbitrary; when the backlog reaches the bucket's capacity, new requests are rejected.
  2. Advantages and disadvantages: the token bucket limits the average inflow rate, allowing bursts to a degree (requests are served as long as tokens remain, and 3 or 4 tokens may be taken at once). The leaky bucket limits the constant outflow rate (the outflow rate is a fixed value, for example always 1, never 1 this time and 2 the next), which smooths a bursty inflow rate.

Note: the token bucket allows a degree of burst, while the leaky bucket mainly smooths the inflow rate.

  • Application scenarios: the leaky bucket algorithm forcibly limits the data transfer rate, while the token bucket limits the average transfer rate but allows some burst transmission. With a token bucket, as long as tokens remain in the bucket, data may be sent in bursts until the configured threshold is reached, so it suits traffic with bursty characteristics.

  • Implementation

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

@Slf4j
public class LeakyBucketLimiter {
    private final ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(5);

    // Bucket capacity
    public int capacity = 10;
    // Current amount of water in the bucket
    public int water = 0;
    // Leak rate: units of water per second
    public int rate = 4;
    // Last time the water level was updated
    public long lastTime = System.currentTimeMillis();

    public void acquire() {
        scheduledExecutorService.scheduleWithFixedDelay(() -> {
            long now = System.currentTimeMillis();
            // Leak water at the fixed rate for the time elapsed since the last check
            water = Math.max(0, (int) (water - (now - lastTime) * rate / 1000));
            lastTime = now;
            // Simulate a random burst of 1 to 8 incoming requests
            int permits = (int) (Math.random() * 8) + 1;
            log.info("Requests: " + permits + ", current bucket headroom: " + (capacity - water));
            if (capacity - water < permits) {
                // The bucket would overflow: reject
                log.info("Rate limited");
            } else {
                // There is capacity: accept
                water += permits;
                log.info("Remaining capacity = " + (capacity - water));
            }
        }, 0, 500, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        LeakyBucketLimiter limiter = new LeakyBucketLimiter();
        limiter.acquire();
    }
}

3 Summary

In real distributed business with high concurrency and heavy traffic, rate limiting on interfaces is essential. Which algorithm fits best depends on the business scenario. Alibaba has also open-sourced a rate limiting component, Sentinel, which readers can try out later.