1. What is a restrictor

A current limiter is a tool for limiting traffic in high concurrency scenarios.

The common realization methods of current limiter are as follows:

  1. Counter algorithm
  2. Bucket algorithm
  3. Token bucket algorithm

The pros and cons of the three algorithms can be found in this blog post breaking down the Guava RateLimiter line by line.

Compared with the calculator algorithm and the leaky bucket algorithm, the token bucket algorithm has no obvious disadvantages and can handle the burst traffic without abnormal traffic peak. Therefore, the token bucket algorithm is generally used in the implementation of the current limiter.

2. Implement a current limiter based on token bucket algorithm

Below, we implement a simple flow limiter based on the token bucket algorithm. The interface definition of the flow limiter is as follows:

public interface IRateLimiter {

    /** * returns true if the token is received and the request can pass, and falase if the token is not received and the request cannot pass@return* /
    boolean tryAcquire(a);

}
Copy the code

MyRateLimiter implements this interface: MyRateLimiter

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MyRateLimiter implements IRateLimiter {

    private final AtomicInteger tokenBucket;

    private final int qps;

    // The number of microseconds required to generate a token
    private final long mircroSecondsPerToken;

    private final ScheduledExecutorService scheduledExecutorService =
            Executors.newScheduledThreadPool(1);

    /** * constructor *@paramQPS Number of requests per second */
    public MyRateLimiter(int qps) {
        this.qps =  qps;
        this.tokenBucket = new AtomicInteger(qps);
        // The subtle number needed to generate a token
        this.mircroSecondsPerToken = 1 * 1000 * 1000 / qps;
        scheduledExecutorService.schedule(() -> {
            /** * Create a timed task that generates a token every ${mircroSecondsPerToken} * While the "token bucket is full" and "add a token to the token bucket" operations are not atomic, * but because the two operations are single-threaded, And adding tokens only happens in this thread, so there is no need to lock these two steps to ensure atomicity */
            if (this.tokenBucket.get() < qps) {
                this.tokenBucket.incrementAndGet();
            }
        }, mircroSecondsPerToken, TimeUnit.MICROSECONDS);
    }

    @Override
    public boolean tryAcquire(a) {

        if (this.tokenBucket.get() < 1) {
            return false;
        } else {
            synchronized (this) {
                if (this.tokenBucket.get() < 1) {
                    return false;
                } else {
                    this.tokenBucket.decrementAndGet(); }}return true; }}}Copy the code

MircroSecondsPerToken is of type long, so the input QPS must be 10^n, 0<=n<=6. Here we change mircroSecondsPerToken to double, which is a bit more accurate, but since we’re using ScheduledExecutorService to create the scheduled task as a producer to produce the token, In ScheduledExecutorService’s schedule method, the time size can only be long, and there is no method with the time unit of double.

In addition, there are other drawbacks: We use a timed thread as the provider to place tokens into tokenBucket at a set frequency. When the bucket is full and no consumers are consuming tokens, our Provider thread continues to run, trying to place tokens into tokenBucket. It’s a bit of a waste of resources. Automatically starting and stopping the provider thread based on the state of the tokenBucket greatly increases the concurrency complexity.

Let’s take a look at how GUAVA’s RateLimiter is implemented and see if we can optimize our own current limiter in the same way that GUAVA RateLimiter implemented it.

3. GUAVA RateLimiter

3.1. Introduction

Guava RateLimiter is a Google-provided flow limiting tool. RateLimiter is based on the token bucket algorithm and can effectively limit the traffic of an interface on a single JVM instance.

3.2. Critical code analysis

Create a Guava RateLimiter with QPS = 2 and QPS = double

  // Finally rateLimiter is a SmoothBursty instance, SmoothBursty inherits from SmoothRateLimiter, remember SmoothRateLimiter, it will be mentioned later
  RateLimiter rateLimiter = RateLimiter.create(2D);

  public static RateLimiter create(double permitsPerSecond) {
    return create(permitsPerSecond, SleepingStopwatch.createFromSystemTimer());
  }

  static RateLimiter create(double permitsPerSecond, SleepingStopwatch stopwatch) {
    RateLimiter rateLimiter = new SmoothBursty(stopwatch, 1.0 /* maxBurstSeconds */);
    rateLimiter.setRate(permitsPerSecond);
    return rateLimiter;
  }
Copy the code

Get a token:

    rateLimiter.tryAcquire();
Copy the code

Let’s look inside the Ratelimiter.tryacquire () method:

public boolean tryAcquire(a) {
	// Gets a token by default and the timeout is set to 0
    return tryAcquire(1.0, MICROSECONDS);
}

public boolean tryAcquire(int permits, long timeout, TimeUnit unit) {
    // timeoutMicros=0 because it is coming in from tryAcquire()
    long timeoutMicros = max(unit.toMicros(timeout), 0);
    Permits >0 to obtain tokens
    checkPermits(permits);
    // 
    long microsToWait;
    // All operations in the RateLimiter instance use the same object to lock
    synchronized (mutex()) {
      // Get the current microsecond timestamp
      long nowMicros = stopwatch.readMicros();
      // See the canAcquire method below
      if(! canAcquire(nowMicros, timeoutMicros)) {return false;
      } else {
        // Permits = 1 nowMicros = current microsecond timestamp
        The reserveAndGetWaitLength method calculates how long it will take to get 1 token from now onmicrosToWait = reserveAndGetWaitLength(permits, nowMicros); }}// Use SleepingStopwatch to wait for the specified implementation. When the time is up, the lock is acquired and returns true
    stopwatch.sleepMicrosUninterruptibly(microsToWait);
    return true;
}

// Determine whether the current time (microsecond timestamp) is later than "token available time" after waiting for the specified time. If it is later, return true, indicating that the token can be obtained.
// Time to get a token is stored on SmoothRateLimiter's nextFreeTicketMicros, if the time is before nextFreeTicketMicros, the token cannot be obtained, otherwise it can.
private boolean canAcquire(long nowMicros, long timeoutMicros) {
    return queryEarliestAvailable(nowMicros) - timeoutMicros <= nowMicros;
}

// 
 final long reserveAndGetWaitLength(int permits, long nowMicros) {
    long momentAvailable = reserveEarliestAvailable(permits, nowMicros);
    return max(momentAvailable - nowMicros, 0);
 }

  final long reserveEarliestAvailable(int requiredPermits, long nowMicros) {
    // Resync is the key approach, the essence of Guava's SmoothRateLimiter, which creates tokens in a "lazy" manner
    resync(nowMicros);
    // Returns the last "time you can start creating tokens"
    long returnValue = nextFreeTicketMicros;
    // storedPermitsToSpend = Number of tokens requested and the one with the smaller number of tokens in the token bucket
    double storedPermitsToSpend = min(requiredPermits, this.storedPermits);
    FreshPermits = 0 if there are enough tokens in the bucket, freshPermits = number of missing tokens if there are not enough tokens in the bucket
    double freshPermits = requiredPermits - storedPermitsToSpend;
    // The time required to produce insufficient tokens
    long waitMicros =
        storedPermitsToWaitTime(this.storedPermits, storedPermitsToSpend)
            + (long) (freshPermits * stableIntervalMicros);
    
    // The key part is that instead of letting the current thread block for ${waitMicros} microseconds, the limiter adds the wait time for ${waitMicros} to nextFreeTicketMicros, so that the "next token production time" is delayed. The current request overspends, and later requests pick up the TAB.
    this.nextFreeTicketMicros = LongMath.saturatedAdd(nextFreeTicketMicros, waitMicros);
    // Subtract the number of tokens actually consumed from the token bucket
    this.storedPermits -= storedPermitsToSpend;
    return returnValue;
  }
  
  void resync(long nowMicros) {
    // if nextFreeTicket is in the past, resync to now
    if (nowMicros > nextFreeTicketMicros) {
      CoolDownIntervalMicros () returns the stableIntervalMicros attribute in SmoothRateLimiter
      // The stableIntervalMicros attribute represents the number of microseconds it takes to create a token for the current current limiter
      // newPermits calculated indicates the total number of tokens created from the time when tokens can be created from the next time to the current time.
      double newPermits = (nowMicros - nextFreeTicketMicros) / coolDownIntervalMicros();
      // add cards to SmoothRateLimiter#storedPermits
      storedPermits = min(maxPermits, storedPermits + newPermits);
      // Update the next time you can start creating tokens to the current timenextFreeTicketMicros = nowMicros; }}Copy the code

3.3. Lazy loading idea

Through the analysis of GUAVA RateLimiter’s tryAcquire source code in Section 3.2, it can be obviously found that although GUAVA RateLimiter also adopts the token bucket method, it does not adopt a single thread as a provider to generate tokens regularly. Instead, a “lazy loading” approach is used to calculate how many tokens should be in the current token bucket based on the current timestamp and the timestamp at which the token started production, only when the token is requested:

Time interval for token creation =1Seconds/QPS// QPS is the number of tokens allowed to be consumed per secondNumber of tokens in the token bucket = math.min (capacity of the token bucket, (current time stamp - token production start timestamp)/token creation interval)Copy the code

After calculating the number of tokens in the tokenbucket, let the consumer consume them. If there are not enough tokens, let the consumer consume them earlier. Then calculate the time it takes to produce these insufficient tokens:

Time required to produce insufficient tokens = time interval between token creation * Insufficient tokensCopy the code

And then update the token to start producing the timestamp, namely before the token to start producing the timestamp of the token bucket is no token, only at {token to start producing the timestamp} before the token bucket is no token, only token to start production before the timestamp of the token bucket is no token, only to the timestamp} {token to start production, Produces the first token:

Timestamp to start token production = current timestamp + time needed to produce insufficient tokensCopy the code

If you then request the token count from this RateLimiter, it will determine whether the current time is before ${token production timestamp}, if so, no token will be requested.

As mentioned above, GUAVA RateLimiter’s approach is an embodiment of the idea of “lazy loading”. When a token is requested, the time stamp is used to calculate the number of tokens that should be in the current token bucket. This avoids the situation where the system has to allocate resources to RateLimiter to perform operations when RateLimiter is not working.

4. Modify our current limiter code with the idea of lazy loading

Modify our current limiter code above with the idea of lazy loading

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class MyRateLimiter implements IRateLimiter {

    private double tokenBucket;

    private final double qps;

    // The number of nanoseconds required to generate a token
    private final double nanoSecondsPerToken;

    // Start creating a nanosecond timestamp for the token
    private long timeBeginToCreateToken;

    /** * constructor *@paramQPS Number of requests per second */
    public MyRateLimiter(double qps) {
        this.qps =  qps;
        this.tokenBucket = qps;
        // The subtle number needed to generate a token
        this.nanoSecondsPerToken = 1 * 1000 * 1000 * 1000 / qps;
        this.timeBeginToCreateToken = System.nanoTime();
    }

    @Override
    public boolean tryAcquire(a) {
        synchronized (this) {
            long now = System.nanoTime();
            // Create a token
            double newToken = (now - timeBeginToCreateToken) / nanoSecondsPerToken;
            // Update the number of tokens in the token bucket
            tokenBucket = Math.min(qps, newToken + tokenBucket);
            // Update the time when token creation started
            timeBeginToCreateToken = now;

            if (tokenBucket > 0) {
                System.out.println("tokenBucket = " + tokenBucket);
                tokenBucket = tokenBucket - 1;
                return true;
            } else {
                return false; }}}Copy the code

5. To summarize

The idea of “lazy loading” in GUAVA RateLimiter’s implementation is worth learning. It can be used in everyday coding in many places, such as when a resource is not difficult to obtain, but takes up space after obtaining it, or after obtaining it, it will continue to consume system resources. In this case we can use “lazy loading”.