Leaky bucket and token bucket algorithms: ideas and usage scenarios
Before introducing RateLimiter, let's look at the leaky bucket algorithm and the token bucket algorithm, and the idea and application scenario behind each:
Leaky bucket algorithm:
As the figure above shows, the leaky bucket algorithm puts requests into a bucket and takes them out at a fixed rate. When the number of requests waiting in the bucket exceeds the upper limit (the bucket capacity is fixed), subsequent requests are not added to the bucket and a rejection strategy (such as downgrading) is applied.
This approach suits scenarios that require a strictly fixed rate. Most business scenarios do not need a strict rate but do need some capacity to absorb sudden traffic, which is why the token bucket algorithm is used for rate limiting instead.
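As a minimal sketch (illustrative names, not taken from any library), a leaky bucket can be modeled as a counter drained at a fixed rate, rejecting arrivals once the bucket is full:

```java
public class LeakyBucket {
    private final double leakPerSecond; // fixed drain rate
    private final double capacity;      // fixed bucket capacity
    private double water;               // requests currently waiting in the bucket
    private long lastLeakMicros;

    public LeakyBucket(double leakPerSecond, double capacity) {
        this.leakPerSecond = leakPerSecond;
        this.capacity = capacity;
    }

    /** Returns true if the request is admitted, false if it must be rejected (e.g. downgraded). */
    public synchronized boolean tryAdmit(long nowMicros) {
        // Drain the bucket according to the time elapsed since the last call
        double leaked = (nowMicros - lastLeakMicros) * leakPerSecond / 1_000_000.0;
        water = Math.max(0.0, water - leaked);
        lastLeakMicros = nowMicros;
        if (water + 1.0 > capacity) {
            return false; // bucket full: apply the rejection strategy
        }
        water += 1.0;
        return true;
    }
}
```

Note that no matter how bursty the arrivals are, the drain side never exceeds leakPerSecond, which is exactly the "fixed rate" property described above.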
Token bucket algorithm:
As the figure above shows, the token bucket algorithm generates tokens into a bucket at a fixed rate, and each request must take a token from the bucket. A request that cannot get a token is blocked or rate-limited (this happens when there are not enough tokens in the bucket). When the token consumption rate is lower than the generation rate, the bucket stores the unused tokens (up to the bucket's capacity), so when a burst of traffic arrives, tokens can be taken directly from the bucket without being throttled.
Both the token bucket and the leaky bucket algorithm can be implemented with delayed calculation. Delayed calculation means no separate thread is needed to generate tokens or to drain requests from the bucket on a schedule; instead, the thread calling the rate limiter computes whether enough tokens are available and how long it must sleep. This saves a thread's worth of resources. The RateLimiter class provided by Guava implements rate limiting in this delayed-calculation style.
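To make the delayed-calculation idea concrete, here is a minimal sketch of a delay-based token bucket in plain Java. It mirrors the idea, not Guava's actual code; the class name is illustrative, and the fields are named to echo the RateLimiter fields discussed below:

```java
public class LazyTokenBucket {
    private final double permitsPerSecond;
    private final double maxPermits;     // bucket capacity
    private double storedPermits;        // unused tokens saved in the bucket
    private long nextFreeTicketMicros;   // earliest time the next request may proceed

    public LazyTokenBucket(double permitsPerSecond, double maxBurstSeconds) {
        this.permitsPerSecond = permitsPerSecond;
        this.maxPermits = permitsPerSecond * maxBurstSeconds;
    }

    /** Returns the microseconds the caller should sleep before proceeding. */
    public synchronized long reserve(int permits, long nowMicros) {
        // Lazily credit the tokens generated since the last call; no producer thread exists
        if (nowMicros > nextFreeTicketMicros) {
            double newPermits = (nowMicros - nextFreeTicketMicros) * permitsPerSecond / 1_000_000.0;
            storedPermits = Math.min(maxPermits, storedPermits + newPermits);
            nextFreeTicketMicros = nowMicros;
        }
        long waitMicros = Math.max(nextFreeTicketMicros - nowMicros, 0);
        double spend = Math.min(permits, storedPermits);
        double fresh = permits - spend;  // tokens that must still be produced
        storedPermits -= spend;
        // Push the cost of producing fresh tokens onto the NEXT request
        nextFreeTicketMicros += (long) (fresh * 1_000_000.0 / permitsPerSecond);
        return waitMicros;
    }
}
```

With 10 permits per second, the first call returns 0 (it takes its token in advance), a second call at the same instant must wait one token interval (100,000 microseconds), and after an idle stretch the accumulated stock absorbs a burst with no wait at all.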
RateLimiter
Implementation principle
Take a look at the RateLimiter class diagram:
The class diagram shows RateLimiter as the abstract base class, SmoothRateLimiter extending it, and SmoothBursty and SmoothWarmingUp as SmoothRateLimiter's two concrete subclasses; SmoothWarmingUp is an upgrade of SmoothBursty (see below for details). This section uses acquire() as the entry point and starts with the RateLimiter class.
acquire()
Principle analysis
The whole rate-limiting process divides into four steps: producing tokens, acquiring tokens, calculating the blocking time, and blocking the thread. That RateLimiter is abstract indicates the common logic has been extracted into it; the common part is blocking the thread, and that logic lives in acquire(). The details of producing tokens, acquiring tokens, and calculating the blocking time are delegated to the SmoothRateLimiter subclasses. Let's start with acquire():
@CanIgnoreReturnValue
public double acquire() {
    // Acquire one token
    return acquire(1);
}

public double acquire(int permits) {
    // Reserve the tokens and get the time the thread needs to block
    long microsToWait = reserve(permits);
    // Sleep the thread for microsToWait (common logic)
    stopwatch.sleepMicrosUninterruptibly(microsToWait);
    return 1.0 * microsToWait / SECONDS.toMicros(1L);
}

final long reserve(int permits) {
    checkPermits(permits);
    synchronized (mutex()) {
        return reserveAndGetWaitLength(permits, stopwatch.readMicros());
    }
}

final long reserveAndGetWaitLength(int permits, long nowMicros) {
    long momentAvailable = reserveEarliestAvailable(permits, nowMicros);
    return max(momentAvailable - nowMicros, 0);
}

// The details of producing tokens, acquiring tokens, and calculating the blocking time are implemented by subclasses
abstract long reserveEarliestAvailable(int permits, long nowMicros);
reserveEarliestAvailable(int permits, long nowMicros) is implemented in SmoothRateLimiter. Before reading that implementation, let's go over SmoothRateLimiter's key fields:
- nextFreeTicketMicros: the time at which the next request is allowed. When tokens are insufficient, the current requesting thread calculates the number of tokens that must still be produced and how long that takes, and updates this value. Even when a wait is required, the current thread does not block for it; it takes the tokens in advance, and the cost of that advance is passed on to the next request. The goal is to reduce thread blocking; see the source code analysis below.
- stableIntervalMicros: the number of microseconds it takes to generate one token, derived from the permitsPerSecond passed to the constructor (1 second = 1,000,000 microseconds).
- maxPermits: the maximum number of tokens allowed in the bucket.
- storedPermits: the number of unconsumed tokens currently cached in the bucket. When the token consumption rate is lower than the generation rate, the bucket starts to accumulate tokens, but never more than maxPermits.
With these fields in mind, the core logic of the reserveEarliestAvailable(int permits, long nowMicros) method is as follows:
@Override
final long reserveEarliestAvailable(int requiredPermits, long nowMicros) {
    // 1. Calculate the number of tokens generated since nextFreeTicketMicros and update the currently unused storedPermits
    resync(nowMicros);
    long returnValue = nextFreeTicketMicros;
    // 2. Calculate the blocking wait time
    // 2.1 Take unconsumed tokens from the bucket first; if the bucket does not have enough, take as many as it has
    double storedPermitsToSpend = min(requiredPermits, this.storedPermits);
    // 2.2 Calculate whether new tokens must be waited for (when the bucket's stock is insufficient) and, if so, how many
    double freshPermits = requiredPermits - storedPermitsToSpend;
    // 2.3 Total wait = cost of taking storedPermitsToSpend stocked tokens + cost of producing freshPermits new tokens.
    //     storedPermitsToWaitTime is abstract: SmoothBursty and SmoothWarmingUp price stocked tokens differently.
    //     Cost of producing new tokens = freshPermits * stableIntervalMicros (time per token)
    long waitMicros =
        storedPermitsToWaitTime(this.storedPermits, storedPermitsToSpend)
            + (long) (freshPermits * stableIntervalMicros);
    // 3. Update nextFreeTicketMicros
    this.nextFreeTicketMicros = LongMath.saturatedAdd(nextFreeTicketMicros, waitMicros);
    // 4. Deduct the tokens taken and update the remaining tokens in the bucket
    this.storedPermits -= storedPermitsToSpend;
    // returnValue was captured before nextFreeTicketMicros was advanced, so the current cost is borne by the next thread
    return returnValue;
}
// Count the number of new tokens generated from nextFreeTicketMicros up to the current time
void resync(long nowMicros) {
    // if nextFreeTicket is in the past, resync to now
    if (nowMicros > nextFreeTicketMicros) {
        // coolDownIntervalMicros() is abstract; each subclass implements it differently
        double newPermits = (nowMicros - nextFreeTicketMicros) / coolDownIntervalMicros();
        // Update the bucket's stock, capped at the maximum
        storedPermits = min(maxPermits, storedPermits + newPermits);
        // Update nextFreeTicketMicros to the current time
        nextFreeTicketMicros = nowMicros;
    }
}

/**
 * Cost of taking permitsToTake tokens from the bucket's stock, implemented by subclasses.
 * <p>This always holds: {@code 0 <= permitsToTake <= storedPermits}
 */
abstract long storedPermitsToWaitTime(double storedPermits, double permitsToTake);

/** Time to wait during cooldown to produce one new token, implemented by subclasses */
abstract double coolDownIntervalMicros();
Summary of details:
- This method implements the logic of producing tokens, acquiring tokens, and calculating the blocking time; producing and acquiring tokens are common logic. The blocking time is split in two: total blocking time = cost of taking storedPermitsToSpend stocked tokens from the bucket + cost of waiting for freshPermits new tokens to be produced. For both subclasses the cost of producing new tokens is the same; only the cost of taking stocked tokens differs, so the abstract method storedPermitsToWaitTime(double storedPermits, double permitsToTake) is implemented by the two subclasses, SmoothBursty and SmoothWarmingUp, which are analyzed separately below.
- The blocking cost of acquiring tokens is transferred to the next thread (see the detailed code analysis above). That is, when the current thread requests tokens and a wait is required, the wait is borne by the next thread. The goal is to reduce the number of blocked threads: the arrival time of the next request is uncertain, and it may come so much later that the tokens produced in the interval already satisfy it, so no blocking is needed at all. The devil is in the details.
- Working down from RateLimiter and SmoothRateLimiter to the core method acquire(), the core difference between the subclasses lies in the cost of taking stocked tokens from the bucket and the cost of producing new tokens, so those two concerns are made abstract and implemented by the subclasses. This ability to abstract is a mark of code craftsmanship.
Next, let's look at how the two subclasses implement the abstract long storedPermitsToWaitTime(double storedPermits, double permitsToTake) and abstract double coolDownIntervalMicros() methods.
SmoothBursty
SmoothBursty is the token bucket algorithm implementation and can handle sudden traffic (Bursty means sudden). Handling burst traffic has a precondition: only when the bucket has stocked tokens will the corresponding burst be let through, and those stocked tokens are saved up during periods of low traffic. If the system has been under high traffic and the bucket holds no stock, then when a burst arrives it can only be released at the fixed rate.
Therefore, in this class there is no extra cost to take stocked tokens from the bucket. When the bucket holds enough tokens to satisfy the requesting thread, the thread is not blocked at all, which is what makes absorbing bursts possible. However, the stock in the bucket has an upper limit, set via the constructor.
Constructor source code analysis:
public static RateLimiter create(double permitsPerSecond) {
    /*
     * The default RateLimiter configuration can save the unused permits of up to one second. This
     * is to avoid unnecessary stalls in situations like this: A RateLimiter of 1qps, and 4 threads,
     * all calling acquire() at these moments:
     *
     * T0 at 0 seconds
     * T1 at 1.05 seconds
     * T2 at 2 seconds
     * T3 at 3 seconds
     *
     * Due to the slight delay of T1, T2 would have to sleep till 2.05 seconds, and T3 would also
     * have to sleep till 3.05 seconds.
     */
    return create(permitsPerSecond, SleepingStopwatch.createFromSystemTimer());
}

@VisibleForTesting
static RateLimiter create(double permitsPerSecond, SleepingStopwatch stopwatch) {
    // The bucket's stock limit defaults to the number of tokens generated in 1 second
    RateLimiter rateLimiter = new SmoothBursty(stopwatch, 1.0 /* maxBurstSeconds */);
    rateLimiter.setRate(permitsPerSecond);
    return rateLimiter;
}
permitsPerSecond is the number of tokens generated per second, stopwatch is the system timer, and new SmoothBursty(stopwatch, 1.0 /* maxBurstSeconds */) hardcodes the bucket's maximum stock to one second's worth of tokens. For example, if permitsPerSecond=10 is passed in, the token bucket stores at most 10 tokens.
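As a quick numeric illustration (a hand-computed sketch, not Guava code), with permitsPerSecond = 10 and maxBurstSeconds hardcoded to 1.0, the stock is capped at 10 tokens no matter how long the limiter sits idle, and a burst of 20 permits costs one extra second of production:

```java
public class BurstCapDemo {
    // Stock after `idleSeconds` of no traffic, capped at one second's worth of tokens
    static double stockAfterIdle(double permitsPerSecond, double maxBurstSeconds, double idleSeconds) {
        double maxPermits = permitsPerSecond * maxBurstSeconds;
        return Math.min(maxPermits, idleSeconds * permitsPerSecond);
    }

    // Seconds of production a burst must pay for once the stock is spent
    static double burstWaitSeconds(double permitsPerSecond, double stock, int burstPermits) {
        double fresh = Math.max(0, burstPermits - stock);
        return fresh / permitsPerSecond;
    }

    public static void main(String[] args) {
        double stock = stockAfterIdle(10.0, 1.0, 5.0);
        System.out.println(stock);                             // 10.0, not 50.0
        System.out.println(burstWaitSeconds(10.0, stock, 20)); // 1.0
    }
}
```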
abstract long storedPermitsToWaitTime(double storedPermits, double permitsToTake)
Source code analysis
Taking stocked tokens from the bucket incurs no extra wait time, so 0 is always returned:
@Override
long storedPermitsToWaitTime(double storedPermits, double permitsToTake) {
return 0L;
}
abstract double coolDownIntervalMicros()
Source code analysis
This always returns stableIntervalMicros, which, as discussed above, is derived from the permitsPerSecond passed to the constructor and expressed in microseconds:
@Override
double coolDownIntervalMicros() {
    return stableIntervalMicros;
}
SmoothWarmingUp
This class supports warming up. Warming up targets a cooled-down system: when traffic is low, the system's thread pools release surplus threads, connection pools release surplus connections, and caches expire, so the system cools down. If full-load or burst traffic were released into a cooled-down system, the pressure would spike and problems would easily appear. This is a weakness of SmoothBursty: in its logic, when the system is cold (low traffic) the bucket's stock grows, and when full-load or burst traffic arrives, SmoothBursty lets it all through. So we cannot simply release traffic based on whether the bucket has stocked tokens; another dimension is needed: how warm or cold the system is.
Simply put, the lower the traffic, the more tokens accumulate in the bucket (production outpaces consumption) and the colder the system is, so the token production rate is lowered at that point, achieving the warm-up effect. Let's look at the key ideas behind this warm-up mechanism with the help of the figure:
First, the key parameters in the figure below:
- coldIntervalMicros: the token production interval when the system is coldest (when the time per token is at its maximum). How this value is computed is analyzed later.
- stableIntervalMicros: the number of microseconds it takes to generate one token in the stable phase, derived from the permitsPerSecond passed to the constructor (1 second = 1,000,000 microseconds).
- maxPermits: the maximum number of tokens allowed in the bucket.
- storedPermits: the number of unconsumed tokens currently cached in the bucket. When the token consumption rate is lower than the generation rate, the bucket accumulates tokens, but never more than maxPermits; the higher this value, the colder the system.
- thresholdPermits: the threshold. When the bucket's stock storedPermits exceeds this value, the system is considered cooled down and enters the warm-up phase, where producing a single token takes longer; otherwise the system is in the warm phase and tokens are produced at the normal rate.
Implementation idea: the X-axis represents the number of tokens currently in the bucket, and the Y-axis represents the time it takes to produce a single token. When the number of tokens in the bucket exceeds thresholdPermits, the system enters the warm-up phase and the per-token time on the Y-axis grows; when the stock reaches maxPermits, the system is at its coldest and a single token takes the longest, which achieves the warm-up effect. With this idea in mind, let's look at the source code:
Let’s look at the constructor first:
public static RateLimiter create(double permitsPerSecond, long warmupPeriod, TimeUnit unit) {
    checkArgument(warmupPeriod >= 0, "warmupPeriod must not be negative: %s", warmupPeriod);
    return create(
        permitsPerSecond, warmupPeriod, unit, 3.0, SleepingStopwatch.createFromSystemTimer());
}

static RateLimiter create(
    double permitsPerSecond,
    long warmupPeriod,
    TimeUnit unit,
    double coldFactor,
    SleepingStopwatch stopwatch) {
    RateLimiter rateLimiter = new SmoothWarmingUp(stopwatch, warmupPeriod, unit, coldFactor);
    rateLimiter.setRate(permitsPerSecond);
    return rateLimiter;
}
A package-private constructor is invoked with the following parameters:
- permitsPerSecond: the stable-phase rate, that is, the number of tokens generated per second in the stable phase.
- warmupPeriod: the warm-up duration.
- unit: the time unit of warmupPeriod.
- coldFactor: the cooling factor, fixed here at 3.0. SmoothWarmingUp does not expose a constructor that can change it (the only constructor that accepts it is package-private and cannot be called by external classes). This parameter determines the coldInterval value shown in the figure above; 3.0 means that at the system's coldest, producing one token takes three times as long as in the stable phase. For example, with a stable rate of permitsPerSecond=10 (10 tokens per second, so 0.1 seconds per token), each token takes 0.3 seconds at the very start of warm-up.
- stopwatch: can be understood as a timer; it records the limiter's timing information, from which token production and consumption are calculated.
SmoothWarmingUp(
    SleepingStopwatch stopwatch, long warmupPeriod, TimeUnit timeUnit, double coldFactor) {
    super(stopwatch);
    // Convert warmupPeriod to microseconds and assign it to warmupPeriodMicros
    this.warmupPeriodMicros = timeUnit.toMicros(warmupPeriod);
    this.coldFactor = coldFactor;
}
The factory method then calls RateLimiter.setRate(permitsPerSecond), which ends up in doSetRate and initializes the important rate-related parameters based on the arguments passed above:
void doSetRate(double permitsPerSecond, double stableIntervalMicros) {
    double oldMaxPermits = maxPermits;
    // Token production interval when the system is coldest, fixed at 3 times the stable interval (coldFactor is fixed at 3.0)
    double coldIntervalMicros = stableIntervalMicros * coldFactor;
    // The threshold defaults to half the number of tokens the stable rate would produce during the warm-up period.
    // Too small, and the warm-up phase kicks in too early, hurting performance; too large, and it puts pressure on the system
    thresholdPermits = 0.5 * warmupPeriodMicros / stableIntervalMicros;
    // Maximum number of tokens
    maxPermits =
        thresholdPermits + 2.0 * warmupPeriodMicros / (stableIntervalMicros + coldIntervalMicros);
    // See the analysis below for the calculation logic
    slope = (coldIntervalMicros - stableIntervalMicros) / (maxPermits - thresholdPermits);
    // Set the current stock in the bucket; at initialization the system is considered at its coldest,
    // so storedPermits defaults to maxPermits
    if (oldMaxPermits == Double.POSITIVE_INFINITY) {
        // if we don't special-case this, we would get storedPermits == NaN, below
        storedPermits = 0.0;
    } else {
        storedPermits =
            (oldMaxPermits == 0.0)
                ? maxPermits // initial state is cold
                : storedPermits * maxPermits / oldMaxPermits;
    }
}
Summary of details: one thing to note is that by default the limiter is initialized with the system at its coldest, so the bucket's stock of tokens equals maxPermits, and the figure then looks like this:
The core parameters in the code:
- warmupPeriodMicros: the warmupPeriod constructor argument converted to microseconds; keeping time in microseconds makes it more precise.
- coldIntervalMicros: the token production interval when the system is coldest; coldIntervalMicros = stableIntervalMicros * coldFactor.
- maxPermits: maxPermits = thresholdPermits + 2.0 * warmupPeriodMicros / (stableIntervalMicros + coldIntervalMicros). The term 2.0 * warmupPeriodMicros / (stableIntervalMicros + coldIntervalMicros) is the number of tokens produced during the warm-up phase in the figure above. It uses middle-school physics (average speed under uniform acceleration): over a uniformly accelerated process, the average of the whole process = (initial value + final value) / 2. In the figure the initial interval is stableIntervalMicros and the final interval is coldIntervalMicros; knowing the average interval and the total warm-up time warmupPeriodMicros, the number of tokens produced during warm-up = 2 * warmupPeriodMicros / (stableIntervalMicros + coldIntervalMicros).
- slope: the slope, in microseconds per token. During warm-up, production accelerates uniformly: each successive token takes slope fewer microseconds than the one before. Expressed mathematically, the per-token interval climbs from stableIntervalMicros to coldIntervalMicros over the (maxPermits - thresholdPermits) tokens of the warm-up segment, so slope = (coldIntervalMicros - stableIntervalMicros) / (maxPermits - thresholdPermits).
- storedPermits: at initialization the system is treated as being at its coldest by default, so storedPermits = maxPermits.
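To ground these formulas, here is a small computation with illustrative values of my own choosing (permitsPerSecond = 10, a 2-second warm-up, coldFactor = 3.0), not numbers from the article's figure:

```java
public class WarmupParamsDemo {
    static double thresholdPermits(double warmupPeriodMicros, double stableIntervalMicros) {
        return 0.5 * warmupPeriodMicros / stableIntervalMicros;
    }

    static double maxPermits(double warmupPeriodMicros, double stableIntervalMicros, double coldFactor) {
        double coldIntervalMicros = stableIntervalMicros * coldFactor;
        return thresholdPermits(warmupPeriodMicros, stableIntervalMicros)
            + 2.0 * warmupPeriodMicros / (stableIntervalMicros + coldIntervalMicros);
    }

    static double slope(double warmupPeriodMicros, double stableIntervalMicros, double coldFactor) {
        double coldIntervalMicros = stableIntervalMicros * coldFactor;
        return (coldIntervalMicros - stableIntervalMicros)
            / (maxPermits(warmupPeriodMicros, stableIntervalMicros, coldFactor)
                - thresholdPermits(warmupPeriodMicros, stableIntervalMicros));
    }

    public static void main(String[] args) {
        double stable = 1_000_000.0 / 10.0; // permitsPerSecond = 10 -> 100,000 us per token
        double warmup = 2_000_000.0;        // 2-second warm-up period
        System.out.println(thresholdPermits(warmup, stable)); // 10.0
        System.out.println(maxPermits(warmup, stable, 3.0));  // 20.0
        System.out.println(slope(warmup, stable, 3.0));       // 20000.0 us per stocked token
    }
}
```

So with these numbers, the bucket holds 20 tokens at most, warm-up applies to the 10 tokens above the threshold of 10, and each stocked token above the threshold adds 20,000 microseconds to the per-token interval.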
abstract long storedPermitsToWaitTime(double storedPermits, double permitsToTake)
Source code analysis
Here the time to take tokens is split into two parts: warm-up-phase tokens and stable-phase tokens, and a single request may mix the two. For example, suppose 4 tokens are requested while the bucket holds 22 and the threshold is 20: two of the four are warm-up-phase tokens and two are stable-phase tokens. Drawn out, it looks like this:
The blue area represents the four tokens to be taken, spanning the two phases, whose times must be calculated separately:
- Warm-up-phase time: using the formula from the maxPermits calculation above (under uniform acceleration, average = (initial value + final value) / 2), total time = average interval * token count = (initial interval + final interval) * token count / 2.
- Stable-phase time = the fixed interval stableIntervalMicros * token count.
With the idea in place, the source code:
@Override
long storedPermitsToWaitTime(double storedPermits, double permitsToTake) {
    // Check how far the current stock is above thresholdPermits
    double availablePermitsAboveThreshold = storedPermits - thresholdPermits;
    long micros = 0;
    // A positive value means the system has cooled down and the warm-up segment applies;
    // calculate the time for the warm-up-phase tokens
    if (availablePermitsAboveThreshold > 0.0) {
        // How many of the requested tokens come from above the threshold
        double permitsAboveThresholdToTake = min(availablePermitsAboveThreshold, permitsToTake);
        // Time spent in the warm-up segment
        double length =
            // Interval at the start (current stock)
            permitsToTime(availablePermitsAboveThreshold)
                // Interval at the end (after taking the warm-up tokens)
                + permitsToTime(availablePermitsAboveThreshold - permitsAboveThresholdToTake);
        // Total time = ((initial interval + end interval) * token count) / 2
        micros = (long) (permitsAboveThresholdToTake * length / 2.0);
        permitsToTake -= permitsAboveThresholdToTake;
    }
    // Add the stable-phase tokens' time to the total
    micros += (long) (stableIntervalMicros * permitsToTake);
    return micros;
}

// The math here: each additional stocked token adds slope microseconds to the per-token interval.
// Given the base interval stableIntervalMicros, the interval at a stock of `permits` tokens is:
// time = stableIntervalMicros + permits * slope
private double permitsToTime(double permits) {
    return stableIntervalMicros + permits * slope;
}
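Plugging in the earlier example (4 permits requested, 22 stocked, threshold 20) with the illustrative constants stableIntervalMicros = 100,000 and slope = 20,000 (assumed numbers, not from the article's figure), the wait splits into a warm-up part and a stable part. The sketch below re-implements the same trapezoid logic with those constants baked in:

```java
public class WarmupWaitDemo {
    static final double STABLE_INTERVAL_MICROS = 100_000.0; // assumed: permitsPerSecond = 10
    static final double SLOPE = 20_000.0;                   // assumed slope, in us per stocked token
    static final double THRESHOLD_PERMITS = 20.0;

    static double permitsToTime(double permits) {
        return STABLE_INTERVAL_MICROS + permits * SLOPE;
    }

    // Same trapezoid logic as storedPermitsToWaitTime, using the assumed constants above
    static long waitMicros(double storedPermits, double permitsToTake) {
        double aboveThreshold = storedPermits - THRESHOLD_PERMITS;
        long micros = 0;
        if (aboveThreshold > 0.0) {
            double takeAbove = Math.min(aboveThreshold, permitsToTake);
            double length = permitsToTime(aboveThreshold)
                + permitsToTime(aboveThreshold - takeAbove);
            micros = (long) (takeAbove * length / 2.0);
            permitsToTake -= takeAbove;
        }
        micros += (long) (STABLE_INTERVAL_MICROS * permitsToTake);
        return micros;
    }

    public static void main(String[] args) {
        // 2 warm-up tokens: ((100,000 + 2 * 20,000) + 100,000) * 2 / 2 = 240,000 us
        // 2 stable tokens:  2 * 100,000                              = 200,000 us
        System.out.println(waitMicros(22.0, 4.0)); // 440000
    }
}
```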
abstract double coolDownIntervalMicros()
Source code analysis
@Override
double coolDownIntervalMicros() {
    // Warm-up duration / maximum number of tokens
    return warmupPeriodMicros / maxPermits;
}
Why does SmoothWarmingUp#coolDownIntervalMicros() effectively return the fixed stable interval stableIntervalMicros? Substitute the formulas:
- maxPermits = thresholdPermits + 2.0 * warmupPeriodMicros / (stableIntervalMicros + coldIntervalMicros)
- thresholdPermits = 0.5 * warmupPeriodMicros / stableIntervalMicros
- coldIntervalMicros = stableIntervalMicros * coldFactor, with coldFactor fixed at 3.0
maxPermits then expands to (0.5 * warmupPeriodMicros / stableIntervalMicros) + 2.0 * warmupPeriodMicros / (stableIntervalMicros + stableIntervalMicros * 3.0), which simplifies to (0.5 * warmupPeriodMicros / stableIntervalMicros) + (0.5 * warmupPeriodMicros / stableIntervalMicros), giving maxPermits = warmupPeriodMicros / stableIntervalMicros. Therefore warmupPeriodMicros / maxPermits = stableIntervalMicros: the method simply returns the stable interval, after quite a detour ~_~
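A quick numeric check of this identity, using the same illustrative numbers as before (a 2-second warm-up period and stableIntervalMicros = 100,000):

```java
public class CoolDownIdentityDemo {
    // Re-derives warmupPeriodMicros / maxPermits with coldFactor fixed at 3.0
    static double coolDownIntervalMicros(double warmupPeriodMicros, double stableIntervalMicros) {
        double coldIntervalMicros = stableIntervalMicros * 3.0; // coldFactor fixed at 3.0
        double thresholdPermits = 0.5 * warmupPeriodMicros / stableIntervalMicros;
        double maxPermits =
            thresholdPermits + 2.0 * warmupPeriodMicros / (stableIntervalMicros + coldIntervalMicros);
        return warmupPeriodMicros / maxPermits;
    }

    public static void main(String[] args) {
        // Always equals stableIntervalMicros, regardless of the warm-up period
        System.out.println(coolDownIntervalMicros(2_000_000.0, 100_000.0)); // 100000.0
        System.out.println(coolDownIntervalMicros(5_000_000.0, 250_000.0)); // 250000.0
    }
}
```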
Conclusion
From the analysis above: in SmoothBursty, stocked tokens in the bucket can be used directly with no extra cost, in order to absorb some burst traffic. But that stock was left over from an earlier low-traffic period; if traffic has been at full load the whole time, there is no stock, and when a burst arrives it can only pass at the fixed rate. By default the bucket holds at most the tokens generated within 1 second. For example, with QPS set to 10, the bucket holds at most 10 tokens; a burst at QPS=20 drains it in about 1 second, after which the limiter throttles again.
SmoothWarmingUp, on the other hand, makes up for SmoothBursty by considering a more complex scenario. It divides the system into a warm phase and a cold phase. For a warm system, full-load or even burst traffic may be harmless, because its thread pools, caches, and connection pools are all warmed up and can take the pressure; for a cold system, full or burst traffic raises the pressure and leads to all sorts of problems. So the idea of warming up is added to control traffic while the system is cold, and the system's warmth is judged by the number of unconsumed tokens stocked in the bucket: when the system is cold, that is, when traffic is low, token consumption is slow and the bucket's stock rises past thresholdPermits. The idea is very clever, and the code's abstraction is thorough: the commonality of the two limiters is extracted as far as possible, and the differing logic is implemented by each subclass. There is almost nothing that cannot be solved by adding a layer of abstraction; the key is the ability to spot the abstraction.
Both limiters share the idea of taking tokens in advance: the cost of the current thread's tokens (the blocking time) is borne by the next thread. This reduces the probability of blocking, because the next request does not necessarily arrive immediately; it may come much later, by which time the tokens produced in the interval already satisfy it, and no blocking is needed at all.