The value of rate limiting

Rate limiting is certainly valuable as a self-protection mechanism against the uncertainties of the outside world (burst traffic, unexpected users) when system resources run short. But its perceived value is low, because 99.99% of the time the system runs below the safety line and may not hit the limit even once a year. It is like the law: always there, but most of the time, for most people, practically invisible.

A software system tends to harbor many hidden bugs, and the most frequently used features tend to have the fewest of them. Rarely exercised features that have gone unnoticed for a long time stay hidden, waiting to explode. Moreover, as the system is updated iteratively, neglected features are more likely to be overlooked by testers and to fall out of coverage testing. As a result, when an unexpected scenario finally arrives (Murphy's law), this ignored, buggy code path is awakened and can cause the system to fail or even crash. The rate-limiting function is exactly this kind of rarely exercised feature.

To address this perception problem, engineers need to rehearse the rate-limiting functionality periodically, exercising it over and over with unit tests and stress tests. The bottom line is: do whatever it takes to trigger the rate limiter and reproduce the "else" branch that is almost never executed in production, as in the test sketch below.
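As a minimal illustration, the sketch below defines a trivial fixed-window counter and a test that deliberately pushes it past its threshold. The `SimpleLimiter` class and its `allow()` method are hypothetical names invented for this example, not a real library:

```python
import time

class SimpleLimiter:
    """Hypothetical fixed-window counter, for test illustration only."""
    def __init__(self, limit_per_second):
        self.limit = limit_per_second
        self.window = int(time.time())
        self.count = 0

    def allow(self):
        now = int(time.time())
        if now != self.window:          # new one-second window: reset the counter
            self.window, self.count = now, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False                    # the "else" branch we want to exercise

def test_limiter_rejects_excess_requests():
    limiter = SimpleLimiter(limit_per_second=100)
    results = [limiter.allow() for _ in range(150)]
    # the first 100 requests pass; the remaining 50 hit the rejection path
    assert results[:100] == [True] * 100
    assert results[100:] == [False] * 50
```

(The test assumes all 150 calls land in the same one-second window, which holds comfortably for pure in-memory calls.)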

Rate limiting algorithms

The two common basic rate-limiting algorithms in the industry are the leaky bucket algorithm and the token bucket algorithm, which are closely related. The leaky bucket can be understood as follows: each incoming request pours some water into the bucket, the bucket's capacity is limited, and water leaks out at a fixed, limited rate; when the bucket is full, further requests are rejected. The token bucket can be understood as follows: each request consumes one token, tokens are produced at a fixed, limited rate, and the bucket that stores them has a limited capacity.
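A minimal in-memory token bucket sketch in Python (the name `TokenBucket` is chosen for this example; it illustrates the algorithm above and is not a production implementation):

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens produced per second
        self.capacity = capacity    # maximum tokens the bucket can store
        self.tokens = capacity      # start full
        self.last = time.monotonic()

    def allow(self, n=1):
        now = time.monotonic()
        # refill the tokens accumulated since the last call, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```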

The request QPS curve of a production environment looks smooth because most monitoring systems apply smoothing (averaging over a time window), but actual traffic has a certain randomness and volatility, with occasional sudden outliers, generally called burst traffic.

A rate-limiting algorithm needs to account for this kind of burst traffic and tolerate it for a short period, but the tolerance is limited: the tolerated period must be very short. Both algorithms above have this tolerance, reflected in the spare capacity of the leaky bucket and the saved-up tokens in the token bucket. When the savings are exhausted, the tolerance has reached its upper limit, as the snippet below demonstrates.
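Using the `TokenBucket` sketch above, a burst is absorbed only as long as the saved tokens last:

```python
bucket = TokenBucket(rate=100, capacity=200)  # steady 100 QPS, burst headroom of 200

# a sudden burst of 250 requests arriving almost at once:
accepted = sum(bucket.allow() for _ in range(250))
print(accepted)  # roughly 200: saved tokens absorb the burst, the rest is rejected
```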

Distributed rate limiting

If your application is a single process, rate limiting is easy: the request-counting algorithm can live in memory, with almost no overhead, as a pure in-memory computation. But applications in the Internet world are distributed across multiple nodes, and the request-processing capacity of each node is not necessarily the same. What we need to control is the overall request-processing capacity of all the nodes together.

Just because a single process can handle 1W (10,000) QPS does not mean the overall capacity is N × 1W QPS, because the whole is also constrained by shared resources. The shared resource is usually the database, but it can also be the CPU and disk shared by multiple processes on the same machine; network bandwidth likewise limits the overall request volume.

The request counting then needs to be centralized in one place (rate-limiting middleware). Before processing a request, an application must first ask this centralized manager for quota (bucket capacity, tokens). Each request therefore costs one network round trip, from the application to the rate-limiting middleware.

For example, we can use Redis + Lua to implement this rate limiting, but Lua is much slower than C, so such a limiter typically tops out at around 1W QPS. You can also use the Redis-Cell module, implemented internally in Rust, which can reach a maximum of around 5W QPS. Both figures are at full capacity, and in a production environment we don't want them running at full capacity all the time.
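A sketch of the Redis + Lua approach using redis-py. This is a simple fixed-window counter rather than Redis-Cell's token bucket, and the key names and limits are made up for illustration:

```python
import redis

# Fixed-window counter in Lua: one atomic round trip per request.
# KEYS[1] = counter key, ARGV[1] = limit, ARGV[2] = window in seconds.
LUA = """
local n = redis.call('INCR', KEYS[1])
if n == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[2])
end
return n <= tonumber(ARGV[1]) and 1 or 0
"""

r = redis.Redis()                      # assumes a local Redis instance
allow_script = r.register_script(LUA)  # redis-py helper for EVAL/EVALSHA

def allow(key, limit=100, window=1):
    return allow_script(keys=[key], args=[limit, window]) == 1

# each call costs one network IO to the rate-limiting middleware
if allow("rate:user:42"):
    pass  # handle the request
else:
    pass  # reject: over the limit for this window
```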

So how do we achieve 10W QPS rate limiting?

A simple idea is to shard the rate-limiting key into buckets and scale out with a Redis cluster: the client hashes each limiting command into a bucket so that it lands on one of several cluster nodes, spreading the load.
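A sketch of client-side key sharding, building on the `allow` helper from the previous snippet; the shard count and key scheme are invented for illustration. The total quota is split evenly across sub-keys so each Redis node only sees a fraction of the traffic:

```python
import random

NUM_BUCKETS = 16  # assumed shard count; tune to the cluster size

def allow_sharded(key, total_limit=100_000, window=1):
    # pick a random sub-key so load spreads across cluster slots;
    # each sub-key carries an equal share of the total quota
    bucket = random.randrange(NUM_BUCKETS)
    sub_key = "%s:{%d}" % (key, bucket)  # Redis Cluster hash tag pins the slot per bucket
    return allow(sub_key, limit=total_limit // NUM_BUCKETS, window=window)
```

Random bucket selection keeps the per-shard sub-limits statistically accurate; a skewed assignment would under-admit on some shards and over-admit on others.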

So how do we do one million QPS (1M) rate limiting?

The bandwidth and Redis instances required by the approach above become staggering at this scale. We might need dozens of Redis nodes, plus hundreds of megabits of bandwidth (1M requests/s × 20 bytes × 8 bits/byte ≈ 160 Mbit/s), to pull it off.

At this point we have to shift our thinking away from centralized control. By analogy: when a country has too many people, it governs through provinces, cities, and counties, delegating authority downward.

We distribute the overall QPS across the nodes according to weight, and each child node performs single-machine rate limiting in its own memory. If the nodes are all equal, each child node gets 1/n of the total QPS.

The advantage is that the limiting pressure is dispersed and the IO operation becomes a pure in-memory computation, which easily copes with extremely high QPS limits. However, it also increases system complexity: it requires a centralized configuration center to distribute the QPS thresholds to the child nodes, each application node must register with this configuration center, and a management console is needed to administer each system's QPS allocation. A sketch of the node-side wiring follows.
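A minimal sketch of the node side, under loudly labeled assumptions: `fetch_my_qps_share` is a hypothetical stand-in for whatever registration and push mechanism the configuration center provides, and local limiting reuses the `TokenBucket` from earlier:

```python
import threading
import time

# Hypothetical config-center call: returns this node's share of the global QPS,
# e.g. total_qps * weight / sum_of_weights. Not a real API.
def fetch_my_qps_share(node_id):
    TOTAL_QPS, NODE_COUNT = 1_000_000, 50  # assumed: equal-weight nodes
    return TOTAL_QPS // NODE_COUNT

share = fetch_my_qps_share("node-1")
local_bucket = TokenBucket(rate=share, capacity=share * 2)

def refresh_quota_periodically(node_id, interval=30):
    # re-pull the share so the config center can rebalance weights at runtime
    while True:
        time.sleep(interval)
        s = fetch_my_qps_share(node_id)
        local_bucket.rate = s
        local_bucket.capacity = s * 2

threading.Thread(target=refresh_quota_periodically, args=("node-1",),
                 daemon=True).start()

def handle_request(req):
    if not local_bucket.allow():
        return "429 Too Many Requests"  # purely in-memory check, no network IO
    return "200 OK"
```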

Without an easy-to-use, mature open-source option, such a control-center service and SDK often takes a small team to build. That is frequently unaffordable for small and medium-sized enterprises, especially given the low perceived value of rate limiting.