In a microservice architecture based on Spring Cloud, a rate limiting function needs to be added at the gateway. For example, the access frequency of a specific interface from a specified IP address is limited to 100 requests per second.
The general principle is: on the basis of meeting the requirements, keep the implementation simple and easy to maintain.
The infrastructure of the entire platform is as follows:
Nginx -> [gateway1, gateway2, …] -> [serviceA1, serviceA2, serviceB1, …]
1. Single-node traffic limiting based on memory
A: The first option to consider is in-memory single-node rate limiting, which has the advantages of being simple to implement and performing well;
Q: However, to improve the availability and performance of the system, multiple gateway instances need to be deployed, and they cannot share memory;
A: Suppose a rate limiting policy is defined that limits the access frequency of interface A to 100 requests per second. If two gateways are deployed and load balancing is configured on Nginx, the limit on each gateway can be set to 50 requests per second, which basically meets the requirement.
Q: However, if another gateway instance needs to be added, or if one of the two deployed gateway instances fails, the limiting policy is no longer satisfied.
A: In this case, there needs to be a mechanism for sensing how many gateway instances are currently healthy. Since the platform is based on Spring Cloud, there must be a service registry. Taking Consul as an example, the rate limiting policy can be saved in Consul's key/value store. By invoking the registry's interface at a certain frequency (for example, every 30 seconds), each gateway can learn the number of gateway instances currently in a healthy state (say N) and dynamically adjust its own limit to 100/N requests per second.
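As a rough illustration of the dynamic adjustment above, here is a minimal Python sketch. The names and the hard-coded policy value are assumptions for the example; in the real setup the instance count would come from a periodic call to the Consul health API:

```python
import math

# Global policy value, assumed to be stored in the registry's key/value store.
GLOBAL_LIMIT = 100  # max requests per second for interface A

def per_instance_limit(global_limit, healthy_instances):
    """Divide the global limit evenly across healthy gateway instances.

    Rounds down so the cluster-wide total never exceeds the global limit;
    falls back to the full limit if the instance count is unknown (<= 0)."""
    if healthy_instances <= 0:
        return global_limit
    return math.floor(global_limit / healthy_instances)

# Every 30 seconds each gateway would re-query the registry and recompute:
limit_two_gateways = per_instance_limit(GLOBAL_LIMIT, 2)    # 50 req/s each
limit_three_gateways = per_instance_limit(GLOBAL_LIMIT, 3)  # 33 req/s each
```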
Q: When a gateway instance is added or crashes, the limiting policy will be inaccurate for a short period of time (for example, 30s). However, given that such events are rare and the polling interval can be shortened, this would not be a problem if the requirements are not too stringent.
Q: Another problem is that this implementation assumes requests are distributed evenly across the gateways. For example, if request forwarding is configured on Nginx with gateway 1 at weight 3 and gateways 2 and 3 at weight 1 each, then gateway 1's policy should cap it at 60 requests per second while gateways 2 and 3 get 20 each. In other words, the gateway limiting policy is coupled to the Nginx configuration, which is not a reasonable design. Moreover, if gateway 3 fails unexpectedly, how to adjust the policies of gateways 1 and 2 becomes complicated.
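To make the coupling concrete, here is a sketch of splitting the global limit in proportion to the Nginx upstream weights. The gateway names and weights are just the example's assumptions:

```python
def weighted_limits(global_limit, weights):
    """Split a global per-second limit in proportion to nginx upstream weights."""
    total = sum(weights.values())
    # Integer division keeps the cluster-wide sum at or below the global limit.
    return {gw: global_limit * w // total for gw, w in weights.items()}

limits = weighted_limits(100, {"gateway1": 3, "gateway2": 1, "gateway3": 1})
# -> gateway1: 60, gateway2: 20, gateway3: 20

# If gateway3 fails, every remaining gateway's policy must be recomputed:
limits_after_failure = weighted_limits(100, {"gateway1": 3, "gateway2": 1})
# -> gateway1: 75, gateway2: 25
```

Any change to the Nginx weights silently invalidates the per-gateway policies, which is exactly the coupling problem described above.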
2. Distributed traffic limiting (rate limiting as a separate RPC service)
A: Encapsulate the rate limiting function as a separate RPC service. After receiving a request, the gateway queries the interface provided by the rate limiting service and permits or rejects the request based on the returned result.
Q: This way, a rate limiting service needs to be deployed first, which increases the operation and maintenance cost. In addition, each request incurs one more network round trip (gateway to rate limiting service), so the RPC communication between the gateway and the rate limiting service is a likely performance bottleneck. If the rate limiting service exposes a plain HTTP interface, performance is expected to be poor; if it exposes a binary protocol interface (such as Thrift), then some extra client code has to be written in the gateway (which is, after all, developed on Spring Cloud and WebFlux).
Overall, this is an implementation worth trying. Alibaba's open source flow-control system Sentinel supports both distributed and in-memory rate limiting, which feels like a good choice. (I have only read the general introduction, not studied it in depth.)
3. Distributed traffic limiting based on Redis
A: Use Redis's single-threaded execution model together with Lua scripts to implement distributed rate limiting. When requests from multiple gateways reach Redis, they are executed sequentially inside Redis, so there is no concurrency problem. A single request involves multiple Redis operations. Taking the token bucket algorithm as an example: reading the current token count and the time the tokens were last refreshed, then updating both the token count and the timestamp. A Lua script guarantees the atomicity of these operations and also reduces the network overhead of the gateway making multiple round trips to Redis.
The key here is the Lua script. Spring Cloud Gateway in the Spring Cloud Greenwich release ships a rate limiting filter, and its Lua script is as follows:
local tokens_key = KEYS[1]
local timestamp_key = KEYS[2]
local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])
local fill_time = capacity/rate
local ttl = math.floor(fill_time*10)
-- The number of current tokens
local last_tokens = tonumber(redis.call("get", tokens_key))
if last_tokens == nil then
  last_tokens = capacity
end
-- The last time the tokens were refreshed
local last_refreshed = tonumber(redis.call("get", timestamp_key))
if last_refreshed == nil then
  last_refreshed = 0
end
local delta = math.max(0, now-last_refreshed)
-- Add delta*rate tokens to update the token count
local filled_tokens = math.min(capacity, last_tokens+(delta*rate))
local allowed = filled_tokens >= requested
local new_tokens = filled_tokens
local allowed_num = 0
if allowed then
  new_tokens = filled_tokens - requested
  allowed_num = 1
end
-- Update the token count and timestamp in Redis
redis.call("setex", tokens_key, ttl, new_tokens)
redis.call("setex", timestamp_key, ttl, now)
return { allowed_num, new_tokens }
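To see what the script computes, here is an equivalent in-memory Python sketch of the same token bucket logic. A dict stands in for the two Redis keys; this is an illustration of the algorithm, not the production path:

```python
def rate_limit(state, rate, capacity, now, requested=1):
    """Token bucket check mirroring the Lua script's logic.

    state is a dict standing in for the two Redis keys
    (current token count and last-refresh timestamp)."""
    last_tokens = state.get("tokens", capacity)
    last_refreshed = state.get("timestamp", 0)
    delta = max(0, now - last_refreshed)
    # Refill delta*rate tokens, capped at the bucket capacity.
    filled_tokens = min(capacity, last_tokens + delta * rate)
    allowed = filled_tokens >= requested
    new_tokens = filled_tokens - requested if allowed else filled_tokens
    state["tokens"] = new_tokens
    state["timestamp"] = now
    return allowed, new_tokens

state = {}
# Capacity 5, refill 1 token/s: the first 5 requests at t=0 pass, the 6th is rejected.
results = [rate_limit(state, rate=1, capacity=5, now=0)[0] for _ in range(6)]
# One second later a single token has been refilled, so one more request passes.
later_allowed, _ = rate_limit(state, rate=1, capacity=5, now=1)
```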
Q: In actual testing, a single gateway instance worked fine; with multiple gateway instances enabled, the actual limiting was incorrect. The root cause turned out to be that the clocks of the servers running the gateway instances were not consistent.
A: When adding tokens to the bucket at a given rate, the formula is rate * (current time - time of the last refresh), and the current time value is passed in by the gateway. If the server clocks of the gateways disagree, the script's logic breaks. One approach is to keep all clocks synchronized, which is almost impossible to do perfectly; another is to use the Redis server's own time, by changing the line `local now = tonumber(ARGV[3])` to `local now = redis.call("time")[1]`.
Note:
In Lua scripts, you should not set random values. Here is the relevant background:
When propagating Lua scripts to replica nodes, or writing Lua scripts to the AOF file, Redis needs to solve the following problem: if a Lua script is random or has side effects, then when the script runs on a replica node, or is reloaded from an AOF file and re-executed, it may produce results completely different from the previous run.
Consider the following code, where get_random_number() is random in nature. We execute this code on the master server and save the random result to the key number:
# Made-up example that would not actually appear in a scripting environment
redis> EVAL "return redis.call('set', KEYS[1], get_random_number())" 1 number
OK
redis> GET number
"10086"
Now, if this EVAL code is replicated to the slave node, then because of the randomness of get_random_number(), it will most likely produce a value completely different from 10086, such as 65535:
# Made-up example that would not actually appear in a scripting environment
redis> EVAL "return redis.call('set', KEYS[1], get_random_number())" 1 number
OK
redis> GET number
"65535"
As you can see, scripts with randomness that perform writes create a serious problem: they break the consistency of data between the master and its replica nodes.
The same problem occurs when a script with randomness is loaded from an AOF file and re-executed.
Randomness is only harmful when a script with randomness performs writes.
If a script only performs read-only operations, randomness is harmless. For example, a script that merely executes the RANDOMKEY command is harmless; but a script that performs a write based on the result of RANDOMKEY is harmful.
Similar to randomness, if the execution of a script is dependent on any side effects, the script may produce different results each time it is executed.
To solve this problem, Redis imposes a strict limit on the scripts that the Lua environment can execute — all scripts must be pure functions with no side effects.
To do this, Redis takes a number of measures for the Lua environment:
- Do not provide libraries that access system state (such as the system time library).
- Disallow the loadfile function.
- If a script tries to execute a write command (such as SET) after executing a command with a random nature (such as RANDOMKEY) or a command with side effects (such as TIME), Redis will prevent the script from continuing and return an error.
- If a script executes a read command whose output order is random (such as SMEMBERS), the output is automatically sorted lexicographically before being returned, to ensure the script's output is deterministic.
- Replace Lua's original math.random and math.randomseed functions with random-number functions defined by Redis. The new functions have the property that, unless math.randomseed is explicitly called, the sequence of pseudo-random numbers produced by math.random is the same on every execution of a Lua script.
After this series of adjustments, Redis can ensure that the scripts it executes:
- Have no side effects.
- Have no harmful randomness.
- For the same input parameters and data set, always generate the same write commands.
But then I actually tested it, and there was no error?!
10.201.0.30:6379> eval "local now = redis.call('time')[1]; return redis.call('set', 'time-test', now)" 0
OK
10.201.0.30:6379> get time-test
"1552628054"
So I checked the official documentation:
redis.io/commands/ev… .
Note: starting with Redis 5, the replication method described in this section (scripts effects replication) is the default and does not need to be explicitly enabled.
Starting with Redis 3.2, it is possible to select an alternative replication method. Instead of replicating whole scripts, we can just replicate the single write commands generated by the script. We call this script effects replication.
In this replication mode, while Lua scripts are executed, Redis collects all the commands executed by the Lua scripting engine that actually modify the dataset. When the script execution finishes, the sequence of commands that the script generated are wrapped into a MULTI / EXEC transaction and are sent to replicas and AOF.
This is useful in several ways depending on the use case:
- When the script is slow to compute, but the effects can be summarized by a few write commands, it is a shame to re-compute the script on the replicas or when reloading the AOF. In this case to replicate just the effect of the script is much better.
- When script effects replication is enabled, the controls about non deterministic functions are disabled. You can, for example, use the TIME or SRANDMEMBER commands inside your scripts freely at any place.
- The Lua PRNG in this mode is seeded randomly at every call.
In order to enable script effects replication, you need to issue the following Lua command before any write operated by the script:
redis.replicate_commands()
The function returns true if the script effects replication was enabled, otherwise if the function was called after the script already called some write command, it returns false, and normal whole script replication is used.
To put it simply: since Redis 3.2, Redis supports effects-based replication for master/replica replication and for writing to the AOF file. Instead of replicating the whole script, it replicates the individual write commands generated by the script, which means nondeterministic values such as the system time can be used in Lua scripts. From Redis 5 onward, this is the default replication mode.