Overview
Circuit breaking (fusing) and flow control are essential functions of a service gateway. Soul implements them through several mature components, and users can choose whichever they prefer. This article describes how to use Alibaba’s Sentinel component for circuit breaking and flow control in Soul. It first introduces the scenarios that circuit breaking and flow control address and why they matter. It then describes how to configure both with the Sentinel plugin in Soul. Finally, it briefly analyzes, at the source code level, how Soul uses Sentinel.
Circuit breaking and flow control
Scenario description
As the entry point for traffic, the service gateway protects the services behind it. Two scenarios that seriously affect services come up frequently in production and must be handled by the gateway. In the first, during large promotions such as Double 11 or Double 12, the number of requests to an interface is several times higher than normal. If capacity has not been evaluated properly, this surge can easily make the entire service unavailable. Such outages are usually caused not by bugs in business logic but by more requests than the available resources can serve. In the second, the overall architecture contains core services that multiple business processes depend on. Any service can suffer from unstable processing or degraded behavior, resulting in long request processing times or frequent exceptions. Unlike a bug in business logic, such a failure may be a sudden, essentially random stall that usually recovers on its own once the request volume drops, but if the service is left unprotected a domino effect can make the entire system unavailable. The two scenarios differ slightly: the first is an unmanageable spike in actual traffic, whereas the second is a chain reaction caused by inevitable and unpredictable jitter in the service itself.
Flow control
For the first scenario, we usually apply flow control. The core idea is that the service gateway admits only as many requests as the service can bear, and the excess requests are either rejected directly or placed in a waiting queue, so that the service does not break down and most requests are still processed normally. When designing a flow control strategy, we mainly need to answer the following questions:
- Along which dimension is traffic measured?
- What is the threshold?
- What is the flow control strategy?
For the first question, the usual approach is to measure traffic by QPS, applying flow control when the number of requests per second exceeds a certain limit. Another approach is to measure the number of concurrent requests. This also makes sense: when a downstream application becomes unstable and its response latency increases, the gateway sees reduced throughput and more occupied threads, and in extreme cases its thread pool is exhausted. In that sense, flow control by concurrency also protects the gateway itself to a certain extent. For the second question, the threshold is simply the boundary at which flow control is triggered. For QPS, it is the number of requests per second at which flow control starts; for concurrency, it is the number of threads concurrently handling requests beyond which flow control starts. For the third question, there are generally three solutions:
- Direct rejection. This strategy is the easiest to understand: when the QPS is above the threshold, the request is rejected and not forwarded to the downstream service.
- Warm-up. This strategy targets a system that has been at a low load level for a long time and may suddenly receive a surge of traffic. Pulling a cold system straight to a high load level may overwhelm it instantly. Warm-up lets the threshold rise gradually over a certain period until it reaches the configured value, giving the cold system time to warm up. Requests that exceed the current threshold are still rejected.
- Uniform queuing. The core idea of this strategy is to let requests through at a fixed interval. When a request arrives, it is allowed to pass if the interval between it and the last passed request is not less than the preset value. Otherwise, the expected pass time of the current request is calculated. If the resulting wait is within the rule’s preset maximum queuing time, the request waits until its expected pass time (it is queued for processing). If the wait exceeds the maximum queuing time, the request is rejected.
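The uniform queuing decision described above can be sketched in a few lines of plain Java. This is a simplified illustration, not Sentinel's implementation: the class name `UniformPacer`, its parameters, and the convention of returning -1 for a rejected request are all assumptions made for this example, and the caller passes in the current time so the logic is easy to follow.

```java
/** Simplified pacer (hypothetical, not Sentinel code): one request per fixed interval. */
final class UniformPacer {
    private final long intervalMs;               // spacing between passes: 1000 / QPS threshold
    private final long maxQueueMs;               // longest time a request is allowed to wait
    private long nextFreeMs = Long.MIN_VALUE;    // earliest time the next request may pass

    UniformPacer(int qps, long maxQueueMs) {
        this.intervalMs = 1000L / qps;
        this.maxQueueMs = maxQueueMs;
    }

    /**
     * Returns how many milliseconds the request must wait before passing,
     * or -1 if the wait would exceed the maximum queuing time (rejected).
     */
    synchronized long tryAcquire(long nowMs) {
        long passAtMs = Math.max(nowMs, nextFreeMs);   // expected pass time
        long waitMs = passAtMs - nowMs;
        if (waitMs > maxQueueMs) {
            return -1;                                 // queue too long: reject
        }
        nextFreeMs = passAtMs + intervalMs;            // reserve the slot
        return waitMs;
    }
}
```

With a threshold of 10 QPS and a 500 ms queue limit, seven requests arriving at the same instant would wait 0, 100, 200, 300, 400 and 500 ms, and the seventh would be rejected.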
Circuit breaking
A common treatment for the second scenario is to set up a circuit breaker for the service. Put simply, when we detect that a service is abnormal, we stop sending requests to it so that it is not overwhelmed by more traffic. After a period of time, if the service is found to have recovered, traffic is sent to it again. This raises three questions: how to determine that a service is unstable or jittering, what to do once we find such a service, and how to determine that it is back to normal. There are three ways to judge whether a service is unstable:
- Slow call ratio: when the number of requests within a unit statistics period reaches the minimum request count and the proportion of requests whose response time exceeds the maximum allowed duration is greater than the threshold, the service is judged abnormal and the circuit breaker trips.
- Exception ratio: when the proportion of abnormal requests within a unit statistics period is greater than the threshold, the service is judged abnormal and the circuit breaker trips.
- Exception count: when the number of abnormal requests within a unit statistics period reaches the threshold, the service is judged abnormal and the circuit breaker trips.
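The three criteria above can be written as simple predicates over the counters collected in one statistics period. The sketch below is an illustration under assumptions, not Sentinel's code: the class and method names are hypothetical, the minimum request count is applied to all three checks, and the comparisons (strictly greater for ratios, reaching the value for the count) follow the wording of the list.

```java
/** Hypothetical helper: decides whether one statistics period should trip the breaker. */
final class TripCriteria {
    private TripCriteria() { }

    /** Slow call ratio: enough requests, and the share of slow calls is above the limit. */
    static boolean bySlowCallRatio(int total, int slow, int minRequests, double maxSlowRatio) {
        return total >= minRequests && (double) slow / total > maxSlowRatio;
    }

    /** Exception ratio: enough requests, and the share of failed calls is above the limit. */
    static boolean byExceptionRatio(int total, int errors, int minRequests, double maxErrorRatio) {
        return total >= minRequests && (double) errors / total > maxErrorRatio;
    }

    /** Exception count: enough requests, and the number of failed calls reaches the limit. */
    static boolean byExceptionCount(int total, int errors, int minRequests, int maxErrors) {
        return total >= minRequests && errors >= maxErrors;
    }
}
```

For example, with a minimum of 5 requests, a period with 10 requests of which 6 were slow trips a slow-call rule whose ratio limit is 0.5, while a period with only 4 requests never trips, however many of them failed.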
Once one of these three indicators shows that the service is abnormal and the circuit breaker has tripped, we can, for a certain period of time (the circuit breaker duration), return an error directly for each request without blocking the upstream service, and let the requester decide how to handle it. Alternatively, we can trigger service degradation. Service degradation can be roughly understood as serving a simplified version of the business that drops many non-core steps and only ensures that the request is eventually processed (eventual consistency). A circuit breaker also recovers automatically. Generally, the service stays in the open state for a period of time after the breaker trips. It then enters the half-open state. If the next few requests report no errors and the response time is reasonable, the service is restored.
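The closed, open and half-open cycle described above can be sketched as a small state machine. This is a minimal illustration with hypothetical names, not Sentinel's breaker: it assumes a single probe result decides whether the half-open breaker closes or reopens, and the caller supplies the current time.

```java
/** Minimal circuit breaker state machine (hypothetical sketch). */
final class SimpleBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long openMs;        // how long the breaker stays open after tripping
    private State state = State.CLOSED;
    private long openedAtMs;

    SimpleBreaker(long openMs) {
        this.openMs = openMs;
    }

    /** Called when a trip criterion fires: stop forwarding requests. */
    synchronized void trip(long nowMs) {
        state = State.OPEN;
        openedAtMs = nowMs;
    }

    /** Returns true if a request arriving at nowMs may be forwarded. */
    synchronized boolean allow(long nowMs) {
        if (state == State.OPEN && nowMs - openedAtMs >= openMs) {
            state = State.HALF_OPEN;  // open period elapsed: let probe requests through
        }
        return state != State.OPEN;
    }

    /** Reports the outcome of a forwarded request. */
    synchronized void onResult(boolean success, long nowMs) {
        if (state == State.HALF_OPEN) {
            if (success) {
                state = State.CLOSED; // probe succeeded: resume normal traffic
            } else {
                trip(nowMs);          // probe failed: open again for another period
            }
        }
    }

    synchronized State state() {
        return state;
    }
}
```

A production breaker would typically require several successful probes and limit how many requests pass while half-open; both are left out here to keep the state transitions visible.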
Sentinel plugin in Soul
Sentinel is Alibaba’s open source flow control component for distributed service architectures. It takes traffic as its entry point and helps guarantee the stability of microservices along multiple dimensions, such as flow control, circuit breaking and degradation, and adaptive system protection. Soul, an outstanding open source gateway from China, integrates Sentinel as a plugin, so that users can use Sentinel’s flow control and circuit breaker functions through simple configuration. Here is a brief overview of how to configure the Sentinel plugin in Soul.
Log in to the Soul management platform and configure the plugin under Plugin List -> Sentinel. The “selector” configuration is not the focus of this article. Click “Add Rule” to open the rule settings.
On this configuration page, “Name”, “matching mode”, “condition”, “Log print”, “Enabled”, and “execution order” are the general configuration items of every Soul plugin. The items we need to focus on are those under Processing. They fall into two groups: the first four options are about circuit breaking, and the last four are about flow control. In Soul we can set both a flow control policy and a circuit breaker policy for a given set of requests. The following describes how to use each configuration item.
Circuit breaking
First, let’s look at the configuration related to circuit breaking. There are four configuration items: the circuit breaker threshold, the circuit breaker switch, the circuit breaker window size, and an unlabeled item that selects how service abnormality is detected. The switch indicates whether the circuit breaker is enabled (1 = on, 0 = off). The window size is the number of seconds after the breaker trips before it enters the half-open state. In the half-open state, if a request succeeds, the breaker returns to the normal state; if the request is still abnormal, the breaker stays open. The detection mode and the threshold must be read together. Soul uses Sentinel’s three ways of determining that a service is abnormal:
- Slow call ratio: in this mode, the threshold is the response time, in milliseconds, above which a call is judged to be slow. The slow call ratio is 1 by default and cannot be changed. That is, the breaker trips only if all calls within a unit statistics period exceed the threshold. This is Sentinel’s default mode.
- Exception ratio: in this mode, the threshold is the upper limit of the proportion of abnormal requests within a unit statistics period. Enter a number in [0.0, 1.0], representing 0% to 100%.
- Exception count: in this mode, the threshold is the upper limit of the number of abnormal requests within a unit statistics period.
Note that Soul uses Sentinel’s default values for statIntervalMs and minRequestAmount: 1 second and 5 requests, respectively. statIntervalMs is the unit statistics period for exception detection: counts are accumulated over 1 second and start again in the next second. minRequestAmount is the minimum number of requests: the breaker will not trip if fewer than 5 requests arrive within the 1-second period.
In the example configuration, the circuit breaker is turned on: if 5 requests to the service fail within 1 second, the breaker stays open for 10 seconds, after which it enters the half-open state. If the service is requested while the breaker is open, the Soul gateway returns an error directly, protecting the back-end service from further requests.
Flow control
There are four configuration items related to flow control: Flow Control Effect, Flow Limiting Threshold, Flow Control switch, and Flow Limiting Threshold Type, from top to bottom and left to right. The threshold type can be “QPS” or “number of concurrent threads”; it specifies the dimension along which the threshold is measured. The threshold is the upper limit of QPS or of the number of threads. When the threshold is reached, the flow control policy takes effect. The policy itself is configured in “Flow Control Effect”, where we can choose “direct rejection”, “warm up”, “uniform queuing”, or “warm up + uniform queuing”. Direct rejection is the easiest to understand: when the QPS or the number of threads reaches the threshold, the excess requests are returned with an error. Warm up means that the effective threshold rises gradually to the configured value within 10 seconds. That is, the threshold in the first 2-3 seconds is lower than the configured value, then increases gradually and reaches the configured value after 10 seconds, so that the system can warm up. Requests that exceed the current threshold receive an error from the Soul gateway. Uniform queuing strictly controls the time interval between requests. If the threshold type is QPS and the threshold is 10, Soul lets one request through to the back-end service every 100 ms. If a request would have to wait more than 500 ms, an error is returned. Note that if the threshold type is the number of concurrent threads, the flow control effect can only be “direct rejection”. In the example configuration, the Soul gateway ensures that the QPS sent to the service does not exceed 10, and any additional requests are returned with an error.
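To make the warm-up effect concrete, the sketch below computes an effective threshold that rises to the configured value over the warm-up period. It is a deliberately simplified linear ramp and not Sentinel's warm-up algorithm, which is based on a token-bucket model. The class name, the method, and the assumption that the ramp starts at one third of the configured threshold (a cold factor of 3) are illustrative only.

```java
/** Simplified linear warm-up ramp (hypothetical sketch, not Sentinel's algorithm). */
final class WarmUpRamp {
    private WarmUpRamp() { }

    /**
     * Effective threshold after elapsedMs of warm-up.
     *
     * @param target     configured threshold, e.g. 10 QPS
     * @param coldFactor assumed ratio between the target and the starting threshold
     * @param warmUpMs   warm-up period, e.g. 10_000 ms
     * @param elapsedMs  time since traffic started to rise
     */
    static double thresholdAt(double target, double coldFactor, long warmUpMs, long elapsedMs) {
        double start = target / coldFactor;
        if (elapsedMs <= 0) {
            return start;                 // cold system: lowest threshold
        }
        if (elapsedMs >= warmUpMs) {
            return target;                // fully warmed up
        }
        // Linear interpolation between the cold threshold and the target.
        return start + (target - start) * elapsedMs / warmUpMs;
    }
}
```

Under these assumptions, a rule with a threshold of 10 QPS would admit roughly 3.3 requests per second at first, about 6.7 halfway through the 10-second period, and the full 10 once the period has elapsed.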
It is important to note that the Sentinel component runs independently on each Soul gateway node. If the gateway is deployed as a cluster, the actual amount of traffic that flow control lets through to the downstream service must be multiplied by the number of Soul gateway nodes. For example, suppose our Soul gateway has three nodes and nginx balances all requests evenly across them. If the flow control configured for an interface is 10 QPS, the QPS that the back-end service actually has to handle is up to 10*3. The same consideration applies to circuit breaking: each node trips independently, and only when the breaker has tripped on all three nodes does the service stop receiving requests entirely.
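The arithmetic above is worth writing down, because the value entered in the plugin is a per-node limit. The helper below is a hypothetical sketch that assumes requests are balanced evenly across gateway nodes; it converts between the per-node threshold and the total load the back-end may see.

```java
/** Hypothetical helper for sizing per-node limits in a gateway cluster. */
final class ClusterLimit {
    private ClusterLimit() { }

    /** Upper bound on the QPS reaching the back-end when every node enforces perNodeQps. */
    static int effectiveQps(int perNodeQps, int nodes) {
        return perNodeQps * nodes;
    }

    /** Per-node threshold to configure so that the back-end sees at most backendBudgetQps. */
    static int perNodeQpsFor(int backendBudgetQps, int nodes) {
        return backendBudgetQps / nodes;   // integer division rounds down, staying within budget
    }
}
```

For three nodes with a configured threshold of 10, `effectiveQps(10, 3)` gives 30. Conversely, if the back-end can only sustain 10 QPS in total, `perNodeQpsFor(10, 3)` suggests configuring 3 on each node.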
Reading the Sentinel plugin source code
SentinelRuleHandle handles the processing logic when Sentinel rules are synchronized from the management node. SentinelPlugin contains the plugin’s request processing logic. SentinelFallbackHandler handles what happens after flow control or a circuit breaker is triggered. Let’s look at them one by one. First up is SentinelRuleHandle, whose source code is:
    public class SentinelRuleHandle implements PluginDataHandler {

        @Override
        public void handlerRule(final RuleData ruleData) {
            SentinelHandle sentinelHandle = GsonUtils.getInstance().fromJson(ruleData.getHandle(), SentinelHandle.class);
            sentinelHandle.checkData(sentinelHandle);
            // Get all existing flow control rules, dropping the one for this resource name
            List<FlowRule> flowRules = FlowRuleManager.getRules()
                    .stream()
                    .filter(r -> !r.getResource().equals(getResourceName(ruleData)))
                    .collect(Collectors.toList());
            if (sentinelHandle.getFlowRuleEnable() == Constants.SENTINEL_ENABLE_FLOW_RULE) {
                // Flow control is enabled: build the Sentinel flow rule from the configuration
                FlowRule rule = new FlowRule(getResourceName(ruleData));
                // Threshold
                rule.setCount(sentinelHandle.getFlowRuleCount());
                // Flow control dimension: QPS or thread count
                rule.setGrade(sentinelHandle.getFlowRuleGrade());
                // Flow control behavior: 0. default (reject directly), 1. warm up, 2. rate limiter, 3. warm up + rate limiter
                rule.setControlBehavior(sentinelHandle.getFlowRuleControlBehavior());
                flowRules.add(rule);
            }
            // Reload all flow control rules
            FlowRuleManager.loadRules(flowRules);

            // Get all existing circuit breaker rules, dropping the one for this resource name
            List<DegradeRule> degradeRules = DegradeRuleManager.getRules()
                    .stream()
                    .filter(r -> !r.getResource().equals(getResourceName(ruleData)))
                    .collect(Collectors.toList());
            if (sentinelHandle.getDegradeRuleEnable() == Constants.SENTINEL_ENABLE_DEGRADE_RULE) {
                // Circuit breaking is enabled: build the Sentinel degrade rule from the configuration
                DegradeRule rule = new DegradeRule(getResourceName(ruleData));
                // Circuit breaker threshold
                rule.setCount(sentinelHandle.getDegradeRuleCount());
                // Detection mode: 0: response time (RT), 1: exception ratio, 2: exception count
                rule.setGrade(sentinelHandle.getDegradeRuleGrade());
                // Circuit breaker time window
                rule.setTimeWindow(sentinelHandle.getDegradeRuleTimeWindow());
                degradeRules.add(rule);
            }
            // Reload all circuit breaker rules
            DegradeRuleManager.loadRules(degradeRules);
        }

        @Override
        public void removeRule(final RuleData ruleData) {
            // Delete the rules for the specified resource
            FlowRuleManager.loadRules(FlowRuleManager.getRules()
                    .stream()
                    .filter(r -> !r.getResource().equals(getResourceName(ruleData)))
                    .collect(Collectors.toList()));
            DegradeRuleManager.loadRules(DegradeRuleManager.getRules()
                    .stream()
                    .filter(r -> !r.getResource().equals(getResourceName(ruleData)))
                    .collect(Collectors.toList()));
        }

        @Override
        public String pluginNamed() {
            return PluginEnum.SENTINEL.getName();
        }

        /**
         * return sentinel resource name.
         *
         * @param ruleData ruleData
         * @return string string
         */
        public static String getResourceName(final RuleData ruleData) {
            return ruleData.getSelectorId() + "_" + ruleData.getName();
        }
    }
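The handler above follows one pattern twice: take the full list of rules, filter out the entry for this resource, append the replacement if the feature is enabled, and load the whole list again. The sketch below isolates that pattern without any Sentinel dependency. `RuleRegistry` and its nested `Rule` class are hypothetical stand-ins for Sentinel's rule managers and rule types, used only to show why a rule update never leaves a stale entry behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a rule manager that is always reloaded as a whole list. */
final class RuleRegistry {

    /** Hypothetical stand-in for a flow or degrade rule. */
    static final class Rule {
        final String resource;
        final double count;

        Rule(String resource, double count) {
            this.resource = resource;
            this.count = count;
        }
    }

    private List<Rule> rules = new ArrayList<>();

    /** Same naming scheme as getResourceName: selector id, underscore, rule name. */
    static String resourceName(String selectorId, String ruleName) {
        return selectorId + "_" + ruleName;
    }

    /** Drops any existing rule for the resource, adds the new one if enabled, swaps the list. */
    synchronized void sync(String resource, boolean enabled, double count) {
        List<Rule> next = rules.stream()
                .filter(r -> !r.resource.equals(resource))
                .collect(Collectors.toCollection(ArrayList::new));
        if (enabled) {
            next.add(new Rule(resource, count));
        }
        rules = next;
    }

    /** Returns a copy of the current rules. */
    synchronized List<Rule> rules() {
        return new ArrayList<>(rules);
    }
}
```

Calling `sync` with `enabled` set to false has the same effect as the handler's `removeRule`: the resource's entry is filtered out and nothing is added back.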
The plugin’s execution logic lives in SentinelPlugin, whose code is as follows:
    public class SentinelPlugin extends AbstractSoulPlugin {

        private final SentinelFallbackHandler sentinelFallbackHandler;

        public SentinelPlugin(final SentinelFallbackHandler sentinelFallbackHandler) {
            this.sentinelFallbackHandler = sentinelFallbackHandler;
        }

        @Override
        protected Mono<Void> doExecute(final ServerWebExchange exchange, final SoulPluginChain chain, final SelectorData selector, final RuleData rule) {
            final SoulContext soulContext = exchange.getAttribute(Constants.CONTEXT);
            assert soulContext != null;
            // Build the Sentinel resource name from the plugin configuration;
            // the name corresponds to one flow control or circuit breaker policy
            String resourceName = SentinelRuleHandle.getResourceName(rule);
            SentinelHandle sentinelHandle = GsonUtils.getInstance().fromJson(rule.getHandle(), SentinelHandle.class);
            sentinelHandle.checkData(sentinelHandle);
            // Apply Sentinel's official reactor transformer so that Sentinel processes the request
            return chain.execute(exchange).transform(new SentinelReactorTransformer<>(resourceName)).doOnSuccess(v -> {
                HttpStatus status = exchange.getResponse().getStatusCode();
                if (status == null || !status.is2xxSuccessful()) {
                    exchange.getResponse().setStatusCode(null);
                    throw new SentinelFallbackException(status == null ? HttpStatus.INTERNAL_SERVER_ERROR : status);
                }
            // Call sentinelFallbackHandler to return an error message when Sentinel triggers flow control or a circuit breaker
            }).onErrorResume(throwable -> sentinelFallbackHandler.fallback(exchange, UriUtils.createUri(sentinelHandle.getFallbackUri()), throwable));
        }

        // Plugin name: sentinel
        @Override
        public String named() {
            return PluginEnum.SENTINEL.getName();
        }

        // Order: 45
        @Override
        public int getOrder() {
            return PluginEnum.SENTINEL.getCode();
        }

        public static class SentinelFallbackException extends HttpStatusCodeException {
            public SentinelFallbackException(final HttpStatus statusCode) {
                super(statusCode);
            }
        }
    }
In Soul, SentinelFallbackHandler returns an error directly both for requests blocked by the circuit breaker and for requests blocked by flow control:
    public class SentinelFallbackHandler implements FallbackHandler {

        @Override
        public Mono<Void> generateError(final ServerWebExchange exchange, final Throwable throwable) {
            Object error;
            if (throwable instanceof DegradeException) {
                // Circuit breaker triggered
                // HTTP status set to 500
                exchange.getResponse().setStatusCode(HttpStatus.INTERNAL_SERVER_ERROR);
                // Response body
                error = SoulResultWrap.error(SoulResultEnum.SERVICE_RESULT_ERROR.getCode(), SoulResultEnum.SERVICE_RESULT_ERROR.getMsg(), null);
            } else if (throwable instanceof FlowException) {
                // Flow control triggered
                // HTTP status set to 429
                exchange.getResponse().setStatusCode(HttpStatus.TOO_MANY_REQUESTS);
                // Response body
                error = SoulResultWrap.error(SoulResultEnum.TOO_MANY_REQUESTS.getCode(), SoulResultEnum.TOO_MANY_REQUESTS.getMsg(), null);
            } else if (throwable instanceof BlockException) {
                // Request blocked by Sentinel for another reason
                // HTTP status set to 429
                exchange.getResponse().setStatusCode(HttpStatus.TOO_MANY_REQUESTS);
                // Response body
                error = SoulResultWrap.error(SoulResultEnum.SENTINEL_BLOCK_ERROR.getCode(), SoulResultEnum.SENTINEL_BLOCK_ERROR.getMsg(), null);
            } else {
                return Mono.error(throwable);
            }
            return WebFluxResultUtils.result(exchange, error);
        }
    }
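The order of the instanceof checks in the handler matters, because in Sentinel both DegradeException and FlowException are subclasses of BlockException. The sketch below reproduces only the mapping from exception type to HTTP status, using hypothetical stand-in exception classes that mirror that hierarchy so the example has no external dependencies. Returning -1 for "not handled here" is a convention of this sketch; the real handler propagates the error instead.

```java
/** Hypothetical sketch of the exception-to-status mapping; the nested classes are stand-ins. */
final class FallbackStatus {
    private FallbackStatus() { }

    // Stand-ins mirroring Sentinel's hierarchy: both specific types extend BlockException.
    static class BlockException extends Exception { }
    static class FlowException extends BlockException { }
    static class DegradeException extends BlockException { }

    /** Returns the HTTP status for a blocked request, or -1 if the error is not a Sentinel block. */
    static int statusFor(Throwable t) {
        // Specific subclasses first; checking BlockException first would shadow them.
        if (t instanceof DegradeException) {
            return 500;   // circuit breaker open
        }
        if (t instanceof FlowException) {
            return 429;   // flow control triggered
        }
        if (t instanceof BlockException) {
            return 429;   // blocked for another reason
        }
        return -1;        // unrelated error: let it propagate
    }
}
```

If the BlockException branch came first, a tripped circuit breaker would be reported as 429 rather than 500, which is why the most specific types are tested first.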
Conclusion
The Soul gateway wraps an excellent flow control component, Sentinel, to give users easy-to-use flow control and circuit breaker functions. Two points deserve attention. First, when Soul uses Sentinel, some parameters are left at their default values, and changing them requires modifying the source code. Second, although the Soul gateway can be deployed as a cluster, its Sentinel integration provides no distributed flow control: each Soul gateway node enforces the same rule for a given resource, but does so independently.