SCA Sentinel: Traffic Guard for Distributed Systems

Introduction to Sentinel

Sentinel is a flow control and circuit-breaking (degradation) component for cloud-native microservices. It replaces Hystrix in addressing service avalanche, service degradation, circuit breaking, and rate limiting.

Hystrix

In the previous example we created two microservices: the service consumer (the auto-delivery microservice) calls the service provider (the resume microservice).

With Hystrix, we introduced Hystrix on the caller side, then a separate Dashboard project, then Turbine. This has two drawbacks:

1) We have to build the monitoring dashboard ourselves

2) There is no UI for configuring circuit breaking, degradation, and so on; we have to write code, which intrudes into our application source

Sentinel

  • Basic features

    • A dashboard/console component that can be deployed independently
    • Fine-grained control through UI configuration, which reduces the amount of code to write (for the auto-delivery microservice)
  • Composition of Sentinel

    • Core library: the Java client does not depend on any framework or library, runs in any Java runtime environment, and has good support for frameworks such as Dubbo and Spring Cloud

    • Console (Dashboard): built on Spring Boot and runnable directly after packaging, with no additional application container such as Tomcat required.

  • Sentinel has the following characteristics

    • Rich application scenarios: Sentinel has supported the core scenarios of Alibaba's Double 11 (Singles' Day) traffic peaks for nearly 10 years, such as flash sales (burst traffic kept within what the system capacity can bear), message peak clipping and valley filling, cluster flow control, and real-time circuit breaking of unavailable downstream applications
    • Complete real-time monitoring: Sentinel also provides real-time monitoring. In the console you can see second-level data for a single machine connected to the application, or even an aggregated view of a cluster of fewer than 500 machines.
    • Extensive open source ecosystem: Sentinel provides out-of-the-box integration modules for other open source frameworks and libraries, such as Spring Cloud and Dubbo. You can connect to Sentinel quickly by adding the appropriate dependency and a small amount of configuration.
    • Well-designed SPI extension points: Sentinel provides easy-to-use SPI extension interfaces. You can customize logic quickly by implementing an extension interface, for example to customize rule management or to adapt a dynamic data source.
  • Main features

  • Open source ecosystem

Sentinel deployment

Download: github.com/alibaba/Sen… (we use v1.7.1)

Start: java -jar sentinel-dashboard-1.7.1.jar &

Username/password: sentinel/sentinel

After startup, open http://localhost:8080/

Transforming the service

In our existing business scenario, the auto-delivery microservice calls the resume microservice, so circuit breaking, degradation, and other controls are applied in the auto-delivery microservice. We therefore modify the auto-delivery microservice and add the Sentinel core dependency.

  • Create a new project

  • Add the dependency to pom.xml
<!-- Sentinel core dependency -->
<dependency>
    <groupId>com.alibaba.cloud</groupId>
    <artifactId>spring-cloud-starter-alibaba-sentinel</artifactId>
</dependency>
  • application.yml configuration file, with the focus on the added Sentinel configuration
server:
  port: 8098
spring:
  application:
    name: lagou-service-autodeliver
  cloud:
    nacos:
      discovery:
        server-addr: 127.0.0.1:8848,127.0.0.1:8849,127.0.0.1:8850

    sentinel:
      transport:
        dashboard: 127.0.0.1:8080 # Sentinel dashboard/console address
        port: 8719   # Sentinel starts an HTTP server on this port inside the application so that the console can push rules to it and exchange data with it.
                     # If port 8719 is occupied, the port number is incremented by 1 in turn
      # Sentinel Nacos data source configuration; rules in Nacos are automatically synchronized to Sentinel flow control rules

management:
  endpoints:
    web:
      exposure:
        include: "*"
  # Expose health endpoint details
  endpoint:
    health:
      show-details: always
# Name of the called microservice; if omitted, the settings take effect globally
lagou-service-resume:
  ribbon:
    # Request connection timeout
    ConnectTimeout: 2000
    # Request processing timeout
    # Feign timeout settings
    ReadTimeout: 3000
    # Retry all operations
    OkToRetryOnAllOperations: true
    #### With the configuration above, when a request fails, it is retried on the current instance (the number of retries is set by MaxAutoRetries).
    #### If that still fails, another instance is tried (the number of instance switches is set by MaxAutoRetriesNextServer).
    #### If it still fails, a failure message is returned.
    MaxAutoRetries: 0 # Number of retries on the currently selected instance, not including the first call
    MaxAutoRetriesNextServer: 0 # Number of retries when switching instances
    NFLoadBalancerRuleClassName: com.netflix.loadbalancer.RoundRobinRule # Load balancing strategy
logging:
  level:
    # Feign logging only responds to the debug log level
    com.lagou.edu.controller.service.ResumeServiceFeignClient: debug
  • Start the resume microservice and the auto-delivery microservice

  • In Nacos, both microservices are now registered

  • Send a request to the auto-delivery microservice (which in turn calls the resume microservice), then check the Sentinel console, where data has already started to appear

Sentinel key concepts

  • Resources

A resource can be anything in a Java application: a service the application provides, another application it calls, or even a piece of code. The API endpoints we request are resources.

  • Rules

Rules are defined around the real-time state of resources and include flow control rules, circuit-breaking (degrade) rules, and system protection rules. All rules can be adjusted dynamically in real time.

Sentinel flow control rules module

The concurrency a system can handle is limited. For example, if system A supports only 1 QPS and more requests than that arrive, A should apply flow control.

Open the "Cluster Point Link" tab to see the resources that have been requested.

At this point, no matter how many requests are sent per second, the system accepts and responds to all of them. When a traffic peak arrives, it could overwhelm the machine, so we configure flow limiting for this resource.

Click the "flow control" button to open the dialog for adding a rule. The "source" field deserves an explanation: a request may come from a browser, from service A, or from service B, and rules can be configured per calling source. The value default means the source is not distinguished, so the limit applies to all requests.

  • Threshold type: QPS or number of threads

Here we select QPS and set the single-machine threshold to 1, meaning only one request per second is allowed. If the number of threads is selected instead, only one thread can be processing the resource at a time; while the current thread has not finished, further requests are limited.

When we send multiple requests within one second, a flow limiting error is returned.
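
The QPS threshold described above can be sketched with a simple fixed one-second window counter. This is only an illustration: the class name SimpleQpsLimiter and its method are hypothetical, the clock is passed in explicitly to keep the example deterministic, and Sentinel's own statistics are more elaborate than a single fixed window.

```java
/** Hypothetical fixed-window QPS limiter; an illustration, not Sentinel's implementation. */
final class SimpleQpsLimiter {
    private final int threshold;     // maximum requests allowed per one-second window
    private boolean windowOpen;      // false until the first request arrives
    private long windowStartMillis;  // start time of the current window
    private int passedInWindow;      // requests already let through in this window

    SimpleQpsLimiter(int threshold) {
        this.threshold = threshold;
    }

    /** Returns true if the request may pass, false if it is flow-limited. */
    synchronized boolean tryPass(long nowMillis) {
        if (!windowOpen || nowMillis - windowStartMillis >= 1000) {
            // Start a new one-second window and reset the counter.
            windowOpen = true;
            windowStartMillis = nowMillis;
            passedInWindow = 0;
        }
        if (passedInWindow < threshold) {
            passedInWindow++;
            return true;
        }
        return false;
    }
}
```

With a threshold of 1, a second call inside the same second is rejected, and the next call after the window rolls over passes again.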

  • Flow control effect: Warm Up

When a system has been idle for a long time and traffic suddenly surges, pulling the system straight up to a high water level may overwhelm it instantly. The flash sale module of an e-commerce site is a typical example.

In Warm Up mode, the allowed traffic increases slowly and reaches the configured request rate only after the preset warm-up period. By default, Warm Up starts at 1/3 of the configured QPS threshold and gradually rises to the configured QPS value.
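
To make the ramp concrete, here is a minimal sketch that assumes the allowed QPS grows linearly from 1/3 of the configured value to the full value over the warm-up period. The linear shape and the names WarmUpThreshold and allowedQps are assumptions for illustration; Sentinel's real warm-up controller is token-bucket based and does not follow exactly this curve.

```java
/** Hypothetical linear warm-up ramp; a simplification of Sentinel's Warm Up behavior. */
final class WarmUpThreshold {
    private static final int COLD_FACTOR = 3; // start at 1/3 of the configured QPS

    private WarmUpThreshold() {
    }

    /** Allowed QPS after the given number of seconds of sustained traffic. */
    static double allowedQps(double configuredQps, double warmUpSeconds, double elapsedSeconds) {
        double coldQps = configuredQps / COLD_FACTOR;
        if (elapsedSeconds >= warmUpSeconds) {
            return configuredQps; // warm-up finished: full threshold applies
        }
        if (elapsedSeconds <= 0) {
            return coldQps;       // cold system: only a third of the threshold
        }
        // Linear interpolation between the cold and the configured threshold.
        return coldQps + (configuredQps - coldQps) * (elapsedSeconds / warmUpSeconds);
    }
}
```

For a configured QPS of 30 and a 10-second warm-up, this sketch allows 10 QPS at the start, 20 QPS halfway through, and 30 QPS once the warm-up period has elapsed.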

  • Flow control mode: association

When calls to an associated resource reach the threshold, the resource itself is limited. For example, the user registration interface needs to call the ID verification interface. If requests to the ID verification interface reach the threshold, association mode is used to limit the user registration interface. In other words, when the verification interface is being called heavily, the registration interface is limited.
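
The association idea can be sketched as follows: calls to the associated resource (the verification interface) are counted, and the limited resource (the registration interface) is rejected once that count reaches the threshold within the current second. The class and method names are hypothetical and the fixed one-second window is an assumption made for brevity.

```java
/** Hypothetical association-mode limiter; an illustration, not Sentinel's implementation. */
final class AssociationLimiter {
    private final int threshold;     // calls per second allowed on the associated resource
    private boolean windowOpen;
    private long windowStartMillis;
    private int associatedCalls;     // calls to the associated resource in this window

    AssociationLimiter(int threshold) {
        this.threshold = threshold;
    }

    private void rollWindow(long nowMillis) {
        if (!windowOpen || nowMillis - windowStartMillis >= 1000) {
            windowOpen = true;
            windowStartMillis = nowMillis;
            associatedCalls = 0;
        }
    }

    /** Record one call to the associated resource (e.g. the verification interface). */
    synchronized void recordAssociatedCall(long nowMillis) {
        rollWindow(nowMillis);
        associatedCalls++;
    }

    /** Decide whether the limited resource (e.g. the registration interface) may be called. */
    synchronized boolean tryPassLimited(long nowMillis) {
        rollWindow(nowMillis);
        return associatedCalls < threshold;
    }
}
```

Note that the counter tracks the associated resource only; the limited resource is rejected because of load on something else, which is the point of this mode.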

  • Flow control mode: link. "Link" here refers to the request link (call chain).

In link mode, flow control is applied to the call chain on which the resource sits. The rule must specify the entry resource, that is, the context name of the entry point of that call chain.

Suppose requests from Entrance1 and Entrance2 both invoke the resource NodeA. Sentinel allows a resource to be limited based on the statistics of a single entry only. For example, in link mode, setting the entry resource to Entrance1 means that only calls arriving through Entrance1 are counted in NodeA's flow limiting statistics; calls arriving through Entrance2 are ignored.
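
A minimal sketch of the link-mode idea, under the assumption that each call carries the name of the entrance it came through: only calls from the configured entrance are counted and limited. LinkModeLimiter and its parameters are hypothetical names used for illustration.

```java
/** Hypothetical link-mode limiter for one resource; an illustration, not Sentinel's implementation. */
final class LinkModeLimiter {
    private final String limitedEntrance; // entry resource configured in the rule
    private final int threshold;          // QPS allowed through that entrance
    private boolean windowOpen;
    private long windowStartMillis;
    private int passedInWindow;

    LinkModeLimiter(String limitedEntrance, int threshold) {
        this.limitedEntrance = limitedEntrance;
        this.threshold = threshold;
    }

    /** Returns true if a call arriving through the given entrance may pass. */
    synchronized boolean tryPass(String entrance, long nowMillis) {
        if (!limitedEntrance.equals(entrance)) {
            return true; // calls through other entrances are neither counted nor limited
        }
        if (!windowOpen || nowMillis - windowStartMillis >= 1000) {
            windowOpen = true;
            windowStartMillis = nowMillis;
            passedInWindow = 0;
        }
        if (passedInWindow < threshold) {
            passedInWindow++;
            return true;
        }
        return false;
    }
}
```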

  • Flow control effect: queuing

In queuing mode, the interval between passing requests is strictly controlled: requests pass at a uniform rate and the excess is queued. This mode is typically used for scenarios such as peak clipping and valley filling with message queues. A timeout must be configured; a request whose waiting time would exceed the timeout is rejected.

When a large burst of traffic arrives, requests are not rejected outright but queued and let through (processed) at a uniform rate. Requests that can afford to wait are processed; those that cannot (waiting time > timeout) are rejected.

For example, a QPS of 5 means one request is allowed through every 200 ms, that is, five requests per second. Any additional requests queue up and wait their turn. The timeout is the maximum queuing time; requests that would wait longer are rejected. In queuing mode, the QPS value should not exceed 1000 (a request interval of 1 ms), that is, at most 1000 requests pass per second.
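
The queuing arithmetic above can be sketched as a uniform-rate gate that tells each caller how long to wait, or rejects it when the wait would exceed the timeout. The names UniformRateQueue and acquire are hypothetical, the clock is passed in explicitly, and the interval is rounded down to whole milliseconds, which is an assumption of this sketch.

```java
/** Hypothetical uniform-rate queue; an illustration of the queuing effect, not Sentinel's code. */
final class UniformRateQueue {
    private final long intervalMillis;    // spacing between passes: 1000 / qps
    private final long maxQueueingMillis; // the configured timeout
    private boolean anyPassed;
    private long latestPassMillis;        // time slot assigned to the most recent pass

    UniformRateQueue(int qps, long maxQueueingMillis) {
        if (qps < 1 || qps > 1000) {
            throw new IllegalArgumentException("qps must be between 1 and 1000");
        }
        this.intervalMillis = 1000L / qps;
        this.maxQueueingMillis = maxQueueingMillis;
    }

    /** Returns the wait time in ms before the request may proceed, or -1 if it is rejected. */
    synchronized long acquire(long nowMillis) {
        long expectedMillis = latestPassMillis + intervalMillis;
        if (!anyPassed || expectedMillis <= nowMillis) {
            anyPassed = true;
            latestPassMillis = nowMillis;
            return 0; // the slot is already free: pass immediately
        }
        long waitMillis = expectedMillis - nowMillis;
        if (waitMillis > maxQueueingMillis) {
            return -1; // would wait longer than the timeout: reject
        }
        latestPassMillis = expectedMillis; // reserve the next slot for this request
        return waitMillis;
    }
}
```

With a QPS of 5 and a 500 ms timeout, four requests arriving at the same instant get waits of 0, 200, and 400 ms, and the fourth is rejected because it would have to wait 600 ms.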

Sentinel degrade rules module

Flow control deals with heavy traffic coming from outside, whereas circuit breaking and degradation deal with problems inside the system. When a resource in the call chain becomes unstable (for example, calls time out or the exception ratio rises), Sentinel restricts calls to that resource so that requests fail fast and do not cause cascading errors in other resources. Once a resource is degraded, all calls to it are automatically circuit broken during the following degrade time window.

Unlike Hystrix, Sentinel does not let a trial request through to probe for recovery. Once the circuit breaker trips, requests are rejected for the duration of the time window, and calls resume after the window ends.

  • RT (average response time)

If the average response time exceeds the threshold (in ms), all calls to this method are automatically circuit broken (a DegradeException is thrown) during the following time window (in s).

Note that by default Sentinel caps the recorded RT at 4900 ms; anything above that is counted as 4900 ms. To change this cap, set the startup option -Dcsp.sentinel.statistic.max.rt=xxx

For example, with an RT threshold of 200 ms and a time window of 5 seconds: if the average response time of this interface exceeds 200 ms, all further requests are circuit broken for the next 5 seconds.
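
A minimal sketch of the RT rule with the example values (200 ms threshold, 5 s window). It assumes a plain running average of completed calls, which is a simplification; how Sentinel samples response times internally is not covered by this text. AverageRtBreaker and its methods are hypothetical names.

```java
/** Hypothetical average-RT circuit breaker; a simplified illustration of the RT degrade rule. */
final class AverageRtBreaker {
    private final long rtThresholdMillis;
    private final long timeWindowMillis;
    private long totalRtMillis;
    private long completedCalls;
    private long openUntilMillis = Long.MIN_VALUE; // breaker is closed initially

    AverageRtBreaker(long rtThresholdMillis, int timeWindowSeconds) {
        this.rtThresholdMillis = rtThresholdMillis;
        this.timeWindowMillis = timeWindowSeconds * 1000L;
    }

    /** Returns false while the breaker is open, i.e. during the degrade time window. */
    synchronized boolean allow(long nowMillis) {
        return nowMillis >= openUntilMillis;
    }

    /** Records the response time of a completed call and trips the breaker if the average is too high. */
    synchronized void onComplete(long rtMillis, long nowMillis) {
        totalRtMillis += rtMillis;
        completedCalls++;
        if ((double) totalRtMillis / completedCalls > rtThresholdMillis) {
            openUntilMillis = nowMillis + timeWindowMillis;
            totalRtMillis = 0;  // start with fresh statistics after the window
            completedCalls = 0;
        }
    }
}
```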

  • Exception ratio

When the resource receives at least 5 requests per second and the ratio of exceptions per second to requests that passed exceeds the threshold, the resource enters the degraded state: during the following time window (in s), all calls to this method return immediately. The exception ratio threshold is in the range [0.0, 1.0], representing 0% to 100%.
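
The trigger condition can be written as a small check. This sketch only evaluates the condition for one second of statistics; the class and parameter names are hypothetical, and opening the breaker for the time window is left out.

```java
/** Hypothetical check for the exception-ratio condition; an illustration only. */
final class ExceptionRatioCheck {
    static final int MIN_REQUESTS_PER_SECOND = 5; // below this, the rule does not trigger

    private ExceptionRatioCheck() {
    }

    /** Returns true if the resource should enter the degraded state. */
    static boolean shouldDegrade(int requestsInSecond, int exceptionsInSecond, double ratioThreshold) {
        if (ratioThreshold < 0.0 || ratioThreshold > 1.0) {
            throw new IllegalArgumentException("ratioThreshold must be in [0.0, 1.0]");
        }
        if (requestsInSecond < MIN_REQUESTS_PER_SECOND) {
            return false; // too few requests for the ratio to be meaningful
        }
        return (double) exceptionsInSecond / requestsInSecond > ratioThreshold;
    }
}
```

The minimum request count keeps a single failing call on a quiet resource (a 100% ratio) from tripping the rule.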

  • Exception count

When the number of exceptions for the resource in the last 1 minute exceeds the threshold, the circuit breaker trips. Note that the statistics window is at minute level. If the timeWindow is shorter than 60 s, the resource may enter the circuit-broken state again right after the previous one ends, so set the time window to at least 60 s.
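
The reason for the 60 s recommendation can be seen in a small sketch. It assumes exceptions are kept in a one-minute sliding window and that the rule is re-evaluated on the first call after the breaker closes; the class ExceptionCountBreaker and this evaluation order are assumptions for illustration, not Sentinel's actual code.

```java
import java.util.ArrayDeque;

/** Hypothetical exception-count breaker with a one-minute statistics window; an illustration only. */
final class ExceptionCountBreaker {
    private static final long STAT_WINDOW_MILLIS = 60_000; // statistics cover the last minute

    private final int countThreshold;
    private final long timeWindowMillis;
    private final ArrayDeque<Long> exceptionTimes = new ArrayDeque<>();
    private long openUntilMillis = Long.MIN_VALUE;

    ExceptionCountBreaker(int countThreshold, int timeWindowSeconds) {
        this.countThreshold = countThreshold;
        this.timeWindowMillis = timeWindowSeconds * 1000L;
    }

    synchronized void onException(long nowMillis) {
        exceptionTimes.addLast(nowMillis);
        evaluate(nowMillis);
    }

    synchronized boolean allow(long nowMillis) {
        if (nowMillis < openUntilMillis) {
            return false; // still inside the degrade time window
        }
        evaluate(nowMillis); // old exceptions may still be inside the one-minute statistics
        return nowMillis >= openUntilMillis;
    }

    private void evaluate(long nowMillis) {
        // Drop exceptions that are older than one minute.
        while (!exceptionTimes.isEmpty() && nowMillis - exceptionTimes.peekFirst() >= STAT_WINDOW_MILLIS) {
            exceptionTimes.pollFirst();
        }
        if (exceptionTimes.size() > countThreshold) {
            openUntilMillis = nowMillis + timeWindowMillis;
        }
    }
}
```

With a 5 s time window, the exceptions that tripped the breaker are still counted when the window ends, so the breaker opens again immediately; with a 60 s window they have aged out of the statistics by then.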

Custom fallback logic in Sentinel

The @SentinelResource annotation is similar to the @HystrixCommand annotation in Hystrix

The @SentinelResource annotation has two properties that need to be told apart

  • The blockHandler property specifies the fallback method invoked when a call is blocked by a Sentinel rule (flow control or degradation)
  • The fallback property specifies the fallback method invoked when the Java code throws an exception at runtime

Create a handler class. Note that the fallback methods in this class must be static

import com.alibaba.csp.sentinel.slots.block.BlockException;

public class SentinelHandlersClass {

    // Block handler: same parameters as the original method, plus a trailing BlockException parameter to receive the exception
    // Note that the method must be static
    public static Integer handleException(Long userId, BlockException blockException) {
        return -100;
    }

    // Fallback that handles exceptions thrown by the Java code; it must be static as well
    public static Integer handleError(Long userId) {
        return -500;
    }
}

On the controller method, the @SentinelResource annotation names the handler classes and methods:

  • blockHandlerClass: the class that handles Sentinel blocking/degradation
  • blockHandler: the method that handles Sentinel blocking/degradation
  • fallbackClass: the class that handles Java exceptions (as in Hystrix)
  • fallback: the method that handles Java exceptions (as in Hystrix)
    @GetMapping("/checkState/{userId}")
    @SentinelResource(value = "findResumeOpenState",
            blockHandlerClass = SentinelHandlersClass.class, blockHandler = "handleException",
            fallbackClass = SentinelHandlersClass.class, fallback = "handleError")
    public Integer findResumeOpenState(@PathVariable Long userId){

        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return client.findResumeOpenState(userId);
    }
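
The difference between blockHandler and fallback can be sketched without Sentinel on the classpath. In this illustration, HandlerDispatchSketch, RuleBlockedException (a stand-in for Sentinel's BlockException), and the passesRules flag are all hypothetical; the sketch only mirrors the dispatch described above and is not how the annotation is implemented.

```java
import java.util.function.Function;

/** Hypothetical dispatch sketch: blocked calls go to the block handler, business errors go to the fallback. */
final class HandlerDispatchSketch {

    /** Stand-in for Sentinel's BlockException (assumption for this sketch). */
    static final class RuleBlockedException extends Exception {
        RuleBlockedException(String message) {
            super(message);
        }
    }

    private HandlerDispatchSketch() {
    }

    static Integer invoke(Long userId, boolean passesRules, Function<Long, Integer> business) {
        if (!passesRules) {
            // A flow control or degrade rule rejected the call: the business method never runs.
            return handleException(userId, new RuleBlockedException("blocked by rule"));
        }
        try {
            return business.apply(userId);
        } catch (RuntimeException e) {
            // The business code itself failed: use the fallback.
            return handleError(userId);
        }
    }

    // Mirrors the blockHandler method of the handler class above.
    static Integer handleException(Long userId, RuleBlockedException e) {
        return -100;
    }

    // Mirrors the fallback method of the handler class above.
    static Integer handleError(Long userId) {
        return -500;
    }
}
```

A caller can therefore tell the two situations apart by the returned value: -100 means a rule blocked the call, -500 means the business logic threw an exception.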

Sentinel rule persistence with Nacos

Currently, rules added in the Sentinel Dashboard are stored in memory and are lost when the microservice stops, which is not acceptable in a production environment. We can persist Sentinel rule data to the Nacos configuration center and let the microservices load their rules from Nacos

  • Add the dependency to pom.xml of the auto-delivery microservice
<!-- Sentinel supports Nacos as a rule configuration data source -->
<dependency>
 <groupId>com.alibaba.csp</groupId>
 <artifactId>sentinel-datasource-nacos</artifactId>
</dependency>
  • Configure the Nacos data source in application.yml of the auto-delivery microservice
server:
  port: 8098
spring:
  application:
    name: lagou-service-autodeliver
  cloud:
    nacos:
      discovery:
        server-addr: 127.0.0.1:8848,127.0.0.1:8849,127.0.0.1:8850

    sentinel:
      transport:
        dashboard: 127.0.0.1:8080 # Sentinel dashboard/console address
        port: 8719   # Sentinel starts an HTTP server on this port inside the application so that the console can push rules to it and exchange data with it.
                     # If port 8719 is occupied, the port number is incremented by 1 in turn

      # Sentinel Nacos data source configuration; rules in Nacos are automatically synchronized to Sentinel flow control rules
      datasource:
        # Custom flow control rule data source name
        flow:
          nacos:
            server-addr: ${spring.cloud.nacos.discovery.server-addr}
            data-id: ${spring.application.name}-flow-rules
            groupId: DEFAULT_GROUP
            data-type: json
            rule-type: flow  # the type comes from the RuleType class
        # Custom degrade rule data source name
        degrade:
          nacos:
            server-addr: ${spring.cloud.nacos.discovery.server-addr}
            data-id: ${spring.application.name}-degrade-rules
            groupId: DEFAULT_GROUP
            data-type: json
            rule-type: degrade  # the type comes from the RuleType class

  • Add the corresponding rule configuration sets in Nacos Server (public namespace -> DEFAULT_GROUP)
    • resource: resource name
    • limitApp: source application
    • grade: threshold type; 0 = number of threads, 1 = QPS
    • count: single-machine threshold
    • strategy: flow control mode; 0 = direct, 1 = association, 2 = link
    • controlBehavior: flow control effect; 0 = fast fail, 1 = Warm Up, 2 = queuing
    • clusterMode: true/false, whether cluster mode is used

Flow control rule configuration set: lagou-service-autodeliver-flow-rules

[{"resource":"findResumeOpenState","limitApp":"default","grade":1,"count":1,"strategy":0,"controlBehavior":0,"clusterMode":false}]
  • Degrade rule configuration set: lagou-service-autodeliver-degrade-rules
    • resource: resource name
    • grade: degrade strategy; 0 = RT, 1 = exception ratio, 2 = exception count
    • count: threshold
    • timeWindow: time window
[{"resource":"findResumeOpenState","grade":2,"count":1,"timeWindow":5}]
  • Notes:
    • 1) A resource can have several flow control rules and degrade rules at the same time, which is why each configuration set is a JSON array
    • 2) Rules changed in the Sentinel console take effect in memory only and do not change the configuration value in Nacos, so the original value is restored after a restart. Rules changed in the Nacos console take effect in memory and are also persisted in Nacos, so they remain after a restart

Combining Nacos + Sentinel + Dubbo

Rework the auto-delivery microservice and the resume microservice: remove OpenFeign and Ribbon, and use Dubbo RPC and Dubbo load balancing instead

First, remove or comment out the hot deployment dependency in the parent project. If it is left in place, the Dubbo projects will not start

<!-- Hot deployment -->
<!--<dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-devtools</artifactId> <optional>true</optional> </dependency> -->

Create a new Dubbo service interface project

  • Create a project that holds the extracted Dubbo service interface: lagou-service-dubbo-api

Create the interface

package com.lagou.edu.service;
public interface ResumeService {
 Integer findDefaultResumeByUserId(Long userId);
}

Transforming the service provider project

  • 0) Create the resume microservice

  • 1) Add the Spring Cloud + Dubbo integration dependencies to the POM file, together with the dependency on the Dubbo service interface project
<!-- Spring Cloud Alibaba Dubbo dependency -->
    <dependency>
        <groupId>com.alibaba.cloud</groupId>
        <artifactId>spring-cloud-starter-dubbo</artifactId>
    </dependency>
    <dependency>
        <groupId>com.alibaba.csp</groupId>
        <artifactId>sentinel-apache-dubbo-adapter</artifactId>
    </dependency>
    <!-- Dubbo service interface dependency -->
    <dependency>
        <groupId>com.lagou.edu</groupId>
        <artifactId>lagou-service-dubbo-api</artifactId>
        <version>1.0-SNAPSHOT</version>
    </dependency>
  • 2) Use the ResumeService interface from the Dubbo service interface project, and add Dubbo's @Service annotation to the service implementation class

  • 3) Add dubbo configuration to application.yml or bootstrap.yml configuration file
dubbo:
  scan:
    # Base packages to scan for Dubbo services
    base-packages: com.lagou.edu.service.impl
  protocol:
    # Dubbo protocol
    name: dubbo
    # Dubbo protocol port (-1 means an auto-assigned port, starting from 20880)
    port: -1
    host: 127.0.0.1
  registry:
    # Attach to the Spring Cloud registry
    address: spring-cloud://localhost


After starting the resume microservice, check Nacos

In the service details you can see that the resume microservice is provided through Dubbo

Transforming the service consumer project

Next, transform the service consumer project, that is, the auto-delivery microservice

  • Remove the OpenFeign dependencies from pom.xml
  • Remove the Feign and Ribbon settings from application.yml, and remove the Feign client from the code
  • Add the same dependencies to pom.xml as in the service provider
  • Add the Dubbo settings to application.yml to subscribe to the service
dubbo:
  registry:
    # Attach to the Spring Cloud registry
    address: spring-cloud://localhost
  cloud:
    # List of service provider applications to subscribe to; separate multiple providers with ","
    subscribed-services: lagou-service-resume

Also configure spring.main.allow-bean-definition-overriding=true

  • Modify the controller code; everything else stays unchanged

Reference the service from the interface project using Dubbo's @Reference annotation

  • After starting the service, the Nacos console shows its registration information as well

  • Test: http://localhost:8099/autodeliver/checkState/1545132

The request returns normally

Flow control limit