1. Overview

1.1 What is Hystrix

Hystrix is an open source library for handling latency and fault tolerance in distributed systems. In a distributed system, calls to dependencies will inevitably fail at times, for example through timeouts or exceptions. Hystrix ensures that a problem in one dependency does not bring down the whole service, which avoids cascading failures and improves the resilience of the distributed system. The circuit breaker itself is a switching device: when Hystrix detects that a service has failed, the breaker trips, much like a blown electrical fuse, and cuts the call link to that service. Instead of blocking the caller for a long time or throwing an exception it cannot handle, Hystrix returns an expected, processable fallback response. This keeps the caller's threads from being tied up unnecessarily for long periods and prevents the failure from spreading, or even avalanching, through the distributed system. In this sense Hystrix is a defensive mechanism for the system.

1.2 What can Hystrix do

  • Service degradation
  • Service circuit breaking
  • Near real-time monitoring

1.3 Hystrix documentation

https://github.com/Netflix/Hystrix/wiki/How-To-Use

1.4 Hystrix has officially stopped active development and entered maintenance mode

2. Hystrix important concepts

Service degradation (fallback): when the server is busy or a call fails, do not keep the client waiting. Immediately return a friendly message such as "The server is busy, please try again later".



What triggers degradation?

  • The program throws an exception at runtime
  • A call times out
  • The circuit breaker has tripped
  • The thread pool or semaphore is full
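
The first two triggers, an exception and a timeout, can be illustrated without Hystrix at all. The following is a minimal plain-Java sketch, not Hystrix's implementation: the class DegradeSketch and the method callOrDegrade are hypothetical names invented for this example. It runs a call on a worker thread, waits up to a deadline, and returns a fallback value if the call throws or takes too long.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; it is not part of Hystrix.
class DegradeSketch {
    // Daemon threads, so a call that is still sleeping never keeps the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable);
        thread.setDaemon(true);
        return thread;
    });

    // Runs the call with a deadline; degrades to the fallback on timeout or exception.
    static String callOrDegrade(Callable<String> call, long timeoutMs, Supplier<String> fallback) {
        Future<String> future = POOL.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true); // stop waiting for the slow or failed call
            return fallback.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            future.cancel(true);
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        Supplier<String> busy = () -> "The server is busy, please try again later";
        // Fast call: the real result is returned.
        System.out.println(callOrDegrade(() -> "paymentInfo_OK", 1000, busy));
        // Slow call: the 200 ms deadline passes first, so the fallback is returned.
        System.out.println(callOrDegrade(() -> { Thread.sleep(3000); return "late"; }, 200, busy));
        // Failing call: the exception is swallowed and the fallback is returned.
        System.out.println(callOrDegrade(() -> { throw new ArithmeticException("/ by zero"); }, 1000, busy));
    }
}
```

The caller always gets an answer within roughly the deadline, which is the point of degradation: a friendly message now instead of an indefinite wait.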



Service circuit breaking: similar to an electrical fuse. When the service reaches its maximum load, access is denied outright, as if the power had been cut. The degradation method is then called and a friendly message is returned. The sequence is: service degradation -> circuit breaking -> recovery of the call link.

Service rate limiting: used for flash sales and other high-concurrency operations. Requests are not allowed to flood in all at once. They queue up and are processed in order, N per second.
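
One simple way to keep requests from flooding in is to cap how many may run at the same time, which is also what a "full semaphore" means in the list of degradation triggers. The sketch below is an assumption-laden illustration in plain Java: SemaphoreLimitSketch is a hypothetical class, and it limits concurrency rather than enforcing an exact "N per second" rate.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical concurrency limiter for illustration; not Hystrix's semaphore isolation code.
class SemaphoreLimitSketch {
    private final Semaphore permits;

    SemaphoreLimitSketch(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the action if a permit is free; otherwise rejects at once with the fallback.
    String call(Supplier<String> action, Supplier<String> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // limit reached: degrade instead of queueing
        }
        try {
            return action.get();
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        SemaphoreLimitSketch limiter = new SemaphoreLimitSketch(1);
        // The outer call holds the only permit, so the nested call is rejected.
        String nested = limiter.call(
                () -> "outer saw: " + limiter.call(() -> "inner ran", () -> "server busy"),
                () -> "server busy");
        System.out.println(nested);
        // The permit has been released, so a later call succeeds.
        System.out.println(limiter.call(() -> "second call ran", () -> "server busy"));
    }
}
```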


3. Hystrix case

3.1 Build the project

3.1.1 Create the module cloud-provider-hystrix-payment8001
3.1.2 Add the Hystrix dependency to the POM file
<!-- hystrix -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>

3.1.3 YAML configuration file
server:
  port: 8001

spring:
  application:
    name: cloud-provider-hystrix-payment

eureka:
  client: # Register this client in the Eureka service list
    register-with-eureka: true # Whether to register this service with the Eureka server; defaults to true
    fetch-registry: true # Whether to fetch existing registration information from the Eureka server; defaults to true. A cluster must set this to true to use load balancing with Ribbon
    service-url:
      defaultZone: http://localhost:7001/eureka

3.1.4 Main startup class
@SpringBootApplication
@EnableEurekaClient
public class PaymentHystrixMain8001 {
    public static void main(String[] args) {
        SpringApplication.run(PaymentHystrixMain8001.class, args);
    }
}

3.1.5 Business classes

PaymentService

@Service
public class PaymentService {

    // Normal access: returns immediately
    public String paymentInfo_OK(Integer id){
        return "Thread pool :" + Thread.currentThread().getName()+"paymentInfo_OK,id:"+id+"\t";
    }

    // Slow access: simulates a call that takes 3 seconds
    public String paymentInfo_TimeOut(Integer id){
        try {
            Thread.sleep(3000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return "Thread pool :" + Thread.currentThread().getName()+"paymentInfo_TimeOut,id:"+id+"\t";
    }
}

Controller

@RestController
@Slf4j
public class PaymentController {

    @Resource
    private PaymentService paymentService;

    @Value("${server.port}")
    private String serverPort;

    @GetMapping("/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id){
        String result = paymentService.paymentInfo_OK(id);
        log.info("****result:"+result);
        return result;
    }

    @GetMapping("/payment/hystrix/timeout/{id}")
    public String paymentInfo_Timeout(@PathVariable("id") Integer id){
        String result = paymentService.paymentInfo_TimeOut(id);
        log.info("****result:"+result);
        return result;
    }
}

3.1.6 Basic test
  • Start eureka7001

  • Start cloud-provider-hystrix-payment8001 and access it

  • The endpoint that succeeds immediately

http://localhost:8001/payment/hystrix/ok/2

  • The endpoint that takes 3 seconds per call

http://localhost:8001/payment/hystrix/timeout/2

Both endpoints above work normally.

3.2 High-concurrency test

3.2.1 JMeter load test

Use JMeter to send 20,000 concurrent requests to the paymentInfo_TimeOut endpoint.

Result:

Both endpoints, including the normally fast paymentInfo_OK, now spin for a while before responding.

Why do they stall?

Tomcat's default pool of worker threads is fully occupied, so there are no spare threads left to relieve the load and handle other requests.

Conclusion of the JMeter load test

So far the provider 8001 has only been load-testing itself. If the consumer 80 calls it at the same moment, the consumer can only wait, and the provider 8001 is dragged down by the load.
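
The stall can be reproduced in miniature with a plain thread pool. The sketch below is only an analogy for Tomcat's worker threads, with hypothetical names (PoolSaturationSketch, fastRequestStalls): once every worker is occupied by a slow request, a request that would normally return instantly cannot even start.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo: a fixed pool stands in for the server's worker threads.
class PoolSaturationSketch {
    // Returns true if a fast request could not complete while slow requests held every worker.
    static boolean fastRequestStalls(int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Occupy every worker with a "slow" request that blocks until released.
            for (int i = 0; i < workers; i++) {
                pool.submit(() -> { release.await(); return "slow"; });
            }
            // The fast request is queued behind them and cannot start.
            Future<String> fast = pool.submit(() -> "fast");
            try {
                fast.get(200, TimeUnit.MILLISECONDS);
                return false; // a worker was free after all
            } catch (TimeoutException e) {
                return true;  // still waiting in the queue
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            release.countDown(); // let the slow requests finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("fast request stalled: " + fastRequestStalls(2));
    }
}
```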

3.2.2 Build cloud-consumer-feign-hystrix-order80

Add the Hystrix dependency to the POM file

<!-- hystrix -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>

YAML file

server:
  port: 80

eureka:
  client: # Register this client in the Eureka service list
    register-with-eureka: false # Whether to register this service with the Eureka server; defaults to true
    service-url:
      defaultZone: http://localhost:7001/eureka

Main startup class

@SpringBootApplication
@EnableFeignClients
public class OrderHystrixMain80 {

    public static void main(String[] args) {
        SpringApplication.run(OrderHystrixMain80.class,args);
    }
}

Business class: service interface

@Component
@FeignClient(value="CLOUD-PROVIDER-HYSTRIX-PAYMENT")
public interface PaymentHystrixService {

    @GetMapping("/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id);

    @GetMapping("/payment/hystrix/timeout/{id}")
    public String paymentInfo_Timeout(@PathVariable("id") Integer id);
}

Controller

@RestController
@Slf4j
public class OrderHystrixController {

    @Resource
    private PaymentHystrixService paymentHystrixService;

    @GetMapping("/consumer/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id){
        String result = paymentHystrixService.paymentInfo_OK(id);
        return result;
    }

    @GetMapping("/consumer/payment/hystrix/timeout/{id}")
    public String paymentInfo_Timeout(@PathVariable("id") Integer id){
        return paymentHystrixService.paymentInfo_Timeout(id);
    }
}

  • Basic test: http://localhost/consumer/payment/hystrix/ok/3
  • High-concurrency test

Send 20,000 requests to load the provider 8001.

Then have the consumer 80 call the normally fast OK endpoint on microservice 8001.

The call responds slowly, and timeout exceptions occur occasionally.

3.3 Symptoms

The other endpoints of 8001 at the same level are stalled because the Tomcat worker threads have been used up. When 80 calls 8001 at this point, the client responds slowly and keeps spinning.

3.4 Conclusion

It is precisely because of failures and poor performance like this that degradation, fault tolerance, and rate limiting techniques exist.

3.5 How to solve it: requirements

Problems to address

  • A timeout makes the server slow (requests keep spinning)
  • An error occurs (the service is down or the program throws an exception)

Solutions

  • The provider (8001) has timed out: the caller (80) must not wait indefinitely and needs a fallback
  • The provider (8001) is down: the caller (80) must not hang waiting and needs a fallback
  • The provider (8001) is fine, but the caller (80) fails or has stricter requirements of its own (its acceptable wait time is shorter than the provider's response time): the caller handles the degradation itself

3.6 Service degradation

3.6.1 Fallback on the provider (8001)

Mark the service method with @HystrixCommand. If the method call fails, throws an exception, or times out, Hystrix automatically calls the method in the same class named by the annotation's fallbackMethod attribute.

Enable it in the business class

@HystrixCommand(fallbackMethod = "paymentInfo_TimeOutHandler",commandProperties =
 @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds",value="3000")) // degrade if the call takes longer than 3 seconds
public String paymentInfo_TimeOut(Integer id){
    // Fault 1: arithmetic exception. It fires first; comment it out to exercise the timeout below.
    int age = 10 / 0;
    // Fault 2: sleep for 5 seconds, longer than the 3-second Hystrix timeout
    int second = 5;
    try { TimeUnit.SECONDS.sleep(second); } catch (InterruptedException e) { e.printStackTrace(); }
    return "Thread pool :" + Thread.currentThread().getName()+"paymentInfo_TimeOut,id:"+id+"\t";
}

// Fallback: takes the same parameters and returns the same type as the guarded method
public String paymentInfo_TimeOutHandler(Integer id){
    return "Thread pool :" + Thread.currentThread().getName()+"System busy please try again later id:"+id+"\t"+"Ha ha";
}

Add @EnableCircuitBreaker to the main startup class to activate Hystrix

@SpringBootApplication
@EnableEurekaClient
@EnableCircuitBreaker
public class PaymentHystrixMain8001 {
    public static void main(String[] args) {
        SpringApplication.run(PaymentHystrixMain8001.class, args);
    }
}


The code above intentionally creates two faults:

  1. int age = 10 / 0; an arithmetic exception
  2. A timeout: the method sleeps for 5 seconds, longer than the 3-second limit

In either case the current service is unavailable, and the degradation (fallback) method paymentInfo_TimeOutHandler handles the request.

3.6.2 Fallback on the consumer (80)

(1) Enable Feign support for Hystrix in the YAML file

server:
  port: 80

eureka:
  client: # Register this client in the Eureka service list
    register-with-eureka: false # Whether to register this service with the Eureka server; defaults to true
    service-url:
      defaultZone: http://localhost:7001/eureka

feign:
  hystrix:
    enabled: true

(2) Main startup class

@SpringBootApplication
@EnableFeignClients
@EnableHystrix
public class OrderHystrixMain80 {

    public static void main(String[] args) {
        SpringApplication.run(OrderHystrixMain80.class,args);
    }
}

(3) Business class

@RestController
@Slf4j
public class OrderHystrixController {

    @Resource
    private PaymentHystrixService paymentHystrixService;

    @GetMapping("/consumer/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id){
        String result = paymentHystrixService.paymentInfo_OK(id);
        return result;
    }

    // The consumer is willing to wait at most 1.5 seconds before degrading
    @GetMapping("/consumer/payment/hystrix/timeout/{id}")
    @HystrixCommand(fallbackMethod = "paymentInfo_TimeOutFallbackMethod",commandProperties =
      @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds",value="1500"))
    public String paymentInfo_Timeout(@PathVariable("id") Integer id){
        return paymentHystrixService.paymentInfo_Timeout(id);
    }

    // Fallback for paymentInfo_Timeout
    public String paymentInfo_TimeOutFallbackMethod(Integer id){
        return "This is consumer 80. The payment system is busy. Please try again later.";
    }
}

test

3.6.3 Remaining problems
3.6.3.1 Every business method needs its own fallback method, which bloats the code

How to solve it?

Use @DefaultProperties(defaultFallback = "")

Configuring a dedicated degradation method for every method one-to-one is technically possible but impractical. Apart from the few methods that deserve their own fallback, ordinary methods can be routed to one shared handler through @DefaultProperties(defaultFallback = ""). Separating the shared fallback from the dedicated ones avoids code bloat and keeps the amount of code reasonable.
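
The idea of one shared fallback plus a few dedicated ones can be sketched in plain Java. This is not how Hystrix resolves fallbacks internally: FallbackResolverSketch and its methods are hypothetical names used only to show the lookup order, dedicated fallback first, shared default otherwise.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of "dedicated fallback if present, otherwise the shared default".
class FallbackResolverSketch {
    private final Map<String, Supplier<String>> dedicated = new HashMap<>();
    private final Supplier<String> defaultFallback;

    FallbackResolverSketch(Supplier<String> defaultFallback) {
        this.defaultFallback = defaultFallback;
    }

    // Registers a dedicated fallback for one method name.
    void register(String method, Supplier<String> fallback) {
        dedicated.put(method, fallback);
    }

    // Runs the action; on failure picks the dedicated fallback, else the shared one.
    String run(String method, Supplier<String> action) {
        try {
            return action.get();
        } catch (RuntimeException e) {
            return dedicated.getOrDefault(method, defaultFallback).get();
        }
    }

    public static void main(String[] args) {
        FallbackResolverSketch resolver =
                new FallbackResolverSketch(() -> "Global fallback: please try again later");
        resolver.register("paymentInfo_Timeout", () -> "Payment is busy, please try again later");

        Supplier<String> failing = () -> { throw new IllegalStateException("remote call failed"); };
        System.out.println(resolver.run("paymentInfo_OK", failing));      // shared fallback
        System.out.println(resolver.run("paymentInfo_Timeout", failing)); // dedicated fallback
    }
}
```

With the Hystrix annotations the same split is usually expressed by putting @DefaultProperties on the class and leaving fallbackMethod off the @HystrixCommand of every method that should use the shared handler.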

3.6.3.2 Fallback code is mixed with business logic, which makes the code messy

How to solve it?

Decouple the two by adding an implementation class that handles degradation for the interface defined by the Feign client.

Specific steps

  • For the existing PaymentHystrixService interface in cloud-consumer-feign-hystrix-order80, create a new class PaymentFallbackService that implements the interface and handles failures for all of its methods in one place. For Feign to use it, the class is normally referenced from the interface through the fallback attribute, i.e. @FeignClient(value = "CLOUD-PROVIDER-HYSTRIX-PAYMENT", fallback = PaymentFallbackService.class)
  • Turn on Feign support for Hystrix in the YAML file
feign:
  hystrix:
    enabled: true

  • The PaymentFallbackService class
@Component
public class PaymentFallbackService implements PaymentHystrixService{

    // Returned when the remote paymentInfo_OK call fails
    @Override
    public String paymentInfo_OK(Integer id) {
        return "---PaymentFallbackService paymentInfo_OK";
    }

    // Returned when the remote paymentInfo_Timeout call fails
    @Override
    public String paymentInfo_Timeout(Integer id) {
        return "---PaymentFallbackService paymentInfo_Timeout";
    }
}

  • Test

Start eureka7001.

Start PaymentHystrixMain8001.

Verify normal access at http://localhost/consumer/payment/hystrix/ok/2.

Then deliberately shut down microservice 8001 and request the same URL again.

The provider is now down, but because degradation is in place the client receives a fallback message while the server is unavailable, instead of hanging and tying up the server.
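
The decoupling used here, where the fallback lives in a separate class that implements the same interface, can be imitated in plain Java with a dynamic proxy. The sketch below is an assumption for illustration, not Feign's actual mechanism: PaymentClient is a hypothetical stand-in for PaymentHystrixService, and withFallback is an invented helper.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical illustration; Feign wires its fallback classes differently.
class FallbackProxySketch {
    // Stand-in for the Feign client interface.
    interface PaymentClient {
        String paymentInfoOk(int id);
    }

    // Wraps the remote client so that any failed call is answered by the fallback implementation.
    static PaymentClient withFallback(PaymentClient remote, PaymentClient fallback) {
        return (PaymentClient) Proxy.newProxyInstance(
                PaymentClient.class.getClassLoader(),
                new Class<?>[] {PaymentClient.class},
                (proxy, method, args) -> {
                    try {
                        return method.invoke(remote, args);
                    } catch (InvocationTargetException e) {
                        // The remote call threw: delegate the same method to the fallback.
                        return method.invoke(fallback, args);
                    }
                });
    }

    public static void main(String[] args) {
        PaymentClient down = id -> { throw new IllegalStateException("8001 is down"); };
        PaymentClient fallback = id -> "---PaymentFallbackService paymentInfo_OK";
        PaymentClient client = withFallback(down, fallback);
        // The controller code only sees PaymentClient and contains no fallback logic.
        System.out.println(client.paymentInfoOk(2));
    }
}
```

The business code calls the interface as usual, and all degradation handling stays in the fallback implementation.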

3.7 Service circuit breaking

3.7.1 What is a circuit breaker

The circuit breaker mechanism is a microservice link protection mechanism that counters the avalanche effect. When a microservice on a fan-out link is unavailable or responds too slowly, the service is degraded: calls to that node's microservice are cut off and an error (fallback) response is returned quickly. In the Spring Cloud framework the mechanism is implemented by Hystrix, which monitors the status of calls between microservices and trips the breaker when failed calls reach a threshold. With Hystrix's default settings this means at least 20 requests in a 10-second window with a failure rate of 50% or more. The annotation for the circuit breaker mechanism is @HystrixCommand.

3.7.2 Hands-on

Modify cloud-provider-hystrix-payment8001

PaymentService

import cn.hutool.core.util.IdUtil; // assumed to be the Hutool utility class
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {

    //===== Service circuit breaking
    @HystrixCommand(fallbackMethod = "paymentCircuitBreaker_fallback",commandProperties = {
            @HystrixProperty(name="circuitBreaker.enabled",value="true"), // Whether the circuit breaker is enabled
            @HystrixProperty(name="circuitBreaker.requestVolumeThreshold",value="10"), // Minimum number of requests in the window before the breaker can trip
            @HystrixProperty(name="circuitBreaker.sleepWindowInMilliseconds",value="10000"), // Sleep window: how long the breaker stays open before a trial request
            @HystrixProperty(name="circuitBreaker.errorThresholdPercentage",value="60")}) // Failure percentage at which the breaker trips
    public String paymentCircuitBreaker(Integer id){
        if(id < 0) {
            throw new RuntimeException("**** ID cannot be negative");
        }
        String serialNumber = IdUtil.simpleUUID();
        return Thread.currentThread().getName()+"\t"+"Call successful, serial number :"+serialNumber;
    }

    // Fallback used when the call fails or the breaker is open
    public String paymentCircuitBreaker_fallback(Integer id){
        return "Id cannot be negative, please try again later, id:"+id;
    }
}

PaymentController

@RestController
@Slf4j
public class PaymentController {

    @Resource
    private PaymentService paymentService;

    @Value("${server.port}")
    private String serverPort;

    //===== Service circuit breaking
    @GetMapping("/payment/circuit/{id}")
    public String paymentCircuitBreaker(@PathVariable("id") Integer id){
        String result = paymentService.paymentCircuitBreaker(id);
        log.info("*****result:"+result);
        return result;
    }
}

Test

Valid request: http://localhost:8001/payment/circuit/2

Invalid request: http://localhost:8001/payment/circuit/-10

Send the invalid request many times in quick succession, then gradually switch to the valid one. For a short while even the valid address cannot be served and returns the fallback response, because the breaker is open.

3.7.3 Summary
Circuit breaker states

  • Open: requests no longer call the current service. An internal timer, typically set to the MTTR (mean time to repair), starts. When the breaker has been open for that long, it moves to the half-open state.
  • Closed: the breaker does not interrupt calls to the service.
  • Half-open: some requests are let through to the service according to the rules. If they succeed and meet the rules, the service is considered recovered and the breaker closes.



When does the circuit breaker trip?

Three important parameters are involved: the snapshot time window, the total request threshold, and the error percentage threshold.

1. Snapshot time window: to decide whether to open, the breaker gathers request and error statistics over a snapshot time window, which defaults to the most recent 10 seconds.

2. Total request threshold: within the snapshot time window, the total number of requests must reach this threshold before the breaker is eligible to trip. The default is 20, so if the command is called fewer than 20 times within 10 seconds, the breaker does not open even if every call times out or fails.

3. Error percentage threshold: once the total number of requests in the snapshot time window has reached the threshold, for example 30 calls, and 15 of those 30 calls time out (an error rate of 50%), the breaker opens under the default 50% threshold.
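
The arithmetic behind the second and third parameters can be written down as a small check. This is a hedged sketch rather than Hystrix's code: TripDecisionSketch and shouldTrip are hypothetical names, and treating both comparisons as inclusive (reaching the threshold is enough) is an assumption of the example.

```java
// Hypothetical helper mirroring the parameters above; not Hystrix's actual code.
class TripDecisionSketch {
    // Decides whether the statistics of one snapshot window are enough to open the breaker.
    static boolean shouldTrip(int totalRequests, int failedRequests,
                              int requestVolumeThreshold, int errorThresholdPercentage) {
        // Too little traffic in the window: never trip, whatever the failure rate.
        if (totalRequests == 0 || totalRequests < requestVolumeThreshold) {
            return false;
        }
        int errorPercentage = failedRequests * 100 / totalRequests;
        return errorPercentage >= errorThresholdPercentage;
    }

    public static void main(String[] args) {
        // 30 calls, 15 failures: 50% reaches the default 50% threshold.
        System.out.println(shouldTrip(30, 15, 20, 50)); // true
        // 19 calls, all failed: below the default volume threshold of 20.
        System.out.println(shouldTrip(19, 19, 20, 50)); // false
        // 30 calls, 14 failures: 46% is below the threshold.
        System.out.println(shouldTrip(30, 14, 20, 50)); // false
    }
}
```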



What are the conditions for the circuit breaker to open or close?

  1. The request volume reaches a threshold (by default, at least 20 requests within 10 seconds)
  2. The failure rate reaches a threshold (by default, 50% or more of the requests within 10 seconds fail)
  3. When both thresholds are reached, the circuit breaker opens
  4. While it is open, no requests are forwarded
  5. After a period of time (5 seconds by default) the breaker becomes half-open and lets one request through. If that request succeeds, the breaker closes. If it fails, the breaker stays open and steps 4 and 5 repeat.



What happens after the circuit breaker opens?

1. When further requests arrive, the main logic is not called; the degraded fallback is called directly. The breaker thus detects faults automatically and temporarily substitutes the fallback logic for the main logic, which reduces response latency.

2. How is the original main logic restored?

Hystrix provides automatic recovery.

When the breaker opens and cuts off the main logic, Hystrix starts a sleep time window. During this window the fallback logic temporarily acts as the main logic. When the sleep window expires, the breaker enters the half-open state and lets one request through to the original main logic. If that request returns normally, the breaker closes and the main logic resumes. If the request still fails, the breaker stays open and the sleep window restarts.
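
The open, half-open, and closed transitions described above can be condensed into a small state machine. The following is a simplified sketch under stated assumptions, not Hystrix's implementation: CircuitBreakerSketch is a hypothetical class, it counts requests since the last reset instead of using a rolling 10-second window, and the clock is injected so the sleep window can be exercised without waiting.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Simplified, hypothetical circuit breaker; Hystrix itself uses rolling statistics.
class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;
    private final int errorThresholdPercentage;
    private final long sleepWindowMs;
    private final LongSupplier clock; // injected so tests need not sleep

    private State state = State.CLOSED;
    private int total;
    private int failed;
    private long openedAt;

    CircuitBreakerSketch(int requestVolumeThreshold, int errorThresholdPercentage,
                         long sleepWindowMs, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMs = sleepWindowMs;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    // synchronized keeps the sketch simple; it also serializes calls, which a real breaker avoids.
    synchronized String call(Supplier<String> primary, Supplier<String> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < sleepWindowMs) {
                return fallback.get();   // open: skip the main logic entirely
            }
            state = State.HALF_OPEN;     // sleep window expired: allow one trial request
        }
        try {
            String result = primary.get();
            if (state == State.HALF_OPEN) {
                close();                 // trial succeeded: resume the main logic
            } else {
                total++;
            }
            return result;
        } catch (RuntimeException e) {
            if (state == State.HALF_OPEN) {
                open();                  // trial failed: restart the sleep window
            } else {
                total++;
                failed++;
                if (total >= requestVolumeThreshold
                        && failed * 100 / total >= errorThresholdPercentage) {
                    open();
                }
            }
            return fallback.get();
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
    }

    private void close() {
        state = State.CLOSED;
        total = 0;
        failed = 0;
    }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock in milliseconds
        CircuitBreakerSketch breaker = new CircuitBreakerSketch(10, 60, 10_000, () -> now[0]);
        Supplier<String> failing = () -> { throw new IllegalStateException("ID cannot be negative"); };
        Supplier<String> healthy = () -> "call successful";
        Supplier<String> fallback = () -> "please try again later";

        for (int i = 0; i < 10; i++) {
            breaker.call(failing, fallback); // ten failures trip the breaker
        }
        System.out.println(breaker.state() + " -> " + breaker.call(healthy, fallback));
        now[0] = 10_000;                     // the sleep window has elapsed
        System.out.println(breaker.call(healthy, fallback) + " -> " + breaker.state());
    }
}
```

The constructor arguments in main reuse the values from the @HystrixProperty settings in section 3.7.2 (10 requests, 60%, 10000 ms), so the run mirrors the manual test: repeated failures open the breaker, a valid request is still rejected while it is open, and a successful trial request after the sleep window closes it again.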


3.8 Service rate limiting

Rate limiting is covered later, together with Alibaba's Sentinel.

4. Hystrix workflow

Official documentation: https://github.com/Netflix/Hystrix/wiki/How-it-Works

The official flow chart is on the page linked above.

5. Service monitoring with Hystrix Dashboard

5.1 Overview

In addition to isolating calls to dependent services, Hystrix provides near real-time call monitoring (Hystrix Dashboard). Hystrix continuously records execution information for every request made through it and presents the data to users as statistical reports and graphs: how many requests are executed per second, how many succeed, how many fail, and so on. Netflix implements the monitoring of these metrics through the hystrix-metrics-event-stream project. Spring Cloud also integrates Hystrix Dashboard and turns the monitoring data into a visual interface.

5.2 Dashboard 9001

Create the module cloud-consumer-hystrix-dashboard9001

Add dependencies to the POM file

<dependencies>
    <!-- hystrix dashboard -->
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-netflix-hystrix-dashboard</artifactId>
        <version>2.1.5.RELEASE</version>
    </dependency>
    <!-- web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <!-- devtools -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-devtools</artifactId>
        <scope>runtime</scope>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

Write the YAML file

server:
  port: 9001

Startup class, with the new annotation @EnableHystrixDashboard

@SpringBootApplication
@EnableHystrixDashboard
public class HystrixDashboardMain9001 {
    public static void main(String[] args) {
        SpringApplication.run(HystrixDashboardMain9001.class, args);
    }
}

Every provider microservice to be monitored (8001/8002/8003) needs the actuator monitoring dependency

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Start cloud-consumer-hystrix-dashboard9001. This microservice will then monitor microservice 8001.

5.3 Circuit breaker demo (service monitoring with Hystrix Dashboard)

5.3.1 Modify cloud-provider-hystrix-payment8001

Note: with newer Hystrix versions, a monitoring path must be registered in the main startup class PaymentHystrixMain8001. Otherwise the dashboard reports "Unable to connect to Command Metric Stream".

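import com.netflix.hystrix.contrib.metrics.eventstream.HystrixMetricsStreamServlet;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.web.servlet.ServletRegistrationBean;
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
@EnableEurekaClient
@EnableCircuitBreaker
public class PaymentHystrixMain8001 {
    public static void main(String[] args) {
        SpringApplication.run(PaymentHystrixMain8001.class, args);
    }

    /**
     * This configuration exists only for service monitoring and has nothing to do with fault tolerance itself.
     * Because Spring Boot's default path is not "/hystrix.stream", register the servlet below in your own project.
     */
    @Bean
    public ServletRegistrationBean getServlet(){
        HystrixMetricsStreamServlet streamServlet = new HystrixMetricsStreamServlet();
        ServletRegistrationBean registrationBean = new ServletRegistrationBean(streamServlet);
        registrationBean.setLoadOnStartup(1);
        registrationBean.addUrlMappings("/hystrix.stream"); // path the dashboard polls for metrics
        registrationBean.setName("HystrixMetricsStreamServlet");
        return registrationBean;
    }
}
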
5.3.2 Monitoring tests
5.3.2.1 Start eureka7001
5.3.2.2 Observe the monitoring window

(1) Use 9001 to monitor 8001

  • 1: Delay: controls the interval at which the server is polled for monitoring information. The default is 2000 milliseconds. Adjusting it can reduce the client's network and CPU consumption.
  • 2: Title: corresponds to the text shown after "Hystrix Stream" in the page header. By default the URL of the monitored instance is used. Set this parameter to display a more suitable title.

(2) Test addresses

http://localhost:8001/payment/circuit/3

http://localhost:8001/payment/circuit/-10

Visit the valid address first, then the invalid address, then the valid address again. You will see the breaker, once open, gradually release as valid requests succeed.

Monitoring result on success (figure)


Monitoring result on failure (figure)


(3) How to read the dashboard

7 colors

1 circle

Solid circle: carries two meanings. Its color indicates the health of the instance, with health decreasing in the order green, yellow, orange, red.

Its size also changes with the instance's request traffic: the higher the traffic, the larger the circle. The solid circle therefore makes it easy to spot faulty instances and heavily loaded instances among many.

1 line

Curve: records the relative change in traffic over the last 2 minutes, so you can observe whether traffic is trending up or down.

Figure 1


Figure 2