Why is Hystrix needed

Hystrix is available on Github

  • Hystrix is another of Netflix’s contributions to distributed systems. It too has entered maintenance mode, but no maintenance does not mean elimination; it just means the technology keeps iterating elsewhere. A design that was once brilliant is still worth learning from.

  • In a distributed environment, service invocation is both a feature and a headache. The Service Governance chapter introduced the capabilities of service governance, and the previous lesson covered service invocation with Ribbon and OpenFeign. Now it comes naturally to service monitoring and management. Hystrix provides isolation and protection for services, ensuring that one service’s failure does not cascade until the entire system is unusable.

    

  • As shown in the figure above, multiple clients make service calls to AService, of which there are three instances in the distributed system; some of AService’s logic has to be handled by BService, which has two deployed instances. Now suppose the communication between one AService instance and BService fails because of a network problem. If BService only does log processing, then as far as the system as a whole is concerned, losing some logs should be nothing compared with a system outage. Yet at this point the whole AService instance becomes unavailable because of the network problem. That trade-off is clearly not worth it.

  • Look at the figure above: the call chain is A → B → C → D. Service D is down, so C throws exceptions because D is unreachable, but C’s threads are still tied up responding to B. As concurrent requests keep coming in, C’s thread pool fills up and CPU usage swells. Other interfaces of service C are then also affected by the rising CPU load and respond slowly.

Features

Hystrix is a low-latency, fault-tolerant third-party component library designed to isolate the access points to remote systems, services, and third-party libraries. The official site has stopped maintenance and recommends Resilience4j instead; in China we also have Spring Cloud Alibaba.

Hystrix implements latency and fault-tolerance mechanisms in distributed systems by isolating access between services, addressing service-avalanche scenarios, and provides fallback alternatives.

  • Fault tolerance for network delays and failures
  • Blocks distributed-system avalanches
  • Fails fast and recovers rapidly
  • Service degradation (fallback)
  • Real-time monitoring and alerting

$$
\begin{aligned}
&99.99\%^{30} = 99.7\%\ \text{uptime} \\
&0.3\%\ \text{of}\ 1\ \text{billion requests} = 3{,}000{,}000\ \text{failures} \\
&2{+}\ \text{hours downtime/month, even if all dependencies have excellent uptime}
\end{aligned}
$$

  • The official site gives a statistic: if each of 30 services has a 0.01% failure rate, then out of every 100 million requests roughly 300,000 fail. That translates to at least two hours of downtime per month, which is fatal for an Internet system.
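The arithmetic above is easy to reproduce, assuming 30 serial dependencies at 99.99% uptime each and a 30-day month:

```java
public class UptimeMath {
    // 30 dependencies, each 99.99% available, composed in series.
    static double overallUptime() {
        return Math.pow(0.9999, 30);
    }

    public static void main(String[] args) {
        double overall = overallUptime();                 // ≈ 0.997, i.e. 99.7% uptime
        double failures = (1 - overall) * 1_000_000_000L; // ≈ 3 million per billion requests
        double downtimeHours = (1 - overall) * 30 * 24;   // ≈ 2.2 h per 30-day month
        System.out.printf("uptime=%.4f failures=%.0f downtime=%.2fh%n",
                overall, failures, downtimeHours);
    }
}
```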

  • The site presents two scenarios similar to what we saw in the last chapter; both are scenarios that lead to a service avalanche.

Project Preparation

  • In the OpenFeign topic we talked about the Feign-based service circuit breaker and said it was internally based on Hystrix. At that time we also looked at the structure of the POM: both the Ribbon and the Hystrix modules are built into the Eureka starter.

  • Although Hystrix is already included in that package, let’s introduce the corresponding starter to enable the configuration. This builds on the OpenFeign example project, in which we provided the two classes PaymentServiceFallbackImpl and PaymentServiceFallbackFactoryImpl as alternatives. At the time, however, we simply pointed out that OpenFeign supports both ways of setting an alternative. Today we will enable them explicitly.

```xml
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>
```

Interface testing

  • First we test the payment#createByOrder interface and check the response.

    




  • Then we test the payment#getTimeOut/{id} method.

    





* Now we use JMeter to stress the payment#getTimeOut/{id} interface. An interface where every caller has to wait 4 s will cause resource exhaustion under load, and at that point our payment#createByOrder is blocked as well.







* The default Tomcat maximum thread count in Spring Boot is 200. To protect our laptops, we set the thread count small here; that makes a full thread pool easier to reproduce. Once the pool is full, the payment#createByOrder interface is affected.
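As a sketch, the cap might look like the following in the module’s configuration file; the exact property name depends on the Spring Boot version (`server.tomcat.threads.max` in 2.3+, `server.tomcat.max-threads` in older versions):

```yaml
server:
  tomcat:
    threads:
      max: 10   # deliberately tiny so a burst of slow requests saturates the pool
```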






  • Above we were stress-testing Payment’s native interfaces. If the pressure instead hits the Order module and no fallback is configured in OpenFeign, then the Order service will be saturated by the concurrency on the payment#getTimeOut/{id} interface, causing the whole Order module to respond slowly. This is the avalanche effect. Let’s tackle avalanches in two ways.

    

Business Isolation

  • The scenario above occurs because payment#createByOrder and payment#getTimeOut/{id} both belong to the Payment service, and a Payment service instance is really one Tomcat server with a single shared thread pool. Every request that lands on Tomcat asks that pool for a thread, and the business logic can only run once a thread is acquired. Because the pool is shared within Tomcat, concurrent calls to payment#getTimeOut/{id} can drain it completely, leaving other, even unrelated, interfaces with no resources to apply for; they can only wait for resources to be released.


  • It’s like taking an elevator during rush hour: because one company’s staff all arrive at the same time, every elevator is occupied for a while and nobody else, however important, can get on.


  • We also know this situation is easy to solve: many office parks dedicate a specific elevator to special use.


  • Our approach to these problems is the same: isolation. Different interfaces get different thread pools, so they cannot create avalanches for each other.
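The bulkhead idea can be sketched in plain Java: give the slow interface its own small pool and the fast one another, and flooding the first no longer starves the second. This is a toy illustration, not project code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BulkheadDemo {
    // Flood a dedicated "slow interface" pool, then show that a separate
    // pool still serves the fast interface promptly.
    static String placeOrderWhileFlooded() throws Exception {
        ExecutorService timeoutPool = Executors.newFixedThreadPool(3);
        ExecutorService orderPool = Executors.newFixedThreadPool(3);
        try {
            for (int i = 0; i < 50; i++) {            // saturate the slow pool
                timeoutPool.submit(() -> {
                    try { Thread.sleep(4000); } catch (InterruptedException ignored) { }
                });
            }
            Future<String> order = orderPool.submit(() -> "order created");
            return order.get(1, TimeUnit.SECONDS);    // unaffected by the flood
        } finally {
            timeoutPool.shutdownNow();
            orderPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(placeOrderWhileFlooded());
    }
}
```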

    

Thread isolation

  • Remember that we set the Order module’s maximum thread count to 10 above to demonstrate concurrency. Here we call the order/getPayment/{id} interface through a test tool and watch the log output.

    




  • The interface logs the current thread wherever it is invoked. We can see the same 10 threads being reused back and forth; that is exactly what causes avalanches.

    

```java
@HystrixCommand(
        groupKey = "order-service-getPaymentInfo",
        commandKey = "getPaymentInfo",
        threadPoolKey = "orderServicePaymentInfo",
        commandProperties = {
                @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "1000")
        },
        threadPoolProperties = {
                @HystrixProperty(name = "coreSize", value = "6"),
                @HystrixProperty(name = "maxQueueSize", value = "100"),
                @HystrixProperty(name = "keepAliveTimeMinutes", value = "2"),
                @HystrixProperty(name = "queueSizeRejectionThreshold", value = "100")
        },
        fallbackMethod = "getPaymentInfoFallback"
)
@RequestMapping(value = "/getpayment/{id}", method = RequestMethod.GET)
public ResultInfo getPaymentInfo(@PathVariable("id") Long id) {
    log.info(Thread.currentThread().getName());
    return restTemplate.getForObject(PAYMENT_URL + "/payment/get/" + id, ResultInfo.class);
}

public ResultInfo getPaymentInfoFallback(@PathVariable("id") Long id) {
    log.info("already entered the fallback, " + Thread.currentThread().getName());
    return new ResultInfo();
}

@HystrixCommand(
        groupKey = "order-service-getpaymentTimeout",
        commandKey = "getpaymentTimeout",
        threadPoolKey = "orderServicegetpaymentTimeout",
        commandProperties = {
                @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "10000")
        },
        threadPoolProperties = {
                @HystrixProperty(name = "coreSize", value = "3"),
                @HystrixProperty(name = "maxQueueSize", value = "100"),
                @HystrixProperty(name = "keepAliveTimeMinutes", value = "2"),
                @HystrixProperty(name = "queueSizeRejectionThreshold", value = "100")
        }
)
@RequestMapping(value = "/getpaymentTimeout/{id}", method = RequestMethod.GET)
public ResultInfo getpaymentTimeout(@PathVariable("id") Long id) {
    log.info(Thread.currentThread().getName());
    return orderPaymentService.getTimeOut(id);
}
```
  • The effect is hard to demonstrate live, so I’ll just show you the data.

| Concurrency on getpaymentTimeout | getpaymentTimeout/{id} | getpayment/{id} |
| --- | --- | --- |
| 20 | Its 3 threads start to error after a period of time | Can respond normally, though also slowly, since CPU thread switching takes time |
| 30 | Same as above | Same as above |
| 50 | Times out, because the calls from Order put pressure on the Payment service | Is affected as well |

  • If we had loaded Hystrix onto the native Payment service, the third situation above would not occur. The reason I put it on Order is to show you the avalanche scenario. Payment’s maximum thread count is also 10, so at a concurrency of 50 it has limited throughput of its own. Thanks to Hystrix thread isolation, the order#getPayment/{id} interface has its own threads inside the Order module, but that alone is not effective, because the native service has no timeout protection of its own. This demonstration is meant to lead into the fallback scenario for solving avalanches.
  • We can set the fallback with Hystrix in the Payment service, ensuring low latency for Payment so that a slow payment call does not make order#getPayment, an otherwise normal interface, fail in the Order module.
  • One more thing: even with threads isolated through Hystrix, other interfaces still show slightly longer response times, because the CPU pays overhead for every thread context switch. This is a pain point: we cannot isolate threads at will. Which brings us to semaphore isolation.

Semaphore isolation

  • We will not demonstrate semaphore isolation here; the demonstration would not add much.
```java
@HystrixCommand(
        commandProperties = {
                @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "1000"),
                @HystrixProperty(name = HystrixPropertiesManager.EXECUTION_ISOLATION_STRATEGY, value = "SEMAPHORE"),
                @HystrixProperty(name = HystrixPropertiesManager.EXECUTION_ISOLATION_SEMAPHORE_MAX_CONCURRENT_REQUESTS, value = "6")
        },
        fallbackMethod = "getPaymentInfoFallback"
)
```
  • We configured the semaphore above with a maximum of 6, meaning that beyond a concurrency of 6, requests are rejected straight to the fallback; the semaphore imposes no queueing, and the 1-second timeout is not a waiting period.
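The rejection behavior can be mimicked in plain Java with `java.util.concurrent.Semaphore`. This is only a sketch of the idea (names illustrative), not what Hystrix does internally:

```java
import java.util.concurrent.Semaphore;

public class SemaphoreIsolationDemo {
    private final Semaphore permits;

    SemaphoreIsolationDemo(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);   // e.g. 6, as in the annotation above
    }

    String getPaymentInfo() {
        if (!permits.tryAcquire()) {                   // over the limit: no queueing,
            return "fallback";                         // degrade immediately
        }
        try {
            return "payment info";                     // the real remote call would run here
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        SemaphoreIsolationDemo demo = new SemaphoreIsolationDemo(6);
        System.out.println(demo.getPaymentInfo());     // prints "payment info"
    }
}
```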

| Measure | Advantages | Disadvantages | Async | Timeout | Circuit breaking |
| --- | --- | --- | --- | --- | --- |
| Thread isolation | One thread pool per call; no mutual interference; guarantees availability | CPU overhead from thread switching | Supported | Supported | Supported |
| Semaphore isolation | Avoids CPU switching; efficient | High-concurrency scenarios need a larger semaphore | Not supported | Not supported | Supported |

  • Besides thread isolation, semaphore isolation, and other isolation methods, we can also enhance stability through request merging, interface data caching, and similar techniques.

Service Degradation

Trigger conditions

  • The program throws an exception (other than HystrixBadRequestException)
  • The service invocation times out
  • The circuit breaker is open
  • The thread pool or semaphore capacity is exhausted

  • For our timeout interface above, either thread isolation or semaphore isolation will flatly reject further requests once its condition is met. That is too crude. This is where the fallback we mentioned above comes in.


  • Remember that when Order ran 50 concurrent calls against the timeout interface, the getPayment interface also failed; we identified then that the native Payment service was under pressure. If we add a fallback to Payment, it can still respond quickly even when resources run low. That at least keeps the order#getPayment method available.

    





* However, this configuration is only experimental. In real production we could not configure a fallback on every method; that would be silly.



Besides per-method custom fallbacks, Hystrix has a global fallback. You just annotate the class with `@DefaultProperties(defaultFallback = "globalFallback")` to set a class-wide alternative. If a request triggers degradation and its `@HystrixCommand` annotation has no fallback configured, the class-level global fallback is used; if there is no global one either, an exception is thrown.
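A minimal sketch of that class-level setup (class and endpoint names are illustrative, not from the project source); note that a `defaultFallback` method must take no parameters:

```java
// Hypothetical controller sketch: one class-wide fallback via @DefaultProperties.
@RestController
@DefaultProperties(defaultFallback = "globalFallback")
public class OrderController {

    @HystrixCommand   // no fallbackMethod here, so the class-level one is used
    @GetMapping("/order/get")
    public ResultInfo get(@RequestParam Long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be positive");
        }
        return new ResultInfo();
    }

    // defaultFallback methods take no arguments
    public ResultInfo globalFallback() {
        return new ResultInfo();
    }
}
```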






* Although `@DefaultProperties` saves us from configuring a fallback on every interface, this “global” is not truly global: we still need to configure it on each class. The author searched for a way around this and does not seem to have found one.

* But in the OpenFeign topic we talked about the service degradation OpenFeign implements in conjunction with Hystrix. Remember the `FallbackFactory` interface there? It can be understood by analogy with Spring’s `BeanFactory`: it produces the `FallBack` objects we need. We can generate a generic fallback proxy object in such a factory; the proxy derives its parameters and return type from the signature of the proxied method.

* So we can configure the factory class everywhere OpenFeign is used, which avoids writing many fallbacks. The fly in the ointment is that it still has to be specified in each place. Those interested in `FallbackFactory` can download the source code or check out the OpenFeign topic on the home page.
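As a sketch of the idea, assuming a `PaymentService` Feign interface whose methods all return `ResultInfo`, a JDK dynamic proxy can serve as a one-size-fits-all fallback (all names here are illustrative):

```java
// Hypothetical sketch: one FallbackFactory whose proxy answers every
// method of the Feign interface with an empty ResultInfo.
@Component
public class PaymentFallbackFactory implements FallbackFactory<PaymentService> {

    @Override
    public PaymentService create(Throwable cause) {
        // A JDK dynamic proxy covers all methods of the interface at once;
        // cause and method.getName() could be logged here for diagnostics.
        return (PaymentService) Proxy.newProxyInstance(
                PaymentService.class.getClassLoader(),
                new Class<?>[]{PaymentService.class},
                (proxy, method, args) -> new ResultInfo());
    }
}
```

A real implementation would inspect `method.getReturnType()` to build an appropriate default value when the interface’s methods return different types.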

Service Circuit Breaker

```java
@HystrixCommand(
        commandProperties = {
                @HystrixProperty(name = "circuitBreaker.enabled", value = "true"),                    // whether to enable the breaker
                @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "10"),       // request count threshold
                @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "10000"), // time window
                @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "60"),     // error percentage threshold
        },
        fallbackMethod = "getInfoFallback"
)
@RequestMapping(value = "/get", method = RequestMethod.GET)
public ResultInfo get(@RequestParam Long id) {
    if (id < 0) {
        int i = 1 / 0;
    }
    log.info(Thread.currentThread().getName());
    return orderPaymentService.get(id);
}

public ResultInfo getInfoFallback(@RequestParam Long id) {
    return new ResultInfo();
}
```
  • First we turn on the breaker with `circuitBreaker.enabled=true`
  • `circuitBreaker.requestVolumeThreshold` sets the number of requests in the statistics window
  • `circuitBreaker.sleepWindowInMilliseconds` sets the sliding time window: how long after tripping before the breaker tries to let a request through, commonly known as the half-open state
  • `circuitBreaker.errorThresholdPercentage` sets the error threshold that trips the breaker
  • With these settings, if the error rate of the last 10 requests reaches 60%, the breaker trips and the service stays short-circuited for 10 seconds; after 10 s it probes the latest service state.
  • Next we hit the interface `http://localhost/order/get?id=-1` 20 times with JMeter. All 20 calls fail, but not for the same reason: the initial errors come from the divide-by-zero in our code, while the later ones are thrown by the Hystrix circuit breaker: “short-circuited and fallback failed.”

  • Normally we would also configure a fallback in Hystrix; both ways of doing so were shown in the degradation section above. I leave it out here so we can see the difference.

  • The parameters configurable in `@HystrixCommand` are essentially all listed in the `HystrixPropertiesManager` class. There we can see that the circuit breaker has six parameters, basically our four configurations above plus two more.
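To make the four parameters concrete, here is a toy breaker sketch wired with the same thresholds (10 requests, 60% errors, 10-second sleep window). This is only an illustration of the state machine, not Hystrix’s actual rolling-window implementation:

```java
public class SimpleCircuitBreaker {
    private final int requestVolumeThreshold;  // circuitBreaker.requestVolumeThreshold
    private final double errorThresholdPct;    // circuitBreaker.errorThresholdPercentage
    private final long sleepWindowMillis;      // circuitBreaker.sleepWindowInMilliseconds
    private int requests, failures;
    private long openedAt = -1;                // -1 means the breaker is closed

    public SimpleCircuitBreaker(int volume, double errPct, long sleepMs) {
        this.requestVolumeThreshold = volume;
        this.errorThresholdPct = errPct;
        this.sleepWindowMillis = sleepMs;
    }

    /** Closed, or open long enough to let a half-open probe through. */
    public boolean allowRequest(long nowMillis) {
        return openedAt < 0 || nowMillis - openedAt >= sleepWindowMillis;
    }

    public void record(boolean success, long nowMillis) {
        if (openedAt >= 0) {                   // result of a half-open probe
            if (success) { openedAt = -1; requests = failures = 0; } // close
            else         { openedAt = nowMillis; }   // stay open, restart window
            return;
        }
        requests++;
        if (!success) failures++;
        if (requests >= requestVolumeThreshold
                && 100.0 * failures / requests >= errorThresholdPct) {
            openedAt = nowMillis;              // trip: short-circuit new requests
            requests = failures = 0;
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker cb = new SimpleCircuitBreaker(10, 60.0, 10_000);
        for (int i = 0; i < 10; i++) cb.record(false, 0);  // 100% errors over 10 requests
        System.out.println(cb.allowRequest(5_000));        // false: still short-circuited
        System.out.println(cb.allowRequest(10_000));       // true: half-open probe allowed
    }
}
```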

Service Rate Limiting

  • Service degradation and the two kinds of isolation we covered above are themselves strategies for implementing rate limiting.

Request Merging

  • Besides circuit breaking, degradation, and rate limiting, Hystrix also provides request merging (collapsing). As the name implies, multiple requests are merged into a single one to reduce concurrency.
  • For example, suppose querying order information hits `order/getId?id=1` and suddenly 10,000 requests come in. To ease the pressure, we collapse every 100 requests into one `order/getIds?ids=xxxxx` call, so the Payment module ends up handling only 10000/100 = 100 requests. Let’s implement request merging through code configuration.

HystrixCollapser

```java
@Target({ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@Documented
public @interface HystrixCollapser {
    String collapserKey() default "";

    String batchMethod();

    Scope scope() default Scope.REQUEST;

    HystrixProperty[] collapserProperties() default {};
}
```

| Attribute | Meaning |
| --- | --- |
| collapserKey | Unique identifier |
| batchMethod | The method that handles the merged requests |
| scope | Scope: [REQUEST, GLOBAL]. REQUEST merges only requests within the same user request context; GLOBAL merges requests from any thread into the global batch |
| collapserProperties | Related configuration parameters |

  • All property names in Hystrix are listed in `HystrixPropertiesManager.java`. There we find only two collapser-related configurations, representing the maximum number of requests per batch and the timer interval, respectively.
```java
@HystrixCollapser(
        scope = com.netflix.hystrix.HystrixCollapser.Scope.GLOBAL,
        batchMethod = "getIds",
        collapserProperties = {
                @HystrixProperty(name = HystrixPropertiesManager.MAX_REQUESTS_IN_BATCH, value = "3"),
                @HystrixProperty(name = HystrixPropertiesManager.TIMER_DELAY_IN_MILLISECONDS, value = "10")
        }
)
@RequestMapping(value = "/getId", method = RequestMethod.GET)
public ResultInfo getId(@RequestParam Long id) {
    if (id < 0) {
        int i = 1 / 0;
    }
    log.info(Thread.currentThread().getName());
    return null;
}

@HystrixCommand
public List<ResultInfo> getIds(List<Long> ids) {
    System.out.println(ids.size() + "@@@@@@@@@");
    return orderPaymentService.getIds(ids);
}
```
  • Above we configured getId to delegate to the batch method getIds: within each 10 ms window, at most 3 requests are merged into one call. getIds then has the Payment service query them individually and return a list of ResultInfo objects.

  • Using JMeter to stress the getId interface, the maximum length of ids in the log is 3, verifying our getId configuration above. This ensures interfaces are merged under high concurrency, reducing the TPS seen downstream.


  • Above we merged requests by annotating the request method. Internally, Hystrix implements this through HystrixCommand as well.

    

Workflow

  • The official site gives a flow diagram accompanied by nine process steps. Let’s translate them.


  • ① Create a HystrixCommand or HystrixObservableCommand object.

* HystrixCommand: used when depending on a single service
* HystrixObservableCommand: used when depending on multiple services

  • ② A HystrixCommand executes via execute or queue; a HystrixObservableCommand executes via observe or toObservable.

    

| Method | Role |
| --- | --- |
| execute | Executes synchronously; returns the result object or throws an exception |
| queue | Executes asynchronously; returns a Future object |
| observe | Returns a hot Observable |
| toObservable | Returns a cold Observable |

  • ③ Check whether the request cache is enabled and whether it is hit; on a hit, the cached response is returned immediately.
  • ④ Check whether the circuit breaker is open; if open, go straight to the fallback (degrade); if closed, let the request through.
  • ⑤ Check whether the thread pool or semaphore has capacity; if resources are insufficient, fall back; otherwise, proceed.
  • ⑥ Execute the run or construct method. These two methods are Hystrix natives: a plain-Java use of Hystrix would implement its logic in them, but Spring Cloud has encapsulated this for us, so we won’t examine them here. If execution fails or times out, fall back; meanwhile, metrics are reported to the monitoring center.
  • ⑦ Compute the circuit-breaker metrics to decide whether to attempt to close the circuit. The data collected here can be viewed in the Hystrix Dashboard via hystrix.stream, which is convenient for locating the health status of an interface.
  • ⑧ The flow chart also shows that ④, ⑤, and ⑥ all point to the fallback, i.e. service degradation. Degradation is Hystrix’s core business.
  • ⑨ Return the response.

HystrixDashboard

  • Besides service circuit breaking, degradation, and rate limiting, an important Hystrix feature is real-time monitoring, which collects interface request information into a report.


  • Installing the Hystrix dashboard is also simple: just add the actuator and `hystrix-dashboard` modules.

    

```xml
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix-dashboard</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```
  • Annotate the startup class with `@EnableHystrixDashboard` to bring in the dashboard. No further development is needed; like Eureka, it only takes a simple annotation.
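A minimal sketch of the dashboard’s startup class (the class name is illustrative), assuming the two dependencies above are on the classpath:

```java
@SpringBootApplication
@EnableHystrixDashboard
public class HystrixDashboardApplication {
    public static void main(String[] args) {
        SpringApplication.run(HystrixDashboardApplication.class, args);
    }
}
```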

  • With that, the dashboard is built. The dashboard is used to monitor Hystrix request processing, so we also need to expose the endpoint in the services whose Hystrix requests we want to watch.


  • Add the following configuration to every module that uses Hystrix commands. I add it to the Order module.

    

```java
@Component
public class HystrixConfig {
    @Bean
    public ServletRegistrationBean getServlet() {
        HystrixMetricsStreamServlet streamServlet = new HystrixMetricsStreamServlet();
        ServletRegistrationBean registrationBean = new ServletRegistrationBean(streamServlet);
        registrationBean.setLoadOnStartup(1);
        // Note: with this mapping, the stream's final address is localhost:port/hystrix.stream.
        // In newer versions configured through the configuration file, the actuator prefix
        // is needed instead: localhost:port/actuator/hystrix.stream
        registrationBean.addUrlMappings("/hystrix.stream");
        registrationBean.setName("HystrixMetricsStreamServlet");
        return registrationBean;
    }
}
```
  • Then we access localhost/hystrix.stream on the Order module and see `ping` responses, indicating that the Order module’s monitoring is installed successfully. Of course, Order also needs the actuator module.
  • Now let’s use JMeter to exercise our circuit-breaking, degradation, and rate-limiting interfaces, and watch each interface’s status through the dashboard.

  • The animation above makes our service look very busy. Think of e-commerce: watching the line chart of every interface is like watching a heartbeat. Too high and you worry; too low and there is no sense of achievement. Let’s look at the dashboard metrics in detail.

  • Let’s look at the status of each interface while our service is running.

Aggregated Monitoring

  • Above we used the new hystrix-dashboard module to monitor our Order module. In practice, though, Hystrix would never be configured only in Order.
  • We configured only Order above for demonstration purposes. Now we configure Hystrix in Payment as well; then we would have to switch back and forth between Order’s and Payment’s monitoring data in the dashboard.
  • That is where aggregated monitoring comes in. Before aggregating, let’s introduce Hystrix into Payment. Note that we registered hystrix.stream through a bean, so no access-prefix change is needed.

New module: hystrix-turbine

 pom

```xml
<!-- new: hystrix turbine -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-turbine</artifactId>
</dependency>
```
  • The main addition is the turbine coordinate; the others are the hystrix, dashboard, and related modules. For details, see the source code at the end.

 yml

```yaml
spring:
  application:
    name: cloud-hystrix-turbine

eureka:
  client:
    register-with-eureka: true
    fetch-registry: true
    service-url:
      defaultZone: http://localhost:7001/eureka
  instance:
    prefer-ip-address: true

turbine:
  app-config: cloud-order-service,cloud-payment-service
  cluster-name-expression: "'default'"
  # hystrix.stream is registered here as a plain URL suffix; if it were exposed
  # through the actuator, this would be actuator/hystrix.stream instead
  instanceUrlSuffix: hystrix.stream
```

Startup class

Add the `@EnableTurbine` annotation to the startup class.
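A minimal sketch of the turbine startup class (the class name is illustrative):

```java
@SpringBootApplication
@EnableTurbine
public class TurbineApplication {
    public static void main(String[] args) {
        SpringApplication.run(TurbineApplication.class, args);
    }
}
```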

  

Source: gitee.com/zxhTom/clou…

Author: fireworks all over 13141

Link: blog.csdn.net/u013132051/…

Source: CSDN
