1. Overview
1.1 What is Hystrix
Hystrix is an open-source library for handling latency and fault tolerance in distributed systems. In a distributed system, many dependencies will inevitably fail at some point, for example by timing out or throwing exceptions. When one dependency has a problem, Hystrix keeps the whole service from failing and prevents cascading failures, which makes the distributed system more resilient. A circuit breaker is itself a switching device: when Hystrix detects that a service is failing, the breaker trips (opens) and cuts off calls to that service. Instead of blocking the caller for a long time or throwing an exception back at it, Hystrix returns an expected fallback response that the caller can handle. This keeps the caller's threads from being tied up unnecessarily for long periods, stops failures from spreading, and prevents avalanches across the distributed system. In short, Hystrix is a defensive mechanism for the system.
1.2 What Hystrix can do
- Service degradation (fallback)
- Circuit breaking (service fusing)
- Near real-time monitoring
1.3 Hystrix documentation
https://github.com/Netflix/Hystrix/wiki/How-To-Use
1.4 Official announcement: Hystrix is no longer in active development and is in maintenance mode
2. Key Hystrix concepts
Service degradation (fallback): when the server is busy, do not make the client wait; immediately return a friendly prompt such as "The server is busy, please try again later".
What triggers degradation?
- An exception while the program runs
- A timeout
- A tripped circuit breaker (circuit breaking triggers degradation)
- A full thread pool or semaphore
Circuit breaking (service fusing): like a fuse that blows under overload, once the service reaches its maximum capacity, further access is rejected outright, the degradation method is called, and a friendly prompt is returned. The sequence is: degrade, then break the circuit, then restore the call link.
Rate limiting (flow control): in high-concurrency scenarios such as flash sales, requests must not all rush in at once; they queue up and are admitted in order, N per second.
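The last degradation trigger above, a full thread pool or semaphore, can be illustrated in plain Java without Hystrix. The sketch below assumes only `java.util.concurrent.Semaphore`: it caps the number of concurrent calls, and when no permit is free the call degrades immediately instead of queueing. A failing primary call also degrades. `SemaphoreBulkhead` is a hypothetical name. Hystrix has its own semaphore isolation strategy, and this is not its implementation.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper: degrade when too many calls are in flight or when the call fails.
final class SemaphoreBulkhead {
    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T execute(Supplier<T> primary, Supplier<T> fallback) {
        // Non-blocking: if every permit is taken, do not wait; return the fallback right away.
        if (!permits.tryAcquire()) {
            return fallback.get();
        }
        try {
            return primary.get();
        } catch (RuntimeException e) {
            // The primary call threw: degrade instead of propagating the exception.
            return fallback.get();
        } finally {
            permits.release();
        }
    }
}
```

With a single permit, a call nested inside another call is rejected and gets its fallback, because the outer call still holds the only permit.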
3. Hystrix example
3.1 Setup
3.1.1 Create the module cloud-provider-hystrix-payment8001
3.1.2 Add the Hystrix dependency to the POM file
<!-- hystrix -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>
3.1.3 application.yml
server:
  port: 8001
spring:
  application:
    name: cloud-provider-hystrix-payment
eureka:
  client: # register this client in the Eureka service list
    register-with-eureka: true # whether to register itself with Eureka Server; default true
    fetch-registry: true # whether to fetch existing registrations from Eureka Server; default true. Must be true in a cluster so Ribbon can load-balance
    service-url:
      defaultZone: http://localhost:7001/eureka
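3.1.4 Main startup class
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;

@SpringBootApplication
@EnableEurekaClient
public class PaymentHystrixMain8001 {
    public static void main(String[] args) {
        SpringApplication.run(PaymentHystrixMain8001.class, args);
    }
}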
3.1.5 Business classes
PaymentService
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    // Normal access: returns immediately
    public String paymentInfo_OK(Integer id) {
        return "Thread pool :" + Thread.currentThread().getName() + "paymentInfo_OK,id:" + id + "\t";
    }

    // Simulates a slow call: sleeps 3 seconds before returning
    public String paymentInfo_TimeOut(Integer id) {
        try {
            Thread.sleep(3000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        return "Thread pool :" + Thread.currentThread().getName() + "paymentInfo_TimeOut,id:" + id + "\t";
    }
}
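Controller
import javax.annotation.Resource;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
@Slf4j
public class PaymentController {
    @Resource
    private PaymentService paymentService;

    @Value("${server.port}")
    private String serverPort;

    @GetMapping("/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id) {
        String result = paymentService.paymentInfo_OK(id);
        log.info("****result:" + result);
        return result;
    }

    @GetMapping("/payment/hystrix/timeout/{id}")
    public String paymentInfo_Timeout(@PathVariable("id") Integer id) {
        String result = paymentService.paymentInfo_TimeOut(id);
        log.info("****result:" + result);
        return result;
    }
}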
3.1.6 Basic test
- Start eureka7001
- Start cloud-provider-hystrix-payment8001
- Call the method that returns immediately:
http://localhost:8001/payment/hystrix/ok/2
- Call the method that takes 3 seconds per request:
http://localhost:8001/payment/hystrix/timeout/2
Both endpoints work as expected.
3.2 High-concurrency test
3.2.1 JMeter load test
Use JMeter to send 20,000 requests to the paymentInfo_TimeOut endpoint.
Result:
Both endpoints now keep spinning, including the one that should return immediately.
Why do they hang?
Tomcat's worker threads are all occupied by the slow requests, so no threads are left to process other requests.
Conclusion of the JMeter load test
So far only the provider 8001 is being tested on its own. If consumer 80 also calls it now, the consumer can only wait, and the provider 8001 is eventually dragged down.
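The thread exhaustion described above can be reproduced in miniature. The sketch uses a plain fixed-size thread pool as a stand-in for Tomcat's worker pool, which is a simplifying assumption because Tomcat's connector is more involved. Slow requests keep every worker busy, so a request that needs no work at all still waits until a worker frees up. The class and method names are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: a bounded worker pool shared by slow and fast requests.
final class PoolStarvationDemo {
    // Returns how many milliseconds a trivial request waited for its result
    // after slowRequests tasks, each sleeping slowMillis, were submitted first.
    static long fastRequestLatencyMillis(int poolSize, int slowRequests, long slowMillis) {
        ExecutorService workers = Executors.newFixedThreadPool(poolSize);
        try {
            for (int i = 0; i < slowRequests; i++) {
                // Stand-in for paymentInfo_TimeOut: holds a worker thread while sleeping.
                workers.submit(() -> {
                    try {
                        Thread.sleep(slowMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            long start = System.nanoTime();
            // Stand-in for paymentInfo_OK: no real work, but it still needs a free worker.
            Future<String> fast = workers.submit(() -> "ok");
            fast.get();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            workers.shutdownNow(); // interrupt any slow tasks that are still sleeping
        }
    }
}
```

With two workers and two 300 ms slow requests, the fast request waits roughly 300 ms. With only one slow request, a second worker is still free and the fast request returns almost at once.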
3.2.2 Create cloud-consumer-feign-hystrix-order80
Add the Hystrix dependency to the POM file
<!-- hystrix -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>
application.yml
server:
  port: 80
eureka:
  client: # register this client in the Eureka service list
    register-with-eureka: false # whether to register itself with Eureka Server; default true
    service-url:
      defaultZone: http://localhost:7001/eureka
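Main startup class
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.openfeign.EnableFeignClients;

@SpringBootApplication
@EnableFeignClients
public class OrderHystrixMain80 {
    public static void main(String[] args) {
        SpringApplication.run(OrderHystrixMain80.class, args);
    }
}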
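Service interface (Feign client)
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

@Component
@FeignClient(value = "CLOUD-PROVIDER-HYSTRIX-PAYMENT")
public interface PaymentHystrixService {
    @GetMapping("/payment/hystrix/ok/{id}")
    String paymentInfo_OK(@PathVariable("id") Integer id);

    @GetMapping("/payment/hystrix/timeout/{id}")
    String paymentInfo_Timeout(@PathVariable("id") Integer id);
}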
Controller
import javax.annotation.Resource;

import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
@Slf4j
public class OrderHystrixController {
    @Resource
    private PaymentHystrixService paymentHystrixService;

    @GetMapping("/consumer/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id) {
        return paymentHystrixService.paymentInfo_OK(id);
    }

    @GetMapping("/consumer/payment/hystrix/timeout/{id}")
    public String paymentInfo_Timeout(@PathVariable("id") Integer id) {
        return paymentHystrixService.paymentInfo_Timeout(id);
    }
}
- Basic test: http://localhost/consumer/payment/hystrix/ok/3
- High-concurrency test:
Use JMeter to send 20,000 requests to 8001.
Then call the ok endpoint of 8001 through consumer 80.
The consumer responds slowly, and timeout errors occur occasionally.
3.3 Symptoms and cause
Other endpoints on the same 8001 service are dragged down because its Tomcat thread pool is exhausted. When 80 calls 8001, the client responds slowly and keeps spinning.
3.4 Conclusion
It is precisely because of failures and poor performance like these that degradation, fault-tolerance, and rate-limiting techniques exist.
3.5 How to solve the problem: requirements
- Timeouts: the server becomes slow and the caller is left waiting
- Errors: the service is down or the program throws an error
Solutions:
- If the provider (8001) times out, the caller (80) must not wait indefinitely; it must degrade.
- If the provider (8001) is down, the caller (80) must not keep waiting; it must degrade.
- If the provider (8001) is fine but the caller (80) has its own fault or a stricter requirement (for example, its acceptable wait time is shorter than the provider's processing time), the caller handles degradation on its own side.
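3.6 Service degradation
3.6.1 Provider-side (8001) fallback
If a service method fails with an exception or times out, Hystrix automatically calls the method named in the fallbackMethod attribute of @HystrixCommand. That method must be in the same class and take the same parameters as the original method.
Enable it on the business class (methods of PaymentService):
import java.util.concurrent.TimeUnit;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
// Fall back if the call throws or takes longer than 3000 ms
@HystrixCommand(fallbackMethod = "paymentInfo_TimeOutHandler", commandProperties = {
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "3000")})
public String paymentInfo_TimeOut(Integer id) {
    // Deliberate failure 1: ArithmeticException. Comment this line out to test the timeout instead.
    int age = 10 / 0;
    // Deliberate failure 2: sleeping 5 s exceeds the 3 s timeout
    int second = 5;
    try { TimeUnit.SECONDS.sleep(second); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    return "Thread pool :" + Thread.currentThread().getName() + "paymentInfo_TimeOut,id:" + id + "\t";
}
// Fallback: same parameter list as paymentInfo_TimeOut
public String paymentInfo_TimeOutHandler(Integer id) {
    return "Thread pool :" + Thread.currentThread().getName() + "System busy please try again later id:" + id + "\t" + "Ha ha";
}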
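Activate it by adding @EnableCircuitBreaker to the main startup class
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;

@SpringBootApplication
@EnableEurekaClient
@EnableCircuitBreaker
public class PaymentHystrixMain8001 {
    public static void main(String[] args) {
        SpringApplication.run(PaymentHystrixMain8001.class, args);
    }
}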
The code above deliberately creates two kinds of failure:
- int age = 10 / 0; throws an arithmetic exception
- the 5-second sleep exceeds the 3-second timeout
In both cases the service is treated as unavailable, and the degradation method paymentInfo_TimeOutHandler handles the call.
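The fallbackMethod behaviour described above can be sketched without Hystrix, using only `java.util.concurrent`. In this sketch the call runs on a separate worker pool, the caller waits at most `timeoutMillis`, and any exception or timeout is handed to a fallback function. `TimeoutFallback` and `execute` are hypothetical names. Hystrix's real command execution adds metrics, circuit breaking, and configurable thread pools on top of this basic idea.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical plain-Java analogue of @HystrixCommand(fallbackMethod = ...) with a timeout.
final class TimeoutFallback {
    // Separate worker pool, loosely like thread isolation; daemon threads let the JVM exit.
    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(4, runnable -> {
        Thread thread = new Thread(runnable, "command-worker");
        thread.setDaemon(true);
        return thread;
    });

    static <T> T execute(Supplier<T> primary, long timeoutMillis, Function<Throwable, T> fallback) {
        Callable<T> task = primary::get;
        Future<T> future = WORKERS.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow call so its worker is freed
            return fallback.apply(e);
        } catch (ExecutionException e) {
            return fallback.apply(e.getCause()); // the primary threw, e.g. ArithmeticException
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.apply(e);
        }
    }
}
```

The fallback receives the cause, so it can tell an arithmetic error from a timeout, much as a Hystrix fallback method may optionally accept a Throwable.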
3.6.2 Consumer-side (80) fallback
(1) Enable Hystrix support for Feign in application.yml
server:
  port: 80
eureka:
  client: # register this client in the Eureka service list
    register-with-eureka: false # whether to register itself with Eureka Server; default true
    service-url:
      defaultZone: http://localhost:7001/eureka
feign:
  hystrix:
    enabled: true
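(2) Main startup class
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.hystrix.EnableHystrix;
import org.springframework.cloud.openfeign.EnableFeignClients;

@SpringBootApplication
@EnableFeignClients
@EnableHystrix
public class OrderHystrixMain80 {
    public static void main(String[] args) {
        SpringApplication.run(OrderHystrixMain80.class, args);
    }
}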
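(3) Business class
import javax.annotation.Resource;

import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
@Slf4j
public class OrderHystrixController {
    @Resource
    private PaymentHystrixService paymentHystrixService;

    @GetMapping("/consumer/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id) {
        return paymentHystrixService.paymentInfo_OK(id);
    }

    // The consumer waits at most 1500 ms, then falls back
    @GetMapping("/consumer/payment/hystrix/timeout/{id}")
    @HystrixCommand(fallbackMethod = "paymentInfo_TimeOutFallbackMethod", commandProperties = {
            @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "1500")})
    public String paymentInfo_Timeout(@PathVariable("id") Integer id) {
        return paymentHystrixService.paymentInfo_Timeout(id);
    }

    public String paymentInfo_TimeOutFallbackMethod(Integer id) {
        return "This is consumer 80. The payment system is busy, please try again later.";
    }
}
Test the endpoint.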
3.6.3 Remaining problems
3.6.3.1 Every business method needs its own fallback method, which bloats the code
How do we solve it?
Use @DefaultProperties(defaultFallback = "")
Configuring one fallback method for every business method (1:1) is technically possible but impractical. Instead, annotate the class with @DefaultProperties(defaultFallback = "") to route the common cases to one global fallback method, and give only the special methods their own fallbackMethod. Methods that rely on the global fallback still need a plain @HystrixCommand annotation, and the global fallback method takes no parameters (apart from an optional Throwable). Keeping the common fallback separate from the dedicated ones avoids code bloat and keeps the amount of code reasonable.
3.6.3.2 Fallback code is mixed with business logic, which makes the code confusing
How do we solve it?
Decouple them by adding a class that implements the interface defined by the Feign client and handles degradation for each of its methods.
Steps:
- Based on the existing PaymentHystrixService interface in cloud-consumer-feign-hystrix-order80, create a new class PaymentFallbackService that implements PaymentHystrixService and handles failures for all methods of the interface in one place.
- Enable Hystrix support for Feign in application.yml
feign:
  hystrix:
    enabled: true
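- The PaymentFallbackService class
import org.springframework.stereotype.Component;

@Component
public class PaymentFallbackService implements PaymentHystrixService {
    @Override
    public String paymentInfo_OK(Integer id) {
        return "---PaymentFallbackService paymentInfo_OK";
    }

    @Override
    public String paymentInfo_Timeout(Integer id) {
        return "---PaymentFallbackService paymentInfo_Timeout";
    }
}
- Register the fallback class on the Feign client by annotating PaymentHystrixService with @FeignClient(value = "CLOUD-PROVIDER-HYSTRIX-PAYMENT", fallback = PaymentFallbackService.class)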
- Test
Start eureka7001
Start PaymentHystrixMain8001
Call http://localhost/consumer/payment/hystrix/ok/2 normally
Deliberately shut down microservice 8001, then call the same address again
The provider is now down, but because we configured degradation, the client gets a fallback message when the server is unavailable. It does not hang and exhaust the consumer's threads.
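The Feign fallback class works because PaymentFallbackService implements the same interface as the Feign client, so every method has a counterpart with the same signature. The sketch below imitates that routing with a JDK dynamic proxy. Calls go to the primary implementation, and if one throws, the same method is invoked with the same arguments on the fallback object. `PaymentClient` and `FallbackProxy` are hypothetical. Feign and Hystrix do this wiring internally in their own way, and they also fall back on timeouts and on an open circuit, which this sketch does not model.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical client interface mirroring PaymentHystrixService.
interface PaymentClient {
    String paymentInfoOk(Integer id);

    String paymentInfoTimeout(Integer id);
}

final class FallbackProxy {
    // Returns a proxy that calls `primary` and, if a call throws,
    // answers with the same method and arguments on `fallback`.
    @SuppressWarnings("unchecked")
    static <T> T withFallback(Class<T> api, T primary, T fallback) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(primary, args);
            } catch (InvocationTargetException e) {
                // The real call failed: route it to the fallback implementation.
                return method.invoke(fallback, args);
            }
        };
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api}, handler);
    }
}
```

Because the fallback is a separate class, the business code never mentions degradation, which is the decoupling this section is after.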
3.7 Circuit breaking (service fusing)
3.7.1 What is a circuit breaker
The circuit breaker is a call-chain protection mechanism for microservices that deals with the avalanche effect. When a microservice on a fan-out call chain becomes unavailable or responds too slowly, the service is degraded: calls to that node are cut off and an error response is returned quickly. In Spring Cloud, the circuit breaker is implemented by Hystrix. Hystrix monitors the calls between microservices. When failures reach a threshold, the circuit breaker opens. By default the threshold is at least 20 requests within a 10-second window, with an error rate of at least 50%. The annotation for the circuit breaker is @HystrixCommand.
3.7.2 Hands-on
Modify cloud-provider-hystrix-payment8001
PaymentService
import cn.hutool.core.util.IdUtil; // Hutool utility library
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    // ===== Circuit breaker
    @HystrixCommand(fallbackMethod = "paymentCircuitBreaker_fallback", commandProperties = {
            @HystrixProperty(name = "circuitBreaker.enabled", value = "true"), // enable the circuit breaker
            @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "10"), // minimum requests in the window before it can trip
            @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "10000"), // how long it stays open before a trial request
            @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "60")}) // trip when at least 60% of requests fail
    public String paymentCircuitBreaker(Integer id) {
        if (id < 0) {
            throw new RuntimeException("**** ID cannot be negative");
        }
        String serialNumber = IdUtil.simpleUUID();
        return Thread.currentThread().getName() + "\t" + "Call successful, serial number :" + serialNumber;
    }

    public String paymentCircuitBreaker_fallback(Integer id) {
        return "Id cannot be negative, please try again later, id:" + id;
    }
}
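PaymentController
import javax.annotation.Resource;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
@Slf4j
public class PaymentController {
    @Resource
    private PaymentService paymentService;

    @Value("${server.port}")
    private String serverPort;

    // ===== Circuit breaker
    @GetMapping("/payment/circuit/{id}")
    public String paymentCircuitBreaker(@PathVariable("id") Integer id) {
        String result = paymentService.paymentCircuitBreaker(id);
        log.info("*****result:" + result);
        return result;
    }
}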
Test
Correct request: http://localhost:8001/payment/circuit/2
Bad request: http://localhost:8001/payment/circuit/-10
Send the bad request many times, then switch back to the correct one. At first even the correct address returns the fallback, because the breaker is open. After the sleep window it recovers gradually, and correct requests succeed again.
3.7.3 Summary
Circuit breaker states
- Open: requests no longer call the current service. The breaker keeps an internal clock, usually set to roughly the MTTR (mean time to repair). Once it has been open for that long, it enters the half-open state.
- Closed: the breaker does not interrupt calls to the service.
- Half-open: some requests are let through to the service according to the rules. If they succeed and meet the rules, the service is considered healthy again and the breaker closes.
When does the circuit breaker take effect?
Three important parameters are involved: the snapshot time window, the request volume threshold, and the error percentage threshold.
1. Snapshot time window: to decide whether to open, the breaker collects request and error statistics over a time window, by default the most recent 10 seconds.
2. Request volume threshold: within the snapshot time window, the total number of requests must reach this threshold before the breaker is eligible to open. The default is 20. If there are only 19 calls in 10 seconds, the breaker will not open even if all of them fail.
3. Error percentage threshold: once the request volume threshold is reached, the breaker opens if the share of failed calls reaches this percentage. For example, suppose 15 of 30 calls in the window time out or throw exceptions. The error rate is then 50%, which reaches the default threshold of 50%, so the breaker opens.
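The two thresholds combine into one simple check. The hypothetical helper below encodes the rule as this section describes it: first a minimum request volume, then an error percentage at or above the threshold, computed as a truncated integer percentage. It is a sketch of the rule, not Hystrix source code.

```java
// Hypothetical helper encoding the trip rule described above.
final class TripRule {
    static boolean shouldOpen(int totalRequests, int failedRequests,
                              int requestVolumeThreshold, int errorThresholdPercentage) {
        // 1. Too little traffic in the window: never trip, even if every call failed.
        if (totalRequests == 0 || totalRequests < requestVolumeThreshold) {
            return false;
        }
        // 2. Integer error percentage of the calls in the window (truncated toward zero).
        int errorPercentage = failedRequests * 100 / totalRequests;
        // 3. Trip once the error percentage reaches the threshold.
        return errorPercentage >= errorThresholdPercentage;
    }
}
```

With the defaults (20 requests, 50%), `shouldOpen(30, 15, 20, 50)` is true and `shouldOpen(19, 19, 20, 50)` is false. With the settings from the 3.7.2 demo (10 requests, 60%), `shouldOpen(10, 6, 10, 60)` is true.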
When does the breaker open or close?
The breaker opens when both of these conditions are met:
- the request volume threshold is reached (by default, at least 20 requests within 10 seconds);
- the failure rate threshold is reached (by default, at least 50% of those requests fail).
While the breaker is open, no requests are forwarded to the service.
After a while (the sleep window, 5 seconds by default) the breaker becomes half-open and lets one request through. If that request succeeds, the breaker closes. If it fails, the breaker stays open.
What happens after the breaker opens?
1. Subsequent calls no longer invoke the main logic; the fallback is called directly. The breaker detects the fault automatically and switches from the main logic to the fallback logic, which reduces response latency.
2. How is the main logic restored?
Hystrix recovers automatically.
When the breaker opens and cuts off the main logic, Hystrix starts a sleep time window, and during this window the fallback logic temporarily takes the place of the main logic. When the sleep window expires, the breaker enters the half-open state and lets one request through to the original main logic. If that request succeeds, the breaker closes and the main logic is restored. If it still fails, the breaker opens again and the sleep time window restarts.
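The closed, open, and half-open transitions described above can be written as a small state machine. The sketch below is a hypothetical, single-lock illustration in plain Java. It assumes an injectable millisecond clock so that the sleep window can be tested without waiting. It is not how Hystrix's HystrixCircuitBreaker works internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical breaker illustrating CLOSED -> OPEN -> HALF_OPEN -> CLOSED or OPEN.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private static final class Outcome {
        final long at;
        final boolean failed;
        Outcome(long at, boolean failed) { this.at = at; this.failed = failed; }
    }

    private final int requestVolumeThreshold;
    private final int errorThresholdPercentage;
    private final long sleepWindowMillis;
    private final long rollingWindowMillis;
    private final LongSupplier clock; // injected so tests can control time
    private final Deque<Outcome> window = new ArrayDeque<>();
    private State state = State.CLOSED;
    private long openedAt;

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercentage,
                         long sleepWindowMillis, long rollingWindowMillis, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMillis = sleepWindowMillis;
        this.rollingWindowMillis = rollingWindowMillis;
        this.clock = clock;
    }

    synchronized State state() { return state; }

    synchronized <T> T execute(Supplier<T> primary, Supplier<T> fallback) {
        long now = clock.getAsLong();
        if (state == State.OPEN) {
            if (now - openedAt < sleepWindowMillis) {
                return fallback.get(); // short-circuit: the main logic is not called at all
            }
            state = State.HALF_OPEN; // sleep window expired: let this one trial request through
        }
        try {
            T result = primary.get();
            onOutcome(now, false);
            return result;
        } catch (RuntimeException e) {
            onOutcome(now, true);
            return fallback.get();
        }
    }

    private void onOutcome(long now, boolean failed) {
        if (state == State.HALF_OPEN) {
            // The trial request decides: failure reopens, success closes and resets statistics.
            if (failed) {
                trip(now);
            } else {
                state = State.CLOSED;
                window.clear();
            }
            return;
        }
        window.addLast(new Outcome(now, failed));
        // Drop outcomes that have fallen out of the rolling window.
        while (!window.isEmpty() && now - window.peekFirst().at >= rollingWindowMillis) {
            window.removeFirst();
        }
        int total = window.size();
        int failures = 0;
        for (Outcome outcome : window) {
            if (outcome.failed) failures++;
        }
        if (total > 0 && total >= requestVolumeThreshold
                && failures * 100 / total >= errorThresholdPercentage) {
            trip(now);
        }
    }

    private void trip(long now) {
        state = State.OPEN;
        openedAt = now;
    }
}
```

Making execute synchronized keeps the sketch simple, and it guarantees that only one trial request runs while half-open. The cost is that all calls are serialized, so a production breaker would avoid holding a lock around the remote call.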
3.8 Rate limiting
Covered later with Alibaba Sentinel.
4. How Hystrix works
Documentation: https://github.com/Netflix/Hystrix/wiki/How-it-Works
Official website flow chart:
5. Service monitoring with Hystrix Dashboard
5.1 Overview
Besides isolating calls to dependent services, Hystrix also provides near real-time call monitoring (Hystrix Dashboard). Hystrix continuously records execution information for every request made through it. It presents this information as statistical reports and graphs showing, for example, how many requests run per second and how many succeed or fail. Netflix implements this monitoring through the hystrix-metrics-event-stream project. Spring Cloud integrates Hystrix Dashboard to turn the monitoring data into a visual interface.
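Statistics like these are typically kept in a rolling window split into buckets, so that old data drops out as time moves on. Hystrix's defaults are 10 buckets covering 10 seconds. The hypothetical `RollingCounter` below sketches that idea in plain Java with an explicit clock parameter. It is an illustration, not Hystrix's own HystrixRollingNumber implementation.

```java
import java.util.Arrays;

// Hypothetical bucketed counter for rolling success/failure statistics.
final class RollingCounter {
    private final long bucketMillis;
    private final long[] bucketStart; // start time of the period each slot currently holds
    private final int[] successes;
    private final int[] failures;

    RollingCounter(int buckets, long bucketMillis) {
        this.bucketMillis = bucketMillis;
        this.bucketStart = new long[buckets];
        this.successes = new int[buckets];
        this.failures = new int[buckets];
        Arrays.fill(bucketStart, Long.MIN_VALUE); // no slot holds data yet
    }

    // Finds the slot for time `now` (assumed >= 0), resetting it if it holds an older period.
    private int slotFor(long now) {
        long start = now - now % bucketMillis;
        int slot = (int) ((start / bucketMillis) % bucketStart.length);
        if (bucketStart[slot] != start) {
            bucketStart[slot] = start;
            successes[slot] = 0;
            failures[slot] = 0;
        }
        return slot;
    }

    synchronized void record(long now, boolean success) {
        int slot = slotFor(now);
        if (success) {
            successes[slot]++;
        } else {
            failures[slot]++;
        }
    }

    // Returns {successes, failures} summed over the buckets inside the window ending at `now`.
    synchronized int[] totals(long now) {
        long currentStart = now - now % bucketMillis;
        long windowStart = currentStart - (bucketStart.length - 1) * bucketMillis;
        int s = 0;
        int f = 0;
        for (int i = 0; i < bucketStart.length; i++) {
            if (bucketStart[i] >= windowStart && bucketStart[i] <= currentStart) {
                s += successes[i];
                f += failures[i];
            }
        }
        return new int[] {s, f};
    }
}
```

Reusing a fixed array of slots keeps memory constant no matter how much traffic arrives. A slot is reset only when time wraps around to it again.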
5.2 Dashboard (9001)
Create cloud-consumer-hystrix-dashboard9001
Add the dependencies to the POM file
<dependencies>
    <!-- hystrix dashboard -->
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-netflix-hystrix-dashboard</artifactId>
        <version>2.1.5.RELEASE</version>
    </dependency>
    <!-- web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <!-- devtools -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-devtools</artifactId>
        <scope>runtime</scope>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>
application.yml
server:
  port: 9001
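Main startup class with the new annotation @EnableHystrixDashboard
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.hystrix.dashboard.EnableHystrixDashboard;

@SpringBootApplication
@EnableHystrixDashboard
public class HystrixDashboardMain9001 {
    public static void main(String[] args) {
        SpringApplication.run(HystrixDashboardMain9001.class, args);
    }
}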
Every provider microservice to be monitored (8001/8002/8003) needs the actuator dependency:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
Start cloud-consumer-hystrix-dashboard9001. This microservice will monitor microservice 8001.
5.3 Circuit breaker demo (monitoring with Hystrix Dashboard)
5.3.1 Modify cloud-provider-hystrix-payment8001
Note: with newer Hystrix versions you must register the monitoring path in the main startup class PaymentHystrixMain8001. Otherwise the dashboard reports "Unable to connect to Command Metric Stream".
import com.netflix.hystrix.contrib.metrics.eventstream.HystrixMetricsStreamServlet;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.web.servlet.ServletRegistrationBean;
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
@EnableEurekaClient
@EnableCircuitBreaker
public class PaymentHystrixMain8001 {
    public static void main(String[] args) {
        SpringApplication.run(PaymentHystrixMain8001.class, args);
    }

    /**
     * This configuration exists only for service monitoring and has nothing to do with fault tolerance itself.
     * Because Spring Boot does not serve the stream at "/hystrix.stream" by default,
     * register the servlet below in your own project.
     */
    @Bean
    public ServletRegistrationBean<HystrixMetricsStreamServlet> getServlet() {
        HystrixMetricsStreamServlet streamServlet = new HystrixMetricsStreamServlet();
        ServletRegistrationBean<HystrixMetricsStreamServlet> registrationBean = new ServletRegistrationBean<>(streamServlet);
        registrationBean.setLoadOnStartup(1);
        registrationBean.addUrlMappings("/hystrix.stream");
        registrationBean.setName("HystrixMetricsStreamServlet");
        return registrationBean;
    }
}
5.3.2 Monitoring test
5.3.2.1 Start eureka7001
5.3.2.2 Observe the monitoring window
(1) Use 9001 to monitor 8001 by entering http://localhost:8001/hystrix.stream as the stream address
- 1: Delay: controls how often the server polls monitoring information. The default is 2000 milliseconds; increasing it reduces network and CPU usage on the client.
- 2: Title: the text shown after the "Hystrix Stream" heading. By default it is the URL of the monitored instance; set it to display a more meaningful title.
(2) Test addresses
http://localhost:8001/payment/circuit/3
http://localhost:8001/payment/circuit/-10
Call the correct address first, then the bad one, then the correct one again. You will see the circuit breaker trip and then gradually release. Monitoring result when successful:
Monitoring result when failing:
(3) How to read the dashboard
7 colors
1 circle
Solid circle: it has two meanings. Its color shows the instance's health, which declines from green to yellow to orange to red.
Its size also changes with the instance's request traffic: the more traffic, the larger the circle. The solid circles therefore let you quickly spot faulty and heavily loaded instances among many.
1 line
Curve: records the relative change in traffic over 2 minutes, so you can see whether traffic is rising or falling.
Figure 1
Figure 2