Why fault tolerance and rate limiting are needed
- Complex distributed systems often have many dependencies. If an application is not isolated from failures in those dependencies, it risks being dragged down with them. On a high-traffic site, latency in a single backend dependency can exhaust all of the application's resources within seconds (one bad apple spoils the whole barrel).
- Network traffic can also spike at particular moments. Without effective rate limiting, the full load reaches the backend service instances and is likely to cause resource exhaustion, service failures, or even a crash of the whole application.
What is Hystrix
When a dependent service fails, Hystrix isolates that dependency so the failure does not cascade through your system. It also provides a fallback mechanism so that failures are handled more gracefully and the system recovers more quickly.
What Hystrix can do
- Protect and control the system when dependent services accessed through third-party clients (usually over the network) suffer high latency or failure
- Prevent cascading failures in distributed systems
- Fail fast and recover quickly
- Provide fallbacks and graceful service degradation
- Provide near-real-time monitoring, alerting, and operational control
Hystrix design principles
- Prevent any single dependency from using up all user threads in a container such as Tomcat
- Shed load and fail fast instead of queuing requests that cannot be processed in time
- Provide fallbacks where necessary so that users are shielded from failures
- Use isolation techniques (such as the bulkhead, swimlane, and circuit breaker patterns) to limit the impact of any one dependency on the overall system
- Optimize metrics, monitoring, and alerting for near-real-time feedback
- Optimize for fast recovery by supporting dynamic configuration changes in most aspects of Hystrix and propagating them quickly to all applications
- Protect the application against failures in the entire execution of a dependency call, not just in the network request
Where Hystrix's design ideas come from
Bulkhead isolation pattern
To keep flooding or fire from spreading, a cargo ship's hull is divided into multiple compartments. When disaster strikes, sealing off the affected compartment reduces the risk to the whole ship.
Circuit breaker pattern
A circuit breaker works like the breaker in your home's electrical panel, which trips when the current gets too high, but the Hystrix circuit breaker is more sophisticated.
The breaker switches from closed to open based on a comparison between the current health of the service and a configured threshold.
- When the breaker is closed, requests are allowed through. As long as the current health stays above the configured threshold, the breaker stays closed. If the health falls below the threshold, the breaker switches to open.
- When the breaker is open, requests are blocked.
- After the breaker has been open for a period of time, it automatically enters the half-open state and lets a single request through. If that request succeeds, the breaker returns to the closed state. If it fails, the breaker stays open and subsequent requests are blocked.
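The three states described above can be sketched as a small state machine. This is only an illustration under stated assumptions, not Hystrix's own implementation: the class `SimpleCircuitBreaker`, its method names, the injectable clock, and the use of a consecutive-failure count as the "health" measure are all hypothetical choices made for the example. Hystrix itself judges health by an error percentage over a rolling time window.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of the closed / open / half-open cycle; not Hystrix code.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // assumed health rule: consecutive failures that open the breaker
    private final long sleepWindowMillis; // how long to stay open before allowing one trial request
    private final LongSupplier clock;     // injectable time source in milliseconds
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long sleepWindowMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    // synchronized keeps the sketch simple: calls run one at a time, so the
    // half-open state never lets more than one trial request through.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < sleepWindowMillis) {
                return fallback.get();   // open: block the request
            }
            state = State.HALF_OPEN;     // sleep window elapsed: allow one trial request
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED;        // success closes the breaker (or keeps it closed)
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;      // trial failed or threshold reached
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker =
                new SimpleCircuitBreaker(2, 1000L, System::currentTimeMillis);
        Supplier<String> failing = () -> { throw new IllegalStateException("dependency down"); };
        for (int i = 0; i < 3; i++) {
            System.out.println(breaker.call(failing, () -> "fallback") + " / " + breaker.state());
        }
    }
}
```

Passing the clock in as a parameter is a design choice for the sketch: it lets the sleep window be exercised without real waiting.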
Hystrix workflow
[Figure: Hystrix flow chart from the official website]
[Figure: Chinese-language version of the flow chart]
Process description
1. Each call creates a new HystrixCommand, with the dependency call wrapped in its run() method.
2. Call execute() or queue() to make a synchronous or asynchronous call.
3. Check whether the result of the current call is cached. If it is, return the cached result; otherwise, go to step 4.
4. Check whether the circuit breaker is open. If it is, go to step 8 and run the fallback (degradation) logic; if it is not, go to step 5.
5. Check whether the thread pool, queue, or semaphore is full. If it is, go to step 8; otherwise, go to step 6.
6. Call the HystrixCommand's run() method to execute the dependency logic.
6.1. Did the call throw an exception? If so, go to step 8; if not, continue.
6.2. Did the call time out? If so, go to step 8; if not, return the result of the call.
7. Collect the outcome (success, failure, rejection, timeout) of steps 5 and 6 and report it to the circuit breaker, which uses these statistics to decide its state.
8. Run the getFallback() fallback logic. Four cases trigger a getFallback() call (the arrows pointing to step 8 in the figure): the circuit breaker is open (step 4), the thread pool, queue, or semaphore is full (step 5), run() throws an exception (step 6.1), or run() times out (step 6.2).
9. Return the result of the successful execution.
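The order of checks in the process above can be sketched with the JDK alone. The code below is a rough illustration, not Hystrix: `CommandFlowSketch`, its fields, and the hand-set `circuitOpen` flag are hypothetical names, the cache is a plain map, and the metrics reporting step is left out entirely.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical stand-in for the command flow; none of these names come from Hystrix.
class CommandFlowSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Semaphore capacity;
    private final long timeoutMillis;
    private final ExecutorService worker = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // do not keep the JVM alive for the demo
        return t;
    });
    // Set by hand here; a real breaker derives this from collected metrics.
    volatile boolean circuitOpen = false;

    CommandFlowSketch(int maxConcurrent, long timeoutMillis) {
        this.capacity = new Semaphore(maxConcurrent);
        this.timeoutMillis = timeoutMillis;
    }

    String execute(String cacheKey, Callable<String> run, Supplier<String> fallback) {
        String cached = cache.get(cacheKey);
        if (cached != null) {
            return cached;                    // cached result: return it directly
        }
        if (circuitOpen) {
            return fallback.get();            // breaker open: fall back
        }
        if (!capacity.tryAcquire()) {
            return fallback.get();            // no capacity left: fall back
        }
        Future<String> future = worker.submit(run); // run the dependency logic
        try {
            String result = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            if (result != null) {
                cache.put(cacheKey, result);
            }
            return result;                    // success: return the result
        } catch (TimeoutException e) {
            future.cancel(true);              // timed out: interrupt and fall back
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get();            // run threw an exception: fall back
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        } finally {
            capacity.release();
        }
    }

    public static void main(String[] args) {
        CommandFlowSketch flow = new CommandFlowSketch(10, 500);
        System.out.println(flow.execute("a", () -> "Hello a", () -> "fallback"));
        System.out.println(flow.execute("b",
                () -> { throw new IllegalStateException("boom"); }, () -> "fallback"));
    }
}
```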
Two resource isolation modes
Thread pool isolation mode
Requests are handed to a thread pool, which executes them. A timeout is set on waiting for each task's result, and requests that cannot run immediately accumulate in the pool's queue. Each dependent service needs its own thread pool, which carries some resource overhead. The advantage is that this mode can absorb bursts of traffic (when a traffic peak arrives, requests can wait in the thread pool queue and be processed gradually).
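The idea can be sketched with a plain `ThreadPoolExecutor`. This is an assumption-laden illustration rather than Hystrix's code: `ThreadPoolIsolationSketch` and its `queue`/`execute` methods are hypothetical names. The pool size, queue size, and timeout passed to the constructor stand in for the per-dependency settings.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: one bounded pool per dependency, with timeout and fallback.
class ThreadPoolIsolationSketch {
    private final ThreadPoolExecutor pool;
    private final long timeoutMillis;

    // queueSize must be at least 1 for ArrayBlockingQueue.
    ThreadPoolIsolationSketch(int threads, int queueSize, long timeoutMillis) {
        ThreadFactory daemonThreads = r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // do not keep the JVM alive for the demo
            return t;
        };
        // Fixed-size pool with a bounded queue; the default handler rejects
        // new tasks once both the threads and the queue are full.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize), daemonThreads);
        this.timeoutMillis = timeoutMillis;
    }

    // Asynchronous call: returns at once, already completed with the fallback if rejected.
    Future<String> queue(Callable<String> task, String fallback) {
        try {
            return pool.submit(task);
        } catch (RejectedExecutionException e) {
            return CompletableFuture.completedFuture(fallback);
        }
    }

    // Synchronous call: waits up to the timeout, then gives up and falls back.
    String execute(Callable<String> task, String fallback) {
        Future<String> future = queue(task, fallback);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker thread
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // the task itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }

    public static void main(String[] args) {
        ThreadPoolIsolationSketch isolation = new ThreadPoolIsolationSketch(5, 10, 1000);
        System.out.println(isolation.execute(() -> "Hello", "fallback"));
        System.out.println(isolation.execute(() -> { Thread.sleep(3000); return "late"; }, "fallback"));
    }
}
```

Because the caller only waits on a `Future`, a slow dependency ties up the pool's threads rather than the caller's, which is what makes the timeout enforceable.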
Semaphore isolation mode
An atomic counter (or semaphore) records how many threads are currently running. Each request first checks the counter: if it has reached the configured maximum number of threads, new requests of that type are discarded. This mode imposes strict thread control and returns immediately, so it cannot absorb bursts of traffic (once the number of in-flight requests reaches the limit, further requests are returned immediately without calling the dependent service).
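A minimal sketch of this mode, assuming a `java.util.concurrent.Semaphore` as the counter. The class name `SemaphoreIsolationSketch` and its `execute` method are hypothetical and not part of Hystrix.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch: cap concurrent calls with a semaphore, reject the rest at once.
class SemaphoreIsolationSketch {
    private final Semaphore permits;

    SemaphoreIsolationSketch(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T execute(Supplier<T> task, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();   // limit reached: return immediately, no queuing
        }
        try {
            return task.get();       // runs on the caller's own thread
        } catch (RuntimeException e) {
            return fallback.get();   // the task failed
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        SemaphoreIsolationSketch isolation = new SemaphoreIsolationSketch(1);
        // The inner call runs while the outer call still holds the only permit,
        // so the inner call is rejected and its fallback is used.
        String result = isolation.execute(
                () -> "outer + " + isolation.execute(() -> "inner", () -> "rejected"),
                () -> "rejected");
        System.out.println(result);
    }
}
```

Unlike the thread pool mode, there is no separate thread and no queue here, so the overhead is small but a call that hangs also hangs its caller.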
Thread pool isolation vs. semaphore isolation
Hystrix major configuration items
Quick start
pom.xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-core</artifactId>
    <version>1.5.12</version>
</dependency>
<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-metrics-event-stream</artifactId>
    <version>1.5.12</version>
</dependency>
<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-javanica</artifactId>
    <version>1.5.12</version>
</dependency>
HystrixConfig
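import com.netflix.hystrix.contrib.javanica.aop.aspectj.HystrixCommandAspect;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HystrixConfig {

    /** Registers the HystrixCommandAspect that proxies @HystrixCommand methods. */
    @Bean
    public HystrixCommandAspect hystrixCommandAspect() {
        return new HystrixCommandAspect();
    }
}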
HelloService
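import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import org.springframework.stereotype.Service;

@Service
public class HelloService {

    @HystrixCommand(
        fallbackMethod = "helloError",
        commandProperties = {
            @HystrixProperty(name = "execution.isolation.strategy", value = "THREAD"),
            @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "1000"),
            @HystrixProperty(name = "circuitBreaker.enabled", value = "true"),
            @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "2")
        },
        threadPoolProperties = {
            @HystrixProperty(name = "coreSize", value = "5"),
            @HystrixProperty(name = "maximumSize", value = "5"),
            @HystrixProperty(name = "maxQueueSize", value = "10")
        })
    public String sayHello(String name) {
        try {
            // Simulate a slow dependency: 15 s is far longer than the 1 s timeout,
            // so the call times out and helloError is used instead.
            Thread.sleep(15000);
            return "Hello " + name + " !";
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of swallowing the interruption.
            Thread.currentThread().interrupt();
        }
        return null;
    }

    // Fallback method: takes the same parameters as sayHello.
    public String helloError(String name) {
        return "Server busy, please access later ~";
    }
}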
Application class
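import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class HystrixSimpleApplication {

    @Autowired
    private HelloService helloService;

    public static void main(String[] args) {
        SpringApplication.run(HystrixSimpleApplication.class, args);
    }

    // GET /hi?name=... delegates to the Hystrix-protected service method.
    @GetMapping("/hi")
    public String hi(String name) {
        return helloService.sayHello(name);
    }
}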
Test
Open http://localhost:8080/hi?name=zhangsan in a browser, or run:
curl 'http://localhost:8080/hi?name=zhangsan'
Response
Server busy, please access later ~
Source code
https://github.com/gf-huanchupk/SpringCloudLearning/tree/master/chapter16
References
https://github.com/Netflix/Hystrix/wiki
https://blog.51cto.com/snowtiger/2057092