Why fault tolerance and rate limiting

  • Complex distributed systems often have many dependencies, and if an application is not isolated from dependency failures, it risks being dragged down with them. On a high-traffic site, latency in a single backend dependency can exhaust all of an application's resources within seconds (one bad apple spoils the whole barrel).
  • Traffic can also spike suddenly. If nothing limits that traffic and all of it is allowed through to the backend service instances, it can exhaust their resources, cause service failures, or even crash the application outright.

What is Hystrix

Hystrix isolates the calls your system makes to dependent services, so that when one dependency fails the failure does not cascade through the whole system. It also provides a fallback mechanism, so failures are handled more gracefully and the system recovers from them more quickly.

What can Hystrix do

  • Protects and controls the system when dependencies accessed through third-party clients (usually over the network) become slow or fail
  • Prevents cascading failures in distributed systems
  • Fails fast and recovers quickly
  • Provides fallbacks and graceful service degradation
  • Provides near-real-time monitoring, alerting, and operational control

Hystrix design principles

  • Prevents any single dependency from using up all user threads in a container such as Tomcat
  • Sheds load and fails fast instead of queuing requests that cannot be processed in time
  • Provides fallbacks where feasible, so that failures are transparent to users
  • Uses isolation techniques (such as the bulkhead/swimlane pattern and the circuit breaker pattern) to limit the impact any one dependency can have on the overall system
  • Optimizes metrics, monitoring, and alerting for near-real-time visibility
  • Optimizes for fast recovery: most Hystrix configuration can be changed dynamically and propagated quickly to all applications
  • Protects against failures anywhere in the execution of a dependency call, not just in the network request

Where Hystrix's design ideas come from

Bulkhead isolation pattern

To keep leaks and fires from spreading, a cargo ship's hull is divided into multiple watertight compartments separated by bulkheads. If disaster strikes, sealing off the damaged compartment reduces the risk to the whole ship.
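In software, the same idea means giving each dependency its own compartment of resources. Below is a minimal sketch in plain Java, not Hystrix code; the `Bulkheads` class and the dependency names `"inventory"` and `"reviews"` are hypothetical. Each dependency gets its own two-thread pool with a two-slot queue, so a hung dependency can fill only its own compartment.

```java
import java.util.Map;
import java.util.concurrent.*;

/**
 * Hypothetical bulkhead helper: one small, fixed-size pool per dependency,
 * so a hung dependency can exhaust only its own compartment.
 */
class Bulkheads {
    private final Map<String, ThreadPoolExecutor> pools = new ConcurrentHashMap<>();

    ThreadPoolExecutor poolFor(String dependency) {
        // 2 threads + a queue of 2: the fifth concurrent task for the same dependency is rejected
        return pools.computeIfAbsent(dependency, d -> new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),
                new ThreadPoolExecutor.AbortPolicy()));
    }

    /** Runs the task in the dependency's own pool; returns false if that compartment is full. */
    boolean trySubmit(String dependency, Runnable task) {
        try {
            poolFor(dependency).execute(task);
            return true;
        } catch (RejectedExecutionException full) {
            return false;
        }
    }

    void shutdownNow() {
        pools.values().forEach(ThreadPoolExecutor::shutdownNow);
    }
}
```

Because the pools are separate, saturating `"inventory"` leaves the threads reserved for `"reviews"` untouched.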

Circuit breaker pattern

A circuit breaker works like the one in your home, which trips when the current gets too high, but the Hystrix circuit breaker is more sophisticated.

Whether the breaker moves from closed to open is decided by comparing the current health of the service against a configured threshold.

  • When the breaker is closed, requests pass through it. If the current health stays above the threshold, the breaker stays closed; if it drops below the threshold, the breaker trips open.
  • When the breaker is open, requests are blocked.
  • After the breaker has been open for a period of time, it automatically enters the half-open state and lets a single request through. If that request succeeds, the breaker returns to the closed state; if it fails, the breaker goes back to open and subsequent requests continue to be blocked.
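The three states above can be sketched as a small state machine. This is a simplified illustration in plain Java, not Hystrix's implementation: the class is hypothetical, its parameters only mirror the names of Hystrix's `circuitBreaker.requestVolumeThreshold`, `circuitBreaker.errorThresholdPercentage` and `circuitBreaker.sleepWindowInMilliseconds` settings, and "health" is assumed to be the error percentage of the calls counted since the breaker last closed (Hystrix itself uses a rolling statistics window).

```java
import java.util.function.LongSupplier;

/** Hypothetical, simplified circuit breaker; not Hystrix's actual code. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;   // minimum calls before the breaker may trip
    private final int errorThresholdPercentage; // trip when the error % reaches this value
    private final long sleepWindowMillis;       // time to stay OPEN before allowing one trial
    private final LongSupplier clock;           // injectable clock in milliseconds

    private State state = State.CLOSED;
    private int total;
    private int failures;
    private long openedAt;
    private boolean trialInFlight;

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercentage,
                         long sleepWindowMillis, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    /** true = let the request through; false = short-circuit straight to the fallback. */
    synchronized boolean allowRequest() {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && clock.getAsLong() - openedAt >= sleepWindowMillis) {
            state = State.HALF_OPEN;               // sleep window has elapsed
            trialInFlight = false;
        }
        if (state == State.HALF_OPEN && !trialInFlight) {
            trialInFlight = true;                  // exactly one trial request
            return true;
        }
        return false;
    }

    synchronized void recordSuccess() {
        if (state == State.HALF_OPEN) {
            state = State.CLOSED;                  // trial succeeded: close and reset statistics
            total = 0;
            failures = 0;
        } else if (state == State.CLOSED) {
            total++;
        }
    }

    synchronized void recordFailure() {
        if (state == State.HALF_OPEN) {
            trip();                                // trial failed: reopen and restart the sleep window
        } else if (state == State.CLOSED) {
            total++;
            failures++;
            if (total >= requestVolumeThreshold
                    && failures * 100 >= errorThresholdPercentage * total) {
                trip();                            // health fell below the threshold
            }
        }
    }

    synchronized State state() {
        return state;
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
    }
}
```

A caller checks `allowRequest()` before invoking the dependency, reports the outcome with `recordSuccess()` or `recordFailure()`, and goes straight to its fallback whenever `allowRequest()` returns false. The clock is injected so the sleep window can be exercised without real waiting.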

Hystrix workflow

(Figure: Hystrix workflow diagram from the official documentation, with a Chinese-annotated version)

Process description

  1. Each call creates a new HystrixCommand, which wraps the dependency call in its run() method.
  2. execute() makes a synchronous call; queue() makes an asynchronous one.
  3. Check whether the response is cached. If it is, return the cached result; otherwise go to step 4.
  4. Check whether the circuit breaker is open. If it is, go to step 8 and apply the fallback (degradation) policy; if it is not, go to step 5.
  5. Check whether the thread pool/queue/semaphore is full. If it is, go to step 8; otherwise continue to step 6.
  6. Call the HystrixCommand's run() method to execute the dependency logic.
  • 6.1. Did the call throw an exception? If yes, go to step 8.
  • 6.2. Did the call time out? If no, return the result; if yes, go to step 8.
  7. Report every outcome from steps 5 and 6 (success, failure, rejection, timeout) to the circuit breaker, which uses these statistics to decide its state.
  8. getFallback() runs the fallback logic. It is triggered in four cases, shown by the arrows into step 8 in the figure: the circuit is open (step 4), the thread pool/queue/semaphore is full (step 5), run() throws an exception (step 6.1), or run() times out (step 6.2).
  9. Otherwise, return the result of the successful execution.
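The decision sequence in steps 3 to 9 can be condensed into a single method. This is a hypothetical sketch in plain Java rather than Hystrix's code: the circuit breaker is reduced to a boolean flag, the thread pool/queue/semaphore check is a `Semaphore`, the request cache is a map, and step 7's reporting only counts outcomes by name.

```java
import java.util.Map;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch of the execution flow; not Hystrix's implementation. */
class MiniCommandFlow<K, V> {
    final Map<K, V> cache = new ConcurrentHashMap<>();                     // step 3
    volatile boolean circuitOpen = false;                                  // step 4
    final Semaphore capacity;                                              // step 5
    final ExecutorService pool = Executors.newCachedThreadPool();          // runs step 6
    final Map<String, AtomicInteger> outcomes = new ConcurrentHashMap<>(); // step 7

    MiniCommandFlow(int maxConcurrent) {
        capacity = new Semaphore(maxConcurrent);
    }

    V execute(K key, Function<K, V> run, Function<K, V> fallback, long timeoutMillis) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;                                    // step 3: cache hit
        }
        if (circuitOpen) {
            report("SHORT_CIRCUITED");
            return fallback.apply(key);                       // step 4 -> step 8
        }
        if (!capacity.tryAcquire()) {
            report("REJECTED");
            return fallback.apply(key);                       // step 5 -> step 8
        }
        Future<V> future = pool.submit(() -> run.apply(key)); // step 6
        try {
            V result = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            report("SUCCESS");
            if (result != null) {
                cache.put(key, result);
            }
            return result;                                    // step 9: successful result
        } catch (TimeoutException e) {
            future.cancel(true);                              // interrupt the worker
            report("TIMEOUT");                                // step 6.2 -> step 8
        } catch (ExecutionException e) {
            report("FAILURE");                                // step 6.1 -> step 8
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            report("FAILURE");
        } finally {
            capacity.release();
        }
        return fallback.apply(key);                           // step 8: fallback
    }

    int count(String outcome) {
        AtomicInteger c = outcomes.get(outcome);
        return c == null ? 0 : c.get();
    }

    private void report(String outcome) {
        outcomes.computeIfAbsent(outcome, o -> new AtomicInteger()).incrementAndGet();
    }
}
```

In this sketch the permit is released as soon as the caller stops waiting, even if a timed-out task is still winding down after `cancel(true)`; a real implementation would need to account for that more carefully.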

Two resource isolation modes

Thread pool isolation mode

Each dependent service gets its own thread pool. Requests are handed to that pool for execution, a timeout is set on waiting for each task's result, and requests that cannot be handled immediately wait in the pool's queue. Because a thread pool has to be allocated for every dependent service, this mode costs some extra resources. Its advantage is that it can absorb sudden traffic: when a peak arrives, requests can be held in the thread pool's queue and processed gradually.
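A minimal sketch of thread-pool isolation for a single dependency, in plain Java. `ThreadPoolIsolatedCall` is a hypothetical class; its thread count, queue size and timeout play the roles of Hystrix's `coreSize`, `maxQueueSize` and `execution.isolation.thread.timeoutInMilliseconds` settings, but the code is not Hystrix's implementation.

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

/** Hypothetical thread-pool isolation for one dependency. */
class ThreadPoolIsolatedCall<T> {
    private final ThreadPoolExecutor pool;
    private final long timeoutMillis;

    ThreadPoolIsolatedCall(int threads, int queueSize, long timeoutMillis) {
        // Bounded queue buffers bursts; the default AbortPolicy rejects once threads and queue are full
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize));
        this.timeoutMillis = timeoutMillis;
    }

    /** Submits the call to the dedicated pool; returns null if the pool and its queue are full. */
    Future<T> submit(Callable<T> call) {
        try {
            return pool.submit(call);
        } catch (RejectedExecutionException full) {
            return null;
        }
    }

    /** Waits at most timeoutMillis for the result; rejection, timeout or failure yields the fallback. */
    T getOrFallback(Future<T> future, Supplier<T> fallback) {
        if (future == null) {
            return fallback.get();                 // rejected
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                   // interrupt the worker thread
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

With one thread and a queue of three, a burst of four calls is accepted and drained one after another, while a fifth concurrent call is rejected.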

Semaphore isolation mode

An atomic counter (or semaphore) records how many requests are currently running. Each request checks the counter first; if the configured maximum has been reached, the new request for that dependency is rejected immediately. The call itself runs on the caller's thread. This mode strictly limits concurrency and returns immediately, so it cannot absorb sudden traffic: once the number of in-flight requests reaches the limit during a peak, further requests are returned directly without calling the dependent service.
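The same limit can be sketched with `java.util.concurrent.Semaphore`. `SemaphoreIsolatedCall` is a hypothetical class, and `maxConcurrent` stands in for Hystrix's `execution.isolation.semaphore.maxConcurrentRequests` setting. `tryAcquire()` never waits, so there is no queue, and the dependency runs on the calling thread.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical semaphore isolation: reject immediately once the concurrency limit is reached. */
class SemaphoreIsolatedCall {
    private final Semaphore permits;

    SemaphoreIsolatedCall(int maxConcurrent) {
        permits = new Semaphore(maxConcurrent);
    }

    <T> T call(Supplier<T> dependency, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {      // non-blocking: fail fast, no queuing
            return fallback.get();
        }
        try {
            return dependency.get();      // executes on the caller's own thread
        } catch (RuntimeException e) {
            return fallback.get();
        } finally {
            permits.release();
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```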

Thread pool isolation vs. semaphore isolation

Main Hystrix configuration items

Quick start

pom.xml

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-core</artifactId>
    <version>1.5.12</version>
</dependency>
<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-metrics-event-stream</artifactId>
    <version>1.5.12</version>
</dependency>
<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-javanica</artifactId>
    <version>1.5.12</version>
</dependency>

HystrixConfig

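import com.netflix.hystrix.contrib.javanica.aop.aspectj.HystrixCommandAspect;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HystrixConfig {

    /** Declares the HystrixCommandAspect proxy that intercepts methods annotated with @HystrixCommand. */
    @Bean
    public HystrixCommandAspect hystrixCommandAspect() {
        return new HystrixCommandAspect();
    }
}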

HelloService

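import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import org.springframework.stereotype.Service;

@Service
public class HelloService {

    @HystrixCommand(fallbackMethod = "helloError",
            commandProperties = {
                    // Run the call on a dedicated thread pool
                    @HystrixProperty(name = "execution.isolation.strategy", value = "THREAD"),
                    // Time out after 1 second
                    @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "1000"),
                    @HystrixProperty(name = "circuitBreaker.enabled", value = "true"),
                    // Minimum number of requests in the rolling window before the breaker can trip
                    @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "2")},
            threadPoolProperties = {
                    @HystrixProperty(name = "coreSize", value = "5"),
                    @HystrixProperty(name = "maximumSize", value = "5"),
                    @HystrixProperty(name = "maxQueueSize", value = "10")})
    public String sayHello(String name) {
        try {
            // Sleep far longer than the 1000 ms timeout so the fallback is triggered
            Thread.sleep(15000);
            return "Hello " + name + "!";
        } catch (InterruptedException e) {
            // Hystrix interrupts the worker thread on timeout; restore the interrupt flag
            Thread.currentThread().interrupt();
        }
        return null;
    }

    // Fallback method: same parameters and return type as sayHello()
    public String helloError(String name) {
        return "Server busy, please visit later ~";
    }
}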

Startup class

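import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class HystrixSimpleApplication {

    @Autowired
    private HelloService helloService;

    public static void main(String[] args) {
        SpringApplication.run(HystrixSimpleApplication.class, args);
    }

    // GET /hi?name=... delegates to the Hystrix-protected service method
    @GetMapping("/hi")
    public String hi(@RequestParam(value = "name", required = false) String name) {
        return helloService.sayHello(name);
    }
}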

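Test

Visit http://localhost:8080/hi?name=zhangsan, or use curl:

curl -G -d 'name=zhangsan' http://localhost:8080/hi

Response

Because sayHello() sleeps for 15 seconds, well past the 1000 ms timeout, the call times out and the fallback result is returned:

Server busy, please visit later ~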

The source code

Github.com/gf-huanchup…

