The Circuit Breaker pattern is described in Martin Fowler’s article Circuit Breaker. A “circuit breaker” is originally a switching device that protects an electrical circuit from overload: when a short circuit occurs on the line, the breaker promptly cuts off the faulty circuit, preventing overload, overheating, and even fire.
In a distributed architecture, the circuit breaker pattern plays a similar role. When a service unit fails (like a short circuit in an electrical line), the circuit breaker’s fault monitoring (like a fuse blowing) cuts off the call to the original main logic directly. The circuit breaker in Hystrix, however, does more than just cut off the main logic, so let’s look at its processing logic in more detail.
Let’s talk about how the circuit breaker works. When we add simulated latency to the service provider, Eureka-client, the Hystrix command on the service consumer side times out while calling the dependent service, and the service degradation (fallback) logic is triggered. Even so, each call still has to wait for the Hystrix timeout to expire before it falls back, so our calls are still likely to pile up.
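To make the pile-up concrete, here is a minimal sketch of a timeout-guarded call in plain Java. It does not use Hystrix itself; the class and method names (`TimeoutFallbackSketch`, `callWithFallback`) are hypothetical, and the single-thread executor only stands in for whatever isolation the real library applies. It shows that a slow dependency still costs the caller the full timeout before the fallback is returned.

```java
import java.util.concurrent.*;

class TimeoutFallbackSketch {
    /** Runs mainLogic, but returns fallback if it fails or exceeds timeoutMillis. */
    static String callWithFallback(Callable<String> mainLogic, long timeoutMillis, String fallback) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> future = pool.submit(mainLogic);
        try {
            // The caller blocks here for up to timeoutMillis.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true);
            return fallback; // degraded logic
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } finally {
            pool.shutdownNow(); // interrupt the worker if it is still running
        }
    }

    public static void main(String[] args) {
        // Simulated slow provider: takes 1 s, but the timeout is 100 ms.
        String slow = callWithFallback(() -> { Thread.sleep(1000); return "main"; }, 100, "fallback");
        String fast = callWithFallback(() -> "ok", 100, "fallback");
        System.out.println(slow); // fallback
        System.out.println(fast); // ok
    }
}
```

Every slow call above occupies the caller for the whole timeout, which is why a timeout alone does not stop requests from accumulating.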
That’s when the circuit breaker kicks in. At what point exactly? Three important parameters are involved: the snapshot time window, the lower limit of the total number of requests, and the lower limit of the error percentage. These parameters work as follows:
Snapshot time window: To decide whether to open, the circuit breaker needs to collect request and error statistics. The snapshot time window is the period over which they are collected; by default it is the most recent 10 seconds.
Lower limit of total number of requests: Within the snapshot time window, the total number of requests must reach this limit before the breaker is eligible to open. The default value is 20, which means that if the Hystrix command is invoked fewer than 20 times within 10 seconds, the circuit breaker will not open even if all requests time out or fail for other reasons.
Error percentage lower limit: Once the total number of requests in the snapshot time window has reached the lower limit, the breaker opens if the error percentage reaches this threshold, which is 50% by default. For example, if 16 of 30 calls hit timeout exceptions, the error percentage is about 53%, which is above 50%, so the breaker opens.
So what happens when the circuit breaker opens? Before it opens, in the previous example each request falls back only after Hystrix times out, so the delay of each request is approximately the Hystrix timeout. If the timeout is set to 5 seconds, each request returns after 5 seconds. The breaker opens when it finds that, within 10 seconds, the total number of requests has reached 20 and the error percentage has reached 50%. Once it is open, new requests no longer call the main logic; the degraded logic is called directly, so they return the fallback without waiting 5 seconds. In this way the circuit breaker detects the fault automatically and lets the degraded logic stand in for the main logic, which reduces the response delay.
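The opening rule described above can be sketched as a small function over the counts from one snapshot window. This is an illustration rather than Hystrix’s own code: the class, method, and constant names are hypothetical, the defaults are the ones quoted in the text, and treating both limits as inclusive (at least 20 requests, at least 50% errors) is an assumption.

```java
class TripDecisionSketch {
    // Defaults quoted in the text; the constant names are hypothetical.
    static final int REQUEST_VOLUME_LOWER_LIMIT = 20; // total requests per snapshot window
    static final int ERROR_PERCENT_LOWER_LIMIT = 50;  // error percentage

    /** Decides whether the breaker should open, given the counts of one snapshot window. */
    static boolean shouldOpen(int totalRequests, int failedRequests) {
        if (totalRequests < REQUEST_VOLUME_LOWER_LIMIT) {
            return false; // too little traffic to judge, even if every call failed
        }
        int errorPercent = failedRequests * 100 / totalRequests;
        return errorPercent >= ERROR_PERCENT_LOWER_LIMIT;
    }

    public static void main(String[] args) {
        System.out.println(shouldOpen(19, 19)); // false: below the request lower limit
        System.out.println(shouldOpen(30, 16)); // true: 53% errors
        System.out.println(shouldOpen(30, 10)); // false: 33% errors
    }
}
```

The request lower limit is checked first so that a handful of failures during a quiet period cannot open the breaker on its own.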
The processing does not end once the circuit breaker is open. The degraded logic has temporarily taken over as the main logic, so how is the original main logic restored? Hystrix provides automatic recovery for this. When the circuit breaker opens and cuts off the main logic, Hystrix starts a sleep time window, during which the degraded logic temporarily serves as the main logic. When the sleep time window expires, the breaker enters a half-open state and releases one request to the original main logic. If that request returns normally, the circuit breaker closes and the main logic resumes. If the request still fails, the breaker stays open and the sleep time window restarts.
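The recovery cycle can be sketched as a three-state machine. The code below is a simplified, single-threaded illustration and not Hystrix’s implementation: all names are hypothetical, and the current time is passed in explicitly so the behaviour is easy to follow and test.

```java
class BreakerStateSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long sleepWindowMillis;
    private State state = State.CLOSED;
    private long openedAt;

    BreakerStateSketch(long sleepWindowMillis) {
        this.sleepWindowMillis = sleepWindowMillis;
    }

    /** Called when the window statistics say the breaker should open. */
    void trip(long nowMillis) {
        state = State.OPEN;
        openedAt = nowMillis; // the sleep time window starts now
    }

    /** Returns true if the main logic may be tried; false means go straight to the fallback. */
    boolean allowRequest(long nowMillis) {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && nowMillis - openedAt >= sleepWindowMillis) {
            state = State.HALF_OPEN; // sleep window expired: release one trial request
            return true;
        }
        return false; // still sleeping, or a trial request is already in flight
    }

    /** The trial request returned normally: close the breaker and resume the main logic. */
    void onTrialSuccess() {
        if (state == State.HALF_OPEN) {
            state = State.CLOSED;
        }
    }

    /** The trial request failed: reopen the breaker and restart the sleep window. */
    void onTrialFailure(long nowMillis) {
        if (state == State.HALF_OPEN) {
            trip(nowMillis);
        }
    }

    State state() {
        return state;
    }
}
```

For example, with a sleep window of 5000 ms, a breaker tripped at time 1000 rejects a request at time 2000, lets a single trial through at time 6000, and returns to `CLOSED` only if `onTrialSuccess()` is called for that trial.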
Through the mechanisms above, the Hystrix circuit breaker automatically detects failures of dependent resources, switches to the degraded logic, and restores the main logic. This protects our microservices well when they depend on external services or resources. For business requirements that have degraded logic, switching and recovery can also happen automatically. Compared with the traditional approach of switching manually through monitoring and operations, this is more intelligent and efficient.