I remember that when I was a child, I once took two wires and plugged them into a socket (don't ask why, I was searching for truth…). I still remember the feeling of numbness and nausea all over my body, but fortunately it didn't last long. Two seconds later it stopped, and then the power went out in my house. That's right: the air switch worked, and my life was saved.
At the time, I became obsessed with how my savior worked. It turned out that homes originally used fuses: once the current got too high, the fuse heated up and blew. But waiting for someone to replace the fuse every time was too inefficient, and so the air switch was born. It works with an electromagnet: when the current is too large it cuts off the power supply, and afterwards you only need to close the switch again instead of replacing a fuse every time.
Which brings us to today's topic: circuit breakers. Not the air switch at home, obviously, but the principle is the same: they protect back-end services by cutting connections, preventing accidents from spreading and saving services (lives).
The principle
In fact, a circuit breaker in the software industry works on basically the same principle as one in the electrical industry, except that it can recover automatically; manual intervention is of course also possible.
The electrical breaker watches current, while the software breaker watches error rate, usually with a timeout configured so that timed-out calls count as a major type of error. The electrical breaker automatically cuts the circuit when the current exceeds a certain threshold; the software breaker likewise cuts the data connection when the error rate reaches a certain threshold, and returns an error directly.
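The tripping condition can be made concrete with a small sketch. The Go code below is illustrative only: the `errorStats` type, the `ErrTimeout` sentinel, and the threshold values are assumptions made for this example, not part of any library. It counts a timeout like any other error and refuses to trip until a minimum number of requests has been seen, so one early failure does not open the circuit.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTimeout is a hypothetical sentinel for calls that exceeded their deadline.
var ErrTimeout = errors.New("timeout")

// errorStats counts call outcomes within one measurement window (hypothetical type).
type errorStats struct {
	requests int
	failures int
}

// record registers one call result; any non-nil error, including a timeout,
// counts as a failure.
func (s *errorStats) record(err error) {
	s.requests++
	if err != nil {
		s.failures++
	}
}

// shouldTrip reports whether the error percentage has reached the threshold.
// Windows with fewer than minRequests calls never trip.
func (s *errorStats) shouldTrip(minRequests, errorPercentThreshold int) bool {
	if s.requests < minRequests {
		return false
	}
	return s.failures*100 >= s.requests*errorPercentThreshold
}

func main() {
	var s errorStats
	for i := 0; i < 6; i++ {
		s.record(nil)
	}
	for i := 0; i < 4; i++ {
		s.record(ErrTimeout)
	}
	// 4 failures out of 10 requests is a 40% error rate.
	fmt.Println(s.shouldTrip(10, 50)) // below a 50% threshold
	fmt.Println(s.shouldTrip(10, 25)) // above a 25% threshold
}
```

A real breaker would measure over a rolling time window rather than a single ever-growing counter; that part is left out here to keep the threshold check itself visible.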
A software circuit breaker, as part of an online service, cannot passively wait for someone to flip the switch back. Instead, it actively checks whether the service has recovered. If not, it keeps waiting; once the service has recovered, it switches back on and resumes serving external requests, as shown in the following figure (from [1]).
As you can see from this diagram, the circuit breaker has three states: closed, open, and half-open.
It is closed at first, and becomes open once the errors it detects reach a certain threshold.
After a reset timeout, it becomes half-open, indicating that the system is ready to try resuming.
In the half-open state it lets some requests through to the back end; once this probe succeeds it returns to the closed state, that is, service is restored. If the probe fails, it goes back to open.
Do not underestimate this very simple principle. It follows the architectural design principle of Fail Fast: fail immediately, rather than failing slowly and only letting operations staff know once the failure has reached a certain level. It is the principle we often cite: do not hide problems, expose them as early as possible.
I cannot show the specifics, but I can describe them briefly. Before we used a circuit breaker, whenever load rose on the database that one of our features depended on, every other service was affected. In the worst incident, response time across the whole site rose to 15 s and took 4 to 5 hours to recover fully, which looked like a roller coaster in our monitoring. The cause was that the database was overloaded, and front-end users kept retrying after timeouts, which increased the database load even further.
After adding a circuit breaker to that feature, there has so far been only one incident; it recovered automatically within 10 minutes, and front-end users barely noticed.
Circuit breakers in Go Kit
Currently, Go Kit offers three options:
hystrix-go
gobreaker
handy breaker
After trying them, hystrix-go is the most suitable at present. First, its features are the most consistent with the design in [1]. Second, Hystrix, the original tool, comes from Netflix OSS and has been proven in the Java community over a long time.
It is also easy to use: once configured, it can be applied directly at the Endpoint layer in Go Kit:
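    // cb holds our own settings; name identifies the hystrix command.
    hystrix.ConfigureCommand(name, hystrix.CommandConfig{
        RequestVolumeThreshold: cb.RequestVolumeThreshold, // minimum requests before the circuit can trip
        ErrorPercentThreshold:  cb.ErrorPercentThreshold,  // error percentage that opens the circuit
        MaxConcurrentRequests:  cb.MaxConcurrentRequests,  // concurrent calls allowed for this command
        SleepWindow:            cb.SleepWindow,            // ms to wait after opening before testing recovery
        Timeout:                cb.Timeout,                // ms before a call counts as timed out
    })
    // Wrap the endpoint with the Go Kit middleware for that command.
    endpoint = circuitbreaker.Hystrix(name)(endpoint)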
The parameters here need some explanation. In particular, the concurrency limit (MaxConcurrentRequests) can be estimated as:
peak requests per second × 99th-percentile response time (in seconds)
For details, please refer to [3].
Circuit breakers in Node.js
Two suitable ones have been found:
Github.com/awolden/bra…
Bitbucket.org/igor_sechyn…
After trying both, the latter is better designed, simple to use, and has more features:
Slave circuits can be used to create multiple circuit breakers that share state. This is especially useful when calling external interfaces, because when the external host is down, basically all of its interfaces are unavailable;
Health checks, which are actually better than the half-open state, because no real requests need to be let through to test whether the back-end service is healthy.
It is also very simple to use, but I recommend encapsulating it directly in the client's outbound call logic, so that upper-layer callers do not need to know the circuit breaker exists.