Circuit breaking and degradation
When a downstream service (seasoning, deboning) suddenly becomes unavailable or responds too slowly (buying the vegetables takes 3 minutes, but the queue takes half an hour), the upstream service protects its own overall availability (it cannot wait): it stops calling the target service and returns immediately, so that resources are released quickly. Calls resume once the target service recovers. This is called service circuit breaking.
Because the queue was too long, Little Eyes decisively skipped the follow-up steps and served a "lower quality" dish. This is called service degradation.
Ways to implement a circuit breaker
There are many ways to degrade a service, such as rate limiting, feature switches, and circuit breaking; circuit breaking is one form of degradation.
Circuit breaking. Spring Cloud offers the Hystrix circuit-breaking and degradation library, and Alibaba's open-source Sentinel can also provide circuit breaking and degradation in distributed projects. Both Hystrix and Sentinel mean introducing a third-party component and understanding how it works, which makes them a poor fit for simple scenarios.
Using a hand-written circuit breaker
// Initialize a circuit breaker
private CircuitBreaker breaker = new CircuitBreaker(0.1, 10, true, "serviceDemo");

public void doSomething() {
    // Check the service status on each invocation
    breaker.checkStatus();
    // If the breaker reports that the service is available, continue with the logic
    if (breaker.isWorked()) {
        try {
            service.doSomething();
        } catch (Exception e) {
            e.printStackTrace();
            // Record the failed invocation
            breaker.addFailTimes();
        } finally {
            // Count every call
            breaker.addInvokeTimes();
        }
    }
    // Otherwise: the service is unavailable
}
In this pseudo-code, the circuit breaker does three things:
- Check the service status and output statistics logs
- Return the service state via breaker.isWorked()
- Record the number of calls and failures as the basis for circuit breaking
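The three steps can also be folded into one helper, so that each call site does not repeat the try/catch/finally. The following is only a sketch: `BreakerOps` and `GuardedCall` are hypothetical names that do not appear in the original code, and the interface simply mirrors the four methods used above.

```java
import java.util.function.Supplier;

/** Hypothetical interface exposing the four methods the usage code relies on. */
interface BreakerOps {
    void checkStatus();
    boolean isWorked();
    void addFailTimes();
    void addInvokeTimes();
}

/** Hypothetical helper that wraps one guarded call with a degraded fallback. */
final class GuardedCall {
    private GuardedCall() {}

    static <T> T run(BreakerOps breaker, Supplier<T> call, Supplier<T> fallback) {
        breaker.checkStatus();          // 1. refresh the statistics
        if (!breaker.isWorked()) {      // 2. ask whether the service is usable
            return fallback.get();      //    breaker open: degrade without calling
        }
        try {
            return call.get();
        } catch (RuntimeException e) {
            breaker.addFailTimes();     // 3a. record the failure
            return fallback.get();
        } finally {
            breaker.addInvokeTimes();   // 3b. record every attempted call
        }
    }
}
```

A caller could then write something like `GuardedCall.run(breaker, () -> service.query(), () -> defaultValue)`, assuming the breaker class is adapted to implement `BreakerOps` and the service call returns a value (`service.query()` is a made-up example).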
Implementing the circuit breaker
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CircuitBreaker {
    /** Number of failed calls in the current window */
    private AtomicLong failTimes = new AtomicLong(0);
    /** Number of calls in the current window */
    private AtomicLong invokeTimes = new AtomicLong(0);
    /** Degradation threshold, e.g. 0.1: ratio of failed requests to total requests */
    private double failedRate = 0.1;
    /**
     * The threshold is evaluated only when the total number of requests is
     * greater than this value; for example, if set to 10, the check runs only
     * after more than 10 requests
     */
    private double minTimes;
    /** Circuit breaker switch, off by default */
    private boolean enabled;
    /** Whether to send an email alarm */
    private boolean mail;
    /** Whether to send an SMS alarm after the breaker trips */
    private boolean sms;
    /** Circuit breaker name */
    private String name;
    /** Timestamp of the last statistics run, in minutes */
    private AtomicLong currentTime = new AtomicLong(System.currentTimeMillis() / 60000);
    /** Whether the service is currently unavailable */
    private AtomicBoolean isFailed = new AtomicBoolean(false);
    /** Per-thread record of the service being down */
    private ThreadLocal<Boolean> fail = new ThreadLocal<Boolean>();
    private Logger log = LoggerFactory.getLogger(getClass());

    /**
     * Construct a circuit breaker.
     *
     * @param failedRate breaker threshold: number of failed requests / total number of requests
     * @param minTimes   minimum condition for tripping: the failure ratio is evaluated,
     *                   and degradation applied, only when the total number of requests
     *                   exceeds this value
     * @param enabled    whether circuit breaking is enabled
     * @param name       circuit breaker name
     */
    public CircuitBreaker(double failedRate, double minTimes, boolean enabled, String name) {
        fail.set(false);
        this.failedRate = failedRate;
        this.minTimes = minTimes;
        this.enabled = enabled;
        this.name = name;
    }

    /** Check whether the service is in the failed state */
    public boolean isFailed() {
        return isFailed.get();
    }

    /** Increase the number of failures */
    public void addFailTimes() {
        fail.set(true);
        if (enabled) {
            failTimes.incrementAndGet();
        }
    }

    /** Increase the number of calls */
    public void addInvokeTimes() {
        if (enabled) {
            invokeTimes.incrementAndGet();
        }
    }

    /** Check whether the service is available */
    public boolean isWorked() {
        if (!enabled) {
            return true;
        }
        // Sacrifice 1% of traffic for probe requests when the service is unavailable
        if (isFailed.get() && System.currentTimeMillis() % 100 == 0) {
            return true;
        }
        if (isFailed.get()) {
            fail.set(true);
            return false;
        }
        return true;
    }

    public void checkStatus() {
        if (!enabled) {
            return;
        }
        long newTime = System.currentTimeMillis() / 60000;
        // Evaluate only when a new minute has started and enough calls were made
        if ((newTime > currentTime.get()) && (invokeTimes.get() > minTimes)) {
            double percent = failTimes.get() * 1.0 / invokeTimes.get();
            if (percent > failedRate) {
                if (isFailed.get()) {
                    // Still failing: log output
                    if (mail) {
                        // Send an email notification
                    }
                } else {
                    // Newly tripped: log output
                    isFailed.set(true);
                    if (sms) {
                        // Send SMS notification
                    }
                    if (mail) {
                        // Send an email notification
                    }
                }
            } else {
                // The service is restored
                if (isFailed.get()) {
                    // Log output
                    if (sms) {
                        // Send SMS notification
                    }
                    if (mail) {
                        // Send an email notification
                    }
                }
                isFailed.set(false);
            }
            if (log.isInfoEnabled()) {
                // Log output
            }
            // Start a new statistics window
            currentTime.set(newTime);
            failTimes.set(0);
            invokeTimes.set(0);
        }
    }
}
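Because the class reads `System.currentTimeMillis()` directly, the per-minute rollover is awkward to exercise in a test. Below is a condensed sketch of the same bookkeeping with an injectable clock. `MinuteWindowBreaker`, `record`, `isOpen`, and the `LongSupplier` clock are assumptions made for illustration; alarms, logging, and probing are left out.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Condensed, hypothetical variant of the breaker with an injectable millisecond clock. */
final class MinuteWindowBreaker {
    private final AtomicLong failTimes = new AtomicLong();
    private final AtomicLong invokeTimes = new AtomicLong();
    private final AtomicBoolean open = new AtomicBoolean(false);
    private final AtomicLong currentMinute;
    private final double failedRate;
    private final double minTimes;
    private final LongSupplier clockMillis;

    MinuteWindowBreaker(double failedRate, double minTimes, LongSupplier clockMillis) {
        this.failedRate = failedRate;
        this.minTimes = minTimes;
        this.clockMillis = clockMillis;
        this.currentMinute = new AtomicLong(clockMillis.getAsLong() / 60_000);
    }

    /** Counts one call and, if it failed, one failure. */
    void record(boolean failed) {
        invokeTimes.incrementAndGet();
        if (failed) {
            failTimes.incrementAndGet();
        }
    }

    boolean isOpen() {
        return open.get();
    }

    /** Re-evaluates the state once a new minute has started and enough calls were seen. */
    void checkStatus() {
        long minute = clockMillis.getAsLong() / 60_000;
        long calls = invokeTimes.get();
        if (minute > currentMinute.get() && calls > minTimes) {
            open.set((double) failTimes.get() / calls > failedRate);
            // Start a new statistics window
            currentMinute.set(minute);
            failTimes.set(0);
            invokeTimes.set(0);
        }
    }
}
```

With a fake clock, recording 5 failures out of 20 calls and then advancing the clock by one minute makes `checkStatus()` open the breaker; a following clean minute closes it again. As in the class above, the counters are reset only when the evaluation actually runs, so a minute with too few calls carries its counts into the next one.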
General idea:
- If the proportion of failed requests exceeds the threshold, the circuit breaker trips
- Statistics are collected per minute: the breaker trips when the statistics gathered within one minute exceed the threshold
- If the total number of requests within a minute does not exceed minTimes, the breaker does not trip (the request frequency is too low for the statistics to be meaningful)
- Even while the breaker is open, about 1% (adjustable) of requests are still sacrificed as probes, using the check isFailed.get() && System.currentTimeMillis() % 100 == 0
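The trip rule and the probe selection can be written as two small functions. This is a sketch under assumptions: `BreakerMath`, `shouldTrip`, and `isProbe` are hypothetical names. Note that `System.currentTimeMillis() % 100 == 0` admits every request that arrives during one particular millisecond out of each 100, so the real share of probes depends on when requests arrive. `isProbe` swaps in random sampling as one possible alternative, which is not what the original code does.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helpers illustrating the trip rule and a random probe selection. */
final class BreakerMath {
    private BreakerMath() {}

    /** True when the window's statistics justify opening the breaker. */
    static boolean shouldTrip(long failTimes, long invokeTimes,
                              double failedRate, double minTimes) {
        if (invokeTimes <= minTimes) {
            return false; // too few requests for the ratio to mean anything
        }
        return (double) failTimes / invokeTimes > failedRate;
    }

    /** Alternative probe selection: lets roughly probeRatio of requests through at random. */
    static boolean isProbe(double probeRatio) {
        return ThreadLocalRandom.current().nextDouble() < probeRatio;
    }
}
```

For example, with `failedRate = 0.1` and `minTimes = 10`, 2 failures in 11 calls trip the breaker (about 18%), while 2 failures in 10 calls do not, because 10 calls do not exceed `minTimes`.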
Advantages and disadvantages
Hystrix provides a range of service protection features, such as circuit breaking and thread isolation. Our hand-written breaker only provides manual, caller-side circuit breaking.
Hystrix offers both thread-pool and semaphore isolation. The hand-written breaker does one thing only: it relies solely on statistics, and its one-minute granularity is fairly coarse.
Hystrix uses command-style programming with registered callbacks, which raises code complexity. The hand-written breaker is called inline, so it intrudes on the business code, but it is procedural and cheap to understand.
After removing comments and blank lines, the circuit breaker takes fewer than 100 lines of code. Although it has many shortcomings in large-scale service scenarios, I hope it at least offers a useful idea.