Why do we need Hystrix
In large and medium-sized distributed systems, the system usually has many dependencies, as shown in the following figure:
Under high concurrency, the stability of these dependencies strongly affects the system, yet dependencies are subject to many uncontrollable problems, such as slow network connections, busy resources, temporary unavailability, and services going offline, as shown in the following figure:
When one dependency blocks, the thread pools on most servers fill up with blocked threads, which affects the stability of the entire online service, as shown below:
Applications in a complex distributed architecture have many dependencies, and each of them will inevitably fail at some point. If a dependency fails under high concurrency and is not isolated, it can drag down the current application service.
How does Hystrix address dependency isolation
- Hystrix uses the Command pattern: a HystrixCommand wraps the dependency call logic, and each command executes in a separate thread or under semaphore authorization.
- The timeout of a dependency call is configurable. It is generally set slightly above the 99.5th-percentile response time. When a call times out, it either returns immediately or executes the fallback logic.
- Each dependency gets its own small thread pool or semaphore. If the pool is full, the call is rejected immediately, with no queuing by default, which shortens the time needed to detect a failure.
- Dependency call outcomes: success, failure (exception thrown), timeout, thread-pool rejection, short circuit. The fallback logic is executed whenever the request fails (exception, rejection, timeout, short circuit).
- Provides a circuit breaker that can trip automatically or be opened manually to stop calls to the current dependency for a period of time (10 seconds). The default error-rate threshold is 50%; above it, the circuit breaker trips automatically.
- Provides near-real-time statistics and monitoring for dependencies.
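The timeout-plus-fallback behaviour described in the list above can be sketched with the plain JDK. This is a minimal illustration and not Hystrix code: the class `TimeoutFallbackSketch`, the method `callWithFallback`, and the timeout values are assumptions made up for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper (not part of Hystrix): runs the dependency call on a
// separate thread and returns the fallback value on timeout or failure.
class TimeoutFallbackSketch {
    static <T> T callWithFallback(ExecutorService pool, Callable<T> call,
                                  long timeoutMillis, Supplier<T> fallback) {
        Future<T> future = pool.submit(call);
        try {
            // The caller waits at most timeoutMillis for the dependency.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);          // interrupt the slow call
            return fallback.get();
        } catch (ExecutionException e) {  // the call itself threw an exception
            return fallback.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            String fast = callWithFallback(pool, () -> "Hello", 500, () -> "fallback");
            String slow = callWithFallback(pool, () -> {
                Thread.sleep(2000);       // simulates a slow dependency
                return "too late";
            }, 100, () -> "fallback");
            System.out.println("fast=" + fast + ", slow=" + slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Hystrix additionally records each outcome for its circuit breaker and metrics, which this sketch leaves out.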
Hystrix's dependency isolation architecture is shown below:
How to use Hystrix
Use Maven to introduce Hystrix dependencies
<!-- Version properties (placed inside <properties>) -->
<hystrix.version>1.3.16</hystrix.version>
<hystrix-metrics-event-stream.version>1.1.2</hystrix-metrics-event-stream.version>

<!-- Dependencies (placed inside <dependencies>) -->
<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-core</artifactId>
    <version>${hystrix.version}</version>
</dependency>
<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-metrics-event-stream</artifactId>
    <version>${hystrix-metrics-event-stream.version}</version>
</dependency>
Use the Command pattern to encapsulate dependency logic
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HelloWorldCommand extends HystrixCommand<String> {
    private final String name;

    public HelloWorldCommand(String name) {
        // Minimum configuration: specify the command group name (CommandGroup)
        super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
        this.name = name;
    }

    @Override
    protected String run() {
        // The dependency logic is encapsulated in the run() method
        return "Hello " + name + " thread:" + Thread.currentThread().getName();
    }

    public static void main(String[] args) throws Exception {
        // Each command object can only be executed once
        HelloWorldCommand helloWorldCommand = new HelloWorldCommand("sync-hystrix");
        // execute() is a synchronous call, equivalent to helloWorldCommand.queue().get()
        String result = helloWorldCommand.execute();
        System.out.println("result=" + result);

        helloWorldCommand = new HelloWorldCommand("async-hystrix");
        // queue() is an asynchronous call; the caller decides when to fetch the result
        Future<String> future = helloWorldCommand.queue();
        // The get() timeout should not exceed the command's own timeout
        result = future.get(100, TimeUnit.MILLISECONDS);
        System.out.println("result=" + result);
        System.out.println("mainThread=" + Thread.currentThread().getName());
    }
}
Use getFallback() to provide a degradation strategy
// Override HystrixCommand's getFallback() method to implement the fallback logic
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;
import java.util.concurrent.TimeUnit;

public class HelloWorldCommand extends HystrixCommand<String> {
    private final String name;

    public HelloWorldCommand(String name) {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("HelloWorldGroup"))
                /* Set the dependency timeout to 500 milliseconds */
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withExecutionIsolationThreadTimeoutInMilliseconds(500)));
        this.name = name;
    }

    @Override
    protected String getFallback() {
        return "execute failed";
    }

    @Override
    protected String run() throws Exception {
        // Sleep for 1 second so that the call times out
        TimeUnit.MILLISECONDS.sleep(1000);
        return "Hello " + name + " thread:" + Thread.currentThread().getName();
    }

    public static void main(String[] args) throws Exception {
        HelloWorldCommand command = new HelloWorldCommand("test-Fallback");
        // run() exceeds the 500 ms timeout, so execute() returns the fallback value
        String result = command.execute();
        System.out.println("result=" + result);
    }
}
NOTE: Except for HystrixBadRequestException, every exception thrown from the run() method counts as a failure and triggers getFallback() and the circuit-breaker logic.
HystrixBadRequestException is intended for cases such as illegal arguments or other non-system failures, which should not trigger the fallback logic.
Dependency name: CommandKey
public HelloWorldCommand(String name) {
    super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"))
            /* The HystrixCommandKey factory defines the dependency name */
            .andCommandKey(HystrixCommandKey.Factory.asKey("HelloWorld")));
    this.name = name;
}
NOTE: Each CommandKey represents an abstraction of one dependency, and calls to the same dependency should use the same CommandKey name. Dependency isolation is, at its root, the isolation of dependencies that share a CommandKey.
Dependency group: CommandGroup
Command groups group dependency operations, which makes statistics and aggregation easier.
// Use the HystrixCommandGroupKey factory to define the group
public HelloWorldCommand(String name) {
    super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("HelloWorldGroup")));
    this.name = name;
}
NOTE: CommandGroup is the minimum required configuration for every command. If no ThreadPoolKey is specified, the group key's literal value is used to tell apart the thread pools/semaphores of different dependencies.
Thread pool/semaphore: ThreadPoolKey
public HelloWorldCommand(String name) {
    super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"))
            .andCommandKey(HystrixCommandKey.Factory.asKey("HelloWorld"))
            /* Use the HystrixThreadPoolKey factory to define the thread pool name */
            .andThreadPoolKey(HystrixThreadPoolKey.Factory.asKey("HelloWorldPool")));
    this.name = name;
}
NOTE: Use CommandGroup to distinguish commands when isolating a single business dependency. When the same dependency involves different kinds of remote calls (for example, one to Redis and one over HTTP), use HystrixThreadPoolKey to isolate them.
In other words, when commands belong to the same business group but need separate resources, distinguish them with HystrixThreadPoolKey.
Semaphore isolation: SEMAPHORE
Local code, or remote calls that return quickly (such as memcached or Redis), can use semaphore isolation directly, which avoids the overhead of thread isolation.
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;

public class HelloWorldCommand extends HystrixCommand<String> {
    private final String name;

    public HelloWorldCommand(String name) {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("HelloWorldGroup"))
                /* Configure semaphore isolation; thread pool isolation is the default */
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withExecutionIsolationStrategy(
                                HystrixCommandProperties.ExecutionIsolationStrategy.SEMAPHORE)));
        this.name = name;
    }

    @Override
    protected String run() throws Exception {
        return "HystrixThread:" + Thread.currentThread().getName();
    }

    public static void main(String[] args) throws Exception {
        HelloWorldCommand command = new HelloWorldCommand("semaphore");
        String result = command.execute();
        // With semaphore isolation, run() executes on the calling thread
        System.out.println(result);
        System.out.println("MainThread:" + Thread.currentThread().getName());
    }
}
Hystrix key components analysis
Hystrix process structure analysis
Process description:
1. Create a new HystrixCommand for each call, encapsulating the dependency call in the run() method.
2. Call execute()/queue() to make a synchronous or asynchronous call.
3. Check whether the circuit breaker is open. If so, go to Step 8 and run the degradation strategy; otherwise, continue.
4. Check whether the thread pool/queue/semaphore is full. If so, go to degradation Step 8; otherwise, continue.
5. Call HystrixCommand's run() method to run the dependency logic.
A If the dependency call times out, go to Step 8.
6. Check whether the call succeeded.
A On success, return the result of the call.
B On failure, go to Step 8.
7. Compute the circuit breaker state: every outcome is reported to the circuit breaker, which uses these statistics to decide its state.
8. Run the getFallback() degradation logic.
The getFallback call is triggered in four cases:
- The run() method throws an exception other than HystrixBadRequestException
- The run() method call times out
- The circuit breaker is open and intercepts the call
- The thread pool/queue/semaphore is full
A A command that does not implement getFallback throws an exception directly.
B If the fallback logic succeeds, its result is returned.
C If the fallback logic fails, an exception is thrown.
9. Return the result of the successful execution.
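The order of checks in the steps above can be sketched as a single method. This is a simplified, single-threaded illustration under stated assumptions, not Hystrix's implementation: `CommandFlowSketch` and its parameters are hypothetical names, the circuit breaker is reduced to a boolean check, capacity is modelled with a `Semaphore`, and the timeout (Step 5) and statistics reporting (Step 7) are omitted.

```java
import java.util.concurrent.Semaphore;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of the execution flow; names are not Hystrix APIs.
class CommandFlowSketch {
    static <T> T execute(BooleanSupplier circuitOpen, Semaphore capacity,
                         Supplier<T> run, Supplier<T> fallback) {
        if (circuitOpen.getAsBoolean()) {
            return fallback.get();       // Step 3 -> Step 8: circuit breaker is open
        }
        if (!capacity.tryAcquire()) {
            return fallback.get();       // Step 4 -> Step 8: no capacity left
        }
        try {
            return run.get();            // Steps 5 and 6A: run the dependency logic
        } catch (RuntimeException e) {
            return fallback.get();       // Step 6B -> Step 8: the call failed
        } finally {
            capacity.release();
        }
    }

    public static void main(String[] args) {
        Semaphore capacity = new Semaphore(10);
        String ok = execute(() -> false, capacity, () -> "result", () -> "fallback");
        String shortCircuited = execute(() -> true, capacity, () -> "result", () -> "fallback");
        System.out.println(ok + " / " + shortCircuited);
    }
}
```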
Circuit Breaker
Circuit Breaker Process architecture and statistics
By default, each circuit breaker maintains 10 buckets, one per second, and each bucket records the counts of successes, failures, timeouts, and rejections. By default, the circuit breaker trips and intercepts calls when, within 10 seconds, there are more than 20 requests and the error rate exceeds 50%.
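The bucketed statistics can be illustrated with a small rolling-window counter. This is a simplified sketch and not Hystrix's actual implementation: the class `RollingCircuitBreakerSketch` is hypothetical, it tracks only success and failure counts, it takes its thresholds from the paragraph above, and it does not model how the circuit closes again after tripping. The clock is injected so the window can be exercised deterministically.

```java
import java.util.function.LongSupplier;

// Hypothetical rolling-window breaker: 10 one-second buckets.
class RollingCircuitBreakerSketch {
    private static final int BUCKETS = 10;
    private static final long BUCKET_MILLIS = 1000;
    private static final int MIN_REQUESTS = 20;   // volume threshold from the text
    private static final int ERROR_PERCENT = 50;  // error-rate threshold from the text

    private final long[] bucketStart = new long[BUCKETS];
    private final int[] successes = new int[BUCKETS];
    private final int[] failures = new int[BUCKETS];
    private final LongSupplier clock;             // returns the current time in milliseconds

    RollingCircuitBreakerSketch(LongSupplier clock) {
        this.clock = clock;
    }

    synchronized void record(boolean success) {
        long now = clock.getAsLong();
        long start = now - (now % BUCKET_MILLIS);
        int idx = (int) ((now / BUCKET_MILLIS) % BUCKETS);
        if (bucketStart[idx] != start) {          // slot holds an expired bucket: reuse it
            bucketStart[idx] = start;
            successes[idx] = 0;
            failures[idx] = 0;
        }
        if (success) successes[idx]++; else failures[idx]++;
    }

    synchronized boolean isOpen() {
        long oldest = clock.getAsLong() - BUCKETS * BUCKET_MILLIS;
        int ok = 0;
        int bad = 0;
        for (int i = 0; i < BUCKETS; i++) {
            if (bucketStart[i] > oldest) {        // only count buckets inside the window
                ok += successes[i];
                bad += failures[i];
            }
        }
        int total = ok + bad;
        return total > MIN_REQUESTS && bad * 100 > ERROR_PERCENT * total;
    }

    public static void main(String[] args) {
        RollingCircuitBreakerSketch breaker =
                new RollingCircuitBreakerSketch(System::currentTimeMillis);
        for (int i = 0; i < 30; i++) {
            breaker.record(false);                // 30 consecutive failures
        }
        System.out.println("open=" + breaker.isOpen());
    }
}
```

Because old buckets simply fall out of the window, the error rate always reflects roughly the last 10 seconds of traffic rather than the whole lifetime of the process.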
Isolation analysis
Hystrix uses thread or semaphore isolation to limit the concurrency of dependencies and to stop blocking from spreading.
(1) Thread isolation
The thread that executes the dependency code is separated from the request thread, so the request thread is free to decide when to stop waiting (an asynchronous process).
The size of the thread pool controls the degree of concurrency. When the pool is saturated, requests can be rejected early, which keeps the dependency's problems from spreading.
Do not make the thread pool too large; otherwise, a large number of blocked threads can slow down the server.
The thread pool is shown in the diagram below. When n request threads concurrently call an interface, each one obtains a thread from the Hystrix-managed thread pool and passes its parameters to that thread, which performs the actual call. The size of the thread pool is limited: the default is 10 threads, and it can be set with the coreSize parameter. If the number of concurrent requests exceeds the number of threads in the pool, the excess requests have to wait in the queue; queuing is disabled by default, and the queue, when enabled, is bounded, so once it is full some request threads inevitably go through the fallback process.
Thread pool mode supports asynchronous calls, timeouts, and direct circuit breaking, but it involves thread switching and has a relatively high overhead.
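The fail-fast behaviour of a saturated pool can be reproduced with a plain `ThreadPoolExecutor`. This is a minimal sketch under stated assumptions, not Hystrix code: `ThreadPoolIsolationSketch`, `newIsolatedPool`, and `execute` are hypothetical names, and the pool uses a `SynchronousQueue` so that a request is rejected as soon as every thread is busy.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of thread pool isolation with immediate rejection.
class ThreadPoolIsolationSketch {
    // Fixed-size pool without a waiting queue: a task is rejected when all threads are busy.
    static ThreadPoolExecutor newIsolatedPool(int size) {
        return new ThreadPoolExecutor(size, size, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(), new ThreadPoolExecutor.AbortPolicy());
    }

    static <T> T execute(ThreadPoolExecutor pool, Callable<T> call, Supplier<T> fallback) {
        Future<T> future;
        try {
            future = pool.submit(call);
        } catch (RejectedExecutionException e) {
            return fallback.get();                 // pool saturated: fail fast
        }
        try {
            return future.get();
        } catch (ExecutionException e) {
            return fallback.get();                 // the dependency call failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newIsolatedPool(1);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Occupy the only thread with a blocked "dependency call".
            pool.submit(() -> { started.countDown(); release.await(); return null; });
            started.await();
            // The pool is full, so this request takes the fallback path at once.
            System.out.println(execute(pool, () -> "Hello", () -> "fallback"));
        } finally {
            release.countDown();
            pool.shutdownNow();
        }
    }
}
```

The request thread never blocks behind the stuck call, which is the point of keeping the pool small and the queue empty.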
(2) Advantages and disadvantages of thread isolation
Advantages of thread isolation:
- Using threads fully isolates third-party code, and request threads can be returned quickly.
- When a failed dependency becomes available again, the thread pool clears up and is immediately usable, instead of needing a long recovery.
- Asynchronous invocation is fully supported, which makes asynchronous programming convenient.
Disadvantages of thread isolation:
- The main disadvantage of thread pools is the added CPU overhead, because executing each command involves queuing (avoided by default by using a SynchronousQueue), scheduling, and context switching.
- It adds complexity to code that depends on thread state, such as ThreadLocal, because that state has to be passed along and cleaned up manually.
NOTE: Netflix considers the overhead of thread isolation small enough that it has no significant cost or performance impact.
Netflix's internal API handles 10 billion HystrixCommand dependency requests per day using thread isolation, with roughly 40+ thread pools per application and roughly 5-20 threads per pool.
(3) Semaphore isolation
Semaphore isolation can also be used to limit concurrent access and to keep blocking from spreading. The main difference from thread isolation is that the dependency code still runs on the request thread, which must first acquire a semaphore permit.
If the client is trusted and returns quickly, semaphore isolation can replace thread isolation to reduce overhead.
The difference between thread isolation and semaphore isolation is shown below:
When n concurrent requests call a target service interface, each request must acquire a semaphore permit before it can make the call. The number of permits is limited (10 by default) and can be set with the maxConcurrentRequests parameter. Requests that arrive when no permit is available are not queued: they are rejected and go through the fallback process, which limits traffic and prevents an avalanche.
In semaphore mode, the request thread does all the work from start to finish. Calls are synchronous, and timeouts and direct circuit breaking are not supported; because there is no thread switching, the overhead is very small.
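Semaphore isolation can be sketched with `java.util.concurrent.Semaphore`. This is an illustration under stated assumptions, not Hystrix code: `SemaphoreIsolationSketch` and its `execute` method are hypothetical names, and the constructor argument only mirrors the role of maxConcurrentRequests.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch: the call runs on the caller's own thread,
// and the semaphore only caps how many callers may be inside at once.
class SemaphoreIsolationSketch {
    private final Semaphore permits;

    SemaphoreIsolationSketch(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();        // no permit available: reject without waiting
        }
        try {
            return call.get();            // runs on the requesting thread, no thread switch
        } catch (RuntimeException e) {
            return fallback.get();        // the call failed
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        SemaphoreIsolationSketch isolation = new SemaphoreIsolationSketch(10);
        String result = isolation.execute(
                () -> "HystrixThread:" + Thread.currentThread().getName(),
                () -> "fallback");
        System.out.println(result);
        System.out.println("MainThread:" + Thread.currentThread().getName());
    }
}
```

Since the call is never handed to another thread, nothing can interrupt it when it runs long, which is why this mode suits only fast, trusted calls.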
(4) Choosing between thread isolation and semaphore isolation
When a request involves significant network overhead or is time-consuming, thread isolation is the better choice: it keeps most of the container's (Tomcat's) threads available, because they do not stay blocked waiting on the service and can return quickly with a failure. When the requested service is something like a cache, semaphore isolation can be used, because such services usually return very quickly, do not hold container threads for long, and avoid the cost of thread switching, which improves efficiency.
- Thread pools: suitable for the vast majority of scenarios (about 99%), namely calls to dependent services over the network, where problems such as timeouts have to be handled.
- Semaphores: suitable when the access is not to an external dependency but to complex business logic inside the system. Such code involves no network requests, so there is no need to catch problems like timeouts, and ordinary semaphore-based rate limiting is enough. Its purpose is basic resource isolation: if an inefficient algorithm or data structure is slightly slow and concurrency suddenly spikes, many threads can get stuck in that code, and the semaphore keeps complex, inefficient internal code from hanging a large number of threads.