In a Spring Cloud microservice architecture, RPC protection can be implemented with the open source Hystrix component. Spring Cloud integrates Hystrix, which makes it easy to use.
Feign + Hystrix: implementing RPC call protection
The name Hystrix means porcupine. A porcupine is covered with spines that protect it from predators, so the name stands for a defense mechanism.
Hystrix is a latency and fault tolerance component open-sourced by Netflix. It is mainly used on the consumer side to protect RPC calls when a remote Provider service behaves abnormally.
More information about Hystrix can be found on its official website.
Before using Hystrix, add the following Spring Cloud Hystrix integration module dependency to Maven’s POM file:
```xml
<!-- introduce the Spring Cloud Hystrix dependency -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>
```
In the Spring Cloud architecture, Hystrix is used together with Feign, so you need to enable Feign’s Hystrix support in the application’s properties file:
```yaml
feign:
  hystrix:
    enabled: true   # enable Hystrix support for Feign
```
Add @EnableHystrix or @EnableCircuitBreaker to the startup class. Note that @EnableHystrix already includes @EnableCircuitBreaker. As an example, here is part of the demo-provider startup class:
```java
package com.crazymaker.springcloud.demo.start;

// ...

/**
 * Enable Hystrix on the startup class
 */
@EnableHystrix
public class DemoCloudApplication
{
    public static void main(String[] args)
    {
        SpringApplication.run(DemoCloudApplication.class, args);
        // ...
    }
}
```
Spring Cloud Hystrix’s RPC protection features include failure fallback, fusing (circuit breaking), retry, bulkhead isolation, and more. This section covers Hystrix failure fallback and the fuse.
Spring Cloud Hystrix failure fallback
What is failure fallback? When the target Provider instance fails, RPC failure fallback takes effect and returns a backup (fallback) result.
Failure fallback is illustrated in the figure. There are four Provider instances: A, B, C, and D. A-Provider and B-Provider make RPC remote calls to D-Provider, but D-Provider is faulty, so the callers eventually receive the result provided by the failure fallback logic (Fallback).
How do you set the fallback logic for RPC calls? There are two ways:
- Define and use a Fallback fallback handler class.
- Define and use a FallbackFactory fallback handler factory class.

Let’s start with the first approach: defining and using a Fallback fallback handler class.
The first approach takes two steps. First, write a Fallback class that implements the Feign client remote call interface, and put the fallback logic to run after an RPC failure into the class’s implementation of each method. Second, configure the fallback handler class on the key annotation of the Feign client interface, @FeignClient: set the annotation’s fallback attribute to the Fallback class defined in the first step.
Here is a concrete example of how to define and use a Fallback handler class. In the UAA-Client module of the Crazy-Spring Cloud scaffolding, there is a Feign client remote call interface, UserClient, used for RPC calls to the UAA-Provider to obtain user information.
The first step is to define a simple Fallback implementation class for the UserClient interface as follows:
```java
package com.crazymaker.springcloud.user.info.remote.fallback;

// imports omitted

/**
 * Fallback handler class for the Feign client interface
 */
@Component
public class UserClientFallback implements UserClient
{
    /**
     * Fallback method used when the RPC to obtain user information fails
     */
    @Override
    public RestOut<UserDTO> detail(Long id)
    {
        return RestOut.error("FailBack: User detail REST service invocation failure");
    }
}
```
The second step is to set the fallback attribute of the @FeignClient annotation on the UserClient interface to the UserClientFallback class defined in the previous step, as follows:
```java
package com.crazymaker.springcloud.user.info.remote.client;

// imports omitted

/**
 * Feign client interface
 * @description: RPC interface class to get user information
 */
@FeignClient(value = "uaa-provider",
        configuration = FeignConfiguration.class,
        fallback = UserClientFallback.class,   // configure the fallback handler class
        path = "/uaa-provider/api/user")
public interface UserClient
{
    @RequestMapping(value = "/detail/v1", method = RequestMethod.GET)
    RestOut<UserDTO> detail(@RequestParam(value = "userId") Long userId);
}
```
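To make the mechanism concrete, here is a plain-Java sketch of what a fallback class does, without Spring, Feign or Hystrix. The UserClient interface is simplified (String instead of RestOut<UserDTO>), and the withFallback helper is hypothetical: it uses a JDK dynamic proxy to stand in for the proxy that Feign generates, routing any exception thrown by the remote call to the fallback object. It only illustrates the idea and is not how Feign or Hystrix is actually implemented.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

public class FallbackClassSketch {

    /** Simplified stand-in for the UserClient Feign interface (String instead of RestOut<UserDTO>). */
    public interface UserClient {
        String detail(Long userId);
    }

    /** Fallback class: implements the same interface and returns a fixed fallback result. */
    public static class UserClientFallback implements UserClient {
        @Override
        public String detail(Long userId) {
            return "FailBack: User detail REST service invocation failure";
        }
    }

    /**
     * Hypothetical helper standing in for the Feign/Hystrix proxy: calls the remote
     * implementation and, if it throws, calls the same method on the fallback object.
     */
    @SuppressWarnings("unchecked")
    public static <T> T withFallback(Class<T> api, T remote, T fallback) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(remote, args);
            } catch (InvocationTargetException e) {
                // The cause (e.getCause()) is discarded: a fallback class never sees it.
                return method.invoke(fallback, args);
            }
        };
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api}, handler);
    }

    public static void main(String[] args) {
        // Simulates "all uaa-provider instances stopped": every remote call throws.
        UserClient remote = userId -> {
            throw new IllegalStateException("uaa-provider is down");
        };
        UserClient client = withFallback(UserClient.class, remote, new UserClientFallback());
        System.out.println(client.detail(1L)); // FailBack: User detail REST service invocation failure
    }
}
```

The catch block drops the original exception, so the fallback class has no way of knowing why the call failed.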
How do you verify that the fallback handler class works? Use the REST interface /api/call/uaa/user/detail/v2 defined earlier in demo-provider, which makes remote calls to uaa-provider through UserClient. The demonstration goes as follows:
Stop all uaa-provider services, then access the REST interface /api/call/uaa/user/detail/v2 through demo-provider’s swagger-ui page. The interface’s internal code uses the UserClient Feign remote call interface to make a Feign RPC call to the target uaa-provider’s REST interface /api/user/detail/v1. Because all uaa-provider services are down, Feign triggers a Hystrix fallback and executes the fallback method of the UserClientFallback handler class, which returns the Fallback result. The output is shown in the figure below.
Next, look at the second approach, which defines and uses a FallbackFactory fallback handler factory class.
The second approach can also be implemented in two steps:
- The first step is to create a fallback factory class that implements the FallbackFactory interface and its abstract create method. The create method must return an implementation of the Feign client interface, which is the fallback handler instance. The handler can be created as an anonymous class, with the RPC fallback logic written in the implementation of each of its methods.
- The second step is to configure the fallback factory class on the key annotation @FeignClient of the Feign client interface, by setting its fallbackFactory attribute to the factory class defined in the first step.
Here is a concrete example of how to define and use a FallbackFactory fallback processing factory class.
This example again uses the RPC call interface UserClient in the uaa-client module.
- The first step is to define a simple FallbackFactory factory class as follows:
```java
package com.crazymaker.springcloud.user.info.remote.fallback;

// imports omitted

/**
 * Fallback factory class for the Feign client interface
 */
@Slf4j
@Component
public class UserClientFallbackFactory implements FallbackFactory<UserClient>
{
    /**
     * Create a fallback instance of UserClient
     */
    @Override
    public UserClient create(final Throwable cause)
    {
        log.error("RPC call failed, falling back!", cause);
        // Create an anonymous fallback instance of the UserClient client interface
        return new UserClient()
        {
            /**
             * Fallback method used when the RPC to obtain user information fails
             */
            @Override
            public RestOut<UserDTO> detail(Long userId)
            {
                return RestOut.error("FallbackFactory fallback: User detail REST service invocation failed");
            }
        };
    }
}
```
- The second step is to set the fallbackFactory attribute of the @FeignClient annotation on the Feign client interface UserClient to the UserClientFallbackFactory class defined in the previous step, with the following code:
```java
package com.crazymaker.springcloud.user.info.remote.client;

// imports omitted

/**
 * Feign client interface
 * @description: RPC interface class to get user information
 */
@FeignClient(value = "uaa-provider",
        configuration = FeignConfiguration.class,
        fallbackFactory = UserClientFallbackFactory.class,   // configure the fallback factory class
        path = "/uaa-provider/api/user")
public interface UserClient
{
    @RequestMapping(value = "/detail/v1", method = RequestMethod.GET)
    RestOut<UserDTO> detail(@RequestParam(value = "userId") Long userId);
}
```
The verification process for the second approach is the same as for the first:
Stop all uaa-provider services, then access the REST interface /api/call/uaa/user/detail/v2 through demo-provider’s swagger-ui page.
The interface’s internal code makes a remote Feign RPC call through UserClient to the target uaa-provider’s REST interface /api/user/detail/v1. Because all uaa-provider services are down,
Feign triggers a Hystrix fallback: it calls the create method of UserClientFallbackFactory to create a fallback handler instance, then executes that instance’s fallback logic and returns the fallback result.
What is the difference between a fallback class (first approach) and a fallback factory class (second approach) when handling a failure?
The answer is:
- With the first approach, exceptions raised during the remote RPC call are completely masked by the fallback logic. The application can hardly intervene and cannot see the specific exceptions in the RPC process, although these exceptions can be very helpful when troubleshooting.
- With the second approach, the factory’s create method receives the RPC exception, so the application can intercept and handle it in Java code, for example by logging it.
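The difference can be seen in a second plain-Java sketch. SimpleFallbackFactory and withFallbackFactory are hypothetical stand-ins, not the real FallbackFactory interface or Feign’s proxy; the point is that create receives the exception that caused the failure, so the factory can record it before returning the fallback instance, just as UserClientFallbackFactory does with log.error.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class FallbackFactorySketch {

    /** Simplified stand-in for the UserClient Feign interface. */
    public interface UserClient {
        String detail(Long userId);
    }

    /** Hypothetical stand-in for a fallback factory: builds a fallback from the failure cause. */
    public interface SimpleFallbackFactory<T> {
        T create(Throwable cause);
    }

    /** Mirrors UserClientFallbackFactory: records the cause, then returns an anonymous fallback. */
    public static class UserClientFallbackFactory implements SimpleFallbackFactory<UserClient> {
        final List<String> loggedErrors = new ArrayList<>(); // stands in for log.error(...)

        @Override
        public UserClient create(Throwable cause) {
            loggedErrors.add("RPC call failed, falling back: " + cause.getMessage());
            return new UserClient() {
                @Override
                public String detail(Long userId) {
                    return "FallbackFactory fallback: User detail REST service invocation failed";
                }
            };
        }
    }

    /** Hypothetical proxy helper: on failure, asks the factory for a fallback built from the cause. */
    @SuppressWarnings("unchecked")
    public static <T> T withFallbackFactory(Class<T> api, T remote, SimpleFallbackFactory<T> factory) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(remote, args);
            } catch (InvocationTargetException e) {
                T fallback = factory.create(e.getCause()); // the original exception is passed on
                return method.invoke(fallback, args);
            }
        };
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api}, handler);
    }

    public static void main(String[] args) {
        UserClient remote = userId -> {
            throw new IllegalStateException("uaa-provider is down");
        };
        UserClientFallbackFactory factory = new UserClientFallbackFactory();
        UserClient client = withFallbackFactory(UserClient.class, remote, factory);
        System.out.println(client.detail(1L));
        System.out.println(factory.loggedErrors); // [RPC call failed, falling back: uaa-provider is down]
    }
}
```

Compared with the fallback class sketch, the only structural change is that the proxy hands e.getCause() to the factory instead of throwing it away.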
The avalanche problem in distributed systems
In a distributed system, a service may depend on many other services, and some of them will inevitably fail. Suppose an application depends on 30 Provider services and each of them is available 99.99% of the time. Even with a failure rate of only 0.01% per service, the application as a whole is available only about 99.7% of the time, which still means more than two hours of unavailability each month.
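The arithmetic behind this estimate, as a small sketch. It assumes that every request depends on all 30 services and that their failures are independent, so overall availability is the product of the individual availabilities; the month is taken as 30 days.

```java
public class AvailabilityMath {

    /** Availability of a request that needs all n dependencies, assuming independent failures. */
    static double overallAvailability(double perServiceAvailability, int n) {
        return Math.pow(perServiceAvailability, n);
    }

    /** Expected hours of unavailability in a month with the given number of days. */
    static double downtimeHoursPerMonth(double availability, int daysInMonth) {
        return (1 - availability) * daysInMonth * 24;
    }

    public static void main(String[] args) {
        double overall = overallAvailability(0.9999, 30);
        double hours = downtimeHoursPerMonth(overall, 30);
        System.out.printf("overall availability: %.2f%%%n", overall * 100); // 99.70%
        System.out.printf("downtime per 30-day month: %.2f hours%n", hours); // 2.16 hours
    }
}
```

Under these assumptions, 0.9999 raised to the 30th power is about 0.997, and 0.3% of a 720-hour month is a little over two hours.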
There is another big problem: during traffic peaks, many other services may depend on one Provider instance.
If that instance responds slowly, the delay can cascade into failures across several levels of other Providers and render the whole distributed system unusable.
To take a simple example, in a seckill (flash sale) system, the goods provider (good-provider), the order provider (order-provider) and the seckill provider (seckill-provider) all make RPC calls to the user account and authentication provider (uaa-provider) to query user information, as shown in the figure.
If uaa-provider responds slowly (or even goes down) during a traffic peak, the goods, order and seckill providers will all wait until their calls time out, so their own responses become slow. As more and more requests queue up, each request takes longer (because of the internal timeout waits). As a result, the system resources (CPU, memory, etc.) of each service node are quickly exhausted, and the system finally enters an avalanche state, as shown in the figure.
In a microservice architecture, the system is split into many Provider microservices. Because of network problems or problems of their own, services cannot be 100% available. To keep service providers highly available, a single Provider service is usually deployed as multiple instances.
Because of the dependency between providers, failures or unavailability can travel up the invocation chain of requests, with catastrophic consequences for the entire system. This is the avalanche effect of failures.
There are many reasons for the avalanche effect. Here are some common ones:
- Hardware faults: for example, a server breaks down, the equipment room loses power, or optical fibers are cut.
- Traffic surges: for example, abnormal traffic or a sudden influx of massive requests (such as during a seckill event).
- Cache penetration: when the system restarts and all caches are empty, or when many cache entries expire within a short period, a large number of front-end requests miss the cache and hit the back-end services and databases directly, overloading the service providers and databases until the whole system breaks down.
- Program bugs: for example, a logic bug causes a memory leak or a similar problem that paralyzes the whole system.
- Long JVM Full GC pauses: in extreme cases they last tens of seconds, during which the JVM cannot serve any requests.
To counter the avalanche effect, the industry proposed the fuse (circuit breaker) model. When some non-core services behave abnormally, for example by responding slowly or going down, a fuse degrades them and a reduced (lossy) service is provided instead. This keeps the overall service flexibly available and avoids the avalanche effect.
Spring Cloud Hystrix fuse
In electrical engineering, a fuse is a protective switching device that guards a circuit against overload: when a short circuit occurs, the fuse cuts off the fault in time, preventing overload, overheating, and even fire.
In a distributed architecture, fuses are mainly applied to RPC interfaces. A fuse installed on an interface prevents the system from collapsing under excessive pressure when the RPC interface is congested: when traffic on the interface is heavy or the target Provider is abnormal, the fuse cuts off in time to protect the system.
Why are fuses important?
Without overload protection in a distributed system, when a called remote service is unavailable, requests block waiting on it and the caller’s resources are exhausted.
In many cases there is only a small local fault at first, but for various reasons its scope grows larger and larger, eventually leading to global consequences.
A fuse is also called a circuit breaker. It counts the recent RPC calls that failed and, based on the failure ratio in those statistics, decides whether subsequent RPC calls may proceed or should fail fast with a fallback.
The three states of fuse are as follows:
- Closed: The fuse is closed, which is the initial state of the fuse. In this state, RPC calls are allowed normally.
- Open: when the failure ratio reaches a certain threshold, the fuse enters the open state. In this state, RPC calls fail fast and the fallback logic is executed.
- Half-open: after the fuse has been open for a certain period (the sleep window ends), it enters the half-open state and lets a small amount of traffic try RPC calls. If the attempt succeeds, the fuse closes and RPC calls proceed normally. If it fails, the fuse opens again and RPC calls fail fast.

The half-open state deserves closer attention. In the half-open state, one RPC call attempt is allowed. If the call succeeds, the fuse resets to the closed state and returns to normal mode. If the call fails, the fuse returns to the open state and waits for the next half-open period.
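The three states and their transitions can be sketched as a small state machine. This is a simplified, hypothetical model and not Hystrix’s own circuit breaker implementation: it counts results in a single non-rolling window instead of a sliding window, and it takes the current time as a parameter so the behavior is deterministic. Its constructor parameters mirror the circuitBreaker settings requestVolumeThreshold, errorThresholdPercentage and sleepWindowInMilliseconds.

```java
public class SimpleCircuitBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;   // minimum requests before the fuse may trip
    private final int errorThresholdPercentage; // failure percentage that opens the fuse
    private final long sleepWindowMillis;       // time spent open before a trial request

    private State state = State.CLOSED;
    private int total;    // requests counted since the fuse last closed (simplified, non-rolling)
    private int failures; // failed requests among them
    private long openedAt;

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercentage, long sleepWindowMillis) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    /** Returns true if the RPC may be attempted at time now; false means fail fast and fall back. */
    synchronized boolean allowRequest(long now) {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && now - openedAt >= sleepWindowMillis) {
            state = State.HALF_OPEN; // sleep window over: let exactly one trial request through
            return true;
        }
        return false; // still open, or a trial request is already in flight
    }

    /** Records the outcome of an attempted RPC at time now. */
    synchronized void recordResult(long now, boolean success) {
        if (state == State.HALF_OPEN) {
            if (success) { // trial succeeded: close and start counting afresh
                state = State.CLOSED;
                total = 0;
                failures = 0;
            } else {       // trial failed: reopen and wait another sleep window
                state = State.OPEN;
                openedAt = now;
            }
            return;
        }
        if (state == State.OPEN) {
            return; // late results while open are ignored in this sketch
        }
        total++;
        if (!success) {
            failures++;
        }
        boolean enoughVolume = total >= requestVolumeThreshold;
        boolean tooManyErrors = failures * 100 >= errorThresholdPercentage * total;
        if (enoughVolume && tooManyErrors) {
            state = State.OPEN;
            openedAt = now;
        }
    }

    synchronized State state() {
        return state;
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(20, 50, 5000);
        for (int i = 0; i < 19; i++) {
            breaker.recordResult(0, false);
        }
        System.out.println(breaker.state());            // CLOSED: 19 requests < requestVolumeThreshold
        breaker.recordResult(0, false);
        System.out.println(breaker.state());            // OPEN: 20 requests, 100% failures
        System.out.println(breaker.allowRequest(1000)); // false: still inside the sleep window
        System.out.println(breaker.allowRequest(5000)); // true: the single trial request
        breaker.recordResult(5100, true);
        System.out.println(breaker.state());            // CLOSED: the trial succeeded
    }
}
```

The main method replays the scenario from the text: 19 failures do not trip the fuse, the 20th does, and a successful trial after the sleep window closes it again.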
The fuse in Spring Cloud Hystrix is enabled by default, but its behavior can be customized through its configuration parameters. Here is the fuse configuration example from the demo-provider microservice:
```yaml
hystrix:
  # ...
  command:
    default:
      # ...
      circuitBreaker:                    # fuse-related configuration
        enabled: true                    # whether to use the fuse; default is true
        requestVolumeThreshold: 20       # minimum number of requests within the time window
        sleepWindowInMilliseconds: 5000  # sleep time before a single trial is allowed; default 5 seconds
        errorThresholdPercentage: 50     # error percentage within the window that opens the fuse; default 50
      metrics:
        rollingStats:
          timeInMilliseconds: 10000      # sliding window duration
```
The Hystrix fuse parameters used above fall into two categories: fuse parameters and sliding window parameters. The fuse parameters used in the example are described as follows:
- hystrix.command.default.circuitBreaker.enabled: determines whether a fuse is used to track the running state of RPC requests, that is, whether the fuse is enabled. The default value is true.
- hystrix.command.default.circuitBreaker.requestVolumeThreshold: sets the minimum number of requests within a sliding window before the fuse can trip. If it is set to 20 and only 19 requests arrive within a sliding window (say 10 seconds), the fuse will not open even if all 19 requests fail. The default is 20.
- hystrix.command.default.circuitBreaker.errorThresholdPercentage: sets the error rate threshold. When the error rate within the sliding window reaches or exceeds this value, the fuse enters the open state and all requests trigger the fallback. The default threshold is 50 (percent).
- hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds: sets the fuse’s sleep window, that is, how long after the fuse opens it waits before allowing a request to try execution. The default is 5000 milliseconds: after the fuse opens, all requests are rejected for 5000 milliseconds, and then the fuse becomes half-open.
- hystrix.command.default.circuitBreaker.forceOpen: if set to true, the fuse is forced open and every request triggers the failure fallback (Fallback). The default value is false.
The fuse’s state transitions depend on Hystrix’s sliding-window health statistics, such as the failure percentage. The Hystrix health statistics settings used in the example are described as follows:
- hystrix.command.default.metrics.rollingStats.timeInMilliseconds: sets the duration of the statistical sliding window in milliseconds. The default value is 10000 milliseconds. Whether the fuse opens is decided by the statistics of a sliding window: if the error rate within the window exceeds the threshold, the fuse enters the open state. The sliding window is further divided into time buckets. The window’s statistics are the sum of the statistics of all buckets in the window, and each bucket records the number of successes, failures, timeouts and rejections.
- hystrix.command.default.metrics.rollingStats.numBuckets: sets the number of buckets into which a sliding window is divided. The default value is 10. With a 10000 ms window divided into 10 buckets, each bucket covers 1 second. numBuckets must satisfy timeInMilliseconds % numBuckets == 0, otherwise an exception is thrown. For example, 70000 (a 70000 ms window) % 700 (buckets) == 0 is fine, but 70000 (a 70000 ms window) % 600 (buckets) == 400 throws an exception.
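A minimal sketch of a bucketed sliding window, under the simplifying assumption that only successes and failures are counted (Hystrix also tracks timeouts and rejections). RollingWindowSketch is a hypothetical class, not Hystrix’s own rolling counter. It keeps one slot per time bucket, reuses a slot once its interval has fallen out of the window, and rejects a window length that is not evenly divisible by the bucket count, following the rule above.

```java
import java.util.Arrays;

public class RollingWindowSketch {

    private final long windowMillis;
    private final int numBuckets;
    private final long bucketMillis;
    private final long[] bucketStart; // start time of the interval each slot currently holds
    private final int[] successes;
    private final int[] failures;

    RollingWindowSketch(long windowMillis, int numBuckets) {
        long remainder = windowMillis % numBuckets;
        if (remainder != 0) {
            throw new IllegalArgumentException(
                    "timeInMilliseconds % numBuckets must be 0, remainder is " + remainder);
        }
        this.windowMillis = windowMillis;
        this.numBuckets = numBuckets;
        this.bucketMillis = windowMillis / numBuckets;
        this.bucketStart = new long[numBuckets];
        Arrays.fill(bucketStart, -1); // -1 marks a slot that has never been used
        this.successes = new int[numBuckets];
        this.failures = new int[numBuckets];
    }

    /** Records one request outcome at time now (ms), recycling a slot whose interval is stale. */
    void record(long now, boolean success) {
        long start = now - now % bucketMillis;
        int slot = (int) ((now / bucketMillis) % numBuckets);
        if (bucketStart[slot] != start) {
            bucketStart[slot] = start;
            successes[slot] = 0;
            failures[slot] = 0;
        }
        if (success) {
            successes[slot]++;
        } else {
            failures[slot]++;
        }
    }

    /** Error percentage summed over the buckets that fall inside the window ending at now. */
    int errorPercentage(long now) {
        long currentStart = now - now % bucketMillis;
        long oldestStart = currentStart - windowMillis + bucketMillis;
        int total = 0;
        int failed = 0;
        for (int i = 0; i < numBuckets; i++) {
            if (bucketStart[i] >= oldestStart && bucketStart[i] <= currentStart) {
                total += successes[i] + failures[i];
                failed += failures[i];
            }
        }
        return total == 0 ? 0 : failed * 100 / total;
    }

    public static void main(String[] args) {
        RollingWindowSketch window = new RollingWindowSketch(10_000, 10); // 10 buckets of 1 second
        for (int i = 0; i < 10; i++) {
            window.record(500, false);  // bucket for second 0: 10 failures
            window.record(9_500, true); // bucket for second 9: 10 successes
        }
        System.out.println(window.errorPercentage(9_500));  // 50: both buckets are in the window
        System.out.println(window.errorPercentage(10_500)); // 0: the second-0 bucket has slid out
        try {
            new RollingWindowSketch(70_000, 600);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // timeInMilliseconds % numBuckets must be 0, remainder is 400
        }
    }
}
```

Reusing a fixed array of slots keeps memory constant: as time advances, the slot for the oldest bucket is simply reset and reused for the newest one.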
The Hystrix fuse options above use the hystrix.command.default prefix. These defaults apply to all Feign RPC interfaces in the project unless a Feign RPC interface is configured separately. If a Feign RPC call needs special configuration, the configuration item prefix has the following format:
```
hystrix.command.ClassName#methodName(parameter type list)
```
Let’s look at an example of special configuration for a single interface: the Feign RPC interface /detail/v1 of the UserClient class. This interface obtains user information from the uaa-provider service. Before configuring it, review the code of the UserClient interface:
```java
package com.crazymaker.springcloud.user.info.remote.client;

// ...

@FeignClient(value = "uaa-provider",
        configuration = FeignConfiguration.class,
        fallback = UserClientFallback.class,
        path = "/uaa-provider/api/user")
public interface UserClient
{
    /**
     * @param userId user ID
     * @return user details
     */
    @RequestMapping(value = "/detail/v1", method = RequestMethod.GET)
    RestOut<UserDTO> detail(@RequestParam(value = "userId") Long userId);
}
```
In demo-provider, to give the fuse for RPC calls to the UserClient.detail interface a special configuration, do not use the default prefix hystrix.command.default. Use a prefix in the hystrix.command.FeignClient#Method format instead; the configuration items are as follows:
```yaml
hystrix:
  # ...
  command:
    UserClient#detail(Long):             # class name#method name(parameter type list)
      circuitBreaker:                    # fuse-related configuration
        enabled: true                    # whether to use the fuse; default is true
        requestVolumeThreshold: 20       # the fuse can trip only after at least 20 requests
        sleepWindowInMilliseconds: 5000  # sleep time before a single trial is allowed; default 5 seconds
        errorThresholdPercentage: 50     # error percentage within the window that opens the fuse; default 50
      metrics:
        rollingPercentile:
          timeInMilliseconds: 60000      # sliding window duration
          numBuckets: 600                # number of time buckets
          bucketSize: 200                # number of counts kept in each bucket
```
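The key in that prefix follows the “class name#method name(parameter type list)” format. The sketch below builds it with reflection from the simple class name, the method name and the simple names of the parameter types. commandKey is a hypothetical helper: Feign produces keys of this form with its own Feign.configKey method, but this version is a simplified re-implementation that does not resolve generic parameter types.

```java
import java.lang.reflect.Method;
import java.util.StringJoiner;

public class CommandKeySketch {

    /** Simplified stand-in for the UserClient Feign interface. */
    public interface UserClient {
        String detail(Long userId);
    }

    /** Builds "SimpleClassName#methodName(ParamType1,ParamType2)" from a client type and method. */
    static String commandKey(Class<?> type, Method method) {
        StringJoiner params = new StringJoiner(",", "(", ")");
        for (Class<?> parameterType : method.getParameterTypes()) {
            params.add(parameterType.getSimpleName());
        }
        return type.getSimpleName() + "#" + method.getName() + params;
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method detail = UserClient.class.getMethod("detail", Long.class);
        String key = commandKey(UserClient.class, detail);
        System.out.println(key); // UserClient#detail(Long)
        System.out.println("hystrix.command." + key + ".circuitBreaker.requestVolumeThreshold");
    }
}
```

For UserClient.detail(Long) this yields UserClient#detail(Long), the key used in the configuration above.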
In addition to the circuitBreaker parameters and the metrics sliding window parameters, many other Hystrix command parameters can be configured for a specific Feign RPC interface, again using the “class name#method name(parameter type list)” format. For beginners, the concept and configuration of sliding windows can be hard to understand.