Public account: Java Architects Association, with quality technical articles updated daily

Implementing RPC call protection with Feign + Hystrix

In a Spring Cloud microservices architecture, RPC protection can be implemented with the open source Hystrix component. Spring Cloud integrates Hystrix, which makes it easy to use.

Hystrix means porcupine. Covered in quills, the porcupine can protect itself from predators, so the name stands for a defense mechanism. Hystrix is Netflix's latency and fault tolerance component. It is mainly used to protect RPC calls on the consumer side when the remote Provider service is abnormal. More information about Hystrix can be found on its official site (github.com/Netflix/Hys…).

Before using Hystrix, add the following Spring Cloud Hystrix integration dependency to the Maven POM file:

    <!-- Spring Cloud Hystrix dependency -->
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
    </dependency>

In the Spring Cloud architecture, Hystrix is used together with Feign, so Feign's Hystrix support must be enabled in the application's configuration file:

    feign:
      hystrix:
        enabled: true   # enable Hystrix support for Feign

Add @EnableHystrix or @EnableCircuitBreaker to the startup class. Note that @EnableHystrix already includes @EnableCircuitBreaker. As an example, here is part of the code for the demo-provider startup class:

    package com.crazymaker.springcloud.demo.start;
    ...
    @EnableHystrix
    public class DemoCloudApplication {
        public static void main(String[] args) {
            SpringApplication.run(DemoCloudApplication.class, args);
            ...
        }
    }

Spring Cloud Hystrix's RPC protection features include failure fallback, circuit breaking, retry, bulkhead isolation, and more. This article covers failure fallback and circuit breaking.

Spring Cloud Hystrix failure fallback

What is failure fallback? When the target Provider instance fails, the RPC failure fallback takes effect and returns a backup result. As shown in Figure 2-16, there are four Provider instances: A, B, C, and D. A-Provider and B-Provider make remote RPC calls to D-Provider, but D-Provider is faulty, so the callers end up with the backup result supplied by the failure fallback (Fallback).

Figure 2-16 Failure fallback for remote RPC calls
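The idea of failure fallback can be sketched in plain Java, without Feign or Hystrix. This is only a minimal illustration: the class FallbackSketch and the methods callWithFallback and callFaultyProvider are hypothetical names invented here, and the faulty provider is simulated by throwing an exception.

```java
import java.util.function.Supplier;

/** Minimal sketch of failure fallback: try the remote call, return a backup result on failure. */
final class FallbackSketch {

    /** Runs the primary call; if it throws, returns the fallback result instead. */
    static <T> T callWithFallback(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return fallback.get();
        }
    }

    /** Hypothetical stand-in for an RPC to a provider that is down. */
    static String callFaultyProvider() {
        throw new IllegalStateException("D-Provider is unavailable");
    }

    public static void main(String[] args) {
        String result = callWithFallback(
                FallbackSketch::callFaultyProvider,
                () -> "fallback result");
        System.out.println(result); // prints "fallback result"
    }
}
```

In a real project this wrapping is done by Feign and Hystrix, so application code only supplies the fallback implementation.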

How do you set the fallback logic for RPC calls? There are two ways:

(1) Define and use a Fallback handler class.

(2) Define and use a FallbackFactory handler factory class.

Let's start with the first approach: define and use a Fallback handler class.

The first approach takes two steps. Step one: write a Fallback class that implements the Feign client's remote call interface, and put the fallback logic to run after an RPC failure in the corresponding methods of that class. Step two: configure the fallback class on the @FeignClient annotation of the Feign client interface, by setting the annotation's fallback property to the Fallback class defined in step one.

Here is a concrete example. The uaa-client module of the Crazy-SpringCloud scaffolding contains a Feign client remote call interface, UserClient, which makes RPC calls to uaa-provider to obtain user information.

Step one is to define a simple Fallback implementation class for the UserClient interface, as follows:

    package com.crazymaker.springcloud.user.info.remote.fallback;
    // imports omitted

    /**
     * Fallback handler class for the Feign client interface
     */
    @Component
    public class UserClientFallback implements UserClient {
        /**
         * Fallback method for fetching user information after the RPC fails
         */
        @Override
        public RestOut<UserDTO> detail(Long userId) {
            return RestOut.error("failBack: user detail REST service invocation failed");
        }
    }

Step two is to set the fallback property in the @FeignClient annotation of the UserClient interface to the UserClientFallback class defined in step one, as follows:

    package com.crazymaker.springcloud.user.info.remote.client;
    // imports omitted

    /**
     * Feign client interface
     * @description: RPC interface class for user information
     */
    @FeignClient(value = "uaa-provider",
            configuration = FeignConfiguration.class,
            fallback = UserClientFallback.class,
            path = "/uaa-provider/api/user")
    public interface UserClient {
        @RequestMapping(value = "/detail/v1", method = RequestMethod.GET)
        RestOut<UserDTO> detail(@RequestParam(value = "userId") Long userId);
    }

How do you verify that the fallback class works? We again use the previously defined demo-provider REST endpoint /api/call/uaa/user/detail/v2, which calls uaa-provider remotely through UserClient. The demonstration goes as follows:

Stop all uaa-provider services, then call the REST endpoint /api/call/uaa/user/detail/v2 from the demo-provider swagger-ui page. Internally, this endpoint makes a remote Feign RPC call through UserClient to the REST endpoint /api/user/detail/v1 of the target uaa-provider. Because all uaa-provider services are down, Feign triggers the Hystrix fallback, executes the fallback method of UserClientFallback, and returns the fallback content, as shown in Figure 2-17.

Figure 2-17 Result after the UserClientFallback fallback class takes effect

Next, look at the second approach, which defines and uses a fallback factory class.

The second approach also takes two steps. Step one: create a fallback factory class that implements Hystrix's FallbackFactory interface and its abstract create method. The create method must return an implementation of the Feign client interface, and that instance is the fallback handler. It can be created as an anonymous class, with the RPC fallback logic written in each of the anonymous class's methods. Step two: configure the factory class on the @FeignClient annotation of the Feign client interface, by setting the fallbackFactory property to the factory class defined in step one.

Here is a concrete example of defining and using a FallbackFactory class. It again uses the RPC call interface UserClient in the uaa-client module. Step one is to define a simple FallbackFactory class, as follows:

    package com.crazymaker.springcloud.user.info.remote.fallback;
    // imports omitted

    /**
     * Fallback factory class for the Feign client interface
     */
    @Slf4j
    @Component
    public class UserClientFallbackFactory implements FallbackFactory<UserClient> {
        /**
         * Creates the fallback instance for UserClient
         */
        @Override
        public UserClient create(final Throwable cause) {
            log.error("RPC error, falling back!", cause);
            return new UserClient() {
                /**
                 * Fallback method for fetching user information after the RPC fails
                 */
                @Override
                public RestOut<UserDTO> detail(Long userId) {
                    return RestOut.error("FallbackFactory fallback: user detail REST service invocation failed");
                }
            };
        }
    }

Step two is to set the fallbackFactory property in the @FeignClient annotation of the UserClient interface to the UserClientFallbackFactory class defined in step one, as follows:

    package com.crazymaker.springcloud.user.info.remote.client;
    // imports omitted

    /**
     * Feign client interface
     * @description: RPC interface class for user information
     */
    @FeignClient(value = "uaa-provider",
            configuration = FeignConfiguration.class,
            // configure the fallback factory class
            fallbackFactory = UserClientFallbackFactory.class,
            path = "/uaa-provider/api/user")
    public interface UserClient {
        @RequestMapping(value = "/detail/v1", method = RequestMethod.GET)
        RestOut<UserDTO> detail(@RequestParam(value = "userId") Long userId);
    }

Verifying the factory class in the second approach works the same way as in the first:

Stop all uaa-provider services, then call the REST endpoint /api/call/uaa/user/detail/v2 from the demo-provider swagger-ui page. Internally, this endpoint makes a remote Feign RPC call through UserClient to the REST endpoint /api/user/detail/v1 of the target uaa-provider. Because all uaa-provider services are down, Feign triggers the Hystrix fallback, calls the create method of UserClientFallbackFactory to build a fallback instance, runs that instance's fallback logic, and returns the fallback result.

What is the difference between the fallback class of the first approach and the fallback factory class of the second?

With a fallback class, exceptions raised during the remote RPC call are completely hidden by the fallback logic. The application cannot easily intervene and cannot see the specific exception, even though it would be very helpful for troubleshooting. With a fallback factory class, the application receives the RPC exception in Java code and can intercept and handle it, for example by logging it.
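The difference can be made concrete with a plain-Java sketch that does not depend on Feign or Hystrix. Everything here is hypothetical and for illustration only: UserApi stands in for the Feign client interface, LOGGED stands in for a logger, and detailWithPlain and detailWithFactory imitate what the framework would do on failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class FallbackFactorySketch {

    /** Hypothetical stand-in for the Feign client interface. */
    interface UserApi {
        String detail(long userId);
    }

    /** Causes seen by the factory; stands in for a logger. */
    static final List<String> LOGGED = new ArrayList<>();

    /** Approach 1: a fixed fallback instance that never sees the exception. */
    static final UserApi PLAIN_FALLBACK = userId -> "fallback: user detail unavailable";

    /** Approach 2: a factory that receives the cause before building the fallback. */
    static final Function<Throwable, UserApi> FALLBACK_FACTORY = cause -> {
        LOGGED.add(cause.getMessage());
        return userId -> "factory fallback: user detail unavailable";
    };

    /** On failure, the exception is dropped and the fixed fallback answers. */
    static String detailWithPlain(UserApi remote, long userId) {
        try {
            return remote.detail(userId);
        } catch (RuntimeException e) {
            return PLAIN_FALLBACK.detail(userId);
        }
    }

    /** On failure, the exception is handed to the factory, which can inspect it. */
    static String detailWithFactory(UserApi remote, long userId) {
        try {
            return remote.detail(userId);
        } catch (RuntimeException e) {
            return FALLBACK_FACTORY.apply(e).detail(userId);
        }
    }

    public static void main(String[] args) {
        UserApi broken = userId -> { throw new IllegalStateException("connection refused"); };
        System.out.println(detailWithPlain(broken, 1L));
        System.out.println(detailWithFactory(broken, 1L));
        System.out.println("causes seen by the factory: " + LOGGED);
    }
}
```

Only the factory variant leaves a trace of why the call failed, which is the practical reason to prefer it when troubleshooting matters.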

The avalanche problem in distributed systems

In a distributed system, a service may depend on many other services, and those services will inevitably fail at some point. Suppose an application depends on 30 Provider services and each one is available 99.99% of the time. Even with a per-service failure rate of only 0.01%, the combined availability is about 99.7%, which still means more than two hours of unavailability each month. There is a bigger problem as well: at traffic peaks, a slow response from one Provider that other services depend on can cause cascading failures across multiple levels of Providers, which can make the whole distributed system unusable.
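The availability arithmetic can be checked with a few lines of Java. This is a back-of-the-envelope sketch that assumes the 30 dependencies fail independently and that a request needs all of them; the class and method names are hypothetical.

```java
final class AvailabilitySketch {

    /** Combined availability when a request needs all `services` dependencies to be up. */
    static double combinedAvailability(double perService, int services) {
        return Math.pow(perService, services);
    }

    /** Expected downtime in hours over a period of `hoursInPeriod` hours. */
    static double downtimeHours(double availability, double hoursInPeriod) {
        return (1.0 - availability) * hoursInPeriod;
    }

    public static void main(String[] args) {
        double combined = combinedAvailability(0.9999, 30);
        double monthly = downtimeHours(combined, 30 * 24); // a 30-day month
        System.out.printf("combined availability: %.4f%n", combined);
        System.out.printf("downtime per 30-day month: %.2f hours%n", monthly);
    }
}
```

Under these assumptions the combined availability comes out at roughly 99.7%, or a little over two hours of downtime in a 30-day month.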

To take a simple example, in a seckill (flash sale) system, the product service (good-provider), the order service (order-provider), and the seckill service (seckill-provider) all call the user account and authentication service (uaa-provider) remotely through RPC to query user information. See Figure 2-18.

Figure 2-18 Dependencies among four Providers: product, order, seckill, and user

If uaa-provider responds slowly (or goes down) during a traffic peak, the product, order, and seckill Providers all wait until their calls time out, so they respond slowly as well. As more and more requests queue up, each request takes longer (because of the internal timeout wait). System resources such as CPU and memory on each service node are quickly exhausted, and the system avalanches, as shown in Figure 2-19.

Figure 2-19 Avalanche caused by slow responses from uaa-provider during a traffic peak

In general, in a microservice architecture, functionality is split into Provider microservices. Because of network problems or problems in the service itself, no service can be 100% available. To keep a service highly available, a single Provider service is usually deployed as multiple instances. Because Providers depend on each other, a failure or unavailability can propagate up the call chain, with catastrophic consequences for the whole system. This is the avalanche effect of failures.

There are many causes of the avalanche effect. Here are some common ones:

(1) Hardware faults: for example, a server goes down, a machine room loses power, or an optical fiber is cut.

(2) Traffic surges: for example, abnormal traffic or a sudden flood of requests (such as a flash sale).

(3) Cache penetration: this generally happens when the system restarts and all caches are empty, or when a large number of cache entries expire within a short time. A large number of front-end requests then miss the cache and hit the back-end services and database directly, overloading the Providers and the database and paralyzing the system.

(4) Program bugs: for example, a logic bug that causes a memory leak and brings the system down.

(5) JVM pauses: a Full GC can take a long time, even tens of seconds in extreme cases, and during that time the JVM cannot serve any requests.

To deal with the avalanche effect, the industry introduced the circuit breaker pattern. When some non-core services are abnormal, for example slow or down, the circuit breaker degrades them and the system provides a reduced level of service, which keeps the service flexibly available and avoids the avalanche effect.

Spring Cloud Hystrix circuit breaker

In electrical engineering, a circuit breaker is a switching device placed in a circuit to protect the line from overload. When a short circuit occurs, it cuts off the faulty circuit in time, preventing serious consequences such as overload, overheating, or even fire. In a distributed architecture, circuit breakers are mainly applied to RPC interfaces, to prevent a system breakdown caused by excessive pressure when an RPC interface is congested. When traffic on an RPC interface is too heavy or the target Provider is abnormal, the breaker cuts the calls off in time to protect the system.

Why are circuit breakers important? In a distributed system without overload protection, when the remote service being called is unavailable, the caller's resources are tied up waiting on the remote server until they are exhausted. Often the problem starts as a small local fault, but for various reasons its scope keeps growing until it has global consequences.

A circuit breaker is also known as a fuse. It counts the recent RPC calls that ended in errors and, based on the failure ratio in those statistics, decides whether to let subsequent RPC calls proceed or to fail fast with a fallback.

A circuit breaker has three states:

(1) Closed: the breaker is closed, which is also its initial state. In this state, RPC calls proceed normally.

(2) Open: when the failure ratio reaches a certain threshold, the breaker enters the open state. In this state, RPC calls fail fast and the failure fallback logic runs.

(3) Half-open: after the breaker has been open for a certain period (the sleep window ends), it enters the half-open state and lets a small amount of traffic try the RPC call. If the attempt succeeds, the breaker closes and RPC calls proceed normally. If the attempt fails, the breaker opens again and RPC calls fail fast.

Figure 2-20 shows the transitions between the breaker states.

Figure 2-20 Transitions between circuit breaker states

The half-open state deserves a closer look. In the half-open state, one RPC call attempt is allowed. If the call succeeds, the breaker resets to the closed state and normal operation resumes. If the call fails, the breaker returns to the open state and waits for the next half-open state.
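The three states and their transitions can be sketched as a small state machine. This is a simplified, single-threaded illustration and not Hystrix's actual implementation: the class BreakerStateSketch and its methods are hypothetical, time is passed in explicitly to keep the example deterministic, and treating the sleep window as elapsed when the difference is greater than or equal to the configured value is an assumption of this sketch.

```java
final class BreakerStateSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private long openedAtMillis;
    private final long sleepWindowMillis;

    BreakerStateSketch(long sleepWindowMillis) {
        this.sleepWindowMillis = sleepWindowMillis;
    }

    /** Called when the failure ratio crosses the threshold: CLOSED or HALF_OPEN to OPEN. */
    void trip(long nowMillis) {
        state = State.OPEN;
        openedAtMillis = nowMillis;
    }

    /** Returns true if a call may go through; moves OPEN to HALF_OPEN once the sleep window ends. */
    boolean allowRequest(long nowMillis) {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && nowMillis - openedAtMillis >= sleepWindowMillis) {
            state = State.HALF_OPEN;
            return true; // the single trial call
        }
        return false; // still inside the sleep window, or a trial call is already in flight
    }

    /** Reports the outcome of the trial call made in the HALF_OPEN state. */
    void onTrialResult(boolean success, long nowMillis) {
        if (state != State.HALF_OPEN) {
            return;
        }
        if (success) {
            state = State.CLOSED;
        } else {
            trip(nowMillis);
        }
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        BreakerStateSketch breaker = new BreakerStateSketch(5000);
        breaker.trip(0);
        System.out.println(breaker.allowRequest(1000)); // false: inside the sleep window
        System.out.println(breaker.allowRequest(6000)); // true: trial call in HALF_OPEN
        breaker.onTrialResult(false, 6000);
        System.out.println(breaker.state());            // OPEN: the trial failed
        breaker.allowRequest(12000);
        breaker.onTrialResult(true, 12000);
        System.out.println(breaker.state());            // CLOSED: the trial succeeded
    }
}
```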

Circuit breakers in Spring Cloud Hystrix are enabled by default, and their parameters can be customized through configuration. Here is the circuit breaker configuration from the demo-provider microservice:

    hystrix:
      ...
      command:
        default:
          ...
          circuitBreaker:                      # circuit breaker settings
            enabled: true                      # whether to use the circuit breaker; default true
            requestVolumeThreshold: 20         # minimum number of requests in the window
            sleepWindowInMilliseconds: 5000    # sleep time after opening before a trial request is allowed; default 5 seconds
            errorThresholdPercentage: ...
          metrics:
            rollingStats:
              timeInMilliseconds: 10000        # sliding window length in milliseconds

The Hystrix parameters used above fall into two categories: circuit breaker parameters and sliding window parameters. The circuit breaker parameters are as follows:

(1) hystrix.command.default.circuitBreaker.enabled: determines whether a circuit breaker is used to track the health of RPC requests, in other words whether the circuit breaker is enabled. The default value is true.

(2) hystrix.command.default.circuitBreaker.requestVolumeThreshold: sets the minimum number of requests needed before the breaker can trip. If it is set to 20 and only 19 requests arrive within a sliding window (say 10 seconds), the breaker will not open even if all 19 requests fail. The default value is 20.

(3) hystrix.command.default.circuitBreaker.errorThresholdPercentage: sets the error rate threshold. When the error rate within the sliding window exceeds this value, the breaker enters the open state and all requests trigger the fallback. The default threshold is 50 (percent).

(4) hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds: sets the breaker's sleep window, that is, how long after the breaker opens before a request is allowed to attempt execution. The default value is 5000 milliseconds: after the breaker opens, all requests are rejected for 5000 milliseconds, and then the breaker becomes half-open.

(5) hystrix.command.default.circuitBreaker.forceOpen: if set to true, the breaker is forced open and all requests trigger the failure fallback (Fallback). The default value is false.
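How these parameters combine into the decision to open the breaker can be sketched as a single function. This is a simplified illustration rather than Hystrix's source code: the class and method names are hypothetical, the error percentage uses integer division, and counting an error rate exactly equal to the threshold as tripping is an assumption of this sketch.

```java
final class TripDecisionSketch {

    /** Decides whether the breaker should be open, given the counts from one sliding window. */
    static boolean shouldOpen(int totalRequests, int failedRequests,
                              int requestVolumeThreshold, int errorThresholdPercentage,
                              boolean forceOpen) {
        if (forceOpen) {
            return true; // forceOpen overrides the statistics
        }
        if (totalRequests == 0 || totalRequests < requestVolumeThreshold) {
            return false; // too few requests to judge, even if all of them failed
        }
        int errorPercentage = failedRequests * 100 / totalRequests;
        return errorPercentage >= errorThresholdPercentage;
    }

    public static void main(String[] args) {
        System.out.println(shouldOpen(19, 19, 20, 50, false)); // false: below the request volume threshold
        System.out.println(shouldOpen(20, 11, 20, 50, false)); // true: 55% errors
        System.out.println(shouldOpen(20, 9, 20, 50, false));  // false: 45% errors
        System.out.println(shouldOpen(0, 0, 20, 50, true));    // true: forced open
    }
}
```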

The breaker's state transitions depend on Hystrix's sliding window health statistics, such as the failure percentage. The health statistics settings are described as follows:

(1) hystrix.command.default.metrics.rollingStats.timeInMilliseconds: sets the duration of the statistical sliding window in milliseconds. The default value is 10000 milliseconds. The breaker opens based on the statistics of one sliding window: if the error rate within the window exceeds the threshold, the breaker enters the open state. The sliding window is further divided into time buckets. The window's statistics are the sum of the statistics of all buckets in the window, and each bucket records the number of successes, failures, timeouts, and rejections.

(2) hystrix.command.default.metrics.rollingStats.numBuckets: sets the number of time buckets the sliding window is divided into. The default value is 10. If the window lasts 10000 milliseconds and is divided into 10 buckets, each bucket covers 1 second. The values must satisfy timeInMilliseconds % numBuckets == 0, otherwise an exception is thrown. For example, 70000 (window of 70000 ms) % 700 (buckets) == 0 is fine, but 70000 (window of 70000 ms) % 600 (buckets) == 400 throws an exception.

The Hystrix circuit breaker settings above use the hystrix.command.default prefix. These default settings apply to every Feign RPC interface in the project unless an interface is configured separately. If a particular Feign RPC call needs its own configuration, the configuration prefix has the following format:
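The bucket arithmetic can be sketched as follows. The class and method names are hypothetical, and the exception type is chosen for this illustration only; Hystrix's own validation may differ in detail.

```java
final class RollingWindowSketch {

    /** Length of one bucket in milliseconds; rejects windows that do not divide evenly. */
    static int bucketLengthMillis(int timeInMilliseconds, int numBuckets) {
        if (numBuckets <= 0 || timeInMilliseconds % numBuckets != 0) {
            throw new IllegalArgumentException(
                    "timeInMilliseconds % numBuckets must be 0, got "
                            + timeInMilliseconds + " and " + numBuckets);
        }
        return timeInMilliseconds / numBuckets;
    }

    /** The window's statistic is the sum of the per-bucket counts. */
    static int windowTotal(int[] perBucketCounts) {
        int total = 0;
        for (int count : perBucketCounts) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(bucketLengthMillis(10000, 10));  // 1000 ms per bucket
        System.out.println(bucketLengthMillis(70000, 700)); // 100 ms per bucket
        System.out.println(windowTotal(new int[] {3, 0, 1, 2, 0, 0, 4, 1, 0, 2})); // 13 failures in the window
        try {
            bucketLengthMillis(70000, 600); // 70000 % 600 == 400
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```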

    hystrix.command.ClassName#methodName(parameter type list)
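To see what such a key looks like for a concrete method, the format can be reproduced with plain reflection. This is an illustrative sketch: CommandKeySketch, commandKey, and keyFor are hypothetical names, the nested UserClient is a stand-in for the real Feign interface, and joining multiple parameter types with a comma is an assumption. Feign ships its own helper for this purpose (Feign.configKey), which is what a real project would rely on.

```java
import java.lang.reflect.Method;
import java.util.StringJoiner;

final class CommandKeySketch {

    /** Hypothetical stand-in for the Feign client interface. */
    interface UserClient {
        String detail(Long userId);
    }

    /** Builds "ClassName#methodName(ParamType,...)" from simple type names. */
    static String commandKey(Class<?> type, Method method) {
        StringJoiner params = new StringJoiner(",", "(", ")");
        for (Class<?> paramType : method.getParameterTypes()) {
            params.add(paramType.getSimpleName());
        }
        return type.getSimpleName() + "#" + method.getName() + params;
    }

    /** Convenience lookup that wraps the checked reflection exception. */
    static String keyFor(Class<?> type, String methodName, Class<?>... paramTypes) {
        try {
            return commandKey(type, type.getMethod(methodName, paramTypes));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(keyFor(UserClient.class, "detail", Long.class)); // UserClient#detail(Long)
    }
}
```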

Let's look at an example of special configuration for a single interface, using the Feign RPC interface /detail/v1 of the UserClient class. This interface obtains user information from the uaa-provider service. Before configuring it, review the code of the UserClient interface:

    package com.crazymaker.springcloud.user.info.remote.client;
    ...
    @FeignClient(value = "uaa-provider",
            configuration = FeignConfiguration.class,
            fallback = UserClientFallback.class,
            path = "/uaa-provider/api/user")
    public interface UserClient {
        /**
         * @param userId user ID
         * @return user details
         */
        @RequestMapping(value = "/detail/v1", method = RequestMethod.GET)
        RestOut<UserDTO> detail(@RequestParam(value = "userId") Long userId);
    }

In demo-provider, if the circuit breaker for RPC calls to UserClient.detail needs its own configuration, the default prefix hystrix.command.default is not used. Instead, the prefix follows the hystrix.command.FeignClient#Method format. The configuration is as follows:

    hystrix:
      ...
      command:
        UserClient#detail(Long):               # ClassName#methodName(parameter type list)
          ...
          circuitBreaker:                      # circuit breaker settings
            enabled: true                      # whether to use the circuit breaker; default true
            requestVolumeThreshold: 20         # at least 20 requests are needed before the breaker can trip
            sleepWindowInMilliseconds: 5000    # sleep time after opening before a trial request is allowed; default 5 seconds
            errorThresholdPercentage: ...
          metrics:
            rollingPercentile:
              timeInMilliseconds: 60000
              bucketSize: 200                  # number of execution times kept per bucket

In addition to the circuitBreaker parameters and the metrics sliding window parameters, many other Hystrix Command parameters can be configured for a particular Feign RPC interface, again using the "ClassName#methodName(parameter type list)" format. For beginners, the concept and configuration of sliding windows can take some effort to understand.

  1. The next article will explain the core principles of Spring Cloud RPC remote calls.