Recently we have been splitting our system into microservices, and the RPC framework we settled on is OpenFeign. I expected it to be simple to use; after all, the official and online demos only need a few annotations. In production, however, we ran into quite a few problems. This post covers the first one: errors that appear while services go online and offline.

The problem

After a service was split out and deployed, other services often reported these two errors:

feign.RetryableException: connect timed out executing GET 
Service connection timeout

Unexpected end of file from server executing GET xxx
Interface XXX is unavailable

Locating the problem

The cause is fairly easy to locate, because the errors appear every time a service restarts. For example, when service A restarts, service B reports these errors.

The "Unexpected end of file from server" error (interface XXX is unavailable) occurs when service A calls service B and service B is killed with kill -9 before it has responded.

The feign.RetryableException: connect timed out executing GET error is largely due to Feign's client-side load balancing. The client pulls the list of service addresses from the registry and caches it locally. The registry already knows that an instance has gone offline, but the client's cache is stale, so the client still calls the offline instance and an error is reported.
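To make the stale-cache window concrete, here is a minimal simulation in plain Java. It is only a sketch under simplifying assumptions: the class, the method names, and the two addresses are hypothetical, the registry is assumed to learn about the offline instance instantly, and the client does plain round-robin over its cached list. It does not use Feign or Ribbon at all.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleCacheDemo {

    // Hypothetical address of the instance that is taken offline
    static final String DEAD = "10.0.0.2:8080";

    /** Hypothetical registry: returns the instances registered at time nowMs. */
    static List<String> registryAt(long nowMs, long offlineAtMs) {
        List<String> live = new ArrayList<>();
        live.add("10.0.0.1:8080");
        if (nowMs < offlineAtMs) {
            live.add(DEAD); // removed from the registry once it goes offline
        }
        return live;
    }

    /**
     * Counts requests routed to the offline instance when the client refreshes
     * its cached list every refreshMs and sends one request every callEveryMs.
     */
    static int callsToDeadInstance(long refreshMs, long callEveryMs,
                                   long offlineAtMs, long endMs) {
        List<String> cache = registryAt(0, offlineAtMs);
        long lastRefresh = 0;
        int failed = 0;
        int next = 0;
        for (long now = 0; now < endMs; now += callEveryMs) {
            if (now - lastRefresh >= refreshMs) {
                cache = registryAt(now, offlineAtMs); // periodic refresh
                lastRefresh = now;
            }
            String target = cache.get(next++ % cache.size()); // round-robin
            if (now >= offlineAtMs && target.equals(DEAD)) {
                failed++; // in a real client this would be a connect timeout
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // One request per second, instance goes offline after 5 s, run for 60 s
        System.out.println("30 s refresh: " + callsToDeadInstance(30_000, 1_000, 5_000, 60_000));
        System.out.println("2 s refresh:  " + callsToDeadInstance(2_000, 1_000, 5_000, 60_000));
    }
}
```

In this toy model the client keeps sending every second request to the dead instance until its next refresh, so a shorter refresh interval shrinks the error window but does not remove it.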

Problem solving

Unexpected end of file from server executing GET xxx

This problem is the easier one to solve. The service must not be force-killed; it should finish the requests that are already running and then stop. Graceful shutdown was added in Spring Boot 2.3.

It is simple to use. Add the following configuration:

server:
  shutdown: graceful # enable graceful shutdown (the default is immediate)
spring:
  lifecycle:
    timeout-per-shutdown-phase: 20s # grace period for in-flight requests; the default is 30s

If the Spring Boot version is lower than 2.3, there is no official support, but we can implement it ourselves with the following code:

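import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.catalina.connector.Connector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.web.embedded.tomcat.TomcatConnectorCustomizer; // Spring Boot 2.x package
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextClosedEvent;

// Must be registered as a bean and added to the Tomcat factory's connector customizers
public class SpringStopListener implements ApplicationListener<ContextClosedEvent>, TomcatConnectorCustomizer {

    private static final Logger log = LoggerFactory.getLogger(SpringStopListener.class);

    // Seconds to wait for in-flight requests; the original value is not shown, so 20 is an assumption
    private static final long waitTime = 20;

    private volatile Connector connector;

    @Override
    public void customize(Connector connector) {
        this.connector = connector;
    }

    @Override
    public void onApplicationEvent(ContextClosedEvent contextClosedEvent) {
        Executor executor = this.connector.getProtocolHandler().getExecutor();
        if (executor instanceof ThreadPoolExecutor) {
            try {
                ThreadPoolExecutor threadPoolExecutor = (ThreadPoolExecutor) executor;
                // Set the state to shutdown, no new requests will be received, and running tasks will finish
                threadPoolExecutor.shutdown();
                if (!threadPoolExecutor.awaitTermination(waitTime, TimeUnit.SECONDS)) {
                    log.warn("Tomcat thread pool did not shut down gracefully within " + waitTime
                            + " seconds. Proceeding with forceful shutdown");
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }
}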
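The drain step that the listener's comment describes (put the pool into the shutdown state, then wait for running tasks) can be tried in isolation with a plain java.util.concurrent pool. This is only an illustrative sketch: the class name, the pool size, and the 100 ms "request" are made up, and no Tomcat or Spring classes are involved.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainDemo {

    /** Returns {number of accepted tasks that completed, 1 if the late task was rejected else 0}. */
    static int[] run() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger completed = new AtomicInteger();

        // Four "requests" accepted before shutdown; two run, two wait in the queue
        for (int i = 0; i < 4; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(100); // simulated request handling
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pool.shutdown(); // no new tasks are accepted; running and queued tasks continue

        int rejected = 0;
        try {
            pool.execute(completed::incrementAndGet); // a "request" arriving too late
        } catch (RejectedExecutionException e) {
            rejected = 1;
        }

        try {
            // Wait for the accepted tasks, then give up and force the shutdown
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return new int[] {completed.get(), rejected};
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("completed=" + result[0] + ", rejected=" + result[1]);
    }
}
```

The point of the sketch is the ordering: shutdown() only closes the door to new work, and awaitTermination() is what gives the work already accepted a bounded amount of time to finish.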

feign.RetryableException: connect timed out executing GET

This problem is harder to handle. It is caused by the client caching the instance metadata it pulls from the registry. Disabling the client cache would hurt performance, because every call would have to pull the service data from the registry, while leaving it enabled produces this error whenever a service goes offline. First of all, the load balancer behind Spring Cloud OpenFeign differs by version. Older versions use Ribbon; newer versions of Spring Cloud have replaced Ribbon with Spring Cloud LoadBalancer, since Netflix has put Ribbon into maintenance mode.


If you use Ribbon as the load balancer, you can use the following configuration:

ribbon:
  ServerListRefreshInterval: 2000 # how often the client refreshes its cached server list, in ms
  ReadTimeout: 5000 # read timeout, in ms
  ConnectTimeout: 2000 # connection timeout, in ms

ServerListRefreshInterval is the value to change. By default the client caches the registry's service data for 30 s.

That default is a real pitfall. As a compromise, we set it to 2 s here.

Spring Cloud Load Balancer

Little information about Spring Cloud LoadBalancer is available online at the moment. To configure it, you need to consult the official documentation.

        enabled: false
For now, the solution is mostly graceful shutdown plus a shorter client cache time. A better optimization is still needed in the future.