This article is an entry in the “Digitalstar Project” creative incentive campaign.

Hi, everyone. Our business volume has skyrocketed recently, so I have been completely swamped. A few nights ago we found that, under the sudden surge of traffic, several newly scaled-out instances of a core microservice were hitting Redis connection failures to varying degrees:

```
org.springframework.data.redis.RedisConnectionFailureException: Unable to connect to Redis; nested exception is io.lettuce.core.RedisConnectionException: Unable to connect to redis.production.com
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.translateException(LettuceConnectionFactory.java:1553) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.getConnection(LettuceConnectionFactory.java:1461) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.doGetAsyncDedicatedConnection(LettuceConnection.java:1027) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.getOrCreateDedicatedConnection(LettuceConnection.java:1013) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.openPipeline(LettuceConnection.java:527) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.connection.DefaultStringRedisConnection.openPipeline(DefaultStringRedisConnection.java:3245) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at jdk.internal.reflect.GeneratedMethodAccessor319.invoke(Unknown Source) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
	at org.springframework.data.redis.core.CloseSuppressingInvocationHandler.invoke(CloseSuppressingInvocationHandler.java:61) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at com.sun.proxy.$Proxy355.openPipeline(Unknown Source) ~[?:?]
	at org.springframework.data.redis.core.RedisTemplate.lambda$executePipelined$1(RedisTemplate.java:318) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:222) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:189) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:176) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.core.RedisTemplate.executePipelined(RedisTemplate.java:317) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.core.RedisTemplate.executePipelined(RedisTemplate.java:307) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.core.RedisTemplate$$FastClassBySpringCGLIB$$81812bd6.invoke(<generated>) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
// some frames omitted
Caused by: org.springframework.dao.QueryTimeoutException: Redis command timed out
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.closePipeline(LettuceConnection.java:592) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	... 142 more
```

At the same time, the service was also seeing exceptions when invoking Redis commands:

```
org.springframework.data.redis.connection.RedisPipelineException: Pipeline contained one or more invalid commands; nested exception is org.springframework.data.redis.connection.RedisPipelineException: Pipeline contained one or more invalid commands; nested exception is org.springframework.dao.QueryTimeoutException: Redis command timed out
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.closePipeline(LettuceConnection.java:594) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.connection.DefaultStringRedisConnection.closePipeline(DefaultStringRedisConnection.java:3224) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at jdk.internal.reflect.GeneratedMethodAccessor198.invoke(Unknown Source) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
	at org.springframework.data.redis.core.CloseSuppressingInvocationHandler.invoke(CloseSuppressingInvocationHandler.java:61) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at com.sun.proxy.$Proxy355.closePipeline(Unknown Source) ~[?:?]
	at org.springframework.data.redis.core.RedisTemplate.lambda$executePipelined$1(RedisTemplate.java:326) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:222) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:189) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:176) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.core.RedisTemplate.executePipelined(RedisTemplate.java:317) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.core.RedisTemplate.executePipelined(RedisTemplate.java:307) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.data.redis.core.RedisTemplate$$FastClassBySpringCGLIB$$81812bd6.invoke(<generated>) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.3.7.jar!/:5.3.7]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:779) ~[spring-aop-5.3.7.jar!/:5.3.7]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) ~[spring-aop-5.3.7.jar!/:5.3.7]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750) ~[spring-aop-5.3.7.jar!/:5.3.7]
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97) ~[spring-aop-5.3.7.jar!/:5.3.7]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.7.jar!/:5.3.7]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750) ~[spring-aop-5.3.7.jar!/:5.3.7]
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:692) ~[spring-aop-5.3.7.jar!/:5.3.7]
	at org.springframework.data.redis.core.StringRedisTemplate$$EnhancerBySpringCGLIB$$c9b8cc15.executePipelined(<generated>) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
// some frames omitted
Caused by: org.springframework.data.redis.connection.RedisPipelineException: Pipeline contained one or more invalid commands; nested exception is org.springframework.dao.QueryTimeoutException: Redis command timed out
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.closePipeline(LettuceConnection.java:592) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	... 142 more
Caused by: org.springframework.dao.QueryTimeoutException: Redis command timed out
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.closePipeline(LettuceConnection.java:592) ~[spring-data-redis-2.4.9.jar!/:2.4.9]
	... 142 more
```

Our spring-data-redis configuration is:

```yaml
spring:
  redis:
    host: redis.production.com
    port: 6379
    # command timeout
    timeout: 3000
    lettuce:
      pool:
        max-active: 128
        max-idle: 128
        max-wait: 3000
```

These requests failed on the first attempt against the new instances, but thanks to our retry mechanism they eventually succeeded. Each such request, however, took about 3 s longer than normal (the command timeout), and they accounted for about 3% of all requests.

From the exception stack we can see that the root cause is a Redis command timeout. But why is a Redis command being executed while the connection is still being established?

Lettuce Specifies the process for establishing a connection

Our Redis access uses spring-data-redis plus the Lettuce connection pool. By default, Lettuce establishes a Redis connection as follows:

  1. Establish the TCP connection.
  2. Perform the necessary handshake:
     • For Redis 2.x to 5.x:
       1. If a username/password is configured, send it (AUTH).
       2. If ping-before-activate is enabled, send PING.
     • For Redis 6.x: the new HELLO command initializes the connection; the username and password can be passed as parameters of this command.
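On the wire, both handshakes boil down to ordinary RESP commands. A simplified encoder makes this concrete (an illustrative sketch of the RESP bulk-string framing, not Lettuce's actual protocol code; the credentials are made-up placeholders):

```java
import java.nio.charset.StandardCharsets;

// Minimal RESP encoder for the commands sent during the handshake.
// Illustrative sketch only, not Lettuce's real implementation.
public class RespEncoder {

    // Encode a command and its arguments as a RESP array of bulk strings.
    public static String encode(String... parts) {
        StringBuilder sb = new StringBuilder("*").append(parts.length).append("\r\n");
        for (String p : parts) {
            byte[] bytes = p.getBytes(StandardCharsets.UTF_8);
            sb.append("$").append(bytes.length).append("\r\n").append(p).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Redis 6.x handshake: HELLO with protocol version and credentials
        System.out.print(encode("HELLO", "3", "AUTH", "user", "password"));
        // Redis 2.x-5.x handshake: optional AUTH, then PING
        System.out.print(encode("PING"));
    }
}
```

Either way, the client must wait for a successful reply to these commands before the connection is usable, which is exactly where our timeouts occurred.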

For Redis 2.x to 5.x, whether to send a PING before the connection is activated is configurable. The default is:

ClientOptions

```java
public static final boolean DEFAULT_PING_BEFORE_ACTIVATE_CONNECTION = true;
```

We are using the latest 6.x version of Redis, so during connection establishment and handshake, Lettuce must send a HELLO command and wait for a successful response before the connection is considered created.

So why does this simple command time out?

View Redis command pressure via JFR

Redis access in our project goes through spring-data-redis plus the Lettuce connection pool, with JFR monitoring enabled for Lettuce commands (see my earlier article introducing this new way of monitoring the Redis connection pool). I contributed this feature upstream; my pull request has been merged, and it will be released in Lettuce 6.2.x. Let's look at the Redis commands collected around the time of the problem, as shown in the figure below:

As you can see, Redis was under fairly heavy pressure at the time (the firstResponsePercentiles in the figure are in microseconds). We had 7 instances at that point; this instance had just started and was under less load than the others, yet it was already seeing connection-command timeouts. Also, the figure only captures the HGET command; GET runs at the same order of magnitude as HGET, and all the remaining commands together add up to about half of HGET. From the client's point of view, the QPS sent to Redis had already exceeded one million.
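For reference, a per-command JFR event of this general shape can be defined with the JDK's built-in jdk.jfr API. This is a hypothetical event modeled on the monitoring idea described above, not the event class that actually ships in Lettuce:

```java
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Hypothetical JFR event for per-command latency; names are illustrative.
@Name("demo.RedisCommandTiming")
@Label("Redis Command Timing")
@Category("Redis")
class RedisCommandTiming extends Event {
    @Label("Command")
    String command;

    @Label("First Response Micros")
    long firstResponseMicros;
}

public class JfrDemo {
    public static RedisCommandTiming record(String command, long micros) {
        RedisCommandTiming e = new RedisCommandTiming();
        e.begin();
        e.command = command;
        e.firstResponseMicros = micros;
        e.end();
        e.commit(); // no-op unless a JFR recording is active
        return e;
    }

    public static void main(String[] args) {
        RedisCommandTiming e = record("HGET", 1500);
        System.out.println(e.command + " " + e.firstResponseMicros); // prints "HGET 1500"
    }
}
```

Events committed this way only cost anything when a recording is running, which is what makes JFR attractive for always-on command monitoring.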

Redis-side monitoring also shows some pressure, which could cause some commands to wait too long and hit the timeout.

Optimization thinking

For spring-data-redis + Lettuce, if we don't use commands that require a dedicated connection (Redis transactions and Redis pipelines), we don't actually need a connection pool at all: Lettuce connections are asynchronous and multiplexed, so every request that can use the shared connection goes over the same physical Redis connection. However, this microservice uses a large number of pipeline commands to improve query efficiency. Without a pool, pipelines would force connections to be closed and re-created constantly (hundreds of thousands of times per second), which would be severely inefficient. So although the official documentation says a pool is unnecessary, that only holds if you use neither transactions nor pipelines.
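A back-of-the-envelope model shows why pipelining matters here (and therefore why the pooled, dedicated connections are worth keeping). The numbers below are illustrative assumptions, not measurements from our system:

```java
// Illustrative cost model: n commands sent one at a time pay n network
// round trips; a pipeline pays roughly one round trip plus per-command
// server processing. All numbers are assumptions for illustration.
public class PipelineModel {

    static long sequentialMicros(int commands, long rttMicros) {
        return commands * rttMicros;
    }

    static long pipelinedMicros(int commands, long rttMicros, long perCommandMicros) {
        // One round trip plus server-side processing of each command
        return rttMicros + commands * perCommandMicros;
    }

    public static void main(String[] args) {
        long rtt = 500;   // assumed network round trip, microseconds
        long perCmd = 10; // assumed server processing per command, microseconds
        System.out.println(sequentialMicros(100, rtt));        // prints 50000
        System.out.println(pipelinedMicros(100, rtt, perCmd)); // prints 1500
    }
}
```

Under these assumed numbers, a 100-command batch drops from ~50 ms to ~1.5 ms, which is why giving up pipelines (and the pool they require) was never an option for us.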

First, scaling up Redis: our Redis runs on a public cloud, where scaling up means moving to a larger machine specification. The next tier up doubles the spec of the current one, at nearly twice the cost. Meanwhile, under transient pressure, fewer than 3% of requests fail, and the retry against another instance succeeds. Scaling up Redis for that is not worth the cost.

Second, for application instances under too much pressure we already have dynamic scaling, and failed requests are retried. But this problem still has two consequences:

  1. A newly started instance may immediately receive a large number of requests due to the instantaneous pressure, so business requests and the handshake requests sent right after a connection is established get mixed together. Because these requests are not queued fairly, some handshake (heartbeat) requests respond too slowly and fail, and re-establishing the connection may fail again for the same reason.
  2. Some instances may have too few established connections to satisfy the concurrency, so many requests are actually blocked waiting for a connection. As a result, CPU pressure does not spike, autoscaling is not triggered again, and scale-out lags even more.

In fact, if we can minimize or even avoid connection-creation failures, the problem is largely solved. That is: create every connection in the pool before the microservice instance starts serving traffic.

How to pre-create all connections in the Redis connection pool

Let's first see whether this can be done with the official configuration alone.

Looking through the official documentation, we found two relevant configuration options:

  • min-idle: the minimum number of idle connections kept in the pool.
  • time-between-eviction-runs: the interval of a periodic task that checks the pool and keeps the number of idle connections between min-idle and max-idle.

Note that min-idle only takes effect when time-between-eviction-runs is also configured. The reason is that the Lettuce connection pool is implemented on top of commons-pool. The pool accepts a min-idle setting, but preparePool must be called manually to actually create at least min-idle objects:
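Putting the two options together, the pool section of the configuration would look something like this (a sketch; the values are illustrative, not our production settings):

```yaml
spring:
  redis:
    lettuce:
      pool:
        max-active: 128
        max-idle: 128
        # min-idle only takes effect when time-between-eviction-runs is set
        min-idle: 128
        time-between-eviction-runs: 30s
```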

GenericObjectPool

```java
public void preparePool() throws Exception {
    // If a valid min-idle is configured, pre-create that many idle objects
    if (this.getMinIdle() >= 1) {
        this.ensureMinIdle();
    }
}
```

So when does this object creation happen? commons-pool schedules a periodic task whose initial delay and interval are both configured by time-between-eviction-runs, and the task does the following:

```java
public void run() {
    final ClassLoader savedClassLoader =
            Thread.currentThread().getContextClassLoader();
    try {
        if (factoryClassLoader != null) {
            // Set the class loader for the factory
            final ClassLoader cl = factoryClassLoader.get();
            if (cl == null) {
                // The pool has been dereferenced and the class loader
                // GC'd. Cancel this timer so the pool can be GC'd as
                // well.
                cancel();
                return;
            }
            Thread.currentThread().setContextClassLoader(cl);
        }

        // Evict from the pool
        try {
            evict();
        } catch(final Exception e) {
            swallowException(e);
        } catch(final OutOfMemoryError oome) {
            // Log problem but give evictor thread a chance to continue
            // in case error is recoverable
            oome.printStackTrace(System.err);
        }
        // Re-create idle instances.
        try {
            ensureMinIdle();
        } catch (final Exception e) {
            swallowException(e);
        }
    } finally {
        // Restore the previous CCL
        Thread.currentThread().setContextClassLoader(savedClassLoader);
    }
}
```

This periodic task ensures that the number of idle objects in the pool does not exceed max-idle while at least min-idle connections exist. All of this comes with commons-pool out of the box, but there is no built-in way to initialize all connections as soon as the pool is created.
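The evictor's contract described above, and the ensureMinIdle call that preparePool relies on, can be sketched with a toy pool. This is a simulation of commons-pool's behavior for illustration, not its real code:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Toy idle-object pool mimicking what commons-pool's evictor task does:
// evict idle objects down to maxIdle, then create objects up to minIdle.
public class ToyPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final int minIdle;
    private final int maxIdle;
    private final Supplier<T> factory;

    public ToyPool(int minIdle, int maxIdle, Supplier<T> factory) {
        this.minIdle = minIdle;
        this.maxIdle = maxIdle;
        this.factory = factory;
    }

    // What one scheduled evictor run does, in essence.
    public void evictorRun() {
        while (idle.size() > maxIdle) {
            idle.pop(); // destroy surplus idle objects
        }
        ensureMinIdle();
    }

    // What preparePool() calls to pre-create idle objects.
    public void ensureMinIdle() {
        while (idle.size() < minIdle) {
            idle.push(factory.get());
        }
    }

    public int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        ToyPool<Object> pool = new ToyPool<>(8, 16, Object::new);
        pool.ensureMinIdle(); // preparePool(): pre-create min-idle objects
        System.out.println(pool.idleCount()); // prints 8
    }
}
```

With min-idle = max-idle = max-active, the refill step keeps the pool permanently full, which is exactly the property the next section exploits.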

That part we have to implement ourselves. First, we set min-idle = max-idle = max-active, so the pool always holds the maximum number of connections. Then we modify the source where the connection pool is created, forcing a call to preparePool to initialize all connections:

ConnectionPoolSupport

```java
// This method is called when the connection pool is initialized
public static <T extends StatefulConnection<?, ?>> GenericObjectPool<T> createGenericObjectPool(
        Supplier<T> connectionSupplier, GenericObjectPoolConfig<T> config, boolean wrapConnections) {
    // ... omit other code ...
    GenericObjectPool<T> pool = new GenericObjectPool<T>(new RedisPooledObjectFactory<T>(connectionSupplier), config) {

        @Override
        public T borrowObject() throws Exception {
            return wrapConnections ? ConnectionWrapping.wrapConnection(super.borrowObject(), poolRef.get())
                    : super.borrowObject();
        }

        @Override
        public void returnObject(T obj) {
            if (wrapConnections && obj instanceof HasTargetConnection) {
                super.returnObject((T) ((HasTargetConnection) obj).getTargetConnection());
                return;
            }
            super.returnObject(obj);
        }
    };

    // Force a preparePool() call so that min-idle connections are created up front
    try {
        pool.preparePool();
    } catch (Exception e) {
        throw new RedisConnectionException("prepare connection pool failed", e);
    }
    // ... omit other code ...
}
```

With this change, all Redis connections are initialized before the microservice actually starts serving. Since it involves modifying library source code, for now you can override the dependency by placing a class with the same name and package path in your own project. I have also filed an issue and a corresponding pull request with Lettuce for this optimization:

  • ConnectionPool would be better if prepared before used
  • fix 1870,ConnectionPool would be better if prepared before used
