A few days ago we had a production incident with Redis that turned out to involve the testOnBorrow setting of the Redis client. This post walks through that incident and explains what the testOnBorrow setting actually does.

The Redis production incident

Excessive Redis traffic triggered an alarm

The Redis instance used by one of our services suddenly raised an alarm. The alarm content was as follows:

  • Redis was receiving more than 100,000 QPS

Monitoring shows Redis at more than 100,000 QPS

Querying the monitoring platform confirmed that the QPS exceeded 100,000

Redis query performance deteriorated

  • Redis QPS reached 120,000. Our Redis runs in replication (master-replica) mode, with a QPS limit of just under 100,000; performance degrades once traffic exceeds that limit
  • During the incident the Redis service was affected, and query latency reached 500 ms

Analyzing the problem

1. Check the traffic distribution

  • I checked the per-command traffic distribution on Redis. The main business command is hmget, at 36,000 QPS, which is real service traffic. However, the ping command was at 72,000 QPS, which is not normal, so the next step was to look at the code logic
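The post does not say which tool produced the per-command breakdown. One common source is the `cmdstat_<command>:calls=...` lines of Redis `INFO commandstats`. The sketch below assumes that text format and computes the ratio of ping calls to all other calls. The class name `CommandStats` and the sample numbers are hypothetical. Note that `calls` is a cumulative counter, so getting QPS would mean diffing two samples taken a known interval apart.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses the text returned by Redis `INFO commandstats`. */
final class CommandStats {

    /** Returns command name -> cumulative call count, in the order the lines appear. */
    static Map<String, Long> parseCalls(String info) {
        Map<String, Long> calls = new LinkedHashMap<>();
        for (String raw : info.split("\\R")) {
            String line = raw.trim();
            if (!line.startsWith("cmdstat_")) {
                continue; // skips the "# Commandstats" header and blank lines
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String command = line.substring("cmdstat_".length(), colon);
            for (String field : line.substring(colon + 1).split(",")) {
                if (field.startsWith("calls=")) {
                    calls.put(command, Long.parseLong(field.substring("calls=".length())));
                }
            }
        }
        return calls;
    }

    /** Ratio of ping calls to all other calls; 0 when there is no other traffic. */
    static double pingRatio(Map<String, Long> calls) {
        long ping = calls.getOrDefault("ping", 0L);
        long other = 0;
        for (Map.Entry<String, Long> e : calls.entrySet()) {
            if (!e.getKey().equals("ping")) {
                other += e.getValue();
            }
        }
        return other == 0 ? 0.0 : (double) ping / other;
    }

    public static void main(String[] args) {
        // Made-up sample shaped like the incident: ping calls are twice the hmget calls.
        String sample = String.join("\n",
            "# Commandstats",
            "cmdstat_hmget:calls=36000,usec=540000,usec_per_call=15.00",
            "cmdstat_ping:calls=72000,usec=144000,usec_per_call=2.00");
        Map<String, Long> calls = parseCalls(sample);
        System.out.println(calls + " ping ratio = " + pingRatio(calls));
    }
}
```

A ratio well above 1 is the signal worth chasing here: a validation ping should appear at most once per borrowed connection.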

2. Look at the code logic

  • 1. Redis is used as a cache in our service, mainly with the hash structure, to cache data and speed up interface queries
  • 2. If a Redis query fails or times out, the service queries the underlying services to ensure reliability
  • 3. The service code uses a Redis client wrapper written by a colleague
  • 4. The testOnBorrow property in the Redis client configuration is set to true
  • 5. The testOnBorrow setting controls whether a connection is validated before it is borrowed from the pool. If validation fails, the connection is removed from the pool and the pool tries to borrow another one. true enables the check; false disables it. With the value set to true, when all connections in the pool are in use, the caller waits until a valid connection is obtained or the wait times out
  • 6. A connection is validated by sending the ping command to the server. The pool source code is as follows:
boolean validate = false;
Throwable validationThrowable = null;
try {
    validate = this.factory.validateObject(p);
} catch (Throwable var15) {
    PoolUtils.checkRethrow(var15);
    validationThrowable = var15;
}
if (!validate) {
    try {
        this.destroy(p);
        this.destroyedByBorrowValidationCount.incrementAndGet();
    } catch (Exception var14) {
    }
    p = null;
    if (create) {
        NoSuchElementException nsee = new NoSuchElementException("Unable to validate object");
        nsee.initCause(validationThrowable);
        throw nsee;
    }
}

The validation logic itself sends the ping command:

return hostAndPort.getHost().equals(connectionHost)
    && hostAndPort.getPort() == connectionPort
    && jedis.isConnected()
    && jedis.ping().equals("PONG");
  • 7. With this configuration, ping traffic and normal traffic should in theory be 1 to 1, but we were seeing 2 to 1
  • 8. We checked the source code of the in-house Redis client wrapper and found that it is designed mainly for Redis in cluster mode. It wraps a JedisClient object, and each JedisClient is bound to one Redis cluster node. A caller first obtains a JedisClient object and then uses it to obtain a Redis connection. When the JedisClient hands out a connection, it also uses Jedis to send a ping command:
String ping = jedis == null ? "" : jedis.ping();
  • 9. So with testOnBorrow set to true, two ping commands are sent every time we obtain a Jedis connection, which makes ping traffic twice the normal traffic
  • 10. Experimental verification: after replacing the in-house Redis client wrapper with plain Jedis and keeping testOnBorrow set to true, the test interface showed a 1 to 1 ratio of ping traffic to normal traffic
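The ratios in the steps above can be reproduced with a toy model. This is a minimal sketch, not Jedis or commons-pool code. It assumes each query borrows one connection and runs one business command, and it counts one ping for pool validation plus one for the wrapper's own check. The class `PingTrafficModel` and its flags are hypothetical names.

```java
/** Toy model of the incident's traffic; it does not talk to Redis. */
final class PingTrafficModel {
    private final boolean testOnBorrow; // pool validates each borrowed connection with a ping
    private final boolean wrapperPings; // the in-house wrapper sends its own ping as well

    long pings;
    long commands;

    PingTrafficModel(boolean testOnBorrow, boolean wrapperPings) {
        this.testOnBorrow = testOnBorrow;
        this.wrapperPings = wrapperPings;
    }

    /** Models one query: borrow a connection, run one business command, return it. */
    void runOneQuery() {
        if (testOnBorrow) {
            pings++; // validation ping issued while borrowing
        }
        if (wrapperPings) {
            pings++; // extra ping from the wrapper when it hands out the connection
        }
        commands++; // for example one hmget
    }

    /** Ping-to-command ratio after the given number of queries (queries must be > 0). */
    static double ratio(boolean testOnBorrow, boolean wrapperPings, int queries) {
        PingTrafficModel model = new PingTrafficModel(testOnBorrow, wrapperPings);
        for (int i = 0; i < queries; i++) {
            model.runOneQuery();
        }
        return (double) model.pings / model.commands;
    }

    public static void main(String[] args) {
        System.out.println("wrapper + testOnBorrow: " + ratio(true, true, 36_000));
        System.out.println("plain Jedis + testOnBorrow: " + ratio(true, false, 36_000));
        System.out.println("plain Jedis, no validation: " + ratio(false, false, 36_000));
    }
}
```

At 36,000 business commands per second, the first configuration accounts for the 72,000 ping QPS observed during the incident.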

The solution

1. We replaced the in-house Redis client wrapper with Jedis 3.2, which removed the extra ping sent every time a connection is obtained

2. Redis is only used as a cache, and when a query fails we fall back to the underlying service, so no single query depends strongly on Redis. We therefore set the testOnBorrow attribute to false, which removes the validation ping on every borrow. If obtaining a connection from the pool fails, the underlying service is queried instead, which keeps the service reliable
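The fallback described above can be sketched as a small cache-aside helper. This is an illustration under assumptions, not the service's actual code. `CacheWithFallback` and its two functions are hypothetical names, and any unchecked exception from the cache lookup is treated as a miss. Jedis reports connection failures as unchecked exceptions, so a real lookup would fit this shape.

```java
import java.util.function.Function;

/** Hypothetical cache-aside helper; the names are illustrative, not from the service. */
final class CacheWithFallback<K, V> {
    private final Function<K, V> cacheLookup;    // for example wraps an hmget; may throw or return null
    private final Function<K, V> backingService; // the authoritative, slower source

    CacheWithFallback(Function<K, V> cacheLookup, Function<K, V> backingService) {
        this.cacheLookup = cacheLookup;
        this.backingService = backingService;
    }

    /** Returns the cached value if available, otherwise asks the backing service. */
    V get(K key) {
        try {
            V cached = cacheLookup.apply(key);
            if (cached != null) {
                return cached;
            }
        } catch (RuntimeException e) {
            // Broken connection, timeout, or exhausted pool: fall through as a cache miss.
        }
        return backingService.apply(key);
    }

    public static void main(String[] args) {
        CacheWithFallback<String, String> healthy =
            new CacheWithFallback<String, String>(k -> "cached:" + k, k -> "service:" + k);
        CacheWithFallback<String, String> broken =
            new CacheWithFallback<String, String>(
                k -> { throw new IllegalStateException("redis down"); },
                k -> "service:" + k);
        System.out.println(healthy.get("user:1"));
        System.out.println(broken.get("user:1"));
    }
}
```

Because a failed lookup costs only one fallback call, a stale connection surfaces as a single slower request rather than an error. That is the trade-off that makes per-borrow validation optional for a pure cache.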

3. Current situation

  • At peak, Redis QPS now stays below 50,000

  • The service response time is stable at around 10 ms