RedisTemplate.delete(Collection<K> keys)
PS: If you just want to know why, jump to the Explanation section at the end.
Background
Last week, monitoring showed that a large number of slow requests had suddenly appeared in the system, and the tracking logs revealed that a Redis operation in one of these requests took as long as 500ms!
Following the logs further, I found many recent records of slow Redis operations, and oddly enough the slow ones were all DEL operations, while SET, GET and the like completed within 0~1ms. That basically rules out a network problem.
Here’s something even weirder:
I contacted the DBA, who said firmly: any server-side operation over 10ms triggers an alarm, and the cluster is currently fine.
To confirm it wasn't the server's fault, I also pulled the Redis slowlog from every master node of the cluster. Unfortunately, it contained none of the DEL records I expected. So I had to admit the server side was fine as well.
Minimal reproduction environment
To illustrate the problem better, I set up a minimal reproduction environment locally.
Redis cluster
The prerequisite is cluster mode. Locally I started a cluster with three masters and three replicas, with ports ranging from 6380 to 6385.
Details are as follows:
127.0.0.1:6380> cluster nodes
1e5df367880965ead5b7064a2e4880db21b7aa94 127.0.0.1:6385@16385 slave 3f8598d3ec376d6f732c9921c2da9d48c825b1ad 0 1586069946810 6 connected
a136c7ed8206f0fa6fe73c02bf5f5d864fbc0f29 127.0.0.1:6383@16383 slave f658a0138f70acbd880928930efa19eebba1098a 0 1586069945000 4 connected
f658a0138f70acbd880928930efa19eebba1098a 127.0.0.1:6381@16381 master - 0 1586069946000 2 connected 5461-10922
3872ab46cf6f316bc1d2a12c2d18e59e5d3a7e83 127.0.0.1:6384@16384 slave a3a7431d8bbaa1bc795433d59ab0dd8bc3fa2d2f 0 1586069947000 5 connected
3f8598d3ec376d6f732c9921c2da9d48c825b1ad 127.0.0.1:6380@16380 myself,master - 0 1586069945000 1 connected 0-5460
a3a7431d8bbaa1bc795433d59ab0dd8bc3fa2d2f 127.0.0.1:6382@16382 master - 0 1586069947821 3 connected 10923-16383
Test project
The tests focus on the delete operations of RedisTemplate; the Redis driver is configured as Jedis (the exact version doesn't matter).
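For reference, the connection setup looks roughly like the sketch below. This is a minimal, assumed configuration (the bean names, node list and serializers are mine, not taken from the original project); the only essential point is a Jedis-backed connection factory pointing at the cluster.

import java.util.Arrays;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.jedis.JedisConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.StringRedisSerializer;

// Minimal, assumed configuration for a Jedis-backed RedisTemplate
// against the local 3-master / 3-replica cluster.
@Configuration
public class RedisConfig {

    @Bean
    public RedisConnectionFactory redisConnectionFactory() {
        RedisClusterConfiguration cluster = new RedisClusterConfiguration(Arrays.asList(
                "127.0.0.1:6380", "127.0.0.1:6381", "127.0.0.1:6382",
                "127.0.0.1:6383", "127.0.0.1:6384", "127.0.0.1:6385"));
        return new JedisConnectionFactory(cluster);
    }

    @Bean
    public RedisTemplate<String, String> redisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, String> template = new RedisTemplate<>();
        template.setConnectionFactory(factory);
        template.setKeySerializer(new StringRedisSerializer());
        template.setValueSerializer(new StringRedisSerializer());
        return template;
    }
}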
For comparison, I’ve written three different service methods:
1 Call RedisTemplate.delete(Collection<K> keys) to delete the whole batch in a single call.
@Override
public void batchCollDel(List<String> list){
    redisTemplate.delete(list);
}
2 Call RedisTemplate.delete(K key) for each key in a plain forEach loop.
@Override
public void batchNormalDel(List<String> list){
    list.stream().forEach(redisTemplate::delete);
}
3 Call RedisTemplate.delete(K key) for each key using a parallel stream.
@Override
public void batchParallelDel(List<String> list){
    list.parallelStream().forEach(redisTemplate::delete);
}
To see how long each of the three implementations takes, I wrote an @Around advice that logs the time spent deleting the whole list:
@Around(value = "methodPointCut()")
public Object doAroundAdvice(ProceedingJoinPoint proceedingJoinPoint){
    String name = proceedingJoinPoint.getSignature().getName();
    StopWatch watch = new StopWatch();
    watch.start();
    Object proceed = null;
    try {
        proceed = proceedingJoinPoint.proceed();
        watch.stop();
        log.info("method:{} spend : {}", name, watch.getTime());
    } catch (Throwable throwable) {
        throwable.printStackTrace();
    }
    return proceed;
}
OK, the rest of the code just wraps the three methods above in a controller and exposes them over HTTP, so I won't paste it here.
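For completeness, the controller is roughly like the sketch below. The request path, the size parameter and the random key generation are my assumptions, chosen to match the ab commands in the load test that follows; only one of the three endpoints is shown, the other two differ only in which service method they call.

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Assumed name for the service interface declaring the three methods shown above.
interface BatchDelService {
    void batchCollDel(List<String> list);
    void batchNormalDel(List<String> list);
    void batchParallelDel(List<String> list);
}

// Assumed controller sketch: generates `size` random keys per request and hands
// them to one of the delete strategies, matching URLs like /redis/batchNormalDel?size=1000.
@RestController
@RequestMapping("/redis")
public class BatchDelController {

    private final BatchDelService batchDelService;

    public BatchDelController(BatchDelService batchDelService) {
        this.batchDelService = batchDelService;
    }

    @GetMapping("/batchNormalDel")
    public String batchNormalDel(@RequestParam int size) {
        batchDelService.batchNormalDel(randomKeys(size));
        return "ok";
    }

    private List<String> randomKeys(int size) {
        List<String> keys = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            keys.add(UUID.randomUUID().toString());
        }
        return keys;
    }
}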
Load test
First of all, the Redis cluster itself has plenty of headroom: it easily handles 100,000 requests per second, which you can verify with redis-benchmark.
Use the ab command to load test the plain forEach delete first:
ab -c 50 -n 500 http://localhost:8080/redis/batchNormalDel?size=1000
Here are the statistics returned by ab. Every request completed within 700ms, and that includes the time spent generating the random keys:
Percentage of the requests served within a certain time (ms)
50% 527
66% 545
75% 559
80% 568
90% 595
95% 613
98% 632
99% 658
100% 663 (longest request)
Take a look in the application log at how long each Redis batch delete actually took.
Next, load test the version that calls RedisTemplate.delete(Collection<K> keys) directly to delete in batch:
ab -c 50 -n 500 http://localhost:8080/redis/batchCollDel?size=1000
The numbers blow up. At the start of the load test the latency climbed steadily, and in the end each batch delete settled at around 1500ms.
Percentage of the requests served within a certain time (ms)
50% 1576
66% 1757
75% 1785
80% 1854
90% 2172
95% 2308
98% 2340
99% 2399
100% 2446 (longest request)
Finally, let’s look at the performance of parallel streams:
ab -c 50 -n 500 http://localhost:8080/redis/batchParallelDel?size=1000
It seems concurrency is not a cure-all, haha.
Percentage of the requests served within a certain time (ms)
50% 383
66% 490
75% 572
80% 639
90% 797
95% 989
98% 1213
99% 1462
100% 2206 (longest request)
Finally, after the slow runs, check the slowlog again (the time unit here is microseconds).
127.0.0.1:6380> config get slowlog-log-slower-than
1) "slowlog-log-slower-than"
2) "10000"
127.0.0.1:6380> slowlog get
(empty list or set)
127.0.0.1:6380>
Explanation
spring-data-redis: 2.2.6.RELEASE
jedis: 3.2.0
Let’s look at the code for RedisTemplate:
@Override
public Long delete(Collection<K> keys) {

    if (CollectionUtils.isEmpty(keys)) {
        return 0L;
    }

    byte[][] rawKeys = rawKeys(keys);
    return execute(connection -> connection.del(rawKeys), true);
}
Nothing special there, so let's move on. Because we are in cluster mode, the implementation that actually handles the command is JedisClusterKeyCommands:
@Override
public Long del(byte[]... keys) {

    Assert.notNull(keys, "Keys must not be null!");
    Assert.noNullElements(keys, "Keys must not contain null elements!");

    if (ClusterSlotHashUtil.isSameSlotForAllKeys(keys)) {
        try {
            return connection.getCluster().del(keys);
        } catch (Exception ex) {
            throw convertJedisAccessException(ex);
        }
    }

    return (long) connection.getClusterCommandExecutor()
            .executeMultiKeyCommand((JedisMultiKeyClusterCommandCallback<Long>) (client, key) -> client.del(key),
                    Arrays.asList(keys))
            .resultsAsList().size();
}
It first checks whether all the incoming keys live in the same slot: ClusterSlotHashUtil.isSameSlotForAllKeys computes the slot value of every key that was passed in.
In practice, since slots are spread evenly across the cluster, a batch of keys almost never lands in a single slot (unless you deliberately group them with hash tags), so most calls go down the executeMultiKeyCommand branch.
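As a quick aside, you can check the slot behaviour yourself with the very same utility class the code above calls. A minimal sketch, assuming spring-data-redis is on the classpath; the keys are made up for illustration. Only the part inside {} is hashed, which is why hash tags force keys into one slot.

import org.springframework.data.redis.connection.ClusterSlotHashUtil;

public class SlotCheckDemo {
    public static void main(String[] args) {
        // Ordinary keys almost never share a slot, so del() takes the
        // executeMultiKeyCommand branch.
        boolean plain = ClusterSlotHashUtil.isSameSlotForAllKeys(
                "order:1".getBytes(), "order:2".getBytes(), "order:3".getBytes());

        // Keys wrapped in the same hash tag "{order}" hash to the same slot,
        // so del() can send one DEL for all of them.
        boolean tagged = ClusterSlotHashUtil.isSameSlotForAllKeys(
                "{order}:1".getBytes(), "{order}:2".getBytes(), "{order}:3".getBytes());

        System.out.println("plain keys same slot?  " + plain);   // very likely false
        System.out.println("tagged keys same slot? " + tagged);  // true
    }
}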
public <S, T> MultiNodeResult<T> executeMultiKeyCommand(MultiKeyClusterCommandCallback<S, T> cmd,
        Iterable<byte[]> keys) {

    Map<RedisClusterNode, PositionalKeys> nodeKeyMap = new HashMap<>();

    int index = 0;
    for (byte[] key : keys) {
        for (RedisClusterNode node : getClusterTopology().getKeyServingNodes(key)) {
            nodeKeyMap.computeIfAbsent(node, val -> PositionalKeys.empty()).append(PositionalKey.of(key, index++));
        }
    }

    Map<NodeExecution, Future<NodeResult<T>>> futures = new LinkedHashMap<>();

    for (Entry<RedisClusterNode, PositionalKeys> entry : nodeKeyMap.entrySet()) {
        if (entry.getKey().isMaster()) {
            for (PositionalKey key : entry.getValue()) {
                futures.put(new NodeExecution(entry.getKey(), key),
                        executor.submit(() -> executeMultiKeyCommandOnSingleNode(cmd, entry.getKey(), key.getBytes())));
            }
        }
    }

    return collectResults(futures);
}
Inside this method, each key is first grouped by the Redis master node that serves it.
Each single-key command is then submitted asynchronously to the executor, one Future per key, targeting the corresponding master node.
After all the commands have been submitted, collectResults(futures) waits for every result to come back, turning the asynchronous calls back into one synchronous call.
Internally it simply keeps looping over the futures and waits until each of them has finished.
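The idea is just "fan out, then block until everything is back". Here is a tiny self-contained sketch of that pattern (my own illustration, not the actual Spring collectResults source, which also handles timeouts and exception translation):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FanOutAndWaitDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(3);

        // Fan out: one task per "node command".
        List<Future<String>> futures = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final int n = i;
            futures.add(executor.submit(() -> "result-from-node-" + n));
        }

        // Collect: block until every future has completed, then return synchronously.
        List<String> results = new ArrayList<>();
        for (Future<String> future : futures) {
            results.add(future.get());
        }

        System.out.println(results);
        executor.shutdown();
    }
}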
The key problem lies in where these tasks are submitted for execution:
executor.submit(() -> executeMultiKeyCommandOnSingleNode(cmd, entry.getKey(), key.getBytes()))
Let’s look at the Executor object:
It’s an AsyncTaskExecutor object, so what’s its corePoolSize?
In ClusterCommandExecutor, there are two initializations for executor:
1 A constructor:
public ClusterCommandExecutor(ClusterTopologyProvider topologyProvider, ClusterNodeResourceProvider resourceProvider,
        ExceptionTranslationStrategy exceptionTranslation, @Nullable AsyncTaskExecutor executor) {

    this(topologyProvider, resourceProvider, exceptionTranslation);
    this.executor = executor;
}
However, this constructor is not used anywhere.
2 An instance initializer block:
{
    if (executor == null) {
        ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
        threadPoolTaskExecutor.initialize();
        this.executor = threadPoolTaskExecutor;
    }
}
To be honest, you don't see instance initializer blocks written like this very often anymore.
This creates a ThreadPoolTaskExecutor with several key properties:
private int corePoolSize = 1;
private int maxPoolSize = Integer.MAX_VALUE;
private int keepAliveSeconds = 60;
private int queueCapacity = Integer.MAX_VALUE;
private boolean allowCoreThreadTimeOut = false;
The maximum queue length is Integer.MAX_VALUE.
The maximum number of threads is Integer.MAX_VALUE.
With this configuration, only one thread in the pool ever processes tasks; a second worker thread would be created only once the queue is full, and the queue is effectively unbounded...
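You can see the effect with a tiny standalone sketch (my own demo, not from the original project): with the default settings, every submitted task runs on the same single worker thread, strictly one after another.

import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

public class SingleWorkerDemo {
    public static void main(String[] args) throws Exception {
        // Same defaults as the instance initializer block above:
        // corePoolSize = 1, unbounded queue, maxPoolSize = Integer.MAX_VALUE.
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.initialize();

        for (int i = 0; i < 5; i++) {
            final int taskNo = i;
            executor.submit(() -> {
                System.out.println("task " + taskNo + " runs on " + Thread.currentThread().getName());
                try {
                    Thread.sleep(200);
                } catch (InterruptedException ignored) {
                }
            });
        }

        // All five tasks print the same thread name and run serially, because a new
        // worker thread would only be created once the (unbounded) queue is full.
        Thread.sleep(2000);
        executor.shutdown();
    }
}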
Even worse, there is no way to configure this executor property from the outside, because its owner, ClusterCommandExecutor, is not managed by the Spring container.
So, the problem is found. Because of how Spring wraps Jedis in cluster mode, every call to the batch method gets split up and submitted to an executor that has only one core thread. Under high concurrency, a large number of these tasks simply sit in the queue waiting to be executed.
Other
The associated code has been uploaded to github github.com/huifrank/re…
It contains the Redis scripts for creating the cluster, either with Docker or directly on the host.
If you are on a Mac, don't create the Redis cluster with Docker: Docker's host network mode is not supported on macOS (Linux only). If you use a bridge network instead, Spring executes CLUSTER NODES and gets back the Docker-internal addresses, so even though you configured the mapped ports, Spring still tries to connect to those internal addresses and startup fails.
I fiddled with it for a while and then gave up…