I. Characteristics of the Push Platform
The vivo push platform is a message push service that vivo provides to developers. By maintaining a stable, reliable long connection between the cloud and the client, it delivers messages from developers to client applications in real time, supports tens of billions of notification/message pushes, and can reach mobile users within seconds.
The push platform is characterized by high concurrency, large message volume, and strict delivery timeliness. Currently the peak push speed is 1.4 million messages per second, the maximum daily message volume is 15 billion, and the end-to-end online delivery rate within one second is 99.9%.
II. Redis on the Push Platform
The vivo push platform's characteristics impose high requirements on concurrency and timeliness, with a large message volume but a short message validity period. The platform therefore chose the Redis middleware for message storage and relay, and for token information storage. Previously it mainly used two Redis clusters, both in Redis Cluster mode. The two clusters are as follows:
Operations on Redis mainly include the following aspects:
1) In the push link, the access layer stores the message body in the MSG Redis cluster, using the message's validity period as the expiration time of the stored entry.
2) After a series of logic, the push service layer reads the message body from the MSG Redis cluster and queries the client information from the client Redis cluster. If the client is online, the message is pushed directly; if the client is offline, the message ID is written to a wait queue.
3) When the client reconnects, the push service layer reads the messages in the wait queue and pushes them.
4) The storage management service periodically scans the cii index. Based on the last update time recorded in cii, if a user has not been active within 14 days, the service clears the token information and the messages corresponding to that token in the wait queue.
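The push-link steps above can be sketched as follows. This is a minimal, self-contained illustration in which plain dictionaries stand in for the MSG and client Redis clusters; all names are illustrative, not the platform's actual code.

```python
msg_store = {}       # stands in for the MSG Redis cluster (message bodies)
client_store = {}    # stands in for the client Redis cluster (client info)
wait_queue = {}      # client_id -> [message_id, ...]
delivered = []       # record of (client_id, body) pushes, for illustration

def access_layer_store(message_id, body):
    """Step 1: the access layer stores the message body.
    In real Redis this write would carry the message validity as a TTL."""
    msg_store[message_id] = body

def push(message_id, client_id):
    """Step 2: push directly if the client is online, else queue the message ID."""
    body = msg_store.get(message_id)
    client = client_store.get(client_id, {})
    if client.get("online"):
        delivered.append((client_id, body))
    else:
        wait_queue.setdefault(client_id, []).append(message_id)

def on_connect(client_id):
    """Step 3: on reconnect, drain and push the client's wait queue."""
    client_store.setdefault(client_id, {})["online"] = True
    for message_id in wait_queue.pop(client_id, []):
        delivered.append((client_id, msg_store.get(message_id)))
```

Step 4 (the 14-day cii cleanup) would simply delete the `client_store` entry and the corresponding `wait_queue` list for inactive users.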
The flow chart of Redis operations in the push link is as follows:
III. Online Problems on the Push Platform
As described above, the push platform mainly uses the MSG and client Redis clusters. As the business grew, the system's performance requirements rose, and Redis showed some bottlenecks. Before optimization, the MSG Redis cluster had reached 220 masters with a capacity of 4,400 GB. As a cluster grows, maintenance becomes harder and the accident rate rises. In particular, a celebrity divorce event in April generated 520 million messages in real time, and the MSG Redis cluster suffered a surge in single-node connections and memory: one node's connection count reached 24,674 and its memory reached 23.46 GB, lasting about 30 minutes. During that period the cluster's reads and writes were slow, with an average response time of about 500 ms, which hurt the stability and availability of the whole system; availability dropped to 85%.
IV. Redis Optimization on the Push Platform
Redis is generally optimized from the following aspects:
1) Capacity: Redis is an in-memory store, so its storage cost is higher than that of a disk-based database. Memory storage is exactly what gives Redis its high read/write performance, but space is limited. Therefore, keep the stored content as hot as possible, evaluate capacity in advance, and set expiration times wherever possible. In the storage design, use appropriate data structures, and compress relatively large values before storing them.
2) Hot key skew: Redis Cluster maps all data to slots [0, 16383], and each node is responsible for some of the slots. When a request arrives, the target slot is determined by CRC16(key) mod 16384. Because each node serves only a portion of the slots, key design should ensure that hash values are randomly distributed, especially when keys are derived via a hash algorithm. In addition, control concurrency on hot keys: rate limiting, degradation, or a local cache can prevent excessive concurrent requests for one hot key from skewing load toward a single Redis node.
3) Oversized clusters: Redis Cluster uses a decentralized structure; every node stores data plus the state of the whole cluster, and every node connects to every other node. Each node holds the full node-to-slot mapping, so the more nodes there are, the larger the mapping each node maintains and the more data each inter-node heartbeat packet carries. During scaling, it takes clients relatively long to refresh the cluster slots mapping, the cluster may block, and stability may suffer. So avoid putting too many nodes in one cluster, and split clusters along business lines instead.
This raises a question: why does Redis Cluster use 16,384 slots rather than more, and how many nodes can it have at most?
The Redis author has explained this: each heartbeat packet carries a node's slot bitmap, and at 16,384 slots that bitmap is only 2 KB (65,536 slots would take 8 KB); meanwhile it is not recommended to run more than 1,000 master nodes in a Redis Cluster, for which 16,384 slots already provide enough granularity.
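The slot computation in point 2) can be sketched as follows. Redis Cluster uses the CRC16/XMODEM checksum; this simplified version ignores the hash-tag rule (where only the substring inside `{...}` is hashed), and the keys used are illustrative.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM (poly 0x1021, init 0), the checksum Redis Cluster applies to keys."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Slot assignment: CRC16(key) mod 16384 (hash tags omitted for brevity)."""
    return crc16_xmodem(key.encode()) % 16384

# Standard CRC16/XMODEM check value:
assert crc16_xmodem(b"123456789") == 0x31C3

# One fixed key always maps to one slot, so all requests for it hit one node:
print(key_slot("mii:0"))

# Distinct key suffixes, even sequential ones, spread across many slots:
slots = {key_slot(f"mii:{i}") for i in range(1024)}
print(len(slots))   # close to 1024: the load spreads across nodes
```

This is why a single heavily requested key (like `mii:0` in section 4.3 below) concentrates all its traffic on the one node owning its slot.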
Based on the optimization directions above and its own business characteristics, the push platform started its Redis optimization from the following aspects:
- MSG Redis cluster capacity optimization;
- splitting the large MSG Redis cluster by business attributes;
- Redis hot key troubleshooting;
- client Redis cluster concurrent-call optimization.
4.1 Capacity optimization of MSG Redis cluster
As mentioned above, the MSG Redis cluster had reached 220 masters with a capacity of 4,400 GB, of which 3,650 GB (about 83%) was used at peak. If push volume grew further, the cluster would need expanding, at too high a cost. We therefore analyzed the cluster's contents with RDR, the Snowball (Xueqiu) open-source RDB analysis tool; it is not introduced in detail here, and the tool can be downloaded from its GitHub repository. RDR analyzes a Redis snapshot and reports capacity by Redis structure type, key counts, the top 100 largest keys, and key-prefix counts and capacities.
Conclusion of the analysis: in the MSG Redis cluster, structures with the mi: prefix account for about 80% of capacity, and single-push messages account for 80% of those. Definitions:
- Single push: one message pushed to one user.
- Group push: one message can be pushed repeatedly to many users, so the message body is reused.
The defining feature of single push is one-to-one delivery: after the push succeeds or fails (due to flow control, invalid users, and so on), the message body is never used again.
Optimization scheme:
- Delete single-push messages promptly. If the user has received the single-push message and a puback receipt has arrived, delete the message body from Redis directly; if a single push is suppressed for reasons such as flow control, delete the message body as well.
- For messages with identical content, use aggregated storage: store the content once, and use the message ID as the identifier for each of the multiple pushes.
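The two measures above can be sketched like this. Plain dictionaries stand in for Redis and all names are illustrative assumptions; in production the reference tracking would live in Redis itself (for example a counter) rather than the linear scan used here for brevity.

```python
import hashlib

message_bodies = {}    # content_hash -> body, each distinct content stored once
message_index = {}     # message_id -> content_hash

def store_message(message_id: str, body: bytes) -> None:
    """Aggregated storage: identical content shares one stored body."""
    digest = hashlib.sha1(body).hexdigest()
    message_bodies.setdefault(digest, body)
    message_index[message_id] = digest

def load_message(message_id: str) -> bytes:
    return message_bodies[message_index[message_id]]

def delete_single_push(message_id: str) -> None:
    """After a puback receipt (or flow-control suppression), drop the
    single-push message promptly; free the body once nothing references it."""
    digest = message_index.pop(message_id, None)
    if digest is not None and digest not in message_index.values():
        message_bodies.pop(digest, None)   # last reference gone -> free the body
```

A group push stores its body once and pushes it under many message IDs; a single push is deleted as soon as its one delivery completes, which is where the capacity savings come from.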
The optimization's effect was obvious: after full rollout, capacity dropped by 2,090 GB from the original peak of 3,650 GB, a reduction of about 58%.
4.2 Splitting the Large MSG Redis Cluster by Business Attributes
Although the cluster capacity has been optimized, MSG Redis is still under great pressure during peak hours.
Main reasons:
1) Many nodes connect to MSG Redis, so the connection count is high at peak times.
2) Message bodies and wait queues are stored in the same cluster and both are operated on during a push, so Redis concurrency is high and peak CPU load exceeds 90%.
3) The old cluster ran Redis 3.x; after the split, the new cluster runs 4.x, which has the following advantages over 3.x:
- PSYNC2.0: fixes the problem in earlier versions where a master-replica switchover inevitably triggered a full resynchronization.
- A new cache eviction algorithm, LFU (Least Frequently Used), plus optimizations to the existing algorithms.
- Non-blocking deletion (lazy free): UNLINK and FLUSHDB/FLUSHALL ASYNC remove big keys without blocking Redis.
- The MEMORY command, for more comprehensive memory monitoring and statistics.
- Lower memory usage: the same amount of data needs less memory space.
- Online memory defragmentation, reclaiming memory step by step, available when Redis uses the jemalloc allocator.
The split scheme divides what MSG Redis stores by business attribute, separating message bodies and wait queues into two independent clusters. That gives two ways to split:
Solution 1: Split the wait queue from the old cluster
Only the push nodes would need modification, but the wait queue is continuous and stateful: it is tied to each clientId's online status and its values are updated in real time, so switching it over would lose data.
Solution 2: Split the message body from the old cluster
All nodes connected to MSG Redis restart with the new address, and the push nodes double-read; when the old cluster's hit rate drops to 0, reads switch entirely to the new cluster. Message bodies are only written and read, never updated, so the switchover involves no state: as long as writes and reads both work, it is safe. Moreover, message-body capacity grows incrementally, which makes rapid expansion easy, and the new cluster runs version 4.0, which supports convenient dynamic scaling.
Considering the impact on the business and on service availability, and to guarantee no message loss, we finally chose Solution 2. The dual-read, single-write scheme is designed as follows:
Because message bodies switch to the new cluster, new message bodies are written only to the new cluster for a transition period (up to 30 days) while the old cluster still holds the old message bodies. During this period the push nodes must double-read to avoid losing data, and the read rule must be adjustable dynamically, with no code change or service restart, so that dual reads stay efficient.
There are four rules in all: read only the old cluster, read only the new cluster, read old then new, and read new then old.
Design idea: the server supports all four strategies, and the configuration center decides which rule is in effect.
The rule is chosen from the old cluster's hit count and hit rate. Early after launch, configure "read old first, then new"; once the old cluster's hit rate falls below 50%, switch to "read new first, then old"; when the old cluster's hit count reaches 0, switch to "read only new".
The old cluster's hit rate and hit count are reported through general-purpose monitoring instrumentation.
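The four read rules and the switching conditions above can be sketched as follows. This is a minimal illustration; the function names, the stub lookups, and the configuration-center wiring are assumptions, not the platform's actual code.

```python
READ_OLD, READ_NEW, OLD_THEN_NEW, NEW_THEN_OLD = range(4)

def choose_strategy(old_hits: int, old_hit_ratio: float) -> int:
    """Rule from the text: old-then-new early on; new-then-old once the old
    cluster's hit rate falls below 50%; new-only once old hits reach 0.
    (In production this decision sits in the configuration center.)"""
    if old_hits == 0:
        return READ_NEW
    if old_hit_ratio < 0.5:
        return NEW_THEN_OLD
    return OLD_THEN_NEW

def read_message(key, old_get, new_get, strategy):
    """old_get / new_get are lookups against the old and new clusters."""
    order = {
        READ_OLD: [old_get],
        READ_NEW: [new_get],
        OLD_THEN_NEW: [old_get, new_get],
        NEW_THEN_OLD: [new_get, old_get],
    }[strategy]
    for get in order:
        value = get(key)
        if value is not None:
            return value
    return None

# Stub lookups standing in for the two clusters:
old_cluster = {"m-old": "body-old"}
new_cluster = {"m-new": "body-new"}
strategy = choose_strategy(old_hits=120, old_hit_ratio=0.8)
print(read_message("m-old", old_cluster.get, new_cluster.get, strategy))  # body-old
```

Because message bodies are never updated, whichever cluster answers first is authoritative, which is what makes this switchover stateless.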
The flow chart of Solution 2 is as follows:
Effect after the split:
- Before the split, the old MSG Redis cluster's peak load exceeded 95%.
- After the split, peak load in the same period dropped to 70%, down roughly 25 percentage points.
Before the split, the MSG Redis cluster's average response time at peak was 1.2 ms, with occasional slow Redis calls; after the split, the average response time fell to 0.5 ms, with no slow responses at peak.
4.3 Redis Hot Key Troubleshooting
As mentioned earlier, during a celebrity hot event in April, MSG Redis suffered a surge in single-node connections and memory: one node reached 24,674 connections and 23.46 GB of memory.
Because the Redis cluster runs on virtual machines, we first suspected pressure on the host machines. Investigation showed that the host carrying the faulty node hosted about 10 Redis master nodes, while other hosts carried only 2 to 4, so we rebalanced the masters so that each host carried a more even share. The rebalancing brought some improvement, but during peak pushes, especially at full speed, single-node connection and memory surges still occurred occasionally. Host NIC traffic in and out showed no bottleneck, and interference from other service nodes on the host was ruled out, so we suspected a hot key skew in how Redis was being used.
As the figure below shows, from 11:49 to 12:59 a large amount of time was spent in MSG Redis HEXISTS calls, which mainly check whether a message is in the mii index. Link analysis showed that the most time-consuming key was mii:0, and analysis of the faulty node's Redis memory snapshot showed that mii:0 also occupied a large share of capacity: mii:0 was a hot key.
Further investigation found that the messageIds produced by the snowflake algorithm were skewed. Within a given millisecond the sequence value starts from 0, and the sequence field is 12 bits long, so messageIds generated by the low-concurrency management background and API nodes almost always end in 12 zero bits. Since the mii index key is mii:${messageId % 1024} and those messageIds end in 12 zero bits, messageId % 1024 is always 0, producing the Redis hot key.
Optimization measures:
1) Modify the snowflake algorithm: the initial sequence value used when generating message IDs is no longer 0 but a random number in [0, 1023], preventing hot key skew.
2) Replace the HEXISTS call: use the message type carried in the MSG message body, together with whether the message body exists, instead of querying the mii index.
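A minimal sketch of the skew and of fix 1). The bit layout used here is the common 41/10/12 snowflake split and the epoch is illustrative; the platform's exact layout may differ.

```python
import random

EPOCH = 1288834974657  # illustrative custom epoch, milliseconds

def snowflake_id(ts_ms, worker_id, seq):
    """41-bit timestamp | 10-bit worker | 12-bit sequence."""
    return ((ts_ms - EPOCH) << 22) | (worker_id << 12) | seq

# Original algorithm on a low-concurrency node: at most one ID per millisecond,
# so the sequence always restarts at 0 and every ID lands in bucket 0.
skewed = [snowflake_id(EPOCH + 1000 + i, worker_id=1, seq=0) for i in range(1000)]
print(len({mid % 1024 for mid in skewed}))   # 1: every ID maps to mii:0

# Fixed algorithm: the per-millisecond sequence starts at a random value
# in [0, 1023], so messageId % 1024 spreads across the 1024 index buckets.
fixed = [snowflake_id(EPOCH + 1000 + i, worker_id=1, seq=random.randint(0, 1023))
         for i in range(1000)]
print(len({mid % 1024 for mid in fixed}))    # several hundred distinct buckets
```

Since `messageId % 1024` only looks at the low 10 bits, and those bits come entirely from the sequence field, randomizing the sequence start is enough to break the skew.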
Final result: after optimization, the mii index is evenly distributed, the Redis connection count is stable, memory growth is steady, and single-node memory and connection explosions no longer occur.
4.4 Client Redis Cluster Concurrent Invocation Optimization
Upstream nodes invoke push nodes by consistent hashing on clientId, and push nodes cache clientInfo locally for 7 days. When pushing, a node first checks its local cache to decide whether the client is valid, but important, frequently changing information is always fetched directly from client Redis. As a result, the client Redis cluster is under heavy pressure: high concurrency and high CPU load during peak pushes.
Flow chart of how push nodes operated the local cache and client Redis before optimization:
Optimization scheme: split the original clientInfo cache into three local caches, layered as follows.
- cache still stores the rarely changing clientInfo, with the cache time still 7 days.
- cache1 stores frequently changing clientInfo, such as online status and the CN address.
- cache2 stores the encryption parameters (part of ci). This cache is needed only when encryption is used; it changes rarely, and only when a connection is established.
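A minimal sketch of the layered local cache with read-through loading from client Redis. The `TtlCache` class, the 10-minute TTL for cache1, and all names are illustrative assumptions, not vivo's actual code.

```python
import time

class TtlCache:
    """Minimal local cache with per-entry expiry."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.data = {}                    # key -> (value, expires_at)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.data.get(key)
        if entry and entry[1] > now:
            return entry[0]
        self.data.pop(key, None)          # drop expired or missing entry
        return None

    def put(self, key, value, now=None):
        now = time.time() if now is None else now
        self.data[key] = (value, now + self.ttl)

    def invalidate(self, key):
        self.data.pop(key, None)

WEEK = 7 * 24 * 3600
cache  = TtlCache(WEEK)   # stable clientInfo, 7-day TTL
cache1 = TtlCache(600)    # volatile info (online status, CN address); TTL assumed
cache2 = TtlCache(WEEK)   # encryption parameters, loaded only on connect

def load_client(client_id, redis_lookup):
    """Read-through: local cache first, fall back to client Redis."""
    info = cache.get(client_id)
    if info is None:
        info = redis_lookup(client_id)    # one Redis call instead of one per push
        cache.put(client_id, info)
    return info
```

The consistency measures described next map onto `invalidate` and `put`: a broker error code or a disconnect event invalidates the entry, and the next push reloads it from client Redis.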
Because new caches were introduced, cache consistency has to be considered, so the following measures were added:
1) Cache verification on push: when calling a broker node, the push node updates or clears local cache entries according to the broker's response. The broker added error codes for "offline" and "AES mismatch"; on the next push or retry, the node reloads the latest client information from client Redis.
2) Local cache entries are also updated or cleared according to upstream events from the mobile client, connect and disconnect; the next push or retry then reloads the latest client information from Redis.
The overall flow: when a message is pushed, the local caches are checked first, and client Redis is read only if an entry is missing or expired. After pushing to the broker, the cache is updated or invalidated according to the broker's response. On the uplink, disconnect and connect events promptly update or invalidate the cache, and the next push reloads from client Redis.
Flow chart of how push nodes operate the local cache and client Redis after optimization:
Effect after optimization:
1) The new cache1 has a 52% hit rate, and cache2 a 30% hit rate.
2) Concurrent calls to client Redis decreased by nearly 20%.
3) Redis peak load decreased by about 15%.
V. Summary
With its high concurrent performance and rich data structures, Redis is a good choice of caching middleware for high-concurrency systems. Of course, how well Redis performs ultimately depends on whether the business truly understands it and uses it properly. Note the following points:
1) In Redis Cluster mode each master is responsible for only part of the slots, so key design should fully consider randomness, distributing keys evenly across all nodes, and big keys should be avoided. The business should also avoid hot key patterns in which many requests concentrate on a few nodes at once.
2) Redis's actual throughput also depends on the request payload size and the NIC; the official documentation covers this, and performance deteriorates sharply once a single payload exceeds 1,000 bytes. Avoid big keys, and ideally run a performance stress test in the real business scenario, with the real network environment, bandwidth, and NIC, to establish the cluster's actual throughput.
Take our client Redis cluster as an example (for reference only):
- Network: 10 GbE;
- Redis version: 3.x;
- Payload size: 250 bytes on average;
- Commands: HSET (25%), HMSET (10%), HGET (60%), HMGET (5%);
- Performance: 5,500 connections, 48,000 requests/s, CPU about 95%.
Redis offers little support for real-time analysis: beyond basic metric monitoring, there is currently no real-time analysis of in-memory data. In practice, when a Redis bottleneck appears, monitoring data is often missing and the problem is hard to locate; data analysis can only be done by running analysis tools against Redis snapshot files. Using Redis well therefore requires understanding it thoroughly and designing the scheme with full consideration, stress-testing Redis against the actual business scenario to learn where the bottlenecks are, and preparing monitoring and scaling plans in advance.
Author: Yu Quan, Vivo Internet Server Team