Author | Breeze, senior R&D expert

Introduction

New Year’s Day is coming. Facing a surge in push demand, delivery pressure on the order of tens of billions of messages, and push services for hundreds of thousands of apps, how can we keep optimizing the technical solution so that high-priority messages are delivered in real time? This article describes a priority scheme for message push at the ten-billion-message scale, covering the business scenarios, the design approach, and the details of each solution.

Business scenarios

In day-to-day message push, Getui regularly faces challenges in the following four areas.

**• High concurrency:** A single server must handle millions of concurrent requests during the daily peak periods.

**• Low latency:** Push requests must be answered within milliseconds so that hundreds of millions of messages can be delivered within seconds.

**• Massive data:** Messages must be delivered in real time across tens of billions of push delivery requests every day.

**• Massive user base:** The target groups that meet the push conditions must be selected quickly from the roughly 10 billion externally facing user identifiers in Getui's business layer.

When multiple apps deliver messages at the same time, they inevitably compete for resources. Priority queues are therefore required: when delivery resources are fixed, higher-priority users must receive a larger share of them.

Design approach

To address these four challenges, we are guided by the following three principles:

• Ensure relative fairness and avoid starvation

• Avoid mutual interference and blocking

• Dynamically adjust the delivery speed

Solutions

Based on the above three principles and combined with specific business scenarios, we formulated the following two solutions.

Solution overview

• A Kafka-based priority queue scheme addresses the three main problems of the message delivery scenario: widely varying task sizes, large delivery volume, and low latency requirements.

• A Pulsar-based priority queue scheme addresses the two main problems of the message receipt scenario: inconsistent response times and real-time priority adjustment.

1. Priority queue scheme based on Kafka

The push tasks Getui receives are generally concentrated in three periods, 7:00-9:00, 12:00-13:00 and 19:00-21:00, which together account for 70 to 80 percent of the day's messages. For these tasks, we select the target users according to complex conditions such as phone model, region, and user-group characteristics. The number of users varies enormously between customers, from a few hundred to hundreds of millions. When these customers send push requests at the same time, Getui must still deliver messages in real time and keep one customer's task from slowing down another's delivery.

Business logic

Given these challenges and our design approach, the solution we finally implemented is shown in the figure:

During off-peak periods, when machine resources run at a low water level, messages do not enter any queue and are sent directly to the app.

When push concurrency rises and machine resources run at a medium water level, messages enter the internal queue and wait there until they are consumed and sent to the app.

When push concurrency keeps rising and machine resources run at a high water level, traffic peaks must be smoothed and resource competition reduced. This is implemented as follows:

Messages are queued in the external queue, Kafka. The internal queue accepts messages only from the external queue. Messages are then consumed from the internal queue and sent to the app.

Once all messages in the external queue have been consumed and resource usage has gradually fallen back to the medium water level, the internal queue starts accepting push messages from other channels again. This avoids resource contention between messages and keeps the downstream processing logic consistent.
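The water-level routing described above can be sketched in Python as follows. This is a minimal illustration, not Getui's implementation: the class name `PushRouter`, the utilization thresholds (0.5 and 0.8), and the route labels are all hypothetical.

```python
class PushRouter:
    """Chooses a delivery path from resource utilization (0.0 to 1.0)."""

    def __init__(self, medium_threshold=0.5, high_threshold=0.8):
        # Both thresholds are assumed values for illustration only.
        self.medium_threshold = medium_threshold
        self.high_threshold = high_threshold
        self.external_mode = False  # True while Kafka is the only entry point

    def route(self, utilization, external_backlog):
        """Return 'direct', 'internal_queue', or 'external_queue'."""
        if utilization >= self.high_threshold:
            # High water level: everything goes through the external queue.
            self.external_mode = True
        elif self.external_mode and external_backlog == 0:
            # Leave external mode only after the external queue is drained
            # and utilization has dropped below the high water level.
            self.external_mode = False

        if self.external_mode:
            return "external_queue"
        if utilization >= self.medium_threshold:
            return "internal_queue"
        return "direct"


router = PushRouter()
print(router.route(0.2, external_backlog=0))    # low water level
print(router.route(0.9, external_backlog=0))    # high water level
print(router.route(0.6, external_backlog=100))  # still draining Kafka
print(router.route(0.6, external_backlog=0))    # back to the internal queue
```

The `external_mode` flag captures the rule that, after a high-water period, other channels are admitted to the internal queue only once the external queue has been fully consumed.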

The details below use the case where resources are running at a high water level as an example.

Details of the plan

As can be seen from the figure above, push messages are classified as high, medium or low priority according to the App ID they belong to. Within the same priority, Kafka producers first send large and small tasks to different topics, and the corresponding Kafka consumer threads put the messages into bounded blocking queues. Messages are then consumed in a 6:3:1 ratio. For example, if 1000 push messages are selected at a time, the high, medium and low priorities get 600, 300 and 100 of them respectively.
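The topic split and the bounded buffers can be sketched as follows. This is an illustration under stated assumptions: the topic naming pattern, the size cut-off `LARGE_TASK_THRESHOLD`, and the capacity `BUFFER_CAPACITY` are hypothetical, and an in-process `queue.Queue` stands in for the bounded blocking queue that a real Kafka consumer thread would fill.

```python
import queue

PRIORITIES = ("high", "medium", "low")
LARGE_TASK_THRESHOLD = 100_000  # hypothetical cut-off between small and large tasks
BUFFER_CAPACITY = 1_000         # hypothetical bound for each in-memory queue


def topic_for(priority, target_users):
    """Pick a topic from the task's priority and its number of target users."""
    size = "large" if target_users >= LARGE_TASK_THRESHOLD else "small"
    return f"push-{priority}-{size}"


# One bounded queue per topic, so a large task cannot fill a small task's buffer.
buffers = {
    f"push-{priority}-{size}": queue.Queue(maxsize=BUFFER_CAPACITY)
    for priority in PRIORITIES
    for size in ("small", "large")
}


def buffer_message(topic, message, timeout=0.05):
    """Try to buffer a consumed message; return False if the queue stays full."""
    try:
        buffers[topic].put(message, timeout=timeout)
        return True
    except queue.Full:
        # The consumer thread should pause here instead of dropping the message.
        return False


topic = topic_for("high", target_users=500)
print(topic, buffer_message(topic, {"msg_id": 1}))
```

Because each queue is bounded, a slow downstream stage applies back-pressure to its own consumer thread only, which is one way to keep tasks of different sizes from blocking one another.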

The scheduling thread then sends this batch of messages to the phones. If high-priority traffic is light and its quota is not used up, say only 300 of the 600 slots are taken, the remaining 300 slots are reallocated to the medium- and low-priority consumer threads in a 3:1 ratio. This maximizes resource utilization while ensuring that large, medium and small tasks do not block one another and that tasks of different priorities do not interfere with each other.
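The quota arithmetic above can be checked with a short sketch. The function name `allocate_quota` and its parameters are hypothetical; it splits a batch by weight and re-splits any unused quota among the priorities that still have messages waiting.

```python
def allocate_quota(batch_size, weights, backlog):
    """Split batch_size by weight; unused quota is re-split among the rest."""
    allocation = {p: 0 for p in weights}
    remaining = batch_size
    active = {p for p in weights if backlog.get(p, 0) > 0}

    while remaining > 0 and active:
        total_weight = sum(weights[p] for p in active)
        by_priority = sorted(active, key=lambda p: -weights[p])
        granted = 0
        for p in by_priority:
            share = remaining * weights[p] // total_weight
            take = min(share, backlog[p] - allocation[p])
            allocation[p] += take
            granted += take
        if granted == 0:
            # Integer rounding produced zero shares: hand out the last
            # few slots in priority order and stop.
            for p in by_priority:
                take = min(remaining, backlog[p] - allocation[p])
                allocation[p] += take
                remaining -= take
            break
        remaining -= granted
        # Priorities whose backlog is exhausted drop out of the next round.
        active = {p for p in active if allocation[p] < backlog[p]}

    return allocation


weights = {"high": 6, "medium": 3, "low": 1}

# All priorities busy: the plain 6:3:1 split.
print(allocate_quota(1000, weights, {"high": 5000, "medium": 5000, "low": 5000}))
# {'high': 600, 'medium': 300, 'low': 100}

# High priority has only 300 messages: its unused 300 slots go 3:1.
print(allocate_quota(1000, weights, {"high": 300, "medium": 5000, "low": 5000}))
# {'high': 300, 'medium': 525, 'low': 175}
```

In the second call, the 300 unused high-priority slots are divided 225:75 between medium and low, which matches the 3:1 reallocation described in the text.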

2. Priority queue scheme based on Pulsar

After a message is delivered, most customers want to know promptly whether the push reached the app, whether the system displayed it, and whether the user clicked it. As a result, the volume of receipts we send back to customers every day is huge. The main challenges in this scenario are:

Because customers differ in network conditions, machine performance, business processing logic and so on, the time their servers take to respond to a receipt also differs. Fast ones respond in little more than ten milliseconds, while slow ones take more than ten seconds or even longer. In addition, some customers' response times fluctuate irregularly within this range.

Because receipt volume is so large, receipts must also queue in Pulsar before being sent to the customer's server. To speed up receipt delivery, some customers ask for a priority adjustment, so our servers must be able to apply such changes in real time.

Given these challenges and the design approach described above, our implemented solution is shown in the figure:

Compared with the Kafka-based priority queue scheme, the business logic of this scheme is relatively simple. All receipt messages first enter the external queue, and the scheduling thread then sends them to the customer's server.

Since we needed to adjust priorities in real time, and wanted customers with different receipt response speeds not to block or affect one another, Pulsar was our first choice. Its data transfer performance is excellent, and it can support millions of topics.

Using this capability, Getui creates a separate topic for each app, which lays a solid foundation for adjusting priority and delivery speed in real time. See the details below.

Details of the plan

As can be seen from the figure above, the receipt messages are divided into different groups, which are sent to customers according to their priorities. The specific steps are as follows:

First, after a receipt from a phone arrives at the push server, it is queued in the external queue, Pulsar. Each Pulsar consumer group then fetches a batch of receipts, split 6:3:1 across the priorities within that group, and sends them to the customer's server.

The group that an app's receipts belong to is not fixed. At regular intervals, Getui uses the K-means clustering algorithm to regroup apps according to their response times and dynamically adjusts each app's group. This keeps delivery fast, ensures that customers in different groups do not slow down one another's receipts, and lets higher-priority users within the same group obtain more resources.
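The regrouping step can be illustrated with a minimal one-dimensional K-means in plain Python. This is a sketch under stated assumptions, not Getui's implementation: the function names, the choice of three groups, the deterministic initialization, and the sample response times are all hypothetical, and the input is assumed to be non-empty.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster numbers into k groups; returns the final centroids."""
    ordered = sorted(values)
    # Deterministic start: spread initial centroids evenly over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def group_apps(response_ms, k=3):
    """Map each app to a group index, where 0 is the fastest-responding group."""
    centroids = kmeans_1d(list(response_ms.values()), k)
    order = sorted(range(k), key=lambda i: centroids[i])
    rank = {index: position for position, index in enumerate(order)}
    return {
        app: rank[min(range(k), key=lambda i: abs(ms - centroids[i]))]
        for app, ms in response_ms.items()
    }


# Hypothetical average receipt response times per app, in milliseconds.
samples = {"app_a": 12, "app_b": 15, "app_c": 900,
           "app_d": 1100, "app_e": 12000, "app_f": 15000}
print(group_apps(samples))
```

Rerunning `group_apps` periodically on fresh measurements moves an app to another group when its response time changes. Since response times span milliseconds to tens of seconds, clustering on the logarithm of the response time may separate groups better; that is a design option, not something the article states.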

Conclusion

The two solutions above target the message delivery and message receipt scenarios respectively. The key points are:

• The scenarios differ

Message delivery: During peak periods, the number of messages to deliver is large, push tasks vary in size, and messages must be delivered quickly. For this scenario, Getui chose the Kafka-based priority queue scheme (internal and external queues).

Message receipt: Receipts with different response times must be grouped dynamically and priorities adjusted dynamically. For this scenario, Getui chose the Pulsar-based priority queue scheme (different groups).

• The MQ differs by scheme

Internal and external priority queue scheme: Kafka was chosen to meet the need to transfer a large amount of message data in a short time.

Group-based priority queue scheme: Pulsar was chosen to meet the need to create a dedicated topic for each app.

We suggest that developers weigh their business scenarios carefully when choosing a solution, so as to find the one that fits best in practice. Getui has served hundreds of thousands of apps and will continue to expand in the field of “message push”, sharing cutting-edge ideas and the latest practices with developers.