Takeaway: The short video Push system is a distributed Push system at Baidu that supports multiple apps and multiple business scenarios. It currently provides Push services for apps such as Haokan Video, live streaming, Du Xiaoshi, and the Haokan Video large-font edition. It supports personalized Push based on users' basic characteristics, operation Push for popular activities and hot events, and real-time business Push based on follow or subscription relationships. Its goal is to deliver the content users like to their notification bar steadily and efficiently, through the personalized recommendation system and editorial operations, in order to improve user activity and retention.

The full text is 5886 words, and the expected reading time is 15 minutes.

Background:

In this era of information explosion, obtaining information promptly and accurately is one of the key problems to solve. Push technology has changed the traditional way of getting information: instead of users actively searching for information, information now actively finds the user, which on mobile networks better meets users' personalized needs. This article introduces the design, implementation, and continuous optimization of the short video Push system, and shares our experience building a Push system that handles hundreds of millions of records.

Notification Push: a message initiated by the server and displayed on the lock screen, in the notification bar, and as a badge on the app icon of the user's device.

Personalized Push: materials that users are interested in are selected through user profiles and recommendation models.

Operation Push: Push messages whose materials are manually edited by operations staff in the Push back end (for example, Push about hot activities and hot events).

Real-time Push: Push with relatively precise timing requirements, sent when users interact in the app (following, liking, commenting, etc.) or when a live-stream start alert needs to go out.

1. Understanding the system

1.1 System Overview

With the continuous growth of Baidu's short video business, the apps now have hundreds of millions of quarterly active users. Every day the Push system sends n personalized Push messages to each quarterly active user, plus a variable number of operation Push messages about popular events and hot topics, so data volume and concurrency are key concerns in the system design. In addition, hundreds of regional Push messages are sent every day to users in different regions, along with a large volume of real-time Push driven by follow relationships, which places strict requirements on system stability. Push is a very effective way to re-engage users, so the stability of the system is critical.

1.2 System architecture

The Push system serves Haokan Video, live streaming, Du Xiaoshi, the Haokan Video large-font edition, and other businesses. It subscribes to updates of video material information and user attribute information in real time, so that the information used to build each Push message body is accurate. In the early morning it requests the recommendation service to recall personalized materials, then creates Push tasks at the scheduled time points of operation Push and personalized Push. Each task is preprocessed half an hour in advance so that the Push can be sent as close to its scheduled time as possible. Real-time Push, such as user interaction messages and live-stream alerts, is sent to the Push preprocessing service in real time through API calls. After preprocessing, the result is written to a Redis queue, and the sending service sends it to the cloud Push center according to task priority. The cloud Push center then delivers the messages to users' phones through the vendors' agents or its own long-connection service.

The overall architecture is shown in the figure below:

1.2.1 Modules of the Push core architecture

1. Material center: stores the video material information required for Push, including titles, descriptions, material images, and Push status, and subscribes to the B-side video change message queue for real-time updates.

2. User center: stores the basic user information Push needs plus some user attributes unique to the Push system (for example, the user's estimated active time and the estimated sending times of the user's first and last personalized Push of the day); it is updated in real time from user information reported by the client.

3. Personalized recall: at 1:00 a.m. every day, personalized materials are recalled for quarterly active users, to be sent to them as personalized Push during the day.

4. Realtime-api service: writes data in real time to the preprocessing queue for preprocessing and sending; used in real-time Push scenarios.

5. Frequency control (UFC): prevents users from being disturbed. UFC works at two levels, daily and hourly. Daily control sets the maximum number of Push a user can receive in a day; at the hourly level, each user receives at most one Push every half hour.

6. Preprocessing service: splits incoming tasks half an hour in advance, builds the messages, and puts them into Push queues to ensure timely delivery of Push tasks.

7. Sending service: fetches each vendor's tasks from the Push queue according to send time and priority, splits the tasks according to the vendor's ups and QPS limits, and sends them to cloud Push.

8. Receipt service: records logs from each vendor's arrival receipts for data statistics, real-time monitoring, and alerting.

9. Control Center (PCC): a visual configuration system for important Push functions.
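The sending service's core loop (pick due tasks by send time and priority, then cut them by vendor limits) can be sketched in Python. This is a minimal, single-process sketch under assumptions: `PushTask`, its fields, the vendor names, and the `vendor_limits` mapping (treated here as the maximum number of users per batch for each vendor) are hypothetical; a plain heap stands in for the Redis queue, and a smaller `priority` value means higher priority.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(order=True)
class PushTask:
    send_time: int                              # epoch seconds when the task may go out
    priority: int                               # smaller = more urgent (assumption)
    task_id: str = field(compare=False)         # hypothetical identifier
    vendor: str = field(compare=False)          # hypothetical vendor name
    user_ids: List[str] = field(compare=False, default_factory=list)


def due_tasks(queue: List[PushTask], now: int) -> List[PushTask]:
    """Pop every task whose send time has arrived and order them by priority."""
    ready = []
    while queue and queue[0].send_time <= now:
        ready.append(heapq.heappop(queue))
    ready.sort(key=lambda t: (t.priority, t.send_time))
    return ready


def split_by_limit(task: PushTask, vendor_limits: Dict[str, int]) -> List[List[str]]:
    """Cut a task into batches no larger than the vendor's per-batch limit."""
    limit = vendor_limits[task.vendor]
    return [task.user_ids[i:i + limit] for i in range(0, len(task.user_ids), limit)]
```

Ordering the heap by send time keeps the "what is due now" check cheap, while the priority sort decides which due task reaches the vendor first when several are ready at once.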

The dependencies among the modules of the Push core architecture are shown below:

1.3 System Data Flow

1.3.1 Overall data flow of the system

The client reports user information and user behavior logs to the data center, which generates the corresponding data tables. The strategy side produces video materials, the Push target user sets, and the agent quota user sets from the tables produced by the data center. Following the strategy model, the architecture side recalls Push materials, creates and sends tasks, and passes the messages to the cloud Push center, which forwards them to each vendor's agent or to the long-connection service and produces Push-related data tables. When a vendor detects that a Push has arrived, it sends an acknowledgement back to the internal service. Based on these arrival receipts, the architecture side records logs and reports the output to the data center. As shown below:

Client: uses the Push SDK to bind the Push_token and report basic user information and user behavior logs.

Data center: Generates active user tables, user behavior tables and related business reports based on service operations.

Push strategy: produces Push materials daily, and produces personalized Push materials based on user profiles.

Push architecture: recalls personalized Push materials in the early morning, sends tasks on schedule, and processes vendors' arrival receipts.

Cloud Push center: sends Push tasks to each vendor's agent or to the long-connection service and produces the Push base data tables.

Vendor agent: responsible for sending their vendor’s Push task to the user device and sending the arrival receipt.

1.3.2 Push arrival receipt data flow

There are three types of Push arrival receipts: receipts from each Android vendor agent, receipts from iOS, and receipts from the long-connection service. They are received by the Push service and written to a message queue. The push-arrive service on the architecture side consumes this queue and (1) performs real-time statistical calculations, writing the results to Redis for real-time statistical reports; and (2) records local logs, which are collected for real-time monitoring and alerting and then uploaded to the data center to produce the related statistical reports and the candidate set of Push materials.
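The real-time statistics branch of the push-arrive consumer could look roughly like the following. It is a minimal sketch, assuming each receipt is a dict with hypothetical `task_id` and `source` keys; a `defaultdict(Counter)` stands in for Redis hashes (the comment shows an analogous `HINCRBY` call with a hypothetical key name), and the local-log branch is omitted.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping


def aggregate_receipts(receipts: Iterable[Mapping[str, str]]) -> Dict[str, Counter]:
    """Count arrivals per task, broken down by receipt source."""
    stats: Dict[str, Counter] = defaultdict(Counter)
    for receipt in receipts:
        # Analogous to: HINCRBY push:arrive:<task_id> <source> 1 (hypothetical key)
        stats[receipt["task_id"]][receipt["source"]] += 1
    return stats
```

Summing a task's counter gives its total arrivals, which a real-time report can compare against the number of messages sent for that task.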

As shown below:

2. System iteration and optimization

2.1 Estimating the sending times of the first and last personalized Push

2.1.1 Background

In the original logic, every user received the first personalized Push of the day at 6:30 and the last at 21:45. But users get up and go to sleep at different times, and their sensitivity to Push varies with the time of day. Choosing the sending time according to each user's habits can improve the Push click-through rate.

2.1.2 Service design

We estimate the first and last sending times for each user from their usage habits, so that users receive content they are interested in right when they tend to look at their phones. The difficulty lies in predicting when users will be free to look at their phones. The general logic is as follows:

Estimating the first sending time: count the days in the past 7 on which the user's first activity fell within [5:30, 6:00]. If the count is greater than 1, the user's first personalized sending time is moved from 6:30 to 5:30. Otherwise, count the days in the past 7 on which the user's first activity fell within [5:30, 6:30]; if that count is greater than 1, the first personalized sending time is moved from 6:30 to 6:00. All other users still receive it at 6:30.

Estimating the last sending time: count the days in the past 7 on which the user was active within [22:15, 22:45]. If the count is greater than 1, the user's last personalized sending time is moved from 21:45 to 22:15. Otherwise, count the days in the past 7 on which the user was active within [22:15, 23:59]; if that count is greater than 1, the last personalized sending time is moved from 21:45 to 22:45. All other users still receive it at 21:45. As shown below:
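The estimation rules can be expressed as two small functions. This is a sketch under assumptions: the input shapes (one optional first-active time per day for the morning rule, a list of active times per day for the evening rule) and the inclusive window bounds are our choices, not details given in the article.

```python
from datetime import time
from typing import List, Optional


def in_window(t: time, start: time, end: time) -> bool:
    return start <= t <= end  # both bounds inclusive (assumption)


def estimate_first_send(first_active: List[Optional[time]]) -> time:
    """first_active: the user's first active time on each of the last 7 days (None = inactive)."""
    def days_in(start: time, end: time) -> int:
        return sum(1 for t in first_active if t is not None and in_window(t, start, end))

    if days_in(time(5, 30), time(6, 0)) > 1:
        return time(5, 30)
    if days_in(time(5, 30), time(6, 30)) > 1:
        return time(6, 0)
    return time(6, 30)  # default first sending time


def estimate_last_send(daily_activity: List[List[time]]) -> time:
    """daily_activity: all active times of the user on each of the last 7 days."""
    def days_in(start: time, end: time) -> int:
        return sum(1 for day in daily_activity
                   if any(in_window(t, start, end) for t in day))

    if days_in(time(22, 15), time(22, 45)) > 1:
        return time(22, 15)
    if days_in(time(22, 15), time(23, 59)) > 1:
        return time(22, 45)
    return time(21, 45)  # default last sending time
```

Because the narrower window is checked first, a user who is regularly active early in the window gets the earliest slot, and the wider window acts as a fallback.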

2.2 Optimizing the Push system's user group service

2.2.1 Background

This service produces the various user sets Push needs, such as all users, personalized users, interest-based users, and regional users, collectively referred to as user packages. The output of user packages depends on different upstream sources, including the user center, the strategy team, and the data team. As the business iterated, the following problems emerged:

1) Lack of unified management: most jobs were scheduled scripts deployed on physical machines, which created single points of failure and left monitoring and alerting of data output scattered.

2) User package storage depended on physical machines and Hadoop clusters. During sending, each user package had to be fully loaded into memory from FTP and AFS files, which took about 30 seconds per task and hurt timeliness.

3) Each type of user package is stored separately, resulting in waste of storage resources.

4) When operating on multiple user packages, loading duplicate user IDs wasted memory, and the repeated reloading hurt timeliness.

5) Restarting the live-stream recall module took a long time because it had to load the followed-user package and run its processing logic; a single restart took 20 minutes, which hurt release efficiency and service availability.

2.2.3 Service design

2.2.3.1 Comparison between old and new architectures

The original architecture

The new architecture

1) To distinguish them from the user packages stored on physical-machine FTP and AFS clusters in the current architecture, we use the term user group for a set of users sharing a single-dimension characteristic.

2) User groups are registered and managed uniformly through the AMIS platform, and each user group has a unique identifier.

3) User groups are represented and stored as bitmaps: each bit in a bitmap represents one user, so each user group can be represented by one bitmap.

4) The user package address in the Push service is replaced by user group labels, and logical operations across multiple user groups are supported, written as logical expressions.

5) During sending, the user group service is first queried with the logical expression over user group labels to obtain the bitmap of the final target users; user IDs are then fetched from the user group service in batches using the bitmap, and processed and sent in a streaming fashion.
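Evaluating a label expression over bitmaps can be sketched with a small recursive-descent parser. The syntax is an assumption, since the article does not specify it: Python sets stand in for RoaringBitmaps, `&` is intersection, `|` is union, `-` is difference, `&` binds tighter than `|` and `-`, and the label names in the example are hypothetical.

```python
import re
from typing import Dict, List, Set

TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[&|()-])")


def tokenize(expr: str) -> List[str]:
    expr, pos, tokens = expr.strip(), 0, []
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"bad token at {pos}: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(expr: str, groups: Dict[str, Set[int]]) -> Set[int]:
    """Evaluate a label expression such as 'active & (beijing | shanghai) - optout'."""
    tokens, pos = tokenize(expr), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def factor() -> Set[int]:
        tok = take()
        if tok == "(":
            val = union()
            if take() != ")":
                raise ValueError("expected ')'")
            return val
        if tok in groups:
            return set(groups[tok])  # copy so callers cannot mutate stored bitmaps
        raise ValueError(f"unknown label {tok!r}")

    def intersection() -> Set[int]:
        val = factor()
        while peek() == "&":
            take()
            val = val & factor()
        return val

    def union() -> Set[int]:
        val = intersection()
        while peek() in ("|", "-"):
            op, rhs = take(), intersection()
            val = (val | rhs) if op == "|" else (val - rhs)
        return val

    result = union()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result
```

For example, with `active = {1, 2, 3, 4}`, `beijing = {2, 3, 9}`, `shanghai = {4, 7}` and `optout = {3}`, the expression `active & (beijing | shanghai) - optout` evaluates to `{2, 4}`. A production service would run the same operators on RoaringBitmaps instead of Python sets.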

2.2.3.2 Design of user group management

User group configuration:

1) The configuration layer manages user groups uniformly through the AMIS platform. They are stored in MySQL as tasks, with configurable user group label, user package source address, update frequency, retry count, and so on.

2) The scheduling layer schedules each task preemptively and dispatches each claimed task to the service layer for execution.

3) The service layer takes the task and starts the build process:

1. Pull the remote file from the user package address. If the pull succeeds, continue to the next step; if it fails, update the task status and retry count in the execution record table.

2. Load the user IDs from the file, compute the CRC64/FNV64 value of each, and store the mapping in Redis (k: CRC64/FNV64 value, v: user ID).

3. Compute the bitmap of the current user group (using RoaringBitmap) and store it in Redis (k: user group label, v: user group bitmap).
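Steps 2 and 3 of the build process can be sketched as follows. This sketch uses FNV-1a 64-bit as the hash (the article allows CRC64 or FNV64; which FNV variant is used is not stated), plain dicts stand in for Redis, and a Python set stands in for a RoaringBitmap. `build_user_group` and the label names are hypothetical.

```python
from typing import Dict, Iterable, Set

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """FNV-1a, 64-bit: XOR in each byte, then multiply by the FNV prime mod 2^64."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def build_user_group(label: str, user_ids: Iterable[str],
                     id_store: Dict[int, str],
                     bitmap_store: Dict[str, Set[int]]) -> Set[int]:
    """Map each user ID to its hash (shared store) and record the hash as a set bit."""
    bitmap: Set[int] = set()
    for uid in user_ids:
        key = fnv1a_64(uid.encode("utf-8"))
        existing = id_store.get(key)
        if existing is not None and existing != uid:
            raise ValueError(f"hash collision between {existing!r} and {uid!r}")
        id_store[key] = uid  # like SET <hash> <user id> in Redis
        bitmap.add(key)      # the hash value is the bit position
    bitmap_store[label] = bitmap  # like SET <label> <serialized bitmap>
    return bitmap
```

Since `id_store` is shared by all groups, a user who belongs to many groups is stored once, and each group only adds its bit positions; this is how the design avoids the duplicated storage and reloading listed as problems 3) and 4).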

2.2.3.3 Online service interaction design

Online service interaction process:

1) A Push task specifies its target user set with user group labels (written to MySQL by AMIS or by scheduled jobs); multiple labels are combined with a logical expression.

2) After the Push service layer obtains a sending task, it requests the online user group service with the label expression.

3) The online user group service reads the bitmaps of all labels in the expression from Redis, performs the logical operations, and returns the final user group bitmap to the Push service layer.

4) The Push service layer traverses the bitmap, takes the CRC64/FNV64 value represented by each set bit, and requests the user group service in batches.

5) The user group service maps each CRC64/FNV64 value back to the corresponding user ID stored in Redis and returns the IDs to the Push service layer.
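Steps 4 and 5 amount to walking the set bits of the final bitmap and resolving them in batches. This is a minimal sketch, assuming a Python set in place of the bitmap, a dict in place of the Redis hash-to-ID mapping, and a hypothetical default batch size; hashes whose user ID has disappeared from the store are skipped.

```python
from typing import Dict, Iterable, Iterator, List


def resolve_in_batches(bitmap: Iterable[int], id_store: Dict[int, str],
                       batch_size: int = 500) -> Iterator[List[str]]:
    """Yield user IDs batch by batch, in ascending bit order."""
    positions = sorted(bitmap)
    for i in range(0, len(positions), batch_size):
        batch = positions[i:i + batch_size]
        # One round trip per batch, comparable to a Redis MGET over the hash keys.
        found = [id_store.get(p) for p in batch]
        yield [uid for uid in found if uid is not None]
```

Because this is a generator, the sender can start pushing the first batch before later batches are resolved, which matches the streaming processing described in 2.2.3.1.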

2.3 Optimizing the Push system's frequency control service (UFC)

2.3.1 The frequency control service provides the following functions:

1) Basic functions: a user cannot receive two Push messages within 30 minutes, and the total number of Push messages a user receives in a day cannot exceed Max.

2) Personalized frequency control, which combines the whitelist data provided by the strategy side, time-range frequency control, weight policy data per user ID + push type, and arrival and recovery data.

3) Permanent PushType and user ID whitelists: whitelisted PushTypes and user IDs are exempt from frequency control.
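The basic rules in 1) and the whitelists in 3) can be sketched as a single-process check. This is a sketch under assumptions: `FrequencyController` and its fields are hypothetical, days are UTC calendar days, timestamps arrive in non-decreasing order per user, whitelisted sends are not recorded, and the in-memory history stands in for the sharded Redis storage that real UFC uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

DAY = 86400


@dataclass
class FrequencyController:
    daily_max: int                                   # the "Max" in rule 1)
    min_gap_seconds: int = 30 * 60                   # at most one Push per 30 minutes
    push_type_whitelist: Set[str] = field(default_factory=set)
    user_whitelist: Set[str] = field(default_factory=set)
    _history: Dict[str, List[int]] = field(default_factory=dict)  # user -> send times

    def allow(self, user_id: str, push_type: str, now: int) -> bool:
        """Return True and record the send if this Push passes frequency control."""
        if push_type in self.push_type_whitelist or user_id in self.user_whitelist:
            return True  # whitelisted: not frequency controlled
        history = self._history.setdefault(user_id, [])
        if history and now - history[-1] < self.min_gap_seconds:
            return False  # too soon after the previous Push
        day_start = now - now % DAY
        if sum(1 for t in history if t >= day_start) >= self.daily_max:
            return False  # daily quota used up
        history.append(now)
        # Keep only timestamps still needed for today's count or the gap check.
        cutoff = min(day_start, now - self.min_gap_seconds)
        history[:] = [t for t in history if t >= cutoff]
        return True
```

The gap check uses the full retained history rather than only today's sends, so a Push at 23:50 still blocks another at 00:05.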

2.3.2 Background

1) Currently, UFC assigns frequency control data to fixed physical storage by taking the hash value of the user ID modulo the number of servers. Since the number of servers and their IP addresses are fixed, the service is hard to scale out, and any expansion would disrupt existing users' frequency control data.

2) The frequently changing PushType and user ID whitelists live in configuration files, so every change takes a long time to go live.

3) The service is deployed on physical servers shared with other services and competes with them for resources, so it can both affect and be affected by other services, causing instability.

2.3.3 Service design

Dynamic scaling: consistent hashing

  1. First, compute the hash value of each server (node) and place it on a continuum (ring) spanning 0 ~ 2^32.

  2. Then compute the hash value of each data key with the same method and map it onto the same ring.

  3. Finally, search clockwise from the key's position and store the data on the first server found. If no server is found before reaching 2^32, wrap around and store the data on the first server on the ring.
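The three steps above can be sketched as a small ring class. Two choices here are assumptions rather than details from the article: MD5 truncated to 32 bits is used as the ring hash, and each server is placed on the ring as several virtual nodes (`vnodes`), a common refinement that evens out the load; with `vnodes=1` the class reduces to the plain algorithm described in the steps.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def ring_hash(key: str) -> int:
    """32-bit ring position in [0, 2^32); MD5 is an arbitrary choice for this sketch."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class ConsistentHashRing:
    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100):
        self.vnodes = vnodes
        self._positions: List[int] = []   # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> server
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = ring_hash(f"{node}#{i}")
            if pos in self._owner:
                continue  # skip the rare position collision
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        for pos in [p for p, n in self._owner.items() if n == node]:
            del self._owner[pos]
            self._positions.remove(pos)

    def get_node(self, key: str) -> str:
        """Walk clockwise from the key's position to the first server, wrapping at 2^32."""
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._positions, ring_hash(key))
        if idx == len(self._positions):
            idx = 0  # past the last server: wrap around to the first one
        return self._owner[self._positions[idx]]
```

When a fifth server joins a four-server ring, only the keys that now fall just before the new server's positions move, and they all move to the new server; with the modulo scheme from 2.3.2, a key stays put only when hash % 4 equals hash % 5, so roughly four out of five keys would move.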

Resource compression: using Protobuf to compress data

Google Protocol Buffers (Protobuf for short) is Google's internal language-neutral data standard. There are more than 48,162 message type definitions and more than 12,183 .proto files in use, for RPC systems and persistent data storage systems. Protocol Buffers is a portable and efficient format for serializing structured data, well suited to data storage and RPC data exchange: a language-neutral, platform-neutral, extensible way of serializing structured data for communication protocols, data storage, and more. Protobuf is similar to JSON and XML, but smaller, faster, and simpler. You define your data structure once and then use generated source code to read and write it, and you can even update the data structure without breaking programs already deployed against the old format. With a single Protobuf description of the data structure, you can easily read and write structured data across different languages and data streams.

In a sentence: Protobuf is a binary protocol that relies on a predefined schema, instead of self-describing JSON, to parse faster. Rather than writing out delimiters such as { and " and the key name for every field, it stores each field as tag | value. This makes transmission efficient and payloads very small, which is why it is popular for TCP/RPC.
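The tag | value layout can be shown with a hand-written encoder for two Protobuf wire types: varint (type 0) and length-delimited (type 2). The encoding rules follow the Protobuf wire format, but this encoder is only an illustration, not the protobuf library; the frequency control record and its field numbers are hypothetical.

```python
import json


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_field(field_number: int, value) -> bytes:
    """Emit tag | value, where tag = (field_number << 3) | wire_type; no field name is written."""
    if isinstance(value, int):
        return encode_varint(field_number << 3 | 0) + encode_varint(value)
    data = value.encode("utf-8")
    return encode_varint(field_number << 3 | 2) + encode_varint(len(data)) + data


# Hypothetical frequency control record and schema (field name -> field number).
record = {"user_id": 123456789, "push_type": 3, "last_push_ts": 1650000000, "count": 2}
field_numbers = {"user_id": 1, "push_type": 2, "last_push_ts": 3, "count": 4}
pb = b"".join(encode_field(field_numbers[k], v) for k, v in record.items())
js = json.dumps(record, separators=(",", ":")).encode("utf-8")
```

For this toy record, `pb` is 15 bytes while the compact JSON repeats every key name and is several times larger. The measured ratios reported below come from the team's own benchmarks, not from this sketch.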

Serialization time comparison:

Byte size comparison:

Comparison of Push business protocols:

Conclusion: Marshal with proto is twice as fast as Marshal with JSON, and the serialized data is 1/4 the size of the JSON data; the larger the data, the more obvious the advantage. In the end, compressing the frequency control data saved 75% of Redis resources.

3. Summary

Message Push is an important, low-cost, and highly efficient means of operating a mobile app. With the rapid development of the mobile Internet, mobile applications have matured and are updated ever more frequently, and the messages pushed by apps have become increasingly varied. The core value of a Push system lies in whether it can push the information users are interested in promptly and accurately, and improve the consumption rate of Push messages.

Apple co-founder Steve Jobs once said, “It’s really hard to design products by focus groups. A lot of times, people don’t know what they want until you show it to them.” I think the same applies to Push content. For users who know exactly what they want, content can be tailored to them (personalized Push). For users who are not sure what content they want, app operators need to consider whether a message is appropriate before pushing it: not only the content must be chosen carefully, but also the timing, the audience, the delivery method, and so on (operation Push).


Baidu Geek Talk

Baidu's official technology WeChat account
