Abstract: This article explores the concepts, architecture, and technical challenges of common Feed stream systems, and shows how to design a Feed stream system using Gauss Redis.

This article is shared from the Huawei Cloud community post “Huawei Cloud PB-Level Database GaussDB(for Redis) Revealed, Issue 6: Application in Feed Stream Scenarios”, originally published on the official Gauss Redis blog.

1. Background

GaussDB(for Redis) (Gauss Redis for short) is a strongly consistent, persistent NoSQL database developed by Huawei and compatible with Redis 5.0.

In the Internet era, Feed streams are everywhere in daily life. WeChat Moments, Weibo, Douyin, and Toutiao all use Feed streams to push us content from the friends we follow, or content we are interested in, in a timely manner. This keeps users engaged and, in turn, raises business value. Next, I will explore the concepts, architecture, and challenges of common Feed stream systems and how to design a Feed stream system using Gauss Redis.

2. Concepts

A Feed stream system is one in which a Feed producer delivers the Feeds it produces to Feed consumers through a storage and distribution system, where they are finally presented in some form.

1. Feed: A single push of content. A microblog is a Feed.

2. Feed Stream: A stream of information consisting of continuous feeds.

3. Display format: The main display formats are Timeline and Rank. For example, Weibo is displayed as a Timeline, while the Toutiao client is mainly presented as a recommendation Rank.

4. Feed producer: For Weibo, every user is a producer; for Toutiao, the producer is the recommendation algorithm.

5. Feed consumer: The subject that receives Feed notifications.

6. Storage and distribution system: This can be divided into three modules (implementations vary slightly).

6.1 Content storage module: This module is concerned with how raw Feed content is stored, for example a microblog post you published.

6.2 Relationship storage module: For Weibo, it stores the accounts each user follows and each user’s followers; for Toutiao, it stores audience segments (all users are grouped according to their user profiles).

6.3 Mailbox module: This can also be called the message delivery module. Feed messages are stored in the mailbox before the Feed stream is finally assembled.

3. Architecture design

With this conceptual introduction, let’s look at how feeds move from Feed producers to Feed consumers.

The Feed producer creates a piece of content and sends it to the server. The server first stores the message content in the message storage module, then queries the relationship store and writes notification information into the mailbox module according to the mailbox module’s design. The Feed consumer obtains new messages promptly by querying the mailbox.

Message storage module:

Feed content is generally semi-structured, large in volume, and must be persisted. Logically, this module is a KV system: a mapping from ID to content.

Relationship storage module:

Relationships can be added and deleted, so each one is a variable-length set that must support fast insertion, deletion, and lookup. Complex operations such as joins are generally not needed, so a NoSQL database suits this kind of data.

Mailbox module:

When it comes to the mailbox module, the usual question is whether to use push mode, pull mode, or a combination of the two. This topic is also covered in the issue of the Huawei Cloud PB-level database series on the application of GaussDB(for Redis) in IM scenarios.

In push mode, after the relationship store is queried, the Feed notification is written into the inbox of every Feed consumer who needs to be notified. A consumer obtains the complete Feed stream simply by querying their own inbox. Because each notification is written once per recipient, writes are amplified.
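As a rough illustration of push mode and its write amplification, the sketch below uses plain Python dictionaries as stand-ins for the relationship store and the inboxes. The names `followers`, `inboxes`, and `push_feed` are hypothetical and are not part of any Redis API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins:
# producer -> set of followers, and consumer -> inbox (feed IDs, oldest first).
followers = {"producer": {"u1", "u2", "u3"}}
inboxes = defaultdict(list)


def push_feed(producer, feed_id):
    """Fan out on write: append the feed ID to every follower's inbox."""
    recipients = followers.get(producer, set())
    for user in recipients:
        inboxes[user].append(feed_id)
    return len(recipients)  # number of inbox writes caused by one post


writes = push_feed("producer", "feed:1")
print(writes)         # inbox writes triggered by a single post
print(inboxes["u1"])  # a consumer reads only their own inbox
```

One post by a producer with three followers causes three inbox writes, while each read touches a single inbox.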

In pull mode, the Feed notification is written only into the producer’s own outbox. The Feed consumer first queries the relationship store, then fetches Feed messages from the outboxes of all the accounts they follow and merges them for display. The number of outbox reads therefore grows with the number of accounts followed, so reads are amplified.
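Pull mode can be sketched in the same spirit. The example below assumes each outbox holds `(timestamp, feed_id)` pairs sorted newest first; the names `following`, `outboxes`, and `pull_timeline` are hypothetical.

```python
import heapq
from itertools import islice

# Hypothetical stand-ins: consumer -> followed accounts, and
# producer -> outbox of (timestamp, feed_id) pairs, newest first.
following = {"reader": ["a", "b"]}
outboxes = {
    "a": [(30, "a:3"), (10, "a:1")],
    "b": [(20, "b:2")],
}


def pull_timeline(consumer, limit=10):
    """Fan out on read: one outbox read per followed account, then merge."""
    sources = [outboxes.get(p, []) for p in following.get(consumer, [])]
    # Inputs are already sorted newest first, so merge in descending order.
    merged = heapq.merge(*sources, reverse=True)
    feed_ids = [feed_id for _, feed_id in islice(merged, limit)]
    return feed_ids, len(sources)


timeline, outbox_reads = pull_timeline("reader")
print(timeline)
print(outbox_reads)
```

Here a single write lands in one outbox, but every timeline read costs one outbox read per followed account.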

Combined push and pull mode: Push mode is used on write for most users, and pull mode is used for specific users, typically accounts with a very large number of followers. On read, a Feed consumer reads their own inbox and the outboxes of those specific users, then merges the results for display.
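A minimal sketch of the combined mode follows. It assumes a follower-count cutoff decides which producers are pulled; `PULL_THRESHOLD` and all other names are hypothetical, and a real system may select the pulled users by other criteria.

```python
from collections import defaultdict

PULL_THRESHOLD = 2  # hypothetical cutoff: producers with more followers are pulled

followers = {"star": {"u1", "u2", "u3"}, "friend": {"u1"}}
following = {"u1": ["star", "friend"]}
inboxes = defaultdict(list)   # consumer -> [(timestamp, feed_id)]
outboxes = defaultdict(list)  # producer -> [(timestamp, feed_id)]


def is_pulled(producer):
    return len(followers.get(producer, ())) > PULL_THRESHOLD


def publish(producer, ts, feed_id):
    if is_pulled(producer):
        outboxes[producer].append((ts, feed_id))      # pull: write once
    else:
        for user in followers.get(producer, ()):      # push: write per follower
            inboxes[user].append((ts, feed_id))


def read_timeline(consumer):
    """Merge the consumer's inbox with the outboxes of pulled producers."""
    items = list(inboxes[consumer])
    for producer in following.get(consumer, []):
        if is_pulled(producer):
            items.extend(outboxes[producer])
    return [feed_id for _, feed_id in sorted(items, reverse=True)]


publish("star", 1, "star:1")
publish("friend", 2, "friend:1")
timeline = read_timeline("u1")
print(timeline)
```

The post from the heavily followed account is written once, and the ordinary account's post is pushed to its single follower, so both write and read amplification stay bounded.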

Which mode to choose depends on specific business scenarios and requirements.

In many real services, messages are first written to a message queue. On the one hand, this smooths out traffic peaks; on the other hand, it allows specific push optimization logic, such as filtering out spam or content containing sensitive words before pushing.

4. Technical challenges

Let’s take a look at data from WeChat Moments. On January 19, 2021, Zhang Xiaolong, the founder of WeChat, disclosed the latest figures at WeChat Open Class Pro: every day, 780 million users open Moments and 120 million users post to Moments. The average user opens Moments more than ten times a day, for 10 billion page views a day.

What would the challenges be if we wanted to implement a similar Feed stream system? In terms of storage capacity, if users post three times a day on average at 1 kB per post, that is on the order of 100 billion records and 100 TB of storage a year. In terms of access requests, peak read and write OPS should reach at least the million level. Writes and reads should feel real-time to the user, with response times within seconds at most; otherwise users will quickly close the app.

Therefore, we need a distributed storage system that offers persistence, massive storage, high throughput, easy scaling, low latency, and low storage cost.
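The capacity estimate can be reproduced with a few lines of arithmetic. The sketch below assumes that the 120 million daily posters each post three times a day, that 1 kB means 1,000 bytes, and that a year has 365 days; these are simplifying assumptions for illustration only.

```python
DAILY_POSTERS = 120_000_000        # users posting to Moments per day (from the talk)
POSTS_PER_USER_PER_DAY = 3         # assumption used in the text
BYTES_PER_POST = 1_000             # 1 kB per post, decimal units assumed
DAILY_PAGE_VIEWS = 10_000_000_000  # page views per day (from the talk)

records_per_year = DAILY_POSTERS * POSTS_PER_USER_PER_DAY * 365
storage_tb = records_per_year * BYTES_PER_POST / 1e12
avg_views_per_second = DAILY_PAGE_VIEWS / 86_400

print(f"{records_per_year:,} records/year")
print(f"{storage_tb:.1f} TB/year")
print(f"{avg_views_per_second:,.0f} page views/second on average")
```

Under these assumptions the result is about 131 billion records and about 131 TB per year, the same order of magnitude as the figures quoted above. The page-view figure averages roughly 116 thousand per second over a whole day; because traffic is not evenly spread and one page view may trigger several storage requests, a much higher peak is plausible.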

5. Advantages of Gauss Redis

5.1 Introduction to Gauss Redis

Gauss Redis is a cloud-native database independently developed by the Huawei Cloud database team and compatible with the Redis 5.0 protocol. It adopts a compute-storage separation architecture. On the storage side, the self-developed storage system DFV provides unlimited capacity expansion, strong consistency, and high reliability. The compute side is based on an LSM storage engine and has excellent read and write performance. Thanks to the compute-storage separation architecture, Gauss Redis scales out in seconds without copying data, making full use of cloud-native elastic scaling and resource sharing.

5.2 How to take advantage of Gauss Redis in Feed stream scenarios

For Feed stream scenarios, Gauss Redis can be used as follows:

1. Message content storage

This can be implemented with the KV structure of Gauss Redis. Because Gauss Redis adopts a compute-storage separation architecture, it can easily support massive data storage with low-latency access.

2. Relationship storage

The Set or Hash structure of Gauss Redis makes it easy to add, delete, update, and query relationships.

3. Mailbox storage

The mailbox is implemented as a queue that supports consuming from a specified position. The Stream structure of Gauss Redis provides this queue capability and makes reading Feed stream messages straightforward.
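To make "consuming from a specified position" concrete, here is a small in-memory stand-in for a stream-backed mailbox. It only imitates the semantics: in Redis, appending corresponds to the `XADD` command and reading entries after a given ID corresponds to `XREAD`. The `Mailbox` class and its method names are hypothetical.

```python
import itertools


class Mailbox:
    """In-memory stand-in for a stream: append-only entries with increasing IDs."""

    def __init__(self):
        self._entries = []                  # (entry_id, payload), ascending by ID
        self._next_id = itertools.count(1)

    def add(self, payload):
        """Append an entry and return its ID (the role XADD plays in Redis)."""
        entry_id = next(self._next_id)
        self._entries.append((entry_id, payload))
        return entry_id

    def read_after(self, last_id, count=10):
        """Return up to `count` entries whose ID is greater than `last_id`
        (the role XREAD plays in Redis)."""
        newer = [entry for entry in self._entries if entry[0] > last_id]
        return newer[:count]


box = Mailbox()
first = box.add("feed:1")
box.add("feed:2")
box.add("feed:3")
unread = box.read_after(first)
print(unread)
```

A consumer that remembers the last ID it has seen can resume reading from that point, which is what a Feed client needs when it refreshes.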

5.3 Gauss Redis Feed stream practice

The following uses Gauss Redis to implement a simple microblog sample and adopts the write diffusion (push) model to illustrate feasibility.

There are four users in the system: Jay, Jolin, ZhangSan, and LiSi. ZhangSan and LiSi both follow Jay, and LiSi also follows Jolin.
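The four-user scenario can be walked through with the hedged sketch below. It is not the original sample code: it replaces Gauss Redis with plain Python containers that play the KV, Set, and Stream roles described in 5.2, and the function names and post texts are hypothetical.

```python
from collections import defaultdict

contents = {}                  # feed ID -> content (KV role)
followers = defaultdict(set)   # producer -> followers (Set role)
mailboxes = defaultdict(list)  # consumer -> feed IDs in arrival order (Stream role)


def follow(user, producer):
    followers[producer].add(user)


def post(producer, text):
    """Store the content once, then diffuse its ID to every follower's mailbox."""
    feed_id = len(contents) + 1
    contents[feed_id] = f"{producer}: {text}"
    for user in followers[producer]:
        mailboxes[user].append(feed_id)
    return feed_id


def read_feed(user, position=0):
    """Return contents from mailbox offset `position` onward."""
    return [contents[feed_id] for feed_id in mailboxes[user][position:]]


follow("ZhangSan", "Jay")
follow("LiSi", "Jay")
follow("LiSi", "Jolin")

post("Jay", "new song")
post("Jolin", "new album")

print(read_feed("ZhangSan"))
print(read_feed("LiSi"))
print(read_feed("LiSi", position=1))
```

ZhangSan sees only Jay's post, while LiSi sees both posts; passing a position lets LiSi fetch only what arrived after the first entry.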

This sample implements only a simple microblogging system. Real systems are more complex and involve processing logic specific to the business scenario. Gauss Redis is a well-suited choice as the storage base for Feed streams.

6. Summary

Gauss Redis offers persistence, massive storage, high throughput, easy scaling, low latency, and low storage cost, which makes it well suited as the storage base for Feed streams. Its excellent read and write performance and advanced features greatly simplify application development. On top of its compatibility with open-source Redis, Gauss Redis also strikes a good balance between performance and cost, and can be widely used in smart healthcare, traffic peak shaving, counters, and other fields.

7. End

Author: Huawei Cloud Gauss Redis team.

For more technical articles, follow the official Gauss Redis blog:

bbs.huaweicloud.com/community/u…

8. References

1. GaussDB(for Redis) Official Homepage

www.huaweicloud.com/product/gau…

2. Cost Comparison between Huawei Cloud GaussDB(for Redis) and Self-built Open Source Redis

www.modb.pro/db/42739

3. “Huawei Cloud PB-Level Database GaussDB(for Redis) Revealed, Issue 1: Redis and Compute-Storage Separation”

bbs.huaweicloud.com/blogs/23858…

4. “Huawei Cloud PB-Level Database GaussDB(for Redis) Revealed, Issue 2: A Study of Redis Message Queue Stream Applications”

bbs.huaweicloud.com/blogs/24562…

5. “Huawei Cloud PB-Level Database GaussDB(for Redis) Revealed, Issue 5: Application of Gauss Redis in IM Scenarios”

bbs.huaweicloud.com/blogs/24924…

6. What Is a Feed Stream?

cloud.tencent.com/developer/a…

7. Practice of Push and Pull in WeChat Feed Streams

cloud.tencent.com/developer/a…
