The epidemic brought offline retail to a near standstill during the 2020 Spring Festival, but it made live-streaming apps popular. Live e-commerce, live education, and other kinds of live applications won a historic opportunity, and much of the public began to accept and appreciate the convenience and value of these new applications. My personal feeling is that with the popularization of 5G, "live streaming + vertical markets + intensive cultivation of private-domain traffic" will be a major focus on the Internet, and the real dividend period is only beginning.

The live-streaming industry began to rise more than ten years ago; show streaming and game streaming were its successful application scenarios in the PC era. In 2016, with the massive popularity of the mobile Internet, the industry ushered in its true breakout year, and hundreds of live-streaming apps appeared in the public view. Around the beginning of 2018, live trivia ("answer to win") apps caught fire; it was the first time a live-streaming application reached the whole population. Then, in 2019, came the live-streaming e-commerce launched by short-video platforms. Looking back at this development, live-streaming applications have blossomed in all kinds of fields. Do you know the technical architecture behind them?

Two years ago, I participated in the architecture design of a live trivia system that supported one million concurrent online users and 200,000 answer submissions per second. Taking "live trivia" as the example scenario, this article shows you the technical architecture behind the live trivia applications that are popular under the current epidemic. The content is divided into four parts:

1. Introduction to product functions

2. Technical challenges

3. Technical selection and basis

4. Architectural design scheme


01 Product Functions Overview

Live trivia became popular at the time because the gameplay was simple enough and the cost of educating users was almost zero. In short: the whole nation competes online, each question must be answered within 10 seconds, a timeout or a wrong answer eliminates you, and everyone who answers all questions correctly splits the prize pool equally. It is commonly described as the online version of the TV quiz show Happy Dictionary.

Before we get into the system design and architecture, what are the core functions of a live trivia app? Here are some screenshots of the APP:

1. Quizzes are organized as events. Each event announces in advance the live start time, the total prize money, red envelope rain, and other information.

2. When the event starts, users can enter the live room and see the real-time video stream. A professional host reads the questions aloud, and a field controller cooperates with the host to issue questions and announce answers in sync.

3. After a question is issued, the user has 10 seconds to answer. A user who answers incorrectly or times out is eliminated from the game.

4. To retain eliminated users, several rounds of red envelope rain take place during the event; users tap the screen for a chance to grab a red envelope.

5. During the event, users can send bullet-screen comments (danmaku), and selected comments scroll across the screen to create a lively live atmosphere.

6. Other functions: inviting new users, event list, bonus withdrawal, etc.


Stepping back from the screenshots, the functions of a live-streaming application fall into two categories:

1. Basic live-streaming functions: functional requirements such as co-anchor (lianmai) interactive streaming (supporting multiple bit rates, multiple protocols, and multiple anchors in the same frame), beauty effects, bullet-screen comments, IM chat, likes, and screen sharing, as well as non-functional requirements such as hotlink protection and detection of pornographic or politically sensitive content.

2. Personalized functions of the application itself: for example, issuing questions, answering, and announcing answers in the trivia scenario; product display and one-click purchase in the e-commerce scenario; and gift rewards in the influencer live-streaming scenario.


02 Technical Challenges

At the time, we faced the following technical challenges in building the live trivia app:

1. Audio and video processing and transmission: this involves audio/video encoding, real-time beauty filters, video streaming, CDN-accelerated distribution, terminal adaptation and playback, traffic statistics, and many other technical points, and our team at the time had no audio/video specialists.

2. High concurrency: based on the user scale of competing products such as Chongding Dahui, we expected 1 million users online at the same time, with 200,000 users submitting answers within one second; within one second a single user can tap the screen 4-5 times to grab red envelopes, so peak concurrency could reach 5 million QPS.

3. High bandwidth pressure: at standard definition, the bit rate for watching a live stream is at least 1 Mbps; with 1 million users online, the egress bandwidth for the video stream alone reaches 976.56 Gbps. A single bullet-screen comment can be 130 bytes, and 20 comments scroll by per second; pushing them to 1 million users simultaneously requires another 19.37 Gbps of egress bandwidth (the arithmetic is reconstructed after this list).

4. High computing pressure: judging a single answer involves not only right or wrong, but also the use of props such as resurrection cards, checking whether the user has set a new record, and a series of anti-cheating strategies (for example, a user who answered a previous question wrong cannot continue answering). At the instant the host announces the answer, how do we finish this calculation for 1 million users?

5. Correctness and security of money flows: a single user can grab at most 3 red envelopes; how do we prevent over-grabbing? How do we guarantee that answer bonuses and red envelope rewards are settled without even a one-cent error?

6. Low latency: in the live scenario, how do we align the video stream with the business data stream so that the voice, the host's screen, and the question stay in sync and the user experience is guaranteed?

7. Low interference with the mall's trading business: as an operational activity of the mall, the core goal of live trivia is traffic acquisition. It relies on the mall's existing user system, operation system, big data system, withdrawal channels, and so on. How do we minimize interference with the mall's existing trading system?
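For reference, here is a back-of-the-envelope reconstruction of the bandwidth figures in challenge 3 (my own arithmetic; the 1024-based unit conversion is chosen to match the article's numbers):

```latex
% Video: 10^6 viewers at 1 Mbps each
10^6 \times 1\,\mathrm{Mbps} = \tfrac{10^6}{1024}\,\mathrm{Gbps} \approx 976.56\,\mathrm{Gbps}

% Danmaku: 130 B/comment, 20 comments/s, pushed to 10^6 users
130 \times 8 \times 20 \times 10^6 = 2.08 \times 10^{10}\,\mathrm{bit/s}
  \approx \tfrac{2.08 \times 10^{10}}{1024^3}\,\mathrm{Gbps} \approx 19.37\,\mathrm{Gbps}
```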

As you can see, this kind of pan-entertainment live scenario involves many technical challenges, which depend on the user scale and business process of the application. Even in the low-frequency transaction-conversion scenario of live e-commerce, if a top seller like Li Jiaqi is selling the goods, the session still faces the challenge of instantaneous high-concurrency buying. Therefore, the architecture design must account for peak business pressure, and every link of the system must scale dynamically.


03 Technology Selection and Rationale

1. Selection of the audio and video processing and transmission scheme

Since our team had no capability in audio/video processing and transmission, we investigated the paid solutions of various cloud vendors and finally adopted Tencent Cloud's live-streaming solution. Host side: with professional camera equipment in the studio and OBS streaming software (whose ingest Tencent Cloud supports), video can be recorded and pushed. User side: the APP integrates Tencent Cloud's SDK and can watch the live stream after dynamically obtaining the playback address.

2. Business data flow scheme selection

Business data refers to all application data other than audio and video, such as questions, answers, bullet-screen comments, and red envelopes. Tencent Cloud provided two options:

1. Set the questions in advance and deliver them directly inside the live stream through the audio/video channel via Tencent Cloud's SDK.

2. Deliver the questions quickly to the audience APP through Tencent Cloud's IM channel, cache them on the APP, and display each question when the player reaches the expected NTP timestamp.

Both of Tencent Cloud's schemes can deliver perfect "voice - picture - question" synchronization, but they have drawbacks: users' answers must still be collected by our own server over HTTP requests, which we have to develop ourselves anyway; and announcing answers, grabbing red envelopes, and bullet-screen comments are not supported by Tencent Cloud either, so at bottom we still need to develop our own communication channel.

Considering these limitations and the variability of the business, we eventually developed our own business data channel. In this scheme, the video stream and the business data stream are delivered over two separate channels. The data volume of the business stream is small compared with the video stream; as long as the processing speed of the business logic and the downlink speed of the business data are guaranteed, the "voice - picture - question" delay is acceptable. After all, this was the 4G era: if a user's network was poor, the video stream itself probably could not play smoothly.

Transmitting the business data stream independently requires a long-connection, high-performance gateway server (supporting 1 million concurrent online users, 200,000 concurrent answers, real-time bullet-screen push, and other requirements). Our technology choices were Netty, ProtoBuf, and WebSocket, for the following reasons (a minimal gateway sketch follows the list):

1. Netty: Netty was the most popular high-performance asynchronous NIO framework at the time, well suited to the many push scenarios in live trivia, such as question delivery, bullet-screen comments, and red envelope rain.

2. ProtoBuf: as the data exchange format between client and server, PB is a binary transmission format with excellent efficiency and compatibility. It is clearly superior to mainstream formats such as JSON, XML, and Hessian in both payload size and serialization speed, supports backward compatibility, and has bindings for all mainstream languages.

3. WebSocket: a protocol introduced with HTML5 for full-duplex, long-lived connections between client and server.
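To make the selection concrete, here is a minimal sketch of a Netty-based WebSocket gateway pipeline. This is my own illustration under the stack described above, not the production code; the class name, path, and port are illustrative:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelPipeline;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.http.HttpObjectAggregator;
import io.netty.handler.codec.http.HttpServerCodec;
import io.netty.handler.codec.http.websocketx.BinaryWebSocketFrame;
import io.netty.handler.codec.http.websocketx.WebSocketServerProtocolHandler;

public class TcpGatewayServer {

    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup boss = new NioEventLoopGroup(1);  // accepts new connections
        EventLoopGroup worker = new NioEventLoopGroup(); // handles socket I/O
        try {
            ServerBootstrap bootstrap = new ServerBootstrap()
                .group(boss, worker)
                .channel(NioServerSocketChannel.class)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        ChannelPipeline p = ch.pipeline();
                        p.addLast(new HttpServerCodec());           // WebSocket handshake rides on HTTP
                        p.addLast(new HttpObjectAggregator(65536)); // aggregate the handshake request
                        p.addLast(new WebSocketServerProtocolHandler("/ws")); // upgrade + ping/pong
                        p.addLast(new GatewayMessageHandler());     // illustrative business handler
                    }
                });
            bootstrap.bind(8080).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            worker.shutdownGracefully();
        }
    }
}

// Binary WebSocket frames carry the ProtoBuf envelope; this stub is where the
// gateway would decode the payload and forward the request to backend services.
class GatewayMessageHandler extends SimpleChannelInboundHandler<BinaryWebSocketFrame> {
    @Override
    protected void channelRead0(ChannelHandlerContext ctx, BinaryWebSocketFrame frame) {
        // decode frame.content() with ProtoBuf, then route by messageType
    }
}
```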

3. Server deployment scheme selection

As mentioned above, live trivia is only an operational activity of the mall and relies on the mall's existing user system, operation system, big data system, withdrawal channels, and so on. The existing mall system is deployed in our self-built data center. To reduce interference with the mall's existing trading system, we adopted a hybrid "private cloud + public cloud" deployment: the high-concurrency answering system and the cache, MQ, and other common components it depends on are deployed on the public cloud, which makes elastic scaling easy and shields the mall's trading system from the traffic.


04 Architecture Design Scheme

1. Audio and video live-streaming architecture

The figure above is the architecture diagram of Tencent Cloud's live-streaming solution; other cloud vendors' solutions have similar architectures. Interested readers can find the details on Tencent Cloud's official website, so I will not expand on it here.



2. Data flow scheme

The audio/video stream uses Tencent Cloud's live-streaming solution, while the business data stream (events, questions, answers, danmaku, red envelopes, etc.) uses our self-developed long-connection scheme. The answering system and the answer operation backend in the architecture diagram are also self-developed. The client processes the data of the two channels separately to drive the changes in user interaction.



3. Communication architecture based on TCP long connections

The communication architecture above is used to transmit the business data stream, and the flow is as follows:

1. The client communicates with the server over WebSocket. The connection is established when the user enters the answering room and closed when the user leaves.

2. Nginx does load balancing for the WebSocket connections.

3. The TCP gateway is implemented with Netty and is responsible for maintaining long connections and forwarding business requests; it contains no business logic. A heartbeat mechanism between client and gateway ensures connection validity and detects zombie connections (see the sketch after this list).

4. Message pushes (bullet-screen comments, question delivery, answer announcements, and many other scenarios) are sent by the underlying business (the answering system) to the TCP gateway through MQ, and the gateway then pushes them to the clients.
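As a sketch of the heartbeat mechanism in step 3 (my own illustration; the handler name and the 60-second threshold are assumptions): Netty's built-in IdleStateHandler flags connections that have been silent for too long, and the gateway closes them as zombies.

```java
import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.timeout.IdleState;
import io.netty.handler.timeout.IdleStateEvent;

// Added to the gateway pipeline together with Netty's IdleStateHandler, e.g.:
//   pipeline.addLast(new IdleStateHandler(60, 0, 0)); // 60s read-idle threshold
//   pipeline.addLast(new ZombieConnectionHandler());
class ZombieConnectionHandler extends ChannelDuplexHandler {
    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        // Fired when no data (including heartbeats) arrived within the threshold,
        // i.e. the client has missed several heartbeat intervals.
        if (evt instanceof IdleStateEvent
                && ((IdleStateEvent) evt).state() == IdleState.READER_IDLE) {
            ctx.close(); // drop the zombie connection; a live client will reconnect
        } else {
            super.userEventTriggered(ctx, evt);
        }
    }
}
```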



4. Definition of the data transmission format for long-connection communication

Another important aspect of the communication architecture is the definition of the data transmission format. ProtoBuf is used both between the client and the TCP gateway and between the TCP gateway and the answering system. Let's break down some of the key format definitions.

4.1 Client request message format

1. Message type: -1 indicates a heartbeat message, 0 indicates a user authentication message, and >0 indicates a business message

2. User ID: the unique user ID obtained during the earlier login process

3. Token: the authorization token obtained after the user logged in to the APP

4. BizData: the actual business data, a ProtoBuf-serialized byte array whose concrete format is defined by the lower-level services



4.2 Client response message format

1. Message type: consistent with the message type in the request

2. Status code: 0 indicates that processing failed, 1 indicates that it succeeded

3. BizData: the actual business data. If the status code is 0, this field is a 4-byte error code (100 means token authentication failed, 101 means the request to the service layer failed). If the status code is 1, the field is the ProtoBuf-serialized byte array returned by the service layer, whose concrete format is again defined by the lower-level services.



4.3 ProtoBuf format definition of the business data

```protobuf
message Message {
    MessageType messageType = 1;   // message type, an enumeration value
    string sequence = 2;           // unique message sequence number
    Request request = 3;           // request message
    Response response = 4;         // response message
    Notification notification = 5; // push message
}
```

The format here is cleverly designed. The three message types, Request, Response, and Notification, are all wrapped in the top-level Message, a MessageType enumeration identifies the type, and the sequence field carries the unique sequence number of the message. The sender must guarantee that sequence is unique so that it can match a received Response to its Request. With this unified format, the answering system can register a different message handler for each MessageType; once the mapping is configured, message routing falls out naturally.
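A minimal sketch of that MessageType-to-handler routing (the interfaces and names here are my own illustrative assumptions, not the actual code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical envelope decoded from ProtoBuf; only the fields needed here.
record Envelope(int messageType, String sequence, byte[] bizData) {}

interface MessageHandler {
    byte[] handle(Envelope envelope); // returns the serialized response BizData
}

class MessageRouter {
    private final Map<Integer, MessageHandler> handlers = new ConcurrentHashMap<>();

    // Called once at startup to configure the messageType -> handler mapping.
    void register(int messageType, MessageHandler handler) {
        handlers.put(messageType, handler);
    }

    byte[] route(Envelope envelope) {
        MessageHandler handler = handlers.get(envelope.messageType());
        if (handler == null) {
            throw new IllegalArgumentException("unknown messageType: " + envelope.messageType());
        }
        return handler.handle(envelope);
    }
}

// Usage (SUBMIT_ANSWER and SubmitAnswerHandler are hypothetical):
//   router.register(SUBMIT_ANSWER, new SubmitAnswerHandler());
```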



5. Overall system architecture

1. Gateway layer: the architecture uses both a TCP gateway and an HTTP gateway. The TCP gateway was explained in the previous section; the HTTP gateway provides RESTful interfaces for the APP's low-frequency requests, such as the event home page, leaderboards, personal bonus details, new-user invitations, and sharing.

2. Answering system: implemented with Dubbo, this is the core business service. It provides RPC interfaces to both the C-end and the B-end systems, covering event management, question bank management, and live room management, as well as the high-concurrency interfaces for the C-end (joining a live room, answering, grabbing red envelopes, etc.). This cluster uses a number of common high-concurrency design techniques, such as horizontal scaling, multi-level caching, asynchronization, and rate limiting, described in the following sections. In addition, MQ transactional messages plus a reconciliation mechanism ensure the consistency of money flows between the answering system and the balance system.

3. Answer operation system: the backend system used by operation staff and the field controller during the live broadcast. It manages events and questions and supports the various operations performed during the broadcast (issuing questions, announcing answers, and handing out red envelopes in step with the host's narration).



6. Deployment architecture

The online user count and concurrency of live trivia far exceed those of the mall's trading system. To reduce the impact on the main trading flow, we adopted the hybrid "private cloud + public cloud" deployment described above. The self-built data center and the cloud data center are connected by a dedicated line. The application servers, storage servers, and network bandwidth related to live trivia are all in the cloud, where capacity can be rapidly expanded based on traffic monitoring; as the operation plan changes, servers can be added or removed at any time, which gives great flexibility.



7. High-concurrency design of the answering system

7.1 High-concurrency design of the answer interface

The answer interface is expected to handle 200,000 QPS. Before describing the design, let me briefly list the judgment logic of the answer interface:

1. Check whether the event is still in progress

2. Check whether the user has already been eliminated

3. Check whether the question the user submitted is the one currently open for answering

4. Check whether the user's previous correct answer and the current question are consecutive (no skipped questions)

5. Check whether the user's answer timed out

6. Check whether the user's answer is correct

7. If the answer is wrong, check whether the user has a resurrection card

8. If there is a resurrection card, check whether it was already used on a previous question

Besides the judgment logic above, there are also some write operations, such as updating the selection count of each answer option, updating the number of users who answered correctly or were eliminated, and recording resurrection card usage. Overall, answering is a read-heavy, write-light interface. To cope with the high concurrency, we adopted the following technical solutions:

1. All logic of the answer interface operates only on the cache, never on the database. We adopt the Write-Behind (write-back) caching pattern: only the cache is updated synchronously, and the database is updated asynchronously in batches after the answering ends. (A sketch follows this list.)

2. Multi-level caching: local cache + Redis. Before an event starts, all static data such as the event configuration, questions, and answers are preheated into the local cache.

3. Redis is deployed as one master with multiple replicas plus sentinels, ensuring high concurrency and high availability. Several separate sets of Redis instances (users, danmaku, live answering, bonus lists) are configured to further spread load by business module.

4. Reorder the judgment logic. The first four of the eight checks above are security validations that the client already intercepts through interaction design, so we moved checks 5, 6, and 7 to the front and run the first four only after those pass, which greatly reduces the amount of computation.
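As a minimal sketch of the write-behind pattern from item 1 (my own illustration assuming Jedis as the Redis client; the key names, host, and flush schedule are assumptions): the hot path touches only Redis counters, and a background job batch-flushes them to the database after the event.

```java
import redis.clients.jedis.JedisPooled;

// Hot path: called for every answer submission; touches only Redis.
class AnswerWriteBehindCache {
    private final JedisPooled redis = new JedisPooled("redis-host", 6379);

    // Record one answer: bump the per-option counter and the correct/eliminated totals.
    void recordAnswer(long questionId, int option, boolean correct) {
        String optionKey = "q:" + questionId + ":opt:" + option;
        redis.incr(optionKey);
        redis.incr(correct ? "q:" + questionId + ":correct"
                           : "q:" + questionId + ":out");
    }

    // Flush job: run after the event ends (or on a timer), reading the
    // accumulated counters and batch-writing them to the database.
    void flushToDatabase(long questionId, int optionCount) {
        for (int opt = 0; opt < optionCount; opt++) {
            String value = redis.get("q:" + questionId + ":opt:" + opt);
            long count = value == null ? 0 : Long.parseLong(value);
            // answerDao.updateOptionCount(questionId, opt, count); // hypothetical DAO
        }
    }
}
```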



7.2 Design of the answer-result push

How do we push the answer results to hundreds of thousands of users the moment the host announces the answer? A user's result has many states: answered correctly or wrongly, used or did not use a resurrection card, already eliminated. The APP's interaction differs in each state, so this is naturally a stateful push that depends on server-side computation. To solve this concurrent-computation problem, we adopted a clever design that converts the stateful push into a stateless one:

1. Each user's result is computed synchronously at the moment the user submits the answer (the scheme of the previous section), which spreads the instantaneous computation pressure across the 10-second answering window.

2, the user’s answer results are also pushed to the user immediately after the completion of the first step calculation, without waiting for the host broadcast to announce the answer moment, otherwise it still involves concurrent reading of the answer results and stateless push, the instantaneous pressure on the storage server and bandwidth still exists.

3. However, pushing results to the client ahead of time creates a security problem: a hacker could learn whether an answer is correct in advance and cheat with batches of accounts. To solve this, we encrypt each result symmetrically with XOR, delay delivering the key until the moment the answer is announced, and generate a random key for every question. This neatly solves the security problem. (See the sketch after this list.)

4. At the moment the answer is announced, we only need to push a very small packet to the client, and this packet is identical for all users.
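A minimal sketch of the XOR scheme from item 3 (a simple repeating-key XOR; the payload layout and key length are my own assumptions, as the article does not specify them):

```java
import java.security.SecureRandom;
import java.util.Arrays;

class AnswerResultCipher {
    private static final SecureRandom RANDOM = new SecureRandom();

    // A fresh random key is generated for every question.
    static byte[] newKey(int length) {
        byte[] key = new byte[length];
        RANDOM.nextBytes(key);
        return key;
    }

    // XOR is symmetric: the same method encrypts and decrypts.
    static byte[] xor(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] result = "correct,no-card".getBytes(); // illustrative result payload
        byte[] key = newKey(16);
        byte[] cipher = xor(result, key); // pushed right after the user answers
        byte[] plain = xor(cipher, key);  // client decrypts when the key arrives
        System.out.println(Arrays.equals(result, plain)); // true
    }
}
```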



7.3 High-concurrency design of the red-envelope-grabbing interface

The concurrency of the red-envelope-grabbing interface is extremely high: within one second, a single user can tap the screen 4-5 times to grab envelopes, and with a million users online the peak concurrency can reach millions of QPS. Our design:

1. Red envelope amounts are computed in advance and loaded into Redis: when creating an event, operations staff configure the total amount and the total number of red envelopes, and the system pre-computes the amount of each envelope and stores it in Redis.

2. Client-side rate limiting: the interface design embeds a rate-limiting-factor parameter through which the server can dynamically control the proportion of client requests that are actually sent. In the grab-red-envelope response, we compute the factor from the current online user count and the remaining envelope count, which caps the requests reaching the server at about 100,000 QPS.

3. Server-side rate limiting: the number of tokens generated per second is computed in advance from the number of servers and the total number of red envelopes, and per-second limiting is implemented with Guava's RateLimiter.

4. The first three steps have already absorbed most of the concurrency; the atomicity of the actual grab is then guaranteed by a Lua script, and the results are asynchronously flushed to the database after the event ends. (A sketch follows.)
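A minimal sketch of items 3 and 4 (my own illustration assuming Jedis and Guava; the key names, script, rate, and per-user cap of 3 are taken from or assumed around the article): the Lua script pops one pre-computed envelope and records the grab atomically, so a user can never exceed the cap even under concurrent taps.

```java
import com.google.common.util.concurrent.RateLimiter;
import redis.clients.jedis.JedisPooled;

class RedEnvelopeService {
    private final JedisPooled redis = new JedisPooled("redis-host", 6379);
    // e.g. 2,000 permits/second per server, derived from envelope count / server count
    private final RateLimiter limiter = RateLimiter.create(2000);

    // Atomically: reject if the user already holds 3 envelopes, otherwise
    // pop one pre-computed amount from the list and credit it to the user.
    private static final String GRAB_SCRIPT =
        "local c = tonumber(redis.call('HGET', KEYS[2], ARGV[1]) or '0') " +
        "if c >= 3 then return -1 end " +
        "local amount = redis.call('LPOP', KEYS[1]) " +
        "if not amount then return 0 end " +
        "redis.call('HINCRBY', KEYS[2], ARGV[1], 1) " +
        "return tonumber(amount)";

    /** @return amount in cents; 0 = none left; -1 = per-user cap hit; -2 = rate limited */
    long grab(long activityId, long userId) {
        if (!limiter.tryAcquire()) {
            return -2; // shed load before touching Redis
        }
        Object r = redis.eval(GRAB_SCRIPT, 2,
                "envelopes:" + activityId,   // KEYS[1]: list of pre-computed amounts
                "grabbed:" + activityId,     // KEYS[2]: hash userId -> grab count
                String.valueOf(userId));     // ARGV[1]
        return ((Number) r).longValue();
    }
}
```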



7.4 Other high-concurrency optimizations

1. Objects cached in Redis should be as lean as possible (don't store fields you don't need), keys should be as short as possible (I/O is the bottleneck under high concurrency), and make good use of pipelines to batch multiple commands (but don't batch too many at once).

2. Tune the various connection pools and JVM parameters: reasonable values for the Redis connection pool, the Dubbo thread pool, and the JVM heap size can all be found in the load-testing environment.

3. The answering system scales horizontally, and the ToB and ToC interfaces are isolated via Dubbo group configuration and deployed separately to avoid mutual influence. (A sketch follows.)
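A minimal sketch of that Dubbo group isolation (using Apache Dubbo's annotation-style configuration, 2.7+; the interface and group names are illustrative assumptions):

```java
import org.apache.dubbo.config.annotation.DubboService;

// Hypothetical RPC interface shared by both groups.
interface AnswerFacade {
    boolean submitAnswer(long userId, long questionId, int option);
}

// Deployed on nodes reserved for client (ToC) traffic; consumers that
// declare group = "toC" are routed only to these instances.
@DubboService(group = "toC")
class AnswerFacadeForC implements AnswerFacade {
    @Override
    public boolean submitAnswer(long userId, long questionId, int option) {
        // high-concurrency path: cache-only logic as described in section 7.1
        return true;
    }
}

// Deployed on nodes for the operation backend (ToB), isolated from ToC load.
@DubboService(group = "toB")
class AnswerFacadeForB implements AnswerFacade {
    @Override
    public boolean submitAnswer(long userId, long questionId, int option) {
        // low-frequency administrative path
        return true;
    }
}
```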






Finally, a brief summary of the live-streaming architecture:

1. For basic live functions such as audio/video encoding and transmission, unless your company has deep pockets, I suggest using the solutions of Tencent Cloud or other cloud vendors directly (well-known live applications such as Douyu and Mogujie still run on Tencent Cloud).

2. Focus the architecture design on the application itself, and first determine the communication architecture (long connections, short connections, or a mix of the two) according to the user scale and business characteristics of the live application.

3. Plan for the business peak. In a high-concurrency scenario, treat high concurrency as a systemic problem: from client to server, from network bandwidth to deployment architecture, and even product design, every dimension and every detail must be considered.





About the author: programmer, master's degree from a 985 university, former Amazon Java engineer, now a technical director at Zhuanzhuan (58 Group). I continue to share articles on technology and management. If you are interested, scan the QR code below in WeChat to follow my official account: "Career Advancement of IT People".