This article uses a concrete project to discuss how to build a rapidly scalable microservice architecture from scratch.

It focuses on the following themes in the microservice architecture design of live broadcast interaction:

  • How to design a stable message interaction system that supports distributing millions of messages.

  • As the business grows more complex, how to properly componentize the system into microservices so that each component can be deployed and upgraded independently.

  • In a hybrid cloud, how to make full use of Docker and the elasticity of the public cloud to design a flexible architecture that scales services out automatically, with no one on watch, when traffic surges.

Background and challenges of the Weibo live interactive platform

2017 was a big year for live streaming. Sina Weibo launched a live streaming service in its app and also connected its interaction with Taobao, Yizheng, Hongdou, and other platforms.

Technically speaking, live streaming is generally divided into two parts:

  • Video streaming, i.e., the transmission of the media stream.

  • Live interaction, including comments, likes, gifts, and so on.

A defining characteristic of live broadcasts is the large number of people in a single room: the platform must be able to support millions of users online at the same time.

Features of Weibo live broadcasting platform

The basic model of a live interaction system is message forwarding: one user sends a message, other users receive it, and the system performs the corresponding storage.

Message systems are generally evaluated on three criteria: real-time delivery, reliability, and consistency. For a live broadcast message system, the requirements for reliability and consistency are relatively low, but the requirement for real-time delivery is very high.

If a user sends the host a gift and the host only receives it and can respond 10 seconds or a minute later, that is unacceptable. Therefore, we need second-level real-time delivery.
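
A minimal sketch of this forwarding model is shown below, assuming hypothetical Room and Connection types (the names are ours, not Weibo's): one user's message is fanned out to every long connection in the room.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the forwarding model: one room fans a message out to
// every user currently connected to it. Types and names are illustrative.
public class Room {
    private final Set<Connection> connections = ConcurrentHashMap.newKeySet();

    public void join(Connection c)  { connections.add(c); }
    public void leave(Connection c) { connections.remove(c); }

    // A user sends a message; every other connection receives it in real time.
    public void broadcast(String senderId, String message) {
        for (Connection c : connections) {
            if (!c.userId().equals(senderId)) {
                c.push(message); // push over the user's long connection
            }
        }
    }
}

interface Connection {
    String userId();
    void push(String message);
}
```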

Features of Weibo live broadcast interaction:

  • Traffic is modest most of the time, but it surges sharply when a broadcast draws a crowd.

  • The surge shows up as a huge volume of notifications, which can reach tens of thousands of messages per second.

  • With short-connection requests, a user sends a single request to load a page and usually performs no further operations, so the pressure on the server is limited.

    Live interaction is different: after users join a room, they keep receiving pushes even if they do nothing. This real-time push mode puts constant pressure on the server.

Challenges the Weibo live broadcasting platform faces

Because user demand for live broadcasts is large and the interactive features are varied, we face challenges in three areas:

  • How to iterate quickly and respond to requirements, which is a measure of both the business and the system.

  • How to handle the peak traffic caused by a full push to all users.

  • How to optimize cost, that is, how to handle both peaks and troughs with limited resources.

How the microservice architecture of the Weibo live interactive platform evolved

To address these three challenges, we built a microservice architecture for live broadcast interaction, shown in the diagram above. We divided the business layer into modules, including a discovery service, a third-party platform service, a push service, and other microservices.

Selection of the architecture

For architecture selection, let's first look at the traditional monolithic model. A monolith is suitable for many scenarios and is particularly well suited to agile development.

However, because Weibo's interaction traffic is huge and the system is relatively complex, a monolith runs into the following problems:

  • High release cost and slow iteration. A monolith bundles all services together; they are deployed, started, and stopped as one unit, so every minor change requires a full redeployment.

    To rule out regressions, we usually have to run a full regression test for each release. This slow iteration speed is unacceptable to us.

  • Poor scalability. As instances are scaled out horizontally, the number of connections to Redis and the DB grows with them, and each resource has a connection limit. Once the number of instances reaches a certain point, the connection bottleneck prevents further horizontal scaling.

  • Poor robustness. A bug in any module can bring down other services, or even the entire system.

The difference between microservices and a monolith is that microservices split out common functions, and after the split each module can maintain its own DB and resources.

Therefore, the benefits of microservices are as follows:

  • Each service is developed and deployed independently, so iteration is significantly faster. We no longer have to test everything, only the services we changed.

  • A single service depends on fewer resources, so it scales out much faster. For lightweight services in particular, a new instance can be expanded and started in 30 to 40 seconds.

  • Some push services no longer depend directly on the DB and Redis; they go through RPC or related calls instead, which solves the connection-count problem.

  • The services are deployed independently, avoiding overall outages caused by unrelated service problems and limiting bugs to a single module.

Problems introduced by microservices

The introduction of microservices also brings some new problems:

  • Previously, in the monolith, modules were mixed together and we never had to think about boundaries in detail. Now that microservices have been adopted, how do we split them properly?

  • Calls that used to be local are now remote. How should services communicate, and how do we account for the network overhead?

  • After a service is deployed, how do other services discover it?

Problem 1: Service split

When splitting microservices, the question we care about most is the granularity of the split. We took a simple approach that you can also use in your own projects:

  • If you are unsure how to split, start by separating core services from non-core services. The benefit is that the two are isolated from each other.

    For example, when servers are in short supply, we protect only the core services, and non-core services cannot affect them. In other words, this is a split by business importance.

  • It also helps service governance: non-core services can be degraded at any time, so the limited machine resources go to the core services.

After completing the basic split, we can further split the core services based on business scenarios:

  • For the Short Service, we have a private protocol parser called Wesync. Since we rarely change its code and the protocol parsing is generic, we split it out and deployed it separately.

  • All business-related functions, such as rooms and messages, are deployed in a service called Biz Service. The amount of code here can be quite large, but code size is not the criterion for splitting; since this complex business logic is not under much load, it can stay together.

  • We split out the Push Service, which maintains long connections with users to guarantee real-time message push. A server-side “push” model is also more real-time than a client-side “pull” model.

The reasons we split out the Push Service are:

  • Because of the huge push volume, the Push Service is the bottleneck of the whole system. Suppose a room holds 100,000 people and 10 of them each send a message within one second; the system then has to generate 100,000 × 10 = 1 million pushes per second.

    Likewise, with 1 million people in the room, 10 million pushes per second are generated. The message volume is clearly enormous, so the Push Service is the service that needs to be scaled out most frequently (see the capacity sketch after this list).

  • We want the Push Service to be as simple as possible and to depend on as few resources as possible.
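
As a back-of-envelope illustration of that fan-out arithmetic and why the Push Service must scale out so often, here is a small sketch; the per-instance throughput figure is an assumption of ours, not a number from Weibo.

```java
// Back-of-envelope estimate of the push fan-out and the number of Push
// Service instances it implies. The per-instance capacity is assumed.
public class PushCapacityEstimate {
    public static void main(String[] args) {
        long usersInRoom       = 100_000; // concurrent viewers in one room
        long messagesPerSecond = 10;      // messages sent in the room per second
        long pushesPerSecond   = usersInRoom * messagesPerSecond; // 1,000,000 pushes/s

        long perInstanceCapacity = 50_000; // assumed pushes/s one instance sustains
        long instancesNeeded = (pushesPerSecond + perInstanceCapacity - 1) / perInstanceCapacity;

        System.out.printf("Fan-out: %,d pushes/s -> about %d Push Service instances%n",
                pushesPerSecond, instancesNeeded);
    }
}
```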

We simply split the non-core services from a business perspective as follows:

  • Barrage Service: provides playback of the message stream and access to historical messages.

  • Third Service: connects to third-party platforms.

The reason for splitting out the Barrage Service is that it is an externally exposed service for batch-fetching historical messages, so it inevitably consumes a lot of bandwidth.

Therefore, we adopted the following optimizations (sketched in code after this list):

  • We use Gzip compression on the interfaces and the returned data, reducing bandwidth consumption.

  • We make full use of the CPU and other hardware capacity (compression trades spare CPU for bandwidth).

  • By adding multiple cache layers, we can serve requests directly from a local cache instead of going to the DB or the message cache, reducing interaction with external resources.
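
The sketch below illustrates the local-cache-plus-Gzip combination under our own assumptions; the class and method names are hypothetical, not Weibo's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch of the two optimizations: a local cache in front of
// the DB/message cache, and Gzip compression of the returned payload.
public class BarrageHistory {
    private final Map<String, String> localCache = new ConcurrentHashMap<>();

    // Serve playback messages from the local cache when possible.
    public byte[] historyForRoom(String roomId) throws IOException {
        String json = localCache.computeIfAbsent(roomId, this::loadFromBackend);
        return gzip(json);
    }

    private String loadFromBackend(String roomId) {
        // Fall back to the message cache / DB only on a local-cache miss.
        return "[{\"room\":\"" + roomId + "\",\"msgs\":[]}]"; // placeholder payload
    }

    private static byte[] gzip(String payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray(); // return with "Content-Encoding: gzip"
    }
}
```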

As the figure above shows, before the microservice split the services were jumbled together and had to be deployed as one unit. We therefore split them according to the strategies described above.

Problem 2: Service communication

We divide inter-service communication into synchronous and asynchronous styles:

  • Synchronous: REST API calls and RPC calls.

  • Asynchronous: mainly via message queues.

For communication between core and non-core services, we analyzed the scenarios. As shown in the figure above, the Third Service works in two directions:

  • The blue line represents third-party servers pushing messages into our system. Because we want third-party messages to reach the front end of our app in real time, we use a synchronous, RPC-style push.

  • The red line represents messages generated in the Weibo app that are handed to the third-party servers, which fetch them in pull mode. Since we do not want downstream or non-core services to interfere with core functionality, we decouple them asynchronously with a queue-based pull (see the sketch below).
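
Here is a small sketch of the two styles under our own naming, with a plain in-memory queue standing in for the real message queue and an interface standing in for the RPC call; it only illustrates the decoupling, not Weibo's actual components.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the two communication styles with stand-in types. A real
// deployment would use an RPC framework and a real message queue.
public class ThirdPartyBridge {

    // Synchronous path (blue line): a third-party message is pushed
    // straight into the core push pipeline via an RPC-style call.
    interface PushRpc {
        void pushToRoom(String roomId, String message);
    }

    // Asynchronous path (red line): the core service only enqueues;
    // third-party delivery pulls from the queue at its own pace.
    private final BlockingQueue<String> outbound = new LinkedBlockingQueue<>(100_000);

    public void onWeiboMessage(String message) {
        // Never block the core service on the non-core consumer.
        outbound.offer(message);
    }

    public void thirdPartyPullLoop() throws InterruptedException {
        while (true) {
            String message = outbound.take(); // pull mode
            deliverToThirdParty(message);
        }
    }

    private void deliverToThirdParty(String message) { /* HTTP call, omitted */ }
}
```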

For the Barrage Service, we share the database. Since batch retrieval of playback messages consumes a lot of bandwidth, the Barrage Service loads directly from the shared database, which keeps this load off the core services' resources.

Problem 3: Service discovery

For discovery of the Push Service, we abandoned the old approach of hanging instances behind DNS and letting DNS do service discovery, and instead adopted a Dispatch Service that we wrote ourselves.

It works as follows (a selection sketch follows this list):

  • Before a user joins a room, the client asks the Dispatch Service which Push Service instance to connect to and establishes a long connection with it. Once a message is generated, it is delivered through that Push Service.

  • The Dispatch Service dynamically selects the nearest instance for the user, based on the user's network and region, according to policy.

  • The policy also supports horizontal scaling of the Push Service: after a scale-out, we no longer direct traffic to the old Push Service instances or return their IP addresses, but send all new traffic to the newly added instances.
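
A minimal sketch of that dispatch decision, assuming illustrative fields and a simplified policy (prefer the client's region, favor freshly added instances, skip drained ones); it is not Weibo's actual selection logic.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch of the dispatch decision: pick a Push Service endpoint for a
// client before it opens its long connection. Fields and policy are
// illustrative only.
public class DispatchService {

    record PushEndpoint(String ip, String region, long addedAtMillis, boolean acceptingNew) {}

    private final List<PushEndpoint> endpoints;

    public DispatchService(List<PushEndpoint> endpoints) {
        this.endpoints = endpoints;
    }

    // Returns the IP the client should connect to, if any is available.
    public Optional<String> pick(String clientRegion) {
        return endpoints.stream()
                .filter(PushEndpoint::acceptingNew)                         // drained old instances are skipped
                .filter(e -> e.region().equals(clientRegion))               // nearest-region policy
                .max(Comparator.comparingLong(PushEndpoint::addedAtMillis)) // favor newly added capacity
                .map(PushEndpoint::ip);
    }
}
```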

For RPC-style service discovery, we use a typical SOA architecture: a Registry provides the list of available services, Servers register and provide services, and Clients discover and call them. We use Motan RPC, which lets us respond to new requirements quickly and handles discovery for us.
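
To make the Registry / Server / Client roles concrete, here is a simplified in-process sketch; this is not Motan's actual API, and in production the registry would be a separate system with the RPC framework handling registration and lookup over the network.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Simplified sketch of the Registry / Server / Client roles in an SOA
// setup. Not Motan's API; purely illustrative.
public class SimpleRegistry {
    // service name -> addresses of live server instances
    private final Map<String, List<String>> services = new ConcurrentHashMap<>();

    // Server side: announce an instance when it starts up.
    public void register(String serviceName, String address) {
        services.computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>()).add(address);
    }

    // Server side: remove the instance on shutdown.
    public void deregister(String serviceName, String address) {
        List<String> addresses = services.get(serviceName);
        if (addresses != null) {
            addresses.remove(address);
        }
    }

    // Client side: discover the current instances, then pick one
    // (round robin, random, ...) before issuing the RPC call.
    public List<String> discover(String serviceName) {
        return services.getOrDefault(serviceName, List.of());
    }
}
```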

Summary of the move to microservices

To sum up, the move to microservices comes down to four points:

  • Independent development and testing speed up iteration.

  • Service separation reduces the impact of unrelated services.

  • The speed of service scaling is increased by reducing resource dependence.

  • Of course, it also increases the difficulty of service deployment, operation and maintenance.

Because we adopted a framework for rapid scaling, operations engineers need to take part in deployment. Let's now discuss the elastic scaling strategy for live interaction.

Elastic scaling of the Weibo live interactive platform

Docker-based hybrid cloud architecture

Weibo usage is a typical traffic-surge scenario, so the traditional solution of simply buying more servers leads to very uneven overall utilization.

For example, people generally do not check Weibo during the working day or in the middle of the night, so server and network utilization is low then; explosive load appears only during the evening peak.

Therefore, we adopted a strategy of rapid elastic scaling, making use of the public cloud's rapidly provisioned elastic resources.

However, while the private cloud environment was fully under our own control, introducing the public cloud brings environment differences. Following the common industry approach, we adopted Docker.

Docker's networking generally offers Bridge, Container, Host, and other modes:

  • We initially used Bridge mode, Docker's default, in our test environment.

  • When the Docker daemon starts, it creates a docker0 virtual Ethernet bridge.

  • A veth pair connects the container to the host: a pair of virtual network interfaces links the container and the host, and packets are forwarded outside the container accordingly.

  • The problem: the container gets a virtual IP address, and the RPC service registers itself with that virtual IP, but after startup clients cannot reach the virtual IP address.

Therefore, we adopted Host mode, in which:

  • The container shares the host's eth0, that is, it shares the host's network namespace.

  • It is also faster than Bridge mode because it avoids the extra routing overhead.

Docker's simplicity and light weight suit Weibo's scenario well. Combined with public cloud resources, it forms a Docker-based hybrid cloud architecture, which we call DCP.

It is worth mentioning that DCP is already open source; you can find OpenDCP on GitHub.

The purpose of DCP is to be able to scale out and deploy 1,000 machines within 10 minutes to handle traffic surges such as the “Big Three” events.

Today it handles 600 billion API calls and trillions of RPC calls every day.

To automate operations, deployment, and scaling of live interaction on DCP, each newly expanded Push Service instance is mounted behind the SLB (load balancer), and cross-network service deployment is achieved through the Push Service.

The first step of scaling out is knowing where machines come from. DCP helps us distinguish between machines on the internal network (private cloud) and those on the external network (public cloud).

For example, three internal business units, A, B, and C, each have several servers of their own, and we place these servers into a “shared pool”.

When C hits peak traffic, it borrows three servers from the pool and returns them to the pool when it is idle again. This gives us the freedom to scale quickly on both the private cloud and the public cloud, which we call “dual-cloud scaling”.
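
A toy model of that shared pool, under our own naming, is sketched below; DCP itself does far more (machine initialization, images, quotas, bursting to the public cloud), so this only illustrates the borrow-and-return idea.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of the shared pool: business units contribute idle servers,
// borrow them at peak, and return them afterwards. Illustrative only.
public class SharedPool {
    private final Deque<String> idleServers = new ArrayDeque<>();

    public synchronized void contribute(String serverId) {
        idleServers.push(serverId);
    }

    // Borrow up to `count` servers; may grant fewer if the pool is short,
    // in which case the platform would burst to the public cloud instead.
    public synchronized List<String> borrow(int count) {
        List<String> granted = new ArrayList<>();
        while (granted.size() < count && !idleServers.isEmpty()) {
            granted.add(idleServers.pop());
        }
        return granted;
    }

    public synchronized void giveBack(List<String> serverIds) {
        serverIds.forEach(idleServers::push);
    }
}
```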

Automatic capacity expansion and reduction process

Our automatic capacity expansion and reduction process for live interaction can be roughly divided into the following steps:

  • Define the monitoring metrics, that is, set the thresholds on those metrics that trigger expansion. The thresholds are usually derived from stress testing.

  • Collect the metrics, which covers both how to collect them and which ones to collect.

  • Feed the data into the capacity decision system, which decides whether to expand (a decision sketch follows below).

  • Run the standard service expansion workflow shown in the figure above.

The capacity reduction process mirrors the expansion process, so it is not described here.
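
The sketch below shows what such a threshold-based capacity decision might look like, assuming thresholds obtained from stress tests; the metric names and numbers are illustrative, not Weibo's actual configuration.

```java
// Sketch of a threshold-based capacity decision. Thresholds would come
// from stress tests; the figures here are purely illustrative.
public class CapacityDecision {

    record Metrics(long longConnections, long pushesPerSecond, double cpuUsage) {}

    // Assumed per-instance limits for the Push Service.
    private static final long   MAX_CONNECTIONS_PER_INSTANCE = 50_000;
    private static final long   MAX_PUSHES_PER_INSTANCE      = 50_000;
    private static final double MAX_CPU_USAGE                = 0.70;

    // Returns how many instances to add (0 means no expansion needed).
    public int instancesToAdd(Metrics m, int currentInstances) {
        long neededForConnections = ceilDiv(m.longConnections(), MAX_CONNECTIONS_PER_INSTANCE);
        long neededForPushes      = ceilDiv(m.pushesPerSecond(), MAX_PUSHES_PER_INSTANCE);
        long needed = Math.max(neededForConnections, neededForPushes);
        if (m.cpuUsage() > MAX_CPU_USAGE) {
            needed = Math.max(needed, currentInstances + 1); // CPU bottleneck: add at least one
        }
        return (int) Math.max(0, needed - currentInstances);
    }

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }
}
```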

For the monitoring indicators mentioned above, we can divide them into two categories:

  • Business metrics vary from service to service. One API service may sustain 1,000 queries per second (QPS) while another sustains only 200, so the performance metrics collected and monitored differ across services.

  • Machine metrics are the universal ones: bandwidth, PPS, CPU, memory, and IOPS. Whenever one of them becomes a bottleneck, capacity should be expanded as soon as possible.

Accordingly, the process for defining monitoring metrics is: stress-test the system > find the service bottleneck > analyze the bottleneck > define the monitoring metrics.

We are now also experimenting with automating this through machine learning, stress-testing services weekly or on a regular schedule so that bottlenecks are found in time.

Because newly introduced machines can make overall performance inconsistent, and because bottlenecks migrate as requirements and the code base grow, we keep an up-to-date picture of the bottlenecks through continuously evolving stress tests.

For the Push Service, the business metrics monitored during stress tests include:

  • The number of user long connections, since a single machine may hold long connections for thousands of users.

  • The volume of pushed messages. In some scenarios the number of long connections is small but the push volume is large, so we need to monitor along both dimensions.

Performance indicators include bandwidth, PPS, CPU, memory, and IOPS.

Metric collection is divided into two parts:

  • Each service collects its own business metrics (a minimal sketch follows this list).

  • Machine metrics are collected uniformly by the operations monitoring service.
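
A minimal sketch of how a Push Service instance might expose the two business metrics above for the capacity decision system to read; the names are ours and the reporting transport is omitted.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Sketch of in-service collection of the two Push Service business
// metrics: current long connections and pushed-message rate.
public class PushServiceMetrics {
    private final AtomicLong longConnections = new AtomicLong();
    private final LongAdder pushedMessages = new LongAdder();

    public void onConnect()    { longConnections.incrementAndGet(); }
    public void onDisconnect() { longConnections.decrementAndGet(); }
    public void onPush()       { pushedMessages.increment(); }

    // Read periodically (e.g. every few seconds) by the reporter that
    // ships metrics to the monitoring / capacity decision service.
    public long currentLongConnections() { return longConnections.get(); }
    public long drainPushedCount()       { return pushedMessages.sumThenReset(); }
}
```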

Summary of elastic expansion and contraction

To sum up, we achieved the following three things with elastic scaling:

  • Docker-based services solve the environment-difference problem through container technology, scale faster, and make the whole virtualization layer more standardized.

  • The hybrid cloud architecture DCP solves the problem of elastic resource scaling.

  • With the architecture in place, automatic scaling lets live broadcasts run unattended.

Typical case sharing

Here is a comparison of two live broadcasts, one before and one after the auto-scaling architecture was implemented.

Before automatic scaling, we ran a live broadcast of the Shenzhou spacecraft's return. It happened after 3 a.m., the relevant teams had not been notified in advance, and we sent a full push without knowing how much viewing traffic to expect.

The post-mortem analysis the next day showed that service traffic had hit the bottleneck and many alarms had fired. Fortunately, staff were on duty, so there was no outage.

Considering both the exhaustion of the operations team and the need to guarantee the service, we decided to implement automatic scaling.

The figure above shows a live broadcast after automatic scaling was in place. Each trough in the blue line on the left represents a scale-out operation.

Each trough is where the long-connection count is at its minimum: a batch of machines has just been added automatically and no traffic has reached them yet, so their connection count is essentially zero.

The connection count then rose quickly, and the service performed four rapid automatic scale-outs within 30 minutes. Automatic scaling is transparent to the operations staff, apart from an email notification whenever capacity is expanded or reduced.

Author: Warmth

Editors: Chen Jun, Tao Jialong, Sun Shujuan

Wen Shi is a senior system R&D engineer at Sina Weibo who works on Weibo's video and communication systems. He is currently responsible for the R&D of Weibo's live message interaction system, advocates highly available, elastically scalable, loosely coupled microservice architecture design, specializes in message communication, and has extensive hands-on experience with traffic surges and high concurrency.