Overall Architecture

Weibo uses Alibaba Cloud to double its server fleet at minute-level speed. This section covers its experience combining Docker with virtual machines, the network architecture, and the cross-IDC deployment of load balancing and the cache architecture.

The original intent of the DCP design

Figure 1 Initial idea of DCP design

DCP is Weibo's containerized hybrid-cloud elastic scheduling and operations platform. Its original intent was to provide elastic capacity at the lowest possible cost. DCP provides cluster management and scheduling across service pools. The businesses running on DCP now cover Weibo's core services, including the Weibo platform, Hongbao Fei (the red-envelope campaign), and mobile Weibo.

The initial design objectives of DCP are as follows:

  1. It had to provision an estimated 16,000 CPU cores and 25,600 GB of memory within 10 minutes at the peak of the Spring Festival Gala.

  2. It had to save costs. The goal was that the overall cost of the 2016 Spring Festival Gala would not exceed that of 2015, and that unit cost could be reduced substantially over time through the pay-as-you-go model of public clouds such as Alibaba Cloud.

  3. It had to provide a standard technology platform that bridges differences between languages and runtime environments, offering standardized elasticity to Weibo's various business systems.
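The first objective implies some concrete provisioning arithmetic. As a rough illustration only (the 8-core / 16 GB instance shape below is an assumption for the sketch, not a figure from the article):

```python
# Rough provisioning arithmetic for the stated target:
# 16,000 cores and 25,600 GB of memory within 10 minutes.
# The 8-core / 16 GB ECS instance shape is an assumed example.

TARGET_CORES = 16_000
TARGET_MEM_GB = 25_600
INSTANCE_CORES = 8       # assumed instance shape
INSTANCE_MEM_GB = 16

# The node count is driven by whichever resource runs out first.
nodes_by_cores = -(-TARGET_CORES // INSTANCE_CORES)   # ceiling division
nodes_by_mem = -(-TARGET_MEM_GB // INSTANCE_MEM_GB)
nodes_needed = max(nodes_by_cores, nodes_by_mem)

print(nodes_needed)       # 2000 nodes to create
print(nodes_needed / 10)  # 200.0 nodes per minute over the 10-minute window
```

Under this assumed shape, DCP would have to stand up roughly 200 nodes per minute, which motivates the initialization and image-distribution optimizations described below.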

DCP hybrid cloud architecture design


Figure 2 Design principles of the DCP hybrid cloud architecture

Designing such a system architecture is complicated because it must coordinate many links across different business teams and departments. Several concrete principles therefore had to be followed.

  1. Combine public and private clouds rather than relying on the public cloud alone, to maximize the pay-as-you-go advantage of the public cloud and reduce unit cost. For the 2016 Spring Festival Gala, for example, Weibo worked with Alibaba Cloud to provision the public-cloud resources only a few hours before the traffic peak arrived. At the same time, services must be deployable on public and private clouds without differentiation.

  2. Everything as a service: system components communicate through RESTful APIs rather than through the original manual operations intervention, and each component must be extensible with a pluggable implementation mechanism.

  3. Scalability: each system component scales automatically with the size of the managed cluster; every component is stateless with no single point of failure, and operations are automated.

Figure 3 DCP architecture layers

The DCP architecture is divided into four layers. The first layer is the immutable infrastructure; Docker has changed the original operations model to a large extent, and the practical experience with containerization, system initialization, image distribution, and bandwidth monitoring is described in detail below. The second layer handles elastic scheduling, including cross-cloud service scheduling, the scheduling mechanism itself, and capacity evaluation; it dynamically allocates infrastructure resources to each service node. The third layer is service discovery: after elastic deployment, callers need to quickly find the nodes distributed across the service cluster, which is achieved through the configuration center, load balancing, and RPC coordination. The fourth layer handles container orchestration, including automatic scaling and monitoring with alerting.

Figure 4 Overall DCP architecture

The diagram above is the overall DCP architecture. The lowest layer comprises the compute resources of the private cloud and Alibaba Cloud; the systems communicate through APIs, using Alibaba Cloud's OpenAPI to dynamically create the required compute nodes. The next layer is the infrastructure management system, which abstracts and implements services such as virtual machine creation and image distribution, providing the same computing environment as the private cloud. Above that, the dynamic scheduling framework for business containers is implemented with Swarm and Mesos. The top layer is the business systems. On the far left is the service discovery framework, which implements the discovery mechanism through a configuration center built on a Consul cluster.

Next, the implementation of each layer is dissected in detail.

Immutable infrastructure

Weibo began experimenting with Docker during the 2015 Spring Festival Gala, aiming to decouple its business code from the host's runtime environment. Docker encapsulates a service's dependencies on software versions and components into a Docker image, unifying the interaction standard between the private cloud, public clouds, and different business lines.

Figure 5 Containerization

DCP provides the business layer with a standardized base runtime environment. The bottom layer uses the CentOS 7.1.1503 image on ECS, running Docker 1.6.2 with the devicemapper direct-lvm storage driver. The middle tier runs the scheduling frameworks Mesos 0.25 and Swarm 1.0.0, plus container-level monitoring such as cAdvisor 0.7.1; fixes made along the way were contributed back to the open source community. PHP and Java correspond to different back-end applications: Weibo's core Feed, user-relationship, and other basic services are written in Java, while parts of the business system are written in PHP. The Docker stacks for Java and PHP are consistent but use different versions: the PHP side runs Docker 1.3.2 for compatibility with some offline computing platforms. The low-level runtime environments for Java and PHP have since been normalized.

Figure 6 Initialization

With a containerized, undifferentiated runtime environment in place, minute-level elastic expansion to thousands of compute nodes on Alibaba Cloud still had to be optimized. First, each stage is parallelized as much as possible; hundreds of machines can currently be initialized within 4 minutes. Time-consuming operations are made asynchronous through Ansible's callback mechanism. In addition, the Ansible layer itself scales automatically with initialization demand, supporting initialization at much larger scale within minutes.
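The stage-parallel initialization idea can be sketched with a thread pool. The stage names and host list below are illustrative stand-ins, not Weibo's actual Ansible playbooks:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative initialization stages; in DCP these are Ansible tasks,
# with long-running steps made asynchronous via the callback mechanism.
def init_node(host):
    for stage in ("configure_docker", "pull_base_image", "register_monitoring"):
        pass  # each stage would run an idempotent provisioning step on `host`
    return host

hosts = [f"10.0.0.{i}" for i in range(1, 101)]  # hypothetical new ECS nodes

# Initialize all hosts in parallel rather than one by one.
with ThreadPoolExecutor(max_workers=50) as pool:
    done = list(pool.map(init_node, hosts))

print(len(done))  # 100
```

The key point is that total wall-clock time is bounded by the slowest stage on the slowest host, not by the number of hosts.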

Figure 7 Image distribution

With the Docker environment and ECS compute resources fully prepared, the next step is to deploy the business images quickly. A single image is on the order of gigabytes, so if hundreds of ECS instances each pulled it dynamically, the cost of capacity expansion would rise sharply.

Weibo layers the image: the base image is baked into the cloud provider's machine image and loaded locally, which greatly reduces the bandwidth needed for image distribution and also speeds things up. The second technique is reverse caching. Large-scale elastic expansion on the public cloud can hit a cold start: no copy of the service image exists there yet, and pulling it would consume a large amount of dedicated-line bandwidth. To handle this, a reverse cache of the private-cloud registry is routinely deployed on the public cloud, and changes, pushes, and configuration of the private-cloud registry are synchronized to the public-cloud reverse-cache nodes. The distribution network itself also scales automatically. In the future, Weibo plans to try P2P distribution to cope with scale beyond ten million.
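The resulting pull order can be sketched as a simple lookup chain. All image and registry names here are hypothetical examples:

```python
# Sketch of the image pull order described above: locally baked base
# layers first, then the public-cloud reverse cache, and only as a last
# resort the private-cloud registry over the dedicated line.

LOCAL_LAYERS = {"centos-7.1-base"}     # baked into the ECS machine image
REVERSE_CACHE = {"feed-service:v12"}   # synced from the private-cloud registry

def resolve_source(image):
    if image in LOCAL_LAYERS:
        return "local-load"          # no network traffic at all
    if image in REVERSE_CACHE:
        return "public-cloud-cache"  # stays inside the public cloud
    return "private-registry"        # crosses the dedicated line (cold start)

print(resolve_source("centos-7.1-base"))   # local-load
print(resolve_source("feed-service:v12"))  # public-cloud-cache
print(resolve_source("hongbao:v3"))        # private-registry
```

Only the last case consumes dedicated-line bandwidth, which is exactly what the reverse cache is meant to minimize.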

Figure 8 Hybrid cloud network architecture

In a hybrid cloud, the core is the overall network architecture. Connecting the public and private clouds requires careful consideration of the network environment and IP address planning. Weibo currently uses Alibaba Cloud's VPC service to build a network environment consistent across the public and private clouds, and uses multiple dedicated lines to keep the network highly available. For routing, forwarding rules are configured on the core switches and a proximity principle is applied to minimize hop count and keep network latency low. When a network fault occurs, backup routes are enabled automatically.

On top of the existing bandwidth monitoring, cross-cloud dedicated-line monitoring is implemented down to the IP level. Based on each ECS instance's IP address, IPs are automatically aggregated into the corresponding product line, and overall bandwidth consumption is monitored to ensure that each product line's actual dedicated-line bandwidth usage is not affected.
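The IP-to-product-line aggregation can be sketched as follows. The IP addresses, product-line names, and sample readings are made-up examples:

```python
# Sketch of IP-level dedicated-line monitoring: per-ECS bandwidth samples
# are rolled up to the product line that owns each IP.
from collections import defaultdict

IP_TO_PRODUCT = {        # hypothetical inventory mapping
    "10.1.0.5": "feed",
    "10.1.0.6": "feed",
    "10.2.0.9": "hongbao",
}

def aggregate(samples):
    """samples: list of (ip, mbps) readings taken on the dedicated line."""
    per_product = defaultdict(float)
    for ip, mbps in samples:
        per_product[IP_TO_PRODUCT.get(ip, "unknown")] += mbps
    return dict(per_product)

usage = aggregate([("10.1.0.5", 120.0), ("10.1.0.6", 80.0), ("10.2.0.9", 40.0)])
print(usage)  # {'feed': 200.0, 'hongbao': 40.0}
```

Per-product-line totals like these are what make it possible to alert when one business line starts crowding out the others on the shared line.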

Elastic scheduling

Once the immutable infrastructure is in place, large-scale ECS compute nodes can be created on the public cloud. The next problem is how to schedule services onto those compute nodes properly.

Figure 9 Elastic scheduling of services across clouds

For cross-cloud deployment, a minimum service footprint is kept running on the public cloud through prepaid instances so that it can serve online traffic. Calls over the cross-cloud dedicated lines should be minimized to keep bandwidth under control: local back-end resources such as Memcache are deployed on each side to reduce cross-line requests, and when cross-line requests do occur, traffic-compression protocols are enabled. Internally, Weibo uses the WMB bidirectional cache-synchronization mechanism: based on a message-distribution strategy, cache reads and writes inside the private cloud are synchronized to the public-cloud cache as messages, with latency generally at the millisecond level. If the line fails, the messages are not lost, achieving high availability.
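The message-based replay idea behind this can be sketched with an in-memory queue. This is a toy illustration only; WMB itself, its durability guarantees, and its message format are not shown:

```python
# Toy sketch of message-based cache synchronization: private-cloud cache
# writes are published as messages and replayed against the public-cloud
# cache, so the public side converges without cross-line cache reads.
from collections import deque

private_cache, public_cache = {}, {}
message_bus = deque()  # stands in for the durable cross-line message channel

def private_write(key, value):
    private_cache[key] = value
    message_bus.append(("set", key, value))  # durable in the real system

def drain_to_public_cloud():
    # In production this runs continuously; draining in a loop models replay.
    while message_bus:
        op, key, value = message_bus.popleft()
        if op == "set":
            public_cache[key] = value

private_write("feed:1001", "hot-timeline")
drain_to_public_cloud()
print(public_cache["feed:1001"])  # hot-timeline
```

Because synchronization is asynchronous messaging rather than synchronous cross-line reads, a line outage delays convergence but loses nothing.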

Figure 10 Service hybrid cloud deployment mode

This is a typical business hybrid-cloud deployment from the 2016 Spring Festival Gala. Inside is a typical back-end Internet service stack: layer-4 load balancing in front, layer-7 load balancing through Nginx, then the web-tier computation and RPC services, with the cache layer and databases at the bottom. Because of the database's high request volume and data-security requirements, the DB layer is not placed on the public cloud for now. On the far right of the architecture, SLB is used for load balancing, and the RPC, web, and Nginx services are deployed on ECS.

Figure 11 Elastic scheduling mechanism

The elastic scheduling framework uses open source solutions such as Swarm and Mesos. On top of them, a unified scheduling API is encapsulated to consolidate the various application platforms previously scattered across the private cloud, which greatly reduces the time cost of basic operations.
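The unified-API idea can be sketched as a thin facade over interchangeable backends. The class and method names are illustrative; the article does not describe DCP's actual API shapes:

```python
# Sketch of a unified scheduling API wrapping multiple frameworks.
# Backend classes and method names are hypothetical stand-ins.

class SwarmBackend:
    def run(self, image, count):
        return [f"swarm-task-{i}" for i in range(count)]

class MesosBackend:
    def run(self, image, count):
        return [f"mesos-task-{i}" for i in range(count)]

class Scheduler:
    """One entry point for callers, regardless of the framework underneath."""
    def __init__(self, backend):
        self.backend = backend

    def scale(self, image, count):
        return self.backend.run(image, count)

tasks = Scheduler(SwarmBackend()).scale("feed-service:v12", 3)
print(len(tasks))  # 3
```

Callers code against `Scheduler` only, so swapping Swarm for Mesos (or adding a new framework) requires no changes in the business-facing platforms.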

Figure 12 Capacity evaluation

Online capacity evaluation determines whether a service needs to expand onto the public cloud. Multiple service metrics are combined to judge whether the service is facing a traffic surge or a performance bottleneck: for example, average response time, QPS, and internal thread-pool usage are collected, and the redundancy percentage of the total resource pool is computed to decide whether dynamic expansion is needed.
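A minimal sketch of such a decision function is shown below. The specific thresholds (100 ms response-time limit, 8,000 QPS limit, 20% redundancy floor) are assumptions for illustration, not Weibo's values:

```python
# Sketch of the capacity-evaluation decision described above.
# All threshold values are assumed examples.

def needs_expansion(avg_rt_ms, qps, busy_threads, max_threads,
                    rt_limit_ms=100, qps_limit=8000, redundancy_floor=0.20):
    # Redundancy: the fraction of the thread pool that is still idle.
    redundancy = 1.0 - busy_threads / max_threads
    return (avg_rt_ms > rt_limit_ms
            or qps > qps_limit
            or redundancy < redundancy_floor)

print(needs_expansion(avg_rt_ms=45, qps=3000, busy_threads=50, max_threads=100))  # False
print(needs_expansion(avg_rt_ms=45, qps=9000, busy_threads=90, max_threads=100))  # True
```

Combining several signals this way avoids expanding on a single noisy metric while still reacting when any one resource is close to exhaustion.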

Orchestration and container discovery

After a business is deployed on Alibaba Cloud, orchestration and container discovery mechanisms are needed to quickly connect service providers and their callers.

Figure 13 Business container orchestration

When DCP was built, Weibo already had other internal systems: the original operations system, the release system, the monitoring system, and so on. Integrating with these existing systems is the core task of business orchestration. Another problem is automation: fully expanding one Weibo back-end business can take hundreds of steps, so automating the operations avoids data inconsistency. Service discovery also matters: managing thousands of elastic business nodes through administrator-configured service discovery used to be very expensive.

Figure 14 Automatic capacity expansion and contraction

The figure above shows Weibo's automatic expansion process. When a traffic peak is predicted, the DCP system is instructed to expand capacity. The DCP platform first requests resources from the shared pool or the public cloud, creates the resource quota module, initializes the resources, and hands them to the scheduling center. The scheduling center then starts services on the public cloud in dependency order: the cache service first, then the RPC services, then the web front-end services, and finally the PHP application services. After each service starts, it registers with the Consul cluster and passes a service health check.
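The dependency-ordered startup can be sketched as a topological sort over the service graph. The dependency map below just encodes the cache → RPC → web → PHP example from the text:

```python
# Sketch of dependency-ordered service startup using a topological sort.
# Requires Python 3.9+ for graphlib.
from graphlib import TopologicalSorter

# Each service maps to the set of services it depends on.
deps = {
    "cache": set(),
    "rpc": {"cache"},
    "web": {"rpc"},
    "php-app": {"web"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # ['cache', 'rpc', 'web', 'php-app']
```

In a real scheduler, services with no mutual dependencies would also be started in parallel at each level of the sort.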

Figure 15 Service discovery

The common service-discovery mechanism is based on Nginx reloading local configuration files. That implementation is expensive to manage, does not support concurrent registration, and loses performance on every reload. Weibo now implements automatic service discovery through its Consul-based configuration center, which eliminates the reload performance problem.
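The difference can be sketched as updating an in-memory endpoint view in place instead of rewriting files and reloading a process. The Consul client itself is stubbed out here; service names and endpoints are made up:

```python
# Sketch of push-style discovery replacing Nginx file reloads: the caller
# keeps an in-memory view of healthy nodes and applies updates as they
# arrive from the configuration center, with no process reload.

class ServiceView:
    def __init__(self):
        self.nodes = {}  # service name -> set of "ip:port" endpoints

    def apply_update(self, service, endpoints):
        # Atomic swap of the endpoint set; concurrent registrations just
        # produce another update instead of another reload.
        self.nodes[service] = set(endpoints)

    def pick(self, service):
        return sorted(self.nodes.get(service, ()))

view = ServiceView()
view.apply_update("feed-rpc", ["10.1.0.5:9000", "10.1.0.6:9000"])
view.apply_update("feed-rpc", ["10.1.0.6:9000", "10.9.0.1:9000"])  # node replaced
print(view.pick("feed-rpc"))  # ['10.1.0.6:9000', '10.9.0.1:9000']
```

In the real system the updates would come from watching Consul's health-checked service catalog rather than from direct calls.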

Figure 16 Automatic discovery of resources

For dynamic resource discovery, the configuration of back-end caches, Redis, and other resources used to live in the configuration framework, so adding Alibaba Cloud RDC resources made the configuration files balloon. Weibo now synchronizes this configuration to the Consul cluster and dynamically resolves resource IP changes on Alibaba Cloud through domain names, without changing business code, which greatly eases overall elastic scaling.
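The domain-name indirection can be sketched as follows. The resolver table is a stand-in for real DNS backed by the Consul cluster, and the domain name is hypothetical:

```python
# Sketch of resolving resource endpoints through domain names instead of
# hard-coded IPs, so public-cloud IP changes require no code change.

DNS = {"redis.feed.weibo.internal": "10.9.0.42"}  # hypothetical Consul-backed zone

def connect_string(domain, port):
    ip = DNS[domain]  # real code would perform an actual DNS lookup here
    return f"{ip}:{port}"

print(connect_string("redis.feed.weibo.internal", 6379))  # 10.9.0.42:6379
```

When a resource's IP changes on the public cloud, only the DNS record is updated; every client resolves the new address on its next lookup.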