This article shares the practical container technology experience of Qiniu's data processing team, and describes how Qiniu built an easily scalable, easily deployable, highly flexible, highly available, and high-performance data processing platform on top of its self-developed container scheduling framework.

I. Data processing business scenarios

First, some background on Qiniu's data processing business. Qiniu Cloud currently serves more than 500,000 enterprise customers, and has accumulated more than 200 billion images and more than 1 billion hours of video. After storing these images and videos on Qiniu, users often need to process them: scaling, cropping, watermarking, and so on. These files stay online and come in many forms, so it would be very uneconomical for users to process them on their own infrastructure and then upload the results to Qiniu. Qiniu therefore builds data processing on top of storage for the users' convenience. Such processing is traditionally done on the enterprise's own clients or servers; by calling the data processing interfaces of Qiniu Cloud storage instead, users get rich real-time transcoding of images, audio, and video. The newly generated files are served from a cache layer that Qiniu provides for the app to call, without occupying storage space, which both greatly reduces cost for enterprises and improves development efficiency. The following figure is an example of data processing for image cropping:

[Figure: example of image cropping]

Qiniu's file processing operations are called File Operations (FOPs). Different file processing operations use different FOPs. A user only needs to upload one original file, and Qiniu's data processing can derive files in various styles from it. The following figure shows the flow of file upload, storage, processing, and distribution:

[Figure: file upload, storage, processing, and distribution flow]



II. Challenges of massive data processing platforms

Qiniu Cloud's massive data underpins Dora's powerful data processing capability. Qiniu's data processing service now handles nearly ten billion requests per day. Facing such a volume of processing requests, the original data processing platform ran into new challenges:

  1. The system currently receives nearly ten billion data processing requests every day on a computing cluster of nearly one thousand machines, and both the existing stock and the increments are very large. The vast majority of machines in the data processing cluster run image, audio, and video transcoding. These are CPU-intensive computations, which means the backend needs many machines, and the more CPU cores the better. By the end of the year the computing cluster may be several times its current size of nearly a thousand machines, requiring rapid physical expansion and efficient, intelligent management.

  2. Server load is unbalanced and resource utilization is low. Real-time online processing takes little time per request but serves a large volume, so it needs many instances to cope with high concurrency. Asynchronous processing takes a long time per task and must be given sufficient resources. When the real-time service is idle but asynchronous traffic grows, the resources allocated to the real-time service cannot be reused. This static resource allocation leads to misallocation, unbalanced server loads, and low resource utilization.

  3. Burst traffic is unpredictable, forcing large amounts of redundant resources to be kept. The request volume of new users cannot be accurately forecast. The original approach was to rapidly provision machines, verify them, and bring them online, which takes a certain amount of time. For such unplanned requests, a large pool of redundant resources had to be prepared to absorb burst traffic.

  4. Overloaded clusters cannot automatically scale on demand. A load surge from individual users causes data processing requests to pile up, CPU processing to slow, and request tasks to accumulate, affecting other businesses. The platform could not rapidly expand on top of existing resources, and needed to scale cluster instances automatically according to actual business pressure.

  5. User-defined applications (UFOPs) are of unknown quality and scale. Besides its official data processing services, Qiniu lets customers deploy custom data processing modules into the computing environment closest to Qiniu Cloud storage, avoiding the performance and traffic cost of remote data reads and writes and satisfying users' multi-dimensional data processing needs. However, with many UFOPs running on the same platform, a poor-quality UFOP, or one under-provisioned for its request volume, could affect the normal operation of other services on the platform.

III. Introduction of the self-developed container scheduling system

To solve the above problems, Qiniu independently developed a container scheduling framework (DoraFramework) on top of the Mesos resource management system, and used container technology to build Dora, a data processing platform that is easy to scale, easy to deploy, and highly flexible. The overall architecture is as follows:

[Figure: overall architecture of the Dora platform]

Component introduction:

Mesos: the base operating system of the data center, consisting of ZooKeeper, Mesos Master, and Mesos Agent. It centrally manages all physical machines in the machine room and performs resource scheduling, serving as the base runtime environment for the second-layer scheduling system.

DoraFramework: the business-layer scheduling framework. Physical machine resources are managed through Mesos, and business processes are scheduled and managed through DoraFramework.

Consul: an open source cluster management system providing service discovery, health checking, and KV storage. The DoraFramework scheduling system uses Consul's service discovery and health check mechanisms for basic service discovery, and uses its KV store to hold DoraFramework metadata.

Prometheus: An open source monitoring system for machine-level, container-level, and business-system monitoring.

Pandora: Qiniu's internal log control and management system, responsible for aggregating and processing all logs in the production environment.

In this architecture, we chose containers to achieve flexible real-time scheduling across machines. The scheduling framework dynamically adjusts the number of containers according to the actual business load, which effectively solves the low resource utilization caused by static allocation. Containers also start within seconds, so services can be brought up quickly when a flood of sudden requests arrives. On the network side, since a UFOP is a service deployed and run by the user, we cannot know which ports the user's code opens, so containers use bridge mode, and every port that must be reachable externally is exposed through NAT. Ports used internally by a service thus do not affect the external environment, giving the platform good security isolation.
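As a rough sketch of this load-driven container scaling (the formula, capacity numbers, and function names are illustrative assumptions, not Qiniu's actual algorithm), a scheduler might derive a desired container count from the observed request rate:

```go
package main

import "fmt"

// DesiredInstances is a hypothetical sketch of load-driven scaling:
// given the observed request rate and the capacity of one container,
// return how many containers to run, clamped to [min, max].
func DesiredInstances(requestsPerSec, perInstanceCapacity float64, min, max int) int {
	if perInstanceCapacity <= 0 {
		return min
	}
	n := int(requestsPerSec/perInstanceCapacity) + 1 // round up with headroom
	if n < min {
		n = min
	}
	if n > max {
		n = max
	}
	return n
}

func main() {
	// 950 req/s, each container handling ~100 req/s, 2..50 instances allowed.
	fmt.Println(DesiredInstances(950, 100, 2, 50)) // prints 10
}
```

Evaluating such a function periodically against live metrics, instead of a static instance count, is what lets the platform reclaim idle real-time capacity for asynchronous work and vice versa.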

The scheduling system of the data processing platform is our own container scheduling framework (DoraFramework) running on Mesos. We chose Mesos as the resource management system first because Mesos is more mature than the other container scheduling systems: Kubernetes only shipped a production-ready release in 2015, and Docker Swarm in 2016, and at the time of our research there was little large-scale production experience with either, whereas Mesos had a history of seven or eight years and its resource management had been practiced in large companies such as Apple and Twitter, showing good stability. Second, Mesos supports scheduling across thousands of nodes. Qiniu has already reached the scale of nearly a thousand physical machines and is growing significantly every year, so Mesos's support for very large-scale scheduling suits Qiniu's business growth. Third, Mesos is simple, open, and extensible. Mesos is an open source distributed elastic resource management system built as a two-layer scheduling framework: the first layer collects the resource information of the whole data center and offers resources to frameworks; the second layer lets each framework's own scheduler assign those resources to its internal tasks. Mesos itself manages only the resource layer, and that simplicity brings stability. For the container scheduling framework on top, one can use open source frameworks such as Marathon or Chronos, or develop one in-house. Kubernetes, while feature-rich, is also more complex, with many components and concepts, and lacked the openness and extensibility we wanted: we could only consume the scheduling it provides, not customize a scheduling framework for our own business, which would have made us overly dependent on Kubernetes.

There are three reasons why we developed our own framework rather than adopting Marathon, the best-known Mesos framework. 1. Marathon does not support some usage patterns we expect; for example, its service discovery is not seamless enough. 2. Marathon is written in Scala, which makes troubleshooting inconvenient for us and secondary development harder. 3. Had we chosen Marathon, we would still have needed to wrap another layer around it to act as the Dora scheduling service, adding more modules and complicating deployment, operations, and maintenance.

DoraFramework is a container scheduling framework that Qiniu developed in Go. As the core component of the Dora scheduling system, DoraFramework implements business-process scheduling on top of Mesos and Consul, interacts with the Mesos and Consul components, and exposes APIs externally. The architecture diagram is as follows:

[Figure: DoraFramework architecture]

DoraFramework main functions:

  • Automate application deployment

  • Service registration and discovery

  • Elastic scaling of the number of containers

  • Load balancing

  • Supports adding or removing instances on a specified machine

  • High availability support

  • Version and upgrade management of applications

  • Obtain instance status and log data

  • Supports service-level monitoring

  • Supports instance troubleshooting

DoraFramework vs. Marathon scheduling architecture:

  1. Consul implements service registration and discovery for the DoraFramework scheduling system. Consul is used for service discovery and configuration in distributed systems; it can discover internal and external services across data centers and expose a DNS interface. Marathon-lb does not support service discovery across data centers.

  2. Marathon discovers services through the servicePort of the marathon-lb node or through a VHOST, and the network mode must be bridge. Because marathon-lb is also responsible for load balancing, in a large business environment an abnormal marathon-lb can affect the framework's correct service discovery.

  3. The Dora scheduling system allows for more precise flexible scheduling. Because it supports not only resource usage level monitoring, but also business level monitoring, instances can be scheduled according to the actual business pressure.

  4. The load balancing component within the Dora scheduling system distributes load based on the service load of each instance by obtaining the addresses of all available instances from Consul. Marathon-lb does not have monitoring data for the business layer.

  5. Consul provides system-level and application-level health checks. A health check can be defined via a configuration file or the HTTP API, and TCP, HTTP, Script, Docker, and Time-to-Live (TTL) checks are supported. Marathon's default health check only watches the task state in Mesos: a task in the running state is considered healthy, so no application-level health check is performed. Marathon can also check application health through its REST API, but only supports TCP, HTTP, and Command checks.

  6. The monitoring stack of the Dora scheduling system collects business metrics, such as request counts and request latency, while business processes run. Each business process exposes a standard HTTP monitoring interface whose output conforms to the Prometheus monitoring data format. With Consul configured as the service discovery address, Prometheus obtains from Consul the list of business processes to scrape, and pulls monitoring data from the HTTP monitoring interface each process exposes.
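To make the "standard HTTP monitoring interface" concrete, here is a minimal sketch of a response in the Prometheus text exposition format; the metric name `fop_requests_total` and the label value are made-up examples, not Qiniu's real metric names:

```go
package main

import "fmt"

// renderMetrics builds a response in the Prometheus text exposition format:
// an optional HELP line, a TYPE line, and one sample line per metric.
// The metric and label names here are illustrative, not Qiniu's real ones.
func renderMetrics(cmd string, requests int) string {
	return fmt.Sprintf(
		"# HELP fop_requests_total Total FOP requests handled.\n"+
			"# TYPE fop_requests_total counter\n"+
			"fop_requests_total{cmd=%q} %d\n", cmd, requests)
}

func main() {
	// In a real business process this string would be served on an HTTP
	// endpoint such as /metrics for Prometheus to scrape.
	fmt.Print(renderMetrics("imageView2", 12345))
}
```

Because the format is plain text over HTTP, any process can expose it without a client library, which is what lets Prometheus scrape every service process it finds via Consul.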

Consul is used as the registry for service registration and discovery. It provides key/value storage, DNS-based service discovery, health checking, and cross-data-center service discovery. The API Gateway can query the DNS interface provided by Consul for the list of all available instances of a service and forward requests to them.

[Figure: service registration and discovery]

  1. Automatic registration and deregistration of services. When a new business instance is added, we wait for the instance's Consul Client to become active, register the instance's access address through the Consul Client's service registration, and configure a health check for the service; the data is then synchronized to the service registry on the Consul Server. When removing an instance, the principle is to first remove the instance from the Consul Server's service registry, wait for a cool-down period, and then destroy the instance through the scheduling system. This completes automatic registration and deregistration of services.

  2. Service discovery. When an external system wants to access a service, it queries the DNS interface provided by the Consul Server, by service name, for the access addresses of all healthy instances registered on the Consul Server, and then sends requests to those instances.
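The registration step can be sketched against Consul's actual HTTP API (`PUT /v1/agent/service/register`, whose body takes `Name`, `Address`, `Port`, and a `Check`); the service name, address, port, and check interval below are made-up examples, and this is only an illustration, not DoraFramework's code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServiceCheck and ServiceRegistration mirror (a subset of) the body of
// Consul's PUT /v1/agent/service/register endpoint.
type ServiceCheck struct {
	HTTP     string `json:"HTTP"`
	Interval string `json:"Interval"`
}

type ServiceRegistration struct {
	Name    string       `json:"Name"`
	Address string       `json:"Address"`
	Port    int          `json:"Port"`
	Check   ServiceCheck `json:"Check"`
}

// registrationBody builds the JSON payload registering one instance
// with an HTTP health check polled every 10 seconds.
func registrationBody(name, addr string, port int) ([]byte, error) {
	return json.Marshal(ServiceRegistration{
		Name:    name,
		Address: addr,
		Port:    port,
		Check: ServiceCheck{
			HTTP:     fmt.Sprintf("http://%s:%d/health", addr, port),
			Interval: "10s",
		},
	})
}

func main() {
	body, _ := registrationBody("fop-imageview2", "10.0.0.12", 8000)
	fmt.Println(string(body))
	// The instance would then PUT this body to the local Consul agent,
	// e.g. http://127.0.0.1:8500/v1/agent/service/register.
}
```

Deregistration is the mirror image: remove the service from the agent, let DNS queries stop returning the instance, then destroy the container after the cool-down.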

IV. Massive data processing platform practice

Configuration management of our production environment uses Ansible. Ansible connects to remote hosts over SSH by default, with no extra software required on the managed nodes, and it can configure, deploy, and run commands in batches, which suits Qiniu's large-scale IT environment very well. Playbooks are the basis of a simple configuration management and multi-machine deployment system, easy to use and readable, and well suited to complex applications. With Ansible we can batch-install and remove the data processing platform, add and remove nodes, upgrade and roll back component versions, and modify production configuration, greatly simplifying complex operations and configuration management.

[Figure: Ansible-based configuration management]

In practice, select one host as the central controller, install Ansible, configure SSH trust between the central controller and all remote hosts, and configure Playbook files on the central controller to perform batch operations on multiple hosts. For simple operations, run the following command:

$ ansible-playbook main.yml -i hosts

Edit all required operations into main.yml and list all target host IP addresses in the hosts file to run batch operations against every host in that file. More complex operations can be organized by writing playbooks. For example, since the operations performed on a Mesos Master differ from those on a Mesos Agent, the tasks for Mesos, ZooKeeper, and Consul can be placed in separate roles. Handlers are tasks triggered from within tasks, and templates hold template files. For instance, to customize Consul's default configuration file, save the modified configuration file in the templates directory and have it replace the default configuration file during execution.

[Figure: playbook directory structure]

For monitoring, the data processing platform has a complete monitoring system covering host monitoring, container monitoring, service monitoring, traffic monitoring, and log monitoring. Host and container monitoring works mainly through Prometheus's various exporters, collecting real-time CPU, memory, network, and disk usage. Service and traffic monitoring go through Qiniu's own monitoring program, which can watch the status, liveness, handle counts, request counts, failure counts, and so on of every processing command. Log monitoring goes through Pandora, Qiniu's internal log platform, and covers system logs, container logs, and business process logs. By modifying the output of Filebeat, the open source file collector, all collected logs are sent to Pandora for log monitoring.

[Figure: monitoring architecture]

The monitoring data is as follows:

[Figure: sample monitoring data]

The above is Qiniu Cloud's practice of building a data processing platform on container technology. Today Qiniu's data processing platform delivers zero-operations, highly available, high-performance data processing services, letting users easily handle both real-time and asynchronous processing of images, audio, video, and other kinds of data. Qiniu's data processing business system handles processing requests not only from Qiniu Cloud storage but also from non-Qiniu storage, and it can directly process requests from Qiniu Cloud's fusion CDN to speed up the processing of CDN intermediate-source data. Dora, the data processing platform, is an open platform: it runs not only Qiniu's own data processing services but also user-defined ones, and it offers rich operations management features, freeing users from tedious operations and architecture work so they can concentrate on implementing their data processing logic. The business support capacity of Qiniu's data processing platform is as follows:

[Figure: business support capacity of the data processing platform]


Q&A

Q: What is the management system based on? Will the system be open source?

A: Dora's scheduling framework is developed in Go. It is not open source at present, but private deployments are available.

Q: How do I call a custom Executor from a custom Scheduler?

A: The Scheduler and Executor are based on the latest HTTP API of Mesos v1, so there are no incompatibilities, but the Go SDK for Mesos is a bit old and slow to update, so we made some changes to it as needed.

Q: What is the current size of Consul cluster? Have you considered the performance bottlenecks of Consul extension?

A: Consul runs an Agent on every slave node. We have more than 200 machines dedicated to data processing in one machine room, so a Consul cluster is only as large as a single room. There is no bottleneck for us at the moment, because our use of Consul is relatively simple. One use is as reliable storage for metadata, and metadata updates are infrequent; we have referred to performance tests done by others as well as some of our own, and the performance is satisfactory. The other use is service discovery and instance health checking. The Consul Agent running on each machine handles the health checks, so the work is distributed evenly across machines, and since the number of instances on any single machine is not very large, this part is not under much pressure either. Of course, this is also related to the scale of the business: if someday Consul's scalability becomes a problem for us, it will mean our business volume has become very, very large, and we look forward to that day.

Q: Does Dora support automatic MySQL scaling?

A: Dora is mainly used to run stateless services such as data processing commands; it is not the best system for MySQL. To run MySQL on Mesos you need a scheduler written specifically for MySQL, because sizing MySQL instances and repairing failed instances have their own specific requirements. In our company, containerization of stateful services such as MySQL is handled by another container platform. We use Percona XtraDB Cluster for MySQL, and wrote a Percona XtraDB Cluster scheduler against that platform's API, which automates most Percona XtraDB Cluster operations on the container platform.

Q: Are your Ansible host files dynamically generated? Is code pushed via Ansible? How are operations such as adding and deleting nodes and rolling back nodes implemented?

A: In practice the Ansible host files are not dynamically generated, but we can obtain simple configuration information about the current cluster and its nodes from Consul, and we can use Consul to dynamically generate host files for Ansible gray-scale releases. Code push is also done with Ansible; if the machines can reach the external network, GitHub can be used as well. Since our playbook roles are defined per component, we only need to modify the host file and add the corresponding nodes to install or remove the corresponding component. For example, a rollback:

$ ansible-playbook rollback.yml -i hosts -e "hosts_env=XXX app_env=XXX version_env=XXX"

Parameter Description:

  • hosts_env: the host group to roll back, for example master

  • app_env: the component to roll back, for example zookeeper

  • version_env: the component version to roll back to, for example v1.0.1.20160918

Q: What is Dora’s scheduling policy? Can you give us a brief introduction?

A: First, ensure that instances of the same data processing command are distributed as evenly as possible on different machines, and then ensure that the load is balanced on each machine.
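That policy can be sketched as a simple placement function (a simplified illustration of "spread first, then balance load," not Dora's actual code; the machine names and load numbers are made up):

```go
package main

import "fmt"

// Machine describes one physical node from the scheduler's point of view.
type Machine struct {
	Host      string
	Instances map[string]int // running instance count per data-processing command
	Load      float64        // overall machine load
}

// pickMachine prefers the machine running the fewest instances of the given
// command (spreading instances evenly), breaking ties by lowest machine load.
func pickMachine(machines []Machine, cmd string) string {
	best := -1
	for i, m := range machines {
		if best == -1 ||
			m.Instances[cmd] < machines[best].Instances[cmd] ||
			(m.Instances[cmd] == machines[best].Instances[cmd] && m.Load < machines[best].Load) {
			best = i
		}
	}
	if best == -1 {
		return "" // no machines available
	}
	return machines[best].Host
}

func main() {
	cluster := []Machine{
		{"node-a", map[string]int{"imageView2": 3}, 0.7},
		{"node-b", map[string]int{"imageView2": 1}, 0.9},
		{"node-c", map[string]int{"imageView2": 1}, 0.4},
	}
	fmt.Println(pickMachine(cluster, "imageView2")) // prints node-c
}
```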

Q: Prometheus is standalone at present; how do you deal with large data volumes? Is Prometheus monitoring data stored in InfluxDB?

A: At present we split Prometheus servers by business, and that handles the data volume. We do not use InfluxDB; we use the native LevelDB storage.

Q: Do you have any special storage technology for such a large amount of files? How to achieve a balance between high performance and mass storage?

A: Qiniu Cloud storage is designed for massive numbers of small files, so its first departure from a file system is to remove relationships, i.e., the directory structure (a directory implies a parent-child relationship). Qiniu Cloud storage is therefore not a file system but a key-value store, or object store. Each large file is cut into small files for storage, with the meta information kept separately in a database; when a user requests the file, the business layer merges the pieces and returns the result. So in principle the disks store only small files, and the performance of storing and reading large files mainly comes down to file splitting and merging.
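The splitting step can be illustrated with a minimal sketch (the 4 MB block size is an assumption for the example, not necessarily Qiniu's; real storage would also persist each block and record its key in the meta database):

```go
package main

import "fmt"

const blockSize = 4 << 20 // assumed 4 MB blocks, illustrative only

// splitBlocks cuts a large file's bytes into fixed-size blocks, as a
// simplified illustration of storing a large file as many small ones.
func splitBlocks(data []byte, size int) [][]byte {
	var blocks [][]byte
	for len(data) > 0 {
		n := size
		if len(data) < n {
			n = len(data)
		}
		blocks = append(blocks, data[:n])
		data = data[n:]
	}
	return blocks
}

func main() {
	data := make([]byte, 10<<20) // a 10 MB "file"
	blocks := splitBlocks(data, blockSize)
	fmt.Println(len(blocks)) // prints 3 (4 MB + 4 MB + 2 MB)
}
```

Reading is the reverse: look up the block list in the meta database, fetch the small files, and concatenate them in order before returning to the user.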
