Recommended reading: “Breaking news | Dragonfly promoted to a CNCF incubating project”

On April 10th, the Technical Oversight Committee (TOC) of the Cloud Native Computing Foundation (CNCF) voted to promote Dragonfly, an open source project from China, to a CNCF incubating project. Dragonfly is the third Chinese project to reach the CNCF incubation stage, after Harbor and TiKV.

CNCF, founded in July 2015, is one of the major open source organizations under the Linux Foundation. Built around four pillars (microservices, DevOps, continuous delivery, and containerization), CNCF maintains and integrates cloud native open source technologies that support the orchestration of containerized, microservice-based applications.

CNCF currently has more than 300 member companies, including the world’s mainstream cloud providers such as AWS, Azure, Google, and Alibaba Cloud. Its Technical Oversight Committee consists of 11 representatives with deep technical knowledge and industry experience, who provide technical leadership to the cloud native community.

Today, the cloud has become public infrastructure, cloud native technology is regarded as the 2.0 standard of cloud computing, and CNCF, as the bellwether of cloud native development, holds a pivotal position in the industry. So how did Dragonfly qualify as a CNCF incubating project? What role does it play in the cloud native ecosystem? To better understand the Dragonfly project and the state of cloud native technology in China, we invited Li Xiang, the first Chinese member of the CNCF Technical Oversight Committee (TOC) and a senior technical expert at Alibaba Cloud, to talk about CNCF and Dragonfly.

About CNCF and TOC

CNCF is one of the most influential open source organizations today, and we were curious about the daily work of Li Xiang, one of only 11 TOC members. According to Li, the foundation is essentially project-centric: its goal is to bring in better projects, which in turn attract more end users. As more users adopt CNCF projects, vendors combine these open source projects into products or cloud services that customers can use at lower cost and with higher efficiency, which strengthens the whole cloud ecosystem and forms a healthy, self-reinforcing industry loop.

CNCF therefore hopes to connect the foundation, developers, and vendors through projects. The TOC’s core goal is to bring in the best projects that fit the foundation’s cloud native vision, and Li Xiang’s main job is to find them, much like a “talent scout”.

Li Xiang also described CNCF’s internal promotion mechanism. Every project that enters CNCF is classified into one of three stages: sandbox, incubation, and graduation.

The sandbox stage is for projects in the early stages of development. TOC members look for promising projects and advise them on how to reach the sandbox stage. Unlike the Linux Foundation, which has been around for more than a decade, CNCF still needs to define some things: What does a sandbox project really mean? What is the process for entering the sandbox? How does a project move from sandbox to incubation, and by what criteria? Defining these is also the TOC’s responsibility, and Li Xiang himself has put a lot of effort into it.

The TOC also has to consider how to allocate CNCF’s limited resources so that the foundation stays project-centric, and how CNCF can keep innovating and absorb a large number of projects while remaining advanced, neutral, and true to the cloud native vision.

What is Dragonfly?

According to official information, Dragonfly is a project that solves the image distribution problem for distributed application orchestration systems based on Kubernetes. Its architecture addresses four major problems: large-scale image download, long-distance transmission, bandwidth cost control, and secure transmission.

1. Large-scale image download

Figure note:

  • PouchContainer: an efficient, lightweight, enterprise-grade rich-container engine open-sourced by Alibaba Group;
  • Registry: the repository of container images. Each image consists of multiple image layers, and each layer is stored as an ordinary file;
  • SuperNode: the Dragonfly server. It manages the life cycle of seed blocks, builds the P2P network, and schedules clients to exchange specified blocks with each other;
  • Block: when an image is downloaded through Dragonfly, the SuperNode splits each file into blocks. The blocks held by the SuperNode are called seed blocks. Seed blocks are downloaded by a few initial clients and then spread quickly among all clients;
  • DFget: the Dragonfly client. It is installed on each host and handles block upload and download as well as command interaction with the container daemon;
  • Peer: hosts that download the same file are peers of each other.

The processing steps are as follows:

  1. PouchContainer issues the image pull command, which is intercepted by the DFget proxy.
  2. DFget sends a scheduling request to the SuperNode.
  3. On receiving the request, the SuperNode checks whether the corresponding file is already cached locally. If not, it downloads the file from the Registry and generates seed block data. A seed block can be propagated as soon as it is generated; the SuperNode does not need to finish downloading the whole file before distributing it.
  4. The client parses the tasks it receives and downloads block data from other peers or from the SuperNode. When all blocks of a layer have been downloaded, the layer is complete and is handed to the container engine. When every layer is complete, the entire image has been downloaded.
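The flow above can be sketched as a toy, in-memory simulation. This is only an illustration under stated assumptions, not Dragonfly’s actual implementation: the names `ToySuperNode`, `pull_layer`, and `split_into_blocks`, the 4-byte block size, and the “prefer any other peer that already holds the block” rule are all hypothetical.

```python
BLOCK_SIZE = 4  # hypothetical block size in bytes, tiny so the example stays readable


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split a layer file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToySuperNode:
    """Hypothetical stand-in for a SuperNode: caches seed blocks and tracks holders."""

    def __init__(self, registry: dict):
        self.registry = registry   # layer name -> file content (stands in for the Registry)
        self.seed_blocks = {}      # layer name -> list of seed blocks
        self.holders = {}          # (layer, block index) -> set of peer names

    def schedule(self, layer: str) -> int:
        """On a cache miss, fetch the layer from the registry; return its block count."""
        if layer not in self.seed_blocks:
            self.seed_blocks[layer] = split_into_blocks(self.registry[layer])
        return len(self.seed_blocks[layer])


def pull_layer(peer: str, layer: str, supernode: ToySuperNode, peer_stores: dict):
    """Download every block of a layer, preferring other peers over the SuperNode.

    Returns the reassembled layer and the number of blocks served by the SuperNode.
    """
    store = peer_stores.setdefault(peer, {})
    block_count = supernode.schedule(layer)
    from_supernode = 0
    for index in range(block_count):
        key = (layer, index)
        others = sorted(supernode.holders.get(key, set()) - {peer})
        if others:
            # Another peer already has this block: copy it from that peer.
            store[key] = peer_stores[others[0]][key]
        else:
            # Nobody has it yet: fetch the seed block from the SuperNode.
            store[key] = supernode.seed_blocks[layer][index]
            from_supernode += 1
        supernode.holders.setdefault(key, set()).add(peer)
    return b"".join(store[(layer, i)] for i in range(block_count)), from_supernode


registry = {"layer-1": b"0123456789abcdef"}
supernode = ToySuperNode(registry)
stores = {}
first = pull_layer("host-a", "layer-1", supernode, stores)
second = pull_layer("host-b", "layer-1", supernode, stores)
print(first[1], second[1])  # blocks each host fetched from the SuperNode
```

In this toy run the first host has to take every block from the SuperNode, while the second host can get all of them from the first, which is the effect the P2P design is after.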

With P2P technology, Dragonfly removes the bandwidth bottleneck of the image registry and makes full use of each peer’s hardware resources and network capacity, so that distribution becomes faster as the scale grows. It is worth noting that Dragonfly’s architecture requires no changes to the container technology stack: it can seamlessly give containers P2P image distribution and greatly improve file distribution efficiency.

2. Long-distance transmission

Dragonfly uses CDN caching so that each client can download seed blocks from the nearest SuperNode instead of transferring them across networks. The CDN cache works as follows:

The first request for a given file triggers a check that computes the cache location from the request information. If no cache exists, a back-to-source synchronization is triggered to generate seed blocks. Otherwise, the SuperNode sends a HEAD request to the source site with the If-Modified-Since header, whose value is the file’s last modification time as previously returned by the server. The response code then decides what happens:

  • 304: the file at the source site has not been modified and the cached file is valid. The SuperNode then uses the cached file’s metadata to check whether it is complete. If it is, the request is a full cache hit. If not, the remaining segments are fetched with a resumable download, which requires the source site to support segmented (range) downloads; if the source does not, the entire file must be synchronized again.
  • 200: the source file has been modified and the cache is invalid, so the file is synchronized from the source again.
  • Any other code: the source site is abnormal or the address is invalid, and the download task fails.
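The cache check described above is essentially a small decision table. A minimal sketch follows, assuming the four inputs are already known; the function name `decide` and the `Action` enum are hypothetical and are not part of Dragonfly’s code.

```python
from enum import Enum


class Action(Enum):
    SYNC_FROM_SOURCE = "download the whole file from the source site"
    USE_CACHE = "serve the cached file as-is (full cache hit)"
    RESUME_DOWNLOAD = "fetch only the missing segments"
    FAIL = "fail the download task"


def decide(cache_exists: bool, head_status: int,
           cache_complete: bool, source_supports_range: bool) -> Action:
    """Map the cache state and the HEAD response code to the next step."""
    if not cache_exists:
        # No cache at all: go back to the source and generate seed blocks.
        return Action.SYNC_FROM_SOURCE
    if head_status == 304:
        # Source unchanged, so the cached bytes are valid; check completeness.
        if cache_complete:
            return Action.USE_CACHE
        if source_supports_range:
            return Action.RESUME_DOWNLOAD
        return Action.SYNC_FROM_SOURCE
    if head_status == 200:
        # Source modified, so the cache is invalid.
        return Action.SYNC_FROM_SOURCE
    # Neither 304 nor 200: the source is abnormal or the address is invalid.
    return Action.FAIL


print(decide(True, 304, False, True).value)
```

Keeping the decision in one pure function makes each branch of the description easy to check in isolation.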

CDN caching lets clients avoid downloading from the source themselves and download from a nearby node instead. On a cache miss, however, the SuperNode’s back-to-source synchronization is very slow in cross-region, long-distance scenarios, which directly hurts overall distribution efficiency. To solve this, Dragonfly uses an automated, tiered preheating mechanism to maximize the cache hit ratio. It works as follows:

While an image is being pushed to the Registry with the push command, each pushed layer immediately triggers P2P synchronization of that layer to the SuperNodes. This makes full use of the interval between the push and pull operations (about 10 minutes) to synchronize every layer of the image to the SuperNodes, so that when a user runs the pull command, the files cached on the SuperNode can be used directly and the long-distance transmission problem disappears.

3. Reduce bandwidth costs

With dynamic compression, Dragonfly applies a compression policy to the parts of a file that are most worth compressing, without affecting the normal operation of the SuperNode and peers. This saves a large amount of network bandwidth and further improves distribution speed. Compared with traditional native HTTP compression, dynamic compression has the following advantages:

Its main advantage is that it is dynamic. Compression is enabled only when the load on the SuperNode and peers is normal, only the most compressible blocks of a file are compressed, and the compression strategy itself is chosen dynamically. In addition, multithreaded compression greatly increases compression speed, and thanks to the SuperNode’s cache, each file only needs to be compressed once for the whole download process, which yields at least 10 times the benefit of native HTTP compression.
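The idea of compressing a block only when the node is not busy and the block is actually worth compressing can be sketched as follows. This is a hedged illustration, not Dragonfly’s algorithm: the thresholds `MAX_LOAD` and `MIN_SAVING`, the function name `maybe_compress`, and the use of zlib are all assumptions.

```python
import os
import zlib

MAX_LOAD = 0.7    # hypothetical: skip compression when load exceeds 70% of capacity
MIN_SAVING = 0.2  # hypothetical: keep the compressed form only if it saves >= 20%


def maybe_compress(block: bytes, load: float):
    """Return (payload, compressed) for one block.

    The block is sent uncompressed when the node is busy or when
    compression would not shrink it by at least MIN_SAVING.
    """
    if load > MAX_LOAD:
        return block, False
    candidate = zlib.compress(block)
    if len(candidate) <= len(block) * (1 - MIN_SAVING):
        return candidate, True
    return block, False


text_like = b"layer metadata " * 64  # highly repetitive, compresses well
random_like = os.urandom(1024)       # random bytes, effectively incompressible

print(maybe_compress(text_like, load=0.3)[1])
print(maybe_compress(text_like, load=0.9)[1])
print(maybe_compress(random_like, load=0.3)[1])
```

A server that caches the returned payload per block would pay the compression cost once and reuse the result for every later download, which matches the caching benefit described above.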

In addition to dynamic compression, the SuperNode’s task scheduling lets peers under the same network device exchange blocks with each other as much as possible, which reduces traffic across network devices and data centers and further lowers bandwidth costs.
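A locality-aware choice of source peer might look like the sketch below. The peer fields `switch` and `room`, the three-level ranking, and the function names are hypothetical; the article does not describe how the SuperNode actually ranks peers.

```python
def locality_rank(requester: dict, candidate: dict) -> int:
    """Lower is better: same network device, then same data center, then anywhere."""
    if candidate["switch"] == requester["switch"]:
        return 0
    if candidate["room"] == requester["room"]:
        return 1
    return 2


def pick_source(requester: dict, candidates: list):
    """Pick the closest peer that holds the block, or None if nobody has it."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: locality_rank(requester, c))


me = {"name": "host-a", "switch": "sw-1", "room": "room-1"}
holders = [
    {"name": "host-x", "switch": "sw-9", "room": "room-2"},
    {"name": "host-y", "switch": "sw-2", "room": "room-1"},
    {"name": "host-z", "switch": "sw-1", "room": "room-1"},
]
print(pick_source(me, holders)["name"])
```

When no candidate shares the requester’s network device, the same function falls back to a peer in the same data center before crossing data centers.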

4. Secure transmission

When sensitive files (such as secret key files or account data files) are downloaded, the security of the transmission must be guaranteed. Dragonfly addresses this in the following ways:

  1. HTTP headers are passed through, so that downloads requiring permission verification via a header are supported.
  2. A self-developed data storage protocol packages the data blocks for transmission, and the packaged data is then re-encrypted.
  3. Pluggable security and encryption will be supported.
  4. A multi-level verification mechanism prevents data from being tampered with.
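As an illustration of the last point, multi-level verification can be sketched as checking a digest for every block and then one for the whole file. The article does not say which checks or hash algorithms Dragonfly uses; SHA-256 and the names `build_manifest` and `verify_download` are assumptions made for this example.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256 digest (the algorithm is an assumption for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(blocks: list) -> dict:
    """Record one digest per block plus one for the whole file."""
    return {
        "block_digests": [digest(b) for b in blocks],
        "file_digest": digest(b"".join(blocks)),
    }


def verify_download(blocks: list, manifest: dict) -> bool:
    """Accept the download only if every block and the whole file match."""
    if len(blocks) != len(manifest["block_digests"]):
        return False
    for block, expected in zip(blocks, manifest["block_digests"]):
        if digest(block) != expected:
            return False
    return digest(b"".join(blocks)) == manifest["file_digest"]


original = [b"0123", b"4567", b"89ab"]
manifest = build_manifest(original)
tampered = [b"0123", b"XXXX", b"89ab"]
print(verify_download(original, manifest), verify_download(tampered, manifest))
```

Checking each block individually lets a client reject a bad block from one peer and re-fetch just that block, while the whole-file digest guards against blocks being reordered or dropped.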

How did Dragonfly get promoted?

The fact that Dragonfly reached the CNCF incubation stage shows that the project has qualities that impressed the TOC. Dragonfly mainly solves the problem of container image distribution in large-scale scenarios, and its approach is very different from the traditional solution, Li Xiang said.

The traditional solution is centralized storage and distribution, with advantages of simple implementation and convenient control. However, this approach has some challenges in large-scale scenarios, mainly due to the difficulty of flexible horizontal scaling to handle sudden traffic.

“For example, in some internal scenarios at Alibaba and in Alibaba Cloud container service customer scenarios, especially batch computing workloads, thousands of containers may be created within a minute, which puts corresponding throughput pressure on image distribution. The best way to handle such sudden, large-scale traffic is to use P2P for distributed distribution. Dragonfly has built a system on this idea to help users and enterprises handle large-scale container scenarios, so that the container ecosystem can cover more, and more complex, scenarios. Dragonfly’s concept is something of a pioneer in this particular area of containers. It is probably the first and most successful exploration and practice.”

As for why Dragonfly was able to move from sandbox to incubation, Li Xiang explained CNCF’s internal evaluation criteria. CNCF has some basic requirements for incubating projects, such as project maturity, breadth of adoption, and the distribution of contributors. On these relatively objective indicators, Dragonfly fully meets the incubation requirements.

CNCF also considers whether a project can advance technology and community development in the cloud native field, and whether it can help CNCF develop as a foundation. This part is relatively subjective and is therefore decided by a vote of the 11-member TOC. Judging by the result of the vote, Dragonfly was accepted as an incubating project because most members recognized its value to the cloud native community and the foundation.

During the sandbox phase, Dragonfly demonstrated its value in a number of real-world production environments, including e-commerce, telecommunications, finance, and Internet scenarios. Users include Alibaba, China Mobile, Shopee, Bilibili, Ant Financial, Huya, Didi and iFLYTEK, among others.

For example, the Zhejiang branch of China Mobile has used Dragonfly in production for more than 3 years on more than 1,000 physical machines, and currently runs more than 200 business systems and more than 1,700 application modules on Dragonfly. Shopee, a Singapore-based e-commerce platform, has used Dragonfly in production for more than a year on 10K+ physical machines. Bilibili, a Chinese video site known for its bullet comments, has adopted Dragonfly in test and production environments on more than 3,900 machines. Bilibili engineers have worked with and contributed to the Dragonfly community on registry validation, stability, and more.

As a TOC member who is also a senior engineer at Alibaba Cloud, Li Xiang gave the project team a great deal of technical and ecosystem guidance while the Dragonfly project was being promoted, for example on integrating with the Harbor ecosystem and with Alibaba Cloud products so that Dragonfly can better reach end users.

What does incubation mean?

Li Xiang is also the author of etcd, another open source project in the CNCF incubation stage. So what does entering the incubation stage mean for project maintainers?

“It basically means a greater responsibility to serve cloud native users and to connect better with the ecosystem, and Dragonfly’s future work will revolve around those two goals. The open source project will simplify installation and upgrades and improve ease of use, security, and other basic capabilities, so that users can use Dragonfly out of the box in enterprise scenarios. In addition, we will promote the joint development of Dragonfly and the CNCF ecosystem, improve its ability to integrate with projects such as Harbor, Quay, and Clair, and promote OCI standardization in distribution,” Li said.

For CNCF itself, promoting a new project from sandbox to incubation also means that the cloud native landscape has become more complete. Li Xiang explained that sandbox projects are, in a sense, not yet formal CNCF projects, so CNCF does not invest many resources in them, whether in operations, marketing, or technical guidance. “Incubation is the first stage of being an official CNCF project. The foundation will invest more resources to assist and support the project’s development, and provide more technical guidance and support to help Dragonfly graduate from the foundation and become an important part of the cloud native field.”

As for the final stage of a CNCF project (graduation), Li Xiang said that the main considerations are the overall health of the project’s ecosystem and the maturity of the project. CNCF hopes that a graduated project is production-ready and meets the needs of most enterprises.

The future of cloud native

Cloud native adoption is now in full swing worldwide, and many leading Chinese technology companies have begun to embrace it actively. According to Li Xiang, cloud native technology has two main development trends:

  • **Standardization:** as more and more cloud native technologies emerge, a unified way to describe, manage, and integrate them has become an urgent need;

  • **Moving up to the application layer:** cloud native started in the infrastructure layer but is gradually moving closer to the user’s application layer, toward the vision of software that is born and grows on the cloud. The main obstacle is that the link between the infrastructure and the application layer has not been fully established and remains very fragmented, so the value of cloud native is not yet maximized at the layers closest to the user. Alibaba is actively involved in this work, helping to fill out the application definition and application delivery areas of the CNCF Landscape, an effort that is also being driven by CNCF SIG App Delivery.

At present, the whole cloud native ecosystem can be said to be built around the container orchestration system Kubernetes. Kubernetes is often called the Linux of the cloud native era, and a broader saying holds that cloud native is a major foundation of open source. “Kubernetes’ success is really due to its standardized abstraction of cloud infrastructure (compute, network, storage, and so on), which is exactly the value that Linux provides as a standardized operating system that shields us from the underlying hardware details. It is this standardized infrastructure abstraction that lets the entire cloud computing ecosystem define more application-layer capabilities on top of it, effectively realizing the core value of cloud native: connecting ‘applications’ and the ‘cloud’.”

At the end of the interview, Li Xiang also offered some advice to developers who want to get started with cloud native technologies. Outside China, cloud native development currently focuses on resource infrastructure management, application infrastructure (such as service mesh and observability), and application operations and delivery technologies. In China, the focus is still on infrastructure management but is rapidly moving up toward developer-facing applications. For young developers, CNCF’s official communities and blogs are a good way into cloud native technology.

About the guest

Li Xiang holds a bachelor’s degree from Zhejiang University and a master’s degree from Carnegie Mellon University. He is one of the founders of CoreOS and was involved in creating open source projects such as etcd, Operator Framework, and rkt. In the open source community, Li Xiang is known to developers as the author of etcd. The project currently has over 400 contributors, 14,000 commits, and over 150 releases, and is well received by developers. In January 2019, he became the first Chinese member of the CNCF TOC.

After joining Alibaba Cloud, Li Xiang was mainly responsible for Alibaba’s large-scale cluster scheduling and management system. He helped Alibaba complete the initial transformation of its infrastructure with cloud native technology, significantly improving resource utilization and the efficiency of software development and deployment, while also supporting the technical evolution of cloud products.
