Overview: As the cloud native trend accelerates, a rapidly growing share of applications is being containerized, and Kubernetes has become the new infrastructure of the cloud native era. Forrester predicts that by 2022, enterprises and organizations around the world will be running containerized applications in production. Looking at how containers and Kubernetes are used today, two phenomena stand out. First, managed Kubernetes on the cloud has become the preferred way for enterprises to run containers. Second, the way users apply containers is changing: from stateless applications to core enterprise applications to data-intelligence applications, more and more enterprises use containers to deploy stateful, complex, and high-performance production workloads, including web services, content repositories, databases, and even DevOps and AI/big-data applications.

Author: CNFS

As the cloud native trend accelerates, a rapidly growing share of applications is being containerized, and Kubernetes has become the new infrastructure of the cloud native era.

Forrester predicts that by 2022, enterprises and organizations around the world will be running containerized applications in production. Looking at how containers and Kubernetes are used today, two phenomena stand out. First, managed Kubernetes on the cloud has become the preferred way for enterprises to run containers. Second, the way users apply containers is changing: from stateless applications to core enterprise applications to data-intelligence applications, more and more enterprises use containers to deploy stateful, complex, and high-performance production workloads, including web services, content repositories, databases, and even DevOps and AI/big-data applications.

How do we orchestrate and store massive numbers of containers in the cloud native era? And how can we improve the performance and stability of container storage?

Evolution of storage capabilities under the trend of container applications

As infrastructure has gradually evolved from physical machines to virtual machines, to container environments such as Kubernetes, and even to Serverless, computing and applications are facing enormous change. The most obvious shift is that applications once monopolized a dedicated CPU and memory partition inside a virtual machine, whereas Serverless now serves users at the level of individual functions.

Under such a technology system, the storage capacity also needs to be changed, which is mainly reflected in the following aspects:

1. Density

In the virtual machine era, each virtual machine owned a complete storage space that served all the data access and storage needs of a single application. In today's Kubernetes and Serverless environments, however, storage is shared: containers access a huge common storage resource pool. The consequence is much higher storage density, and correspondingly higher demands on concurrent access to the same storage capacity.

2. Elasticity

When we provision a physical or virtual machine, the attached storage media is typically accessed and used over a relatively stable period. In today's container environment, however, front-end compute services scale very quickly, from tens to hundreds of instances in an instant, which demands equally elastic storage.

3. Data isolation

In Kubernetes and Serverless, it is difficult to dedicate memory and storage to a single workload, because storage resources, compute resources, and even the operating system and base dependency packages are all shared in the container environment. Security isolation must therefore be enforced at the infrastructure level, while at the application level data isolation must also be achieved through sound security policies and mechanisms. This is a very large change and challenge.

What storage capacity does the enterprise need in a container environment?

Block storage, file storage, and object storage are the common container storage options. So what file storage capabilities do enterprises actually need in a container environment?

1. Application compatibility

Enterprise habits are hard to change quickly. In many scenarios, enterprises already rely on shared or distributed storage clusters, so application compatibility matters greatly: storage should behave consistently in container and non-container environments, so that applications require as little modification as possible, ideally none. Meeting this need is both urgent and important.

2. Extreme elasticity

A defining characteristic of container deployment is the need to handle rapid, elastic demand with business peaks and troughs. When the compute layer scales elastically, the storage layer must follow immediately, rather than spending large amounts of time synchronizing data.

3. Sharing

In big data, high-performance computing, and similar scenarios, application data sets are very large, often several terabytes or tens of terabytes, and in some scenarios hundreds of terabytes. If data at this scale cannot be shared, and instead must be synchronized by copying it around an elastic container environment, both the cost pressure and the loss of timeliness become unacceptable.

4. Security and reliability

No matter how the underlying infrastructure changes, be it physical machines, virtual machines, Kubernetes containers, or Serverless, and no matter how abstract it becomes, the most fundamental requirement of business applications is security: applications must not pollute one another. Storage must therefore provide data sharing while still guaranteeing data security.

5. Cost optimization

Enterprises pursue cost optimization relentlessly in almost every application scenario; even for the most critical applications, costs still need to be controlled. Because business today grows and changes quickly, data grows quickly as well. Optimizing cost while data grows rapidly is another major challenge for storage.

Alibaba Cloud Container Network File System (CNFS)

To address the advantages and challenges of using file storage in containers, Alibaba Cloud launched the Container Network File System (CNFS), which is built into ACK, Alibaba Cloud's managed Kubernetes service. CNFS abstracts Alibaba Cloud file storage as an independent Kubernetes object (a CRD) and manages it through operations such as creation, deletion, description, mounting, monitoring, and expansion. Users keep the convenience of using file storage in containers while gaining better file storage performance and data security, along with consistent, declarative management.

CNFS is deeply optimized for container storage in terms of elastic scaling, performance optimization, accessibility, observability, data protection, declarative, etc., which makes it have the following obvious advantages over similar solutions:

In terms of storage types, CNFS supports file storage and object storage, and currently works with the Alibaba Cloud NAS, CPFS, and OSS products. Its capabilities include:

- Kubernetes-compatible declarative lifecycle management, providing one-stop management of containers and storage
- Online PV capacity expansion and automatic expansion, optimized for the elastic scaling of containers
- Better data protection combined with Kubernetes, including PV snapshots, a recycle bin, deletion protection, data encryption, and data disaster recovery
- Application-consistent snapshots, with automatic analysis of application configurations and storage dependencies, plus one-click backup and one-click restore
- PV-level monitoring
- Better access control to improve permission security of shared file systems, including directory quotas and ACLs
- Microsecond-level performance optimization and cost optimization for small-file reads and writes
- Low-frequency storage media and conversion policies to reduce storage costs
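As a rough illustration of the declarative style described above, the sketch below shows how a shared NAS-backed volume might be requested in ACK through a StorageClass and PersistentVolumeClaim. The StorageClass name and parameters here are illustrative assumptions, not the exact CNFS API; consult the ACK documentation for the real field names.

```yaml
# Hypothetical sketch: a StorageClass backed by NAS and a PVC that uses it.
# Names and the provisioner string are illustrative assumptions, not the
# exact CNFS schema.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-nas-sc                        # assumed name
provisioner: nasplugin.csi.alibabacloud.com   # NAS CSI driver (assumed)
allowVolumeExpansion: true                    # enables online PV expansion
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany            # many containers share one file system
  storageClassName: example-nas-sc
  resources:
    requests:
      storage: 100Gi
```

Because `allowVolumeExpansion` is true, the claim can later be grown in place by editing `spec.resources.requests.storage`, matching the online expansion capability described above.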

Typical usage scenarios and best practices

1. Container application scenarios with extreme elasticity

Take bursty Internet and large financial-services applications as an example. In this scenario, large numbers of containers must scale out in a short period, placing high demands on elastic resource scaling, so container storage must offer matching flexibility and rapid scaling. Typical applications include media/entertainment/live streaming, web services/content management, financial services, games, continuous integration, machine learning, and high-performance computing.

Here, Pods need to mount and unmount storage PVs flexibly, storage mounting must keep up with rapid container startup, and file I/O volume is large; as persistent data grows rapidly, storage cost pressure also rises. The recommended combination is ACK + CNFS + NAS, with which the following optimizations can be achieved:

- The built-in file storage class can start thousands of containers in a short time and mount file storage in milliseconds
- The file system (NAS) behind the PVs provides shared read and write access to massive numbers of containers, quickly achieving high availability for container applications and their data
- Low-latency and small-file optimizations deliver microsecond-level read and write performance, meeting the file storage performance demands of highly concurrent container access
- File storage lifecycle management and automatic hot/cold tiering reduce storage costs
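The shared mounting pattern behind this scenario can be sketched with a standard Kubernetes Deployment: every replica mounts the same ReadWriteMany claim, so scaling out requires no data copying. The image and the claim name `shared-data` are illustrative assumptions (e.g. a claim backed by a NAS volume).

```yaml
# Minimal sketch: many replicas of one web service all mount the same
# ReadWriteMany NAS-backed claim, so scaling out needs no data copying.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 100                  # scale freely; all Pods see the same files
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25      # placeholder workload
          volumeMounts:
            - name: content
              mountPath: /usr/share/nginx/html
      volumes:
        - name: content
          persistentVolumeClaim:
            claimName: shared-data   # assumed NAS-backed PVC
```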

2. Application scenarios of AI containers

More and more AI workloads run training and inference in containers, and combining cloud infrastructure with IDC resources gives AI more flexible compute scheduling. When AI workloads train and infer on the cloud, the data sets involved are very large; in autonomous driving, for example, a data set can reach 10 PB or even more than 100 PB. Training on data at this scale while preserving training timeliness means containerized AI mainly faces the following challenges:

- AI data flows are complex, causing I/O bottlenecks in the storage system
- AI training and inference demand high-performance computing and storage
- AI compute must be coordinated, with cloud and IDC resources/applications scheduled in a unified way

In this scenario, the recommended combination is an ACK managed cluster + CNFS + NAS/CPFS file storage, which achieves the following optimizations:

- Optimized NAS read/write performance provides high-performance shared storage that fits AI scenarios well, supports access to massive numbers of small files, and accelerates AI training and inference
- Compute clusters in container environments, such as GPU cloud servers and bare-metal servers, get high throughput and IOPS
- CPFS also supports hybrid deployment of ACK managed clusters on and off the cloud, and ACK management of self-built Kubernetes clusters in IDCs; unified resource pools across cloud and IDC, with unified scheduling of heterogeneous resources/applications, maximize the computing advantages of cloud infrastructure
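A minimal sketch of the shared-dataset pattern above: several parallel training workers mount the same NAS/CPFS-backed claim read-only, so each worker sees the full data set without any copies. The image name and the claim name `training-data` are illustrative assumptions.

```yaml
# Minimal sketch: a parallel training Job whose workers share one
# read-only dataset volume instead of copying data per worker.
apiVersion: batch/v1
kind: Job
metadata:
  name: train
spec:
  parallelism: 8                 # several workers read the same dataset
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: registry.example.com/trainer:latest   # placeholder image
          volumeMounts:
            - name: dataset
              mountPath: /data
              readOnly: true     # training only reads the shared files
      volumes:
        - name: dataset
          persistentVolumeClaim:
            claimName: training-data   # assumed NAS/CPFS-backed claim
```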

3. Application scenarios of gene computing

Genetic testing technology has gradually matured and is slowly being introduced into many hospitals, measuring patients' genes to diagnose complex diseases more accurately and quickly. For each person, the sampled genetic data is very large, on the order of tens of gigabytes. For certain targeted genetic analyses, however, individual samples are far from enough: 100,000 or even a million samples may need to be collected, which puts great strain on container storage, including:

- Large-scale sample data mining needs massive compute and storage resources; data grows quickly, storage costs are high, and management is difficult
- Massive data must be distributed quickly and securely to many locations across China, and multiple data centers need shared access
- Batch sample processing takes a long time and demands high performance, and resource demand has obvious peaks and valleys that are hard to plan for

For genetic computing, the recommended combination is ACK + AGS + CNFS + NAS file storage + OSS, which solves the following problems:

- The CNFS built-in file storage class can quickly build a fast, low-cost, high-precision container environment for genetic computing, meeting the compute and data-sharing needs of gene sequencing
- CNFS supports OSS object storage PVs, which can hold raw data, post-assembly data, and analysis results for distribution, archiving, and delivery, ensuring that large numbers of users can upload and download data simultaneously and improving delivery efficiency
- AGS provides massive storage space and archives cold data through lifecycle management, reducing storage costs; it also performs GPU-accelerated computing on hot genetic data, with performance up to 100 times that of traditional models, rapidly cutting the time and cost of gene sequencing
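As a rough illustration of the OSS-backed PV idea above, the sketch below statically provisions an object-storage volume and binds a claim to it for result delivery. The driver name, `volumeAttributes` keys, and bucket name are assumptions for illustration; the exact schema is defined by the ACK CSI documentation.

```yaml
# Hypothetical sketch: a statically provisioned OSS-backed PV plus a claim
# bound to it. Driver name and attribute keys are assumptions.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: oss-results
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteMany
  csi:
    driver: ossplugin.csi.alibabacloud.com   # assumed driver name
    volumeHandle: oss-results
    volumeAttributes:
      bucket: genomics-results               # assumed bucket name
      path: /delivery
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: oss-results
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""         # bind to the static PV above, not a class
  volumeName: oss-results
  resources:
    requests:
      storage: 500Gi
```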

Beyond the three typical examples above, CNFS provides deeply optimized solutions for combining containers and storage in many other scenarios. See the documentation for details: help.aliyun.com/document_d…

Case study: Building modern enterprise applications using CNFS and file storage

Through deep integration with CNFS, Alibaba Cloud NAS file storage has become an ideal solution for container storage. Here are a few real customer cases to illustrate more concretely how Alibaba Cloud Container Service for Kubernetes (ACK) and file storage can be combined to build modern enterprise applications.

Video service

Baijia Cloud is a leading one-stop video service provider in China. During the epidemic, Baijia Cloud's traffic skyrocketed and business volume grew tens of times in a short period; such rapid expansion had to be completed without customers noticing. In addition, Baijia Cloud's service scenarios involve heavy read and write demand, with four clusters added to the compute fleet at once. During recording and transcoding, the original storage system hit I/O bottlenecks, severely testing Baijia Cloud's ability to handle heavy traffic and high concurrency.

The storage requirements included rapid adaptation to container applications and fast data access after scaling. In the end, by combining Alibaba Cloud Container Service for Kubernetes (ACK) with NAS file storage, the container cluster architecture was optimized and elastic resource capacity was expanded tenfold in three days.

- NAS file storage scales automatically and on schedule with the container service ACK; thousands of containers can be started in a short time, perfectly matching the elasticity of container applications
- NAS file storage provides a standard access interface, is compatible with mainstream transcoding software, and can easily be mounted on video-editing workstations
- Baijia Cloud's Kubernetes clusters have very high performance requirements; the high-performance NAS service delivers up to 10 GB/s of throughput, removing the I/O bottleneck, handling Baijia Cloud's heavy-traffic, high-concurrency scenarios, and ensuring the smooth launch of live streaming and recording services during the epidemic

Autonomous driving

The second case is a typical customer in the automotive industry: a leading smart car manufacturer in China and a technology company that combines cutting-edge Internet and artificial intelligence innovation. Its products ship with a number of AI-powered services, such as a voice assistant and autonomous driving.

The challenge this enterprise faces is that in autonomous driving scenarios, training material usually consists of hundreds of millions of small images of around 100 KB each, totaling hundreds of terabytes. During training, GPUs repeatedly and randomly access subsets of the images, so the file system must provide high-IOPS file access to accelerate training. The stability and performance of large-scale storage systems do not expand linearly with scale; moreover, as storage resources grow rapidly, costs are high and operations and maintenance become complicated.

With Alibaba Cloud file storage, the customer's high-performance intelligent-driving computing platform is fully supported, and the speed of training with random small-file access ultimately improved by 60%. Files are distributed across multiple data nodes in the cluster and accessed by many clients simultaneously, supporting parallel expansion. Alibaba Cloud file storage also supports data flow across multiple storage tiers, greatly simplifying the collection, transmission, and storage of autonomous driving data.

Genetic computing

The final case comes from the genetic computing scenario; the client is a world-leading frontier life-science organization. The problems it faced: data grows quickly, and the existing storage could not scale capacity and performance linearly; genetic computing performance hit I/O bottlenecks; and large-scale sample data was costly to store and difficult to manage.

By mounting NAS file storage in the container cluster as shared storage for high-performance genetic data analysis, the customer obtained low-latency, high-IOPS storage, with throughput raised from 1 GB/s to 10 GB/s. End-to-end data processing, including uploading data to the cloud and distributing results off the cloud, completes within 12 hours.

NAS file storage provides elastically scalable bandwidth and high throughput, allocating capacity and matching bandwidth to the scale of the business, meeting elasticity requirements while saving TCO. With unified processes and unified resources, heterogeneous computing resources on and off the cloud are scheduled together, completing genetic computing tasks cost-effectively and efficiently.

The original link

This article is original content from Alibaba Cloud and may not be reproduced without permission.