Author | Shang Zhimin, f

Full video replay of the meeting: www.bilibili.com/video/av886…

On February 12, Alibaba Cloud and CNCF jointly held an online seminar that, for the first time, fully introduced Alibaba Cloud's layout in the Kubernetes community: 10 categories and more than 20 open source projects that together provide complete Kubernetes lifecycle management. This article collects the full video replay and the download for the seminar, and compiles the questions that could not be answered in time during the session.

Follow the “Alibaba Cloud Native” public account and reply “meeting” in the chat to download the slides.

What is SIG Cloud Provider

More and more enterprises are running Kubernetes in their production environments. Kubernetes is widely accepted because of its sound design and thriving community. There are about 20 special interest groups (SIGs) around Kubernetes. SIG Cloud Provider is one of the important ones, and it is committed to pushing all cloud vendors to provide Kubernetes services with standard capabilities.

SIG Cloud Provider Alibaba is the only subproject of SIG Cloud Provider in China.

SIG Cloud Provider is the Kubernetes interest group for cloud vendors. It is committed to evolving the Kubernetes ecosystem in a vendor-neutral direction, and it coordinates the different vendors so that developers' needs are met with a unified standard as far as possible. At present, seven cloud vendors have joined SIG Cloud Provider, including AWS, GCP, Alibaba Cloud, and IBM Cloud.

Why Alibaba Cloud joined SIG Cloud Provider

1. Jointly promote multi-cloud standards with global cloud vendors, and feed Alibaba Cloud's best practices back to the community

In the all-cloud era, the cloud has reshaped enterprise IT architecture. Cloud native computing is a set of best practices and methodologies for building scalable, robust, loosely coupled applications in public, private, and multi-cloud environments, enabling faster innovation and low-cost trial and error.

As a cloud vendor with international influence, Alibaba Cloud also hopes to promote further standardization of Kubernetes, to coordinate technically with peer cloud vendors such as AWS, Google, and Azure, to optimize the integration between the cloud and Kubernetes, and to unify the modular and standardized protocols of the different components.

2. Give Kubernetes developers on Alibaba Cloud transparency and control, collaboration, and a smooth evolution path

For Kubernetes developers and users, we hope to build the best operating environment for Kubernetes on Alibaba Cloud, and we will open source the Alibaba Cloud plug-ins built around Kubernetes. Alibaba Cloud Container Service ACK also reuses these components wherever it can.

  • Transparent and controllable: Developers doing their own research and development can build their own Kubernetes clusters from these plug-ins, and users of Container Service ACK gain more visibility into how it is implemented;
  • Co-construction and collaboration: Developers who need Kubernetes capabilities on Alibaba Cloud in compute, networking, storage, and other areas can file issues, contribute to the open source components, and take part in shaping the roadmap;
  • Smooth evolution: The Alibaba Cloud Kubernetes open source plug-ins provide Day 1 deployment capability, but they place higher demands on an enterprise's own operations and maintenance, upgrades, and stability control. If you need Day 2 expert services, such as continuous upgrades, high availability guarantees, and remediation recommendations, you can move smoothly to Container Service ACK.

How SIG Cloud Provider Alibaba operates

  • Slack
  • Bi-monthly meeting
  • Meeting notes: Google Docs, YouTube
  • Language: Chinese, English

Introduction to the Alibaba Cloud Kubernetes product family

Overview of the Alibaba Cloud Kubernetes open source suite

As the application operating system of the cloud native era, Kubernetes has become the de facto standard. In the course of its Kubernetes practice, Alibaba Cloud has open sourced many projects: five categories at the infrastructure level, covering compute, storage, networking, security, and related areas, and five categories at the upper level, covering AI, application management, migration, Serverless, and related areas. Together they give users full-stack lifecycle management.

SIG Cloud Provider Alibaba serves as a bridge between K8s and cloud native best practices on Alibaba Cloud. Through the interest group, every participating individual and organization can learn how the cloud provider works and apply it in production to realize business value.

The projects in each category are listed below.

CloudController

  • Cloud-provider
  • Cluster-api

Network

  • Terway (CNI)
  • Flannel (CNI)
  • ingress
  • External-dns

Storage

  • CSI
  • FlexVolume
  • auto-provision

Elasticity

  • Cron HPA
  • cluster-autoscaler

Security

  • KMS provider
  • Kube2ram
  • RAM Authenticator
  • SGX device plugin

Migration

  • Derrick
  • Velero
  • Image Builder

AI

  • Arena
  • GPU share

ServiceBroker

  • ServiceBroker

Serverless

  • Virtual-kubelet

Application management

  • Kube-eventer
  • metrics-adapter
  • log-pilot
  • OpenKruise
  • OAM

Introduction to some open source components

CloudController

CloudController refers to the cloud-controller-manager component of K8s (CCM for short). It connects Kubernetes to the basic services of each cloud vendor (including network load balancing, VPC routing, ECS, and DNS). These capabilities are implemented by NodeController, ServiceController, RouteController, and PVLController.

NodeController manages compute nodes, for example the lifecycle of ECS nodes. It labels each compute node with its availability zone, region, and hostname, giving the scheduling system the information it needs to place workloads across compute pools. It also periodically polls each ECS instance's IP address and resource status (for example, whether the instance has been released) and updates the node information dynamically, so that the orchestration system responds to compute node events in a timely manner.
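
The labeling and polling described above can be sketched roughly as follows. This is a minimal illustration, not the CCM's actual code: the `Instance` record, the `describe` callback standing in for the cloud API, and the `sync_node` function are all hypothetical. The label keys are the well-known Kubernetes topology labels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Instance:
    """Hypothetical view of one ECS instance as returned by a cloud API."""
    instance_id: str
    region: str
    zone: str
    hostname: str
    ip: str


def node_labels(inst: Instance) -> Dict[str, str]:
    """Build the topology labels a scheduler can use for placement."""
    return {
        "topology.kubernetes.io/region": inst.region,
        "topology.kubernetes.io/zone": inst.zone,
        "kubernetes.io/hostname": inst.hostname,
    }


def sync_node(node: dict, describe: Callable[[str], Optional[Instance]]) -> Optional[dict]:
    """Run one polling pass for a node.

    Returns the refreshed node record, or None when the instance has been
    released and the node should be removed from the cluster.
    """
    inst = describe(node["instance_id"])
    if inst is None:
        return None
    refreshed = dict(node)
    # Keep existing labels and overwrite only the topology ones.
    refreshed["labels"] = {**node.get("labels", {}), **node_labels(inst)}
    refreshed["ip"] = inst.ip
    return refreshed
```

A real controller would call something like `sync_node` for every node on a timer and write the result back through the Kubernetes API; both of those parts are left out here.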

ServiceController manages load balancing for applications. By watching changes to Kubernetes Service objects, it automatically configures and manages the load balancing service for an application (SLB configuration, listener configuration, and virtual server group configuration). It dynamically adjusts the load balancer's backend server groups as the application's replicas change, without human intervention. On top of this, we define a rich set of annotations for customizing an application's load balancing configuration, and we work actively with the community to standardize that configuration. We have also extended the K8s service discovery model with an elastic network interface (ENI) passthrough mode, which removes a layer from the service discovery network path and improves overall application network performance by 10%.
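
The backend adjustment described above is, at its core, a reconciliation between the backends currently registered on the load balancer and the backends the Service wants. A minimal sketch, assuming backends are identified by plain IP strings and using a hypothetical `diff_backends` helper (this is not the CCM's actual code):

```python
from typing import Iterable, List, Tuple


def diff_backends(current: Iterable[str], desired: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (to_add, to_remove) so that current converges on desired."""
    current_set, desired_set = set(current), set(desired)
    to_add = sorted(desired_set - current_set)
    to_remove = sorted(current_set - desired_set)
    return to_add, to_remove


# Example: the application scaled from two replicas to three and one Pod moved.
to_add, to_remove = diff_backends(
    current=["10.0.0.1", "10.0.0.2"],
    desired=["10.0.0.2", "10.0.0.3", "10.0.0.4"],
)
print(to_add)     # ['10.0.0.3', '10.0.0.4']
print(to_remove)  # ['10.0.0.1']
```

Computing a diff instead of replacing the whole server group keeps unchanged backends in service while the update is applied.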

High-performance networking component: Terway

Terway implements the Kubernetes CNI specification and is optimized for the Alibaba Cloud environment. It supports a rich set of enterprise features, including VPC routing mode, ENI mode, and ENI multi-IP mode, and it performs well: ENI mode is about 10% better than native VPC.

Terway is deeply integrated with Alibaba Cloud's underlying IaaS network, so Pods are first-class citizens of the cloud network and can use CEN, SLB, and other network products seamlessly. Using elastic network interfaces avoids network performance loss, so containerization does not degrade user experience or performance. Terway also supports advanced features such as Kubernetes network policy and QoS traffic control.

High-performance container storage: CSI

The Alibaba Cloud CSI plug-in manages the lifecycle of container storage volumes in Kubernetes and supports dynamic creation, mounting, and use of cloud data volumes. The current CSI implementation targets K8s 1.14 and later. Supported Alibaba Cloud storage types: cloud disk, NAS, CPFS, OSS, and LVM.

High-performance log collection: log-pilot

log-pilot is an efficient, intelligent container log collection tool. It can collect the standard output logs of containers and also dynamically discover log files inside containers. It detects container state changes in the cluster and configures log collection for each container dynamically. It also has advanced features such as automatic log checkpointing and file handle retention, automatic tagging of log data, and custom tags. Log data can be shipped to a variety of log storage backends, such as Elasticsearch, Kafka, Logstash, Redis, and Graylog.

Lightweight machine learning solution: Arena

Arena is a lightweight, Kubernetes-based machine learning solution that supports the full lifecycle of data preparation, model development, model training, and model prediction, improving the productivity of data scientists. Data scientists and algorithm engineers can quickly start using Alibaba Cloud resources (including ECS cloud servers, GPU cloud servers, the distributed storage services NAS and CPFS, object storage OSS, Elastic MapReduce, load balancing, and other services) to perform tasks such as data preparation, model development, model training, evaluation, and prediction. Deep learning capabilities can easily be exposed as service APIs to accelerate integration with business applications. Beyond improving the efficiency of data scientists, Arena also raises GPU utilization in the cluster through visualized GPU resource management and shared-device scheduling.

Welcome to SIG Cloud Provider

This webinar introduced Alibaba Cloud's layout in the Kubernetes community for the first time. There was not enough time or space to cover every open source component in detail, but I hope it gives interested developers some pointers for finding the corresponding open source projects. We welcome more developers to join us, whether by submitting a PR, filing an issue, or making suggestions for the roadmap. In the future, SIG Cloud Provider Alibaba will share the principles and best practices behind specific components.

Q & A

Q1: Can the Alibaba Cloud K8s Cloud Provider add a parameter for each feature so that it can be switched on or off?

A1: Individual features can be configured through annotations. For details, please refer to the documentation.

Q2: If we want to make changes on top of the Alibaba Cloud CCM, will the K8s version be a problem? We want to use our own specific Kubernetes version.

A2: That is not a problem. CCM does not depend on the K8s version.

Q3: Do Alibaba Cloud's Kubernetes-based container services use the open source CCM directly? If so, what adjustments were made internally before going live? Also, what exactly is the provider_id format?

A3: Yes, they are based entirely on the open source version of CCM. The provider_id format is ${regionID}.${nodeID}.
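
To make the `${regionID}.${nodeID}` format concrete, here is a small sketch of building and splitting such an ID. The helper names are hypothetical, the example values are made up, and the parser assumes the region ID itself contains no dot.

```python
from typing import Tuple


def format_provider_id(region_id: str, node_id: str) -> str:
    """Join a region ID and a node ID into the <regionID>.<nodeID> form."""
    return f"{region_id}.{node_id}"


def parse_provider_id(provider_id: str) -> Tuple[str, str]:
    """Split on the first dot (assumes the region ID contains no dot)."""
    region_id, sep, node_id = provider_id.partition(".")
    if not sep or not region_id or not node_id:
        raise ValueError(f"expected <regionID>.<nodeID>, got {provider_id!r}")
    return region_id, node_id
```

For example, `parse_provider_id("cn-hangzhou.i-example")` returns `("cn-hangzhou", "i-example")`.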

Q4: Must the K8s node name used with CCM be the same as the Alibaba Cloud instance ID? Operations staff told us earlier that it had to be, but such meaningless node names are unpleasant to work with.

A4: No. For now, you only need to configure the providerID parameter.

Q5: How does Terway accelerate networking at the lower level? At the kernel level, or with DPDK?

A5: Terway has several network modes, and the network setup differs between them.

  • ENI-exclusive mode directly uses an IaaS-layer NIC as the Pod's NIC. No virtualization is involved on the host, and the user's Pod can use DPDK to accelerate application networking. Outside the node, it relies on the high-performance IaaS network developed by Alibaba Cloud;
  • Shared ENI mode uses ipvlan, a lightweight virtualization solution, for virtualization inside the node, with very low performance loss compared with host networking.

Q6: Can kernel parameters be set per Pod, that is, are they namespaced?

A6: Whether a kernel parameter is namespaced per Pod depends on kernel support. On newer kernels, such as the 4.19 kernel in Aliyun Linux 2, most kernel parameters can be specified and modified in the Pod.

Q7: What secure container products does Alibaba Cloud currently offer?

A7: Alibaba Cloud Container Service already offers the security sandbox as an optional container engine, and some Alibaba Cloud Serverless products, such as SAE and ECI, are also built on secure containers.

Q8: Does Arena support multi-tenancy and virtual GPUs?

A8: Arena reuses the existing user authorization and multi-tenancy mechanisms of Kubernetes. Different users can be given different kubeconfig files for authentication, while resources are isolated and shared through namespaces. From Arena's perspective, a user can only see the training and inference tasks in their own namespace; tasks in other namespaces are not visible.

Virtual GPU here refers to Nvidia's virtual GPU technology. At present, virtual GPUs are supported on P4 GPUs on Alibaba Cloud and have been integrated with Alibaba Cloud Container Service for Kubernetes, where you can try them. From Arena's perspective, a virtual GPU is not a special kind of GPU resource, and Arena can schedule and orchestrate it.

Q9: Does the multi-container shared GPU solution support resource isolation? Can GPU memory be limited?

A9: First of all, thank you for your interest in our GPU sharing solution. Alibaba Cloud Container Service has contributed the only open source GPU sharing solution in the industry. At present, our solution implements multi-container GPU sharing at the scheduling level and can be combined with frameworks such as TensorFlow to limit GPU resources at the application level. See our documentation for how to use it today.

We are also working with Alibaba Cloud's infrastructure team to develop a secure, high-performance GPU isolation solution. We believe that in the near future everyone will be able to use a complete solution, from GPU sharing and scheduling through to isolation.

Q10: Does ExternalDNS currently support Alibaba Cloud's DNS service? To what extent?

A10: Yes, it supports Alibaba Cloud's DNS service, including PrivateZone. It can synchronize the Services and Pods of a K8s cluster to the DNS service, reducing the overhead of the CoreDNS deployed in the cluster.

Q11: What are the main differences between Alibaba Cloud's NGINX Ingress and the community NGINX Ingress?

A11: Alibaba Cloud builds on the community version and adds more advanced features, such as dynamic updates of the NGINX server configuration and mixed canary (grayscale) release strategies based on headers, cookies, request parameters, and weights.
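
A mixed canary strategy of the kind mentioned above can be pictured as a small decision function. The sketch below is only an illustration: the `CanaryRule` shape, the precedence order (header, then cookie, then weight), and the hash-based bucketing are assumptions, not the ingress controller's actual annotation schema or algorithm. Request parameters would follow the same pattern as headers and are left out for brevity.

```python
import zlib
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class CanaryRule:
    """Hypothetical rule shape; not the actual ingress annotation schema."""
    header: Optional[str] = None   # header name that opts a request in
    header_value: str = "always"
    cookie: Optional[str] = None   # cookie name that opts a request in
    cookie_value: str = "always"
    weight: int = 0                # share of remaining traffic, 0 to 100


def choose_backend(rule: CanaryRule, headers: Mapping[str, str],
                   cookies: Mapping[str, str], client_key: str) -> str:
    """Return "canary" or "stable" for one request."""
    if rule.header and headers.get(rule.header) == rule.header_value:
        return "canary"
    if rule.cookie and cookies.get(rule.cookie) == rule.cookie_value:
        return "canary"
    # Hash the client key into a bucket in [0, 100) so that the same
    # client consistently lands on the same side of the split.
    bucket = zlib.crc32(client_key.encode("utf-8")) % 100
    return "canary" if bucket < rule.weight else "stable"
```

Explicit opt-in through a header or cookie is checked first, so testers can always reach the new version regardless of the weight.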

Q12: What is the release cycle of Alibaba Cloud Kubernetes and the suites developed around it?

A12: For major K8s versions, a stable release is published every six months. Bug fixes and security fixes are released as needed.

Q13: Has the commercial stable version of the edge product ACK@Edge been released, and are customers already using it?

A13: ACK@Edge can be used in production environments. It is currently used by customers in online education, video, IoT, CDN, and other fields. The commercial version is expected to launch before June 2020.

Q14: Do worker nodes experience the cgroup memory leak that causes Pods to fail with "Cannot allocate memory"? If so, how is it solved?

A14: The cgroup driver used by the container service is the systemd cgroup driver. This problem does not occur.

Q15: Are a Pod's CPU and memory resources isolated from the host's? How is the isolation done?

A15: This is achieved by having the kubelet reserve resources for the host, so that Pod resources are limited to the remaining capacity.
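
The reservation described above follows the kubelet's documented node allocatable calculation: what Pods can use is the node capacity minus what is reserved for Kubernetes daemons, for system daemons, and for the eviction threshold. A small worked sketch, with made-up numbers in MiB and a hypothetical helper name:

```python
def allocatable(capacity: int, kube_reserved: int = 0,
                system_reserved: int = 0, eviction_threshold: int = 0) -> int:
    """Resources left for Pods after the kubelet's reservations."""
    remaining = capacity - kube_reserved - system_reserved - eviction_threshold
    # Never report a negative amount if the reservations exceed capacity.
    return max(remaining, 0)


# Illustrative node: 16 GiB of memory with three reservations.
memory_mib = allocatable(
    capacity=16384,
    kube_reserved=1024,
    system_reserved=512,
    eviction_threshold=100,
)
print(memory_mib)  # 14748
```

On a real node these values come from kubelet settings such as `--kube-reserved`, `--system-reserved`, and `--eviction-hard`; the numbers used here are illustrative only.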

Q16: AWS has eksctl. Does Alibaba Cloud have a corresponding tool? Is it called ackctl?

A16: Refer to the documentation.

Q17: How does Alibaba Cloud support Windows containers?

A17: Windows 1809 is currently supported, and 1903 support is coming. Windows nodes can be added to Linux clusters.

Q18: Can a single open source component be integrated into an existing K8s cluster?

A18: Yes, provided the existing K8s cluster fully passes the K8s conformance tests.

“The Alibaba Cloud Native account follows technical fields such as microservices, Serverless, containers, and Service Mesh, focuses on popular cloud native technology trends and large-scale cloud native adoption in practice, and aims to be the technical community that best understands cloud native developers.”