From 24 to 26 June 2019, KubeCon + CloudNativeCon + Open Source Summit China, hosted by the Cloud Native Computing Foundation (CNCF), will be held in Shanghai, China.
Following KubeCon's successful debut in China in 2018, this year's event is expected to attract thousands of engineers from around the world for in-depth discussions and case studies covering all CNCF projects and topics, with talks from CNCF project maintainers and end users. The program committee for this year's KubeCon + CloudNativeCon + Open Source Summit consisted of 75 experts who reviewed 618 proposals. A total of 26 Alibaba technical talks were selected for KubeCon China 2019.
At this KubeCon, many leading cloud native technologists will attend and give talks, including Ding Yu (Shu Tong), head of the Alibaba Cloud intelligent container platform; Li Xiang, CNCF TOC member, author of the etcd project, and senior technical expert on the Alibaba Cloud container platform; and Zhang Lei, CNCF Ambassador, Kubernetes project maintainer, and Alibaba Cloud senior technical expert. They will also present the latest trends and progress in many advanced cloud native technologies, including the open-source Virtual Cluster design for strong multi-tenancy, the OpenKruise open-source project, and Cloud Native App Hub. We look forward to meeting you, exchanging ideas, and exploring technical cooperation with the Alibaba container platform team at KubeCon China.
KubeCon + CloudNativeCon Alibaba special page is now online
The “KubeCon + CloudNativeCon Alibaba special page” is now live, presenting Alibaba Cloud's talks at this KubeCon and its achievements in the cloud native ecosystem. There you can browse the topics of Alibaba's KubeCon talks, follow updates to the course “CNCF X Alibaba Cloud Native Technology Open Class”, keep up with news about Alibaba's cloud native products, and see the schedule for the hands-on salon on June 24. Click the link below or “read the original text” at the end of the article to go directly to the special page.
Special page link: yq.aliyun.com/promotion/8…
We recommend that you focus on the following presentations:
1. Kubernetes has arrived at the right time, and cloud native has a promising future
Speaker: Ding Yu (Shu Tong), head of the Alibaba Cloud intelligent container platform
As a practitioner of cloud native applications, Alibaba Cloud not only supports the huge traffic of Double 11 but also carries the large-scale daily business of the Alibaba economy. This talk shares Alibaba Cloud's thinking on successfully adopting Kubernetes and looks ahead to future trends in cloud native.
2. Keynote: Cloud native at Alibaba scale
Speaker: Li Xiang, senior technical expert, Alibaba Cloud container platform
Alibaba Cloud has put cloud native into practice at large scale. This talk shares concrete experience with the audience, covering scaling, reliability, development efficiency, and migration strategy, and discusses optimizations for large-scale scenarios. Cloud native works for Alibaba. Cloud native works for (almost) everyone.
3. Highly available and scalable Prometheus and Thanos at Alibaba
Speakers: Qin Guoan (Yan Lie), senior technical expert, Alibaba Cloud container platform; Li Tao (Lv Feng), senior development engineer, Alibaba Cloud container platform
Alibaba Group uses Kubernetes to support the world's largest e-commerce business. Providing reliable, fine-grained monitoring and alerting at that scale is a real challenge for availability and scalability. This talk shares experience in building a highly available, scalable, fine-grained monitoring system on the open source projects Prometheus and Thanos. The system mainly supports Alibaba's cluster management system with 8 million TPS and 10K requests. Topics to be discussed:
1) How to use Prometheus to support large-scale scenarios
2) How to use Thanos to solve the data query problems caused by running multiple Prometheus instances
3) Lessons learned from configuring Prometheus and Thanos, such as target discovery, recording rule management, and alerting rules
4. Managing microservices across regions and clusters with Istio
Speakers: Wang Xining, senior technical expert, Alibaba Cloud container platform; Xiaozhong Liu, UniCareer
UniCareer is an e-learning career development platform designed to meet the diverse needs of students and working professionals worldwide, serving users in many regions. Its applications are deployed on multiple Kubernetes clusters in different Alibaba Cloud regions to reduce service access latency for users in each region. Managing these microservices effectively requires a multi-cluster service mesh to control microservice traffic and ensure service-to-service communication.
Istio is a service mesh built on top of Kubernetes that supports multiple topologies for managing application traffic across multiple Kubernetes clusters. Through this case study, we share deployment designs and techniques for multi-cluster traffic management with the Istio service mesh, and discuss some of the challenges and practices that arise from the needs and limitations of the underlying platform.
5. Using resources efficiently by co-locating CPU and GPU workloads
Speakers: He Jian, senior technical expert, Alibaba Cloud container platform; Cen Penghao (Cooper), technical expert, Platform Data Technology Department, Ant Financial
This talk focuses on how to co-locate AI training tasks and long-running services on a Kubernetes cluster. The main goal is to improve resource utilization and save resources by mixing workloads. We will describe how we achieve this mixing and evaluate utilization along several dimensions, including QoS class, cgroups, and scheduling. Over the past few months we have built a hybrid GPU and CPU cluster of several hundred nodes, and we will cover best practices for mixing long-running services and AI batch tasks in a production cluster.
6. 1-5-10: How to quickly recover from large-scale container faults
Speaker: Xiong Huan (Ning Zhuo), technical expert, Alibaba Cloud container platform
In the cloud era, container-based applications are proliferating in enterprises. Manual operations and hardware failures make container faults more likely, so ensuring the reliability of containers at large scale without adding resources is a huge challenge for cloud platforms. Alibaba operates millions of containers and follows the 1-5-10 rule for recovering from container-related faults: MTTD (mean time to detect) of 1 minute, MTTI (mean time to identify) of 5 minutes, and MTTR (mean time to resolve) of 10 minutes. In this session, we will discuss how to use 1-5-10 to improve the reliability of containers at large scale:
1) How to set up an effective local agent and detect problems within 1 minute
2) How to use an expert knowledge base to diagnose container problems intelligently
3) How to recover from container problems automatically in a failure-driven manner
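As a rough illustration of the 1-5-10 targets, the sketch below averages detection, identification, and resolution times over a set of incidents and compares each mean with its target. The `Incident` record layout and the choice to measure all three times from the start of the fault are assumptions made for this example; the talk abstract does not say how Alibaba measures them.

```python
from dataclasses import dataclass
from statistics import mean

# Targets from the 1-5-10 rule, in minutes.
TARGETS = {"MTTD": 1.0, "MTTI": 5.0, "MTTR": 10.0}


@dataclass
class Incident:
    """Hypothetical incident record: minutes elapsed since the fault began."""
    detected_at: float
    identified_at: float
    resolved_at: float


def one_five_ten_report(incidents):
    """Return {metric: (mean_minutes, meets_target)} for a list of incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    means = {
        "MTTD": mean(i.detected_at for i in incidents),
        "MTTI": mean(i.identified_at for i in incidents),
        "MTTR": mean(i.resolved_at for i in incidents),
    }
    # A metric passes when its mean does not exceed the target.
    return {name: (value, value <= TARGETS[name]) for name, value in means.items()}


# Example data (made up for illustration).
incidents = [Incident(0.5, 3.0, 8.0), Incident(1.0, 6.0, 11.0)]
report = one_five_ten_report(incidents)
for name, (value, ok) in report.items():
    print(f"{name}: {value:.2f} min, target {TARGETS[name]:.0f} min, ok={ok}")
```

Note that a mean can meet its target even when a single incident does not, as with the second incident here; a stricter check would compare each incident individually.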
7. Understanding the scalability and performance of the Kubernetes master
Speakers: Chen Xingyu (Yumu), senior software engineer, Alibaba Cloud container platform; Zeng Fansong (Zhuling), senior technical expert, Alibaba Cloud container platform
Currently, Kubernetes has a scale limit of 5K nodes, so you may not be able to use it to manage a web-scale cluster of, say, 10K nodes. Have you wondered what the performance bottlenecks are when Kubernetes manages more than 5K nodes? When you want to take its scalability to the next level, which component is holding you back: etcd, the API server, or the scheduler? Understanding these issues is key to operating a large Kubernetes cluster. At Alibaba, we ran into many issues, such as pod creation becoming very slow as clusters grew. In this talk, we share how to benchmark, analyze, and find bottlenecks, and how to tune the control plane components to achieve over 100x performance improvements.
8. Intro: containerd
Speakers: Fu Wei (Yuge), senior development engineer, Alibaba Cloud container platform; Liu Lantao, software engineer, Google
This talk covers containerd's architecture and how it can be extended with plug-ins, different image storage, and strongly isolated container runtimes. It also includes a demo of containerd's container runtime integration with gVisor and Firecracker, to help the audience understand how best to integrate with containerd.
9. How Alibaba builds serverless with K8s, Kata Containers, and bare-metal cloud
Speakers: Zhang Yifei (Wu Peng), technical expert, Alibaba Cloud container platform; Tang Huamin, senior development engineer, Alibaba Cloud container platform
Serverless computing is a popular computing model that greatly reduces the cost for developers of deploying, managing, and running applications. On a serverless platform, services from different users often share the same node, so a trusted runtime environment is needed for multi-tenant scenarios. At Alibaba, we use Kata Containers as a secure container runtime to ensure hard multi-tenant isolation and service runtime performance at the storage, network, hardware, and other levels. In this talk, we discuss in detail, based on our production practice, how to achieve both hard multi-tenancy and high runtime performance in multi-tenant scenarios.
10. Alibaba's data-driven exploration of open source communities
Speaker: Zhao Shengyu, senior community manager, Alibaba Open Source Governance Office
Running an open source community has always been a pain point in open source software development, especially for communities led purely by developers. How to manage a community effectively, identify its active contributors, and use data to uncover problems in community management are all pressing questions. This talk will cover:
1) How do you evaluate an individual developer's activity in the community?
2) How do you measure the overall activity of an open source community?
3) What can be seen and learned by analyzing the world's top open source projects under these models?
4) What role should community management tools play in an open source community?
5) Building on the above, what has Alibaba tried, and with what results?
11. Alibaba: Lessons from an e-commerce giant's evolution to cloud native
Speakers: Zhang Lei, senior technical expert, Alibaba Cloud container platform; Wang Siyu (Jiuzhu), senior development engineer, container platform
Migrating a global e-commerce giant like Alibaba to a cloud-native platform is no easy task. In this talk, we share the lessons learned from our work over the last year from both a technical and a community perspective, including:
1) What are the major obstacles to Alibaba's migration to cloud native technology?
2) What is Alibaba's major technical debt? How can we address it? Is our approach working?
3) What if applications in your organization are managed differently from the way Kubernetes manages them?
4) Why is predictability important for e-commerce? Does Kubernetes offer predictability out of the box? If not, why not? How can this be solved (possibly it cannot)?
5) How do you verify scalability issues in a cluster of thousands of nodes?
6) Can a large team cooperate with upstream communities for win-win results?
12. Intro: Dragonfly
Speakers: Hu Zuozheng (Zhengxi), technical expert, Alibaba Cloud application operation and maintenance platform; Zhang Jin (Taiyun), senior development engineer, Alibaba Cloud application operation and maintenance platform
As container technology becomes more widely used in industry, distributing images safely and efficiently is a new challenge for engineers. The Dragonfly project is an open source, intelligent, P2P-based image and file distribution system. It aims to address all distribution issues in cloud native scenarios. Currently, the Dragonfly project focuses on being:
Simple: a clearly defined user-facing API (HTTP), non-invasive to all container engines
Efficient: CDN support and P2P-based file distribution to save enterprise bandwidth
Intelligent: host detection for host-level speed limits, intelligent flow control
Secure: block transfer encryption, HTTPS connection support
In this talk, we focus on distributing container images with Dragonfly. We review the challenges organizations face, including large-scale distribution, secure transmission, and bandwidth costs, and present solutions. The talk also discusses practical use cases.
13. No more chaos: large-scale Kubernetes audits and inspections
Speakers: Chen Jie, technical expert, Alibaba Cloud container platform; Ma Jinjing, senior development engineer, Ant Financial
As we all know, accurate anomaly detection and fast problem analysis are key to the availability and stability of a Kubernetes cluster. But across the Kubernetes project there are countless metrics. In our Kubernetes clusters alone, we see thousands of such data points generated every second. Making sensible use of this large and complex body of data and metrics, recording and analyzing it effectively, and turning it into easy-to-understand visualizations and accurate alerts is a very challenging task.
In this talk, we share our practice and experience with Kubernetes cluster monitoring, auditing, and inspection at Alibaba. First, we discuss the key stability statistics and metrics in Kubernetes and how to interpret them. Next, we use case studies to show how to integrate and analyze these data and metrics. Finally, we share Alibaba's best practices for efficient, real-time, automated inspection and analysis of this data.
14. Minimize GPU cost for deep learning running on Kubernetes
Speakers: Zhang Kai, senior technical expert, Alibaba Cloud container platform; Che Yang, technical expert, Alibaba Cloud container platform
More and more data scientists are running NVIDIA GPU-based deep learning tasks on Kubernetes. At the same time, they have found that idle GPUs in the cluster waste more than 40% of their cost, so improving GPU efficiency has become an important challenge. We will introduce a GPU sharing solution based on native Kubernetes:
1) How to define the GPU sharing API
2) How to schedule GPU sharing without changing the scheduler's core code
4) We will also demonstrate how TensorFlow users can run different jobs on the same GPU device in a Kubernetes cluster
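To make the idea of several jobs sharing one GPU device more concrete, here is a minimal sketch of a placement policy. It assumes that each job requests a slice of GPU memory in GiB and that a job goes to the device whose free memory fits the request most tightly. Both assumptions, and the `place_job` helper itself, are hypothetical illustrations; this is not the Kubernetes API and not necessarily the speakers' solution.

```python
def place_job(free_mem_gib, request_gib):
    """Pick the GPU whose free memory fits the request most tightly.

    free_mem_gib is a list of free memory per GPU and is updated in place.
    Returns the chosen GPU index, or None if no single GPU can fit the job.
    """
    candidates = [
        (free, idx) for idx, free in enumerate(free_mem_gib) if free >= request_gib
    ]
    if not candidates:
        return None
    # Tightest fit first; ties go to the lowest GPU index.
    _, idx = min(candidates)
    free_mem_gib[idx] -= request_gib
    return idx


# Two hypothetical 16 GiB GPUs on one node, five jobs with made-up requests.
free = [16, 16]
placements = [place_job(free, request) for request in (4, 8, 4, 10, 8)]
print(placements)  # [0, 0, 0, 1, None]
print(free)        # [0, 6]
```

Packing jobs tightly keeps whole devices free for larger requests, which is one way to reduce the idle GPU capacity the abstract describes. The last job is rejected because neither device has 8 GiB left.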
15. Three ways to speed up image distribution in the cloud native era
Speaker: Jiang Yong (Yifang), technical expert, Alibaba Cloud container platform
This talk shares practices and lessons from improving the efficiency of image distribution at Alibaba's web scale. We use different image distribution methods for different scenarios. P2P distribution with CNCF/Dragonfly is the most direct way to relieve bandwidth pressure on the image registry and reduce distribution time. In addition, a remote filesystem snapshotter in CNCF/containerd stores the image remotely, so the container engine reads image content over the network and almost no time is spent on distribution. The second approach depends on network stability, so how do you strike a balance by dynamically loading an image from remote to local storage based on read requests for image content? Finally, we summarize how to choose a suitable image distribution method.
16. Dynamically adjusting Pod resource limits in a web-scale cluster
Speakers: Wang Cheng, technical expert, Alibaba Cloud container platform; Zhang Xiaoyu (Zhong Yuan)
For a global e-commerce giant like Alibaba, the number and variety of applications is enormous. Managing the resources of their containers in a sound, systematic way has always been a great challenge for us. In this talk, we share our practical experience and technical results from several angles, including technology and community evolution:
1) What is the current state of container resource management in the community?
2) What specific challenges does Alibaba's large-scale application deployment face?
3) How do we diagnose and resolve the various hard problems in resource management?
4) How can we significantly increase resource utilization while keeping online services stable?
5) How do we balance cloud native evolution with fast delivery of work?
6) How can our experience help you, and how can we give back to the community for a win-win outcome?
KubeCon China 2019 Alibaba Technology Talk Overview
Author: K8S little hotshot
This article is original content from the Yunqi Community and may not be reproduced without permission.