Author | East, Alibaba technical expert

< Follow the Alibaba Cloud Native official account and reply "troubleshooting" to download the e-book >

Introduction: This Kubernetes e-book contains 12 technical articles to help you understand 6 core principles, grasp the basic theory, and learn the elegant troubleshooting of 6 typical problems!

What is Kubernetes?

Let's first look at what Kubernetes is. In this part, I will share my view from four perspectives.

1. What the future looks like

This is an architecture diagram of what the back-end IT infrastructure of most companies will look like in the future. Simply put, all corporate IT infrastructure will be deployed on the cloud. On top of Kubernetes, users will divide the underlying cloud resources into cluster units dedicated to specific businesses. As the adoption of microservices deepens, service governance logic such as the service mesh will become a layer of infrastructure, just like the two layers below it.

Currently, almost all of Alibaba's business runs on the cloud, and about half of it has already migrated to Alibaba's own custom Kubernetes clusters. In addition, as far as I know, Alibaba plans to have 100% of its business deployed on Kubernetes clusters this year.

As for the service mesh, in some of Alibaba's divisions, such as Ant Financial, it is already serving production traffic. We can learn about their practice from Ant's public sharing.

Although the view in this diagram may be a bit absolute, the trend is very clear: in the next few years, Kubernetes will become as ubiquitous an operating system for clusters as Linux is for single machines.

2. Kubernetes and operating system

This is a comparison between a traditional operating system and Kubernetes. A traditional operating system, like Linux or Windows, acts as an abstraction layer over the underlying hardware: downward, it manages the computer's hardware, such as memory and CPU; upward, it abstracts that hardware into easy-to-use interfaces that support the application layer.

Kubernetes can also be thought of as an operating system. To put it bluntly, this operating system is also an abstraction layer. What it manages downward is not hardware such as memory or CPU, but a cluster composed of many computers. These computers are ordinary stand-alone machines with their own operating systems and hardware. Kubernetes manages them as a single pool of resources to support the applications above.

The applications here are special in that they are containerized. For those of you who are not familiar with containers, you can simply think of such an application as a self-contained installation package: it bundles all of its dependent libraries, such as libc, so it does not rely on the library files of the underlying operating system.

3. Kubernetes and Google's SRE book

In the image above, a Kubernetes cluster is on the left, and on the right is the famous book Site Reliability Engineering: How Google Runs Production Systems. I'm sure many of you have read it, and many companies are currently implementing its methods, including fault management, operations scheduling, and so on.

We can compare the relationship between Kubernetes and this book to the relationship between qigong and swordsmanship. In the novel The Smiling, Proud Wanderer, the Huashan school is divided into two factions: the Qi faction, which focuses on cultivating internal energy, and the Sword faction, which emphasizes the subtlety of sword technique. The two factions split because two Huashan disciples secretly read the Sunflower Manual, each wrote down only part of it, and their differing understandings eventually tore the school in two.

Kubernetes is actually derived from Borg, Google's cluster automation management and scheduling system, which is exactly the system that the operations methods described in this book were built to manage. The Borg system and the practices in the book can be seen as two sides of the same coin. If a company only learns the operations methods, for example by creating SRE positions, but does not understand the system those methods manage, it is like studying the Sunflower Manual but learning only half of it.

Borg is an internal Google system, so it is invisible to most of us, but Kubernetes basically inherits Borg's core ideas in cluster automation management. So if you have read this book and found it impressive, or want to practice its methods, then you should also understand Kubernetes deeply.

4. History of technological evolution

In the early days, when we built a website backend, we might just put all the modules in one executable file, like the one above. We had UI, data, and business modules, and these three modules were compiled into one executable and ran on a server.

However, as traffic grows enormously, we can no longer scale simply by upgrading the server's configuration. This is where we have to break the system into microservices.

Breaking a system into microservices means decomposing the monolithic application into small, loosely coupled applications. Each small application is responsible for one piece of business, each instance of an application gets a server to itself, and the applications call each other over the network.

The key here is that we can scale each small application horizontally by increasing the number of its instances. This solves the problem that a single server cannot be scaled up.

One problem with this style of microservices is that one instance occupies one whole server. In this deployment mode, the waste of resources is actually quite serious. The natural next thought is to co-locate multiple instances on the same underlying servers.

But co-location introduces two new problems. One is dependency compatibility: the applications may rely on completely different versions of the same libraries, which is bound to cause trouble when they are installed on the same operating system. The other is application scheduling and cluster resource management.

For example, when a new application is created, we need to decide which server it should be scheduled to, and whether that server has enough resources left after it is placed there.

The dependency compatibility problem is solved by containerization: each application carries its own dependency libraries and shares only the kernel with other applications. Scheduling and resource management are the problems that Kubernetes solves.
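To make the scheduling question above concrete, here is a toy sketch, not Kubernetes' actual scheduling algorithm, of picking a server with enough free resources for a new instance. All names and numbers are made up for illustration.

```python
def schedule(instance, nodes):
    """Pick the first node whose free CPU and memory can fit the instance.
    `instance` and each node are plain dicts; a real scheduler also scores
    and ranks candidate nodes instead of taking the first fit."""
    for node in nodes:
        free_cpu = node["cpu_total"] - node["cpu_used"]
        free_mem = node["mem_total"] - node["mem_used"]
        if free_cpu >= instance["cpu"] and free_mem >= instance["mem"]:
            # reserve the resources on the chosen node
            node["cpu_used"] += instance["cpu"]
            node["mem_used"] += instance["mem"]
            return node["name"]
    return None  # no node has enough spare capacity

nodes = [
    {"name": "node-a", "cpu_total": 4, "cpu_used": 3.5, "mem_total": 8, "mem_used": 6},
    {"name": "node-b", "cpu_total": 8, "cpu_used": 2, "mem_total": 16, "mem_used": 4},
]
# node-a has only 0.5 CPU free, so the instance lands on node-b
print(schedule({"cpu": 2, "mem": 4}, nodes))
```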

By the way, with so many applications in the cluster and such complex relationships among them, we may no longer be able to troubleshoot problems like slow responses by hand. So service governance technologies like the service mesh will definitely be the next trend.

How to learn Kubernetes?

1. The difficulty of Kubernetes

In general, Kubernetes has a fairly high barrier to entry and is difficult to learn. One reason is that its technology stack is very deep, covering the kernel, virtualization, containers, software-defined networking (SDN), storage, security, and even trusted computing; it can truly be called full-stack technology.

At the same time, running Kubernetes in a cloud environment inevitably involves many cloud products. For example, on Alibaba Cloud, our Kubernetes clusters use ECS cloud servers, VPC virtual networks, load balancers, security groups, Log Service, CloudMonitor, middleware products such as AHAS and ARMS, service mesh, auto scaling, and many other cloud products.

Finally, because Kubernetes is a general-purpose computing platform, it is used in all kinds of business scenarios, such as databases. As far as I know, our PolarDB Box appliance is planned to be built on Kubernetes. There are also edge computing, machine learning, stream computing, and so on.

2. Understand, do, and think

Based on my personal experience, to learn Kubernetes, we need to grasp it from three aspects: understanding, doing and thinking.

Understanding is important, especially understanding the history of technologies and the overall technology landscape.

We need to know the evolution history of various technologies, such as how container technology evolved from chroot, and what problems are behind the evolution of technologies. Only by knowing the evolution history of technologies and the driving forces of development can we make our own judgments about the future direction of technologies.

At the same time, we need to understand the technology landscape. For Kubernetes, that means understanding the whole cloud-native stack, including containers, CI/CD, microservices, and service mesh, and knowing where Kubernetes sits in that stack.

In addition to these basic background knowledge, learning Kubernetes technology, hands-on practice is very important.

In my experience working with many engineers to solve problems, many of them do not really dive into the technical details. We often joke that there are two kinds of engineers: "search engineers" and "research engineers". Many engineers hit a problem, Google it, and if they cannot find an answer, they simply let it go. That way it is hard to understand a technology in depth.

Finally, there is how to think and how to summarize. My personal experience is that after understanding the technical details, we should keep asking ourselves: is there something more essential behind these details? That is, we need to simplify the complex details and find the universal patterns.

Let me use two examples to illustrate the above method.

3. Using a refrigerator to understand cluster controllers

The first example is about cluster controllers. While studying Kubernetes we keep running into concepts such as the declarative API, Operators, and end-state-oriented design. All of these concepts are essentially describing one thing: the controller pattern.

How do we understand the Kubernetes controller? The diagram above is a classic Kubernetes architecture diagram. In this diagram, there are cluster management nodes and working nodes. On the management node, there are central database, API Server, scheduler and some controllers.

The central database is the core storage system of the cluster, the API Server is the management entry point of the cluster, and the scheduler is responsible for placing applications onto nodes with sufficient resources. The controllers are what we are concerned with here. Their function can be summed up in one phrase: "turning wishes into reality". In this sense, I often play the role of a controller myself: if my daughter says, "Dad, I want to eat ice cream", then my daughter is the user of the cluster, and I, the one responsible for fulfilling her wish, am the controller.

Besides the management nodes, a Kubernetes cluster has many worker nodes, on which the Kubelet and Proxy agents are deployed. Kubelet is responsible for managing the worker node, including starting and stopping applications on it. The Proxy is responsible for translating service definitions into concrete iptables or IPVS rules. A service, simply put, is load balancing implemented with iptables or IPVS.
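To make the idea of a service concrete, here is a toy sketch in Python rather than in iptables rules: one stable service name balancing requests over several backend pod endpoints. The endpoint addresses are made up for illustration.

```python
import itertools

class Service:
    """A service maps one stable name to many pod endpoints.
    kube-proxy realizes this mapping as iptables/IPVS rules in the
    kernel; here we just round-robin in Python to show the behavior."""
    def __init__(self, endpoints):
        self._cycle = itertools.cycle(endpoints)

    def resolve(self):
        # each lookup returns the next backend in turn
        return next(self._cycle)

svc = Service(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
# four requests cycle through the three endpoints, then wrap around
print([svc.resolve() for _ in range(4)])
```

Real iptables-mode kube-proxy picks backends randomly with probability rules rather than strict round-robin, but the effect, spreading one virtual address over many pods, is the same.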

If we redraw the first diagram from the controller's point of view, we get the second one: a cluster really consists of one database, one cluster entry point, and many controllers. All of these components, including the scheduler, Kubelet, and Proxy, watch the definitions of resources in the cluster and turn those definitions into concrete configurations, such as started containers or iptables rules.

When we look at Kubernetes from the controller's point of view, we arrive at one of its most fundamental principles: the controller pattern.

In fact, the controller pattern is everywhere in daily life; take the refrigerator as an example. We never directly control the refrigeration or lighting systems inside the refrigerator. When we open the door, the light turns on; when we set a target temperature, the cooling system holds that temperature even when we are not at home. Behind all of this is a controller at work.
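The refrigerator analogy can be sketched as a minimal control loop: the user only declares a desired temperature, and a loop repeatedly observes the actual temperature and acts to close the gap. This is the same shape as a Kubernetes controller's reconcile loop; the temperatures and step size below are invented for the example.

```python
def reconcile(desired, actual):
    """One iteration of the control loop: compare the desired state with
    the observed state and return the action that narrows the gap."""
    if actual > desired:
        return "cool"
    if actual < desired:
        return "warm"
    return "idle"

# The fridge is at 8 degrees and the user asked for 4: the controller
# keeps cooling until observed state matches declared state.
actions, temp = [], 8
while temp != 4:
    actions.append(reconcile(4, temp))
    temp -= 1  # pretend each cooling step drops the temperature by one
print(actions)
```

Note that the user never issued a "cool four times" command; they declared an end state, and the loop worked out the actions. That is exactly what "declarative API" and "end-state-oriented design" mean.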

4. Why can't a namespace be deleted?

As a second example, let’s look at the process of troubleshooting a real problem. The problem is that namespaces cannot be deleted. It’s a little bit more complicated, so let’s take it step by step.

Namespaces are the Kubernetes cluster's "storage box" mechanism, as shown in the first picture here: the box is the namespace, and it holds the eraser and the pencil.

Namespaces can be created and deleted, and a namespace that cannot be deleted is a problem we run into frequently. If we have no idea how to start troubleshooting it, a good first step is to look at how the API Server handles the delete operation, since the API Server is the management entry point of the cluster.

The API Server itself is an application, and we can raise its logging level to understand what it is doing. In this case, we see that the API Server receives the delete command, but nothing more.

When a user deletes a namespace, the namespace is not removed immediately; instead, it is set to the "Terminating" state. At that point, the namespace controller sees this state.

To understand the behavior of the namespace controller, we can also raise the logging level of the controller to see the detailed logs. At this point, we’ll see that the controller is trying to get all the API groups.

There are two things we need to understand here: first, why the controller fetches all the API groups when a namespace is deleted; second, what API groups actually are.

Let's start with the second question: what exactly is an API group? Simply put, API groups are the cluster's mechanism for classifying its APIs. For example, network-related APIs all belong to the networking group, and resources created through those APIs belong to that group as well.

So why does the namespace controller fetch the API groups? Because the controller needs to delete every resource in the namespace, and this is not like deleting a folder, where everything inside disappears along with it.

The resources held by a namespace actually point to the namespace through an index-like mechanism. The cluster can only find all the resources that point to the namespace by traversing all the API groups, and then delete them one by one.
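The deletion sweep just described can be sketched as follows: walk every API group, find the resources that point at the dying namespace, and remove them one by one. The groups and resource names below are invented for illustration and are far simpler than the real controller's logic.

```python
api_groups = {
    "apps":       [{"name": "web", "namespace": "demo"},
                   {"name": "db",  "namespace": "prod"}],
    "networking": [{"name": "web-ingress", "namespace": "demo"}],
}

def sweep_namespace(namespace, groups):
    """Traverse all API groups and delete every resource that points
    to `namespace`, returning the list of deleted resources."""
    deleted = []
    for group, resources in groups.items():
        deleted += [f'{group}/{r["name"]}' for r in resources
                    if r["namespace"] == namespace]
        # keep only resources belonging to other namespaces
        groups[group] = [r for r in resources if r["namespace"] != namespace]
    return deleted

# Deleting "demo" must touch every group; "prod" resources survive.
print(sweep_namespace("demo", api_groups))
```

This is why the namespace stays in the Terminating state until the sweep over every group, including extension-served groups, finishes: if any group cannot be listed, the controller cannot prove the box is empty.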

Traversing the API groups causes the API Server to communicate with its extensions, because extensions of the API Server can implement API groups of their own. To find out whether the namespace being deleted contains resources defined by an extension, the API Server must talk to that extension.

At this point, the problem really becomes one of communication between the API Server and its extensions. The problem of deleting resources becomes a network problem.

Alibaba Cloud's Kubernetes clusters are created inside a VPC, a virtual private network. By default, a VPC recognizes only its own network segments, while the containers in a cluster use segments that differ from the VPC's. For example, if the VPC uses a 172.x segment, the containers might use a 192.x segment.

By adding routing entries for the container segments to the VPC routing table, we let containers communicate with each other over the VPC network.

In the figure in the lower right corner, there are two cluster nodes whose addresses are in a 172.x segment. If we add the 192.x container segments to the routing table, the VPC can forward data destined for a container to the correct node, and the node then delivers the data to the specific container.
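The routing table described above can be sketched as a simple longest-prefix-style lookup: each container CIDR maps to the node hosting those containers, so traffic for a container address is forwarded to the right node. The CIDRs and node addresses here are invented for the example.

```python
import ipaddress

# VPC routing table: container segment -> node that hosts it
route_table = {
    "192.168.1.0/24": "172.16.0.10",  # containers on node-a
    "192.168.2.0/24": "172.16.0.11",  # containers on node-b
}

def next_hop(dst_ip):
    """Return the node that should receive traffic for dst_ip,
    or None if the VPC has no route for that address."""
    ip = ipaddress.ip_address(dst_ip)
    for cidr, node in route_table.items():
        if ip in ipaddress.ip_network(cidr):
            return node
    return None  # by default the VPC only knows its own segments

print(next_hop("192.168.2.7"))  # forwarded to node-b at 172.16.0.11
print(next_hop("10.0.0.1"))     # no route: outside any container segment
```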

The routing entries are added by the routing controller when the node is added to the cluster. When the routing controller finds that a new node joins the cluster, it immediately responds by adding a routing entry to the routing table.

Adding a routing entry is an operation on the VPC. This operation requires authorization, because it is similar to an on-premises machine accessing resources on the cloud, which certainly needs to be authorized.

The authorization used by the routing controller is bound, in the form of a RAM role, to the cluster nodes where the routing controller runs, and a RAM role normally carries a set of authorization policies.

In the end, we checked and found that the user had modified these authorization policies, which was the cause of the problem.


Course recommendation

To let more developers enjoy the dividends brought by Serverless, this time we gathered 10+ Serverless technical experts from Alibaba to create a Serverless open course best suited for developers to learn and apply immediately, helping you easily embrace Serverless, the new paradigm of cloud computing.

Click for the free course: developer.aliyun.com/learning/ro…

"Alibaba Cloud Native focuses on technical fields such as microservices, Serverless, containers, and Service Mesh, follows cloud-native technology trends and large-scale cloud-native implementation practices, and aims to be the official account that best understands cloud-native developers."