Author | Zhang Jintao

This article is part of the “2021 Technology Review” series, focusing on the important progress of the Kubernetes ecosystem in 2021.

“2021 Year-end Technical Inventory” is a major project launched by Digger, covering Serverless, Service Mesh, the big front end, databases, artificial intelligence, low code, edge computing and many other technical fields. Looking back at the past and toward the future, it reviews the development of IT technology in 2021, takes stock of the year’s major events, and forecasts future trends. At the same time, we are kicking off our 15th technology-themed essay, on what you see as the tech trends of 2022.

As 2021 has come to an end, let’s take a look back at Kubernetes and its surrounding ecosystem.

Kubernetes in 2021

Starting in April 2021, Kubernetes moved from a release every three months to one every four months, so three releases shipped in 2021: v1.21, v1.22, and v1.23.

In terms of overall functionality, the year’s work mainly focused on the following areas.

Resource utilization

Memory Manager (Kubelet)

Kubernetes v1.21 added a new memory manager to the kubelet. On Linux, it guarantees memory and hugepage allocation across multiple NUMA nodes for Pods in the Guaranteed QoS class. This feature is particularly useful when deploying databases or applications that use DPDK for high-performance packet processing to Kubernetes, where memory is critical to performance.

Here is a quick note on NUMA: to keep access efficient, memory is treated as local or remote depending on its distance from a given CPU, and memory allocation can become uneven because of these physical locations. For example, we can use the numactl tool to inspect the situation on the current machine:

[tao@moelove ~]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 20 21 22 23 24 25 26 27 28 29
node 0 size: 65186 MB
node 0 free: 9769 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 30 31 32 33 34 35 36 37 38 39
node 1 size: 65536 MB
node 1 free: 15206 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 

You can see that on my current machine there is a fairly obvious imbalance in free memory between the two nodes. So when a process exhausts its local memory, its performance naturally suffers.

The kubelet’s memory manager is configured with the --memory-manager-policy and --reserved-memory flags when starting the kubelet. For example:

--memory-manager-policy Static --reserved-memory 0:memory=1Gi,hugepages-1Gi=2Gi --reserved-memory 1:memory=2Gi

Note: --memory-manager-policy must be set to Static. If it is not set, it defaults to None, meaning the memory manager takes no action.

However, this feature is in its early stages and is currently only supported for Pods of the Guaranteed QoS class. In addition, if the feature is enabled correctly, the details can be seen in /var/lib/kubelet/memory_manager_state on the machine.
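
As a rough illustration, only a Pod whose requests equal its limits (and therefore falls into the Guaranteed QoS class) currently benefits from the memory manager. A minimal sketch, with illustrative names and sizes:

apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app                         # illustrative name
spec:
  containers:
  - name: app
    image: example.com/dpdk-app:latest   # illustrative image
    resources:
      requests:
        cpu: "4"
        memory: 8Gi
        hugepages-1Gi: 2Gi
      limits:                            # requests == limits => Guaranteed QoS
        cpu: "4"
        memory: 8Gi
        hugepages-1Gi: 2Gi
    volumeMounts:
    - name: hugepages
      mountPath: /dev/hugepages
  volumes:
  - name: hugepages
    emptyDir:
      medium: HugePages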

The memory manager’s allocation results also feed hints into the topology manager.

QoS of memory resources

With cgroups v1, Kubernetes only enforces Pod QoS at the cgroup level for CPU resources. Kubernetes v1.22 adds an alpha feature that, by introducing cgroups v2, extends QoS to memory resources as well. (If I remember correctly, the KEP was submitted by the Tencent Cloud team.)
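
My rough understanding of the mapping (with the alpha MemoryQoS feature gate enabled on nodes running cgroups v2) is sketched in the comments below; the exact formulas and throttling factor may differ between versions:

apiVersion: v1
kind: Pod
metadata:
  name: memory-qos-demo   # illustrative name
spec:
  containers:
  - name: app
    image: nginx          # illustrative image
    resources:
      requests:
        memory: 1Gi       # roughly mapped to the container cgroup's memory.min (guaranteed memory)
      limits:
        memory: 2Gi       # roughly mapped to memory.max; memory.high is set somewhat below the limit to throttle early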

ReplicaSet scale-down algorithm adjustment

The previous scale-down algorithm mainly deletes the Pods with the shortest lifetime first. This change is intended to avoid certain scenarios:

For example, during a scale-down, all of the most recently scaled-up Pods being deleted at once. The plan is to bucket Pods by the logarithm of their age, which can loosely be understood as making the choice of which Pods to remove relatively random among Pods of similar age.

This adjustment does avoid the scenario mentioned above, but it may also introduce other availability concerns: the longer a Pod has been running, the more users it may currently be serving, and tearing down those connections can have a bigger impact than deleting a newer Pod. (Of course, these issues can also be mitigated in other ways.)

Node swap support

This feature is now in Alpha.

Swap is not fast, but there are many scenarios that need it, especially for Java and Node.js applications.

There has been a discussion in Kubernetes’ issue tracker for about five years over whether swap support should be allowed. The current feature, once enabled, applies to the entire node rather than to a specific Pod.

You can enable this feature by performing the following steps (a kubelet configuration sketch follows below):

  • Enable swap on the node;
  • Enable the kubelet’s NodeSwap feature gate;
  • Set --fail-swap-on=false for the kubelet;
  • Optionally set memorySwap.swapBehavior=UnlimitedSwap in the kubelet configuration.

For more information: github.com/kubernetes/…
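
Put together as a kubelet configuration, the steps above look roughly like this; the field names follow my reading of the KEP, so verify them against your kubelet version:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false             # equivalent to --fail-swap-on=false
featureGates:
  NodeSwap: true              # enable the alpha NodeSwap feature gate
memorySwap:
  swapBehavior: UnlimitedSwap # optional; without it workload swap usage is limited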

HPA V2 API reaches GA

The HPA v2 API was first proposed about five years ago; after years of development, it has now reached GA.
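
Manifests can now target the stable autoscaling/v2 API group. A minimal sketch, with illustrative names and numbers:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60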

Security

An alternative to Pod Security Policy

The PodSecurity admission controller is the replacement for PodSecurityPolicy, which was deprecated in Kubernetes v1.21.

This admission controller can enforce the Pod Security Standards at the namespace level in the following three modes:

  • Enforce: Pods violating the policy will be rejected;
  • Audit: Pods violating the policy will get an audit annotation, but are otherwise allowed;
  • Warn: Pods violating the policy will trigger a user-facing warning.

You can set cluster-wide defaults and exemptions through a configuration file like the following:

apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    defaults:  # Defaults applied when a mode label is not set.
      enforce:         <default enforce policy level>
      enforce-version: <default enforce policy version>
      audit:         <default audit policy level>
      audit-version: <default audit policy version>
      warn:          <default warn policy level>
      warn-version:  <default warn policy version>
    exemptions:
      usernames:         [ <array of authenticated usernames to exempt> ]
      runtimeClassNames: [ <array of runtime class names to exempt> ]
      namespaces:        [ <array of namespaces to exempt> ]
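
Beyond cluster-wide defaults, the three modes are more commonly applied per namespace with labels. A small sketch (the namespace name and chosen levels are illustrative):

apiVersion: v1
kind: Namespace
metadata:
  name: demo
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted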

Scalability

New OpenAPI V3

This feature is at the Alpha level and can be enabled via the OpenAPIV3 feature gate.

This feature was added mainly because CRDs are already defined using OpenAPI v3, but the API server previously did not support serving it, and some information is lost when converting from OpenAPI v3 to v2.

More details can be found in KEP #2896
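
Assuming the OpenAPIV3 feature gate is enabled, you can poke at the new endpoint through kubectl’s raw API access; the exact paths below reflect my understanding of the alpha and may change:

# Discovery document listing per group/version OpenAPI v3 documents
kubectl get --raw /openapi/v3
# OpenAPI v3 document for a specific group/version, e.g. apps/v1
kubectl get --raw /openapi/v3/apis/apps/v1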

CRD Validation expression language

This is an Alpha level feature and is disabled by default; it can be enabled via the CustomResourceValidationExpressions feature gate. It was introduced because extensions to Kubernetes based on Custom Resource Definitions (CRDs) have become popular, but the validation rules that can be declared in a CRD are limited, and more complex scenarios have required an additional admission webhook.

This feature uses the Common Expression Language (CEL) to define rules, which are added through the x-kubernetes-validations field.

For example, a CRD with the following schema requires that minReplicas be less than or equal to replicas, and replicas less than or equal to maxReplicas.

...
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      x-kubernetes-validations:
        - rule: "self.minReplicas <= self.replicas"
          message: "replicas should be greater than or equal to minReplicas."
        - rule: "self.replicas <= self.maxReplicas"
          message: "replicas should be smaller than or equal to maxReplicas."
      properties:
        ...
        minReplicas:
          type: integer
        replicas:
          type: integer
        maxReplicas:
          type: integer
      required:
        - minReplicas
        - replicas
        - maxReplicas

Kubernetes will then reject a custom resource created like the following:

apiVersion: "stable.example.com/v1"
kind: CustomDeployment
metadata:
  name: my-new-deploy-object
spec:
  minReplicas: 0
  replicas: 20
  maxReplicas: 10

And return the following error:

The CustomDeployment "my-new-deploy-object" is invalid:
* spec: Invalid value: map[string]interface {}{"maxReplicas":10, "minReplicas":0, "replicas":20}: replicas should be smaller than or equal to maxReplicas.

This is much more convenient than having to fall back to a separate admission webhook, as we did in the past.

Ease of use

New kubectl alpha events command

This command was added mainly because there are limitations to viewing events via kubectl get that cannot be addressed without modifying kubectl get itself, so a dedicated kubectl events command is a more convenient way to obtain the required information. Events are something you often need to look at in Kubernetes, and kubectl get events has some typical problems, such as sorting (although it can be worked around with extra flags), watching, and the inability to view events along a timeline.
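
A couple of illustrative invocations; the --for and --watch flags reflect my understanding of the alpha command in v1.23, so check kubectl alpha events --help on your version:

# List events in the current namespace, ordered by time
kubectl alpha events
# Follow new events for a particular workload as they arrive
kubectl alpha events --for deployment/my-app --watch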

The Kubernetes Ecosystem in 2021

Service Mesh

In the Kubernetes ecosystem, the biggest change this year was the arrival of a new player in the service mesh space: Cilium Service Mesh.

Previously, the typical service mesh architectures, represented by Istio and Linkerd, were based on the sidecar pattern: a sidecar is automatically injected into the application’s Pods to provide traffic management and other related capabilities.

Cilium Service Mesh, by contrast, introduces a sidecarless model, using eBPF to provide better security, higher performance, and greater visibility.

This will lead to a new round of change.

Serverless

This year, there are some new players in the Serverless field, such as OpenFunction, open sourced by the Chinese team Qingyun (QingCloud).

The concept of Serverless was formally proposed in 2012, commercialization began in 2014, and 2021 was a key year for Serverless to be adopted at scale.

Outlook for 2022

In 2022, Kubernetes’ technology trends will likely revolve around security and eBPF.

Security

With the spread of cloud native, more and more companies have moved beyond the initial exploratory phase to actual use or larger scale adoption.

Security has become another core concern, including supply chain security, DevSecOps, and the various security issues around Kubernetes itself.

In 2022, this aspect will also receive more attention.

eBPF

eBPF technology will become more widespread, used not just for observability but also to improve the overall network performance of Kubernetes clusters, and to enable architectural changes such as Cilium Service Mesh.

All of these will increase the importance of eBPF technology in the cloud native era.

About the author

Cloud native technology expert, Apache APISIX Committer, Kubernetes ingress-nginx reviewer, and contributor to many open source projects such as containerd, Docker, Helm, Kubernetes, and KIND; maintainer of the “K8S Ecosystem Weekly”; Microsoft MVP. He has extensive hands-on practice and in-depth source code research on container technologies such as Docker and Kubernetes. He is one of the core organizers of PyCon China and a speaker at several well-known industry conferences. He has written columns such as “Kubernetes Hands-on Practice” and “Docker Core Knowledge You Must Know”. Public account: MoeLove

Related reading:

Serverless: blossoming across industry, academia, and the community, as domestic vendors race to stake out positions