“K8S Ecology Weekly” is a weekly digest of news and recommendations from the K8S ecosystem that I have come across. You are welcome to subscribe to my Zhihu column “K8S Ecology”.

Kubernetes V1.21 is officially released

As the first release of 2021, Kubernetes v1.21 brings a number of great features. There are 51 feature changes in total: 13 graduating to Stable, 16 moving to Beta, 20 entering Alpha, and 2 being deprecated. Let’s take a look at some of the changes that I think are important.

CronJob upgraded to Stable

As the name implies, a CronJob is a scheduled/periodic task. CronJob was introduced in Kubernetes v1.4 and entered Beta in v1.8. In fact, as of February 2021 the CronJob v2 controller has become the default controller, which means that if you do not want to use the CronJob v2 controller with Kubernetes v1.21 and want to switch back to the original controller, you need to explicitly disable it, for example:

--feature-gates="CronJobControllerV2=false"

However, I personally recommend using the CronJob v2 controller, which uses delaying queues and informer caches. The original controller was a bit clunky and caused some problems, such as leaking Pods without limit when the image or service was unavailable.

I use CronJob quite a bit in production, for backup/synchronization tasks and the like, and of course I have stumbled into the pitfalls mentioned above, but overall CronJob is a very useful feature.
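
For readers who have not written one before, here is a minimal sketch of a CronJob manifest; the name, schedule, image, and command are placeholders of my own, not anything from the release itself:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-backup              # placeholder name
spec:
  schedule: "0 2 * * *"             # run every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: busybox:1.33     # placeholder image
            command: ["sh", "-c", "echo running backup"]

Note that with the graduation to Stable, CronJob is available under the batch/v1 API group (previously batch/v1beta1).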

Memory Manager (Kubelet)

In Kubernetes v1.21, a new Memory Manager has been added to the kubelet component. On Linux, it guarantees memory and hugepage allocation across multiple NUMA nodes for Pods in the Guaranteed QoS class. This feature is particularly useful when databases or applications that use DPDK for high-performance packet processing are deployed to Kubernetes, where memory locality is critical to performance.

Here’s a little bit about NUMA. To improve efficiency, memory is treated as local or remote (non-local) depending on its relative distance from the CPU, and uneven memory allocation can occur because of these physical differences. For example, we can use the numactl tool to check the situation on the current machine:

[tao@moelove ~]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 20 21 22 23 24 25 26 27 28 29
node 0 size: 65186 MB
node 0 free: 9769 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 30 31 32 33 34 35 36 37 38 39
node 1 size: 65536 MB
node 1 free: 15206 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 

You can see that there is a fairly obvious memory imbalance on my current machine, so when a process hits its local memory limit, its performance naturally suffers. The first time I struggled with NUMA-related issues was years ago when I was using MongoDB heavily, and it took me some time to sort out, but that doesn’t stop me from enjoying MongoDB.

The kubelet’s Memory Manager can be configured with the --reserved-memory and --memory-manager-policy flags when starting the kubelet. For example:

--memory-manager-policy Static --reserved-memory 0:memory=1Gi,hugepages-1Gi=2Gi --reserved-memory 1:memory=2Gi

Note: --memory-manager-policy must be set to Static. If it is not set, it defaults to None, which means the Memory Manager takes no action.

However, this feature is still in its early stages and is currently only supported for Pods of the Guaranteed QoS class. In addition, if the feature is enabled correctly, the details can be seen on the machine in /var/lib/kubelet/memory_manager_state.
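
To give a sense of what the Memory Manager acts on, here is a minimal sketch of a Guaranteed QoS Pod that requests memory and hugepages; the name, image, and sizes are placeholders of my own. The key point is that requests and limits must be equal for every container so that the Pod is classified as Guaranteed:

apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app                     # placeholder name
spec:
  containers:
  - name: app
    image: example.com/dpdk-app:v1   # placeholder image
    resources:
      requests:
        cpu: "2"
        memory: 2Gi
        hugepages-1Gi: 2Gi           # the node must have 1Gi hugepages pre-allocated
      limits:
        cpu: "2"
        memory: 2Gi
        hugepages-1Gi: 2Gi
    volumeMounts:
    - name: hugepage
      mountPath: /hugepages
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages

Roughly speaking, with the Static policy the kubelet tries to satisfy the memory and hugepage requests of such a Pod from the smallest possible set of NUMA nodes, and the resulting assignment is what ends up recorded in memory_manager_state.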

The Memory Manager also provides NUMA affinity hints to the Topology Manager, so it ultimately influences Topology Manager decisions as well.

ReplicaSet scale-down algorithm adjusted

The current scale-down algorithm mainly deletes the Pods with the shortest lifetime first. This change is mainly intended to avoid scenarios like the following:

For example, during a scale-down, all of the newly scaled-up Pods get deleted first. The plan is therefore to bucket Pods on a logarithmic scale of their age, which can loosely be understood as making the choice of which Pods to clean up relatively random.
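
This new behavior is alpha in v1.21 and sits behind a feature gate on kube-controller-manager (named LogarithmicScaleDown, if I remember correctly), so trying it out would look something like:

--feature-gates="LogarithmicScaleDown=true"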

This adjustment does avoid the scenario mentioned above, but it may also introduce other serviceability issues: the longer a Pod has been running, the more users the service behind it may be carrying, and tearing down those connections can have a greater impact than removing a newer Pod. Of course, these issues can also be mitigated in other ways; interested readers are welcome to leave me a message and discuss.

Other notable changes have already been introduced in previous issues of “K8S Ecology Weekly”; if you are interested, you can refer to the “upstream progress” section of each issue. For the remaining changes in this release, refer to its ReleaseNote.

Containerd v1.5.0-rc.0 released

This week containerd released v1.5.0-rc.0, the first release candidate of its sixth major release, with a number of notable changes:

Runtime

  • Following the cri-api update, annotations are now supported in the Task Update API;
  • #4502 Added binary log support when terminal is true;

Distribution

  • Unexpected status codes returned by the registry are now recorded in the log;
  • #4653 Improved image pull performance for registries that use the HTTP 1.1 protocol;

CRI

  • cri#1552 Experimentally added a Node Resource Interface (NRI) injection point;
  • #4978 Allows registries for CRI to be configured with a Docker-like certs.d directory layout (similar to /etc/docker/certs.d), which makes configuration more convenient;
  • ocicrypt support is enabled by default.

For more information about changes to this release, see its ReleaseNote

Cilium v1.10.0-rc0 released

I’ve talked about Cilium several times before; if you’re interested, check out my earlier article, “What is Cilium, the next-generation data plane chosen by Google?” The project recently released v1.10.0-rc0. Let’s take a look at the major changes:

  • #13670 cilium/cilium added a --datapath-mode=lb-only option to support an LB-only mode, so that cilium-agent can run as a standalone load balancer without having to connect to kube-apiserver or a kvstore;
  • Added NodePort BPF support for WireGuard, tun devices, etc.;
  • Added arm64 support to cilium/cilium;

For more information about changes to this release, see its ReleaseNote


Please feel free to subscribe to my official account [MoeLove]