We have set up a high availability cluster with 3 Master nodes and 4 worker nodes. This article is a summary of the process.

High availability policy

The key to Kubernetes cluster availability is the high availability of the Master nodes. There are two high availability topologies:

  • Stacked etcd topology

    In this scheme, kube-apiserver, kube-controller-manager, kube-scheduler, etcd and the other components are all deployed on every Master node. Each kube-apiserver communicates with its local etcd, and etcd synchronizes data across the three nodes.

  • External etcd topology

In this scheme, every Master node runs the kube-apiserver, kube-controller-manager and kube-scheduler components, while the etcd distributed data store runs as an independent cluster. This topology decouples the Master node components from etcd.
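
For reference, a minimal sketch of how the external topology is declared in a kubeadm configuration; the endpoint addresses and certificate file names below are placeholders, not values from the actual deployment:

#Hypothetical kubeadm ClusterConfiguration fragment for an external etcd cluster
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
controlPlaneEndpoint: "apiserver-lb:6443"   # placeholder: the load balancer in front of kube-apiserver
etcd:
  external:
    endpoints:
      - https://10.0.0.11:2379
      - https://10.0.0.12:2379
      - https://10.0.0.13:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key

With the stacked topology none of this is needed: kubeadm generates the local etcd static Pod by itself.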

Note: a control plane node is what this article calls a Master node.

Analysis of high availability solutions

According to the official topology, there are two key points:

  • etcd
  • load balancer

In other words, the entire Kubernetes cluster is highly available as long as both of the above are accomplished.

Etcd high availability

Kubernetes chose etcd as its back-end data store because of its distributed architecture and lack of a single point of failure. Although a single-node etcd also works, the recommended deployment is an etcd cluster of three or five nodes for Kubernetes to use.

The etcd high availability scheme is the most obvious difference between the two official topologies, and the difference lies in where etcd is placed. The official documentation also compares the two etcd schemes; my summary is as follows:

  • Stacked way
    1. Less hardware resources are required
    2. The deployment is simple and easy to manage
    3. Horizontal scaling is easy
    4. The disadvantage is that when a host machine goes down, both a Master node and an etcd member are lost, which greatly reduces the redundancy and robustness of the cluster.
  • External way
    1. The Master nodes are decoupled from etcd. The failure of a Master node does not affect etcd, improving cluster robustness
    2. The disadvantages are that it requires almost twice the hardware of the stacked solution, and the operation and maintenance complexity is higher.

Whether for personal use or for an ordinary enterprise, I would choose the stacked topology without hesitation, considering both the ease of horizontal scaling and the lower hardware requirements. In addition, in Scaling Kubernetes to 2,500 Nodes, practice proved that mounting etcd on local SSDs significantly improves performance in very large clusters (more than 2,000 Nodes).
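
In a stacked deployment, the health of the etcd cluster can be checked from any Master node by exec-ing into one of the etcd static Pods. A rough sketch, assuming the default kubeadm certificate paths, etcd 3.4+ (where the v3 API is the default) and a Pod name of the form etcd-<node name>:

#etcd-master1 is a placeholder; check the real Pod name with: kubectl get po -n kube-system | grep etcd
kubectl -n kube-system exec etcd-master1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health --cluster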

In addition to the above two ways, I found another approach mentioned in an article while searching the Internet:

  • Self-hosted etcd, proposed by CoreOS

    Run etcd on top of Kubernetes, the very system it is supposed to serve underneath, so that Kubernetes manages its own dependent components. In this mode, the etcd cluster can be operated automatically with etcd-operator, which makes the best use of Kubernetes.

Of course, there is not much information about this approach, and I have not found a concrete way to put it into practice. Moreover, the official documents carry many "experimental" qualifiers and state the following:

  1. The self-hosted portion of the control plane does not include etcd, which still runs as a static Pod.

I also do not know whether this was supported in some version and later dropped; I will not go into it here and only mention it as a topic. Everything else follows the two approaches in the official documentation.

kube-apiserver high availability

In the official topologies, apart from etcd, it is easy to overlook the load balancer node. The kube-apiserver achieves high availability through a load balancer, but there is no officially recommended solution. Common ideas and schemes can be summarized as follows:

  • External load balancer

    Whether you use a load balancer service provided by a public cloud or build your own with LVS or HAProxy in a private cloud, it falls into this category. A load balancer is a very mature scheme; how to ensure the high availability of the load balancer itself is the new problem that has to be considered with this approach.

    At present, most of the material on the Internet achieves kube-apiserver high availability through HAProxy and keeps HAProxy itself highly available with Keepalived. This is the approach used in the previous article; when starting out, following the mainstream is rarely a bad idea. (A minimal HAProxy sketch is included at the end of this section.)

  • Load balancing at the network layer

    For example, ECMP can be implemented with BGP on the Master nodes, or NAT with iptables on the Nodes. This solution does not require additional external services, but it does require a certain amount of network configuration.

  • A reverse proxy on each Node load balances across multiple Masters

    This solution also does not rely on external components, but when a Master node is added or removed, how to dynamically update the load balancer configuration on every Node becomes another problem to solve.

At present, apart from the approach I already use, the other options are left for follow-up research and experiments.
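
As a reference for the external load balancer approach, here is a minimal HAProxy sketch. It assumes three Masters whose kube-apiservers listen on 6443 and HAProxy listening on 9443 (the port checked by the Keepalived script later in this article); the server names and addresses are placeholders:

#/etc/haproxy/haproxy.cfg (fragment)
frontend apiserver-frontend
    bind *:9443
    mode tcp
    option tcplog
    default_backend apiserver-backend

backend apiserver-backend
    mode tcp
    option tcp-check
    balance roundrobin
    server master1 10.128.2.51:6443 check
    server master2 10.128.2.52:6443 check
    server master3 10.128.2.53:6443 check

The same configuration runs on every Master, and Keepalived moves a virtual IP between them so that clients always have a single entry point.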

kube-controller-manager and kube-scheduler high availability

These two services are components of the Master node, and making them highly available is relatively easy: simply run multiple instances. Kubernetes itself ensures, through the leader election mechanism, that only one replica is active at any time.

As for the leader election mechanism, simply put: the replicas of kube-controller-manager and kube-scheduler compete for a lock on an Endpoints resource. The one that grabs it and writes its own information into the annotation of the Endpoints becomes the leader. The leader must renew this record at a configurable interval, and the other replicas can see who the current leader is and when the record expires. If the record expires without being renewed, the other replicas attempt to acquire the lock and become the new leader themselves.
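
The current leader can be inspected directly. A sketch, noting that older versions record the leader in an Endpoints annotation as described above, while newer versions use a Lease object in kube-system instead:

#Older versions: the leader is recorded in an Endpoints annotation
kubectl -n kube-system get endpoints kube-scheduler -o yaml | grep leader

#Newer versions: the leader is recorded in a Lease object
kubectl -n kube-system get lease kube-scheduler -o yaml

The same can be done for kube-controller-manager by swapping the resource name.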

So much for Kubernetes’ high availability solution.

ipvs VS iptables

I used ipvs instead of the default iptables in the previous deployment, initially just assuming that an explicitly chosen mode would be better than the default; this section is the result of actually doing some research on it myself.

From the kubeadm.yaml configuration file in the previous article, you can see that the ipvs setting belongs to the kube-proxy component. kube-proxy is a key component of Kubernetes: its role is to load balance traffic between a Service (ClusterIP and NodePort only) and the back-end Pods. kube-proxy can run in three modes, each implemented with a different technology: userspace, iptables, or ipvs. The userspace mode is old and slow and is no longer recommended.

  • iptables

    iptables is a Linux kernel feature designed to be an efficient firewall with plenty of packet processing and filtering capabilities. It can hook sets of rules into the kernel packet processing pipeline. In iptables mode, kube-proxy implements NAT and load balancing in the NAT pre-routing hook. This approach is simple and effective, relies on mature kernel functionality, and coexists well with other iptables-based applications such as Calico.

    However, kube-proxy's use of iptables is effectively an O(n) algorithm, where n grows with the cluster size, or more precisely with the number of Services and back-end Pods. (See the inspection sketch right after this list.)

  • ipvs

    ipvs is a Linux kernel feature designed for load balancing. In ipvs mode, kube-proxy load balances with ipvs instead of iptables. This mode also works well: ipvs was designed to load balance a large number of services, with an optimized API and an optimized lookup algorithm rather than a simple linear search through a list of rules.

    Thus, the connection processing complexity of kube-proxy in ipvs mode is O(1). In other words, its connection processing efficiency is mostly independent of cluster size.

    As a standalone load balancer, ipvs offers a variety of load balancing algorithms, such as round robin, shortest expected delay, least connections and various hashing methods. iptables has only one random equal-cost selection algorithm.

    One potential drawback of ipvs is that ipvs packets take a different path through the hooks than iptables filters normally expect. If you plan to use ipvs in an environment where other programs also use iptables, some research is needed to see whether they can work together. (Calico is now compatible with the ipvs kube-proxy.)
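
The difference is easy to observe on a node. A rough sketch for comparing the two modes; it assumes kube-proxy's NAT rules use the usual KUBE- chain prefix and that ipvsadm is installed:

#iptables mode: the number of NAT rules grows with the number of Services and Endpoints
iptables-save -t nat | grep -c KUBE-

#ipvs mode: one virtual server per Service address, one real server per back-end Pod
ipvsadm -ln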

I found an article comparing the kube-proxy modes: Comparing kube-proxy modes: iptables or IPVS? Thanks to the two authors; the article compares the advantages and disadvantages of the two modes in detail. I will just quote the conclusion:

kube-proxy's ipvs mode provides better performance at scale (more than 1,000 services). However, the difference may not matter in many scenarios, typically microservices using persistent connections on modern kernels. If you are running an older kernel or cannot use persistent connections, ipvs mode is likely the better choice.

Performance aside, the ipvs mode has the added benefit of a wider selection of load balancing algorithms.

If you are not sure whether ipvs is appropriate for you, stick with the iptables mode. This traditional mode is backed by a large amount of production experience and, although imperfect, is a reasonable default option.

My own blunt summary is that at the scale of an ordinary business there is no noticeable difference.

About the ipvs configuration when initializing the cluster

When initializing the cluster with kubeadm, I enabled the ipvs configuration as follows:

#Wrong configuration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  SupportIPVSProxyMode: true   # this featureGates entry is wrong!! The kube-proxy component will fail to start and keep restarting!!
mode: ipvs

#Configured correctly
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs

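Note that the KubeProxyConfiguration is passed to kubeadm as an additional YAML document inside the same kubeadm.yaml, separated by ---. A sketch of the layout; the ClusterConfiguration content itself comes from your own environment:

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
#... cluster settings ...
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs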

Enable ipvs in an existing cluster

Modify kube-proxy configuration

kubectl edit configmap kube-proxy -n kube-system

    ipvs:
      minSyncPeriod: 0s
      scheduler: ""
      syncPeriod: 30s
    kind: KubeProxyConfiguration
    metricsBindAddress: 127.0.0.1:10249
    mode: "ipvs"              # change this value
    nodePortAddresses: null

Delete all kube-proxy pods

kubectl delete pod xxx -n kube-system
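
Instead of deleting the Pods one by one, they can all be removed with a single command by label, assuming the kubeadm default label k8s-app=kube-proxy; the DaemonSet recreates them immediately with the new configuration:

kubectl -n kube-system delete pod -l k8s-app=kube-proxy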

Check

kubectl logs kube-proxy-xxx -n kube-system

#The log should contain the line "Using ipvs Proxier"

Check the IPVS proxy rules

kubectl get svc --all-namespaces

#You can see the rules corresponding to each Service
ipvsadm -ln 

Kubelet’s default cgroup driver

Error occurred during initialization:

failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"

The kubelet's default cgroup driver is cgroupfs, while Docker on these machines was configured to use systemd, so the two did not match. The officially recommended cgroup driver is systemd, so for an actual deployment both Docker and the kubelet should be set to systemd.
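
A sketch of aligning both sides on systemd, assuming Docker as the container runtime and the common default paths; adjust to your own environment:

#/etc/docker/daemon.json
{
    "exec-opts": ["native.cgroupdriver=systemd"]
}

#/var/lib/kubelet/config.yaml (KubeletConfiguration), or the KubeletConfiguration section of kubeadm.yaml
cgroupDriver: systemd

#Restart both services for the change to take effect
systemctl daemon-reload && systemctl restart docker kubelet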

Keepalived Configuration file notes

Health check script

vrrp_script chk_haproxy {
    script "/bin/bash -c 'if [[ $(netstat -nlp | grep 9443) ]]; then exit 0; else exit 1; fi'"
    interval 2    # run the check every 2 seconds
    weight 11     # raise the priority by 11 when the check succeeds
}

This script checks whether HAProxy is healthy. If HAProxy is installed as a binary, checking for the process is enough; if it runs in Docker, simply checking the process is not valid, and it is necessary to check whether HAProxy's port has a listener instead.

Of course, I think the ideal check would be against the state of the apiserver itself; this still needs to be improved later, so I leave it as an open question.
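
One possible improvement is sketched below: check the apiserver health endpoint through the local HAProxy port, so that a failure of either HAProxy or all back-end apiservers lowers the node's priority. It assumes HAProxy listens on 9443 as above and that anonymous access to /healthz is allowed (the apiserver default); treat it as an idea rather than a tested configuration:

vrrp_script chk_apiserver {
    script "/usr/bin/curl -sfk -o /dev/null https://127.0.0.1:9443/healthz"
    interval 3
    fall 3         # mark the check as failed after 3 consecutive failures
    rise 2         # mark it as healthy again after 2 consecutive successes
    weight -20     # lower the priority by 20 while the check is failing
}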

Note the configuration file

  • interface

    The default interface in the Keepalived configuration file is eth0

    interface eth0

    Change this to match the actual network interface on your machine, otherwise Keepalived will not work.

  • unicast_peer

    First of all, my three Master nodes span different network segments. Keepalived's default mode is multicast, which does not work across segments, so with the default configuration two of the Masters in different segments both became MASTER. The solution is to change to unicast mode:

    unicast_peer {
        10.128.2.53
        10.128.2.52
    }
  • nopreempt: preemptive vs. non-preemptive

    In preemptive mode, when the original MASTER recovers from a failure it takes the VIP back from the BACKUP; in non-preemptive mode, the recovered node does not take the VIP back from the node that currently holds it.

    vrrp_instance myland_slb {
        ...
        state MASTER  # role
        nopreempt    # do not preempt
        priority 90  # priority (0~255)
        ...
    }

kubectl get cs is deprecated

In the first article, I ran into this problem when setting up the cluster quickly, and also gave a solution: edit the configuration files /etc/kubernetes/manifests/kube-scheduler.yaml and /etc/kubernetes/manifests/kube-controller-manager.yaml and comment out the --port=0 item in both files.

However, the configuration files were restored after a version upgrade, which suggests that modifying them is not the officially endorsed approach. In fact, kubectl get cs has been deprecated since version 1.19, and --port=0 was added to meet the requirements of the CIS Kubernetes security benchmark: enabling this insecure HTTP port is not recommended.

Reference article:

# 553

# 1998

Instead, you can directly check the running status of the related component Pods:

kubectl get po -n kube-system
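
If a more direct health check is still wanted, the components expose their own health endpoints on each Master node. A sketch, assuming the default secure ports (10257 for kube-controller-manager, 10259 for kube-scheduler) and that anonymous access to /healthz is allowed, which is the default:

#kube-controller-manager
curl -k https://127.0.0.1:10257/healthz

#kube-scheduler
curl -k https://127.0.0.1:10259/healthz

Both should simply return ok.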

Configure command auto-completion

Run on all Master nodes:

#Install the bash autocomplete plug-in
yum install bash-completion -y

Set up kubectl and kubeadm command completion; it takes effect at the next login:

kubectl completion bash >/etc/bash_completion.d/kubectl
kubeadm completion bash > /etc/bash_completion.d/kubeadm