Abstract:

This article describes, as a practical reference, how to build a K8S cluster on Alibaba Cloud; adjust the details to fit your actual situation.

Cluster planning

In practice, many users build K8S clusters out of a large number of small ECS instances, which neither saves money nor exploits the advantages of a K8S cluster. Clusters built from many small ECS instances have several downsides:

  • Worker ECS instances with small specifications have limited network resources.
  • If a single container nearly fills a small ECS instance, the machine's remaining resources cannot be used (for scheduling new containers or rescheduling failed ones); across a large fleet of small ECS instances, this waste adds up.

So how to choose the Worker ECS specification?

  • Determine the total number of cores needed for daily use and the availability tolerance for the whole cluster. For example, if the daily total is 160 cores and the cluster must tolerate a 10% loss, choose at least 10 ECS machines with 16 cores each, and keep peak load below 160 × 90% = 144 cores. If the tolerance is 20%, choose at least 5 machines with 32 cores each, and keep peak load below 160 × 80% = 128 cores. That way, even if a whole machine crashes, the business can keep running.

  • This calculation is theoretical, though: smaller machines tend to have a higher proportion of unusable resources, so smaller is not automatically better.
  • Choose the ratio of CPU to memory. For memory-hungry applications such as Java applications, a 1:8 (vCPU : GiB) model is recommended.
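The sizing rule above can be sketched as a small calculation (a minimal sketch; the 160-core and tolerance figures are the article's own example values):

```python
import math

def plan_workers(total_cores: int, cores_per_ecs: int, tolerance_pct: int):
    """Machines needed for the daily core total, plus the allowed peak load
    (in cores) if the cluster must survive losing tolerance_pct percent
    of its capacity, e.g. one whole machine crashing."""
    machines = math.ceil(total_cores / cores_per_ecs)
    peak_cores = total_cores * (100 - tolerance_pct) // 100
    return machines, peak_cores

# 160 daily cores, 10% tolerance -> ten 16-core ECS, peak load <= 144 cores
print(plan_workers(160, 16, 10))  # (10, 144)
# 20% tolerance -> five 32-core ECS, peak load <= 128 cores
print(plan_workers(160, 32, 20))  # (5, 128)
```

Note that the tolerance and the per-machine size are linked: with 10 machines, losing one machine is exactly a 10% loss; with 5 machines, it is 20%.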

Some benefits of high specification ECS:

  • Large network bandwidth: high-specification instances come with more bandwidth, which bandwidth-hungry applications can use more efficiently.
  • More container-to-container communication stays within a single machine, reducing cross-host network traffic.
  • Image pulls are more efficient, because an image only needs to be pulled once to be shared by every container on the node. With many small ECS instances, the same image must be pulled many more times; in scenarios where new ECS nodes must be scaled out in step with demand, this takes longer and defeats the goal of immediate response.

Choosing Shenlong (X-Dragon) bare-metal servers

Alibaba Cloud has launched a bare-metal server product: Shenlong (ECS Bare Metal Instance). Two typical scenarios for choosing Shenlong:

  • If the daily scale of the cluster reaches 1000 cores, it is recommended to use Shenlong servers exclusively; this lets you build the cluster out of roughly 10 to 11 Shenlong servers.
  • When a large number of containers must be scaled out rapidly, especially during major e-commerce promotions and traffic peaks, consider adding Shenlong servers as the new nodes, so that a single Shenlong machine can host the operation of many containers.
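The "10 to 11 servers" figure follows from simple division. A minimal sketch, assuming a 96-vCPU Shenlong specification (an illustrative assumption; check the current ECS Bare Metal product list for the real specs):

```python
import math

def server_count_range(daily_cores: int, cores_per_server: int):
    """Rough lower/upper bound on bare-metal server count for a daily core total."""
    return daily_cores // cores_per_server, math.ceil(daily_cores / cores_per_server)

# With an assumed 96-vCPU spec, 1000 daily cores lands in the
# article's "10 to 11 servers" range.
print(server_count_range(1000, 96))  # (10, 11)
```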

Shenlong servers, as the foundation of a container cluster, also bring the following benefits:

  • Super network: equipped with RDMA. Combined with the Terway container network, the hardware performance is fully exploited, and cross-host container bandwidth exceeds 9 Gbit/s.
  • Zero jitter in computing performance: self-developed chips take over work from the hypervisor, eliminating virtualization overhead and resource contention.
  • Security: physical-level encryption, Intel SGX support, trusted computing environments, and support for applications such as blockchain.

Build cluster option notes

There are a number of options to be aware of when building a K8S cluster:

Network selection

  • If external services such as RDS must be reachable, reuse the existing VPC rather than creating a new one, because VPCs are isolated from each other. You can still create a new vSwitch and place all the K8S machines on it for easier management.
  • Network plug-in selection: one option is Flannel, which uses the VPC network directly and offers the highest performance; the other is Terway, which additionally provides K8S network policy management.
  • Pod CIDR, the network for all pods in the cluster, must not be set too small: if it is, the number of nodes the cluster can support is limited. It interacts with the "number of pods per node" advanced option. For example, a /16 pod CIDR contains 256 × 256 addresses; if each node reserves 128 pod addresses, the cluster can support at most 512 nodes.
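The pod-CIDR arithmetic above can be checked with Python's standard ipaddress module (a minimal sketch; the 172.16.0.0/16 network is just an example value, not a recommendation):

```python
import ipaddress

def max_nodes(pod_cidr: str, pods_per_node: int) -> int:
    """Upper bound on cluster size: total pod addresses / addresses reserved per node."""
    return ipaddress.ip_network(pod_cidr).num_addresses // pods_per_node

# A /16 pod CIDR holds 256 * 256 = 65536 addresses; with 128 pods per node
# the cluster can hold at most 512 nodes, matching the example in the text.
print(max_nodes("172.16.0.0/16", 128))  # 512
```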

Disk selection

  • Prefer SSD disks.
  • For Worker nodes, choose "attach data disk" and mount it at /var/lib/docker, which stores local images. This keeps accumulated images from filling the root disk. After running for a while, many unused local images pile up; a quick remedy is to take the node offline, rebuild the disk, and bring the node back online.

Daily operation and maintenance settings

  • For routine O&M, be sure to set CPU, memory, and disk alarms on every ECS instance. Again, keep /var/lib/docker on its own data disk.
  • Be sure to configure log collection.

Whether to create Worker nodes immediately

When a cluster is created, the ECS instances are billed pay-as-you-go. If you prefer subscription (monthly/yearly) billing, consider creating the K8S cluster without Worker nodes first, then purchasing the ECS instances separately and adding them to the cluster.

Considerations for K8S stability

Reference yq.aliyun.com/articles/59…

Serverless Kubernetes

If managing and maintaining a Kubernetes cluster is too much trouble, why not try our Serverless Kubernetes?

Author:
The elder brother – duff

The original link
This article is original content of the Alibaba Cloud Yunqi Community and may not be reproduced without permission.