DevOps has come a long way since its inception, and platforms such as Docker and Kubernetes have helped companies deliver software applications faster than ever before. As the rate at which applications are containerized, built, and released continues to rise, Kubernetes is gaining popularity and acceptance among enterprise users as the de facto container orchestration tool.
Kubernetes offers outstanding features such as scaling, zero-downtime deployments, service discovery, automatic failover, and automatic rollback. For managing large-scale container deployments, it has become a must-have tool for enterprises, supporting flexible allocation of resources and workloads, and it is widely used in production environments. At the same time, operating Kubernetes requires significant time to learn and master; there is a real technical threshold. Given that many companies now want to use Kubernetes in production, it is important to review best practices in this area first. In this article, we will introduce some Kubernetes best practices for production environments.
Kubernetes in production
Gartner predicts that by 2022, more than 75% of organizations worldwide will be running containerized applications in production, up from less than 30% in 2020, and that the figure will reach 85% by 2025. A major reason for this rapid growth is the rising demand for cloud-native software applications and the infrastructure automation, DevOps, and operational expertise, tools, and technologies they require, which are often hard to find in enterprise IT organizations.
Second, there is a general consensus that running containers in production is not easy and demands considerable computing resources and effort. There are several container orchestration platforms to choose from, but Kubernetes is the only one endorsed and supported by all the major cloud providers.
Third, Kubernetes, containerization, and microservices bring enterprise users new security challenges along with their technological benefits. Pods churn rapidly across the infrastructure, producing more internal traffic and the security risks that come with it; Kubernetes defaults tend to be more permissive than expected; and its highly dynamic, ephemeral environment integrates poorly with traditional security tools. All of this makes clear that using Kubernetes is not an easy task.
Finally, Kubernetes’ rich functionality means a complex and steep learning curve, and operating it in production requires great care and caution. Enterprises without in-house expertise can consider outsourcing to Kubernetes-as-a-Service (KaaS) providers to obtain Kubernetes best practices. But for users managing Kubernetes clusters in production entirely on their own, it is especially important to understand and implement Kubernetes best practices, particularly around observability, logging, cluster monitoring, and security configuration.
In short, it is imperative to develop a Kubernetes management strategy that applies best practices in security, monitoring, networking, container lifecycle management, and platform selection. The following are key measures to consider for Kubernetes application management.
Use liveness and readiness probes for health checks
Managing large distributed systems is a complex task, especially when things go wrong. It is therefore important to configure Kubernetes health checks to ensure that application instances work properly. By creating custom health checks, you can better match the detection needs of your environment and applications. Kubernetes provides two main probe types: readiness probes and liveness probes.
Readiness probe: lets Kubernetes know whether the application is ready to serve. Kubernetes routes request traffic to a Pod only after its readiness probe passes.
Liveness probe: helps confirm that the application is alive and healthy. If the probe fails, Kubernetes restarts the container according to the Pod’s restart policy.
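As a minimal sketch of how both probes might be declared in a Pod spec (the application name, image, port, and the /ready and /healthz endpoints are hypothetical placeholders for your own application):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app                        # hypothetical application name
spec:
  containers:
    - name: web-app
      image: example.com/web-app:1.0   # placeholder image
      ports:
        - containerPort: 8080
      readinessProbe:                  # gate traffic until the app is ready
        httpGet:
          path: /ready                 # hypothetical readiness endpoint
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
      livenessProbe:                   # restart the container if it hangs
        httpGet:
          path: /healthz               # hypothetical health endpoint
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 20
```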
Resource management
It is good practice to specify resource requests and limits for individual containers. Another good practice is to use separate Kubernetes namespaces for different teams, departments, applications, and clients, giving each a relatively independent resource environment and reducing resource conflicts.
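A minimal sketch combining both practices, assuming a hypothetical team-a namespace and web-app container:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a                         # hypothetical per-team namespace
---
apiVersion: v1
kind: Pod
metadata:
  name: web-app
  namespace: team-a
spec:
  containers:
    - name: web-app
      image: example.com/web-app:1.0   # placeholder image
      resources:
        requests:                      # guaranteed minimum, used for scheduling
          cpu: 250m
          memory: 256Mi
        limits:                        # hard ceiling enforced at runtime
          cpu: 500m
          memory: 512Mi
```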
Resource usage
Resource usage describes how much CPU and memory containers and Pods actually consume in production. It is therefore important to keep a close eye on the resource usage of Pods and containers; the more resources used, the higher the running costs.
Resource utilization
Operations teams typically work to optimize and maximize the utilization of the resources allocated to Pods. Resource utilization is an important indicator of how well a Kubernetes environment is optimized; arguably, the best-optimized environment is one whose running containers achieve the highest average CPU utilization.
Enable RBAC policies
Role-based access control (RBAC) is a control method that restricts access to users and applications in a system or network.
Kubernetes introduced RBAC in version 1.8, using the rbac.authorization.k8s.io API group to create authorization policies. RBAC authorization includes granting access to users or service accounts, adding or removing permissions, and setting rules. It adds an extra layer of security to the Kubernetes cluster, restricting who can access its production environment.
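A minimal RBAC sketch granting read-only access to Pods; the production namespace and user jane are hypothetical:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production         # hypothetical namespace
rules:
  - apiGroups: [""]             # "" refers to the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
  - kind: User
    name: jane                  # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Prefer namespaced Roles over ClusterRoles where possible, so that permissions stay narrowly scoped.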
Cluster configuration and load balancing
Production-grade Kubernetes infrastructure typically requires high availability, with key features such as multiple control-plane nodes and a multi-member etcd cluster. Such cluster configuration is usually implemented with tools like Terraform or Ansible.
Generally, once a cluster is fully configured and Pods are created, a load balancer is set up to route traffic to the appropriate application services. A load balancer is not part of Kubernetes’ default configuration; it is provided through integrations such as a Kubernetes Ingress controller.
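As an illustration, a minimal Ingress sketch that routes external traffic to a Service; it assumes an Ingress controller is already installed in the cluster, and the hostname and Service name are hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app
spec:
  rules:
    - host: app.example.com          # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-app        # hypothetical Service name
                port:
                  number: 80
```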
Label Kubernetes objects
Attaching key/value labels to objects such as Pods is often used to mark important object properties, especially those significant to users. An important practice not to overlook when running Kubernetes in production is therefore the use of labels, which support batch querying and batch manipulation of Kubernetes objects. Labels are also uniquely able to organize Kubernetes objects into groups; one best-practice application of this is grouping and managing Pods by application. In addition, there is no limit on the number or content of labels, and the operations team can create and use them freely.
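A minimal sketch of labeling a Pod; the label keys and values here (app, environment, team) are illustrative conventions, not requirements:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app
  labels:
    app: web-app                       # group Pods by application
    environment: production            # distinguish environments
    team: team-a                       # hypothetical owning team
spec:
  containers:
    - name: web-app
      image: example.com/web-app:1.0   # placeholder image
```

Labeled objects can then be queried and manipulated in bulk with selectors, for example kubectl get pods -l app=web-app,environment=production.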
Set network policies
Network policy configuration is important for a Kubernetes platform in production.
A network policy is essentially an object that lets users declare which traffic is allowed and which is not, so that unwanted and non-compliant traffic gets blocked. It is therefore strongly recommended to implement network policies as one of the basic, necessary security measures for defining and limiting network traffic within the cluster.
Each network policy in Kubernetes is defined as a list of authorized connections. Whenever a network policy is created, the connections it lists become permitted for the Pods it selects. Simply put, a network policy is a whitelist that authorizes and permits connections: traffic to or from a Pod is allowed only when at least one network policy permits it as ingress or egress.
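A minimal whitelist sketch: the policy below selects hypothetical backend Pods and permits ingress only from Pods labeled app: frontend on TCP port 8080; all other ingress to those Pods is denied. Note that enforcement requires a CNI plugin that supports network policies:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production            # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: backend                 # policy applies to backend Pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend        # only frontend Pods may connect
      ports:
        - protocol: TCP
          port: 8080
```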
Cluster monitoring and logging
Monitoring is critical to the health of Kubernetes and directly affects platform configuration, performance, and traffic security. It helps users learn the status of the platform in a timely manner, diagnose problems, and ensure compliance. When cluster monitoring is enabled, logging must also be enabled at each layer of the platform, so that the generated logs can be used for security, auditing, and performance analysis.
Use stateless applications
While this perception is changing as Kubernetes adoption grows, stateless applications remain much easier to manage and run than stateful ones. In fact, teams new to Kubernetes are advised to start with a stateless application design. Stateless back ends are also recommended because they let developers deploy more efficiently and achieve zero downtime for services. However, the development team must ensure there are no long-running connections on the back end that would hinder elastic scaling of the runtime environment. Stateless applications are also easier to migrate and to scale rapidly with business needs.
Enable autoscaling
Kubernetes offers three autoscaling capabilities for service deployments: the Horizontal Pod Autoscaler (HPA), the Vertical Pod Autoscaler (VPA), and the Cluster Autoscaler; a sample HPA manifest is sketched after the descriptions below.
Horizontal Pod autoscaling automatically scales the number of Pods running an application, adjusting Deployments, ReplicaSets, or StatefulSets based on CPU utilization (or other metrics).
Vertical Pod autoscaling: it is recommended to set proper CPU and memory requests and limits for applications; the VPA can automatically adjust them to an appropriate amount of resources based on observed usage.
Cluster autoscaling scales the resource pool of worker nodes, automatically adjusting the size of the Kubernetes cluster based on current resource usage.
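A minimal HPA sketch targeting a hypothetical web-app Deployment; it assumes the metrics-server is installed so that CPU utilization is available:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app                # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```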
Control resources at runtime
If Pods are allowed to pull images from public registries without knowing what they actually run, users should apply resource controls to the containers running in the cluster to avoid runaway resource usage. If images are pulled from a trusted registry, an admission policy can be applied to restrict pulls to secure, authenticated images.
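A minimal sketch of namespace-level runtime controls, assuming the hypothetical team-a namespace from earlier: a ResourceQuota caps aggregate consumption, and a LimitRange supplies defaults for containers that declare none:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a          # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"        # total CPU all Pods may request
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"               # cap on the number of Pods
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      default:               # applied when a container sets no limits
        cpu: 500m
        memory: 512Mi
      defaultRequest:        # applied when a container sets no requests
        cpu: 100m
        memory: 128Mi
```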
Keep learning
Continually evaluate, learn from, and improve the state of your applications. For example, by reviewing a container’s historical memory usage, you may find that it can be allocated less memory, saving money.
Focus on protecting core services
With the Pod priority feature, you can set importance levels for different services. For example, RabbitMQ Pods can be given higher priority than application Pods for better stability, or ingress controller Pods can be given higher importance than data-processing Pods to maintain service availability.
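A minimal sketch of a PriorityClass and a Pod that uses it; the class name, priority value, and RabbitMQ Pod are illustrative:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: core-services          # hypothetical class name
value: 1000000                 # higher value = preempted/evicted last
globalDefault: false
description: "Priority for core services such as message queues and ingress controllers."
---
apiVersion: v1
kind: Pod
metadata:
  name: rabbitmq-0             # hypothetical Pod
spec:
  priorityClassName: core-services
  containers:
    - name: rabbitmq
      image: rabbitmq:3        # placeholder image tag
```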
Guarantee zero service downtime
Zero service downtime can be achieved with a fully highly available (HA) architecture that supports zero-downtime upgrades of both clusters and services, guaranteeing higher availability for customers. Use Pod anti-affinity to ensure that multiple replicas of a Pod are scheduled onto different nodes, so that planned and unplanned node outages do not affect service availability, and use PodDisruptionBudgets to ensure that a minimum number of replicas is preserved at acceptable cost.
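A minimal sketch combining both techniques for a hypothetical web-app Deployment: a PodDisruptionBudget keeps at least two replicas available during voluntary disruptions, and Pod anti-affinity spreads replicas across nodes:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2              # keep >= 2 replicas during voluntary disruptions
  selector:
    matchLabels:
      app: web-app             # hypothetical app label
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:       # forbid two replicas on the same node
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web-app
              topologyKey: kubernetes.io/hostname
      containers:
        - name: web-app
          image: example.com/web-app:1.0   # placeholder image
```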
Specify a plan for failure
To borrow a quote on how to think about hardware failure: “Hardware eventually fails. Software eventually works.” (Michael Hartung)
Conclusion
Kubernetes is, as the industry widely recognizes, the de facto standard DevOps platform. A Kubernetes environment running in production must provide availability, scalability, security, resilience, resource management, and monitoring. Since so many companies now use Kubernetes in production, following the best practices above is recommended for smooth, reliable operation and management of your applications.