Kubernetes is notoriously hard! Here are some best practices to follow when using it in production. Following these steps ensures greater safety and productivity.

There’s no question that DevOps has come a long way! The Kubernetes orchestration platform allows companies to release software faster than ever before. With the increasing use of containers for building and shipping software, Kubernetes has become the de facto standard for container orchestration and is very popular among software companies.

Kubernetes has great features such as scalability, zero-downtime deployments, service discovery, and automatic restarts and rollbacks. To manage container deployments at scale, Kubernetes is a must. It supports flexible allocation of resources and workloads. There is no doubt that Kubernetes in production is a good solution, but it takes some time to set up and become familiar with the tool. Since many companies now want to use Kubernetes in production, it is important to consider some best practices. In this article, we will discuss some Kubernetes best practices.

Kubernetes in production

Kubernetes is a complex orchestration tool with a steep learning curve, but it is rich in features. Production operations should be handled with as much care as possible. If you face an internal talent shortage, you can outsource to a PaaS vendor that provides these best practices out of the box. But suppose you manage Kubernetes in production yourself. In this case, it is important to focus on best practices, especially regarding observability, logging, cluster monitoring, and security configuration.

As many of us know, running containers in a production environment is not an easy task. It requires significant effort and computing resources. There are many orchestration platforms out there, but Kubernetes has gained significant traction and support from most cloud providers.

Bottom line: Kubernetes, containerization, and microservices are beautiful infrastructure, but they also present security challenges. Pods churn quickly across the infrastructure, which increases internal pod-to-pod traffic and raises security concerns. In addition, the attack surface of Kubernetes is generally larger. You also have to account for the fact that Kubernetes’ highly dynamic environment doesn’t mesh well with older security tools.

Gartner predicts that by 2022, more than 75% of global organizations will be running containerized applications in production, up from less than 30% today. By 2025, more than 85% of global organizations will be running containerized applications in production, a significant increase from less than 35% in 2019. Cloud-native applications require a high degree of infrastructure automation, DevOps practice, and specialized operational skills that are hard to find in the average IT organization.

So you must apply best practices across security, monitoring, networking, governance, storage, container lifecycle management, and platform selection. Let’s take a look at some Kubernetes production best practices.

Running Kubernetes in production is not easy; there are several aspects to pay attention to.

Use liveness and readiness probes for health checks

Managing large distributed systems can be complex, especially when problems occur and we are not notified in a timely manner. To ensure that application instances work properly, it is important to set up Kubernetes health checks.

You can create a customized health check to effectively prevent zombie services from running in a distributed system. You can adjust the health check based on the environment and requirements.

Readiness probes

The purpose of the readiness probe is to let Kubernetes know whether the application is ready to serve traffic. Kubernetes ensures the readiness probe passes before the Service starts sending traffic to the Pod.

Liveness probes

How do you know if your application is alive or dead? Liveness probes let you do just that. If your application becomes unresponsive, Kubernetes restarts the failing container to bring it back.
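As a minimal sketch, here is how both probes might be declared on a container, assuming a hypothetical HTTP service on port 8080 that exposes /healthz and /ready endpoints (the names, image, and timings are illustrative, not prescriptive):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app                                  # hypothetical name
spec:
  containers:
    - name: web-app
      image: registry.example.com/web-app:1.0    # placeholder image
      ports:
        - containerPort: 8080
      # Liveness: restart the container if the process stops responding.
      livenessProbe:
        httpGet:
          path: /healthz                         # hypothetical health endpoint
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 20
      # Readiness: only route Service traffic once the app reports ready.
      readinessProbe:
        httpGet:
          path: /ready                           # hypothetical readiness endpoint
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
```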

Resource management

It is a good practice to specify resource requests and limits for individual containers.
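A minimal sketch of what this looks like on a container, with hypothetical names and illustrative values you would tune per workload:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server                                # hypothetical name
spec:
  containers:
    - name: api-server
      image: registry.example.com/api-server:1.0  # placeholder image
      resources:
        requests:        # what the scheduler reserves for this container
          cpu: "250m"
          memory: "256Mi"
        limits:          # hard caps the container may not exceed
          cpu: "500m"
          memory: "512Mi"
```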

Another good practice is to divide the Kubernetes environment into separate namespaces for different teams, departments, applications, and clients.
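As a hedged sketch, a per-team namespace can be paired with a ResourceQuota so one team cannot starve the others; the namespace name and limits below are hypothetical:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-payments                 # hypothetical team namespace
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-payments-quota
  namespace: team-payments
spec:
  hard:                               # caps on the namespace's total requests/limits
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
```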

Kubernetes resource usage

Kubernetes resource usage refers to the amount of CPU, memory, and other resources consumed by containers/Pods in production.

So it’s important to keep an eye on Pods’ resource usage. One obvious reason is cost, since higher utilization means less wasted capacity.

Resource utilization

Ops teams typically want to optimize and maximize the percentage of resources consumed by Pods. Resource usage is one of the indicators of how optimized the Kubernetes environment actually is.

A useful signal is the average CPU and memory utilization of running containers: in a well-optimized Kubernetes environment it stays consistently high without Pods being pushed against their limits.

Enable RBAC

RBAC stands for role-based access control. It is a method for restricting the access of users and applications on a system or network.

RBAC has shipped as a stable feature since Kubernetes 1.8, and authorization policies are created using the rbac.authorization.k8s.io API group.

In Kubernetes, RBAC is used for authorization; with it, you can grant permissions to users and service accounts, add and remove permissions, set rules, and so on. It basically adds an extra layer of security to the Kubernetes cluster, restricting who can access your production environment and cluster.
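For illustration, here is the classic Role/RoleBinding pair granting read-only access to Pods; the namespace and user name are hypothetical:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production          # hypothetical namespace
  name: pod-reader
rules:
  - apiGroups: [""]              # "" refers to the core API group
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
  - kind: User
    name: jane                   # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:                         # binds the user to the Role above
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```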

Cluster provisioning and load balancing

Production-grade Kubernetes infrastructure typically needs to account for certain critical aspects, such as high availability, multiple masters, a clustered etcd, and so on. Provisioning such clusters typically involves tools such as Terraform or Ansible.

Once the clusters are set up and Pods are created for running applications, those Pods sit behind load balancers that route traffic to the Services. The open source Kubernetes project does not ship a default load balancer, so you need to integrate with tools such as the NGINX Ingress Controller, HAProxy, a cloud load balancer such as ELB, or any other tool that extends the Kubernetes Ingress API to provide load-balancing capabilities.
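As a sketch, assuming the NGINX Ingress Controller is installed in the cluster, an Ingress routing a hypothetical hostname to a hypothetical Service might look like this:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress                # hypothetical name
spec:
  ingressClassName: nginx          # assumes the NGINX Ingress Controller is installed
  rules:
    - host: app.example.com        # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service  # hypothetical Service
                port:
                  number: 80
```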

Add labels to Kubernetes objects

Labels are key/value pairs attached to objects, such as Pods. They identify attributes of an object that are important and meaningful to users. One important practice that cannot be ignored when using Kubernetes in production is labeling: labels allow you to query and operate on Kubernetes objects in bulk, and to identify and organize them into groups. One of the best use cases is grouping Pods by the application they belong to. Teams can define and own any number of labeling conventions.
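A short illustration, with hypothetical label values, of a Pod labeled by application, tier, and environment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: checkout-5f6d8                            # hypothetical name
  labels:
    app: checkout          # which application the Pod belongs to
    tier: backend
    environment: production
spec:
  containers:
    - name: checkout
      image: registry.example.com/checkout:2.3    # placeholder image
```

With labels like these in place, whole groups can be queried and manipulated at once, for example with kubectl get pods -l app=checkout,environment=production.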

Configuring Network Policies

When using Kubernetes, setting up network policies is critical.

A network policy is nothing more than an object that enables you to explicitly declare and decide which traffic is allowed and which is not. In this way, Kubernetes will be able to block all other unwanted and non-compliant traffic. Defining and limiting network traffic in our cluster is one of the basic and necessary security measures that is highly recommended.

Each network policy in Kubernetes defines a list of authorized connections, as described above. Whenever a network policy is created, all Pods it references become eligible to establish or accept the listed connections. Put simply, a network policy is basically a whitelist of authorized and allowed connections: a connection to or from a Pod is allowed only if at least one network policy applied to that Pod allows it.
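As a minimal sketch with hypothetical labels, this policy lets backend Pods accept ingress traffic only from frontend Pods on port 8080, implicitly blocking everything else:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend   # hypothetical name
  namespace: production             # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: backend                  # the Pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend         # only frontend Pods may connect
      ports:
        - protocol: TCP
          port: 8080
```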

Cluster monitoring and logging

Monitoring your deployments is critical when using Kubernetes, and ensuring that configuration, performance, and traffic remain secure is even more important. Without logging and monitoring, it is impossible to diagnose problems as they occur, and both become essential for compliance.

When monitoring, set up logging at each layer of the architecture. The generated logs feed security tooling, audit capabilities, and performance analysis.

Start with stateless applications

It is much easier to run stateless applications than stateful ones, but with the growing number of Kubernetes operators, this idea is changing. For teams new to Kubernetes, it is recommended to start with stateless applications.

A stateless backend is recommended so that the development team can ensure there are no long-running connections, which make scaling harder. Stateless applications also let developers deploy more efficiently with zero downtime.

It is widely accepted that stateless applications can be easily migrated and extended based on business needs.

Enable autoscaling

Kubernetes has three autoscaling capabilities for deployments: the Horizontal Pod Autoscaler (HPA), the Vertical Pod Autoscaler (VPA), and the Cluster Autoscaler.

The Horizontal Pod Autoscaler automatically scales the number of Pods in a Deployment, ReplicationController, ReplicaSet, or StatefulSet based on observed CPU utilization.
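A minimal HPA sketch targeting a hypothetical Deployment, scaling between 2 and 10 replicas around 70% average CPU (the names and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app            # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas above 70% average CPU
```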

The Vertical Pod Autoscaler recommends appropriate values for CPU and memory requests and limits, and can update them automatically.

The Cluster Autoscaler grows and shrinks the worker node pool, resizing the Kubernetes cluster based on current utilization.

Control the image pull source

Control the source of the images that run all the containers in your cluster. If you allow your Pods to pull images from public sources, you don’t know what’s really running inside them.

If images are pulled only from a trusted registry, you can apply policies on that registry to ensure only secure, authenticated images are pulled.
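Fully enforcing an allowed-registry policy usually requires an admission controller (OPA Gatekeeper is one common choice), but at the Pod level you can at least pull from a private, trusted registry with credentials; the registry, image, and Secret names below are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trusted-app                    # hypothetical name
spec:
  imagePullSecrets:
    - name: private-registry-creds     # a docker-registry Secret created beforehand
  containers:
    - name: trusted-app
      # pinned tag from a hypothetical internal, trusted registry
      image: registry.internal.example.com/team/trusted-app:1.4
      imagePullPolicy: Always          # always re-check the registry on start
```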

Continuous learning

Constantly evaluate the state and settings of your application to learn and improve. For example, reviewing a container’s historical memory usage may show that you can allocate less memory and save money in the long run.

Protect vital services

With Pod priority, you can decide how important different services are to keep running. For example, for better stability you may want your RabbitMQ Pods to be more important than your application Pods, or your ingress controller Pods to be more important than your data-processing Pods, to keep the service available to users.
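As a sketch, a hypothetical PriorityClass assigned to a RabbitMQ Pod; the class name and value are illustrative:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-messaging       # hypothetical class name
value: 1000000                   # higher value = scheduled first, evicted last
globalDefault: false
description: "For RabbitMQ and other must-stay-up services."
---
apiVersion: v1
kind: Pod
metadata:
  name: rabbitmq-0               # hypothetical name
spec:
  priorityClassName: critical-messaging
  containers:
    - name: rabbitmq
      image: rabbitmq:3          # public image, shown for illustration
```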

Zero downtime

Enable zero-downtime upgrades of clusters and services by running everything in high availability. This also ensures higher availability for your customers.

Use Pod anti-affinity to make sure multiple replicas of a Pod are scheduled on different nodes, so the service stays available through planned and unplanned node outages.
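A hedged sketch of a Deployment whose replicas are forced onto different nodes via required anti-affinity; all names are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app                 # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web-app
              topologyKey: kubernetes.io/hostname   # at most one replica per node
      containers:
        - name: web-app
          image: registry.example.com/web-app:1.0   # placeholder image
```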

Use PodDisruptionBudget policies to ensure, at all costs, that you always have a minimum number of Pod replicas available!
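A minimal PodDisruptionBudget sketch, with a hypothetical selector, that blocks voluntary evictions (such as node drains) from dropping the app below two available replicas:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb      # hypothetical name
spec:
  minAvailable: 2        # never voluntarily evict below two replicas
  selector:
    matchLabels:
      app: web-app       # hypothetical label
```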

Plan for failure

Hardware will eventually fail, and software will eventually work. (Michael Hutton)

Conclusion

As we all know, Kubernetes has become the de facto orchestration platform standard for DevOps. Kubernetes addresses the challenges thrown up by production environments in terms of availability, scalability, security, resilience, resource management, and monitoring. Since many companies run Kubernetes in production, following the best practices above is essential to scale applications smoothly and reliably.