In the era of microservices, cloud computing, and serverless architectures, it’s useful to understand Kubernetes and know how to use it. However, the official Kubernetes documentation can be confusing for users new to cloud computing. In this article, we will cover the most important concepts in Kubernetes. In future articles, we’ll also learn how to write configuration files, use Helm as a package manager, create a cloud infrastructure, orchestrate services using Kubernetes, and create a CI/CD pipeline to automate the entire workflow. Armed with this information, you can start any kind of project and create a powerful infrastructure.
First, containers bring many benefits, from faster deployment to consistent delivery at scale. Even so, containers are not the solution to every problem, because they come with overhead of their own, such as maintaining a container orchestration layer. So, you need to weigh the costs and benefits at the beginning of the project.
Now, let’s start our Kubernetes world tour!
Kubernetes hardware architecture
Nodes
A node is a worker machine in Kubernetes and can be any device with a CPU and RAM. For example, a smartwatch, a smartphone, a laptop, or even a Raspberry Pi can be a node. In the cloud, a node is typically a virtual machine (VM). Simply put, a node is an abstraction of a single machine. The advantage of this abstraction is that we don’t need to know the underlying hardware. We work only with nodes, so our infrastructure is platform independent.
The cluster
A cluster is a group of nodes. When you deploy your application to a cluster, Kubernetes automatically distributes the work across the nodes. If more resources are needed (which, in short, means spending more money), new nodes can be added to the cluster and the work is automatically redistributed.
We run our code on a cluster, but we don’t need to care about which parts of the code are running on which nodes. The assignment of work is automatic.
Persistent volumes
Because our code can be moved from one node to another (for example, if one node doesn’t have enough memory, the work is rescheduled to another node that does), data saved on a node is easily lost. If we want to store our data permanently, we should use persistent volumes. A persistent volume is a bit like an external hard drive that you can plug in and store your data on.
Kubernetes, developed by Google, was originally aimed at stateless applications whose persistent data is stored elsewhere. As the project matured, many enterprises wanted to use it for stateful applications, so the developers added persistent volume management. As with earlier virtualization technologies, database servers are generally not the first servers to migrate to a new architecture. This is because databases are at the heart of many applications and can contain a lot of important information, so on-premises database systems typically run on large virtual or physical machines.
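As a rough illustration of the "external hard drive" idea, the sketch below builds two manifests as Python dictionaries: a PersistentVolumeClaim that requests storage, and a Pod that mounts it. The names data-claim and app, the image example/app:1.0, the mount path, and the 1Gi size are all assumptions made up for this example. The output is JSON, which kubectl accepts in addition to YAML.

```python
import json

# All names, the image, and the size below are hypothetical, chosen for illustration.
claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-claim"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],  # mounted read-write by a single node
        "resources": {"requests": {"storage": "1Gi"}},
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "app"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "example/app:1.0",
                # Files written under this path live on the volume, not on the node.
                "volumeMounts": [{"name": "data", "mountPath": "/var/lib/app"}],
            }
        ],
        # The volume refers to the claim above by name.
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": "data-claim"}}
        ],
    },
}

# Wrapping both objects in a List lets them be applied from one JSON file.
manifest = {"apiVersion": "v1", "kind": "List", "items": [claim, pod]}
print(json.dumps(manifest, indent=2))
```

If the Pod is rescheduled to another node, the claim (and the data behind it) can be attached again, which is the point of the abstraction.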
So, the question is, when should we start using persistent volumes? To answer this question, we should first understand the different types of database applications.
We classify data management solutions into the following two categories:
- Vertical scaling: this includes traditional RDBMS solutions such as MySQL, PostgreSQL, and SQL Server
- Horizontal scaling: this includes “NoSQL” solutions such as Elasticsearch or Hadoop-based solutions
Vertical scaling solutions such as MySQL, PostgreSQL, and Microsoft SQL Server are generally a poor fit for containers. These database platforms require high I/O, shared disks, block storage, and so on, and do not cope well with the loss of a node in the cluster, which happens often in container-based ecosystems.
Containers can be used for horizontal scaling applications such as Elasticsearch, Cassandra, Kafka, and so on. These systems can withstand the loss of a node in a database cluster, and the database application can recover and rebalance itself.
In general, you should containerize distributed databases, which use redundant storage techniques and can handle node loss within a database cluster (Elasticsearch is a good example).
Kubernetes software components
The container
One of the goals of modern software development is to ensure that applications can be isolated from each other on the same host or cluster. Virtual machines are one solution to this problem. But each virtual machine needs its own operating system, so its size is usually measured in gigabytes.
Containers, on the other hand, isolate the application’s execution environment but share the underlying operating system. So, a container is like a box that holds everything we need to run an application: code, runtime, system tools, system libraries, settings, and so on. Containers typically require only a few megabytes to run, far fewer resources than virtual machines, and can start almost immediately.
Pods
A Pod is a set of containers. In Kubernetes, the smallest deployable unit is the Pod. A Pod can contain multiple containers, but typically we use only one container per Pod, because the smallest unit of replication in Kubernetes is the Pod. If we want to scale each container individually, we should put each container in its own Pod.
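To make the one-container-per-Pod convention concrete, here is a minimal sketch that builds a Pod manifest as a Python dictionary and prints it as JSON (which kubectl accepts alongside YAML). The helper make_pod, the Pod name web, the image example/web:1.0, and port 8080 are hypothetical values for illustration only.

```python
import json


def make_pod(name, image, port):
    """Return a single-container Pod manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # One entry in this list means one container in the Pod.
            "containers": [
                {"name": name, "image": image, "ports": [{"containerPort": port}]}
            ]
        },
    }


# Hypothetical name, image, and port.
pod = make_pod("web", "example/web:1.0", 8080)
print(json.dumps(pod, indent=2))
```

In practice, Pods are rarely created directly like this; a higher-level object usually creates them from a template.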
Deployments
The primary purpose of a Deployment is to provide declarative updates for Pods and ReplicaSets (a ReplicaSet keeps a given number of identical Pod replicas running). Using a Deployment, we can specify how many copies of the same Pod should be running at any time. A Deployment is like a Pod manager that automatically starts the required number of Pods, monitors them, and recreates them in the event of a failure. Deployments are extremely useful because you do not need to create and manage each Pod individually.
We usually use Deployments for stateless applications. However, you can preserve the state of a Deployment and make it stateful by attaching a persistent volume to it.
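The sketch below shows what "specify how many copies" looks like in a manifest, built as a Python dictionary and printed as JSON. The helper make_deployment, the name web, and the image example/web:1.0 are assumptions for this example. The field names follow the apps/v1 Deployment API.

```python
import json


def make_deployment(name, image, replicas):
    """Return a Deployment manifest that keeps `replicas` copies of one Pod running."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels of the Pod template below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical name and image; three replicas as an example.
deployment = make_deployment("web", "example/web:1.0", replicas=3)
print(json.dumps(deployment, indent=2))
```

Changing the replicas value and applying the manifest again is the declarative way to scale: you state the desired count and Kubernetes works toward it.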
Stateful Sets
StatefulSet is a newer concept in Kubernetes and is a resource for managing stateful applications. It manages the deployment and scaling of a set of Pods and guarantees the ordering and uniqueness of these Pods. It is similar to a Deployment, except that a Deployment creates Pods with arbitrary names and does not care about their order, whereas a StatefulSet creates Pods with unique, ordered names. So, if you wanted to create three copies of a Pod named example, the StatefulSet would create example-0, example-1, and example-2. One practical benefit of this approach is that you can get an idea of what is going on just by looking at the name of a Pod.
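The naming scheme described above can be sketched as follows. The helper functions, the image example/db:1.0, and the Service name example-headless are hypothetical. The serviceName field refers to a headless Service that would have to be created separately, which this sketch does not do.

```python
import json


def make_statefulset(name, image, replicas, service_name):
    """Return a StatefulSet manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": service_name,  # assumed to exist already
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def expected_pod_names(name, replicas):
    """Pod names a StatefulSet produces: <name>-0, <name>-1, and so on."""
    return [f"{name}-{index}" for index in range(replicas)]


statefulset = make_statefulset("example", "example/db:1.0", 3, "example-headless")
print(json.dumps(statefulset, indent=2))
print(expected_pod_names("example", 3))  # ['example-0', 'example-1', 'example-2']
```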
DaemonSets
A DaemonSet ensures that a copy of a Pod runs on every node of the cluster. If a node is added to or removed from the cluster, the DaemonSet automatically adds or removes the Pod. This is important for monitoring and logging, because you can cover every node without having to deploy an agent to each one manually.
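A minimal sketch of a DaemonSet for a logging agent might look like this. The name log-agent and the image example/log-agent:1.0 are hypothetical. Note that, unlike a Deployment, the spec has no replicas field, because the number of Pods follows the number of nodes.

```python
import json


def make_daemonset(name, image):
    """Return a DaemonSet manifest that runs one Pod per node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            # No "replicas" here: the cluster's node count decides how many Pods run.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical agent name and image.
daemonset = make_daemonset("log-agent", "example/log-agent:1.0")
print(json.dumps(daemonset, indent=2))
```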
Services
While a Deployment is responsible for keeping a set of Pods running, a Service is responsible for enabling network access to a set of Pods. Services provide standardized features across the cluster: load balancing, service discovery between applications, and support for zero-downtime application deployments. Each service has a unique IP address and DNS host name. You configure an application that needs the service with that IP address or host name, and the traffic is load-balanced to the correct Pods. In the section on external traffic, we’ll learn more about the types of services and how internal services communicate with the outside world.
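The following sketch builds a basic Service manifest and the DNS name other applications in the cluster would use to reach it. The helpers, the name web, and the ports are hypothetical. The DNS pattern assumes the default namespace and the default cluster domain cluster.local, both of which can differ in a real cluster.

```python
import json


def make_service(name, port, target_port):
    """Return a ClusterIP Service that forwards `port` to `target_port` on matching Pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "ClusterIP",
            # Traffic goes to every Pod carrying this label.
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def service_dns_name(name, namespace="default", cluster_domain="cluster.local"):
    """In-cluster DNS name of a Service (the cluster domain is an assumption)."""
    return f"{name}.{namespace}.svc.{cluster_domain}"


service = make_service("web", port=80, target_port=8080)
print(json.dumps(service, indent=2))
print(service_dns_name("web"))  # web.default.svc.cluster.local
```

Because clients address the Service name rather than individual Pods, Pods can be replaced or rescheduled without clients noticing.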
ConfigMaps
If you want to deploy to multiple environments, such as development, staging, and production, it’s not a good idea to bake configuration into your application, because the environments differ. Ideally, you want a different configuration for each deployment environment. This is what ConfigMaps are for. ConfigMaps let you decouple configuration artifacts from images to keep your containerized application portable.
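One way to picture this is one ConfigMap per environment, generated from the same code. In the sketch below, the setting names LOG_LEVEL and CACHE_TTL_SECONDS, their values, and the app-config-<environment> naming convention are invented for illustration. Values are converted to strings because ConfigMap data holds string values.

```python
import json

# Hypothetical per-environment settings.
SETTINGS = {
    "development": {"LOG_LEVEL": "debug", "CACHE_TTL_SECONDS": 0},
    "staging": {"LOG_LEVEL": "info", "CACHE_TTL_SECONDS": 60},
    "production": {"LOG_LEVEL": "warning", "CACHE_TTL_SECONDS": 300},
}


def make_configmap(environment, settings):
    """Return a ConfigMap manifest holding one environment's settings."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"app-config-{environment}"},
        # ConfigMap data values are strings, so numbers are converted.
        "data": {key: str(value) for key, value in settings.items()},
    }


configmaps = [make_configmap(env, values) for env, values in SETTINGS.items()]
for configmap in configmaps:
    print(json.dumps(configmap, indent=2))
```

The container image stays identical across environments; only the ConfigMap applied to each cluster or namespace changes.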
External traffic
Now that you know about services running in a cluster, how do you get external traffic into your cluster? Three service types are relevant here: ClusterIP, NodePort, and LoadBalancer. There is also a fourth option that adds another layer of abstraction: Ingress, which is implemented by an Ingress controller.
ClusterIP
ClusterIP is the default service type in Kubernetes. It lets services communicate with each other inside the cluster. Although ClusterIP is not designed for external access, external traffic can still reach such a service through a proxy (for example, kubectl proxy). Do not use this approach in a production environment, but it is useful for debugging. Services declared as ClusterIP should not be directly visible from outside the cluster.
NodePort
As we saw in the first part of this article, Pods run on nodes. Nodes can be a variety of devices, such as laptops or virtual machines running in the cloud. Each node has its own IP address. When you declare a service as NodePort, Kubernetes opens the same port on every node (by default from the range 30000-32767), so the service can be reached from outside at any node’s IP address and that port. You can use NodePort in a production environment, but for large applications with many services, manually managing all the node IP addresses and ports is cumbersome.
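A NodePort Service makes a service reachable at a port on every node, and the sketch below shows the corresponding manifest. The helper, the name web, and the port numbers are hypothetical. The check against 30000-32767 reflects the default node port range, which cluster administrators can change.

```python
import json

# Default NodePort range; a cluster may be configured differently.
DEFAULT_NODE_PORTS = range(30000, 32768)


def make_nodeport_service(name, port, target_port, node_port):
    """Return a NodePort Service manifest, rejecting ports outside the default range."""
    if node_port not in DEFAULT_NODE_PORTS:
        raise ValueError(f"nodePort {node_port} is outside 30000-32767")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": {"app": name},
            # Reachable from outside at <any node IP>:<nodePort>.
            "ports": [{"port": port, "targetPort": target_port, "nodePort": node_port}],
        },
    }


service = make_nodeport_service("web", port=80, target_port=8080, node_port=30080)
print(json.dumps(service, indent=2))
```

With this manifest, a client outside the cluster would connect to port 30080 on any node’s IP address.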
LoadBalancer
By declaring a service of type LoadBalancer, you can use the cloud provider’s load balancer to expose the service externally. How the external load balancer routes traffic to the service’s Pods is up to the cloud provider. With this solution, you don’t have to manage the IP addresses of every node in the cluster. The downside is that each service gets its own load balancer instance, and you pay for every instance.
This solution is suitable for production environments, but it is somewhat expensive. Next, let’s look at a slightly cheaper solution.
Ingress
Ingress is not a service type but an API object that manages external access to the services in a cluster. It acts as a reverse proxy and a single entry point into your cluster, routing requests to different services. I usually use the NGINX Ingress Controller, which acts as a reverse proxy and also handles SSL termination. The best way to expose the Ingress controller in production is through a load balancer.
With this solution, you can use a single Load Balancer to expose any number of services, so you can keep costs to a minimum.
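To show how a single entry point can fan out to several services, the sketch below builds an Ingress manifest with two path rules. The host shop.example.com, the service names web and api, their ports, and the helper make_ingress are hypothetical. The ingressClassName value nginx assumes an NGINX Ingress Controller installed under that class name.

```python
import json


def make_ingress(name, host, routes):
    """Return an Ingress manifest.

    `routes` maps a URL path prefix to a (service name, service port) pair.
    """
    paths = [
        {
            "path": prefix,
            "pathType": "Prefix",
            "backend": {"service": {"name": service, "port": {"number": port}}},
        }
        for prefix, (service, port) in routes.items()
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "ingressClassName": "nginx",  # assumed controller class name
            "rules": [{"host": host, "http": {"paths": paths}}],
        },
    }


# Hypothetical host and backends: two services behind one entry point.
ingress = make_ingress(
    "shop",
    "shop.example.com",
    {"/": ("web", 80), "/api": ("api", 8080)},
)
print(json.dumps(ingress, indent=2))
```

Adding another service then means adding one more entry to the routes, not provisioning another load balancer.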
Conclusion
In this article, we learned about the basic concepts in Kubernetes and its hardware architecture. We also discussed the different software components, such as Pods, Deployments, StatefulSets, and Services, and learned how Services communicate with the outside world. Hopefully this helps you make sense of the intricacies of the Kubernetes component architecture.