
What is a controller?

Kubernetes provides a number of controller types that manage Pod state, behavior, replica count, and so on. A controller selects the Pods it manages through their labels, which is what makes routine operations on an application, such as scaling and upgrading, possible.

Common controller types are as follows:

  1. ReplicationController, ReplicaSet, Deployment: for stateless services; they ensure that the specified number of Pod replicas is running at all times so the application stays available, and they support rolling updates and rollbacks. Typical usage: web services.
  2. DaemonSet: ensures that all (or some) nodes in the cluster run a copy of a Pod; when a new node joins, a Pod is scheduled onto it automatically. Typical usage: Filebeat log collection, Prometheus resource monitoring.
  3. StatefulSet: for stateful services, such as data storage systems. Pods in a StatefulSet have stable persistent storage and network identities, and are deployed and scaled in order.
  4. Job: a task that runs to completion once.
  5. CronJob: tasks that run periodically. Typical usage: periodic database backups.
  6. Horizontal Pod Autoscaler (HPA): automatically scales the number of Pods based on their CPU or memory utilization.

Why do we need controllers?

Let’s consider a few scenarios we might run into if we had a Pod providing a service online:

  • A marketing campaign is very successful and site traffic suddenly spikes
  • The node running the Pod fails and the Pod is no longer serving traffic

The first case is relatively easy to handle. Before the campaign we roughly estimate the expected traffic, start a few extra Pods in advance, and kill the extra Pods once the campaign is over. A bit tedious, but manageable.

In the second case, you might get a pile of alerts in the middle of the night saying the service is down, then get up, open your laptop, and start a new Pod on a different node; problem solved.

If we keep solving these problems by hand, it is like going back to the days of slash-and-burn farming. If only there were a tool that managed Pods for us: when there are not enough Pods it adds new ones automatically, and when a Pod dies it restarts one on a suitable node. With such a tool, the problems above would no longer need manual intervention.

Classification of Pod

Pods fall into two categories: autonomous Pods and controller-managed Pods.

  • Autonomous Pod: if the Pod exits, it is not recreated automatically (Pod A in the figure).
  • Controller-managed Pod: the controller maintains the desired number of Pod replicas throughout its lifetime (Pod B in the figure).

ReplicationController, ReplicaSet, Deployment

ReplicationController(RC)

In simple terms, an RC guarantees the number of Pods running at any time, ensuring the Pods are always available. If there are more Pods than the specified number, the excess are terminated; if there are fewer, new Pods are started. When a Pod fails, is deleted, or exits, the RC automatically creates a new Pod to maintain the replica count, so we should use an RC to manage our Pods even when only one Pod is needed.

In short, an RC ensures that the number of Pod objects it controls matches the desired count. It does the following:

  • Ensure that the number of Pod resources accurately reflects the expected value
  • Ensure Pod health
  • Elastic scaling

The following is an example of ReplicationController:

apiVersion: v1
kind: ReplicationController
metadata:
  name: kubia
spec:
  replicas: 3
  selector:
    app: kubia
  template:
    metadata:
      labels:
        app: kubia
    spec:
      containers:
      - name: kubia
        image: luksa/kubia
        ports:
        - containerPort: 8080


ReplicationController has three main parts:

  • Label selector: determines which Pods fall within the ReplicationController's scope
  • Replica count: specifies how many Pods should be running
  • Pod template: used to create new Pod replicas

Problems of RC:

In most cases we control how Pod replicas are created and scheduled by defining an RC, which embeds a complete Pod definition (minus apiVersion and kind). The RC tracks its Pods through the label selector. Changing the replica count in the RC scales the application out or in, and changing the image version in the Pod template updates the application. (One-click rollback is not supported; you roll back by changing the image back in the same way.)
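For example, scaling the kubia RC from the manifest above can be done imperatively with kubectl (a sketch; the replica counts are illustrative):

# Scale the kubia RC up to 5 replicas for the traffic peak, then back down afterwards
kubectl scale rc kubia --replicas=5
kubectl scale rc kubia --replicas=3

# Edit the RC in place (replica count, Pod template image, ...);
# note that changing the template image only affects Pods created afterwards
kubectl edit rc kubia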

ReplicaSet(RS)

With the rapid development of Kubernetes, it is now recommended to use RS and Deployment instead of RC. RS and RC are functionally almost identical; the only difference is that RC supports only equality-based selectors (e.g. env=dev or environment!=qa), whereas RS also supports set-based selectors (e.g. version in (v1.0, v2.0)), which makes complex selections easier to manage.

An example of ReplicaSet is shown below:

apiVersion: apps/v1beta2
kind: ReplicaSet
metadata:
  name: kubia
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kubia
  template:
    metadata:
      labels:
        app: kubia
    spec:
      containers:
      - name: kubia
        image: luksa/kubia


The only difference from the ReplicationController is that instead of listing the Pod labels directly under the selector property, you specify them under selector.matchLabels. This is the simpler (and less expressive) way of defining a label selector in a ReplicaSet.

In addition to selector.matchLabels, you can also use selector.matchExpressions, as shown below:

selector:
  matchExpressions:
    - key: app
      operator: In
      values:
        - kubia

The more expressive ReplicaSet label selector supports four operators:

  • In: the label's value must match one of the specified values.
  • NotIn: the label's value must not match any of the specified values.
  • Exists: the Pod must have a label with the specified key (the value does not matter); the values field must not be specified.
  • DoesNotExist: the Pod must not have a label with the specified key; the values field must not be specified.

If you specify multiple expressions, all of them must be true for the selector to match a Pod. If you specify both matchLabels and matchExpressions, all of the labels must match and all of the expressions must evaluate to true for a Pod to match the selector.
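As a sketch of how the two can be combined (the env and release labels are made up for illustration; the Pod template's labels must still satisfy the whole selector):

selector:
  matchLabels:
    app: kubia
  matchExpressions:
    - key: env
      operator: NotIn
      values:
        - qa
    - key: release
      operator: Exists    # Exists/DoesNotExist must not specify a values field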

Deployment

In practice, however, an RS is rarely used on its own. It is mainly used by Deployment, a higher-level resource object, unless you need custom upgrade behavior or do not need to upgrade Pods at all. Deployment provides a declarative way to define Pods and ReplicaSets.

Typical application scenarios:

It is used to create Pods and ReplicaSets, perform rolling updates and rollbacks, scale out and in, and pause and resume rollouts.

In general, we recommend using Deployment rather than ReplicaSet directly.

Deployment is based on ReplicaSet and provides declarative updates for Pod and ReplicaSet resources. It has the following features:

  • Event and status viewing: you can see the detailed progress and status of a Deployment rollout
  • Rollback: if problems are found after an upgrade, the application can be rolled back to a specified historical revision
  • Version history: every operation on the Deployment object is recorded
  • Pause and resume: a rollout can be paused and resumed at any time
  • Multiple update strategies: Recreate (rebuild) and RollingUpdate (rolling update)

The Deployment update strategy is described as follows:

  • RollingUpdate strategy: Pods of the old ReplicaSet are removed gradually while Pods of the new one are added, controlled by two properties (see the sketch after this list): maxSurge, the number of Pods that may exist above the desired count during the upgrade (an absolute number or a percentage), and maxUnavailable, the number of Pods that may be unavailable below the desired count during the upgrade (an absolute number or a percentage).
  • Recreate strategy: new Pods are created only after all old Pods have been deleted. Use this strategy if the application cannot serve multiple versions at the same time and the old version must be stopped completely before the new one starts. It does, however, make the application temporarily unavailable.

An example of Deployment looks like this:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: kubia
spec:
  replicas: 3
  template:
    metadata:
      name: kubia
      labels:
        app: kubia
    spec:
      containers:
      - image: luksa/kubia:v1
        name: nodejs

ReplicationController, ReplicaSet, Deployment coordination process


ReplicaSet is the new generation of ReplicationController and is the recommended way to replicate and manage Pods instead of ReplicationController.

Also, when you use a Deployment, the actual Pods are created and managed by the Deployment's ReplicaSets rather than by the Deployment directly.
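The Deployment features listed earlier (status viewing, rollback, revision history, pause and resume) correspond to kubectl rollout subcommands; a few examples against the kubia Deployment above:

kubectl rollout status deployment kubia     # watch the progress of a rollout
kubectl rollout history deployment kubia    # list recorded revisions
kubectl rollout undo deployment kubia       # roll back to the previous revision
kubectl rollout pause deployment kubia      # pause an in-progress rollout
kubectl rollout resume deployment kubia     # resume it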

StatefulSet

Background

Deployment is not sufficient to cover all application orchestration problems because, from its point of view, all Pods of an application are exactly the same: there is no order between them, and it does not matter which host they run on. When more Pods are needed, the Deployment creates new ones from the Pod template; when they are not needed, it can kill any of them. In real scenarios, however, not every application fits this model. For example, several instances may each keep a copy of data on local disk; once those instances are killed, the mapping between instance and data is lost, and even after the instances are recreated the application is broken. Applications whose instances are not interchangeable, or that depend on one another, are called stateful applications.

StatefulSet briefly

StatefulSet is essentially a variant of Deployment designed to solve the problem of stateful services. The Pods it manages have fixed names and a fixed start/stop order. In a StatefulSet the Pod name serves as the hostname, and shared storage must be used. Where a Deployment is fronted by a regular Service, a StatefulSet is fronted by a Headless Service, which differs from a normal Service in that it has no Cluster IP: resolving its name returns the Endpoint list of all the Pods behind it. On top of the Headless Service, the StatefulSet also creates a DNS domain name for every Pod replica it controls, in the format $(podname).$(headless service name); the FQDN (fully qualified domain name) is $(podname).$(headless service name).$(namespace).svc.cluster.local.

StatefulSet abstracts real-world application states into two cases:

  • Topological state: the instances of an application are not fully equivalent and must be started in a particular order; for example, primary node A must start before B. If A and B are deleted and recreated, the same order must be preserved, and the new A and B must keep the same network identities as the originals so that existing clients can reach the new Pods the same way.
  • Storage state: the instances are bound to different storage data. For these instances, Pod A should read the same data whether it reads now or ten minutes from now, even if Pod A has been recreated in between.

So the core function of StatefulSet is to somehow record these states and then restore them for the new Pod when it is recreated.

Headless Service

Before we dive into StatefulSet, let’s talk about the Headless Service.

As we know, a Service is the mechanism Kubernetes uses to expose a set of Pods to the outside world. For example, if a Deployment has three Pods, I can define a Service, and users can reach a specific Pod simply by accessing the Service. But how is a Service accessed?

  • The first way is the Service's VIP (virtual IP address). For example, when I access 192.168.0.1, the IP address of the Service, it forwards the request to one of the Pods the Service proxies.
  • The second way is the Service's DNS name. Here there are two cases: the first is a normal Service, where resolving the DNS record yields the Service's VIP; the second is a Headless Service, where resolving the DNS record yields the IP addresses of the Pods themselves.

As you can see, a Headless Service does not need a VIP; it resolves directly to the IP addresses of the proxied Pods via DNS records. (For a Headless Service, spec.clusterIP in the Service manifest is set to None, so the Service gets no cluster IP and DNS resolution goes straight to the Pods.) What is the benefit of this design?

This design lets Kubernetes assign each Pod a unique "resolvable identity". With this identity, once you know a Pod's name and the name of its Service, you can reliably reach the Pod's IP address through a DNS lookup. A minimal example follows.
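A minimal sketch of a Headless Service that could front the kubia StatefulSet shown in the next section (the port numbers are illustrative; clusterIP: None is what makes it headless):

apiVersion: v1
kind: Service
metadata:
  name: kubia
spec:
  clusterIP: None        # headless: no VIP, DNS resolves directly to the Pod IPs
  selector:
    app: kubia
  ports:
  - name: http
    port: 80
    targetPort: 8080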

Create StatefulSet

Unlike a ReplicaSet, a StatefulSet creates Pods with regular, predictable names (and hostnames).

StatefulSet failed restart mechanism

When one of its Pods fails, a StatefulSet replaces it with a new Pod that has exactly the same identity, whereas a ReplicaSet replaces it with an entirely new, unrelated Pod.

An example of StatefulSet looks like this:

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: kubia
spec:
  serviceName: kubia
  replicas: 2
  template:
    metadata:
      labels:
        app: kubia
    spec:
      containers:
      - name: kubia
        image: luksa/kubia-pet
        ports:
        - name: http
          containerPort: 8080
        volumeMounts:
        - name: data
          mountPath: /var/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      resources:
        requests:
          storage: 1Mi
      accessModes:
      - ReadWriteOnce

In this YAML file there is a serviceName: kubia field. It tells the StatefulSet controller to use the kubia Headless Service to provide the Pods' "resolvable identity" while running its control loop. When it creates Pods, the StatefulSet names and numbers every Pod it manages so that each Pod instance is unique, and, more importantly, the Pods are created in strict order, as sketched below.
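Assuming the manifest above is created in the default namespace, the resulting names and DNS records would typically look like this:

kubectl get pods -l app=kubia
# expected Pod names: kubia-0, kubia-1  (created in order, kubia-0 first)

# per-Pod DNS records behind the kubia Headless Service:
#   kubia-0.kubia.default.svc.cluster.local
#   kubia-1.kubia.default.svc.cluster.local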

The difference between StatefulSet and Deployment

  • Deployment applies to stateless applications; StatefulSet applies to stateful applications.
  • There is no order among a Deployment's Pods; a StatefulSet's Pods are deployed, scaled, updated, and deleted in order.
  • All Pods of a Deployment share storage; each Pod of a StatefulSet has its own storage, generated from volumeClaimTemplates, so every Pod can keep its own state.
  • The Pod names of a Deployment contain random suffixes; the Pod names of a StatefulSet are fixed.
  • A Deployment's Service has a ClusterIP and can load-balance; a StatefulSet's Service has no ClusterIP (it is a Headless Service), so it cannot load-balance and instead resolves to the individual Pods, which is why the Pod names must be fixed. In addition to the Headless Service, the StatefulSet creates a DNS domain name for each Pod replica it controls: $(podname).$(headless service name).$(namespace).svc.cluster.local.

DaemonSet

The DaemonSet controller ensures that every node in the cluster runs exactly one replica of a specific Pod. It is used for system-level background tasks and, like the other controllers, uses a label selector.

You can also restrict it to nodes that meet certain conditions, for example monitoring only the nodes that have SSD storage.

It is used to deploy cluster logging, monitoring, or other system management applications. Typical applications include:

  • Log collection, such as Fluentd, Logstash, etc.
  • System monitoring, such as Prometheus Node Exporter, CollectD, New Relic Agent, Ganglia Gmond, etc.
  • System programs, such as Kube-proxy, Kube-DNS, Glusterd, Ceph, etc.

An example of DaemonSet is as follows:

Premise:

# List the cluster nodes
kubectl get node

# Add the disk=ssd label to the minikube node
kubectl label node minikube disk=ssd


Let’s create a DaemonSet that simulates running an ssd-monitor process, which prints "SSD OK" to standard output every 5 seconds. It runs a single-container Pod based on the luksa/ssd-monitor container image, and an instance of this Pod is created on every node that has the disk=ssd label.

apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: ssd-monitor
spec:
  selector:
    matchLabels:
      app: ssd-monitor
  template:
    metadata:
      labels:
        app: ssd-monitor
    spec:
      nodeSelector:
        disk: ssd
      containers:
      - name: main
        image: luksa/ssd-monitor

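A quick way to check the result (the file name is only illustrative):

kubectl apply -f ssd-monitor-daemonset.yaml
kubectl get ds ssd-monitor                       # desired/current count follows the labelled nodes
kubectl get pods -l app=ssd-monitor -o wide      # one Pod per node carrying disk=ssd

# Removing the label from a node causes its ssd-monitor Pod to be deleted
kubectl label node minikube disk-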

Job

So far, we’ve only talked about Pods that need to run continuously. There are also situations where you want a task to run once and stop when it is done. ReplicationController, ReplicaSet, and DaemonSet run tasks continuously and never reach a completed state: the processes in their Pods are restarted when they exit. In a completable task, however, the process should not be restarted after it terminates.

The Job controller configures a Pod object to run a one-off task. When the process in the container finishes successfully, it is not restarted; instead the Pod is put into the Completed state. If the process terminates with an error, whether it is restarted depends on the configuration.

There are two types of Job controller objects:

  • Serial Job on a single work queue: A series of one-off jobs that are executed sequentially until the desired number of times is met
  • Parallel jobs with multiple work queues: Multiple work queues run multiple one-off jobs in parallel

Configuration items:

  • completions: the total number of times the Job should run to completion
  • parallelism: how many Pods of the Job may run in parallel
  • activeDeadlineSeconds: the maximum time the Job may stay active; beyond this the Job is terminated
  • backoffLimit: the number of retries before the Job is marked as failed; the default is 6
  • ttlSecondsAfterFinished: finished Jobs are not cleaned up by default; this field makes a Job eligible for automatic cleanup the given number of seconds after it finishes. When the TTL controller cleans up a Job, it deletes it in cascade, including all of its Pods. A value of 0 deletes the Job immediately after it finishes; if the field is unset, the Job is never auto-deleted.

The following is an example of a Job:

This defines a resource of type Job that runs the luksa/batch-job image, whose process runs for about 120 seconds and then exits.

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-job
spec:
  completions: 5                  # run the Pod to completion 5 times in total
  parallelism: 2                  # at most 2 Pods run in parallel
  activeDeadlineSeconds: 600      # each run takes ~120s, so allow enough total time
  backoffLimit: 5
  ttlSecondsAfterFinished: 100
  template:
    metadata:
      labels:
        app: batch-job 
    spec:
      restartPolicy: OnFailure
      containers:
      - name: main
        image: luksa/batch-job


No Pod selector is specified above; it is created automatically from the labels in the Pod template.

Description:

In the definition of a Pod, you can specify what Kubernetes should do when the process running in the container terminates.

This is controlled by the Pod property restartPolicy, which defaults to Always. Job Pods cannot use the default policy because they are not meant to run indefinitely, so the restart policy must be set explicitly to OnFailure or Never. This prevents the container from being restarted once the task completes.

CronJob

The CronJob controller runs Jobs periodically, much like cron in the Linux operating system; it controls when each run starts and how missed or overlapping runs are handled.

Configuration items:

  • jobTemplate: the Job controller template.
  • schedule: the cron schedule on which the Job runs.
  • concurrencyPolicy: the concurrency policy, which defines what to do if the previous Job has not finished when the next one is due. The default, Allow, lets earlier and later Jobs (even several Jobs of the same CronJob) run at the same time. Forbid prevents two Jobs from running simultaneously: if the previous Job is not finished, the next run is skipped. Replace terminates the previous Job and starts the new one in its place.
  • failedJobsHistoryLimit: how many failed Jobs to keep in history; the default is 1.
  • successfulJobsHistoryLimit: how many successful Jobs to keep in history; the default is 3.
  • startingDeadlineSeconds: the deadline for starting a Job that, for whatever reason, missed its scheduled time; runs that miss the deadline are counted as failed.
  • suspend: whether to suspend subsequent runs; the default is false.

The following is an example of a CronJob:

Defines a resource of the CronJob type to run a batch task every 15 minutes.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: batch-job-every-fifteen-minutes
spec:
  schedule: "0,15,30,45 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: periodic-batch-job
        spec:
          restartPolicy: OnFailure
          containers:
          - name: main
            image: luksa/batch-job

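The other configuration items listed above fit into the same spec; a sketch with illustrative values (the jobTemplate is omitted here, it is the same as in the example):

spec:
  schedule: "0,15,30,45 * * * *"
  concurrencyPolicy: Forbid          # Allow (default) | Forbid | Replace
  successfulJobsHistoryLimit: 3      # default 3
  failedJobsHistoryLimit: 1          # default 1
  startingDeadlineSeconds: 15        # a run not started within 15s of its schedule counts as missed
  suspend: false                     # true pauses all future runs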

The schedule field uses standard cron syntax and contains five entries, from left to right:

  • minute
  • hour
  • day of the month
  • month
  • day of the week
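A few illustrative schedule values:

"0,15,30,45 * * * *"   # every 15 minutes (as in the example above)
"*/10 * * * *"         # every 10 minutes
"0 3 * * 0"            # at 03:00 every Sunday
"30 2 1 * *"           # at 02:30 on the first day of every month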

How a scheduled Job is started

It can happen that a Job or its Pod is created and started relatively late. If you have strict requirements for such a Job, the task must not start too long after its scheduled time.

In this case, you can specify a deadline by specifying the startingDeadlineSeconds field in the CronJob specification.

apiVersion: batch/v1beta1
kind: CronJob
spec:
  schedule: "0,15,30,45 * * * *"
  startingDeadlineSeconds: 15

For example, if the Job is scheduled to run at 10:30:00, the Pod must start by 10:30:15 at the latest. If for any reason it has not started by then, the run does not take place and is marked as Failed.

A CronJob normally creates one Job for each execution configured in the schedule, but it can happen that two Jobs are created at the same time, or none at all. To cope with the first problem, your jobs should be idempotent (running them more than once instead of once must not produce undesirable results). For the second, make sure the next run also performs any work that the previous (missed) run should have done.

HorizontalPodAutoscaler (HPA)

When an application's resource usage has peaks and troughs, how do we adjust the number of Pods behind a Service to improve overall cluster resource utilization? That is what the Horizontal Pod Autoscaler is for: as the name suggests, it automatically scales the number of Pods based on their CPU or memory utilization.

apiVersion: autoscaling/v1
# The resource type is HorizontalPodAutoscaler
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-test
  namespace: test
spec:
  # Maximum number of replicas
  maxReplicas: 10
  # Minimum number of replicas
  minReplicas: 3
  # The target to scale: the Deployment named deploy-test
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: deploy-test
  # Scale out when average CPU utilization exceeds 80%, scale in when it drops below
  targetCPUUtilizationPercentage: 80

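The same autoscaler could also be created imperatively (a sketch; the names follow the manifest above):

kubectl autoscale deployment deploy-test -n test --min=3 --max=10 --cpu-percent=80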

Autoscaling on multiple Pod metrics (for example CPU usage and queries per second, QPS) is not complicated either: the Autoscaler computes the replica count for each metric separately and then takes the maximum (for example, if 4 Pods are needed to reach the target CPU utilization and 3 Pods are needed to reach the target QPS, the Autoscaler scales to 4 Pods).

Conclusion

We usually classify applications as stateless applications, stateful applications, daemon applications, and batch applications. Kubernetes provides controllers designed for each of these application types; this article briefly described what each controller is used for.

Controller        Use
Deployment        Stateless applications
StatefulSet       Stateful applications
DaemonSet         Daemon applications
Job & CronJob     Batch jobs