  • The relationship between containers and Pods
    • Sidecar Pattern
  • Managing containers in Pod objects
    • Defining the image pull policy
    • Exposing ports
    • Customizing how containerized applications run
    • Environment variables
  • Labels and label selectors
    • Label management
    • Label selectors
  • Resource annotations
  • The Pod object’s life cycle
    • Phase
    • The Pod creation process
    • Important behaviors in the Pod lifecycle
      • Init containers
      • Lifecycle hook functions
      • Container restart policy
    • The termination process of a Pod
  • Pod liveness probes
    • exec
    • httpGet
    • tcpSocket
    • Liveness probe behavior attributes
  • Pod readiness probes
  • Resource requirements and resource limits
    • Resource requirements
    • Resource limits
    • Pod Quality of Service classes

The Pod is the basic unit of the Kubernetes system: the smallest component in the resource object model that users can create or deploy, and the resource object that runs containerized applications on a Kubernetes system.

The relationship between containers and Pods

Docker recommends running a single process per container, but because of the isolation between containers, inter-process communication (IPC) is not possible across container boundaries. This makes it difficult for functionally related containers to cooperate, for example a main application container and a container responsible for log collection. The Pod resource abstraction exists to solve exactly this kind of problem. A Pod object is a collection of containers that share the Network, UTS (UNIX Time-sharing System) and IPC namespaces, so they have the same domain name, hostname and network interfaces and can communicate directly through IPC. It is the underlying pause container that provides the shared namespaces, such as the network namespace, for the containers in a Pod object. Although Pods support running multiple containers, as a best practice, unless multiple processes are closely related they should be built into multiple Pods, which can then be scheduled onto different hosts, improving resource utilization and making scaling easier.

Sidecar Pattern

When multiple processes are closely related, the containers are generally organized according to the sidecar pattern: a sidecar is an auxiliary container that assists the main application container of the Pod. A typical scenario is log collection, where an agent that ships the main application container’s logs to a log server runs as the auxiliary (sidecar) container, as sketched below.
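
A minimal sketch of the pattern, assuming a shared emptyDir volume and using busybox as a stand-in for a real log-shipping agent (the names and paths are illustrative):

apiVersion: v1
kind: Pod
metadata:
 name: sidecar-example
spec:
 containers:
 - name: myapp                 # main application container
   image: ikubernetes/myapp:v2
   volumeMounts:
   - name: logs
     mountPath: /var/log/myapp
 - name: log-agent             # sidecar container standing in for a log-shipping agent
   image: busybox
   command: ["/bin/sh", "-c", "touch /var/log/myapp/access.log; tail -f /var/log/myapp/access.log"]
   volumeMounts:
   - name: logs
     mountPath: /var/log/myapp
 volumes:
 - name: logs
   emptyDir: {}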

Managing containers in Pod objects

An example Pod configuration manifest:

apiVersion: v1
kind: Pod
metadata:
 name: pod-example
spec:
 containers:
 - name: myapp
   image: ikubernetes/myapp:v2

Under the spec field, containers and its subfield name are mandatory. The image field is mandatory when a Pod is created manually, but optional when the Pod is controlled by higher-level resources such as a Deployment, because that field may be overwritten.

Defining the image pull policy

The core function of a Pod is to run containers, and the image pull policy can be customized through the container’s imagePullPolicy field.

  • Always: the image is always pulled from the specified registry
  • IfNotPresent: the image is pulled from the target registry only if it is missing locally
  • Never: the image is never pulled from the registry; only a local image is used
spec:
 containers:
 - name: myapp
   image: ikubernetes/myapp:v2
   imagePullPolicy: Always

When the image tag is latest, the default policy is Always; otherwise the default policy is IfNotPresent.

Exposing ports

Exposing a port in a Pod means something different from exposing a port for a Docker container. In Docker’s network model, a containerized application using the default network must be “exposed” to the external network through a NAT mechanism before container clients on other nodes can access it. In Kubernetes, however, the IP address of every Pod is already on the same network plane, so whether or not a port is exposed for a container, Pod clients on other nodes in the cluster can still reach it. The exposed port is therefore only informational data, and it also makes it convenient to refer to the container port explicitly.

spec:
 containers:
 - name: myapp
   image: ikubernetes/myapp:v2
   ports:
   - name: http
     containerPort: 80
     protocol: TCP

The configuration here exposes TCP port 80 on the container and names it http. The IP addresses of Pod objects are reachable only within the current cluster; they cannot directly receive request traffic from clients outside the cluster. Although their accessibility is not constrained by worker node boundaries, it is still constrained by the cluster boundary. How to make Pod objects accessible from outside the cluster will be covered later.

Customizing how containerized applications run

The command field specifies an application other than the one the image runs by default, and the args field can pass arguments to it; both override the defaults defined in the image. If only args is defined for a container, it is passed as arguments to the application the image runs by default; if only command is defined, it overrides both the program and the arguments defined in the image, and the application runs without arguments.

spec:
 containers:
 - name: myapp
   image: ikubernetes/myapp:v2 
   imagePullPolicy: Never     
   command: ["/bin/sh"]
   args: ["-c", "while true; do sleep 30; done"]

Environment variables

Environment variables are also a means of passing configuration to containerized applications. There are two methods of passing data to container environment variables in Pod objects: env and envFrom. The first method is described here, and the second method will be explained when we introduce ConfigMap and Secret resources. Environment variables are typically composed of name and value fields.

spec:
 containers:
 - name: myapp
   image: ikubernetes/myapp:v2 
   env:
   - name: REDIS_HOST
     value: do.macOS
   - name: LOG_LEVEL
     value: info

Labels and label selectors

Label management

A label selector picks out the resource objects that carry a given label so that the required operations can be performed on them. An object can have more than one label, and the same label can be added to multiple resources. You can attach labels of different dimensions to a resource to achieve flexible resource group management, for example version labels, environment labels and architecture tier labels that cross-identify a resource’s version, environment and place in the architecture. Example of defining labels:

apiVersion: v1
kind: Pod
metadata:
 name: pod-example
 labels:
   env: qa
   tier: frontend

Once the resource is created, add the --show-labels option to the kubectl get pods command to display label information, and use the -L <key> option to add a column for the corresponding label. Labels on active objects can also be managed directly:

kubectl label pods/pod-example release=beta

This adds the label release=beta to pod-example; add the --overwrite option if you want to modify an existing key-value pair.
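
For example, using the env and tier labels defined in the manifest above:

kubectl get pods --show-labels
kubectl get pods -L env,tier
kubectl label pods/pod-example release=stable --overwrite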

Label selectors

Label selectors express query or selection criteria based on labels. The Kubernetes API currently supports two kinds of selectors:

  • Equality-based: the supported operators are “=”, “==” and “!=”; the first two are equivalent
  • Set-based: the supported operators are in, notin and exists. In addition, specifying only KEY filters all resources that have a label with that key, while !KEY filters all resources that do not have a label with that key

Follow these rules when using label selectors:

  • Multiple selectors specified at the same time are combined with a logical “and”
  • A label selector with a null value means every resource object will be selected
  • An empty label selector cannot select any resources
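
For example, listing Pods with each selector family (the label values are illustrative):

kubectl get pods -l env=qa,tier=frontend        # equality-based, combined with “and”
kubectl get pods -l 'env in (qa, production)'   # set-based
kubectl get pods -l '!release'                  # resources without a “release” label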

Many resource objects in Kubernetes, such as Service, Deployment and ReplicaSet resources, must be associated with Pod resource objects through label selectors. Such selectors are specified with a nested selector field under spec, in one of two ways:

  • matchLabels: specifies the label selector by giving key-value pairs directly
  • matchExpressions: a list of expression-based label selectors, each of the form {key: KEY_NAME, operator: OPERATOR, values: [VALUE1, VALUE2, …]}; the expressions in the list are combined with a logical “and”. When the In or NotIn operators are used, values must be a non-empty list of strings; when Exists or DoesNotExist is used, values must be empty.

Format examples:

selector:
 matchLabels:
   component: redis
 matchExpressions:
   - {key: tier, operator: In, values: [cache]}
    - {key: environment, operator: Exists}

Resource annotations

In addition to labels, Pods and other resources can use annotations, which are also key-value data; however, they cannot be used to select Kubernetes objects, only to provide “metadata” for resources. Metadata in annotations is not limited in length, can be structured or unstructured, and has no restrictions on the characters it may contain. Annotations typically hold build, release or image information, pointers to logging, monitoring, analytics or audit repositories, or information generated by client libraries or tools for debugging purposes: names, versions, build information, and so on.

Use the kubectl get -o yaml and kubectl describe commands to display resource annotation information.

kubectl describe pods pod-example | grep "Annotations"

Define annotations in the configuration manifest:

apiVersion: v1
kind: Pod
metadata:
 name: pod-example
 annotations:
   created-by: "cluster admin"

Add an annotation to an active object:

kubectl annotate pods pod-example created-by2="admin"

The Pod object’s life cycle

A Pod object moves through the phases and behaviors described below.

Phase

A Pod object should always be in one of the following phases in its life:

  • Pending: the API Server has created the Pod resource object and stored it in etcd, but the Pod has not yet been scheduled, or it is still downloading images from the registry.
  • Running: the Pod has been scheduled onto a node and all of its containers have been created by the kubelet.
  • Succeeded: all containers in the Pod have terminated successfully and will not be restarted.
  • Failed: all containers have terminated and at least one container terminated in failure, that is, it returned a non-zero exit status or was terminated by the system.
  • Unknown: the API Server cannot obtain the Pod object’s state, usually because it cannot communicate with the kubelet on the worker node.

The Pod creation process

The Pod creation process covers the creation of the Pod object itself together with its main containers and their auxiliary containers.

  1. The user submits a Pod spec to the API Server via kubectl or another API client.
  2. The API Server stores the Pod object’s information in etcd; once the write completes, the API Server returns an acknowledgement to the client.
  3. The API Server begins to reflect the state change in etcd.
  4. All Kubernetes components use the “watch” mechanism to track and check relevant changes on the API Server.
  5. The kube-scheduler notices, through its “watcher”, that the API Server has created a new Pod object that is not yet bound to any worker node.
  6. The kube-scheduler selects a worker node for the Pod object and updates the result to the API Server.
  7. The scheduling result is written to etcd by the API Server, which then starts to reflect the scheduling result of this Pod object.
  8. The kubelet on the target worker node invokes the container runtime (for example, Docker) to start the containers on the current node and reports the resulting container state back to the API Server.
  9. The API Server stores the Pod state information in etcd.
  10. After etcd confirms that the write completed successfully, the API Server sends the acknowledgement to the relevant kubelet, through which the event is accepted.
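
The process can be observed from the client side; for example, assuming the earlier manifest is saved as pod-example.yaml:

kubectl apply -f pod-example.yaml
kubectl get pods pod-example -o wide --watch   # watch the Pod move from Pending to Running
kubectl describe pods pod-example              # the Events section records scheduling and container start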

Important behaviors in the Pod lifecycle

In addition to creating application containers (the main container and its auxiliary containers), users can define various behaviors for a Pod object over its life cycle, such as init containers, liveness probes and readiness probes.

Init containers

An init container is a container that runs before the main application containers of a Pod start. It is often used to perform preset operations for the main containers. Typical applications include:

  • Used to run specific utility programs that, for security reasons, are not suitable for inclusion in the main container image.
  • Provides utilities or custom code that are not available in the main container image.
  • Provides a way for container image builders and deployers to work separately and independently without having to collaborate on a single image file.
  • The init container and the main container have different views of the file system, so sensitive data such as Secrets resources can be used safely by the init container alone.
  • Init containers start, run and complete sequentially before the application containers start, so they can be used to delay the start of the application containers until their dependencies are satisfied.

Define the initContainers field in the resource manifest:

spec:
 containers:
 - name: myapp
   image: ikubernetes/myapp:v2 
 initContainers:
 - name: init-something
   image: busybox
   command: ['sh', '-c', 'sleep 10'] 

Lifecycle hook functions

Kubernetes provides two types of lifecycle hooks for containers:

  • PostStart: runs immediately after the container is created, but Kubernetes cannot guarantee that it runs before the container’s ENTRYPOINT.
  • PreStop: runs immediately before the container is terminated. It is called synchronously, so it blocks the operation that deletes the container until it completes.

Hook handlers can be implemented in two ways, Exec and HTTP: the former runs a user-defined command directly inside the current container when the hook event fires, while the latter sends an HTTP request to a URL served by the current container. Hook handlers are defined in the container’s spec.lifecycle field, as sketched below.
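
A minimal sketch with both hook types defined under spec.lifecycle (the commands are illustrative):

spec:
 containers:
 - name: myapp
   image: ikubernetes/myapp:v2
   lifecycle:
     postStart:
       exec:
         command: ["/bin/sh", "-c", "echo started > /tmp/started"]
     preStop:
       exec:
         command: ["/bin/sh", "-c", "sleep 5"]   # e.g. give in-flight requests time to drain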

Container restart policy

Whether a Pod object should be restarted depends on the definition of its restartPolicy attribute:

  • Always: Restarts a Pod object whenever it terminates, which is the default.
  • OnFailure: Restart the Pod object only if there is an error.
  • Never: Never restarts.

After a container fails, subsequent restarts are delayed progressively: 10 seconds, 20 seconds, 40 seconds, 80 seconds, 160 seconds, and finally capped at 300 seconds.
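
restartPolicy is defined at the Pod level and applies to all of its containers; for example:

spec:
 restartPolicy: OnFailure     # default is Always
 containers:
 - name: myapp
   image: ikubernetes/myapp:v2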

The termination process of a Pod

  1. The user sends a command to delete the Pod object.
  2. The Pod object in the API Server is updated with the time after which it will be considered “dead”, namely the grace period (30 seconds by default).
  3. The Pod is marked “Terminating”.
  4. (Runs concurrently with step 3) The kubelet starts the Pod shutdown process when it observes the Pod object turning to the “Terminating” state.
  5. (Runs concurrently with step 3) The endpoints controller observes the Pod object’s shutdown and removes it from the endpoints lists of all Service resources that match it.
  6. If the current Pod object defines a preStop hook handler, it starts executing synchronously as soon as the Pod is marked “terminating”. If preStop has not finished when the grace period expires, step 2 is re-executed and an additional small grace period of 2 seconds is granted.
  7. The container processes in the Pod object receive the TERM signal.
  8. After the grace period expires, any processes still running receive the KILL signal.
  9. The kubelet asks the API Server to set the grace period of this Pod resource to 0 to complete the delete operation, at which point the Pod is no longer visible to the user.

If the kubelet or the container manager restarts while it is waiting for processes to terminate, the termination operation is given the full deletion grace period again and is re-executed.
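
The grace period can be adjusted per Pod in the manifest, or per delete operation on the command line; a sketch:

spec:
 terminationGracePeriodSeconds: 60   # replaces the default 30-second grace period
 containers:
 - name: myapp
   image: ikubernetes/myapp:v2

kubectl delete pods pod-example --grace-period=10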

Pod liveness probes

The kubelet uses liveness probes to decide when a container needs to be restarted. A liveness probe is defined via spec.containers.livenessProbe and supports three kinds of probe handlers:

  • exec
  • httpGet
  • tcpSocket

exec

An exec probe executes a user-defined command inside the target container to determine the container’s health: if the command exits with status code 0, the probe is a “success”; any other value is a “failure”. It has only one available attribute, command, which specifies the command to execute, as shown in the following example:

apiVersion: v1
kind: Pod
metadata:
 name: liveness-exec-demo
 labels:
   test: liveness-exec-demo
spec:
 containers:
 - name: liveness-exec-demo
   image: busybox 
   args: ["/bin/sh", "-c", " touch /tmp/healthy;sleep 60; rm -rf /tmp/healthy;sleep 600"]
   livenessProbe:
     exec:
       command: ["test", "-e", "/tmp/healthy"]

This configuration manifest starts a container from the busybox image and runs the command defined in args, which creates the /tmp/healthy file when the container starts and deletes it 60 seconds later. The probe runs test -e /tmp/healthy to check whether /tmp/healthy exists; if the file exists the command returns status code 0 and the probe succeeds. So about 60 seconds later you can see, with the describe command, the event showing that the container was restarted.

httpGet

An httpGet probe sends an HTTP GET request to the target container and judges the result by the response code: a 2xx or 3xx response means the probe passed. The configurable fields are as follows:

  • host, the host address of the request, which defaults to the Pod IP; the Host header can also be set via httpHeaders.
  • port, the port of the request; a mandatory field.
  • httpHeaders, custom request headers.
  • path, the path of the requested HTTP resource.
  • scheme, the protocol used to establish the connection, which can only be HTTP or HTTPS.

Example:

apiVersion: v1
kind: Pod
metadata:
 name: liveness-http-demo
 labels:
   test: liveness-http-demo
spec:
 containers:
 - name: liveness-http-demo
   image: nginx:1.12-alpine
   ports:
   - name: http
     containerPort: 80
   lifecycle:
     postStart:
       exec:
         command: ["/bin/sh", "-c", " echo Healthy > /usr/share/nginx/html/healthz"]
   livenessProbe:
     httpGet:
       path: /healthz
       port: http
       scheme: HTTP

This configuration manifest uses the postStart hook to create a page file, healthz, dedicated to the httpGet probe. The path specified for the httpGet probe is /healthz, the host defaults to the Pod IP, and the port uses the name http defined for the container. After running the following command to delete the healthz page, the events will show “Container liveness-http-demo failed liveness probe, will be restarted”.

kubectl exec liveness-http-demo -- rm /usr/share/nginx/html/healthz

Typically, a dedicated URL path should be defined for HTTP probing operations, and the Web resources corresponding to this URL path should be fully examined internally in a lightweight manner to ensure that the key components of the application are properly serviced to the client.

tcpSocket

A TCP-based liveness probe initiates a TCP connection to a specific port of the container and tries to establish it; if the connection is established successfully, the probe passes. Compared with HTTP-based probing it is more efficient and consumes fewer resources, but it is less accurate. The configurable fields are as follows:

  • host, the target IP address of the connection, which defaults to the Pod IP.
  • port, the target port of the connection; a mandatory field.

For example:

spec:
 containers:
 - name: liveness-tcp-demo
   image: nginx:1.12-alpine
   livenessProbe:
     tcpSocket:
       port: 80

Liveness probe behavior attributes

For Pods with a liveness probe configured, the describe command shows information such as delay and timeout. The values below are the defaults, since none were specified earlier:

Liveness:       tcp-socket :80 delay=0s timeout=1s period=10s #success=1 #failure=3
  • initialDelaySeconds: the delay before the liveness probe starts, that is, how long after the container starts before the first probe is performed; shown as delay, an integer, default 0 seconds
  • timeoutSeconds: the timeout of a liveness probe; shown as timeout, an integer, default 1 second, minimum 1 second
  • periodSeconds: the frequency of the liveness probe; shown as period, an integer, default 10 seconds, minimum 1 second. Too high a frequency imposes noticeable overhead on the Pod object, while too low a frequency makes the reaction to errors slow
  • successThreshold: the minimum number of consecutive successful probes, starting from a failed state, before the check is considered successful; shown as #success, an integer, default 1, minimum 1
  • failureThreshold: the minimum number of consecutive failed probes, starting from a successful state, before the check is considered failed; shown as #failure, an integer, default 3, minimum 1
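
For example, a liveness probe tuned with these attributes might look like this (the values are illustrative):

spec:
 containers:
 - name: myapp
   image: ikubernetes/myapp:v2
   livenessProbe:
     tcpSocket:
       port: 80
     initialDelaySeconds: 15
     timeoutSeconds: 2
     periodSeconds: 5
     failureThreshold: 5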

In addition, a liveness probe only applies to the current service. For example, when a backend service (such as a database or cache service) fails, restarting the current container does not solve the problem, yet it will be restarted again and again until the backend service recovers.

Pod readiness probes

Once a Pod object is started, the containerized application usually needs some time to complete initialization, such as loading configuration or data, or even warming up. A Pod object should therefore not serve client requests immediately after it starts; it should wait until container initialization has completed and it has become ready, especially when other Pod objects provide the same service. A readiness probe is a periodic operation that determines whether a container is ready; the container is considered ready when the probe returns a “success” state. Like liveness probes, readiness probes support the same three handlers, but they are defined with the readinessProbe field. For example:

apiVersion: v1
kind: Pod
metadata:
 name: readiness-tcp-demo
 labels:
   test: readiness-tcp-demo
spec:
 containers:
 - name: readiness-tcp-demo
   image: nginx:1.12-alpine
   readinessProbe:
     tcpSocket:
       port: 80

Pod objects that do not define readiness probes are ready as soon as the Pod enters the “Running” state. In production practice, readiness probes must be defined for containers that require time to initialize and critical Pod resources.

Resource requirements and resource limits

The “compute resources” that containers or Pods can request or consume in Kubernetes are CPU and memory. CPU is a compressible resource, meaning its allocation can be shrunk on demand, while memory is an incompressible resource that cannot be shrunk without unpredictable consequences. Resource isolation is currently at the container level, so CPU and memory resources are configured on the containers of a Pod, which support two attributes:

  • requests, which defines the guaranteed amount of resources available to the container, meaning the container may not actually be using that much while running, but it must be guaranteed that much when it needs it;
  • limits, which caps the maximum amount of resources the container may use

In Kubernetes, one CPU unit is equivalent to one virtual CPU (vCPU) on a virtual machine or one hyperthread (logical CPU) on a physical machine. Fractional metering is supported: one core equals 1000 millicores, so 500m is equivalent to 0.5 core. Memory is measured in bytes by default, and E, P, T, G, M, K (or Ei, Pi, Ti, Gi, Mi, Ki) can also be used as unit suffixes.

Resource requirements

apiVersion: v1
kind: Pod
metadata:
 name: stress-demo
spec:
 containers:
 - name: stress-demo
   image: ikubernetes/stress-ng
   command: ["/usr/bin/stress-ng", "-m 1", "-c 1", "--metrics-brief"]
   resources:
     requests:
       memory: "128Mi"
       cpu: "200m"

The configuration manifest above defines the container’s resource requirements as 128Mi of memory and 200m (0.2) of a CPU core. It runs the stress-ng image (a multi-purpose system stress tool), starting one process for the memory stress test (-m 1) and one dedicated CPU stress test process (-c 1). Running kubectl exec stress-demo -- top shows the two test processes each running a CPU core at full load, far above the value defined in requests. This is because resources are currently abundant; once resources become tight, the node keeps only the requested 1/5 of a CPU core available to the container, which is about 3.33% of a six-core node, and the excess CPU usage is compressed. Memory, however, is an incompressible resource, so this Pod may be OOM-killed when memory becomes scarce. If requests are not defined, the Pod’s CPU may be compressed to such a low level during CPU contention that it cannot run properly, and for the incompressible memory resource it may be killed by the OOM killer. Therefore, when running business-critical Pods on a Kubernetes system, the requests attribute must be used to define the guaranteed amount of resources for the containers.

Each node in the cluster has a fixed amount of CPU and memory resources. When scheduling Pods, the Kubernetes scheduler decides which nodes can accept and run the current Pod based on the requests attributes of its containers: for each Pod object a node runs, the resources defined in its requests are reserved out of the node’s resources, until they have all been allocated to Pod objects.

Resource limits

Resource requirements only guarantee the minimum amount of resources a container needs; to cap the maximum amount of resources a container can use, resource limits must be defined. Once limits are defined, the container process cannot obtain CPU time beyond its quota, and if it requests memory beyond what its limits define, the process will be killed by the OOM killer. See the sketch below.
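
A sketch combining requests and limits (the values are illustrative):

spec:
 containers:
 - name: myapp
   image: ikubernetes/myapp:v2
   resources:
     requests:
       memory: "128Mi"
       cpu: "200m"
     limits:
       memory: "512Mi"
       cpu: "500m"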

Pod Quality of Service classes

Kubernetes allows node resources to be overcommitted in terms of limits, which means a node may not be able to satisfy all of its Pod objects running at full resource usage at the same time. It is therefore necessary to determine the priority of Pod objects and terminate low-priority Pods when memory runs short. The priority of a Pod object is determined by its requests and limits attributes and falls into three levels, called Quality of Service (QoS) classes:

  • Guaranteed: every container in the Pod defines both limits and requests for all resource types, limits equals requests and is not 0; when requests is undefined it defaults to limits. Highest priority.
  • BestEffort: Pods that set neither requests nor limits for any of their containers fall into this class. Lowest priority.
  • Burstable: Pods that are neither Guaranteed nor BestEffort. Medium priority.

Note that this priority-based termination applies only when memory is scarce; when CPU demands cannot be satisfied, the Pod simply cannot obtain the corresponding resources temporarily.
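
For example, a container that defines only limits (requests then defaults to the same values) yields a Guaranteed Pod, and the result can be checked with kubectl describe:

spec:
 containers:
 - name: myapp
   image: ikubernetes/myapp:v2
   resources:
     limits:            # requests defaults to the same values, so the QoS class is Guaranteed
       memory: "256Mi"
       cpu: "500m"

kubectl describe pods pod-example | grep "QoS Class"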

Learning materials

Kubernetes Advanced Practice, by Ma Yongliang