Preface
As a back-end engineer, most of the projects I am responsible for are stateless applications such as web services, so the Kubernetes controller I use most often in daily work is Deployment. However, Deployment is only suitable for orchestrating stateless applications: it assumes all Pods of an application are identical, with no ordering dependency between them, and that it does not matter which host they run on. Because every Pod is the same, you can scale horizontally, adding and removing Pods as needed.
However, not all applications are stateless. In particular, applications whose instances have a master-slave relationship, and data storage applications, cannot be orchestrated correctly by the Deployment controller. For such applications, Kubernetes provides another controller, StatefulSet, to schedule the Pods of stateful applications and keep the application's current state equal to the desired state defined for it.
What is a StatefulSet
Like Deployment, StatefulSet is a controller that helps you deploy and scale Kubernetes Pods. Most of the time you don't care how Pods are scheduled with a Deployment. However, when you care about Pod deployment order, per-Pod persistent storage, or need Pods to have fixed network identities that do not change even after restart or rescheduling, the StatefulSet controller can help you accomplish these scheduling goals.
Each Pod created by a StatefulSet has an ordinal (starting at 0) and a fixed network identity. You can also add volumeClaimTemplates to the YAML definition to declare the PVCs used for Pod storage. When a StatefulSet deploys Pods, it deploys them one by one, from ordinal 0 to the last; the next Pod is deployed only after the previous one is deployed and Running.
StatefulSet is an extension of Deployment; it was promoted to stable in the Kubernetes 1.9 release. It abstracts the state a stateful application needs to maintain into two cases:
- Topology state. Multiple instances of the application are not completely equal peers: the instances must be started in a certain order. For example, primary node A of the application must start before secondary node B. If Pod A and Pod B are deleted, they must be recreated in exactly the same order, and a new Pod must have the same network identity as the original one, so that existing clients can access the new Pod in the same way.
- Storage state. Multiple instances of the application are bound to different storage data. For these instances, the data Pod A reads the first time should be the same as the data it reads again after being recreated. The most typical example is the multiple storage instances of a database application.
So the core function of StatefulSet is to record these states in some way, and then restore them for the new Pods when they are recreated.
Keep the topology state of the application
To maintain the topology state of the application, you must ensure that a fixed network identity can be used to reach a fixed Pod instance. Kubernetes gives each Endpoint (Pod) a fixed network identity through a Headless Service, so let's spend some time looking at Headless Services.
Headless Service
In an earlier article, Quickly Master Kubernetes Service Through Learning and Practice, I wrote:

> A Service defines a set of Pods at a logical abstraction layer, providing them with a uniform, fixed IP address and a load-balancing policy for accessing this set of Pods.
For a ClusterIP Service, the format of its A record is `<serviceName>.<namespace>.svc.cluster.local`; when you resolve this A record, it resolves to the Service's VIP address.

For a Headless Service (with clusterIP set to None), the A record format is the same as above, but resolving it returns the set of Pod IP addresses. In addition, each Pod is assigned its own DNS A record, in the format `<podName>.<serviceName>.<namespace>.svc.cluster.local`.
A common Service has a ClusterIP, which is essentially a virtual IP address that forwards requests to one of the Pods the Service represents.
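As a quick sanity check on these record formats, here is a small Python sketch (not part of Kubernetes itself; just string construction for illustration):

```python
def service_dns_name(service, namespace="default", cluster_domain="cluster.local"):
    """A record name for a Service (ClusterIP or Headless)."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def pod_dns_name(pod, service, namespace="default", cluster_domain="cluster.local"):
    """Per-Pod A record name that a Headless Service adds for each Pod."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


print(service_dns_name("app-service"))
# app-service.default.svc.cluster.local
```

The `cluster_domain` defaults to `cluster.local`, which is the default in most clusters but is configurable.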
The Service and Deployment used in the example are defined as follows:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  type: NodePort  # a ClusterIP Service is also created when creating a NodePort Service
  selector:
    app: go-app
  ports:
    - name: http
      protocol: TCP
      nodePort: 30080
      port: 80
      targetPort: 3000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-go-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: go-app
  template:
    metadata:
      labels:
        app: go-app
    spec:
      containers:
        - name: go-app-container
          image: kevinyan001/kube-go-app
          ports:
            - containerPort: 3000
```
Once you have created the above resources in Kubernetes, you can exec into one of the Pods and look up the Service's A record:

```shell
➜ kubectl exec -it my-go-app-69d6844c5c-gkb6z -- /bin/sh
/app # nslookup app-service.default.svc.cluster.local
Server:    10.96.0.10
Address:   10.96.0.10:53

Name:      app-service.default.svc.cluster.local
Address:   10.108.26.155
```
Next, let's try to resolve a record for a single Pod, using the format `<podName>.<serviceName>.<namespace>.svc.cluster.local`:

```shell
/app # nslookup my-go-app-69d6844c5c-gkb6z.app-service.default.svc.cluster.local
Server:    10.96.0.10
Address:   10.96.0.10:53

** server can't find my-go-app-69d6844c5c-gkb6z.app-service.default.svc.cluster.local: NXDOMAIN
```
A normal Service has a ClusterIP address, and DNS resolves the Service name directly to that virtual IP, so there is no way to resolve individual Pod IPs through it. This is where the Headless Service comes in.
The only difference between creating a Headless Service and a normal Service is that you specify `clusterIP: None` in the spec of the YAML definition.
Now let's create a Headless Service for the two application Pod instances in the example above; its YAML definition is as follows:

```yaml
# headless-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-headless-svc
spec:
  clusterIP: None  # <-- Don't forget!!
  selector:
    app: go-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
```
Create the Service with:

```shell
➜ kubectl apply -f headless-service.yaml
service/app-headless-svc created
```
After the Headless Service is created, let's look at the Service's corresponding A record in DNS. Back inside the Pod from before (remember, the format of a Service's DNS record is `<serviceName>.<namespace>.svc.cluster.local`):

```shell
/app # nslookup app-headless-svc.default.svc.cluster.local
Server:    10.96.0.10
Address:   10.96.0.10:53

Name:      app-headless-svc.default.svc.cluster.local
Address:   10.1.0.38
Name:      app-headless-svc.default.svc.cluster.local
Address:   10.1.0.39
```
The DNS query returns the IP addresses of the two Endpoints (Pods) behind the Headless Service, so a client can obtain every Endpoint's IP through the Headless Service and, if necessary, apply a load-balancing policy on the client side. Another important use of the Headless Service, and the real reason you need one when using StatefulSet, is that it adds DNS domain name resolution for each Pod created by the StatefulSet it fronts. This allows the Pods to access each other by name.
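To illustrate client-side load balancing over the Pod IPs returned by a Headless Service, here is a minimal Python sketch. The round-robin part is plain logic; `resolve_endpoints` is an assumption in that it only works when run inside a cluster where the service name actually resolves:

```python
import itertools
import socket


def resolve_endpoints(service_dns, port=80):
    """Resolve a headless service name; each A record is one Pod IP.
    Only resolvable from inside the cluster."""
    infos = socket.getaddrinfo(service_dns, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def round_robin(endpoints):
    """Return a picker function that cycles through the endpoint list."""
    cycle = itertools.cycle(endpoints)
    return lambda: next(cycle)


# Using the two Pod IPs from the nslookup output above:
pick = round_robin(["10.1.0.38", "10.1.0.39"])
print(pick(), pick(), pick())
# 10.1.0.38 10.1.0.39 10.1.0.38
```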
To highlight:

- The DNS domain name assigned to a Pod is that Pod's fixed, unique network identity; the domain name does not change even if the Pod is recreated or rescheduled.
- Pods created by a Deployment have randomized names, so a Headless Service cannot add per-Pod domain name resolution for Pods created by a Deployment.
Let's modify the example above slightly, adding a StatefulSet object to create the Pods, and verify this.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: app-headless-svc
spec:
  clusterIP: None  # <-- Don't forget!!
  selector:
    app: stat-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
---
apiVersion: apps/v1
kind: StatefulSet  # <-- declare a StatefulSet
metadata:
  name: stat-go-app
spec:
  serviceName: app-headless-svc  # <-- set the headless service name
  replicas: 2
  selector:
    matchLabels:
      app: stat-app
  template:
    metadata:
      labels:
        app: stat-app
    spec:
      containers:
        - name: go-app-container
          image: kevinyan001/kube-go-app
          resources:
            limits:
              memory: "64Mi"
              cpu: "50m"
          ports:
            - containerPort: 3000
```
The only difference between this YAML file and the Deployment we used earlier is the addition of the spec.serviceName field.
A StatefulSet names and numbers all the Pods it manages in the format `<StatefulSet name>-<ordinal>`. The ordinals count up from 0, one per Pod instance of the StatefulSet, and are never reused.
```shell
➜ kubectl get pod
NAME            READY   STATUS              RESTARTS   AGE
stat-go-app-0   1/1     Running             0          9s
stat-go-app-1   0/1     ContainerCreating   0          1s
```
We can exec into the stat-go-app-0 Pod and check the DNS records of the two Pods.

Tip: the DNS record a Headless Service adds for each Pod has the format `<podName>.<serviceName>.<namespace>.svc.cluster.local`.

```shell
/app # nslookup stat-go-app-0.app-headless-svc.default.svc.cluster.local
Server:    10.96.0.10
Address:   10.96.0.10:53

Name:      stat-go-app-0.app-headless-svc.default.svc.cluster.local
Address:   10.1.0.46

/app # nslookup stat-go-app-1.app-headless-svc.default.svc.cluster.local
Server:    10.96.0.10
Address:   10.96.0.10:53

Name:      stat-go-app-1.app-headless-svc.default.svc.cluster.local
Address:   10.1.0.47
```
This ensures that the Pods can communicate with each other. If a StatefulSet is used to orchestrate a master-slave application, the instances can talk to each other via these DNS domain names, and a Pod's internal DNS name stays the same even after the Pod is recreated.
Maintain Pod orchestration order
Our Pods are created by the StatefulSet controller named stat-go-app. As noted above, all Pods managed by a StatefulSet are named `<StatefulSet name>-<ordinal>`, with ordinals counted up from 0, one per Pod instance.

From the `kubectl get pod` output above, you can see the two Pods are named stat-go-app-0 and stat-go-app-1.
More importantly, these Pods are created in strict order of their ordinals. For example, stat-go-app-1 stays Pending until stat-go-app-0 has entered the Running state and its Conditions have become Ready.
The StatefulSet keeps a record of this topology state, and even when rescheduling occurs, Pods are recreated strictly in this order: the next Pod is not created until the previous Pod has been created and is Ready.
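The naming-plus-ordering rule above can be sketched in a few lines of Python (an illustration of the rule, not Kubernetes code):

```python
def rollout_order(statefulset_name, replicas):
    """Pod names in the order a StatefulSet creates them.
    Pod N+1 is only created after Pod N is Running and Ready."""
    return [f"{statefulset_name}-{ordinal}" for ordinal in range(replicas)]


print(rollout_order("stat-go-app", 2))
# ['stat-go-app-0', 'stat-go-app-1']
```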
Keep the Pod fixed with a unique network identity
Once you understand the real purpose of the Headless Service, the answer to how Kubernetes fixes a Pod's unique network identity is clear: the Headless Service adds DNS domain name resolution for each Pod created by the StatefulSet it fronts. So when using StatefulSet to orchestrate stateful applications whose instances have a master-slave relationship, the Pods can communicate with each other via domain names in the format `<podName>.<serviceName>.<namespace>.svc.cluster.local`, and you no longer have to worry about IP changes after a Pod is rescheduled to another node.
Preserve the storage state of the instance
To add a Volume to a Pod, you add the spec.volumes field to the Pod definition, and then define a Volume of a specific type in that field, such as hostPath:
```yaml
...
spec:
  volumes:
    - name: app-volume
      hostPath:
        # directory location on the host
        path: /data
  containers:
    - image: mysql:5.7
      name: mysql
      ports:
        - containerPort: 3306
      volumeMounts:
        - mountPath: /usr/local/mysql/data
          name: app-volume
...
```
However, this way of declaring data volumes is not suitable for data storage applications where each Pod instance is bound to its own stored data. A hostPath Volume is backed by a directory on the host, so if the Pod is rescheduled onto another node, there is no way to restore the Pod's stored data on the new node.
Since data volumes on the Pod's host are not applicable, the Pod can only use the Kubernetes cluster's storage resources. Cluster persistent data volumes are configured and consumed through PVs and PVCs, so let's look at those two concepts first.
PV and PVC
A PersistentVolume (PV) is a piece of storage in the cluster that can be provisioned in advance by an administrator, or dynamically provisioned using a StorageClass. Persistent volumes are cluster resources, just as nodes are cluster resources, and they have a life cycle independent of any Pod that uses the PV.
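For a feel of what an administrator-provisioned PV looks like, here is a minimal sketch. The name and path are made up for illustration, and hostPath is used only because it needs no external storage; a real cluster would typically back a PV with network storage such as NFS or Ceph:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo              # hypothetical name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/pv-demo      # hypothetical host path, for illustration only
```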
As an application developer, I may not know anything about distributed storage projects (such as Ceph, GFS, HDFS, and so on) and cannot write their corresponding Volume definition files. This not only exceeds many developers' knowledge, but also risks exposing sensitive information about the company's infrastructure (secret keys, administrator passwords, and so on). So Kubernetes later introduced the PersistentVolumeClaim (PVC).
A PVC represents a Pod's request for storage, a concept similar to the Pod itself: a Pod consumes node resources, while a PVC consumes PV resources. With PVCs, a Pod that needs a persistent volume only has to reference the PVC, which hides a lot of storage details from the user. For example, I can mount remote storage into a Pod's container without knowing anything about its storage space name, server address, AccessKey, and so on. For example:
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pv-pod
spec:
  containers:
    - name: pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: pv-storage
  volumes:
    - name: pv-storage
      persistentVolumeClaim:
        claimName: pv-claim
```
As you can see, in this Pod's volumes definition, all we need to do is declare the type as persistentVolumeClaim and specify the PVC name; we do not have to care about the definition of the persistent volume itself.
After a PVC is created, it can only be used once it is bound to a PV, but as users we do not need to worry about that detail.

The relationship between PVC and PV can be understood as analogous to the relationship between interfaces and implementations in programming.
PVC templates for StatefulSet
The relationship between StatefulSet, Pod, PVC, and PV can be represented in the following diagram
In the definition of a StatefulSet we can add a spec.volumeClaimTemplates field, which is similar to the Pod template (the spec.template field):
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.9.1
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
```
Note: Pod templates are available in both StatefulSet and Deployment, which are used to create Pod instances for the controller. See the previous Deployment article for details on this.
That is, every Pod managed by this StatefulSet declares a corresponding PVC, whose definition comes from the volumeClaimTemplates field. More importantly, the PVC's name carries exactly the same ordinal as its Pod.

PVCs created by a StatefulSet are named in the format **`<PVC name>-<StatefulSet name>-<ordinal>`**.
For the StatefulSet above, the Pods and PVCs it creates are named:

```
Pod: web-0, web-1
PVC: www-web-0, www-web-1
```
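The naming rule can be sketched as a one-line Python helper (an illustration of the convention, not Kubernetes code):

```python
def pvc_name(template_name, statefulset_name, ordinal):
    """PVC name a StatefulSet derives for a given volumeClaimTemplate and Pod ordinal."""
    return f"{template_name}-{statefulset_name}-{ordinal}"


print([pvc_name("www", "web", i) for i in range(2)])
# ['www-web-0', 'www-web-1']
```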
If web-0 is rescheduled and recreated on another node, the new Pod object's definition still declares the PVC named www-web-0, thanks to volumeClaimTemplates. So after the new web-0 is created, Kubernetes finds and binds the PVC named www-web-0 to it. Because a PVC's life cycle is independent of the Pod that uses it, the new Pod takes over the data left by the previous Pod.
Conclusion
StatefulSet is like a special kind of Deployment; it implements maintenance of topology state and storage state using two standard Kubernetes features: the Headless Service and the PVC.
StatefulSet uses the Headless Service to create a fixed DNS domain name for each Pod it manages to use as the Pod’s network identity within the cluster. Together with numbering and scheduling pods in strict sequence, these mechanisms ensure StatefulSet’s support for maintaining application topology state.
By declaring the PVCs used by its Pods with volumeClaimTemplates in the StatefulSet definition, the PVCs it creates are bound to the Pods it creates through the naming convention. The PVC's life cycle, independent of the Pod, plus this binding mechanism, is how StatefulSet maintains the application's storage state.
That's all for today's article. I will continue to share articles on learning Kubernetes, striving to build a Kubernetes tutorial suited to engineers. If you like it, you can follow my WeChat public account "NMS talking bi talking", where I push advanced technical articles every week.