1.1 Introduction to the scheduler
Hi, Xiao Liu here. Today we will learn about the K8S scheduler.
Scheduler is the scheduler of Kubernetes. Its main task is to assign the defined Pods to nodes in the cluster. The following issues need to be considered:
- Fairness: how to ensure that every node can be allocated resources
- Efficient resource utilization: the cluster's resources are used to the maximum extent
- Efficiency: scheduling performance is good, so large batches of Pods can be scheduled quickly
- Flexibility: users can control the scheduling logic according to their own needs
Scheduler runs as a separate program. It connects to the apiserver, watches for Pods whose pod.Spec.NodeName is empty, and creates a binding for each such Pod indicating which node the Pod should be placed on.
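As an illustration, the binding the scheduler writes back is an ordinary API object. A minimal sketch (the Pod name mypod and the node worker1 are made up for this example):

apiVersion: v1
kind: Binding
metadata:
  name: mypod        # name of the Pod being bound
target:
  apiVersion: v1
  kind: Node
  name: worker1      # node the Pod is assigned to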
1.2 Scheduling Process
Scheduling is divided into several stages: first, the predicate stage filters out nodes that do not meet the conditions; then the priority stage ranks the remaining nodes; finally, the node with the highest priority is selected.
Predicate includes a series of algorithms that can be used (see the example Pod after this list):
- PodFitsResources: whether the node's remaining resources are greater than the resources the Pod requests
- PodFitsHost: if the Pod specifies a NodeName, whether the node's name matches that NodeName
- PodFitsHostPorts: whether the ports already in use on the node conflict with the ports the Pod requests
- PodSelectorMatches: filters out nodes that do not match the labels the Pod specifies
- NoDiskConflict: the volumes already mounted on the node do not conflict with the volumes the Pod specifies, unless both are read-only
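For example, the following Pod sketch (names and values are illustrative) shows the fields these predicates look at: PodFitsResources compares resources.requests with the node's free capacity, and PodFitsHostPorts checks that the hostPort is not already in use on the node.

apiVersion: v1
kind: Pod
metadata:
  name: predicate-demo
spec:
  containers:
  - name: app
    image: hub.hc.com/library/myapp:v1   # image reused from the examples later in this article
    resources:
      requests:
        cpu: 100m        # PodFitsResources: the node must have at least this much CPU free
        memory: 128Mi    # ...and at least this much memory
    ports:
    - containerPort: 80
      hostPort: 8080     # PodFitsHostPorts: this host port must be free on the node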
If no suitable node is found in the predicate process, the Pod stays in the Pending state and scheduling is retried until some node meets the predicate conditions.
After this step, if more than one node meets the criteria, the priorities process continues and the nodes are ranked by priority. A priority consists of a series of key-value pairs, where the key is the name of the priority item and the value is its weight. Priority options include:
- LeastRequestedPriority: the weight is determined from CPU and memory utilization; the lower the utilization, the higher the weight. In other words, this priority favors nodes with a lower percentage of resource usage
- BalancedResourceAllocation: the closer a node's CPU and memory usage rates are to each other, the higher the weight. This should be used together with LeastRequestedPriority, not alone
- ImageLocalityPriority: favors nodes that already have the required image; the larger the total size of the image already present, the higher the weight
All priority items and their weights are combined to compute the final result for each node.
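Put simply (a simplified view, not the exact kube-scheduler code), each node's final score is the weighted sum of its priority scores, and the node with the highest score is chosen:

finalScore(node) = Σ ( weight_i × priorityScore_i(node) )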
1.3 Customizing a Scheduler
In addition to the built-in K8S scheduler, you can also run a custom scheduler. A Pod chooses a scheduler by specifying its name in the spec.schedulerName field. For example, the following Pod selects my-scheduler for scheduling instead of the default default-scheduler (if no scheduler named my-scheduler is actually deployed, the Pod will stay Pending):
apiVersion: v1
kind: Pod
metadata:
  name: annotation-second-scheduler
  labels:
    name: multischeduler-example
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-with-second-annotation-container
    image: gcr.io/google_containers/pause:2.0
2.1 Node affinity
pod.spec.affinity.nodeAffinity supports two policies:
- preferredDuringSchedulingIgnoredDuringExecution: soft strategy (preferred, not required)
- requiredDuringSchedulingIgnoredDuringExecution: hard strategy (required)
Key-value operators:
- In: the label value is in a given list
- NotIn: the label value is not in a given list
- Gt: the label value is greater than a given value
- Lt: the label value is less than a given value
- Exists: the label exists
- DoesNotExist: the label does not exist
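For instance, a matchExpressions fragment using Exists and Gt might look like this (the label keys disktype and cpu-count are illustrative, not standard labels):

matchExpressions:
- key: disktype      # the node only needs to carry this label
  operator: Exists
- key: cpu-count     # the value of this node label must be greater than "4"
  operator: Gt
  values:
  - "4"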
Soft strategy (the Pod prefers a matching node, but is still scheduled elsewhere if none matches):
[root@master schedule]
apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.hc.com/library/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker3
[root@master schedule]
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
affinity 1/1 Running 0 39s 10.244.2.92 worker2 <none> <none>
Hard strategy (the Pod is scheduled only onto a matching node; otherwise it stays Pending):
[root@master schedule]
apiVersion: v1
kind: Pod
metadata:
  name: affinity2
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.hc.com/library/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker3
[root@master schedule]
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
affinity2 0/1 Pending 0 23s <none> <none> <none> <none>
[root@master schedule]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 49s default-scheduler 0/3 nodes are available: 3 node(s) didn't match node selector.
2.2 Pod affinity
pod.spec.affinity.podAffinity/podAntiAffinity supports the same two policies:
- preferredDuringSchedulingIgnoredDuringExecution: soft strategy
- requiredDuringSchedulingIgnoredDuringExecution: hard strategy
The topologyKey defines the topology domain used for matching; with kubernetes.io/hostname, each node is its own domain, so "same domain" means "same node".
[root@master schedule]
apiVersion: v1
kind: Pod
metadata:
  name: pod-2
  labels:
    app: pod-2
spec:
  containers:
  - name: pod-2
    image: hub.hc.com/library/myapp:v1
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - pod-1
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - pod-2
          topologyKey: kubernetes.io/hostname
[root@master schedule]
pod-2 0/1 Pending 0 4s <none> <none> <none> <none>
[root@master schedule]
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-1 1/1 Running 0 5s 10.244.2.94 worker2 <none> <none>
The affinity and anti-affinity scheduling policies compare as follows:
- nodeAffinity: matches node labels; operators In, NotIn, Exists, DoesNotExist, Gt, Lt; no topology-domain support; scheduling target is a specified host
- podAffinity: matches Pod labels; operators In, NotIn, Exists, DoesNotExist; topology-domain support; schedules the Pod into the same topology domain as the specified Pod
- podAntiAffinity: matches Pod labels; operators In, NotIn, Exists, DoesNotExist; topology-domain support; schedules the Pod into a different topology domain from the specified Pod
2.3 Taint and Toleration
Node affinity is a property of Pods (a preference or a hard requirement) that attracts them to a particular class of nodes. A taint, by contrast, enables a node to repel a particular class of Pods.
Taints and tolerations work together to prevent Pods from being scheduled onto inappropriate nodes. One or more taints can be applied to a node, which means the node will not accept Pods that do not tolerate those taints. A toleration, applied to a Pod, means the Pod can, but is not required to, be scheduled onto nodes with matching taints.
1) Composition of a taint (Taint)
Using the kubectl taint command, you can set a taint on a Node. Once the taint is set, there is an exclusionary relationship between the Node and Pods: the Node can refuse to schedule Pods onto it, and can even evict Pods that already exist on it. A taint has the form: key=value:effect
Each taint has a key and a value as its label, where the value can be empty, and effect describes the taint's action. The taint effect currently supports the following three options:
- NoSchedule: K8S will not schedule Pods onto a Node with this taint
- PreferNoSchedule: K8S will try to avoid scheduling Pods onto a Node with this taint
- NoExecute: K8S will not schedule Pods onto a Node with this taint, and will also evict Pods already running on that Node
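For reference, a taint set this way shows up in the Node object roughly as follows (a sketch; key1/value1 match the commands below):

spec:
  taints:
  - key: key1
    value: value1
    effect: NoSchedule   # one of NoSchedule, PreferNoSchedule, NoExecute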
2) Setting, viewing, and removing taints
# View a node's taints (look for the Taints field)
kubectl describe node node-name
# Set a taint
kubectl taint nodes node1 key1=value1:effect
# Remove a taint (note the trailing "-")
kubectl taint nodes node1 key1=value1:effect-
A tainted Node has a mutually exclusive relationship with Pods, based on the taint's effect, so Pods will to some extent not be scheduled onto that Node. However, a toleration can be set on a Pod: a Pod with a matching toleration can tolerate the taint and be scheduled onto the tainted Node.
**Toleration configuration:**
spec:
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoExecute"
    tolerationSeconds: 3600
  - key: "key2"
    operator: "Exists"
    effect: "NoSchedule"
Description:
- The key, value, and effect must be consistent with the taint set on the Node
- When operator is Exists, the value is ignored
- tolerationSeconds: when the Pod needs to be evicted, how long it can continue to run on the Node
① If no key is specified, all taint keys are tolerated:
tolerations:
- operator: "Exists"
② If no effect is specified, all taint effects are tolerated:
tolerations:
- key: "key"
  operator: "Exists"
③ If multiple masters exist, you can set the following parameters to prevent resource waste
kubectl taint nodes Node-Name node-role.kubernetes.io/master=:PreferNoSchedule
2.4 Specifying a scheduling Node
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 7
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeName: worker1
      nodeSelector:
        type: theSelected
      containers:
      - name: myweb
        image: hub.hc.com/library/myapp:v1
        ports:
        - containerPort: 80
Description:
- spec.nodeName: schedules the Pod directly onto the specified Node, skipping the Scheduler; the matching rule is forced matching
- spec.nodeSelector: selects nodes through the K8S label-selector mechanism; the scheduler's policy matches the labels and then schedules the Pod to the target node; the matching rule is also mandatory
- Label a Node with: kubectl label node worker1 type=theSelected
That's all for this one. Xiao Liu is off now.