1.1 Introduction to the scheduler

Xiao Liu is here to show off a little: today we will learn about the K8s scheduler.

The Scheduler is the scheduler of Kubernetes. Its main task is to assign defined Pods to nodes in the cluster. The following issues need to be considered:

  • Fairness: ensure that every node has a chance to be allocated resources
  • Efficient resource utilization: cluster resources are used to the maximum extent
  • Efficiency: scheduling performance is good, so large batches of Pods can be scheduled quickly
  • Flexibility: users can control the scheduling logic according to their own needs

The Scheduler runs as a separate program. It connects to the apiserver, watches for Pods whose spec.nodeName is empty, and creates a binding for each such Pod indicating which node the Pod should be placed on.
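
What the scheduler produces can be pictured as a core v1 Binding object. The minimal sketch below is only an illustration; the Pod and node names are hypothetical:

apiVersion: v1
kind: Binding
metadata:
  name: nginx-demo        # name of the Pod being bound (hypothetical)
  namespace: default
target:
  apiVersion: v1
  kind: Node
  name: worker1           # node chosen for the Pod (hypothetical)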

1.2 Scheduling Process

Scheduling is divided into several parts: first, nodes that do not satisfy the conditions are filtered out (the predicate stage); then the remaining nodes are ranked by priority (the priorities stage) and the best one is chosen.

The predicate stage uses a series of algorithms, including:

  • PodFitsResources: whether the remaining resources on the node are greater than the resources the Pod requests (see the example after this list)
  • PodFitsHost: if the Pod specifies NodeName, check whether the node's name matches that NodeName
  • PodFitsHostPorts: whether the ports already used on the node conflict with the ports the Pod requests
  • PodSelectorMatches: filter out nodes that do not match the labels specified by the Pod
  • NoDiskConflict: the volumes already mounted on the node do not conflict with the volumes specified by the Pod, unless both are read-only
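
For example, PodFitsResources compares the resources a Pod requests with what is still free on each node. A minimal sketch of such a request section, with purely illustrative values:

apiVersion: v1
kind: Pod
metadata:
  name: request-demo              # hypothetical Pod name
spec:
  containers:
  - name: app
    image: hub.hc.com/library/myapp:v1
    resources:
      requests:
        cpu: "500m"               # the node must have at least 0.5 CPU unrequested
        memory: "256Mi"           # and at least 256Mi of memory unrequested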

If no node passes the predicate process, the Pod remains in the Pending state, and scheduling is retried until some node meets the predicate conditions.

After this step, if more than one node still meets the criteria, the priorities process continues: the nodes are sorted by priority. A priority consists of a series of key-value pairs, where the key is the name of the priority item and the value is its weight. These priority options include:

  • LeastRequestedPriority: the weight is determined by calculating the CPU and Memory utilization of the node; the lower the utilization, the higher the weight. In other words, this priority favors nodes with a lower proportion of resources in use
  • BalancedResourceAllocation: the closer the node's CPU and Memory usage rates are to each other, the higher the weight. This should be used together with the one above, not on its own
  • ImageLocalityPriority: favors nodes that already have the required images; the larger the total size of those images, the higher the weight

All priority items and their weights are combined by the algorithm to produce the final score for each node.
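
As a purely illustrative calculation (scores and weights made up): if a node scores 8 on LeastRequestedPriority with weight 1 and 6 on BalancedResourceAllocation with weight 1, its final score is 8 × 1 + 6 × 1 = 14, and the node with the highest final score is selected.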

1.3 Customizing a Scheduler

In addition to the K8S built-in scheduler, you can provide a custom scheduler. A scheduler is selected for Pod scheduling by specifying its name in the spec.schedulerName field. For example, the following Pod selects my-scheduler for scheduling instead of the default default-scheduler:

apiVersion: v1
kind: Pod
metadata:
  name: annotation-second-scheduler
  labels:
    name: multischeduler-example
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-with-second-annotation-container
    image: gcr.io/google_containers/pause:2.0

2.1 Node affinity

Node affinity is configured through spec.affinity.nodeAffinity:

  • preferredDuringSchedulingIgnoredDuringExecution: soft strategy (preferred, but not mandatory)
  • requiredDuringSchedulingIgnoredDuringExecution: hard strategy (must be satisfied)

The key-value operators are as follows (an illustrative snippet follows the list):

  • In: the label's value is in a given list
  • NotIn: the label's value is not in a given list
  • Gt: the label's value is greater than a given value
  • Lt: the label's value is less than a given value
  • Exists: the label exists
  • DoesNotExist: the label does not exist
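
As a small sketch of one of these operators, the snippet below shows just the affinity portion of a Pod spec using NotIn; the label key and value are hypothetical:

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype         # hypothetical node label
            operator: NotIn       # avoid nodes whose disktype is in the list below
            values:
            - hdd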

Soft strategy:

[root@master schedule]
apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.hc.com/library/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker3
[root@master schedule]# kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
affinity                 1/1     Running   0          39s   10.244.2.92   worker2   <none>           <none>

Hard strategy:

[root@master schedule]
apiVersion: v1
kind: Pod
metadata:
  name: affinity2
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.hc.com/library/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker3

[root@master schedule]# kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
affinity2                  0/1     Pending   0          23s   <none>        <none>    <none>           <none>

[root@master schedule]# kubectl describe pod affinity2
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  49s   default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.


2.2 Pod affinity

Pod affinity is configured through spec.affinity.podAffinity/podAntiAffinity:

  • preferredDuringSchedulingIgnoredDuringExecution: soft strategy (preferred, but not mandatory)
  • requiredDuringSchedulingIgnoredDuringExecution: hard strategy (must be satisfied)

[root@master schedule]
apiVersion: v1
kind: Pod
metadata:
  name: pod-2
  labels:
    app: pod-2
spec:
  containers:
  - name: pod-2
    image: hub.hc.com/library/myapp:v1
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - pod-1
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - pod-2
          topologyKey: kubernetes.io/hostname

[root@master schedule]# kubectl get pod -o wide
pod-2   0/1     Pending   0          4s    <none>        <none>    <none>           <none>

[root@master schedule]# kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
pod-1                     1/1     Running   0          5s    10.244.2.94   worker2   <none>           <none>

The affinity and anti-affinity scheduling policies are compared as follows:

  • nodeAffinity: matches node labels; operators In, NotIn, Exists, DoesNotExist, Gt, Lt; does not support topology domains; schedules the Pod to the specified host
  • podAffinity: matches Pod labels; operators In, NotIn, Exists, DoesNotExist; supports topology domains; schedules the Pod into the same topology domain as the specified Pod
  • podAntiAffinity: matches Pod labels; operators In, NotIn, Exists, DoesNotExist; supports topology domains; schedules the Pod into a different topology domain from the specified Pod

2.3 Taint and Toleration

Node affinity is a property of Pods (a preference or a hard requirement) that attracts them to a particular class of nodes. A taint, by contrast, enables a node to repel a particular class of Pods.

Taints and tolerations work together to prevent Pods from being scheduled onto inappropriate nodes. One or more taints can be applied to each node, which means that Pods that do not tolerate these taints will not be accepted by the node. A toleration, applied to a Pod, means the Pod can, but is not required to, be scheduled onto a node with a matching taint.

1) Composition of a taint (Taint)

The kubectl taint command can be used to set a taint on a Node. Once a taint is set, there is an exclusive relationship between the Node and Pods: the Node can refuse to schedule Pods onto it, and can even evict Pods that are already running on it. Each taint is composed as follows: key=value:effect

Each taint has a key and a value as its label, where value can be empty, and effect describes what the taint does. The effect currently supports the following three options:

  • NoSchedule: K8S will not schedule the Pod onto a Node with this taint
  • PreferNoSchedule: K8S will try to avoid scheduling the Pod onto a Node with this taint
  • NoExecute: K8S will not schedule the Pod onto a Node with this taint, and will also evict Pods already running on that Node (see the example after this list)
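
As a concrete sketch (the key, value, and node name are made up), a NoExecute taint both blocks new Pods that lack a matching toleration and evicts Pods already running on the node:

# hypothetical key=value and node name
kubectl taint nodes worker1 check=ha:NoExecute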

Setting, viewing, and removing taints:


# View the taints on a node
kubectl describe node node-name

# Set a taint
kubectl taint nodes node1 key1=value1:effect

# Remove a taint (note the trailing minus sign)
kubectl taint nodes node1 key1=value1:effect-

A tainted Node and Pods are in a mutually exclusive relationship determined by the taint's effect, so Pods will, to a certain extent, not be scheduled onto that Node. However, a toleration can be set on a Pod; a Pod with a matching toleration can tolerate the taint and be scheduled onto the tainted Node.

Toleration configuration:

spec:
  tolerations:
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoSchedule"
      tolerationSeconds: 3600
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoExecute"
    - key: "key2"
      operator: "Exists"
      effect: "NoSchedule"

Description:

  • The key, value, and effect must be consistent with the taint set on the Node
  • If operator is Exists, the value field is ignored
  • tolerationSeconds: when the Pod needs to be evicted, how long it may keep running on the Node before it is evicted

① If no key is specified, all taint keys are tolerated:

tolerations:
- operator: "Exists"

② If no effect is specified, all taint effects are tolerated:

tolerations:
- key: "key"
  operator: "Exists"

③ If there are multiple masters, you can set the following taint to avoid wasting resources:

kubectl taint nodes Node-Name node-role.kubernetes.io/master=:PreferNoSchedule

2.4 Specifying a scheduling Node

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 7
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeName: worker1
      nodeSelector:
        type: theSelected
      containers:
      - name: myweb
        image: hub.hc.com/library/myapp:v1
        ports:
        - containerPort: 80

Description:

  • spec.nodeName: schedules the Pod directly onto the specified Node, skipping the Scheduler entirely; the match is forced
  • spec.nodeSelector: selects nodes through the K8S label-selector mechanism; the scheduler's policy matches the labels and then schedules the Pod onto the target node; the match is also mandatory
  • Label the Node with: kubectl label node worker1 type=theSelected (see the verification commands after this list)
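
A short sketch of labeling the node and checking the result, assuming the node is named worker1 as in the manifest above:

# add the label used by the nodeSelector above
kubectl label node worker1 type=theSelected

# confirm the label is present
kubectl get node worker1 --show-labels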

That's all for today. Xiao Liu is off now.