Preface

Today I came across something new called “Descheduler”, found some very interesting uses for it, and then tried it out myself.

Everyone is familiar with kube-scheduler. In a Kubernetes environment it is an essential component on the master node and plays a key role: it watches for newly created Pods that have not yet been assigned a node and, following a set of scheduling policies, binds each of them to the most suitable node.

The built-in Scheduler faces problems

From kube-scheduler's point of view it does a perfect job: through its various algorithms it computes the best node on which to run a Pod. When a new Pod is scheduled, the scheduler makes the best decision it can based on its view of the cluster's resources at that moment. However, a Kubernetes cluster is very dynamic. For example, when a node is drained for maintenance, all Pods on that node are evicted to other nodes, but once maintenance is finished those Pods do not automatically move back, because a Pod is never rescheduled once it has been bound to a node. Changes like this can leave the cluster unbalanced for a period of time, so an equalizer is needed to rebalance it.
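As a rough sketch of the maintenance scenario above (the node name my-node is only a placeholder):

# Evict all Pods from the node before maintenance
kubectl drain my-node --ignore-daemonsets
# ... perform the maintenance ...
# Make the node schedulable again; note that the evicted Pods do NOT move back by themselves
kubectl uncordon my-node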

RemoveDuplicates: This policy ensures that only one Pod associated with the same RS, RC, Deployment, or Job runs on a given node. If there are more, the duplicate Pods are evicted so that Pods are spread better across the cluster. This situation can occur when some nodes go down for some reason and their Pods drift to other nodes, leaving multiple Pods belonging to the same RS or RC on one node; once the failed node is ready again, this strategy can be enabled to evict the duplicated Pods. The policy currently has no parameters, and it can be disabled simply by setting it to false.

RemovePodsViolatingInterPodAntiAffinity: This policy ensures that Pods violating inter-pod anti-affinity rules are removed from the node.

LowNodeUtilization: This policy finds underutilized nodes and, where possible, evicts Pods from other, over-utilized nodes, in the hope that the evicted Pods will be recreated on the underutilized nodes, evening out the load across all nodes in the cluster.

Official note: github.com/kubernetes-…

Balanced Pod distribution across the cluster is implemented with a scheduled task (a CronJob). The full manifest is as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemoveDuplicates":
         enabled: true
      "RemovePodsViolatingInterPodAntiAffinity":
         enabled: true
      "LowNodeUtilization":
         enabled: true
         params:
           nodeResourceUtilizationThresholds:
             thresholds:
               "cpu" : 20
               "memory": 20
               "pods": 20
             targetThresholds:
               "cpu" : 30
               "memory": 30
               "pods": 30

---

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: descheduler-cronjob
  namespace: kube-system
spec:
  schedule: "*/2 * * * *"
  concurrencyPolicy: "Forbid"
  jobTemplate:
    spec:
      template:
        metadata:
          name: descheduler-pod
        spec:
          priorityClassName: system-cluster-critical
          containers:
          - name: descheduler
            image: hzjulius/descheduler:v0.10.0
            # Originally the image us.gcr.io/k8s-artifacts-prod/descheduler/descheduler:v0.10.0 was used,
            # but it cannot be pulled from behind the firewall, so it was re-tagged and pushed to a personal repository.
            volumeMounts:
            - mountPath: /policy-dir
              name: policy-volume
            command:
              - "/bin/descheduler"
            args:
              - "--policy-config-file"
              - "/policy-dir/policy.yaml"
              - "--v"
              - "3"
          restartPolicy: "Never"
          serviceAccountName: descheduler-sa
          volumes:
          - name: policy-volume
            configMap:
              name: descheduler-policy-configmap
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: descheduler-cluster-role
  namespace: kube-system
rules:
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create"."update"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get"."watch"."list"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get"."watch"."list"."delete"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: descheduler-sa
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: descheduler-cluster-role-binding
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: descheduler-cluster-role
subjects:
  - name: descheduler-sa
    kind: ServiceAccount
    namespace: kube-system
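
All of the above can be saved to one file and applied in a single step; the file name descheduler.yaml below is only an example:

kubectl apply -f descheduler.yaml
# Verify that the CronJob was created
kubectl -n kube-system get cronjob descheduler-cronjob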


Test

Deploy a service with all of its Pods running on a single node, as in the example below.
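A minimal way to reproduce this, assuming a plain nginx Deployment (the names are only examples):

# Cordon the other nodes first so that every replica lands on the same node
kubectl create deployment nginx-test --image=nginx
kubectl scale deployment nginx-test --replicas=6
# Confirm that all replicas are sitting on one node
kubectl get pods -o wide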

After the CronJob starts, check the scheduled task, as shown below.
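Something like the following can be used to check it (the Job names are generated by the CronJob):

kubectl -n kube-system get cronjob descheduler-cronjob
kubectl -n kube-system get jobs
kubectl -n kube-system get pods | grep descheduler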

You can see that many Pods are automatically evicted once the scheduled task starts; they are then rescheduled and distributed evenly across the nodes.
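A quick way to confirm the redistribution, assuming the example Deployment above:

kubectl get pods -o wide
# The nginx-test replicas should now be spread across several nodes instead of one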

Check the Descheduler's logs to see the details of each eviction decision.
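For example, the logs of the most recent descheduler Job Pod can be inspected like this (the Pod name is generated, so replace the placeholder with the name returned by the first command):

kubectl -n kube-system get pods | grep descheduler
kubectl -n kube-system logs <descheduler-pod-name>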