Introduction: OpenKruise, the open source cloud native application automation management suite from Alibaba Cloud and a CNCF Sandbox project, has released v0.10.0, which will be the last minor version of OpenKruise before v1.0. This article walks through the new changes in v0.10.0, including major new features such as WorkloadSpread and PodUnavailableBudget.


Background

OpenKruise, the open source cloud native application automation management suite from Alibaba Cloud and a CNCF Sandbox project, has released v0.10.0, which will be the last minor version of OpenKruise before v1.0.

This article walks through the new changes in v0.10.0, including major new features such as WorkloadSpread and PodUnavailableBudget.

An overview of new features

1. WorkloadSpread: bypass elastic topology management for applications

In application deployment and O&M scenarios, topology spreading and elasticity are common requirements. The most common and basic of these is spreading at one or several topology levels, such as:

  • Spreading an application across nodes to avoid stacking too many Pods on one node (improving disaster recovery capability)
  • Spreading an application across availability zones (AZs) to improve disaster recovery capability

These basic requirements can be met through capabilities that Kubernetes provides natively, such as Pod (anti-)affinity and Topology Spread Constraints. However, real production scenarios bring many more complex spreading and elasticity requirements. Here are some practical examples:

  • Spread the Pods of an application across zones A, B, and C in a fixed ratio such as 1:1:2 (for example, because the application's traffic is unbalanced across zones).
  • When the topology contains multiple zones or machine models, deploy to a preferred zone or model first, and only deploy to the next zone or model when resources are insufficient (and so on). When scaling down, delete the Pods in the non-preferred zones or models first (and so on).
  • When both basic and elastic node pools exist, deploy a fixed number or proportion of Pods to the basic node pool, and deploy the remaining Pods to the elastic node pool.

In the past, such requirements could only be met by splitting one application into multiple workloads (such as multiple Deployments). That roughly addresses scenarios such as different ratios, scaling priorities, resource awareness, and elastic selection across topologies, but it still requires deep customization in the PaaS layer to manage the multiple workloads of a single application in a refined way.

To solve these problems, Kruise v0.10.0 adds the WorkloadSpread resource. It currently supports the Deployment, ReplicaSet, and CloneSet workload types and manages the spreading and elastic topology of their Pods.

Here is a simplified example:

apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
  name: workloadspread-demo
spec:
  targetRef:
    apiVersion: apps/v1 | apps.kruise.io/v1alpha1
    kind: Deployment | CloneSet
    name: workload-xxx
  subsets:
  - name: subset-a
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - zone-a
    maxReplicas: 10 | 30%
  - name: subset-b
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - zone-b

A WorkloadSpread is associated with a workload object through targetRef. When the workload scales up, Kruise injects the topology rules of the matching subset defined above into each new Pod. This is a bypass style of injection and management: WorkloadSpread itself does not interfere with the workload's Pod scaling and release management.

Note: WorkloadSpread controls scale-down priority through Pod Deletion Cost:

  • If the workload type is CloneSet, this feature is already supported, so scale-down priority works directly
  • If the workload type is Deployment/ReplicaSet, Kubernetes version >= 1.21 is required, and on 1.21 the PodDeletionCost feature-gate must be enabled on kube-controller-manager

To use WorkloadSpread, enable the WorkloadSpread feature-gate when installing or upgrading Kruise v0.10.0.

The above example shows only the simplest configuration. For more details, please refer to the official documentation. We will share the implementation details in subsequent articles.

2. PodUnavailableBudget: Application availability defense

The Pod Disruption Budget (PDB) provided by Kubernetes natively ensures high availability by limiting the number of pods that can be disrupted at the same time.

However, in many scenarios, even with PDB protection, services will be interrupted and degraded. For example:

  • The application owner is upgrading the version through Deployment. Meanwhile, the cluster administrator is scaling down nodes due to low machine resource utilization
  • The middleware team is updating versions of sidecar in the cluster (e.g., ServiceMesh Envoy) in place with SidecarSet, while the HPA is scaling down the same set of applications
  • The application owner and middleware team are upgrading the same batch of pods, taking advantage of CloneSet and SidecarSet’s ability to upgrade in place

The reason is intuitive: PDB can only protect against Pod eviction triggered by the Eviction API (e.g. kubectl drain evicting all Pods on a node). It cannot defend against many other operations, such as Pod deletion and in-place upgrade.

The PodUnavailableBudget (PUB) feature added in Kruise V0.10.0 is an enhanced extension of the native PDB. It incorporates PDB’s own capabilities and builds on them with additional protection against more Voluntary Disruption operations, including but not limited to Pod deletion, in-place upgrades, and more.

apiVersion: apps.kruise.io/v1alpha1
kind: PodUnavailableBudget
metadata:
  name: web-server-pub
  namespace: web
spec:
  targetRef:
    apiVersion: apps/v1 | apps.kruise.io/v1alpha1
    kind: Deployment | CloneSet | StatefulSet | ...
    name: web-server
  # selector can be used instead of targetRef:
  # selector:
  #   matchLabels:
  #     app: web-server
  # guarantee the maximum number of unavailable Pods
  maxUnavailable: 60%
  # minAvailable: 40%

To use PodUnavailableBudget, enable the corresponding feature-gates when installing or upgrading Kruise v0.10.0 (you can enable either one or both):

  • PodUnavailableBudgetDeleteGate: intercepts and protects against Pod deletion, eviction, and similar operations
  • PodUnavailableBudgetUpdateGate: intercepts and protects against Pod update operations such as in-place upgrade

For more instructions, please refer to the official website documentation. The specific implementation principle will be shared with you in subsequent articles.

3. CloneSet supports scale-down based on topology rules

When CloneSet replicas are reduced, it selects the Pods to delete according to a fixed order (in each rule, the side on the left of "<" is deleted first):

  1. Unscheduled < scheduled
  2. PodPending < PodUnknown < PodRunning
  3. Not ready < ready
  4. Smaller pod-deletion cost < larger pod-deletion cost
  5. Larger spread weight < smaller spread weight
  6. Ready for a shorter time < ready for a longer time
  7. More container restarts < fewer container restarts
  8. Created more recently < created earlier

Rule 4 is the feature introduced in Kruise v0.9.0 to let users specify the deletion order (WorkloadSpread uses it to implement scale-down priority). Rule 5 is new in v0.10.0: when scaling down, CloneSet sorts Pods according to the application's topology.
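As an illustration of rule 4, the upstream `controller.kubernetes.io/pod-deletion-cost` annotation can be set on individual Pods. The following is a minimal sketch; the Pod name, image, and cost value are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-server-0
  annotations:
    # a lower deletion cost means this Pod is deleted
    # earlier when the workload scales down
    controller.kubernetes.io/pod-deletion-cost: "-100"
spec:
  containers:
  - name: web
    image: nginx:1.21
```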

  • If the application has Topology Spread Constraints configured, CloneSet scale-down will select the Pods to delete according to that topology dimension (for example, trying to keep the number of Pods balanced across multiple zones).
  • If the application has no topology spread constraints, by default CloneSet scale-down selects Pods based on spreading across the node dimension (minimizing the number of Pods stacked on the same node).
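For reference, a CloneSet template with a topology spread constraint that scale-down could take into account might look like the following sketch; the names, labels, and values are illustrative:

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: cloneset-demo
spec:
  replicas: 6
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      # scale-down can sort Pods by this topology,
      # trying to keep zones balanced while deleting Pods
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: demo
      containers:
      - name: app
        image: nginx:1.21
```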

4. Advanced StatefulSet supports streaming scale-up

To avoid ending up with a large number of failed Pods after a new Advanced StatefulSet is created, Kruise v0.10.0 introduces a maxUnavailable strategy in scaleStrategy:

apiVersion: apps.kruise.io/v1beta1
kind: StatefulSet
spec:
  # ...
  replicas: 100
  scaleStrategy:
    maxUnavailable: 10% # percentage or absolute number

When this field is set, Advanced StatefulSet ensures that the number of unavailable Pods during Pod creation does not exceed this limit.

For example, the StatefulSet above will initially create only 10 Pods at once. After that, a new Pod is created each time an existing Pod becomes running and ready.

Note: This feature can only be used on StatefulSets whose podManagementPolicy is `Parallel`.
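Putting the note together with the earlier example, a minimal streaming scale-up configuration might look like the following sketch; the names and image are illustrative:

```yaml
apiVersion: apps.kruise.io/v1beta1
kind: StatefulSet
metadata:
  name: sample
spec:
  replicas: 100
  # required for the maxUnavailable strategy to take effect
  podManagementPolicy: Parallel
  serviceName: sample
  selector:
    matchLabels:
      app: sample
  scaleStrategy:
    maxUnavailable: 10% # percentage or absolute number
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: app
        image: nginx:1.21
```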

5. Other changes

In addition to the above, some other changes are as follows:

  • SidecarSet added fields such as imagePullSecrets and injectionStrategy.paused, to support pulling sidecar images with image pull secrets and pausing sidecar injection
  • Advanced StatefulSet supports image pre-download (preheating) for in-place upgrade
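A SidecarSet using the two new fields mentioned above might look like the following sketch; the selector, sidecar container, registry, and secret name are illustrative, and the field paths are assumed to sit directly under spec:

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: sidecarset-demo
spec:
  selector:
    matchLabels:
      app: web-server
  containers:
  - name: envoy-sidecar
    image: private-registry.example.com/envoy:v1.0
  # secret used to pull the sidecar image from a private registry
  imagePullSecrets:
  - name: my-registry-secret
  # pause injection into newly created Pods
  injectionStrategy:
    paused: true
```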

For details, see the ChangeLog documentation.

Finally

v0.10.0 will be the last minor release of OpenKruise before v1.0, and Kruise will release its first major version, v1.0, before the end of the year. Stay tuned!


This article is original content from Alibaba Cloud and may not be reproduced without permission.