Background

Before K8s 1.18, the HPA's scaling sensitivity could not be adjusted:

  1. For scale-down, the kube-controller-manager flag --horizontal-pod-autoscaler-downscale-stabilization controls the scale-down stabilization window (see the sketch after this list). The default is 5 minutes, i.e. after the load drops, scale-down happens at least 5 minutes later.
  2. For scale-up, the HPA controller's fixed algorithm and hard-coded constants control the scale-up speed, which cannot be customized.
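
For reference, here is a minimal sketch of where that global flag lives, assuming a kubeadm-style cluster where kube-controller-manager runs as a static Pod (the manifest path and image tag below are illustrative):

# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - name: kube-controller-manager
    image: k8s.gcr.io/kube-controller-manager:v1.17.3 # illustrative tag
    command:
    - kube-controller-manager
    # lengthen the cluster-wide scale-down window (default 5m0s)
    - --horizontal-pod-autoscaler-downscale-stabilization=10m0s

Because this is a cluster-wide setting, changing it affects every workload, which is exactly the limitation the 1.18 behavior field removes.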

As a result, users cannot customize how sensitively the HPA scales up and down. Different service scenarios may have different sensitivity requirements, for example:

  1. For critical services with bursty traffic, scale up quickly when needed (even a little more than necessary, just in case), but scale down slowly (in case another traffic spike arrives soon after).
  2. For offline services that process large amounts of data, scale up as fast as possible to shorten processing time, and scale down as soon as resources are no longer needed to save cost.
  3. For services that handle regular data or network traffic, scale up and down in a more gradual fashion to reduce jitter.

The HPA was updated in K8s 1.18, adding scaling sensitivity controls to the existing v2beta2 API; the API version number itself remains unchanged.

How to use

This update adds a new behavior field to the HPA spec; its scaleUp and scaleDown fields control scale-up and scale-down behavior respectively. For details, see the official API documentation: v1-18.docs.kubernetes.io/docs/refere…
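
The overall shape of the field: both scaleUp and scaleDown accept the same scaling-rules structure, with an optional stabilization window, a list of policies, and a selectPolicy that chooses among matching policies. A skeleton with illustrative values:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0 # how long to consider past recommendations before acting
    selectPolicy: Max             # Max | Min | Disabled; which matching policy wins
    policies:
    - type: Percent               # Percent or Pods
      value: 100                  # maximum change allowed per period
      periodSeconds: 15           # length of one policy period (at most 1800s)
  scaleDown:
    stabilizationWindowSeconds: 300
    selectPolicy: Max
    policies:
    - type: Pods
      value: 4
      periodSeconds: 60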

Examples of usage scenarios are given below.

Fast scale-up

When your application needs to scale up rapidly, you can use an HPA configuration similar to the following:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  minReplicas: 1
  maxReplicas: 1000
  metrics:
  - type: Pods
    pods:
      metric:
        name: k8s_pod_rate_cpu_core_used_limit
      target:
        type: AverageValue
        averageValue: "80"
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  behavior:
    scaleUp:
      policies:
      - type: Percent
        value: 900 # allow adding up to 900% of the current replicas per period
        periodSeconds: 15 # length of one policy period (required by the API)

The above configuration allows up to 900% of the current replicas to be added at a time, i.e. the Pod count can jump to 10 times the current number in one step, but never beyond the maxReplicas limit.

If there is only one Pod at the beginning and traffic bursts, the workload scales up very quickly. The Pod count trends as follows during scale-up:

1 -> 10 -> 100 -> 1000

If no scaleDown policy is configured, scale-down starts only after the global default stabilization window (--horizontal-pod-autoscaler-downscale-stabilization, 5 minutes by default).
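
For reference, when behavior is omitted entirely, the defaults the controller applies are equivalent to roughly the following spec (based on the official docs; the scaleDown window actually takes its value from the flag above):

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100 # may remove all surplus replicas once the window expires
      periodSeconds: 15
  scaleUp:
    stabilizationWindowSeconds: 0
    selectPolicy: Max
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
    - type: Pods
      value: 4
      periodSeconds: 15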

Fast scale-up and slow scale-down

When the traffic peak passes, concurrency drops sharply, and with the default scale-down behavior the Pod count drops sharply a few minutes later. If another traffic peak arrives right after scale-down, scale-up can ramp up quickly but still takes some time; during that window the backend may lack capacity and some requests may fail. In this case you can add a scaleDown policy to the HPA. Example behavior configuration:

behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 900
      periodSeconds: 15
  scaleDown:
    policies:
    - type: Pods
      value: 1
      periodSeconds: 600 # remove at most 1 Pod every 10 minutes

Compared with the previous example, this adds a scaleDown configuration that removes at most one Pod every 10 minutes, greatly slowing scale-down (at this rate, shrinking from 1000 Pods back to 1 would take roughly a week). The Pod count trends as follows during scale-down:

1000 -> ... (10 min later) -> 999

This lets critical services keep enough processing capacity to absorb a second traffic burst arriving shortly after the first, preventing requests from failing during traffic peaks.

Slow scale-up

If your application is not critical and you want scale-up to be slow rather than sensitive, add a behavior like the following to the HPA:

behavior:
  scaleUp:
    policies:
    - type: Pods
      value: 1
      periodSeconds: 60 # add at most 1 Pod per minute

If there is only one Pod at the beginning, the Pod count changes as follows:

1 -> 2 -> 3 -> 4
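
You can also combine several policies and let selectPolicy choose between them. A sketch (values illustrative) that grows by at most 10% of the current Pods per minute, but never by more than 10 Pods, by picking whichever policy permits the smaller change:

behavior:
  scaleUp:
    selectPolicy: Min # apply whichever policy permits the smallest change
    policies:
    - type: Percent
      value: 10
      periodSeconds: 60
    - type: Pods
      value: 10
      periodSeconds: 60

With Min, the Percent policy limits growth while the workload is small, and the Pods policy caps absolute growth once it is large.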

Disable automatic scale-down

If the application is critical and you want to avoid automatic scale-down after scale-up, leaving scale-down decisions to manual intervention or a separate custom controller, you can disable automatic scale-down by setting selectPolicy to Disabled:

behavior:
  scaleDown:
    selectPolicy: Disabled

Extend the scale-down window

The default scale-down stabilization window is 5 minutes (--horizontal-pod-autoscaler-downscale-stabilization). If we need a longer window to avoid overreacting to short traffic glitches, we can set the scale-down window explicitly. Example behavior configuration:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 600 # wait 10 minutes before starting to scale down
    policies:
    - type: Pods
      value: 5 # remove at most 5 Pods per period
      periodSeconds: 60

The above example means that when the load drops, the HPA waits 600s (10 minutes) before scaling down, removing at most five Pods at a time.

Extend the scale-up window

Some applications see frequent scale-up caused by short data glitches, yet the extra Pods are unnecessary and waste resources. For example, in a data-processing pipeline, the scaling metric is the number of events in the queue. When a large number of events pile up, we want to scale up quickly, but not too sensitively, because events may accumulate only briefly and can be drained quickly even without scaling up.

The default scale-up algorithm reacts within a very short time. In this scenario, we can add a scale-up stabilization window to avoid wasting resources on glitches. Example behavior configuration:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 300 # wait 5 minutes before scaling up
    policies:
    - type: Pods
      value: 20 # add at most 20 Pods per period
      periodSeconds: 60

The above example means that scale-up first waits through a 5-minute window; if the load falls back during that time, no scale-up happens. If the load stays above the scale-up threshold, at most 20 Pods are added per scale-up.

Summary

This article showed how to use the new HPA feature in K8s 1.18 to control scaling sensitivity and better meet the scale-up and scale-down speed requirements of different scenarios.

References

  • HPA introduction: kubernetes.io/docs/tasks/…