Authors

Qi Tian, senior engineer at Tencent Cloud, focuses on large-scale online/offline colocation, elastic scaling, and cloud-native cost optimization; he is deeply familiar with Kubernetes and concentrates on cloud-native big data and AI.

Wang Xiaowei, product manager for Tencent Cloud Container Service, is dedicated to helping customers use Kubernetes efficiently and to delivering the best possible cost-reduction and efficiency services.

Preface

With the popularity of Kubernetes, enterprises have broadly embraced containers and are moving toward cloud native. But Kubernetes today only solves the first step of the cloud-native journey (Day 1): using container scheduling, declarative APIs, and similar mechanisms to address resource acquisition, application deployment, high availability, disaster recovery, and basic operations. Enterprises that adopt Kubernetes then run into the problems of the advanced stage: operating a huge, complex Kubernetes cluster is very difficult. For example:

  • How are resource needs assessed? Native Kubernetes scheduling relies on each container's resource Request, but how many resources does a container actually need? How large should the cluster's total resource capacity be? What is the right number of nodes?
  • How can scaling be made truly intelligent? How do you elastically handle traffic with peaks and troughs? If you configure HPA, which metric should you use, and over what range should the replica count vary?
  • How do I view the cost of containers? Today you can see a cluster's resource usage and utilization, but what do those resources cost? A Kubernetes cluster may consume nodes, disks, networks, load balancers, and other resources; how do you aggregate these scattered resource bills and display them along different dimensions?
  • How do I improve resource efficiency? How do I identify workloads whose resource requests are invalid or unreasonable? How do I choose the most suitable node specification and billing type?
  • …
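One common answer to the first question, how much Request a container needs, is to derive it from historical usage rather than guesswork. The sketch below is a minimal illustration of that idea; the function name, percentile, and headroom factor are my own assumptions, not ProphetPilot's algorithm:

```python
def recommend_request(samples, percentile=0.95, headroom=1.15):
    """Recommend a container CPU Request (millicores) from historical
    usage samples: take a high percentile of observed usage and add
    headroom, instead of trusting a hand-written guess."""
    if not samples:
        raise ValueError("no usage samples")
    ordered = sorted(samples)
    idx = int(percentile * (len(ordered) - 1))
    return round(ordered[idx] * headroom)

# hourly CPU usage (millicores) for a container that hovers around 200m
usage = [180, 190, 200, 210, 220, 200, 195, 400, 205, 210]
print(recommend_request(usage))
```

In practice the percentile and headroom would be tuned per workload class: latency-sensitive services get a higher percentile, batch jobs a lower one.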

These issues need to be addressed in the second step of Kubernetes operations (Day 2):

TKE has always been committed to reducing users' operational complexity, adding an intelligent engine to Kubernetes and easing customers' operational burden. In previous installments of this series, we covered cost control systems, common utilization-improvement tools, analysis and understanding of resource utilization patterns, and application elasticity. Based on user requirements, TKE now proposes the container intelligence service ProphetPilot. This article explains the concepts behind ProphetPilot in terms of its background, product features, and layered model.

Background and current status

TKE currently offers many components for elasticity, resource utilization, cost savings, and load-aware scheduling, such as HPA, HPC, VPA, CA, and online/offline colocation; for details, see the resource utilization improvement tools. Although TKE provides customers with this variety of cost-reduction and efficiency products, the following deficiencies remain:

  1. Faced with so many scaling components, users often find it hard to choose. HPA and CA are currently the most widely used; users are often afraid to use the others, or do not know when or why to use them.
  2. Some components are community components or native Kubernetes capabilities whose functionality is often not strong enough. HPA, for example, does not always sense scale-out and scale-in in a timely or accurate way, and may not meet customer needs.
  3. The components lack a coherent, unified experience. Each works in isolation, and some even conflict when used together; for example, VPA and HPA cannot both act on CPU or memory metrics at the same time.

The following figure compares the components along three dimensions, Observer, Analyzer, and Action:

  • Observer: monitors and collects the running state of the workload
  • Analyzer: analyzes the workload's running pattern. For example: (1) does the load's CPU usage change periodically? (2) does the load's traffic rise or fall at certain fixed times? (3) is the Request of the load's containers set reasonably?
  • Action: executes the best policy based on the analyzed running pattern. For the Analyzer examples above, case 1 might lead to recommending HPA, case 2 HPC, and case 3 VPA.
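The Analyzer's first check, whether CPU usage varies periodically, can be approximated with plain autocorrelation. A minimal sketch, assuming evenly spaced samples (the function names and threshold are mine, not ProphetPilot's):

```python
def autocorr(series, lag):
    """Normalized autocorrelation of an evenly spaced series at one lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a flat series has no periodic signal
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

def looks_periodic(series, period, threshold=0.5):
    """Heuristic: strong autocorrelation at the candidate period."""
    return autocorr(series, period) > threshold

# four days of 2-hourly CPU samples with a clear daily (12-sample) cycle
day = [10, 20, 40, 80, 90, 70, 50, 30, 20, 15, 12, 10]
print(looks_periodic(day * 4, period=12))
```

A production analyzer would test many candidate periods and use more robust statistics, but the decision it feeds ("periodic, so HPA/HPC applies") is the same.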

Feature overview

ProphetPilot implements its own Observer and Analyzer, which work across the elasticity and colocation components; in scenarios where open-source components fall short, TKE uses its own Executor to meet rapidly evolving business requirements:

Recommendation center

The recommendation center is the brain of cost management: it measures and weighs the relationship among cost, efficiency, and reliability. For example:

  • Is the current cluster suitable for online/offline colocation, and why? For instance, the center may observe that the metrics of most containers show periodic peaks and troughs while there is no steady, high-load offline container, and therefore suggest that the user adopt colocation to improve resource efficiency.

  • In the current cluster, if some containers' resource utilization is stable and low, the center recommends a VPA operation to change Request and Limit. If the workload is a rich container and the user does not want to accept an explicit Request/Limit change, it can recommend colocation directly instead: because the colocation agent modifies the cgroup, the change is invisible to users, yet it still improves resource utilization and allows more Pods to be scheduled.

  • Should a CA operation be performed under the current cluster load? Traditional CA simply reacts to pending Pods, judging resource shortage by Request; but the cluster's actual resource usage may be very low, in which case VPA is more appropriate. TKE will soon support in-place resource updates, since many enterprises cannot accept VPA rebuilding a Pod in order to change its Request.

  • Is HPA appropriate right now? Traditional HPA simply pulls data periodically from monitoring and reacts slowly to burst traffic, but the recommendation center, working with local sensing on the node and signals from other replicas, can trigger HPA actions quickly and achieve second-level scale-out for bursts. It can even use eBPF to trigger scale-out directly from events when a particular system call is observed to be excessive.
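Whatever triggers the action, the core HPA calculation is the one documented by Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A quick sketch of how a burst shows up in that formula:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    """The HPA scaling rule documented by Kubernetes:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 replicas targeting 50% average CPU; a burst pushes average CPU to 130%
print(desired_replicas(4, 130, 50))
```

The faster the controller observes the jump in the current metric, the sooner this formula yields the larger replica count, which is exactly where event-driven sensing beats periodic polling.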

  • For traditional PaaS platforms, such as DBA-managed database clusters, the applications have database-specific characteristics, and experienced DBAs have deep tuning experience with database features and parameters. We therefore allow PaaS platforms to customize the recommendation center's recommendation strategies, so that each platform can achieve the best business-level resource optimization, cost control, and stability guarantees.

  • For ordinary web services, computing services, and other workloads that are neither middleware nor databases, customers generally want to minimize operations work. To reduce their management complexity and operating costs, TKE combines real monitoring data and, following the layered model, recommends:

    1. Intelligent Request and Limit recommendations for Workloads;
    2. Cluster resource assessment: evaluating how many resources the cluster needs based on its historical load;
    3. A reasonable replica count based on the Workload's current QPS and load;
    4. Predictions of Node and Pod resource usage, fed back to the scheduler for smarter scheduling;
    5. Recommended combinations of cloud instances to purchase, giving the user the lowest price and most reasonable configuration.
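Item 3 above, recommending a replica count from QPS, is essentially a capacity division. A minimal sketch; the per-replica capacity, redundancy factor, and replica floor are hypothetical inputs, not TKE defaults:

```python
import math

def recommend_replicas(total_qps, qps_per_replica,
                       redundancy=1.2, min_replicas=2):
    """Replica count needed to serve total_qps with some redundancy,
    never dropping below a floor kept for availability."""
    needed = math.ceil(total_qps * redundancy / qps_per_replica)
    return max(needed, min_replicas)

# a service seeing 9000 QPS where one replica sustains about 800 QPS
print(recommend_replicas(total_qps=9000, qps_per_replica=800))
```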

Cost analysis

Cost analysis observes the cluster's spending from a cost perspective, because an out-of-the-box Kubernetes cluster only shows resource usage and cannot analyze or observe more concrete cost-dimension data. The main features include:

  1. Showing each business's cost, resource consumption, and resource efficiency
  2. Analyzing the top 10 resource-consuming businesses and driving them to optimize their resource usage
  3. Estimating next month's cost for each resource object in the cluster
  4. Viewing cost curves and comparing them against business growth curves
  5. Estimating the cost savings from adopting the recommendation center's suggestions
  6. Multi-dimensional observation: node / namespace / workload / Pod / container
  7. Multi-period observation: day / week / month / custom
  8. Generating periodic reports
  9. Retaining important historical data
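The multi-dimensional observation in point 6 boils down to joining each billing record with Pod metadata and grouping along the chosen dimension. A minimal sketch with made-up records (the record layout is my assumption for illustration):

```python
from collections import defaultdict

# each billing record: (namespace, workload, pod, cost)
bills = [
    ("shop",  "frontend", "frontend-1", 3.2),
    ("shop",  "frontend", "frontend-2", 3.1),
    ("shop",  "orders",   "orders-1",   5.0),
    ("infra", "logging",  "logging-1",  2.4),
]

def cost_by(records, dim):
    """Aggregate cost along one dimension: 0=namespace, 1=workload, 2=pod."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dim]] += record[3]
    return dict(totals)

print(cost_by(bills, 0))  # namespace view
print(cost_by(bills, 1))  # workload view
```

The real system must also apportion shared resources (nodes, disks, load balancers) across the Pods that use them before this grouping step, which is the harder part of the problem.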

Alerts and notifications

Whether or not policies are executed automatically, ProphetPilot raises warnings and alarms when it detects an incorrect configuration in the cluster or an action that needs to be taken. It also records every policy recommendation and executed action, so users can review historical events and what triggered them, making history easy to trace.

Policies

A policy defines how a recommendation from the recommendation center is executed. ProphetPilot provides a variety of policies, in four main types:

  1. Automatic execution policies: fully customizable execution, including user-defined trigger conditions. For example, an e-commerce order service demands high stability and can be configured with generous resource redundancy, so even low utilization is treated as normal; low-priority, latency-insensitive offline batch jobs, by contrast, can be held to a higher utilization standard.
  2. Scheduled and delayed policies: execute a policy at some point in the future, or delay a recommended action for a certain time after it is generated. For example, when scaling in nodes or workloads, slow down as much as possible to guard against traffic surging back.
  3. Periodic policies: like CronJobs, executed on a recurring schedule.
  4. Manual confirmation policies: ProphetPilot does not execute these automatically; when a recommended action is generated, an alarm notifies the customer, who executes the action manually.

Execution engine

The execution engine performs the concrete actions, such as HPA, VPA, and online/offline colocation. For details, see the resource utilization improvement tools.

Container intelligence layered model

Application layer

With the popularity of cloud native and microservice architectures, a single enterprise application often involves many services with subtle dependencies among them. Understanding the relationships among an application's microservices gives elastic scaling a more comprehensive view, avoiding the narrow local view in which scaling up one Workload achieves no real effect and merely wastes resources and money.

In the following example, the NGINX log write rate, the Kafka producer write rate, the log offset change rate, the consumption rate, and the ES indexing rate are all correlated. Finding the more critical metric, such as Kafka's production rate, gives a better scaling metric than CPU:
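Identifying which upstream metric tracks the load best, as in the Kafka example, can start from plain correlation over aligned samples. A sketch using the Pearson coefficient (the sample series are invented for illustration):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two aligned metric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# made-up aligned samples: Kafka producer rate vs. downstream consumer load
producer_rate = [100, 150, 220, 300, 280, 200]
consumer_load = [40, 60, 90, 120, 115, 80]
print(pearson(producer_rate, consumer_load))
```

A metric that correlates strongly with downstream load, and leads it in time, is a better scaling signal than the consumer's own CPU, which only rises after the backlog has already formed.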

ProphetPilot understands the different application characteristics of the application layer. Middleware, databases, and other common foundational products can all be classified as applications, and services themselves can be ranked into tiers by importance. A Workload dependency graph can thus be built at the application layer, enabling business-level scaling of the whole application rather than scaling each Workload in isolation.

Scheduling layer

Distributed Resource Scheduling

Kubernetes today only solves the first step of enterprise cloud native (Day 1): it lets enterprises use containers and Kubernetes scheduling to address resource acquisition, application deployment, high availability, disaster recovery, and operations. Its resource model, however, is still at an early stage. Developers must estimate how many resources a service needs and fill in the Request; Kubernetes then performs static resource scheduling based on that Request and places the Pod on a suitable Node.

Doing so will face the following problems:

  1. To keep services stable, developers tend to over-estimate resources, causing waste;

  2. Developers often do not know how to estimate, or skip estimation entirely; most developers cannot tell at a glance how many resources their service needs, which leads either to insufficient resources or to waste;

  3. Kubernetes schedules on static resources, so the resources containers actually use diverge from the static scheduling decision, causing waste or severe resource contention.

The common workaround today is overcommitting, generally along two dimensions:

  1. Node-dimension overcommit: a Node may physically have only 48 cores but be oversold twofold, presenting the scheduler with the illusion of 96 cores. Because compute power and memory are not matched identically across nodes, the CPU of compute-rich nodes can be oversold so that more Pods are scheduled onto them, improving resource allocation efficiency.

  2. Pod-dimension overcommit: essentially configuring the ratio between Limit and Request. Enterprises differ in whether the resources a user declares are guaranteed at the Limit or at the Request. In some enterprises, for example, Limit is what developers are guaranteed: developers fill in a Limit when applying for resources, and the container platform converts the Limit into a Request by an overcommit ratio, improving allocation efficiency. In the cloud-native model, Limit and Request are exposed directly to users: in Kubernetes, Request is the minimum amount of resources guaranteed to the container (because Kubernetes schedules by Request), while Limit is the upper bound on how many resources it can use.
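Both overcommit dimensions reduce to simple ratios. A sketch; the factors shown are illustrative, not TKE defaults:

```python
def node_schedulable_cpu(physical_cores, oversell_factor):
    """Node-dimension overcommit: CPU capacity presented to the scheduler."""
    return physical_cores * oversell_factor

def request_from_limit(limit_millicores, overcommit_ratio):
    """Pod-dimension overcommit: derive the scheduled Request from a
    user-declared Limit via a Limit:Request ratio."""
    return limit_millicores // overcommit_ratio

print(node_schedulable_cpu(48, 2))   # a 48-core node sold as 96 cores
print(request_from_limit(2000, 4))   # a 2000m Limit scheduled as a 500m Request
```

The risk in both cases is the same: if every overcommitted container peaks at once, the physical node is short of CPU, which is why the choice of factor has to be informed by real usage patterns.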

The root cause of these problems is that the system schedules on static resources rather than dynamic ones. If we scheduled by the container's actual Usage rather than its declared Request, we could achieve true pay-per-use, which is what real Serverless means by pay-as-you-go.

ProphetPilot computes, from monitoring data, an accurate picture of each container's resource consumption and recommends an appropriate allocation, bringing the container's Request close to its Usage and letting the scheduler schedule against Usage.
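Scheduling against Usage instead of Request can be pictured as a scoring change: rank nodes by predicted real usage after placement rather than by the sum of declared Requests. A toy sketch under that assumption (the node names and prediction numbers are invented):

```python
def pick_node(nodes, pod_predicted_usage):
    """Place a Pod on the node whose predicted real usage after placement
    is lowest, skipping nodes the Pod's predicted usage would overflow.
    Each node: (name, capacity_millicores, predicted_used_millicores)."""
    candidates = [
        (used + pod_predicted_usage, name)
        for name, capacity, used in nodes
        if used + pod_predicted_usage <= capacity
    ]
    return min(candidates)[1] if candidates else None

nodes = [
    ("node-a", 4000, 3500),  # nearly full by predicted real usage
    ("node-b", 4000, 1200),  # plenty of real headroom
]
print(pick_node(nodes, 600))
```

A Request-based scheduler might happily pick node-a if its allocated Requests were low; a usage-based one avoids it because the machine is actually busy.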

Single-node container scheduling

At the single-node level, containers are allocated CPU and isolated according to their Request and Limit. Although this achieves resource isolation and a degree of sharing and reuse, the resource model Kubernetes provides here is still basic and simple.

Kubernetes schedules Pods to Nodes by Request. At the node level, Linux allocates CPU to containers using cgroup CPU shares, dividing CPU time slices according to each container's share weight. If a Node is filled up exactly by Request, its CPU is divided among all containers by those weights. It looks like a perfect computing model, but the kernel does not handle every resource-contention scenario perfectly.
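The Request-to-shares mapping is concrete: the kubelet converts a CPU Request of m millicores into roughly m × 1024 / 1000 cgroup shares (with a kernel minimum of 2), and under full contention the kernel divides CPU time in proportion to those weights. A sketch of that arithmetic:

```python
def milli_cpu_to_shares(milli_cpu):
    """Kubernetes-style conversion of a CPU Request to cgroup cpu.shares:
    1000 millicores maps to 1024 shares; the kernel minimum is 2."""
    return max(milli_cpu * 1024 // 1000, 2)

def contention_split(requests_millicores, node_cores):
    """Cores each container effectively gets when every container is busy,
    proportional to its share weight."""
    shares = [milli_cpu_to_shares(r) for r in requests_millicores]
    total = sum(shares)
    return [node_cores * s / total for s in shares]

# two busy containers (1000m and 3000m Requests) on a fully contended 4-core node
print(contention_split([1000, 3000], node_cores=4))
```

Note this split only holds when all containers are busy; when one idles, the others borrow its slices, which is exactly the sharing that colocation exploits.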

In an online/offline colocation scenario, for example, offline jobs use multi-core parallel computing to raise utilization; without exclusive core binding, online services are affected at traffic peaks, hurting their SLA. Solving this at the kernel layer demands very strong engineering capability. The system generally has to divide applications into priorities and, while raising resource utilization, yield to and protect high-priority applications while throttling or evicting low-priority ones. Application priority division may therefore be the direction of the future, and it must run through the whole stack, from the application layer through Kubernetes down to the kernel. Kubernetes currently uses only three default priority classes; more may be supported in the future.

(image source mp.weixin.qq.com/s/Cbck85Wmi…).

ProphetPilot lets users define multiple workload priorities, providing different service guarantees for applications at different priorities. Note that too many priorities can trigger cascading container eviction: a container evicted from a node because of insufficient resources is rescheduled to another node, where it in turn evicts lower-priority Pods. To avoid this, ProphetPilot uses elastic instances to prevent cascading evictions.

Meanwhile, some customers running colocation require that offline applications not be evicted without limit: offline applications must also have a certain SLA, guaranteed to finish within a given time. For such needs we adopt dynamic priority scheduling plus elastic instances. After an offline Pod is evicted for the first time, its eviction count and compute time are taken into account; if the calculation shows that further eviction would break its SLA, its priority is raised for scheduling, and if resources are still lacking, it bursts out to elastic public-cloud instances.

Resource layer

Cloud computing is now pervasive, and cloud vendors offer a vast catalogue of IaaS compute, storage, and network resources; Tencent Cloud's CVM alone comes in hundreds of specifications. At the resource layer, the container intelligence model should select the most suitable IaaS resources, so that containers run stably and efficiently at the lowest cost.

Billing model

Tencent Cloud CVM offers three billing modes: pay-as-you-go, monthly subscription, and spot instances, each suited to different scenarios. ProphetPilot analyzes the historical billing modes of a customer cluster's instances and recommends billing modes based on the future trend of cluster resource needs and the customer's cost requirements:

  1. Pay-as-you-go: for example, for a scheduled e-commerce promotion, launch pay-as-you-go instances at the planned start time and release them when the event ends.
  2. Monthly subscription: if some services in the cluster run for a month or longer but are billed pay-as-you-go, switching to a monthly subscription is more cost-effective.
  3. Spot instances: if cluster resources are insufficient and short-lived offline tasks need to run with modest service guarantees but tight cost control, elastic spot instances are the right mode.
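The pay-as-you-go versus monthly decision in point 2 is a break-even calculation. A sketch with made-up prices (real CVM prices vary by instance type and region):

```python
def cheaper_mode(hours_per_month, hourly_price, monthly_price):
    """Break-even check between pay-as-you-go and a monthly subscription
    for a single instance."""
    on_demand_cost = hours_per_month * hourly_price
    return "monthly" if monthly_price < on_demand_cost else "pay-as-you-go"

# hypothetical prices: 0.5 per hour vs. 250 per month
print(cheaper_mode(hours_per_month=720, hourly_price=0.5, monthly_price=250))  # runs all month
print(cheaper_mode(hours_per_month=300, hourly_price=0.5, monthly_price=250))  # runs part-time
```

A recommender would apply this per instance using its observed monthly running hours, then roll the savings up to the cluster level.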

Instance type configuration

Instance types mainly involve CPU, memory, and disk configurations, including hardware model and specification. ProphetPilot evaluates the cluster's node resources, business scale, and future growth trend, searches the configuration and price space of different instance types, and recommends a reasonable mix of types that meets the business's resource requirements.

Cloud model

The current evolution of the cloud is hybrid: customers' IDC resources and the public cloud's elastic resources are connected. It is hard for enterprises to judge whether IDC resources are sufficient, whether to burst to the public cloud, and which IaaS instances to launch; the usual approach is still traditional, plan-driven batch procurement. Some enterprise resources sit in IDCs and some in the public cloud; with hybrid-cloud networking, enterprises can achieve true on-demand elasticity.

In summary, at the resource layer, ProphetPilot analyzes the node specification distribution and resource efficiency of the user's cluster and recommends the most appropriate billing modes and node specification configurations.

Conclusion

The essence of low resource utilization in Kubernetes clusters is the Request mechanism: a resource reservation and place-holding mechanism. To keep services stable, businesses usually request generously, which inevitably wastes a lot of resources. If scheduling, load adjustment, and even billing could all follow Usage, we would achieve true on-demand use. That is the ultimate form of cost management and the target state of ProphetPilot: guaranteeing service stability while customers manage and configure no resources, paying purely as they go. If you have any suggestions or requirements, feel free to contact us via our assistant.