See the previous link: juejin.cn/post/686596…

Colocation system design

Based on Kubernetes, we implemented a system for colocating online and offline workloads, following these design principles:

  • Dynamic scheduling: schedule offline workloads dynamically based on the actual load of each node
  • Dynamic resource allocation and isolation: adjust the resources allocated to offline workloads based on online-service load, and apply isolation policies to reduce or eliminate performance interference
  • Plugin-based: make no invasive in-tree changes to Kubernetes; all components are built on Kubernetes extension mechanisms, and the colocation system itself is highly extensible
  • Timely response: if resource usage on a colocated node is too high or online services are affected, detect the condition promptly and evict or throttle offline workloads to protect the online-service SLA
  • Operability and observability: friendly to users and operators, with low onboarding and usage cost

Figure 6 System architecture

Resource Reclaim

Resource reclaim means reclaiming resources that online services have requested but are leaving idle, and lending them to offline workloads. These reclaimed resources are low-quality and carry no high-availability guarantee.

We defined two extended resources, colocation/cpu and colocation/memory (corresponding to native cpu and memory respectively), to represent the reclaimed resources and to enable dynamic scheduling of offline tasks.



Figure 7 Resource reclaim

If the CPU usage of online services on a node is high, fewer resources are allocated to offline workloads; when online CPU usage is low, more resources can be allocated to them.
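The relationship above can be sketched as a simple function of node allocatable capacity and measured online usage. The exact formula and the safety-buffer ratio below are illustrative assumptions, not the production policy:

```python
def reclaimable_cpu(node_allocatable_cores: float,
                    online_cpu_usage_cores: float,
                    buffer_ratio: float = 0.1) -> float:
    """Estimate the colocation/cpu capacity to advertise on a node.

    Reclaimed CPU = allocatable CPU - CPU actually used by online
    services - a safety buffer, floored at zero. As online usage
    rises, the resource advertised to offline workloads shrinks.
    """
    buffer = node_allocatable_cores * buffer_ratio
    return max(0.0, node_allocatable_cores - online_cpu_usage_cores - buffer)

# Low online usage: most of the node can be lent to offline tasks.
print(reclaimable_cpu(32, 4))   # about 24.8 cores
# High online usage: offline workloads are squeezed out entirely.
print(reclaimable_cpu(32, 30))  # -> 0.0
```

A per-node agent would periodically recompute this value and patch it into the node's status as the colocation/cpu extended-resource capacity.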

Dynamic scheduling

Dynamic scheduling of offline tasks is implemented on top of the colocation/cpu and colocation/memory resources. Offline tasks are preferentially scheduled to colocated nodes with low load and few existing offline tasks, balancing load across nodes and reducing resource contention between services.
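The "prefer lightly loaded nodes with few offline tasks" policy can be sketched as a scoring function of the kind a scheduler extender or score plugin would apply after filtering nodes on colocation/cpu capacity. The weights and the task cap below are hypothetical:

```python
def score_node(cpu_usage_ratio: float, offline_task_count: int,
               max_offline_tasks: int = 20) -> float:
    """Score a colocated node for an incoming offline task (higher is better).

    Combines two normalized signals: spare CPU on the node, and how few
    offline tasks it already hosts.
    """
    spare = 1.0 - cpu_usage_ratio
    sparsity = 1.0 - min(offline_task_count, max_offline_tasks) / max_offline_tasks
    return 0.6 * spare + 0.4 * sparsity  # hypothetical weights

nodes = {
    "node-a": score_node(0.2, 3),   # idle node, few offline tasks
    "node-b": score_node(0.7, 15),  # busy node, many offline tasks
}
best = max(nodes, key=nodes.get)
print(best)  # node-a wins
```

Both signals change over time, so the same function also drives the rescheduling described later: a node that scored well at admission may score poorly once online traffic surges.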

Dynamic resource allocation and isolation

Google has invested heavily in data center resource management for many years. Constrained by the limits of hardware-level performance isolation, Google made extensive changes at the software level and pioneered a number of resource isolation technologies, including cgroups and containers (many kernel features were driven by business requirements rather than invented in a vacuum). Our performance isolation between online and offline workloads is likewise achieved primarily through cgroups.

The kubelet's cgroup manager exposes no extension point, and modifying kubelet code directly would impose a high cost on later operation and upgrades. We therefore developed a standalone agent, Zeus-Isolation-Agent, which runs on each colocated node and periodically updates cgroups to isolate the resources of online and offline workloads.
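A minimal sketch of what such an agent's reconcile loop does: periodically translate the current online load into limits and write them into the offline workloads' cgroup files. The policy, paths (cgroup v1 layout), and values below are illustrative, not the real Zeus-Isolation-Agent logic:

```python
import os
import tempfile

def apply_cpu_limit(cgroup_dir: str, cpu_shares: int, quota_us: int) -> None:
    """Write CPU limits into a cgroup v1 directory.

    cpu.shares sets the relative weight under contention;
    cpu.cfs_quota_us caps absolute usage per 100ms CFS period.
    """
    with open(os.path.join(cgroup_dir, "cpu.shares"), "w") as f:
        f.write(str(cpu_shares))
    with open(os.path.join(cgroup_dir, "cpu.cfs_quota_us"), "w") as f:
        f.write(str(quota_us))

def reconcile(offline_cgroup: str, online_cpu_usage_ratio: float,
              node_cores: int) -> int:
    """One iteration of the agent loop: shrink the offline CPU quota
    as online usage grows (illustrative policy)."""
    spare_cores = max(1, int(node_cores * (1.0 - online_cpu_usage_ratio)) - 1)
    quota_us = spare_cores * 100_000  # one 100ms CFS period per core
    apply_cpu_limit(offline_cgroup, cpu_shares=2, quota_us=quota_us)
    return spare_cores

# Demo against a temp directory standing in for /sys/fs/cgroup/cpu/offline
d = tempfile.mkdtemp()
cores = reconcile(d, online_cpu_usage_ratio=0.75, node_cores=32)
print(cores)  # 7 cores left for offline work
```

In production the loop would read node metrics each tick and reconcile every offline pod's cgroup, not a single directory.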



Figure 8 Online/offline resource isolation

From CPU, memory, and cache to disk and network, we implemented a multi-dimensional isolation strategy that significantly reduces interference between online and offline workloads. Taking cache as an example, we customized the kernel to assign different cache-reclamation priorities to online and offline workloads, so that caches used by offline workloads are reclaimed first.

Rescheduling of offline services

Consider the following scenario. Initially a colocated node hosts few online services, its load is low, and plenty of resources can be allocated to offline workloads, so many offline tasks end up scheduled there. Later, more online services are scheduled onto the node, or traffic to the existing online services surges; the resources left for offline workloads shrink, and offline tasks run inefficiently. If other colocated nodes are idle at that point, we reschedule the offline tasks onto them, avoiding starvation of offline tasks and reducing resource contention between services.

Rescheduling offline tasks has the following advantages:

  • Balances load across colocated nodes, avoiding the situation where some nodes are overloaded while others sit idle
  • Prevents excessive load on a single node from degrading the performance and stability of its online services
  • Improves the execution efficiency of offline workloads

However, rescheduling also has a cost: without a remote checkpoint mechanism, the computation done before rescheduling is wasted. How much this matters depends on task duration. If a task takes seconds to process, the impact of rescheduling is minimal; if it takes days, the impact is significant. We therefore expose per-workload configuration, such as whether to enable rescheduling and the threshold that triggers it.
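The per-workload policy described above can be sketched as a simple decision function; the field names and default thresholds are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ReschedulePolicy:
    enabled: bool = True
    node_cpu_threshold: float = 0.8   # reschedule when node CPU exceeds this
    max_task_seconds: float = 600.0   # exempt long tasks: lost work is too costly

def should_reschedule(policy: ReschedulePolicy, node_cpu_ratio: float,
                      task_duration_s: float, has_idle_peer: bool) -> bool:
    """Decide whether to move an offline task to another colocated node.

    Without remote checkpointing, rescheduling discards work already done,
    so long-running tasks stay put even under pressure, and nothing moves
    unless a less-loaded peer node actually exists.
    """
    if not policy.enabled or not has_idle_peer:
        return False
    if task_duration_s > policy.max_task_seconds:
        return False
    return node_cpu_ratio > policy.node_cpu_threshold

p = ReschedulePolicy()
print(should_reschedule(p, 0.9, task_duration_s=5, has_idle_peer=True))      # True: short task, hot node
print(should_reschedule(p, 0.9, task_duration_s=86400, has_idle_peer=True))  # False: day-long task stays
```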

Putting it into practice

The online/offline colocation scheme described above has been integrated into NCS, the NetEase Light Boat container platform, and is widely used within NetEase, substantially improving server resource utilization.

Taking NetEase Media as an example: video transcoding was colocated as the offline workload on machines running online services, and CPU utilization rose from 6%-15% to about 55% after colocation.

Let’s first look at the characteristics of the video transcoding service:

  • CPU-intensive, with heavy disk reads and writes for temporary data and a certain amount of network I/O
  • A long-running pod rather than a run-to-completion pod: it continuously fetches video tasks from a queue for transcoding, and stays idle but running when no tasks are available
  • Transcoding a single video takes seconds, so rescheduling has little impact on it
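The long-running worker pattern in the second bullet can be sketched as follows; the queue interface and the transcoding step are stand-ins, not the media team’s actual code:

```python
import queue

def transcode_worker(tasks: "queue.Queue[str]", stop) -> list:
    """Long-running loop: pull video tasks from a queue, transcode them,
    and idle (but keep running) when the queue is empty."""
    done = []
    while not stop():
        try:
            video = tasks.get(timeout=0.01)
        except queue.Empty:
            continue  # no work: stay up and poll again, as described above
        done.append(f"transcoded:{video}")  # placeholder for real transcoding
    return done

q = queue.Queue()
for v in ["a.mp4", "b.mp4"]:
    q.put(v)
# Stop once the demo queue drains; a real pod would run until terminated.
results = transcode_worker(q, stop=q.empty)
print(results)
```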

Redis + video transcoding

Redis is a latency-sensitive online service with strict SLO requirements, yet its CPU utilization is low. We therefore tried colocating the video transcoding service onto dedicated Redis nodes. Let’s look at how the two services behave when colocated.



Figure 9 CPU utilization of the Redis node before and after colocation

As Figure 9 shows, the CPU utilization of the Redis node was about 8% before colocation and reached 30%-35% afterwards, a significant increase.

Next, let’s compare the response time (RT) of Redis SET/GET operations before and after colocation.



Figure 10 Average response time of Redis GET operation



Table 3 Average response time of Redis GET operation

As Figure 10 and Table 3 show, the RT of GET operations is essentially unchanged before and after colocation.



Figure 11 Average response time of Redis SET operation



Table 4 Average response time of Redis SET operation

As Figure 11 and Table 4 show, the RT of SET operations is likewise essentially unchanged before and after colocation.

Advertisement recommendation + video transcoding

The advertisement recommendation service is also a latency-sensitive online service with high stability and performance requirements. Let’s look at the effect of colocating the transcoding service with it (the nodes also host other types of online services; here we take the latency-sensitive ad recommendation service as the example).



Figure 12 Node CPU utilization before and after colocation

As Figure 12 shows, CPU utilization was between 10% and 20% before colocation and stayed at about 55% for long stretches afterwards, a large increase.

The purpose of colocation is to improve resource utilization and reduce cost, but the precondition is that online-service performance is not significantly affected. So let’s look at how one of the ad recommendation service’s core performance metrics changed before and after colocation:



Figure 13 Ad recommendation service request processing time

As Figure 13 shows, this core metric shows no attenuation or degradation: the average RT is 6.59 ms before colocation and 6.65 ms after:



Table 5 Average RT of the ad recommendation service before and after colocation

Summary and Outlook

Colocation is now widely deployed within NetEase and has achieved remarkable results. Going forward, we will continue to explore and practice cloud-native technology and, through NetEase Light Boat, bring the colocation scheme to more enterprises, so that more of them can enjoy the dividends of cloud native.

