DatenLord | Rust K8S scheduling extensions

Author: Pan Zheng


Background

The K8S scheduler (kube-scheduler) is the controller responsible for scheduling pods and runs as a core component of K8S. The scheduler finds suitable nodes to run pods on based on certain scheduling policies.

Scheduling process

The K8S scheduler schedules a pod in three phases:

  1. Filter phase: calls a series of predicate (also called filter) functions to filter out nodes that are unsuitable for the pod. For example, nodes whose remaining resources cannot satisfy the pod's declared requests, such as CPU and memory, are filtered out at this stage.
  2. Scoring phase: calls a series of priority (also called score) functions to score and rank the nodes that passed the filter phase. For example, if three nodes pass the filter phase and are scored by the resource-balancing function, the nodes with more remaining CPU, memory, and other resources receive higher priority scores.
  3. Scheduling phase: the pod is scheduled to the node with the highest priority score.

The scheduling process is shown in the figure below:
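
As a rough mental model before diving into the details, here is a conceptual sketch of the three phases in Rust; the types and function signatures are illustrative inventions, not any actual kube-scheduler API:

/// Conceptual sketch of the three scheduling phases (illustrative only;
/// the real kube-scheduler is written in Go and far more involved).
struct Pod;
struct Node {
    name: String,
}

type FilterFn = fn(&Pod, &Node) -> bool;
type ScoreFn = fn(&Pod, &Node) -> i64;

/// Returns the name of the chosen node, if any node survives filtering.
fn schedule(pod: &Pod, nodes: Vec<Node>, filters: &[FilterFn], scorers: &[ScoreFn]) -> Option<String> {
    // 1. Filter phase: drop nodes that fail any predicate/filter function.
    let feasible: Vec<Node> = nodes
        .into_iter()
        .filter(|n| filters.iter().all(|f| f(pod, n)))
        .collect();

    // 2. Scoring phase: total the priority/score functions for each node.
    // 3. Scheduling phase: pick the node with the highest total score.
    feasible
        .into_iter()
        .max_by_key(|n| scorers.iter().map(|s| s(pod, n)).sum::<i64>())
        .map(|n| n.name)
}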

Scheduling algorithm

As mentioned earlier, the K8S scheduler calls a series of filter functions and priority functions. These are K8S's built-in scheduling algorithms, and K8S schedules pods to the most appropriate nodes based on its default scheduling policies. The purpose of a filter function is to filter out nodes that do not meet the pod's requirements; usually multiple nodes pass the filter stage. The purpose of a priority function is to select the most suitable node among the remaining ones.

Let’s introduce the default scheduling policies of K8S.

  • Filter policies:
    • PodFitsResources: filters out nodes whose remaining resources cannot satisfy the pod's requests
    • MatchNodeSelector: filters out nodes that do not match the pod's nodeSelector
    • PodFitsHostPorts: filters out nodes on which a HostPort declared by the pod is already occupied
    • etc.
  • Priority policies:
    • BalancedResourceAllocation: this policy prefers nodes whose CPU and memory usage stay balanced after the pod is scheduled onto them
    • LeastRequestedPriority: this policy scores nodes by their free fraction, computed roughly as (total capacity of the node - requests of existing pods - requests of the pod to be scheduled) / total capacity of the node. Nodes with a larger free fraction receive a higher priority score (see the sketch after this list).
    • etc.
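
As a concrete illustration of LeastRequestedPriority, here is a minimal sketch of the free-fraction computation for a single resource; the function and its 0..=100 scale are illustrative, and the real scheduler scores CPU and memory separately:

/// Minimal sketch of the LeastRequestedPriority idea for a single resource.
/// The real scheduler scores CPU and memory separately and maps the result
/// onto a fixed integer scale.
fn least_requested_score(capacity: i64, existing_requests: i64, pod_request: i64) -> i64 {
    let free = capacity - existing_requests - pod_request;
    // Scale the free fraction to a 0..=100 integer score.
    free.max(0) * 100 / capacity
}

fn main() {
    // A node with 4000m CPU capacity and 1500m already requested, scheduling
    // a pod that requests 500m: (4000 - 1500 - 500) / 4000 = 50% free.
    assert_eq!(least_requested_score(4000, 1500, 500), 50);
}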

K8S provides many more scheduling policies, which we will not list here. K8S does not use all of them by default, only a subset. We can combine these policies through the K8S configuration file to build the scheduling policy best suited to our application scenario.

Scheduling extension

The scheduling policies described above give us considerable control over the scheduler's behavior, but they also have their limitations: they can only be designed around general-purpose metrics such as CPU and memory usage. When we need a more application-specific scheduling strategy, for example scheduling a pod to a node with higher network bandwidth and lower latency, the built-in policies can hardly meet the requirement. Fortunately, K8S provides a scheduling extension mechanism. A scheduling extension lets users extend the scheduling policies with customized predicate/filter functions and priority/score functions. It is implemented as a web service provided by the user, whose URL is made known to K8S through a configuration file. During the filter and scoring phases of scheduling a pod, K8S calls the user-defined extension functions through POST requests of a RESTful API, thereby extending the K8S scheduling policy. The specific architecture is shown in the figure below:

  • In the filter phase, kube-scheduler sends POST ${URL}/filter to the scheduler extender. The request body contains an ExtenderArgs structure serialized in JSON format. The scheduler extender handles the request with its locally customized filter function and replies with the filter result, an ExtenderFilterResult serialized in JSON format.
  • In the scoring phase, kube-scheduler sends POST ${URL}/prioritize to the scheduler extender. The request body contains an ExtenderArgs structure serialized in JSON format. The scheduler extender handles the request with its locally customized priority function and replies with the scoring result, a List<HostPriority> serialized in JSON format. The exact URL paths match the verbs registered in the extender configuration, as shown below.
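
For reference, the extender is registered with kube-scheduler through such a configuration. A minimal sketch is shown below; the address is a placeholder, and the exact format depends on the K8S version (legacy versions use a JSON policy file like this one, newer versions use the KubeSchedulerConfiguration API):

{
    "kind": "Policy",
    "apiVersion": "v1",
    "extenders": [
        {
            "urlPrefix": "http://localhost:8080",
            "filterVerb": "filter",
            "prioritizeVerb": "prioritize",
            "weight": 1,
            "nodeCacheCapable": false,
            "enableHttps": false
        }
    ]
}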

Data structures

As shown above, K8S interacts with the user's scheduling extension through three data structures. K8S defines them in the Go language and serializes them into JSON when sending them over HTTP. These data structures need to be redefined in Rust, and the Rust scheduling extension must serialize and deserialize them.

Scheduling extension request

This data structure is the scheduling extension request sent by K8S; the filter and prioritize requests share the same structure. The Go definition is as follows:

type ExtenderArgs struct {
	// Pod being scheduled
	Pod *v1.Pod
	// List of candidate nodes where the pod can be scheduled; to be populated
	// only if Extender.NodeCacheCapable == false
	Nodes *v1.NodeList
	// List of candidate node names where the pod can be scheduled; to be
	// populated only if Extender.NodeCacheCapable == true
	NodeNames *[]string
}
  • Pod represents the pod to be scheduled
  • Nodes represents the list of candidate nodes
  • NodeNames represents the list of candidate node names

Note that only one of Nodes and NodeNames is populated, so these two fields need to be defined as Option in Rust. The Rust definition is as follows:

use k8s_openapi::api::core::v1::{NodeList, Pod};
use serde::{Deserialize, Serialize};

#[allow(non_snake_case)] // field names must match the JSON produced by K8S
#[derive(Clone, Debug, Serialize, Deserialize)]
struct ExtenderArgs {
    /// Pod being scheduled
    pub Pod: Pod,
    /// List of candidate nodes where the pod can be scheduled; to be populated
    /// only if Extender.NodeCacheCapable == false
    pub Nodes: Option<NodeList>,
    /// List of candidate node names where the pod can be scheduled; to be
    /// populated only if Extender.NodeCacheCapable == true
    pub NodeNames: Option<Vec<String>>,
}

Filter response

This data structure is the response to the filter request. It contains the list of nodes that passed the filter, the nodes that failed together with the failure reasons, and an error message. The Go definition is as follows:

type ExtenderFilterResult struct {
	// Filtered set of nodes where the pod can be scheduled; to be populated
	// only if Extender.NodeCacheCapable == false
	Nodes *v1.NodeList
	// Filtered set of nodes where the pod can be scheduled; to be populated
	// only if Extender.NodeCacheCapable == true
	NodeNames *[]string
	// Filtered out nodes where the pod can't be scheduled and the failure messages
	FailedNodes FailedNodesMap
	// Error message indicating failure
	Error string
}
  • Nodes represents the list of nodes that passed the filter function
  • NodeNames represents the list of names of the nodes that passed the filter function
  • FailedNodes is a map that records the nodes that did not pass the filter function together with the reasons why they failed
  • Error describes an error that occurred while running the filter function

Likewise, only one of Nodes and NodeNames is populated, so they must also be defined as Option. The Rust definition is as follows:

use std::collections::HashMap;

#[allow(non_snake_case)] // field names must match the JSON produced by K8S
#[derive(Clone, Debug, Serialize, Deserialize)]
struct ExtenderFilterResult {
    /// Filtered set of nodes where the pod can be scheduled; to be populated
    /// only if Extender.NodeCacheCapable == false
    pub Nodes: Option<NodeList>,
    /// Filtered set of nodes where the pod can be scheduled; to be populated
    /// only if Extender.NodeCacheCapable == true
    pub NodeNames: Option<Vec<String>>,
    /// Filtered out nodes where the pod can't be scheduled and the failure messages
    pub FailedNodes: HashMap<String, String>,
    /// Error message indicating failure
    pub Error: String,
}
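
Before moving on, here is a minimal sketch of what a custom filter function returning this structure could look like, assuming the definitions above are in scope. It is a pass-through that accepts every candidate node; the function name and behavior are illustrative, and real business logic (e.g. a bandwidth check) would replace the pass-through:

/// Illustrative filter that passes all candidate nodes through unchanged.
/// A real extension would inspect each node and move rejected ones into
/// `FailedNodes` together with a reason string.
fn filter(args: ExtenderArgs) -> ExtenderFilterResult {
    ExtenderFilterResult {
        // Echo back whichever of the two candidate fields was populated.
        Nodes: args.Nodes,
        NodeNames: args.NodeNames,
        FailedNodes: HashMap::new(),
        Error: String::new(),
    }
}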

Priority response

The response to the priority request is a list of HostPriority structures, each containing a node name and the node's score. The Go definition is as follows:

type HostPriority struct {
	// Name of the host
	Host string
	// Score associated with the host
	Score int64
}
  • Host represents the name of a node
  • Score represents the score the priority function assigns to that node

The corresponding Rust definition is as follows:

#[allow(non_snake_case)] // field names must match the JSON produced by K8S
#[derive(Clone, Debug, Serialize, Deserialize)]
struct HostPriority {
    /// Name of the host
    pub Host: String,
    /// Score associated with the host
    pub Score: i64,
}
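
Correspondingly, a minimal sketch of a priority function might look as follows. It assigns every node the same score and assumes the candidate nodes arrive via NodeNames (i.e. NodeCacheCapable == true); both the function name and the constant score are illustrative:

/// Illustrative priority function: give every candidate node an equal score.
/// A real extension would compute a per-node score from business metrics
/// such as network bandwidth or latency.
fn prioritize(args: &ExtenderArgs) -> Vec<HostPriority> {
    args.NodeNames
        .as_deref()
        .unwrap_or(&[])
        .iter()
        .map(|name| HostPriority {
            Host: name.clone(),
            Score: 1,
        })
        .collect()
}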

The types used in the three data structures above, such as Pod and NodeList, are defined in the k8s-openapi crate, which provides Rust versions of the K8S API types. It can be used by adding the dependency to Cargo.toml:

k8s-openapi = { version = "0.11.0", default-features = false, features = ["v1_19"] }
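
Besides k8s-openapi, the code in this article also assumes serde for the derive macros, serde_json for JSON (de)serialization, and tiny-http for the web service. Indicative Cargo.toml entries, with version numbers as assumptions of roughly contemporary releases:

serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
tiny_http = "0.8"
log = "0.4"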

This crate generates Rust definitions automatically from the K8S OpenAPI specification, which saves the trouble of translating the Go data structures into Rust by hand. However, it only covers the core API data structures; ExtenderArgs, ExtenderFilterResult, and HostPriority belong to the extender API, so they have to be defined by ourselves. One subtlety: in Go, both []string and nil are valid values for a slice field, and they serialize to the JSON values [] and null respectively. A field that may be null must be marked +optional, and the corresponding Rust field must accordingly be defined as an Option; this marking was initially missing for the []string fields above. Please refer to the issue for the detailed discussion; the problem has since been fixed in K8S.
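
To see the Option mapping in action, here is a small sketch (assuming the ExtenderArgs definition above is in scope) that deserializes a request in which NodeNames is null; whether the empty Pod object is accepted exactly as written depends on the k8s-openapi version:

fn main() -> serde_json::Result<()> {
    // A trimmed-down request as K8S might send it when NodeCacheCapable ==
    // false: Nodes is populated and NodeNames is null.
    let json = r#"{ "Pod": {}, "Nodes": { "items": [] }, "NodeNames": null }"#;
    let args: ExtenderArgs = serde_json::from_str(json)?;
    assert!(args.Nodes.is_some());
    assert!(args.NodeNames.is_none());
    Ok(())
}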

Scheduling extension web service

With the data structures in place, we need a web service to handle the requests from K8S. Rust offers a rich choice of libraries for this; here we use the lightweight synchronous HTTP library tiny-http. The concrete implementations of the filter and priority functions depend on the business logic. The code is as follows:

match *request.method() {
    Method::Post => match request.url() {
        "/filter" | "/prioritize" => {
            let body = request.as_reader();
            let args: ExtenderArgs = try_or_return_err!(
                request,
                serde_json::from_reader(body),
                "failed to parse request".to_string()
            );

            let response = if request.url() == "/filter" {
                info!("Receive filter");
                // Process the filter request
                let result = self.filter(args);
                try_or_return_err!(
                    request,
                    serde_json::to_string(&result),
                    "failed to serialize response".to_string()
                )
            } else {
                info!("Receive prioritize");
                // Process the prioritize request
                let result = Self::prioritize(&args);
                try_or_return_err!(
                    request,
                    serde_json::to_string(&result),
                    "failed to serialize response".to_string()
                )
            };
            Ok(request.respond(Response::from_string(response))?)
        }
        _ => Self::empty_400(request),
    },
    // ... other methods omitted
}
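
For completeness, a minimal sketch of the surrounding tiny-http server loop is shown below. The listen address and the trivial 200 OK response are illustrative assumptions; the real service would dispatch each request through the handler above:

use tiny_http::{Response, Server};

fn main() {
    // Listen on the address registered in the extender configuration
    // (assumed to be http://localhost:8080 in this sketch).
    let server = Server::http("0.0.0.0:8080").expect("failed to bind");
    for request in server.incoming_requests() {
        // In the real service the request would be dispatched on its method
        // and URL as shown above; this skeleton just answers 200 OK.
        let _ = request.respond(Response::from_string("ok"));
    }
}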

Conclusion

Through this article we have gained a basic understanding of the K8S scheduling process, the K8S scheduling algorithms, and the K8S scheduling extension mechanism. We implemented a K8S scheduling extension in Rust, defined in Rust the data structures that K8S uses to interact with the scheduling extension, and discussed which fields need to be defined as Option in the Rust definitions along with the related issues to watch out for.