DatenLord | Rust K8S scheduling extensions
Author: Pan Zheng
Background
The K8S scheduler (kube-scheduler) is the component that schedules pods and runs as a core component of K8S. The scheduler finds a suitable node to run each pod based on certain scheduling policies.
Scheduling process
The K8S scheduler schedules a pod in three phases:

- Filter phase: calls a series of predicate (or filter) functions to filter out nodes that cannot satisfy the pod's declaration. For example, if a node's remaining CPU, memory, or other resources cannot meet the pod's requests, the node is filtered out at this stage.
- Scoring phase: calls a series of priority (or score) functions to score and rank the nodes that passed the filter phase. For example, if three nodes pass filtering and are scored by the resource-balancing function, the nodes with more remaining CPU, memory, and other resources receive a higher priority score.
- Scheduling phase: the pod is scheduled to the node with the highest priority score.
The scheduling process is shown in the figure below:
Scheduling algorithm
As mentioned earlier, the K8S scheduler calls a series of filter functions and priority functions. These are K8S's built-in scheduling algorithms, and K8S schedules pods to the most appropriate nodes according to the default scheduling algorithm. The purpose of the filter functions is to filter out nodes that do not meet the pod's requirements; usually multiple nodes survive the filter stage. The purpose of the priority functions is to select the most suitable node among the remaining candidates.
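As a rough illustration of the filter-then-score flow (not K8S source code; the Node struct, its resource fields, and the scoring rule below are all invented for the example):

```rust
// Minimal sketch of the two-phase scheduling flow: filter unsuitable nodes,
// score the survivors, pick the highest-scoring one.

#[derive(Debug, Clone)]
struct Node {
    name: String,
    free_cpu: u32, // remaining CPU, in millicores (illustrative)
    free_mem: u32, // remaining memory, in MiB (illustrative)
}

/// Filter phase: keep only nodes with enough free CPU and memory for the pod.
fn filter(nodes: Vec<Node>, cpu_req: u32, mem_req: u32) -> Vec<Node> {
    nodes
        .into_iter()
        .filter(|n| n.free_cpu >= cpu_req && n.free_mem >= mem_req)
        .collect()
}

/// Scoring phase: here, more remaining resources simply yields a higher score.
fn score(node: &Node) -> u32 {
    node.free_cpu + node.free_mem
}

fn main() {
    let nodes = vec![
        Node { name: "node-a".into(), free_cpu: 500, free_mem: 1024 },
        Node { name: "node-b".into(), free_cpu: 2000, free_mem: 4096 },
        Node { name: "node-c".into(), free_cpu: 100, free_mem: 256 },
    ];
    // The pod requests 250m CPU and 512 MiB of memory.
    let candidates = filter(nodes, 250, 512);
    // Scheduling phase: pick the node with the highest score.
    let best = candidates.iter().max_by_key(|n| score(n)).unwrap();
    println!("scheduled to {}", best.name);
}
```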
Let’s introduce the default scheduling policies of K8S.
Filter strategies:

- PodFitsResources: filters out nodes whose remaining resources cannot satisfy the pod's requests.
- MatchNodeSelector: filters out nodes that do not match the nodeSelector required by the pod.
- PodFitsHostPorts: filters out nodes on which a HostPort declared by the pod is already occupied.
- etc.
Priority strategies:

- BalancedResourceAllocation: this strategy expects CPU and memory usage to be balanced after the pod is scheduled onto a node.
- LeastRequestedPriority: this strategy is based on each node's free fraction, computed roughly as (total capacity of the node - requests of existing pods - requests of the pod being scheduled) / total capacity of the node. Nodes with a larger free fraction receive a higher priority score.
- etc.
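The LeastRequestedPriority arithmetic can be sketched as follows (a simplified illustration using integer math and the historical 0-10 score range; the function name and scaling are assumptions, not the actual K8S implementation):

```rust
/// Sketch of the LeastRequestedPriority formula:
/// free fraction = (capacity - existing requests - pod request) / capacity,
/// scaled to an integer score out of 10.
fn least_requested_score(capacity: u64, existing_requests: u64, pod_request: u64) -> u64 {
    let used = existing_requests + pod_request;
    if used >= capacity {
        return 0; // no free capacity left
    }
    // Multiply before dividing to keep integer precision.
    (capacity - used) * 10 / capacity
}

fn main() {
    // Node with 4000m CPU capacity, 1000m already requested; pod asks for 1000m.
    // Free fraction = (4000 - 1000 - 1000) / 4000 = 0.5, so the score is 5.
    println!("{}", least_requested_score(4000, 1000, 1000));
}
```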
K8S provides many scheduling policies, which we will not list exhaustively here. K8S does not use all of them; only some are enabled by default. We can combine these policies through the K8S configuration file to build the scheduling policy that best fits our application scenario.
Scheduling extension
The scheduling policies described above give us considerable control over the scheduler's behavior, but they have limitations: they can only express scheduling decisions in terms of general-purpose metrics such as CPU and memory usage. When we need a more application-specific strategy, for example scheduling a pod to a node with higher network bandwidth and lower latency, the policies K8S provides are hard to bend to our requirements. Fortunately, K8S provides a scheduling extension mechanism. The K8S scheduling extension gives users a way to extend the scheduling policies with customized predicate/filter functions and priority/score functions. To implement a scheduling extension, the user provides a web service and registers its URL with K8S through a configuration file. During the filter and score phases of scheduling a pod, K8S calls the user-defined extension functions via POST requests to this RESTful API, thereby extending the K8S scheduling policy. The architecture is shown in the figure below:
- In the filter phase, kube-scheduler sends POST ${URL}/filter to the scheduler extender. The request body is an ExtenderArgs structure serialized as JSON. The scheduler extender handles the request with its locally customized filter function and returns the filter result, an ExtenderFilterResult structure serialized as JSON.
- In the scoring phase, kube-scheduler sends POST ${URL}/priority to the scheduler extender. The request body is again an ExtenderArgs structure serialized as JSON. The scheduler extender handles the request with its locally customized priority function and returns the score result, a List<HostPriority> serialized as JSON.
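For reference, a scheduler extender is typically registered through the kube-scheduler policy configuration file; a minimal sketch might look like the following (the URL and field values are placeholders, and the exact configuration format depends on the K8S version in use):

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "extenders": [
    {
      "urlPrefix": "http://localhost:8080",
      "filterVerb": "filter",
      "prioritizeVerb": "prioritize",
      "weight": 1,
      "enableHttps": false,
      "nodeCacheCapable": false
    }
  ]
}
```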
Data structures
As shown above, the interaction between K8S and the user's scheduling extension involves three data structures. K8S defines them in Go and serializes them to JSON when sending them over HTTP. To build a scheduling extension in Rust, these data structures need Rust definitions along with JSON serialization and deserialization.
Scheduling extension request
This data structure is the request that K8S sends to the scheduling extension; the filter and priority requests share the same structure. The Go definition is as follows:
type ExtenderArgs struct {
// Pod being scheduled
Pod *v1.Pod
// List of candidate nodes where the pod can be scheduled; to be populated
// only if Extender.NodeCacheCapable == false
Nodes *v1.NodeList
// List of candidate node names where the pod can be scheduled; to be
// populated only if Extender.NodeCacheCapable == true
NodeNames *[]string
}
- Pod: the pod to be scheduled
- Nodes: the list of candidate nodes
- NodeNames: the list of candidate node names
Note that only one of Nodes and NodeNames is populated, so both fields need to be defined as Option in Rust. The Rust definition is as follows:
#[derive(Clone, Debug, Serialize, Deserialize)]
struct ExtenderArgs {
/// Pod being scheduled
pub Pod: Pod,
/// List of candidate nodes where the pod can be scheduled; to be populated
/// only if Extender.NodeCacheCapable == false
pub Nodes: Option<NodeList>,
/// List of candidate node names where the pod can be scheduled; to be
/// populated only if Extender.NodeCacheCapable == true
pub NodeNames: Option<Vec<String>>,
}
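To illustrate handling the "exactly one of Nodes / NodeNames is populated" contract in a handler, here is a std-only sketch. Vec<String> stands in for the real NodeList type, and the preference order is an arbitrary choice for the example, not the K8S-mandated behavior:

```rust
/// Sketch: normalize the two mutually exclusive fields into one list of
/// candidate node names. Vec<String> is a stand-in for Option<NodeList> so
/// the example compiles with std alone.
fn candidate_names(
    nodes: Option<Vec<String>>,      // stand-in for ExtenderArgs.Nodes
    node_names: Option<Vec<String>>, // as in ExtenderArgs.NodeNames
) -> Vec<String> {
    // Use whichever field is present; fall back to an empty list if neither is.
    node_names.or(nodes).unwrap_or_default()
}

fn main() {
    // NodeCacheCapable == true: only NodeNames is filled in by kube-scheduler.
    let names = candidate_names(None, Some(vec!["node-a".to_string()]));
    println!("{:?}", names);
}
```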
Filter response
This data structure is the response to the filter request. It contains the list of nodes that passed the filter, the reasons why other nodes failed, and an error message. The Go definition is as follows:
type ExtenderFilterResult struct {
// Filtered set of nodes where the pod can be scheduled; to be populated
// only if Extender.NodeCacheCapable == false
Nodes *v1.NodeList
// Filtered set of nodes where the pod can be scheduled; to be populated
// only if Extender.NodeCacheCapable == true
NodeNames *[]string
// Filtered out nodes where the pod can't be scheduled and the failure messages
FailedNodes FailedNodesMap
// Error message indicating failure
Error string
}
- Nodes: the list of nodes that passed the filter function
- NodeNames: the list of names of the nodes that passed the filter function
- FailedNodes: a hashmap recording the nodes that did not pass the filter function and the reason each one failed
- Error: the cause of failure if the filter function itself fails
In the same way, only one of Nodes and NodeNames is populated, so both must be defined as Option. The Rust definition is as follows:
#[derive(Clone, Debug, Serialize, Deserialize)]
struct ExtenderFilterResult {
    /// Filtered set of nodes where the pod can be scheduled; to be populated
    /// only if Extender.NodeCacheCapable == false
    pub Nodes: Option<NodeList>,
    /// Filtered set of nodes where the pod can be scheduled; to be populated
    /// only if Extender.NodeCacheCapable == true
    pub NodeNames: Option<Vec<String>>,
    /// Filtered out nodes where the pod can't be scheduled and the failure messages
    pub FailedNodes: HashMap<String, String>,
    /// Error message indicating failure
    pub Error: String,
}
Priority response
The response to the priority request is a list of HostPriority structures, each containing a node name and the node's score. The Go definition is as follows:
type HostPriority struct {
// Name of the host
Host string
// Score associated with the host
Score int64
}
- Host: the name of the node
- Score: the score the priority function assigns to the node
The corresponding Rust definition is as follows:
#[derive(Clone, Debug, Serialize, Deserialize)]
struct HostPriority {
    /// Name of the host
    pub Host: String,
    /// Score associated with the host
    pub Score: i64,
}
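To show how a priority response is typically produced, here is a std-only sketch. The struct mirrors the definition above but uses snake_case fields and drops the serde derives so it runs standalone, and the scoring rule (free CPU divided by 100) is invented for this example:

```rust
/// Std-only stand-in for the HostPriority structure, for illustration.
#[derive(Clone, Debug)]
struct HostPriority {
    host: String,
    score: i64,
}

/// Hypothetical priority function: derive each node's score from its free CPU.
fn prioritize(free_cpu_by_node: &[(String, i64)]) -> Vec<HostPriority> {
    free_cpu_by_node
        .iter()
        .map(|(host, free_cpu)| HostPriority {
            host: host.clone(),
            score: free_cpu / 100, // invented scoring rule
        })
        .collect()
}

fn main() {
    let inputs = vec![("node-a".to_string(), 500), ("node-b".to_string(), 2000)];
    let scores = prioritize(&inputs);
    // kube-scheduler combines these scores with its other priority results;
    // here we simply pick the highest-scoring node.
    let best = scores.iter().max_by_key(|p| p.score).unwrap();
    println!("{} -> {}", best.host, best.score);
}
```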
The types used in the three data structures above, such as Pod and NodeList, already have Rust definitions in the k8s-openapi crate, which can be used by adding the dependency to Cargo.toml:
k8s-openapi = { version = "0.11.0", default-features = false, features = ["v1_19"] }
This crate generates Rust definitions automatically from the K8S OpenAPI specification, saving the trouble of converting the Go data structures to Rust by hand. However, it only covers the core API data structures; ExtenderArgs, ExtenderFilterResult, and HostPriority belong to the extender API, so they must be defined by hand. One subtlety: in Go, both []string and nil are valid values, serializing to the JSON values [] and null respectively, and a field that may be null is optional. Fields of type []string therefore need to be marked +optional, and the corresponding Rust fields need to be defined as Option. See the issue for the detailed discussion; this problem has since been fixed in K8S.
Scheduling extension web service
With the data structures in place, we need a web service to handle the requests from K8S. Rust has a rich ecosystem of web libraries to choose from; here we use tiny-http, a lightweight synchronous HTTP library. The concrete filter and priority implementations depend on the specific business logic. The code is as follows:
match *request.method() {
    Method::Post => match request.url() {
        "/filter" | "/prioritize" => {
            let body = request.as_reader();
            let args: ExtenderArgs = try_or_return_err!(
                request,
                serde_json::from_reader(body),
                "failed to parse request".to_string()
            );

            let response = if request.url() == "/filter" {
                info!("Receive filter");
                // Process the filter request
                let result = self.filter(args);
                try_or_return_err!(
                    request,
                    serde_json::to_string(&result),
                    "failed to serialize response".to_string()
                )
            } else {
                info!("Receive prioritize");
                // Process the prioritize request
                let result = Self::prioritize(&args);
                try_or_return_err!(
                    request,
                    serde_json::to_string(&result),
                    "failed to serialize response".to_string()
                )
            };
            Ok(request.respond(Response::from_string(response))?)
        }
        _ => Self::empty_400(request),
    },
    // ... omitted
}
Conclusion
This article gave a basic introduction to the K8S scheduling process, the K8S scheduling algorithms, and the K8S scheduling extension mechanism. We implemented a K8S scheduling extension in Rust, defined in Rust the data structures exchanged between K8S and the scheduling extension, and discussed which fields must be defined as Option in Rust along with the related issues to watch out for.