This is the sixth day of my participation in the November Gengwen (More-Writing) Challenge. Check out the event details: The Last Gengwen Challenge of 2021

Introduction

The most fundamental resource type in K8s is the Pod; the other, higher-level usage patterns are all built on top of it. If we want a better understanding of K8s, we need to look at how a Pod is created. The easiest way to create a Pod is with the kubectl run command, whose official description is "Create and run a particular image in a pod".

Create and run a particular image in a pod.

Examples:

Start a nginx pod.

kubectl run nginx --image=nginx

We will follow the whole process, from the moment we hit Enter to the moment the Pod is created and started successfully, to see what happens behind the scenes. At the macro level the story takes place in three locations: locally, where the protagonist is of course kubectl; on the master node of the remote K8s cluster, where the protagonists are kube-apiserver, etcd, and the scheduler; and on a K8s worker node, where the protagonist is kubelet.

First let's look at the overall process and what each part does, and then examine the code implementation of each part one by one. The K8s core code has more than two million lines and makes use of many advanced design patterns; this article tries its best to interpret this excellent code base, so some statements may contain errors or inaccuracies. Please point them out so we can discuss and learn together.

The following code analysis is based on the official Kubernetes release v1.20.2.

Overall flow chart

At a high level, kubectl is responsible for receiving user input, doing preliminary processing, and generating the request that kube-apiserver expects; kube-apiserver is the entry point for all requests and is responsible for generic validation and dispatch of requests, as well as for storing and querying resource state; kubelet is responsible for the lifecycle management of the actual Pods and for reporting the overall status of its node.

Kubernetes is implemented on an event + controller pattern, so there is no single piece of code that drives the whole Pod creation process end to end. The components involved behave more like runners in a relay: each one finishes its own leg and hands over the baton (by updating state and triggering events), and the next runner picks up the events it is interested in (for example via watch), does its own part, and hands over again.
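To make that relay concrete, here is a minimal sketch (not taken from any K8s component) of how a process can watch Pod events with client-go shared informers. The kubeconfig location, the 30s resync period, and the printed messages are assumptions for illustration only.

package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: kubeconfig in the default location; error handling trimmed for brevity.
	config, _ := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	clientset, _ := kubernetes.NewForConfig(config)

	// Shared informer factory with a 30s resync period (arbitrary choice).
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()

	// The "relay": react only to the events this component cares about.
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod)
			fmt.Printf("pod added: %s/%s\n", pod.Namespace, pod.Name)
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			pod := newObj.(*corev1.Pod)
			fmt.Printf("pod updated: %s/%s phase=%s\n", pod.Namespace, pod.Name, pod.Status.Phase)
		},
	})

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	select {} // block forever and keep reacting to events
}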

History of the kubectl run command

In earlier versions, the run subcommand could create not only Pods but also Jobs (including CronJobs), Deployments, and ReplicationControllers.

In v1.20.2 the kubectl run command has been greatly simplified: the generator option has been removed and it can only create plain Pods. Some details have also been refined; for example, the dry-run option now distinguishes between client and server (kubectl run nginx --image=nginx --dry-run=client -o yaml prints the generated Pod without sending it), giving finer control.

kubectl

Client-side parameter validation and object generation

Code entry:

vendor/k8s.io/kubectl/pkg/cmd/run/run.go#L246

Parameter checking and verification

This operation includes verifying the image name and image pull policy

// vendor/k8s.io/kubectl/pkg/cmd/run/run.go#L276
imageName := o.Image
if imageName == "" {
	return fmt.Errorf("--image is required")
}
validImageRef := reference.ReferenceRegexp.MatchString(imageName)
if !validImageRef {
	return fmt.Errorf("Invalid image name %q: %v", imageName, reference.ErrReferenceInvalidFormat)
}

// vendor/k8s.io/kubectl/pkg/cmd/run/run.go#L310
if err := verifyImagePullPolicy(cmd); err != nil {
	return err
}
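For comparison, here is a minimal standalone sketch of the same image-name check, using the docker/distribution reference package that kubectl vendors; the sample image names are arbitrary and only illustrate what the regular expression accepts or rejects.

package main

import (
	"fmt"

	"github.com/docker/distribution/reference"
)

func main() {
	names := []string{"nginx", "nginx:1.21", "registry.example.com/team/app@sha256:bad"}
	for _, name := range names {
		// The same regular expression kubectl run uses to pre-validate --image.
		fmt.Printf("%-45s valid=%v\n", name, reference.ReferenceRegexp.MatchString(name))
	}
}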

Object generation

Get the default Pod generator

// vendor/k8s.io/kubectl/pkg/cmd/run/run.go#L314
generators := generateversioned.GeneratorFn("run")
// Only the pod generator remains; job, deployment, etc. have been removed - see the "History of the kubectl run command" section
generator, found := generators[generateversioned.RunPodV1GeneratorName] // "run-pod/v1"
if !found {
	return cmdutil.UsageErrorf(cmd, "generator %q not found", o.Generator)
}

// vendor/k8s.io/kubectl/pkg/generate/versioned/generator.go#L94
case "run": // the default generator registered under the run subcommand
	generator = map[string]generate.Generator{
		RunPodV1GeneratorName: BasicPod{},
	}

Generate runtime objects

// vendor/k8s.io/kubectl/pkg/cmd/run/run.go#L330
var createdObjects = []*RunObject{}
// Start creating the runtime object
runObject, err := o.createGeneratedObject(f, cmd, generator, names, params, cmdutil.GetFlagString(cmd, "overrides"))
if err != nil {
	return err
}
createdObjects = append(createdObjects, runObject)

// vendor/k8s.io/kubectl/pkg/cmd/run/run.go#L616
func (o *RunOptions) createGeneratedObject(f cmdutil.Factory, cmd *cobra.Command, generator generate.Generator, names []generate.GeneratorParam, params map[string]interface{}, overrides string) (*RunObject, error) {
	// Validate the generator parameters
	err := generate.ValidateParams(names, params)
	// Generate the runtime object
	obj, err := generator.Generate(params)
	// API group and version mapping
	mapper, err := f.ToRESTMapper()
	// run has compiled knowledge of the thing it is creating
	gvks, _, err := scheme.Scheme.ObjectKinds(obj)
	mapping, err := mapper.RESTMapping(gvks[0].GroupKind(), gvks[0].Version)
	if o.DryRunStrategy != cmdutil.DryRunClient {
		// Build the client instance (how is f instantiated?)
		client, err := f.ClientForMapping(mapping)
		// Send the HTTP request
		actualObj, err = resource.
			NewHelper(client, mapping).
			DryRun(o.DryRunStrategy == cmdutil.DryRunServer). // dynamically enable server-side dry run
			WithFieldManager(o.fieldManager).                 // field manager recorded for the update
			Create(o.Namespace, false, obj)
	}
}

Discovery and negotiation of API groups and versions

Kubernetes APIs are versioned and divided into API groups; an API group is a collection of APIs that operate on related resources, and Kubernetes generally supports multiple versions of each group. To find the most suitable API, kubectl uses the discovery mechanism to fetch the schema documents exposed by kube-apiserver (in the OpenAPI format). To improve performance, kubectl caches these schema files locally under ~/.kube/cache/discovery.
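Here is a minimal sketch of that discovery step done directly with client-go; the kubeconfig location and the choice to print only group names and preferred versions are assumptions for illustration.

package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	// Ask kube-apiserver which API groups/versions it serves,
	// the same information kubectl caches under ~/.kube/cache/discovery.
	groups, err := dc.ServerGroups()
	if err != nil {
		panic(err)
	}
	for _, g := range groups.Groups {
		fmt.Printf("%s (preferred: %s)\n", g.Name, g.PreferredVersion.GroupVersion)
	}
}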

Processing the response

After receiving the response from kube-apiserver, kubectl continues with the remaining processing and prints the created object in the requested output format.

// vendor/k8s.io/kubectl/pkg/cmd/run/run.go#L430
if runObject != nil {
	if err := o.PrintObj(runObject.Object); err != nil {
		return err
	}
}

Client authentication support

To make sure its requests are accepted, kubectl needs to be able to authenticate itself. User credentials are almost always stored in a kubeconfig file on local disk, and kubectl locates that file with the following precedence:

  1. If --kubeconfig specifies a file, use it
  2. Otherwise, if the $KUBECONFIG environment variable is defined, use the file it points to
  3. Otherwise, look in the recommended home directory (~/.kube) and use the first file found

After the file is parsed, kubectl can determine the current context, the cluster it points to, and the credentials associated with the current user.
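The same precedence is what client-go's clientcmd loading rules implement; below is a minimal sketch, assuming a --kubeconfig style flag and printing only the resolved current context.

package main

import (
	"flag"
	"fmt"

	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	kubeconfig := flag.String("kubeconfig", "", "explicit path to a kubeconfig file")
	flag.Parse()

	// The default rules already honor $KUBECONFIG and ~/.kube/config;
	// an explicit path takes precedence over both.
	rules := clientcmd.NewDefaultClientConfigLoadingRules()
	rules.ExplicitPath = *kubeconfig

	cfg := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(rules, &clientcmd.ConfigOverrides{})
	raw, err := cfg.RawConfig()
	if err != nil {
		panic(err)
	}
	fmt.Println("current context:", raw.CurrentContext)
}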

kube-apiserver

Authentication

kube-apiserver is the primary interface that clients and system components use to persist and query cluster state. Before anything else, it needs to know who initiated the request.

How does kube-apiserver authenticate requests? When the service starts, it inspects the command-line arguments provided by the operator and assembles a chain of suitable authenticators. Each request is then checked by the authenticators one by one until one of them approves it.

// vendor/k8s.io/apiserver/pkg/authentication/request/union/union.go#L53
// AuthenticateRequest authenticates the request using a chain of authenticator.Request objects.
func (authHandler *unionAuthRequestHandler) AuthenticateRequest(req *http.Request) (*authenticator.Response, bool, error) {
	var errlist []error
	for _, currAuthRequestHandler := range authHandler.Handlers {
		resp, ok, err := currAuthRequestHandler.AuthenticateRequest(req)
		if err != nil {
			if authHandler.FailOnError {
				return resp, ok, err
			}
			errlist = append(errlist, err)
			continue
		}
		if ok {
			return resp, ok, err
		}
	}
	return nil, false, utilerrors.NewAggregate(errlist)
}

Initialization process of the authenticator

// pkg/kubeapiserver/authenticator/config.go#L95
// New returns an authenticator.Request or an error that supports the standard
// Kubernetes authentication mechanisms.
func (config Config) New() (authenticator.Request, *spec.SecurityDefinitions, error) {
	var authenticators []authenticator.Request
	// ... the various enabled authenticators are initialized and appended (omitted)
	authenticator := union.New(authenticators...)
	authenticator = group.NewAuthenticatedGroupAdder(authenticator)
	if config.Anonymous {
		// If the authenticator chain returns an error, return an error (don't consider a bad bearer token
		// or invalid username/password combination anonymous).
		authenticator = union.NewFailOnError(authenticator, anonymous.NewAuthenticator())
	}
	return authenticator, &securityDefinitions, nil
}

As shown in the figure below, assuming all authenticators are enabled, when a client sends a request to kube-apiserver, the request reaches the authentication handler. The handler iterates over the list of enabled authenticators and tries each one in turn: if one authenticator returns true, authentication succeeds; otherwise the next authenticator is tried.

If every authenticator fails, the request is rejected and an aggregated error is returned to the client. If authentication succeeds, the Authorization header is removed from the request and the user information is added to the request context, so that subsequent steps can access the identity established during the authentication phase.
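To make the chain concrete, here is a minimal sketch of a custom authenticator.Request implementation of the kind that could sit in such a chain; the header name, token value, and user identity are invented purely for illustration.

package main

import (
	"fmt"
	"net/http"

	"k8s.io/apiserver/pkg/authentication/authenticator"
	"k8s.io/apiserver/pkg/authentication/user"
)

// headerAuthenticator approves requests carrying a hard-coded demo token.
type headerAuthenticator struct{}

func (headerAuthenticator) AuthenticateRequest(req *http.Request) (*authenticator.Response, bool, error) {
	if req.Header.Get("X-Demo-Token") != "let-me-in" { // assumption: demo-only check
		// Not our kind of credential: answer "no opinion" so the next authenticator in the chain runs.
		return nil, false, nil
	}
	return &authenticator.Response{
		User: &user.DefaultInfo{Name: "demo-user", Groups: []string{"system:authenticated"}},
	}, true, nil
}

func main() {
	req, _ := http.NewRequest("GET", "https://example.invalid/api", nil)
	req.Header.Set("X-Demo-Token", "let-me-in")
	resp, ok, err := headerAuthenticator{}.AuthenticateRequest(req)
	fmt.Println(ok, err, resp.User.GetName())
}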

Authorization

Now that kube-apiserver has authenticated the requester's identity, it must make sure the requester is allowed to perform the operation before going any further. Authentication and authorization are not the same thing: to take the next step, kube-apiserver needs to authorize us.

Similar to the way authenticators are assembled, kube-apiserver builds a chain of suitable authorizers from the command-line arguments provided at startup and runs every request through it. If no authorizer approves the request, the request terminates and the requester receives a Forbidden response; if an authorizer approves it, authorization succeeds and the request moves on to the next stage.

Authorizer initialization

kube-apiserver currently provides six authorization mechanisms: AlwaysAllow, AlwaysDeny, ABAC, Webhook, RBAC, and Node. They are selected with the --authorization-mode flag, and at least one mode must be specified.

// pkg/kubeapiserver/authorizer/config.go#L71
// New returns the right sort of union of multiple authorizer.Authorizer objects
// based on the authorizationMode or an error.
func (config Config) New() (authorizer.Authorizer, authorizer.RuleResolver, error) {
	if len(config.AuthorizationModes) == 0 {
		return nil, nil, fmt.Errorf("at least one authorization mode must be passed")
	}
	// ...
}

Authorization decision states

// vendor/k8s.io/apiserver/pkg/authorization/authorizer/interfaces.go#L149
type Decision int

const (
	// DecisionDeny means that an authorizer decided to deny the action.
	DecisionDeny Decision = iota
	// DecisionAllow means that an authorizer decided to allow the action.
	DecisionAllow
	// DecisionNoOpinion means that an authorizer has no opinion on whether
	// to allow or deny an action.
	DecisionNoOpinion
)

When an authorizer returns DecisionNoOpinion, the decision is handed to the next authorizer in the chain; if no authorizer ever has an opinion, authorization fails. DecisionDeny rejects the request, and DecisionAllow means authorization succeeds and the request is accepted.
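Here is a minimal sketch of an authorizer that uses these decision states; the namespace-based rule is invented for illustration and does not correspond to any built-in authorization mode.

package main

import (
	"context"
	"fmt"

	"k8s.io/apiserver/pkg/authentication/user"
	"k8s.io/apiserver/pkg/authorization/authorizer"
)

// namespaceAuthorizer allows everything in one namespace and has no opinion otherwise,
// leaving the decision to whatever authorizer comes next in the chain.
type namespaceAuthorizer struct{ allowed string }

func (n namespaceAuthorizer) Authorize(ctx context.Context, a authorizer.Attributes) (authorizer.Decision, string, error) {
	if a.GetNamespace() == n.allowed {
		return authorizer.DecisionAllow, "namespace is allowed", nil
	}
	return authorizer.DecisionNoOpinion, "", nil
}

func main() {
	attrs := authorizer.AttributesRecord{
		User:            &user.DefaultInfo{Name: "demo-user"},
		Verb:            "create",
		Namespace:       "dev",
		Resource:        "pods",
		ResourceRequest: true,
	}
	decision, reason, err := namespaceAuthorizer{allowed: "dev"}.Authorize(context.TODO(), attrs)
	fmt.Println(decision == authorizer.DecisionAllow, reason, err)
}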

Admission control

After authentication and authorization, and before the object is persisted, kube-apiserver intercepts the request so that custom logic can operate on the requested resource object (validate it, modify it, or reject it). Why does this stage exist? To keep the cluster stable: before a resource object is formally accepted, the system needs to run a series of checks on it to make sure it complies with the expectations and rules of the whole cluster. It is the last safeguard before the resource is written to etcd.

Plug-in implementation interface

// vendor/k8s.io/apiserver/pkg/admission/interfaces.go#L123
// Interface is an abstract, pluggable interface for Admission Control decisions.
type Interface interface {
	// Handles returns true if this admission controller can handle the given operation
	// where operation can be one of CREATE, UPDATE, DELETE, or CONNECT
	Handles(operation Operation) bool
}

type MutationInterface interface {
	Interface

	// Admit makes an admission decision based on the request attributes.
	// Context is used only for timeout/deadline/cancellation and tracing information.
	Admit(ctx context.Context, a Attributes, o ObjectInterfaces) (err error)
}

// ValidationInterface is an abstract, pluggable interface for Admission Control decisions.
type ValidationInterface interface {
	Interface

	// Validate makes an admission decision based on the request attributes. It is NOT allowed to mutate
	// Context is used only for timeout/deadline/cancellation and tracing information.
	Validate(ctx context.Context, a Attributes, o ObjectInterfaces) (err error)
}

Two types of admission controllers

  1. Mutating admission controllers modify the resource objects submitted by users.
  2. Validating admission controllers validate the resource objects submitted by users and may not modify them.

Mutating admission controllers run before validating admission controllers.

Admission controllers operate much like the authentication and authorization chains, with one difference: if any admission controller rejects the request, the whole admission process ends and the request fails.

Admission controllers run as plugins inside the kube-apiserver process. The plugin model makes them easy to extend, and individual plugins can be enabled or disabled independently; for this reason each admission controller is also called an admission plugin.

When a client request passes through the admission controller list and any controller rejects it, the request is denied (HTTP 403 Forbidden) and an error is returned to the client.
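As a minimal sketch of a validating admission plugin built against the interface shown above: the rule (requiring an "app" label on Pods) is invented for illustration and is not a built-in controller.

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apiserver/pkg/admission"
)

// requireAppLabel rejects Pod creation when the "app" label is missing.
type requireAppLabel struct{}

func (requireAppLabel) Handles(op admission.Operation) bool {
	return op == admission.Create
}

func (requireAppLabel) Validate(ctx context.Context, a admission.Attributes, o admission.ObjectInterfaces) error {
	pod, ok := a.GetObject().(*corev1.Pod)
	if !ok {
		return nil // not a Pod, nothing to validate
	}
	if pod.Labels["app"] == "" {
		// Returning a forbidden error fails the whole request, as described above.
		return admission.NewForbidden(a, fmt.Errorf("pod %q must carry an app label", pod.Name))
	}
	return nil
}

func main() {
	pod := &corev1.Pod{}
	pod.Name = "demo"
	attrs := admission.NewAttributesRecord(
		pod, nil,
		corev1.SchemeGroupVersion.WithKind("Pod"),
		"default", "demo",
		corev1.SchemeGroupVersion.WithResource("pods"),
		"", admission.Create, nil, false, nil,
	)
	// Prints a forbidden error because the demo Pod has no "app" label.
	fmt.Println(requireAppLabel{}.Validate(context.TODO(), attrs, nil))
}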

ETCD storage

After the authentication, authorization, and admission control checks, kube-apiserver deserializes (decodes) the HTTP request body, constructs a runtime object, and persists it to etcd.
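To see the result of that persistence step, here is a minimal sketch that lists Pod keys directly from etcd with the etcd v3 client. It assumes local, unauthenticated access to etcd and the default /registry key prefix, both of which will differ on a real cluster (etcd normally requires client TLS certificates), and the import path may differ by etcd version.

package main

import (
	"context"
	"fmt"
	"time"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"}, // assumption: local, unauthenticated etcd
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Pods are stored under <prefix>/pods/<namespace>/<name>; /registry is the default prefix.
	resp, err := cli.Get(context.TODO(), "/registry/pods/", clientv3.WithPrefix(), clientv3.WithKeysOnly())
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Println(string(kv.Key))
	}
}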

Horizontal scaling

How does kube-apiserver know what to do with an operation on a particular resource? The answer lies in a rather involved configuration process that runs when the service first starts, so let's take a quick look:

  1. When kube-apiserver starts, it creates a server chain, which enables apiserver aggregation and is the basic mechanism for serving multiple apiservers
// cmd/kube-apiserver/app/server.go#L184
server, err := CreateServerChain(completeOptions, stopCh)
  2. A generic apiserver is created as the default implementation
// cmd/kube-apiserver/app/server.go#L215
apiExtensionsServer, err := createAPIExtensionsServer(apiExtensionsConfig, genericapiserver.NewEmptyDelegate())
  3. The generated OpenAPI schema information is populated into the apiserver configuration
// cmd/kube-apiserver/app/server.go#L477
	genericConfig.OpenAPIConfig = genericapiserver.DefaultOpenAPIConfig(generatedopenapi.GetOpenAPIDefinitions, openapinamer.NewDefinitionNamer(legacyscheme.Scheme, extensionsapiserver.Scheme, aggregatorscheme.Scheme))


  4. kube-apiserver configures a storage provider for each API group; this acts as the proxy through which kube-apiserver reads and modifies resource state.
// pkg/controlplane/instance.go#L591
for _, restStorageBuilder := range restStorageProviders {
	groupName := restStorageBuilder.GroupName()
	if !apiResourceConfigSource.AnyVersionForGroupEnabled(groupName) {
		klog.V(1).Infof("Skipping disabled API group %q.", groupName)
		continue
	}
	apiGroupInfo, enabled, err := restStorageBuilder.NewRESTStorage(apiResourceConfigSource, restOptionsGetter)
	if err != nil {
		return fmt.Errorf("problem initializing API group %q : %v", groupName, err)
	}
	if !enabled {
		klog.Warningf("API group %q is not enabled, skipping.", groupName)
		continue
	}
	// ...
}
  5. REST route mappings are added for every version of every API group. This lets kube-apiserver map an incoming request to the matching handler.
// vendor/k8s.io/apiserver/pkg/server/genericapiserver.go#L439
r, err := apiGroupVersion.InstallREST(s.Handler.GoRestfulContainer)
  6. In our particular scenario, a POST handler is registered, which will delegate creation of the resource to the storage provider
// vendor/k8s.io/apiserver/pkg/endpoints/installer.go#816
case "POST": // Create a resource.
			var handler restful.RouteFunction
			if isNamedCreater {
				handler = restfulCreateNamedResource(namedCreater, reqScope, admit)
			} else {
				handler = restfulCreateResource(creater, reqScope, admit)
      }

To recap: kube-apiserver has now finished mapping routes to the internal storage providers for resource operations, so when a request matches a route, the corresponding provider can be invoked.

Pod storage process

Let’s continue with the pod creation process:

  1. Based on the registered routing information, when a request matches one of the handler chains, it is forwarded to that handler for processing. If no handler matches, the fallback path-based handler takes over; if no path-based handler is registered either, the NotFound handler returns a 404 error.
  2. Luckily, we have already registered createHandler. What does it do? First, it decodes the HTTP request body and performs basic validation, for example checking that the supplied JSON conforms to the versioned API resource.
  3. Auditing is performed, along with the final admission checks.
  4. The resource is stored in etcd through the storage provider. Typically the etcd key follows the pattern <prefix>/<resource>/<namespace>/<name>, and the prefix can be changed through configuration.
Create(ctx context.Context, name string, obj runtime.Object, createValidation ValidateObjectFunc, options *metav1.CreateOptions) (runtime.Object, error)
  5. Errors are checked; if there are none, the storage provider performs a get call to make sure the object was actually created, then triggers any post-create handlers and other decorators that are required.
  6. The HTTP response is constructed and sent back.
// vendor/k8s.io/apiserver/pkg/endpoints/handlers/create.go#L49
func createHandler(r rest.NamedCreater, scope *RequestScope, admit admission.Interface, includeName bool) http.HandlerFunc {
	gv := scope.Kind.GroupVersion()
	s, err := negotiation.NegotiateInputSerializer(req, false, scope.Serializer)
	if err != nil {
		scope.err(err, w, req)
		return
	}

	// This decoder is the key to parsing the request body
	decoder := scope.Serializer.DecoderToVersion(s.Serializer, scope.HubGroupVersion)

	// Read the request body: a serialized runtime object
	body, err := limitedReadBody(req, scope.MaxRequestBodyBytes)
	if err != nil {
		scope.err(err, w, req)
		return
	}

	// Check the creation parameters
	// ...

	// Decode and convert
	defaultGVK := scope.Kind
	original := r.New()
	trace.Step("About to convert to expected version")
	obj, gvk, err := decoder.Decode(body, &defaultGVK, original)
	// ... omitted
	trace.Step("Conversion done")

	// admission
	// ...

	// Store the object in etcd
	trace.Step("About to store object in database")
	admissionAttributes := admission.NewAttributesRecord(obj, nil, scope.Kind, namespace, name, scope.Resource, scope.Subresource, admission.Create, options, dryrun.IsDryRun(options.DryRun), userInfo)
	requestFunc := func() (runtime.Object, error) {
		return r.Create(
			ctx,
			name,
			obj,
			rest.AdmissionToValidateObjectFunc(admit, admissionAttributes, scope),
			options,
		)
	}
	// ... omitted
	trace.Step("Object stored in database")

	// Construct the HTTP response
	code := http.StatusCreated
	status, ok := result.(*metav1.Status)
	if ok && err == nil && status.Code == 0 {
		status.Code = int32(code)
	}

	transformResponseObject(ctx, scope, trace, req, w, code, outputMediaType, result)
	// At this point the HTTP request to create the Pod returns, carrying the created object

Conclusion

So far, kube-apiserver has done a lot of work: the Pod object is now stored in etcd, but nothing is actually running yet. For reasons of length, Pod scheduling and the actual creation on a node will be covered in follow-up articles.

Reference documentation

books

Kubernetes Source Code Analysis, by Zheng Dongxu

article

Background on Kubernetes deprecating Docker

The kubelet workflow for creating a Pod

The kubectl run Pod creation flow in v1.14