As the TiDB Operator community grows, more and more developers are contributing to TiDB Operator. The barrier to entry is currently high: developers have to read the code in detail before they can see the full picture of the project. With this in mind, we want to provide a systematic walkthrough of the TiDB Operator code, both as a guide for beginners and as a long-term reference manual. Through this series of articles, we hope to make TiDB Operator easier to understand and to let more ideas take root in the community.
Application scenarios and capability positioning of TiDB Operator
Understanding the scenarios that TiDB Operator targets and the problems it focuses on helps you understand the functional boundaries of its code.
The diagram above shows the architecture of TiDB Operator. In it, TidbCluster, TidbMonitor, TidbInitializer, Backup, Restore, BackupSchedule, and TidbClusterAutoScaler are custom resources defined through CustomResourceDefinitions (CRDs). These CRDs describe the following information:
- TidbCluster describes the TiDB cluster that the user expects
- TidbMonitor describes the TiDB cluster monitoring component that the user expects
- TidbInitializer describes the TiDB cluster initialization Job that the user expects
- Backup describes the TiDB cluster backup Job that the user expects
- Restore describes the TiDB cluster restore Job that the user expects
- BackupSchedule describes the periodic backup Job of the TiDB cluster that the user expects
- TidbClusterAutoScaler describes the automatic scaling rules for the TiDB cluster that the user expects
The orchestration and scheduling logic for TiDB clusters is handled by the following components:
- tidb-controller-manager is a set of custom controllers on Kubernetes. These controllers continuously compare the desired state recorded in the TidbCluster object with the actual state of the TiDB cluster and adjust the resources in Kubernetes to drive the cluster toward the desired state. They also carry out the corresponding control logic for the other CRs.
- tidb-scheduler is a Kubernetes scheduler extension that injects scheduling logic specific to the TiDB cluster topology into the Kubernetes scheduler.
- tidb-admission-webhook is a Kubernetes dynamic admission controller that handles mutation, validation, and operations for Pods, StatefulSets, and other related resources.
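The compare-and-adjust behavior described for tidb-controller-manager can be illustrated with a small Go sketch. This is not TiDB Operator's actual code: the `ClusterState` type and the `reconcile` function are hypothetical names chosen for illustration, and the "actions" are plain strings where a real controller would call the Kubernetes API.

```go
package main

import "fmt"

// ClusterState is a hypothetical, simplified view of a cluster:
// component name -> number of replicas.
type ClusterState map[string]int

// reconcile compares the desired state with the actual state and returns
// the actions needed to converge. A real controller would issue Kubernetes
// API calls here instead of returning strings.
func reconcile(desired, actual ClusterState) []string {
	var actions []string
	// Iterate in a fixed order so the result is deterministic.
	for _, name := range []string{"pd", "tikv", "tidb"} {
		want, have := desired[name], actual[name]
		switch {
		case have < want:
			actions = append(actions, fmt.Sprintf("scale out %s: %d -> %d", name, have, want))
		case have > want:
			actions = append(actions, fmt.Sprintf("scale in %s: %d -> %d", name, have, want))
		}
	}
	return actions
}

func main() {
	desired := ClusterState{"pd": 3, "tikv": 3, "tidb": 2}
	actual := ClusterState{"pd": 3, "tikv": 1, "tidb": 2}
	for _, action := range reconcile(desired, actual) {
		fmt.Println(action)
	}
}
```

Running the loop repeatedly until it returns no actions is the essence of the pattern: the controller never stores a plan, it only recomputes the difference between the two states.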
TiDB runs on Kubernetes with the help of native Kubernetes resources such as Deployment, StatefulSet, Service, PVC, and ConfigMap, and it is operated and maintained through combinations of these resources. With TiDB Operator, the user only needs to describe the specification of the TiDB cluster, such as the version and the number of instances, and does not need to think about how to use the underlying Kubernetes resources. Users can deploy a TidbCluster CR and a TidbMonitor CR with YAML. TiDB Operator then drives the corresponding resources in Kubernetes toward the state that these CR objects describe, so that TiDB runs normally and serves external traffic while meeting the user's requirements.
In what way does TiDB Operator simplify operations and maintenance (O&M) for users? For example, suppose the user needs three PD instances. From a configuration perspective, the first PD instance has to be initialized, and the second and third PD instances have to join the newly initialized instance. The startup parameters involved are --initial-cluster and --join. TiDB Operator can generate this configuration automatically.
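As a rough illustration of the PD example, the following Go sketch generates startup arguments for each member. It is a simplification under stated assumptions, not how TiDB Operator actually builds its start scripts: the `pdArgs` function, the host naming scheme, and the ports (2380 for peer traffic, 2379 for client traffic) are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// pdArgs returns illustrative startup arguments for the PD member with the
// given ordinal. The function name and host naming scheme are hypothetical.
func pdArgs(clusterName string, ordinal int) []string {
	name := fmt.Sprintf("%s-pd-%d", clusterName, ordinal)
	// host builds an assumed DNS name for member i.
	host := func(i int) string {
		return fmt.Sprintf("%s-pd-%d.%s-pd-peer", clusterName, i, clusterName)
	}
	args := []string{"--name=" + name}
	if ordinal == 0 {
		// The first member initializes a new cluster.
		args = append(args, fmt.Sprintf("--initial-cluster=%s=http://%s:2380", name, host(0)))
	} else {
		// Later members join the already initialized member.
		args = append(args, fmt.Sprintf("--join=http://%s:2379", host(0)))
	}
	return args
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(strings.Join(pdArgs("demo", i), " "))
	}
}
```

The point of the sketch is that the user states only "three PD instances"; which member bootstraps and which members join is derived mechanically.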
Similarly, upgrading PD online requires a rolling update. Doing this by hand is cumbersome, and it is hard to guarantee that the online PD service is not affected during the upgrade. In Kubernetes, you need to use the StatefulSet updateStrategy.rollingUpdate.partition option to control the progress of the rolling update, and update the PD instances one by one while monitoring the PD service. TiDB Operator can automatically transfer the leader through the PD API and check whether the updated Pod works normally, which automates the online rolling update. Performed manually, these operations are tedious and error-prone. By building the O&M logic of TiDB into TiDB Operator, we help users simplify the O&M process of TiDB.
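The rolling update described here can be sketched in Go as follows. This is a hedged illustration, not TiDB Operator's implementation: the `pdControl` interface, the `rollingUpdate` function, and the `fakePD` type are hypothetical, and `setPartition` merely stands in for patching the StatefulSet partition. It relies on the StatefulSet rule that Pods with an ordinal greater than or equal to the partition are updated, so lowering the partition one step at a time updates one Pod at a time.

```go
package main

import (
	"errors"
	"fmt"
)

// pdControl is a hypothetical abstraction over the PD API and Pod health checks.
type pdControl interface {
	IsLeader(pod string) bool
	TransferLeader(from string) error
	IsHealthy(pod string) bool
}

// rollingUpdate updates Pods from the highest ordinal down to 0. Before a Pod
// is updated, leadership is moved away from it; after the update, the rollout
// stops if the Pod is not healthy.
func rollingUpdate(replicas int, pd pdControl, setPartition func(int) error) error {
	for ordinal := replicas - 1; ordinal >= 0; ordinal-- {
		pod := fmt.Sprintf("pd-%d", ordinal)
		if pd.IsLeader(pod) {
			if err := pd.TransferLeader(pod); err != nil {
				return fmt.Errorf("transfer leader from %s: %w", pod, err)
			}
		}
		// Lowering the partition to this ordinal lets Kubernetes update the Pod.
		if err := setPartition(ordinal); err != nil {
			return err
		}
		if !pd.IsHealthy(pod) {
			return errors.New(pod + " is not healthy after update; pausing rollout")
		}
	}
	return nil
}

// fakePD is an in-memory stand-in used only to make the sketch runnable.
type fakePD struct {
	leader string
	log    []string
}

func (f *fakePD) IsLeader(pod string) bool { return f.leader == pod }

func (f *fakePD) TransferLeader(from string) error {
	f.log = append(f.log, "transfer leader from "+from)
	// Pick any other member as the new leader.
	if from == "pd-0" {
		f.leader = "pd-1"
	} else {
		f.leader = "pd-0"
	}
	return nil
}

// IsHealthy always reports success here; a real check would poll the Pod.
func (f *fakePD) IsHealthy(pod string) bool { return true }

func main() {
	pd := &fakePD{leader: "pd-2"}
	err := rollingUpdate(3, pd, func(p int) error {
		pd.log = append(pd.log, fmt.Sprintf("partition=%d", p))
		return nil
	})
	fmt.Println(pd.log, err)
}
```

Keeping the PD API and the partition update behind small interfaces also makes the ordering logic easy to test without a cluster.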
From an implementation point of view, TiDB Operator needs to interact with two systems. The first is Kubernetes: TiDB Operator has to configure and operate Kubernetes resources so that they meet the requirements for TiDB to run normally. The second is the API of the TiDB components: TiDB Operator has to obtain state changes inside the cluster from PD and carry out the corresponding resource management in Kubernetes, and it also has to call the TiDB cluster APIs to perform O&M operations according to user requirements. Many partners who integrate TiDB O&M capabilities into an existing Kubernetes O&M system want a capability that interacts with both systems from the perspective of the TiDB system, and TiDB Operator fills this role.
We also hope that this series of documents will help you understand the technical details of TiDB Operator so that you can integrate it into your business system.
Contents summary
We hope to discuss the following in the source code reading series:
- Introduction to TiDB Operator: discusses the problems that TiDB Operator needs to solve;
- Operator pattern: discusses the code entry point of TiDB Operator, its run logic, and how the reconcile loop is triggered;
- Reconcile loop design for TiDB Operator components: discusses the general design of the reconcile loop for TiDB components and introduces possible extension points;
- Feature design of TiDB Operator: discusses the design and implementation of features such as backup, auto-scaling, Webhook, Advanced StatefulSet, TiDB Scheduler, and monitoring;
- Quality management for TiDB Operator: discusses quality assurance measures for TiDB Operator, such as unit testing and E2E testing.
What can readers gain
We want to help you in the following scenarios:
- Help you know not only what TiDB Operator does but also why: understand the implementation behind each feature, clear up blind spots about TiDB Operator, and improve your experience of using it;
- When you want to contribute new features to the community, we want to help you find an entry point for researching related issues and know where to start modifying or adding features to meet your needs;
- If you want to integrate TiDB Operator into your own Kubernetes-based O&M system, we hope to make the integration easy by explaining how we manage Kubernetes resources and interact with TiDB;
- TiDB Operator is a good example to study when learning about Operator frameworks. The Kubernetes community currently has Operator frameworks such as Kubebuilder and Operator Framework, as well as well-packaged controller runtimes such as controller-runtime. All of these approaches essentially use existing Kubernetes modules to encapsulate complex O&M logic. Understanding the execution logic of TiDB Operator will help you design more powerful and easier-to-implement resource management systems on top of the declarative API and Kubernetes.
Summary
In this article, we introduced the problems that TiDB Operator needs to solve and the plan for the rest of the series. In the following articles, we will dig into the code design of TiDB Operator, covering its code structure, its run logic, the implementation details of its features, and its quality management, and we will share our experience of writing TiDB Operator with the Kubernetes Operator pattern. You are welcome to interact with the TiDB Operator community via #sig-k8s or pingcap/tidb-operator.