In the previous article, we introduced the design and implementation of the Controller Manager in TiDB Operator and learned how each Controller receives and handles changes. In this article, we discuss the implementation of the component Controllers. Taking the TiDBCluster Controller, which is responsible for the lifecycle management of TiDB's major components, as our example, we introduce how the component control loops are orchestrated: which control loop events run in what order, and which resource management operations are performed during the lifecycle management of a TiDB cluster. As you read, focus on the general process and the definition of each task; in the next article we will detail how each component fits into the following paradigm.
Invocation of the component control loops
In the previous article, we mentioned the updateTidbCluster function of the TiDBCluster controller, located in pkg/controller/tidbcluster/tidb_cluster_control.go. It is the entry point for TiDB component lifecycle management and invokes a series of lifecycle management functions. Ignoring the comments, we can see that updateTidbCluster calls the following functions in sequence:
- `c.reclaimPolicyManager.Sync(tc)`
- `c.orphanPodsCleaner.Clean(tc)`
- `c.discoveryManager.Reconcile(tc)`
- `c.ticdcMemberManager.Sync(tc)`
- `c.tiflashMemberManager.Sync(tc)`
- `c.pdMemberManager.Sync(tc)`
- `c.tikvMemberManager.Sync(tc)`
- `c.pumpMemberManager.Sync(tc)`
- `c.tidbMemberManager.Sync(tc)`
- `c.metaManager.Sync(tc)`
- `c.pvcCleaner.Clean(tc)`
- `c.pvcResizer.Resize(tc)`
- `c.tidbClusterStatusManager.Sync(tc)`
These functions fall into two categories. One category implements the control loops organized around the TiDB components themselves, such as PD, TiDB, TiKV, TiFlash, TiCDC, Pump, and Discovery. The other manages the Kubernetes resources used by TiDB components and performs other lifecycle management operations, such as maintaining the PV ReclaimPolicy, cleaning up orphan Pods, maintaining the meta information of Kubernetes resources, cleaning up and resizing PVCs, and managing the status of the TiDBCluster object.
Lifecycle management process for TiDB components
The code for the control loops of TiDB's main components lives in the pkg/manager/member directory, in files ending with _member_manager.go, such as pd_member_manager.go. These in turn reference files such as _scaler.go and _upgrader.go, which contain the implementations of the scaling and upgrade related features. From the _member_manager.go file of each component, we can extract the following general implementation:
```go
// General structure extracted from the component member managers
// (simplified: receivers and some variables are omitted).

// Sync Service
if err := m.syncServiceForTidbCluster(tc); err != nil {
	return err
}
// Sync Headless Service
if err := m.syncHeadlessServiceForTidbCluster(tc); err != nil {
	return err
}
// Sync StatefulSet
return syncStatefulSetForTidbCluster(tc)

func syncStatefulSetForTidbCluster(tc *v1alpha1.TidbCluster) error {
	// Sync the component status first; later steps rely on it
	if err := m.syncTidbClusterStatus(tc, oldSet); err != nil {
		klog.Errorf("failed to sync TidbCluster: [%s/%s]'s status, error: %v", ns, tcName, err)
	}
	// Skip the StatefulSet sync entirely while the cluster is paused
	if tc.Spec.Paused {
		klog.V(4).Infof("tidb cluster %s/%s is paused, skip syncing for statefulset", tc.GetNamespace(), tc.GetName())
		return nil
	}
	// Sync the ConfigMap (configuration file and startup script)
	cm, err := m.syncConfigMap(tc, oldSet)
	if err != nil {
		return err
	}
	// Generate a new StatefulSet template from the latest Spec and Status
	newSet, err := getNewSetForTidbCluster(tc, cm)
	if err != nil {
		return err
	}
	// Scaling, failover, and rolling update against the new StatefulSet
	if err := m.scaler.Scale(tc, oldSet, newSet); err != nil {
		return err
	}
	if err := m.failover.Failover(tc); err != nil {
		return err
	}
	if err := m.upgrader.Upgrade(tc, oldSet, newSet); err != nil {
		return err
	}
	// Create or update the actual StatefulSet if it differs from newSet
	return UpdateStatefulSet(m.deps.StatefulSetControl, tc, newSet, oldSet)
}
```
Synchronizing a Service creates or updates the Service resources for a component. Synchronizing the StatefulSet includes the following tasks:
- Synchronize the component's Status;
- Check whether the TiDBCluster has paused synchronization;
- Synchronize the ConfigMap;
- Generate a new StatefulSet from the TidbCluster configuration, and apply rolling update, scaling, and failover logic to it;
- Create or update the StatefulSet.
The component control loop repeats these tasks on every reconciliation to keep the component up to date. The following sections describe what each of these tasks does in TiDB Operator.
Synchronizing the Service
Component reconciliation generally begins with Service reconciliation, which creates and synchronizes the Services used by the component, such as cluster1-pd and cluster1-pd-peer. The control loop calls the getNewServiceForTidbCluster function to build a new Service template from the information recorded in the TidbCluster CR. If the Service does not exist, it is created directly; otherwise, the new and old Service specs are compared to decide whether the Service object needs to be updated.
TiDB components use two kinds of Services, regular Services and Headless Services, to expose access to the component. A regular Service address is suitable when a client does not need to know which instance it is talking to and load-balanced access is acceptable, for example when TiKV or TiDB accesses PD. When a client needs to address the specific Pod that serves it, it communicates via Pod DNS. For example, when TiKV starts, it exposes its Pod DNS as the advertise address, so that other Pods can reach it through that Pod DNS.
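To make the two access paths concrete, here is a minimal sketch of the pair of Services a PD component typically gets, built with the standard client-go types. The names, labels, and ports are illustrative, not the exact ones TiDB Operator generates:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// newPDServices sketches the two Services of a PD component: a regular
// Service ("cluster1-pd") for load-balanced client access, and a Headless
// Service ("cluster1-pd-peer") that gives every Pod a stable DNS record
// for peer-to-peer communication.
func newPDServices(ns string) (*corev1.Service, *corev1.Service) {
	selector := map[string]string{"app.kubernetes.io/component": "pd"}

	client := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "cluster1-pd", Namespace: ns},
		Spec: corev1.ServiceSpec{
			Selector: selector,
			Ports: []corev1.ServicePort{{
				Name:       "client",
				Port:       2379,
				TargetPort: intstr.FromInt(2379),
			}},
		},
	}

	peer := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "cluster1-pd-peer", Namespace: ns},
		Spec: corev1.ServiceSpec{
			// ClusterIP None makes this a Headless Service: DNS returns
			// one record per Pod instead of a single virtual IP.
			ClusterIP: corev1.ClusterIPNone,
			Selector:  selector,
			Ports: []corev1.ServicePort{{
				Name:       "peer",
				Port:       2380,
				TargetPort: intstr.FromInt(2380),
			}},
		},
	}
	return client, peer
}

func main() {}
```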
Synchronizing the Status
After the Services are synchronized, the component is connected to the cluster network and is reachable within the cluster. The control loop then enters syncStatefulSetForTidbCluster and begins reconciling the StatefulSet. The first step is to synchronize the component's status information with syncTidbClusterStatus; subsequent operations such as scaling, failover, and upgrade depend on the information in Status.
Synchronizing the Status is a key operation in TiDB Operator. It brings together the Kubernetes-side state of each component and the internal state of the TiDB cluster. On the Kubernetes side, this operation synchronizes the replica count of the cluster, updates the Status, and checks whether the StatefulSet is in the middle of an update. On the TiDB side, TiDB Operator also needs to fetch information from inside the TiDB cluster via PD, such as PD member information, TiKV store information, and TiDB member information; the health check of the TiDB cluster is also completed during this Status update.
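The following is a simplified sketch of this two-sided status sync, assuming a hypothetical pdClient interface in place of the real PD API client; it only shows the idea of merging Kubernetes-side state with PD-reported member health:

```go
package main

import "fmt"

// MemberHealth is a hypothetical, reduced view of what PD reports per member.
type MemberHealth struct {
	Name   string
	Health bool
}

// PDStatus is a simplified stand-in for the component status structure.
type PDStatus struct {
	StatefulSetReplicas int32
	Members             map[string]MemberHealth
	Synced              bool
}

// pdClient stands in for the real PD API client.
type pdClient interface {
	GetMembersHealth() ([]MemberHealth, error)
}

func syncPDStatus(pdc pdClient, replicas int32, st *PDStatus) error {
	st.StatefulSetReplicas = replicas // Kubernetes-side information

	members, err := pdc.GetMembersHealth() // cluster-side information from PD
	if err != nil {
		st.Synced = false
		return fmt.Errorf("failed to query PD members: %w", err)
	}
	st.Members = make(map[string]MemberHealth, len(members))
	for _, m := range members {
		st.Members[m.Name] = m
	}
	st.Synced = true // later steps (scale, failover, upgrade) rely on this
	return nil
}

func main() {}
```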
Checking whether TiDBCluster synchronization is paused
After the status is updated, tc.Spec.Paused is checked to determine whether cluster synchronization is paused. If it is, the subsequent StatefulSet update steps are skipped.
Synchronizing the ConfigMap
After the Status is synchronized, the syncConfigMap function updates the ConfigMap, which typically contains the component's configuration file and startup script. The configuration file is extracted from the Config item of the Spec in the user's YAML. Both TOML passthrough and YAML conversion are currently supported, and TOML is the recommended format. The startup script obtains the startup parameters the component needs and then launches the component process with them. When a component needs to obtain startup parameters from TiDB Operator at startup, the parameter generation on the TiDB Operator side is handled by the Discovery component. For example, PD accesses Discovery with wget to get the parameters it needs to decide whether to initialize the cluster or join an existing node. Obtaining parameters in the startup script this way avoids the unexpected rolling updates that modifying the StatefulSet would trigger, which could affect online services.
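As a rough illustration of this pattern, the sketch below assembles a ConfigMap holding a TOML config file plus a startup script that queries Discovery with wget; the key names, Discovery URL, and script contents are simplified stand-ins for the real ones in TiDB Operator:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newPDConfigMap sketches the two typical keys of a component ConfigMap:
// the TOML configuration file and the startup script. The script asks
// Discovery for its parameters at runtime, so topology decisions do not
// have to be baked into the StatefulSet (which would roll the Pods).
func newPDConfigMap(ns, name, pdToml string) *corev1.ConfigMap {
	startScript := `#!/bin/sh
# Ask Discovery whether this PD should initialize or join (illustrative URL).
ARGS=$(wget -qO- http://${CLUSTER_NAME}-discovery:10261/new/${POD_NAME})
exec /pd-server ${ARGS} --config=/etc/pd/pd.toml
`
	return &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: ns},
		Data: map[string]string{
			"pd.toml":            pdToml, // user config passed through from the Spec
			"pd_start_script.sh": startScript,
		},
	}
}

func main() {}
```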
Generating the new StatefulSet
The getNewPDSetForTidbCluster function builds a new StatefulSet template. It references the Service and ConfigMap generated just before, and fills in the remaining items, such as env, Container, and volume, according to the latest Status and Spec. This new StatefulSet is then handed to the following three processes for rolling update, scaling, and failover handling. Finally, the newly generated StatefulSet is passed to the UpdateStatefulSet function, which determines whether it actually needs to be applied to the component's current StatefulSet.
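Below is a much-reduced sketch of what such a template looks like, wiring the ConfigMap and Headless Service generated earlier into a StatefulSet; the real template fills in many more items (env, probes, PVCs, and so on), and the image and paths here are illustrative:

```go
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newPDStatefulSet sketches assembling a StatefulSet from the previously
// generated Headless Service and ConfigMap.
func newPDStatefulSet(ns, name, headlessSvc, cmName string, replicas int32) *appsv1.StatefulSet {
	labels := map[string]string{"app.kubernetes.io/component": "pd"}
	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: ns},
		Spec: appsv1.StatefulSetSpec{
			Replicas:    &replicas,
			ServiceName: headlessSvc, // Pod DNS comes from the Headless Service
			Selector:    &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:    "pd",
						Image:   "pingcap/pd:v5.0.0", // illustrative
						Command: []string{"/bin/sh", "/usr/local/bin/pd_start_script.sh"},
						VolumeMounts: []corev1.VolumeMount{{
							Name:      "startup-script",
							MountPath: "/usr/local/bin",
						}},
					}},
					Volumes: []corev1.Volume{{
						Name: "startup-script",
						VolumeSource: corev1.VolumeSource{
							ConfigMap: &corev1.ConfigMapVolumeSource{
								LocalObjectReference: corev1.LocalObjectReference{Name: cmName},
							},
						},
					}},
				},
			},
			UpdateStrategy: appsv1.StatefulSetUpdateStrategy{
				Type: appsv1.RollingUpdateStatefulSetStrategyType,
			},
		},
	}
}

func main() {}
```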
Processing the new StatefulSet (I): Rolling update
The m.upgrader.Upgrade function is responsible for rolling update related operations. It mainly updates the StatefulSet's UpgradeStrategy.Type and UpgradeStrategy.RollingUpdate.Partition; the rolling update itself is carried out by the StatefulSet's RollingUpdate strategy. Component reconciliation sets the StatefulSet's upgrade strategy to RollingUpdate, i.e. UpgradeStrategy.Type is set to RollingUpdate. A Kubernetes StatefulSet uses UpgradeStrategy.RollingUpdate.Partition to control rolling update progress: it only updates Pods whose ordinal is greater than or equal to the Partition value, and leaves the others untouched. Through this mechanism, each Pod is allowed to return to normal service before the rolling update moves on. When an upgrade starts, or during its startup phase, component reconciliation sets the Partition based on the largest Pod ordinal in the StatefulSet, so that no Pod is updated yet. Then, once a Pod has been updated and is serving again after its restart, for example the TiKV Store state becomes Up or the TiDB member state becomes healthy, that Pod is considered successfully upgraded, and the Partition value is dialed down to update the next Pod.
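A small sketch of the Partition selection logic described above, with a hypothetical upgradedAndHealthy callback standing in for the component-specific checks (TiKV Store is Up, TiDB member is healthy); the real upgrader implementations live in the *_upgrader.go files:

```go
package main

import "fmt"

// choosePartition walks Pods from the highest ordinal down. The StatefulSet
// controller updates only Pods whose ordinal is >= Partition, so stopping at
// the first Pod that is not yet upgraded-and-healthy updates exactly one Pod
// at a time.
func choosePartition(replicas int32, upgradedAndHealthy func(ordinal int32) bool) int32 {
	for i := replicas - 1; i >= 0; i-- {
		if !upgradedAndHealthy(i) {
			return i // update Pod i; Pods with smaller ordinals stay untouched
		}
	}
	return 0 // every Pod is upgraded; the rolling update is complete
}

func main() {
	// Suppose 3 replicas and only Pod 2 has been upgraded and is healthy:
	upgraded := map[int32]bool{2: true}
	fmt.Println(choosePartition(3, func(i int32) bool { return upgraded[i] })) // 1
}
```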
Processing the new StatefulSet (II): Scaling
The m.scaler.Scale function is responsible for scaling out and scaling in, mainly by updating the Replicas of the component's StatefulSet. Scale follows the principle of scaling one step at a time: each reconciliation changes the replica count by at most 1. The Scale function compares the declared Replicas of the component in the TiDBCluster, such as tc.Spec.PD.Replicas, with the current value, determines whether to scale out or scale in, and performs a single step; the full scaling requirement is completed over multiple reconciliations. Scaling involves PD API operations such as PD transferring the Leader and TiKV deleting a Store. During component reconciliation, these operations are performed through the PD API and confirmed successful before the next one-step scaling proceeds.
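The one-step rule can be sketched as follows; the real Scale implementations also perform the PD-side preparation (transfer Leader, delete Store) before committing each step:

```go
package main

import "fmt"

// scaleOne returns the next replica count, moving at most one step per
// reconciliation toward the desired count from the TiDBCluster spec.
func scaleOne(desired, actual int32) int32 {
	switch {
	case desired > actual:
		return actual + 1 // scale out by one; later reconciles do the rest
	case desired < actual:
		return actual - 1 // scale in by one, after PD-side prep succeeds
	default:
		return actual
	}
}

func main() {
	// e.g. tc.Spec.PD.Replicas = 5, StatefulSet currently has 3 replicas:
	fmt.Println(scaleOne(5, 3)) // 4; the next reconcile moves on to 5
}
```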
Processing the new StatefulSet (III): Failover
The m.failover.Failover function is responsible for disaster recovery, including discovering and recording failure states and recovering from them. When AutoFailover is enabled at TiDB Operator deployment time and a failed component is found, its information is recorded in Status fields such as FailureStores or FailureMembers, and a new Pod is started to take over the failed Pod's workload. After the original Pod recovers, the Replicas of the StatefulSet are changed back to scale in the Pod that was started to carry the workload during disaster recovery. In the TiKV/TiFlash failover logic, however, automatically scaling in the Pod started during failover is not the default behavior; you need to set spec.tikv.recoverFailover: true for the newly started Pod to be scaled in.
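A hypothetical sketch of the failover bookkeeping: record the failed member in Status, then hand the StatefulSet a replica count that includes one replacement Pod per recorded failure. Field names here are simplified relative to the real FailureMembers/FailureStores structures:

```go
package main

import "time"

// FailureMember is a reduced stand-in for a recorded failure entry.
type FailureMember struct {
	PodName   string
	CreatedAt time.Time
}

// Status is a reduced stand-in for the component status.
type Status struct {
	FailureMembers map[string]FailureMember
	Replicas       int32 // desired replicas from the spec
}

// failover records a newly failed member (if not already recorded) and
// returns the replica count to hand to the StatefulSet: spec replicas plus
// one replacement Pod per recorded failure.
func failover(st *Status, failedPod string) int32 {
	if st.FailureMembers == nil {
		st.FailureMembers = map[string]FailureMember{}
	}
	if _, ok := st.FailureMembers[failedPod]; !ok {
		st.FailureMembers[failedPod] = FailureMember{PodName: failedPod, CreatedAt: time.Now()}
	}
	return st.Replicas + int32(len(st.FailureMembers))
}

func main() {}
```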
Updating with the new StatefulSet
In the final phase of synchronizing the StatefulSet, after the new StatefulSet has been generated, the UpdateStatefulSet function compares the new StatefulSet with the existing one and, if they differ, performs the actual update. In addition, this function checks for StatefulSets that are not yet managed, mainly from older TiDB clusters deployed with Helm Chart; it takes them over for TiDB Operator management by adding the dependency labels to them.
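Conceptually, the equality check can be sketched like this; the real UpdateStatefulSet compares against a last-applied-configuration annotation rather than the raw Spec, so this is a simplification:

```go
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
)

// statefulSetNeedsUpdate reports whether the newly generated StatefulSet
// differs from the one currently in the cluster; only then is an actual
// update sent to the API server.
func statefulSetNeedsUpdate(oldSet, newSet *appsv1.StatefulSet) bool {
	return !apiequality.Semantic.DeepEqual(oldSet.Spec, newSet.Spec)
}

func main() {}
```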
Once the above operations are complete, the Status of the TiDBCluster CR is up to date, the relevant Services and ConfigMaps have been created, the new StatefulSet has been generated, and rolling updates, scaling, and failover can proceed. Component reconciliation runs round after round, monitoring the components' lifecycle states and responding to lifecycle changes and changes in user input, so that the cluster runs normally in the expected state.
Other lifecycle maintenance work
In addition to the main component control loops described above, some operations are orchestrated into the following implementations, which are responsible for these tasks:
- Discovery is used for PD startup parameter configuration and as the proxy for TiDB Dashboard. Discovery provides dynamic information for components to fetch at runtime, avoiding the Pod rolling updates that modifying a ConfigMap would trigger.
- ReclaimPolicyManager is used to synchronize the tc.Spec.PVReclaimPolicy configuration. By default, the ReclaimPolicy is set to Retain to reduce the risk of data loss.
- OrphanPodCleaner is used to clean up Pods whose PVC creation failed, so that the StatefulSet Controller can retry creating the Pod and its corresponding PVC.
- PVC Cleaner is used to clean up PVC-related resources, deleting PVCs that have been marked as deletable.
- PVC Resizer is used to expand PVCs. When running in the cloud, you can change the size of a PVC by modifying the storage-related configuration in TidbCluster.
- Meta Manager is used to synchronize StoreIDLabel, MemberIDLabel, and NamespaceLabel information to the labels of Pods, PVCs, and PVs.
- TiDBCluster Status Manager is used to synchronize TidbMonitor and TiDB Dashboard information.
Summary
This article introduced the design of the control loop logic for TiDBCluster components. It explained how, when the TiDBCluster Controller loop triggers the Reconcile function of each component, the component's Reconcile function checks the resources related to the component against the state the user expects, and drives the component's actual running state toward that expectation. The control loops in TiDB Operator generally follow this design logic, and in later articles we will describe how each component applies it to implement component lifecycle management.
If you have any good ideas, join the TiDB Operator community through #sig-k8s or pingcap/tidb-operator.