Author: Xing Jiashu, senior engineer in the Database team of Tencent TEG Infrastructure Department. The Tencent database technology team maintains the MySQL kernel branch TXSQL, which is 100% compatible with the upstream MySQL version; internally it supports group businesses such as WeChat red envelopes and lotteries, and externally it provides the kernel for Tencent Cloud CDB for MySQL.
CDBTune is an intelligent database performance tuning tool developed independently by Tencent Cloud. Compared with the general methods currently used in the industry, CDBTune does not need to subdivide load types or accumulate a large number of samples in advance; it tunes the parameters through intelligent learning and achieves better tuning results.
Database systems are complex and vary in load, making tuning difficult for DBAs:
- Many parameters, up to several hundred
- There is no unified standard among different databases, and their names, functions and interactions differ greatly
- Tuning relies on human experience, so labor cost is high and efficiency is low
- Existing tuning tools are not general-purpose
This adds up to three problems: complexity, low efficiency, and high cost. How does Tencent Cloud's intelligent performance tuning tool tackle these problems in ongoing practice?
Practice 1: Heuristic/Search-Based Algorithms
The input consists of two parts:
- Parameter constraints: including the set of parameters to be tuned and the upper and lower bounds of the parameters;
- Resource constraints: how many rounds of tuning to run before stopping.
- Configuration Sampler: samples the input parameter space to generate configurations, which are applied to the SUT (the environment to be tuned).
- System Manipulator: interacts with the SUT, sets the parameters, and collects performance data from it.
- Performance Optimizer: searches for the optimal configuration based on the configurations and performance data. The optimizer mainly combines two methods: DDS and RBS.
- Divide-and-Diverge Sampling (DDS)
Here, DDS divides the parameter space into subspaces to reduce the complexity of the problem. First, the range of each parameter is divided into k intervals, so n parameters yield k^n combinations instead of an unbounded continuous space, which already reduces complexity. If k and n are large, the space may still be huge. How to deal with that? In this case, the sampling is made to diverge so that only k samples are drawn, each one covering a different interval of every parameter.
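Below is a minimal sketch of the DDS idea in Python. The two MySQL parameter names and their bounds are purely illustrative assumptions, not the tool's actual tuning set.

```python
import random

def divide_and_diverge_sample(param_bounds, k):
    """Divide-and-Diverge Sampling (a rough sketch).

    Each parameter range is split into k intervals; instead of testing all
    k^n interval combinations, only k configurations are drawn, each using
    every interval of every parameter exactly once (a Latin-hypercube-style
    reduction of the sample count).
    """
    names = list(param_bounds)
    # For each parameter, shuffle the order in which its k intervals are visited.
    interval_order = {name: random.sample(range(k), k) for name in names}
    samples = []
    for i in range(k):
        config = {}
        for name in names:
            low, high = param_bounds[name]
            width = (high - low) / k
            idx = interval_order[name][i]                 # the i-th interval for this parameter
            config[name] = low + (idx + random.random()) * width
        samples.append(config)
    return samples

# Hypothetical example: two parameters, four samples instead of 4^2 combinations.
print(divide_and_diverge_sample(
    {"innodb_buffer_pool_size_gb": (1, 64), "max_connections": (100, 2000)}, k=4))
```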
- Recursive Bound-and-Search (RBS)
On a performance surface, near any given point it is usually possible to find another point with similar or better performance, which means a better configuration may well exist nearby. So, among the existing samples, find the configuration with the best performance, then run several more rounds of sampling around it, recursively bounding the search to smaller and smaller neighbourhoods while looking for possibly better configurations.
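The sketch below illustrates the RBS idea. The measure callback, the shrink factor, and the per-round sample count are assumptions made for the example, not the real interface.

```python
import random

def recursive_bound_and_search(start_config, bounds, measure, rounds=5, per_round=10, shrink=0.5):
    """Recursive Bound-and-Search (a rough sketch).

    Sample configurations inside a neighbourhood around the best configuration
    found so far; whenever a better one appears, recentre the search there.
    After each round the neighbourhood is bounded more tightly.
    """
    best_config = dict(start_config)
    best_perf = measure(best_config)
    radius = {p: (hi - lo) / 2 for p, (lo, hi) in bounds.items()}
    for _ in range(rounds):
        for _ in range(per_round):
            candidate = {
                p: min(hi, max(lo, best_config[p] + random.uniform(-radius[p], radius[p])))
                for p, (lo, hi) in bounds.items()
            }
            perf = measure(candidate)
            if perf > best_perf:                      # better configuration found: recentre
                best_config, best_perf = candidate, perf
        radius = {p: r * shrink for p, r in radius.items()}   # bound the search more tightly
    return best_config, best_perf
```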
The possible problems with the search-based approach are that sampling and testing can be time-consuming, and that the search can fall into a local optimum.
Practice 2: Machine Learning Methods
There are three main steps:
- Identify load characteristics
Metrics are the internal state indicators of the system, such as MySQL's innodb_metrics. Two methods are used here: factor analysis (FA) and k-means clustering.
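As an illustration of this step, the sketch below prunes redundant metrics with factor analysis followed by k-means, using scikit-learn. The metric matrix is a random placeholder rather than real innodb_metrics data, and the component and cluster counts are assumptions.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans

# Rows = observed workload runs, columns = internal metrics (placeholder data).
metric_matrix = np.random.rand(200, 300)

# 1) Factor analysis compresses hundreds of metrics into a few latent factors.
fa = FactorAnalysis(n_components=10).fit(metric_matrix)
metric_factors = fa.components_.T            # one low-dimensional vector per metric

# 2) k-means groups metrics that behave alike; keeping only the metric closest
#    to each cluster centre removes redundant metrics.
kmeans = KMeans(n_clusters=20, n_init=10).fit(metric_factors)
representative = {}
for idx, label in enumerate(kmeans.labels_):
    dist = np.linalg.norm(metric_factors[idx] - kmeans.cluster_centers_[label])
    if label not in representative or dist < representative[label][0]:
        representative[label] = (dist, idx)
kept_metric_indices = sorted(i for _, i in representative.values())
print("pruned metric set:", kept_metric_indices)
```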
- Identify the correlation between configuration parameters and performance
There are several hundred configuration parameters. Lasso linear regression is first used to rank the parameters by the strength of their relationship to performance; the parameters with the most significant impact on performance are then tuned first.
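A small example of the Lasso ranking step with scikit-learn follows. The configuration matrix, the performance vector, and the regularization strength are placeholder assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# One row per benchmark run, one column per configuration parameter (placeholders).
X = np.random.rand(500, 200)
y = np.random.rand(500)          # measured performance, e.g. TPS

X_scaled = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.01).fit(X_scaled, y)

# Parameters with larger absolute coefficients have a stronger linear
# relationship with performance, so they are tuned first.
ranking = np.argsort(-np.abs(lasso.coef_))
print("most influential parameter indices:", ranking[:10])
```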
- Automatic tuning
Match the target workload: the metric characteristics that a load exhibits under different configurations are used to find the most similar historical workload. Then, based on the matched workload, the optimal configuration is recommended. This step uses a Gaussian process and involves an exploration/exploitation trade-off.
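The sketch below shows one common way to trade off exploration and exploitation with a Gaussian process, namely an upper-confidence-bound score. The Matern kernel, the beta weight, and the candidate pool are illustrative assumptions, not the tool's actual settings.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Observations from the matched historical workload (placeholders).
observed_configs = np.random.rand(50, 10)
observed_perf = np.random.rand(50)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(observed_configs, observed_perf)

# Score candidate configurations: the mean term exploits what is already known,
# the standard-deviation term explores uncertain regions of the space.
candidates = np.random.rand(1000, 10)
mean, std = gp.predict(candidates, return_std=True)
beta = 2.0                                   # exploration weight (assumed)
recommended = candidates[np.argmax(mean + beta * std)]
print("recommended configuration:", recommended)
```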
The problem with this method is that the tuning process relies heavily on historical data: it can only match similar workloads, which places high demands on the training data. If no similar workload is matched, no good configuration will be found.
Practice 3: Deep Learning Methods
A deep neural network is used to recommend the parameters that ultimately need to be adjusted, in three steps (a rough sketch follows the list):
- Obtain internal metrics corresponding to Workload
- Learn how internal metrics change during parameter tuning
- Learn the parameters that ultimately need to be adjusted
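Purely as an illustration of this idea (the article does not specify the actual network architecture), the sketch below trains a small network that maps internal metrics to a recommended configuration vector; all data shapes and layer sizes are placeholder assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Internal metric vectors observed under various workloads, and configurations
# known to perform well for those runs (both placeholders).
internal_metrics = np.random.rand(1000, 60)
good_configs = np.random.rand(1000, 20)

# A small multi-layer network learns the mapping from metrics to parameters.
model = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500)
model.fit(internal_metrics, good_configs)

# For a new workload, feed in its internal metrics and read out recommended parameters.
recommended_params = model.predict(np.random.rand(1, 60))
print(recommended_params)
```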
This model depends heavily on training data and requires performance data for various loads under various configurations. The combinations of database load and configuration are too numerous to cover, and if no similar scenario is matched, the tuning result may not be ideal.
Practice 4: Reinforcement Learning
Reinforcement learning simulates the process of an agent interacting with an environment. The agent takes an action in response to the state it observes; at the same time, the environment accepts the action and changes its own state. This process generates rewards according to certain rules, i.e. an evaluation of the action.
Finally, after comparing these practices, we chose a reinforcement learning model to develop the database tuning tool CDBTune. It emphasizes the action of tuning the parameters and moves away from a data-centric approach.
Mapping reinforcement learning onto parameter tuning, we define the following:
- Rule: adjust parameters at a fixed interval and collect the resulting performance data
- Reward: a positive reward for a performance improvement, a negative reward for a decline
- Objective: reach a high expected reward with as few recommendations and as little time as possible
- State: the internal metrics of the system
We call a system's internal metrics internal metrics; external performance data such as TPS/QPS/latency are called external metrics. In the database tuning scenario, the Agent selects an action (an adjustment of one or more parameters) to apply to the database, and the immediate reward is calculated from the external metrics measured after the action is executed.
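A hedged sketch of such a reward rule is shown below; the weights and the exact combination of throughput and latency terms are assumptions for illustration, not CDBTune's actual formula.

```python
def reward(tps, latency, tps_prev, latency_prev, tps_init, latency_init):
    """Illustrative reward: positive when throughput rises and latency falls
    relative to the previous and the initial configuration, negative otherwise."""
    d_tps_prev = (tps - tps_prev) / tps_prev              # throughput change vs. the last step
    d_tps_init = (tps - tps_init) / tps_init              # throughput change vs. the default config
    d_lat_prev = (latency_prev - latency) / latency_prev  # latency improvement vs. the last step
    d_lat_init = (latency_init - latency) / latency_init  # latency improvement vs. the default config
    return 0.5 * (d_tps_prev + d_tps_init) + 0.5 * (d_lat_prev + d_lat_init)

# Example: throughput up and latency down against both baselines -> positive reward.
print(reward(tps=5200, latency=8.1, tps_prev=5000, latency_prev=9.0,
             tps_init=4000, latency_init=12.0))
```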
This maps reinforcement learning onto the parameter tuning scenario. The problem is that classical reinforcement learning requires us to build a table (a Q-table) recording the benefit of performing each action in each state, so that we know which action yields the greatest benefit. But the database's state space (performance metrics) and action space (configuration combinations) are so large that building such a table is an impossible task. This is where deep reinforcement learning comes in: we approximate the Q-table with a deep network, which is how CDBTune is implemented.
CDBTune Implementation
- S denotes the current database state (internal metrics), and S' denotes the next database state
- R is the immediate reward, W is the neural network's weights, and a is the action taken (applying configuration parameters)
- Q is the state-action value function
This model is divided into two parts.
- Database environment: on the left side of the figure. Parameters are applied in this environment, and its internal and external metrics are collected and fed back to the model on the right.
- Deep reinforcement learning network: on the right of the figure. The algorithm is similar to the Nature DQN published by DeepMind and uses two Q-networks.
In addition, Replay Memory is our memory pool, where historical data is recorded. As training goes on, new samples keep being added to the pool, and the deep network randomly draws samples from the memory pool for training.
When estimating the value of an action, we assume that the reward depends on the effect of every future step on the outcome, with the most recent returns carrying the greatest weight. This discounted future return is approximated through the target network as Q(s, a) ≈ r + γ · max_a' Q(s', a'; W'). For a sample (s, a) we observe a real immediate reward r, so we can compute the loss between this target and the estimate Q(s, a; W) produced by the network on the left, and adjust that network so the loss between the two sides becomes smaller and smaller. The network then gradually converges and gives better and better recommendations.
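To make the two-network update concrete, here is a short PyTorch sketch of a Nature-DQN-style training step under the assumptions above (discretised actions, a replay memory of (s, a, r, s') tuples). It illustrates the loss described in the text rather than reproducing CDBTune's code.

```python
import random
import numpy as np
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps the internal-metric state to a Q value for each discretised action."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions))

    def forward(self, state):
        return self.net(state)

def train_step(q_net, target_net, replay_memory, optimizer, batch_size=32, gamma=0.99):
    """One training step: sample past (s, a, r, s') transitions from the memory
    pool, build the target r + gamma * max_a' Q_target(s', a'; W'), and minimise
    the squared difference against Q(s, a; W) of the online network."""
    batch = random.sample(replay_memory, batch_size)
    s, a, r, s_next = map(np.array, zip(*batch))
    s = torch.as_tensor(s, dtype=torch.float32)
    a = torch.as_tensor(a, dtype=torch.int64)
    r = torch.as_tensor(r, dtype=torch.float32)
    s_next = torch.as_tensor(s_next, dtype=torch.float32)

    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)            # Q(s, a; W)
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values   # r + γ·max Q(s', a'; W')
    loss = nn.functional.mse_loss(q_sa, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```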
Data form and related strategies
Evaluation
Testing shows that CDBTune, through its self-learning tuning process and without any prior data collection, achieves a better tuning effect: the throughput and latency it obtains reach a fairly ideal level. This is also the advantage of the deep reinforcement learning approach over the other methods.
Conclusion:
Advantages of intelligent parameter adjustment based on DQN
- Simplifies the problem: no need to classify loads precisely
- The tuning action matches how parameters are actually adjusted in practice
- No need to gather a large number of samples in advance, which reduces the workload of early data collection
- Exploration and exploitation reduce the dependence on training data and the chance of falling into a local optimum
In practice, we also encountered some problems:
- Actions have to be executed on the real system, so training efficiency is low and the training cycle is long
- Discretizing continuous configuration values may reduce the accuracy of the recommended configurations and slow convergence
- Taking the maximum Q value over actions leads to overestimation of the Q value
In view of these problems, we are continuously optimizing and improving our model and its parameters. We believe CDBTune can achieve even better results in the future.