Meituan-Dianping's MGR-based CMDB High Availability Architecture
This article is based on Wang Zhipeng's talk at the DBAPlus community [Beijing Data Architecture and Data Optimization Technology Salon] on August 4, 2018.
Introduction
Wang Zhipeng
DBA at Meituan-Dianping
I previously worked as a DBA at JD Finance and am now at Meituan-Dianping, mainly responsible for the operation and maintenance of the financial business line databases and the infrastructure component databases.
MySQL Group Replication (MGR), released by Oracle in MySQL 5.7.17, has brought new life to MySQL high availability solutions. Its consistency guarantees, together with automatic failover and write redirection that do not rely on external components, have raised high expectations among DBAs.
I. Background
Using MHA as the failover tool, a CMDB to manage metadata, and middleware on top is a common high availability architecture in the MySQL ecosystem. In such a system the CMDB is itself one of the foundational components, so it cannot rely on that architecture for its own high availability; it needs a dedicated high availability design of its own.
Starting in the second half of 2017, Meituan-Dianping planned a full database upgrade to version 5.7, and that was exactly the opportunity for the idea of MGR-based CMDB high availability to be born.
II. About MGR
MGR is embedded in the MySQL instance as a plugin, which implements conflict detection and Paxos-based group communication.
Some readers may know that it is very similar to PXC, and the war of words between the two camps in the community is quite lively. I will not list the pros and cons of each here, but it is worth mentioning that in an MGR cluster, nodes are still synchronized through the binlog. This is DBA-friendly and means we can easily regain the familiar master-slave feel (still a MySQL).
III. Solutions
MGR offers two modes, multi-master and single-master. Considering some known problems of multi-master mode and our actual business scenarios, we chose single-master mode as the main scheme. That is, when the master node fails, the cluster automatically elects a new master node, and the application directs write traffic to the new master.
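As a minimal sketch of how an application might locate the current master of a single-master cluster on MySQL 5.7: the group_replication_primary_member status variable and the performance_schema views below are standard, while the host, credentials and helper name are placeholders of mine rather than the production setup.

```python
# Sketch: discover the current master (primary) of a single-master MGR
# cluster on MySQL 5.7. Connection details are placeholders.
import MySQLdb

FIND_PRIMARY_SQL = """
SELECT member_host, member_port
FROM performance_schema.replication_group_members
WHERE member_id = (
    SELECT variable_value
    FROM performance_schema.global_status
    WHERE variable_name = 'group_replication_primary_member'
)
"""

def find_primary(host, port=3306, user="monitor", passwd="***"):
    """Return (host, port) of the current MGR master, or None."""
    conn = MySQLdb.connect(host=host, port=port, user=user, passwd=passwd)
    try:
        cur = conn.cursor()
        cur.execute(FIND_PRIMARY_SQL)
        row = cur.fetchone()
        return row if row else None
    finally:
        conn.close()
```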
So what are the specific solutions that need to be considered?
- Restrictions on MGR;
- Relevant tests;
- Reasonable parameters.
1. Restrictions of MGR
- Only the InnoDB storage engine is supported;
- Every table must have a primary key;
- binlog_format supports only the ROW format;
- Savepoints are not supported (fixed in later versions);
- …
The official documentation gives some clear requirements and limitations. In view of these, we needed to audit the online databases to be migrated, assess feasibility, and standardize them to meet MGR's requirements, including converting MyISAM tables, fixing tables without primary keys, and removing savepoints from application logic (newer versions remove this limitation). A sketch of such an audit is shown below.
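This is only an illustration of the kind of check described above, assuming read access to information_schema; the actual audit tooling used internally may be entirely different, and the connection details are placeholders.

```python
# Sketch of a pre-migration audit: list MyISAM tables and tables without
# a primary key, both of which violate MGR's requirements.
import MySQLdb

MYISAM_SQL = """
SELECT table_schema, table_name
FROM information_schema.tables
WHERE engine = 'MyISAM'
  AND table_schema NOT IN ('mysql', 'information_schema',
                           'performance_schema', 'sys')
"""

NO_PK_SQL = """
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
  ON  c.table_schema    = t.table_schema
  AND c.table_name      = t.table_name
  AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL
  AND t.table_schema NOT IN ('mysql', 'information_schema',
                             'performance_schema', 'sys')
"""

def audit(host, user="dba", passwd="***"):
    conn = MySQLdb.connect(host=host, user=user, passwd=passwd)
    cur = conn.cursor()
    for label, sql in (("MyISAM table", MYISAM_SQL),
                       ("table without a primary key", NO_PK_SQL)):
        cur.execute(sql)
        for schema, table in cur.fetchall():
            print("%s: %s.%s" % (label, schema, table))
    conn.close()
```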
In addition, our production concerns, such as the impact of network jitter on MGR, the availability of backup and recovery tools, and online DDL support, also needed to be taken into account. We carried out systematic functional testing of all of this:
We also learned something new about MGR's behavior during testing. Scenarios that are explicitly not supported by MGR, such as the MyISAM engine and tables without primary keys, are handled in an "optimistic" way: you can still CREATE and ALTER such tables, but you cannot write data to them.
2. MGR parameters
In the tests mentioned above, we also observed different behaviors under different values of several important parameters. For example, group_replication_unreachable_majority_timeout controls how long a node waits after it moves from ONLINE to UNREACHABLE (usually caused by network jitter or a node exception); if it is still UNREACHABLE when the time is up, the node is put into the ERROR state. In other words, this parameter is the timeout of the UNREACHABLE state, in seconds.
(The states of an MGR node)
The default value of this parameter is 0, which means there is no timeout, i.e. the node waits indefinitely. In a real production environment, for example during a network exception, that is an unacceptable result.
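A small, hedged example of capping that wait: the value 30 is only an illustration, not a setting recommended in the talk, and the connection details are placeholders.

```python
# Illustrative only: cap the UNREACHABLE state at 30 seconds instead of the
# default of 0 (wait forever). 30 is an arbitrary example value.
import MySQLdb

conn = MySQLdb.connect(host="127.0.0.1", user="dba", passwd="***")
cur = conn.cursor()
cur.execute("SET GLOBAL group_replication_unreachable_majority_timeout = 30")
cur.execute("SHOW GLOBAL VARIABLES LIKE "
            "'group_replication_unreachable_majority_timeout'")
print(cur.fetchone())
conn.close()
```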
The following parameters are summarized from the official documentation and from problems encountered in practice, and can be used as a reference:
3. Final architecture
In the end we deployed a three-machine-room, three-node MGR cluster as the core of the high availability solution, and hung a master-slave cluster below it as disaster recovery for unrecoverable problems. After all, as a newcomer, MGR's reliability still has to be proven; I believe that in the near future we will be confident enough to drop the fallback to master-slave.
IV. Online process
Since 2018, Meituan-Dianping has brought a total of three systems online on this architecture: the process system, the report system, and the CMDB.
V. Typical problems
In the process of bringing these clusters online, we also accumulated some problems; a few typical ones are briefly reviewed here:
1. Large transactions
After the report system went online, a strange situation occurred in the cluster: at certain points in time, nodes would fall into the UNREACHABLE state, and in severe cases the cluster would switch the master directly. During these periods the network in the machine room was normal.
This problem puzzled us for some time at first. By comparing with the earlier process system, we found that the NIC traffic of the two clusters differed noticeably, and the report system showed obvious spikes:
Starting from this, we went back to the documentation and found a parameter that could be tuned, group_replication_compression_threshold, which causes transactions exceeding the specified size to be compressed before transmission. The figure below shows the comparison after the adjustment, which dropped from about 1.5 MB to 15 KB. In the actual scenario the abnormal states did occur less often, but they were not eradicated.
We did some testing along the way and finally got to the root of the problem: large transactions.
Note that MGR has its own "definition" of a large transaction: whether a transaction is large is related to its transmission time over the network, which explains why the number of abnormal node states dropped after we enabled compression. In the end we solved the problem completely by limiting transaction size and optimizing the business logic that produced large transactions. Two related parameters are shown below:
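The slide listing the parameters is not reproduced here; the two most relevant ones are presumably group_replication_transaction_size_limit and group_replication_compression_threshold (the latter was mentioned above). A hedged sketch of setting them, with example values rather than the production settings from the talk:

```python
# Example values only, not production settings.
# group_replication_transaction_size_limit rejects transactions whose
# replicated payload exceeds the limit; group_replication_compression_threshold
# compresses payloads larger than the threshold before transmission.
import MySQLdb

conn = MySQLdb.connect(host="127.0.0.1", user="dba", passwd="***")
cur = conn.cursor()
cur.execute("SET GLOBAL group_replication_transaction_size_limit = 20971520")  # 20 MB cap
cur.execute("SET GLOBAL group_replication_compression_threshold  = 131072")    # compress above 128 KB
conn.close()
```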
2. Hanging queries
The second problem occurred during a node offline drill. After the DBA started the drill, developers suddenly reported that requests were piling up on the Nginx machines and bringing them down. The following figure shows Nginx availability:
At the same time, the DBA's slow query monitoring showed a spike, and the timing roughly matched. The following figure shows the slow query monitoring:
We therefore analyzed the slow queries during this period and found a SQL statement that was very familiar to us: the statement used to query MGR for the current master node. Normally this SQL executes in milliseconds, but at the time of the fault it took about 10 seconds, which coincided with a STOP GROUP_REPLICATION operation.
Based on this, we ran tests to verify the guess: during the START and STOP GROUP_REPLICATION phases of an MGR node, all queries against the performance_schema.replication_group_members view on that node hang. This explains the outage caused by the backlog of Nginx requests. We then solved the problem by adding timeout logic to the code that queries this view.
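A rough sketch of what such program-side timeout logic could look like, assuming the MySQLdb/mysqlclient driver; the timeout values and error handling are illustrative, not the exact internal fix:

```python
# Sketch: fail fast if the "who is the master" query hangs while the node
# is in START/STOP GROUP_REPLICATION, instead of letting requests pile up.
import MySQLdb

MEMBERS_SQL = ("SELECT member_host, member_state "
               "FROM performance_schema.replication_group_members")

def query_members_with_timeout(host, user="monitor", passwd="***", timeout=2):
    """Return the member list, or None if the node does not answer in time."""
    conn = None
    try:
        conn = MySQLdb.connect(host=host, user=user, passwd=passwd,
                               connect_timeout=timeout,
                               read_timeout=timeout)  # read_timeout: mysqlclient option
        cur = conn.cursor()
        cur.execute(MEMBERS_SQL)
        return cur.fetchall()
    except MySQLdb.OperationalError:
        # Treat a hang/timeout as "node unavailable" so the caller can try
        # another node instead of blocking.
        return None
    finally:
        if conn is not None:
            conn.close()
```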
3. Machine room failure
The third problem occurred during a real machine room fault: the network bandwidth of the machine room hosting the CMDB master node was cut in half, causing the CMDB MGR cluster and one business master-slave cluster to fail over almost simultaneously. The MGR switchover took roughly 3 seconds and was basically transparent to the business, with only a small number of errors, but the business cluster's switchover then failed to write its new topology back into the CMDB.
The main reason was that the switchover logic still accessed the CMDB through DNS: after the switch, the DNS record had to be updated and propagated, and until the application resolved the new address, connections to the CMDB did not take effect.
This failure also prompted us to migrate all core CMDB access to a Smart Client developed in-house.
VI. Smart Client
The Smart Client is a Python connection API we developed internally. Built on top of MySQLdb, it provides automatic master discovery after an MGR switchover together with read-write splitting. It is very easy to pick up for anyone familiar with Python and MySQL.
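The client itself is an internal tool, so the interface below is purely hypothetical; it only illustrates the idea of master discovery plus read-write splitting on top of MySQLdb, and every class, method and credential here is made up for the sketch.

```python
# Purely hypothetical interface; the real Smart Client is internal and its
# API is not public. Idea: keep the MGR member list, send writes to the
# current master and reads to other ONLINE nodes, re-discovering the master
# after a switchover.
import MySQLdb

class SmartClient(object):
    def __init__(self, seed_hosts, user, passwd, db):
        self.seed_hosts = seed_hosts  # any known members of the group
        self.user, self.passwd, self.db = user, passwd, db

    def _members(self):
        """Ask any reachable seed node for the group membership."""
        for host in self.seed_hosts:
            try:
                conn = MySQLdb.connect(host=host, user=self.user,
                                       passwd=self.passwd, connect_timeout=2)
                cur = conn.cursor()
                cur.execute(
                    "SELECT member_host, member_state, "
                    "  member_id = (SELECT variable_value "
                    "               FROM performance_schema.global_status "
                    "               WHERE variable_name = "
                    "                 'group_replication_primary_member') AS is_primary "
                    "FROM performance_schema.replication_group_members")
                rows = cur.fetchall()
                conn.close()
                return rows
            except MySQLdb.OperationalError:
                continue
        raise RuntimeError("no MGR member reachable")

    def write_conn(self):
        """Connection to the current master for writes."""
        host = [h for h, state, is_primary in self._members()
                if state == "ONLINE" and is_primary][0]
        return MySQLdb.connect(host=host, user=self.user,
                               passwd=self.passwd, db=self.db)

    def read_conn(self):
        """Connection to a non-master ONLINE node for reads."""
        hosts = [h for h, state, is_primary in self._members()
                 if state == "ONLINE" and not is_primary]
        return MySQLdb.connect(host=hosts[0], user=self.user,
                               passwd=self.passwd, db=self.db)
```

A real client would additionally need connection pooling, retries and caching of the member list; the sketch omits all of that.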
VII. Daily operation and maintenance
As for the daily operation and maintenance of MGR, it is actually fairly easy in practice.
Initialization differs little from that of a master-slave cluster apart from a few parameters. For monitoring, in addition to the basic system and MySQL monitoring (which is compatible with MGR), we add monitoring of MGR node state: any node not in the ONLINE state is treated as abnormal. Some readers will ask: how is delay monitored? In theory MGR is an eventually consistent cluster and has no concept of delay, but we can approximate one by monitoring the number of transactions waiting in the queue.
The graph below shows the "latency" of one of our online clusters, with the ordinate being the number of queued transactions:
The GTID set difference between the master node and the other nodes can also be used as a reference; a rough sketch of both checks follows.
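Both approximations can be read straight from SQL; the sketch below assumes the MySQLdb driver and a monitoring account, and the function names are my own illustration.

```python
# Sketch of the two "delay" approximations mentioned above.
import MySQLdb

def queued_transactions(host, user="monitor", passwd="***"):
    """Transactions waiting in the local member's queue."""
    conn = MySQLdb.connect(host=host, user=user, passwd=passwd)
    cur = conn.cursor()
    cur.execute("SELECT COUNT_TRANSACTIONS_IN_QUEUE "
                "FROM performance_schema.replication_group_member_stats")
    (count,) = cur.fetchone()
    conn.close()
    return int(count)

def gtid_gap(master_host, replica_host, user="monitor", passwd="***"):
    """GTIDs executed on the master but not yet on the given node."""
    def executed(host):
        conn = MySQLdb.connect(host=host, user=user, passwd=passwd)
        cur = conn.cursor()
        cur.execute("SELECT @@GLOBAL.gtid_executed")
        (gtid_set,) = cur.fetchone()
        conn.close()
        return gtid_set

    master_set, replica_set = executed(master_host), executed(replica_host)
    conn = MySQLdb.connect(host=master_host, user=user, passwd=passwd)
    cur = conn.cursor()
    cur.execute("SELECT GTID_SUBTRACT(%s, %s)", (master_set, replica_set))
    (gap,) = cur.fetchone()
    conn.close()
    return gap  # empty string means no visible lag
```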
Final thoughts
Through a series of online drills, including some at peak times, and a period of observing it in production, MGR has indeed proven to be a stable and reliable high availability architecture. Although it is not very friendly to write-intensive scenarios, I believe it offers DBAs a new line of thinking for high availability solutions.
References
- Requirements of MGR: https://dev.mysql.com/doc/refman/5.7/en/group-replication-requirements.html
- Limitations of MGR: https://dev.mysql.com/doc/refman/5.7/en/group-replication-limitations.html
- Parameter configuration: https://dev.mysql.com/doc/refman/5.7/en/group-replication-configuring-instances.html
- Python toolkit (using MySQL Group Replication from Python): https://km.sankuai.com/page/52289606