MySQL high availability MHA principle and testing
Article source: Tao Teacher operation notes – wechat official account
1. Introduction of MHA
- Introduction to the MHA architecture: github.com/yoshinorim/…
- Github download: github.com/yoshinorim/…
MHA introduction:
Currently, Master High Availability (MHA) is a relatively mature solution for MySQL High Availability. It is developed by Youshimaton in Japan and is a set of excellent High Availability software for failover and Master/slave promotion in MySQL High Availability environment. During the MySQL failover process, MHA can automatically complete the database failover operation within 0 to 30 seconds. During the failover process, MHA can ensure data consistency to the maximum extent and achieve high availability in the true sense.
MHA advantages:
- Master Failover and Slave Promotion are very fast. 2. Automatic detection, multiple detection, switching process support to call other scripts interface.
- Master Crash does not cause data inconsistency and automatically complements data to maintain data consistency.
- There is no need to change any Settings for replication. It is easy to deploy and has no impact on the existing architecture.
- You don’t need to add many additional machines to deploy MHA, supporting centralized management of multiple instances.
- There is no performance impact.
- Supports any engine across storage engines
Scenarios not supported by MHA:
- Multi-level replication (M1->M2->Slave)
- MySQL5.0.45 or later
- Replication filteing rules(binlog-do-db,replicate-ignore-db,etc) must be consistent
- Load Data [Local] Infile with SBR Not supported.
2. Working principle of MHA
- Github.com/yoshinorim/…
- www.slideshare.net/matsunobu/a…
The working principle of MHA is summarized as follows: (1) Save binary log events from the master crash; (2) Identify the slave with the latest update; (3) Apply different relay logs to other slaves; (4) Apply binary log events saved from master; (5) Promote a slave to the new master; (6) Use other slaves to connect the new master for replication.
During automatic MHA failover, the MHA tries to save binary logs from the primary server that went down to maximize data loss, but this is not always possible. For example, if the primary server hardware fails or is inaccessible over SSH, the MHA cannot save binary logs and only fails over and loses the most recent data. Semi-synchronous replication with MySQL 5.5 greatly reduces the risk of data loss. MHA can be combined with semi-synchronous replication. If only one slave has received the latest binary log, THE MHA can apply the latest binary log to all other slave servers, thus ensuring data consistency among all nodes.
2.1 Main Library Troubleshooting Scenario
Master library failure:
- Scenario 1: All slave libraries have all binlog events
This is an ideal situation, but things can’t always be so lucky.
-
Scenario 2: The Master has transactions that are not synchronized to the slave library using semi-synchronous replication to avoid this risk.
-
Scenario 3: Some binlog events are missing from the library
2.2 Problems and difficulties
The difficulty of the master library Failover is that the latest slave library is still missing the master library binlog event.
2.3 Problem Solving
Goal achievement:
Save the binlog event
Find the slave of the latest master:
Identify missing events from each library
To restore
3. MHA component architecture
3.1 MHA Component Architecture
-
MHA Manager: management node. It is deployed on an independent server to manage multiple master/ Slave clusters or on one Slave node. Each master/ Slave cluster is called an application. The MHA Manager periodically probes the master node in the cluster. When the master node is found to be faulty, it automatically promotes the slave node with the latest data to the new master node, and then redirects all other slave nodes to the new master node.
-
MHA node: Data node that runs on each MariaDB server (manager/ Master /slave) and accelerates failover by monitoring scripts that parse and clean logs.
3.2 Software Package Functions
Manager:
Manager tool package: Masterha_manger Starts MHA masterha_check_ssh Check the SSH configuration of MHA masterha_check_repl Check the MySQL replication masterha_master_monitor Checking whether the master is down MasterHA_check_status Checking the current MHA running status masterha_master_switch Controlling failover (automatic or manual) Masterha_conf_host Adding or deleting configured server informationCopy the code
The Node tools:
Node tool (for all cluster nodes) : These tools are usually triggered by scripts from the MHA Manager, Save_binary_logs Save and copy the master's binary log apply_DIFF_relay_logs Identifies different relay log events and applies their different events to other purge_relay_logs Clear relay logs (without blocking SQL threads)Copy the code
3.3 MHA Processing Process
====== monitor node Monitor node ====== (1) Monitor all nodes, especially the master node. (2) Monitor the master failure (instance (SSH can), host (SSH cannot). (3) Monitor the master/slave status ====== failover ====== (3) Compare the GTID number of each node. (3) Data compensation 1: If SSH can be connected, the secondary node immediately saves the binary logs of its missing part. (4) Master selection: Compare the GTID number of each node and select a secondary node closest to the data of the master database to recover the missing logs. Stop slave reset slave all (5) Data compensation 2: If SSH cannot be connected, calculate the difference between the relaylogs of the two slave libraries and restore the slave library to the one with less data New Lord, open up new master-slave relationship used = = = = = = = = = = = (7) using the VIP mechanism to realize application transparent = = = = = = add function = = = = = = (8) automatically fix the main library (cluster) to develop... (9) the problem of secondary data compensation (binlog Server) (10) reminder function (send_report) (11) the problem of weightCopy the code
4. Set up the MHA environment
- Github.com/yoshinorim/…
- Github.com/yoshinorim/…
4.1 Environment Planning
Use three machines to make a simple MHA environment, MHA software version is MHA-0.56.
IP | Port | The DB role | MHA role | Software version |
---|---|---|---|---|
192.124.64.212 | 3307 | DB1 master | mha-node | Centos6, mha – 0.56 |
192.124.64.213 | 3307 | DB2 slave | mha-node | Centos6, mha – 0.56 |
192.124.64.214 | 3307 | DB3 slave | mha-node node-manager | Centos6, mha – 0.56 |
Installation suggestions:
1. Manager can be installed on any machine independently; 2. One manager can manage multiple mysql clusters. 3. It is recommended not to install Manager on the main library (in case of power failure or network disconnection of the main library); 4. Node packages must be installed on all databases. 5. Manager depends on Node
4.2 SSH Trust among Nodes
# Perform the following operations for each node
ssh-keygen -t rsa -P ' ' -f /root/.ssh/id_rsa
#Ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected] ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected] SSH - copy - id - I/root /. SSH/id_rsa pub [email protected]Copy the code
4.3 Primary/Secondary installation of MySQL
- MySQL GTID introduces and constructs the GTID slave library
1. The installation of MySQL
MySQL > install MySQLMysql_install -p 3307 -r m -b 2G -v 5.6.27Copy the code
2. Set up a master/slave relationship
DB1 as the master,DB2,DB3 as the slave library# authorization
grant replication client,replication slave on *.* to 'repl'@10. '%' IDENTIFIED BY 'repl123';
grant all privileges on *.* to mha@10. '%' identified by 'mha123';
#DB2,DB3 establish master-slave relationship
CHANGE MASTER TO
MASTER_HOST='192.124.64.212',
MASTER_PORT=3307,
MASTER_USER='repl',
MASTER_PASSWORD='repl123',
MASTER_AUTO_POSITION = 1;
#
start slave ;
show slave status\G
Copy the code
Description:
- Set global read_only=1. Do not write this parameter into the configuration file.
- Set global relay_log_purge=0. MHA relay_log_purge=0. Manually clear the relay log.
4.4 MHA Software Installation
Download and install the software: MHA nodes must be installed on all nodes (database master, slave, and MHA Manager nodes). Because MHA Manager also needs to depend on MHA Node.
# Software downloadMha website: https://code.google.com/archive/p/mysql-master-ha/ making download address: https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads all nodes installation Node software dependent package yum install perl - DBD - MySQL - y RPM - the ivh Mha4mysql - node - 0.56-0. El6. Noarch. RPMInstall mha-manager on DB3Yum install mha4mysql - manager - 0.56-0. El6. Noarch. RPMCopy the code
5. MHA configuration
- Github.com/yoshinorim/…
To ensure that the MHA works properly, you need to configure the MHA configuration file and set proper values for parameters, such as the server IP address, database user name and password, working directory, and log. If you install MHA source code, there will be two configuration file templates, app1.cnf and masterha_default.cnf, under the path $MHA_BASE/samples/conf/.
Create a directory:
Mkdir /etc/mha/script -p Create a log directory mkdir -p /var/log/mha/
Copy the code
Edit the MHA profile:
vim /etc/mha/mysql3307.cnf
[server default]
manager_log=/var/log/mha/mysql3307/manager
manager_workdir=/var/log/mha/mysql3307
master_binlog_dir=/data1/mysql_3307/
user=mha
password=mha123
ping_interval=2
repl_user=repl
repl_password=repl123
ssh_user=root
#master_ip_failover_script=/etc/mha/script/master_ip_failover
#shutdown_script= /etc/mha/script/power_manager
#report_script= /etc/mha/script/send_master_failover_mail
[server1]
hostname=192.124.64.212
port=3307
[server2]
hostname=192.124.64.213
port=3307
[server3]
hostname=192.124.64.214
port=3307
Copy the code
6. Check the status
1. Trust check
$masterha_check_ssh--conf=/etc/mha/mysql3307.cnf Sat Mar 21 23:14:28 2020 - [warning] Global configuration file /etc/masterha_default .cnf not found. Skipping. Sat Mar 21 23:14:28 2020 - [info] Reading application default configuration from /etc/mha/mysql3307.cnf.. Sat Mar 21 23:14:28 2020 - [info] Reading server configuration from /etc/mha/mysql3307.cnf.. Sat Mar 21 23:14:28 2020 - [info] Starting SSH connection tests.. . Sat Mar 21 23:14:29 2020 - [info] All SSH connection tests passed successfully.Copy the code
2. Check the replication
masterha_check_repl --conf=/etc/mha/mysql3307.cnf
Sat Mar 21 23:17:00 2020 - [warning] Global configuration file /etc/masterha_default .cnf not found. Skipping.
Sat Mar 21 23:17:00 2020 - [info] Reading application default configuration from /etc/mha/mysql3307.cnf..
Sat Mar 21 23:17:00 2020 - [info] Reading server configuration from /etc/mha/mysql3307.cnf..
Sat Mar 21 23:17:00 2020 - [info] MHA::MasterMonitor version 0.56.
Sat Mar 21 23:17:01 2020 - [info] GTID failover mode = 1
Sat Mar 21 23:17:01 2020 - [info] Dead Servers:
Sat Mar 21 23:17:01 2020 - [info] Alive Servers:
Sat Mar 21 23:17:01 2020 - [info] 192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:17:01 2020 - [info] 192.124.64.213(192.124.64.213:3307)
Sat Mar 21 23:17:01 2020 - [info] 192.124.64.214(192.124.64.214:3307)
Sat Mar 21 23:17:01 2020 - [info] Alive Slaves:
Sat Mar 21 23:17:01 2020 - [info] 192.124.64.213(192.124.64.213:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Mar 21 23:17:01 2020 - [info] GTID ON
Sat Mar 21 23:17:01 2020 - [info] Replicating from 192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:17:01 2020 - [info] 192.124.64.214(192.124.64.214:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Mar 21 23:17:01 2020 - [info] GTID ON
Sat Mar 21 23:17:01 2020 - [info] Replicating from 192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:17:01 2020 - [info] Current Alive Master: 192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:17:01 2020 - [info] Checking slave configurations..
Sat Mar 21 23:17:01 2020 - [info] read_only=1 is not setOn slave 192.124.64.213(192.124.64.213:3307). Sat Mar 21 23:17:01 2020 - [info] read_only=1 is notsetOn slave 192.124.64.214(192.124.64.214:3307). Sat Mar 21 23:17:01 2020 - [info] Checking Replication filtering settings.. Sat Mar 21 23:17:01 2020 - [info] binlog_do_db= , binlog_ignore_db= Sat Mar 21 23:17:01 2020 - [info] Replication filtering check ok. Sat Mar 21 23:17:01 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking. Sat Mar 21 23:17:01 2020 - [info] Checking SSH publickey authentication settings on the current master.. Warning: Permanently added'192.124.64.212'(RSA) to the list of known hosts. Sat Mar 21 23:17:01 2020 - [info] HealthCheck: SSH to 192.124.64.212 is reachable. Sat Mar 21 23:17:01 2020 - [info] 192.124.64.212(192.124.64.212:3307) (current Master) +--192.124.64.213(192.124.64.213:3307) +--192.124.64.214(192.124.64.214:3307) Sat Mar 21 23:17:01 2020 - [info] Checking Replication Health on 192.124.64.213.. Sat Mar 21 23:17:01 2020 - [info] OK. Sat Mar 21 23:17:01 2020 - [info] Checking Replication Health on 192.124.64.214.. Sat Mar 21 23:17:01 2020 - [info] ok. Sat Mar 21 23:17:01 2020 - [warning] master_ip_failover_script is not defined. Sat Mar 21 23:17:01 2020 - [warning] shutdown_script is not defined. Sat Mar 21 23:17:01 2020 - [info] Gotexit code 0 (Not master dead).
MySQL Replication Health is OK.
Copy the code
7. Start the MHA
Start the MHA:
# Check MHA Manager monitoring status
# masterha_check_status --conf=/etc/mha/mysql3307.cnf
mysql3307 is stopped(2:NOT_RUNNING).
Start MHA monitoring --remove_dead_master_conf --ignore_last_failover
$nohup masterha_manager --conf=/etc/mha/mysql3307.cnf --remove_dead_master_conf --ignore_last_failover >> /var/log/mha/mysql3307/mha-3307.log 2>&1 &
# Check status
$masterha_check_status--conf=/etc/mha/mysql3307. CNF mysql3307 (pid:10265) is running(0:PING_OK), master:192.124.64.212Copy the code
Stop MHA monitoring:
Stop MHA monitoring masterha_stop --conf= /etc/mha/mysql337.cnfCopy the code
8. MHA test
8.1 Checking the Current Status
$masterha_check_status--conf=/etc/mha/mysql3307. CNF mysql3307 (pid:10265) is running(0:PING_OK), master:192.124.64.212$mysql- h 192.124.64.214 - P 3307-e "set global relay_log_purge=0"
$mysql- h 192.124.64.214 - P 3307-e "show global variables like '%relay_log_purge%'"
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| relay_log_purge | OFF |
+-----------------+-------+
$mysql- h 192.124.64.214 - P 3307-e "show slave status\G" |egrep 'Master_Host|Master_Port|Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master' -i
Master_Host: 192.124.64.212
Master_Port: 3307
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Copy the code
8.2 Test automatic switchover
1. Test the failure of the master library and automatically switch over.
#kill master DB1
$kill mysql_pid;
Copy the code
2. View detailed logs:
Observe the Manager log. If the message “Successfully” is displayed at the end, the switchover succeeds. tail -f /var/log/mha/mysql3307/manager.log
cat /var/log/mha/mysql3307/manager Sat Mar 21 23:24:59 2020 - [info] mha ::MasterMonitor version 0.56. Sat Mar 21 23:25:01 2020 - [info] GTID failover mode = 1 Sat Mar 21 23:25:01 2020 - [info] Dead Servers: Sat Mar 21 23:25:01 2020 - [info] Alive Servers: Sat Mar 21 23:25:01 2020 - [info] 192.124.64.212(192.124.64.212:3307) Sat Mar 21 23:25:01 2020 - [info] 192.124.64.213(192.124.64.213:3307) Sat Mar 21 23:25:01 2020 - [info] 192.124.64.214(192.124.64.214:3307) Sat Mar 21 23:25:01 2020 - [info] Alive Slaves: Sat Mar 21 23:25:01 2020 - [info] 192.124.64.213(192.124.64.213:3307) Version=5.6.27-log (Oldest major Version between slaves)log-bin:enabled Sat Mar 21 23:25:01 2020 - [info] GTID ON ... ----- Failover Report ----- mysql3307: MySQL Master Failover 192.124.64.212(192.124.64.212:3307) to 192.124.64.213(192.124.64.213:3307) Succeeded Master 192.124.64.212 (192.124.64.212:3307) is down! Check MHA Manager logs at LeDB-VM-124064214:/var/log/mha/mysql3307/manager fordetails. Started automated(non-interactive) failover. Invalidated master IP address on 192.124.64.212(192.124.64.212:3307) Selected 192.124.64.213(192.124.64.213:3307) as a new master. 192.124.64.213(192.124.64.213:3307): OK: Applying all logs Succeeded. 192.124.64.213(192.124.64.213:3307): OK: Activated master IP address. 192.124.64.214(192.124.64.214:3307): OK: Slave started, replicating from 192.124.64.213(192.124.64.213:3307) Master failover to 192.124.64.213(192.124.64.213:3307) completed successfully.Copy the code
3. Check the results
DB2 became the master library, and DB3 became DB2’s slave library.
masterha_check_status --conf=/etc/mha/mysql3307.cnf
mysql3307 is stopped(2:NOT_RUNNING).
$mysql- h 192.124.64.213 - P 3307-e "show slave status\G"
$mysql- h 192.124.64.214 - P 3307-e "show slave status\G" |egrep 'Master_Host|Master_Port|Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master' -i
Master_Host: 192.124.64.213
Master_Port: 3307
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
Copy the code
8.3 Manual Switchover
The MHA Manager must not be running. Manual failover indicates that the automatic MHA switchover is not enabled on the service. When the primary server fails, the MHA is manually invoked to perform failover. The command is as follows: If the MHA Manager detects a dead server, an error is reported and failover is terminated.
DB2 is master. Disable the MHA and kill the primary library 192.124.64.213:3307.
# Manual switch
$masterha_master_switch- master_state = dead - the conf. = / etc/mha/mysql3307 CNF - dead_master_host = 192.124.64.213 dead_master_port = 3307 - new_master_host = 192.124.64.212 - new_master_port = 3307 - ignore_last_failoverCopy the code
The output is interactive and asks you if you want to switch: It is recommended to read the output to understand the manual switching process.
Sat Mar 21 23:55:43 2020 - [info] MHA::MasterFailover Version 0.56. Sat Mar 21 23:55:43 2020 - [info] Starting master failover. Sat Mar 21 23:55:43 2020 - [info] Sat Mar 21 23:55:43 2020 - [info] * Phase 1: Configuration Check Phase.. . Sat Mar 21 23:55:46 2020 - [info] ** Phase 1: Configuration Check Phase completed. Sat Mar 21 23:55:46 2020 - [info] Sat Mar 21 23:55:46 2020 - [info] * Phase 2: Dead Master Shutdown Phase.. ----- Failover Report ----- mysql3307: MySQL Master Failover 192.124.64.213(192.124.64.213:3307) to 192.124.64.212(192.124.64.212:3307) Succeeded Master 192.124.64.213 (192.124.64.213:3307) is down! Check MHA Manager logs at LeDB-VM-124064214forInvalidated master IP address on 192.124.64.213(192.124.64.213:3307) Selected 192.124.64.212(192.124.64.212:3307) as a new master. 192.124.64.212(192.124.64.212:3307): OK: Selected 192.124.64.212(192.124.64.212:3307): OK: Activated master IP address. 192.124.64.214(192.124.64.214:3307): OK: Slave started, replicating from 192.124.64.212(192.124.64.212:3307) Resetting slave info Succeeded. Master failover to 192.124.64.212(192.124.64.212:3307) completed successfully.Copy the code
Reference:
- The official document: MHA architecture https://github.com/yoshinorim/mha4mysql-manager/wiki/
- MHA author PPT https://www.slideshare.net/matsunobu/automated-master-failover
- MySQL MHA high availability architecture https://blog.csdn.net/qq_35209838/article/details/86497864
- Mysql-mha high availability technology
- MySQL GTID introduces and constructs the GTID slave library