Once you use MySQL replication, you are likely to encounter a master/slave switchover. This may be to upgrade the server iteratively, or to convert a standby library to the primary library when there is a problem with the primary library, or simply to reallocate capacity. But for whatever reason, you need to tell the other standby libraries about the new master.

Active/standby switchover is easier if it is planned (at least easier than in an emergency). Simply use the CHANGE MASTER TO command in the standby library and specify the appropriate value. And most of the values are optional, as long as you specify the configuration item interface to change.

The standby repository will discard the previous configuration and relay logs and start replication from the new primary repository. At the same time, the new parameters will be updated to the master.info file so that the standby database configuration information will not be lost even after a restart.

The hardest part of the whole process is getting the right binary log location on the new main library. So that the standby can replicate from the same logical location as the old master.

It is more difficult to promote the standby database as the master database. We divide the promotion of the standby database into two scenarios: planned switchover and unplanned switchover.

1 Switch within the plan

To promote the standby database to the master database, simply put, there are the following steps:

  1. Stop writing to the old master library.
  2. Have the standby library catch up with the primary (optional, to simplify subsequent steps).
  3. Configure a standby library as the new master library.
  4. Point the standby and write operations to the new master, and then enable write to the master.

But there are a lot of details in the above process. Some scenarios may depend on the topology of replication. To dig a little deeper, here are the steps most configurations require:

  1. Stops all write operations on the current primary library. If possible, it is best to close all client applications (except for replication connections).
  2. Stop all active writes on the primary library by running the FLUSH TABLE WITH READ LOCK command. You can also set the read_only option on the main library. This means that from this point on, no writes to the old master are allowed. Because once the new master is switched, writes to the old master will mean data loss. Note that even setting read_ONLY does not prevent existing transactions from continuing to commit. Therefore, you can kill all open transactions, effectively ending all writes.
  3. Select a standby as the new primary and make sure that it has fully kept up with the primary (for example, let it run all relay logs from the primary).
  4. Ensure that the new master database is consistent with the old master database.
  5. Execute STOP SLAVE on the new master library.
  6. Run CHANGE MASTER TO MASTER_HOST= “on the new MASTER and RESET SLAVE TO disconnect it from the old MASTER and discard the information recorded in master.info. (If the connection information is recorded in my.cnf, it will not work properly. Therefore, we recommend not to write replication connection information to the configuration file.
  7. Run the SHOW MASTER STATUS command to record binary log coordinates for the new MASTER library.
  8. Make sure other standby libraries have caught up with the old master.
  9. Close the old master library.
  10. Connect the client to the new master library.
  11. Execute the CHANGE MASTER TO statement on each standby library TO point TO the new primary library, using the binary log coordinates previously obtained.

2 Unplanned switchover

When the primary database crashes, a secondary database needs to be promoted as the primary database. This process is more troublesome. If there is only one standby library, use the standby library directly. But if you have more than one standby, you need to do some extra work.

In addition, there is the potential for lost replication events. There may be cases where changes that have occurred on the primary database have not yet been updated to any of its standby databases. It is even possible that a statement is rolled back on the primary but not on the standby, which may exceed the logical replication location of the primary. If you can recover the data from the master library at some point, you might be able to retrieve the missing statements and execute them manually.

In the following description, be sure to use the values of Master_Log_File and Read_Master_Log_Pos on the server.

2.1 Upgrade of the standby warehouse of the master and standby structure

  1. Determine which standby database has the latest data. Check the output of the SHOW_SLAVE_STATUS command on each standby library and select the one with the latest values of Master_Log_File and Read_Master_Log_Pos.
  2. Let all standby libraries execute all relay logs obtained from the old master before the crash.
  3. Execute STOP SLAVE on the new master library.
  4. Run CHANGE MASTER TO MASTER_HOST= “and then RESET SLAVE on the new MASTER TO disconnect from the old MASTER and discard the information recorded in master.info.
  5. Run the SHOW MASTER STATUS command to record binary log coordinates for the new MASTER library.
  6. Compare the values of Master_Log_File and Read_Master_Log_Pos on each standby and the new master.
  7. Connect the client to the new master library.
  8. Execute the CHANGE MASTER TO statement on each standby library TO point TO the new primary library, using the binary log coordinates previously obtained.

If you already have log_bin and log_slave_updates enabled on all standby repositories, you can restore all standby repositories to a consistent point in time, which would be difficult to do without these options enabled.

An important point in the above process is to determine the log location. Next, let’s look at how.

3 Determine the log location

If the location of the standby database is different from that of the new MASTER database, you need TO find the location of the last event executed by the standby database in the binary log of the new MASTER database and run the CHANGE MASTER TO command. You can use the mysqlbinlog tool to find the last query executed by the standby database, then find the same query on the primary database and perform a simple calculation.

For description purposes, assume that each logging event has an increment numeric ID. The new master gets the event numbered 100 when the old master crashes, and the other two standby libraries are R2 and R3. R2 has acquired event 99, R3 has acquired event 98.

If BOTH R2 and R3 were pointed to the same binary log location in the new master library, they would replicate from event 101, resulting in data out of sync. However, as long as the binary log of the new master library is opened via log_slave_updates, events 99 and 100 can be found in the binary log of the new master library, thus restoring the standby library to a consistent state.

The same event may have different offsets on different servers due to server restarts, different configurations, log rotation, or FLUSH LOGS commands. We can use mysqlbinlog to parse out the last event executed on each standby from the binary or relay log, and we can use mysqlBinlog to parse the binaries on the new primary to find the same query. Mysqlbinlog will print out the offset of the event. Use this value in the CHANGE MASTER TO command.

A faster method is to subtract the byte offset on the new primary library from the stopped standby library, which shows the difference in byte positions. This value is then subtracted from the current binary log location of the new main library to get the desired query location.

Let’s take a look at chestnuts.

Suppose s1 is the primary library for S2 and S3. Where S1 has crashed. SHOW SLAVE STATUS to get the values of Master_Log_File and Read_Master_Log_Pos. S2 has finished executing all binary logs on S1, but S3 has not. As shown in figure 1:

We can be sure that S2 has executed all binary logs on the main library, because the values of Master_log_File and Read_Master_Log_Pos match the last log location on S1. Therefore, we can promote S2 as the new primary and set S3 as s2’s standby.

What parameters should be assigned TO the CHANGE MASTER TO statement that needs TO be executed on S3? We need to do a little bit of math here.

S3 stops at offset 1493, which is 89 bytes smaller than the offset 1582 of the last statement s2 executed.

S2 is writing to the binary log at offset 8167, so in theory we should point S3 to the s2 log at offset 8167-89=8078.

Finally, at position 8078 in s2 log, determine whether this position is the correct log event.

If the verification is correct, you can use the following command to switch S3 to s2 standby:

CHANGE MASTER TO MASTER_HOST=”s2 host”, MASTER_LOG_FILE=”mysql-bin.000009″, MASTER_LOG_POS=8078;

If the server had already executed and logged an event a when it crashed. Since S2 only reads and executes up to 1582, it may lose event A. But if the disks of the old master library are not corrupted, the missing events can still be found through mysqlbinlog or from the log server’s binary log.

conclusion

  1. Standby database promotion distinguishes between planned and unplanned scenarios.
  2. When the standby database is promoted, finding the exact binary log location of the new primary database is key.