preface

MySQL Replication is a feature of MySQL. It can copy data from one Instance of MySQL Server to another Instance of MySQL Server. Although the replication process is asynchronous rather than real-time, the latency is minimal due to its efficient performance design. The Replication function of MySQL is widely used in practical application scenarios to ensure system data security and system scalability design.

This article will focus specifically on how to leverage MySQL’s Replication capabilities to improve system scalability.

What Replication means for scalable design

In the Internet application system, the most convenient extension may be the most basic Web application services. Because Web application services are mostly stateless, they rarely need to store much data, except for information such as sessions. So, for a basic Web application server, it’s easy to Scale Out by simply adding servers and copying applications.

However, database is not easy to achieve convenient Scale Out due to its special nature. Of course, various database vendors have been working hard to achieve the Scalability of their database software as well as the Scalability of regular application servers. Some Scalability features, such as logical replication, are also implemented on Web application servers.

The MySQL database has also made a great effort to do this, and MySQL Replication has been largely developed for this purpose. With MySQL Replication, you can easily replicate data from a database to many MySQL hosts, form a MySQL cluster, and then provide services through this MySQL cluster. In this way, the load on each MySQL host is greatly reduced, and the processing power of the entire MySQL cluster is easily increased.

Why is it possible to Scale Out with MySQL Replication?

This is mainly because MySQL Replication allows complete Replication of data from one MySQL database to multiple MySQL databases on multiple hosts at the same time, and normally the Replication latency is not very long. When we have the same data on each server, the application access is no longer limited to one database host, but can access the same data on any host in the entire MySQL cluster. Another important factor is that MySQL replication is very easy to implement and maintain. This is very important to implement a simple distributed database cluster, after all, the main work of a system implementation is maintenance, a maintenance complex system is certainly not a popular system.

Replication mechanism

To use a good system, it is very important to understand its implementation principle, only to understand its implementation principle, we can develop strengths and circumvent weaknesses, reasonable use, to build the most suitable for our own application environment of the system, to be able to better maintain him after the implementation of the system.

Let’s take a look at how MySQL Replication works.

(1) the Replication thread

Mysql Replication is an asynchronous Replication process from one Mysql instace (which we call Master) to another Mysql instance (which we call Slave). The replication between Master and Slave is performed by three threads, two (Sql thread and IO thread) on the Slave side, and one (IO thread) on the Master side.

To implement MySQL Replication, you must first enable the Binary Log (mysql-bin.xxxxxx) function on the Master side. Otherwise, the Binary Log cannot be implemented. The entire replication process is essentially a Slave retrieving the log from the Master and performing the operations recorded in the log in complete sequence on its own. You can open the Binary Log of MySQL by using the ‘-log-bin’ option during MySQL Server startup. Or add the log-bin parameter item to the mysqld parameter group (the parameter part after the [mysqld] identifier) in the my.cnf configuration file.

MySQL replication process is as follows:

  1. SlaveThe aboveIOThread connectionMasterAnd requests the log content from the specified location in the specified log file (or from the original log);
  2. MasterReceive fromSlaveIOThe thread is responsible for copying after the requestIOAccording to the request information, the thread reads the log information after the specified position of the specified log and returns toSlavetheIOThreads. In addition to the information contained in the log, the returned information includes the information in theMastertheBinary LogFile name as well as inBinary LogThe position of;
  3. SlaveIOAfter receiving the information, the thread writes the received log content toSlavetheRelay LogFile (mysql-relay-bin.xxxxxx) at the very end and will read intoMasterthebin-logThe file name and location of themaster-infoFile so that the next read can be clear at high speedMaster“I need to get from someonebin-logPlease send me the contents of the next log.
  4. SlaveSQLThread detectsRelay LogThe new content is parsed as soon as it is addedLogThe contents of the file become inMasterExecutable when the end actually executesQueryStatement, and execute these on its ownQuery. In this case, it’s actually atMasterThe client andSlaveThe end performs the sameQuery, so the data at both ends are exactly the same.

In fact, in older versions, MySQL replication was implemented on the Slave side not by the SQL thread and IO thread working together, but by a single thread doing all the work. But MySQL engineers quickly discovered that there were significant risks and performance issues associated with doing so, mainly as follows:

First of all, if the Binary Log is replicated on the Master side, parsed on the Master side, and then executed on itself ina serial process, performance will be significantly limited. The latency for Replication in this architecture is naturally longer.

Second, after the Slave gets the Binary Log from the Master, it needs to parse it back to the original Query executed by the Master, and then execute it on its own. In the process, the Master side has probably made a lot of changes and generated a lot of Binary Log information. If the Master storage system fails irretrievably at this stage, all changes made at this stage are lost forever and cannot be retrieved. This potential risk is particularly acute when the Slave side is under a lot of pressure, because if the Slave side is under a lot of pressure, it will take longer to parse the logs and apply the logs, and more data may be lost.

Therefore, in the later transformation, in order to minimize this risk and improve the replication performance, the new version of MySQL changed the replication on the Slave side to two threads, namely the SQL thread and IO thread mentioned above. This improvement was first proposed by Yahoo! “Jeremy Zawodny”, one of the engineers. Through such transformation, this not only solves the performance problem to a large extent, shorts the asynchronous delay time, but also reduces the potential data loss.

Of course, there is still the possibility of Slave data delay and data loss even if the two threads cooperate in the current way, after all, the replication is asynchronous. These problems exist as long as changes to data are not made in a transaction.

If you want to avoid these problems completely, you can only use MySQL Cluster to solve the problem. However, MySQL Cluster is still an in-memory database solution until I write this part of the content, that is, all data including indexes need to be loaded into memory, so the memory requirements are very large, for the general popular application implementation is not too large. Of course, in a previous conversation with MySQL CTO David, MySQL is currently working on improving its Cluster implementation. One of the big changes is to allow data not to be loaded into memory, but only indexes to be loaded into memory. I believe that MySQL Cluster will be much more popular and implementable after this transformation.

② Replication implementation level

MySQL replication can be based on either a Statement Level or a Row Level. This Level can be set in the MySQL configuration parameters. Different levels of replication will affect the Binary Log on the Master side.

  1. Row Level:Binary LogIs recorded as each row of data has been modified, and then inSlaveThe end then modifies the same data.

Advantages: In Row Level mode, the Binary Log does not record the context of the SQL statement executed, only which record was modified and to what extent. So the Row Level log content clearly records the details of each Row’s data modification, making it easy to understand. And there are no specific cases where stored procedures, or functions, or trigger calls and triggers cannot be copied correctly.

Disadvantages: At Row Level, all statements executed in the Binary Log are recorded as changes per Row. This can result ina large amount of Log content, such as the update statement: UPDATE group_message SET group_id = 1 where group_id = 2; UPDATE group_message SET group_id = 1 where group_id = 2 It’s what happens to each record that the statement updates, which is recorded as a number of events in which many records are updated. Naturally, the Binary Log is large. Especially when statements like ALTER TABLE are executed, the amount of logs generated is staggering. Because MySQL handles DDL change statements such as ALTER TABLE by rebuilding all data in the entire TABLE, that is, every record in the TABLE needs to be changed, so every record in the TABLE is logged.

  1. Statement Level: Each entry modifies the dataQueryWill be recordedMasterBinaryLogIn the.SlaveAt the time of copyingSQLThe thread will resolve to the originalMasterThe same as that executed by the endQueryTo do it again.

Advantages: The advantages of the Statement Level are that the disadvantages of the Row Level are solved. You do not need to record every data change in the Row Level, reducing the Binary Log amount, saving I/O costs and improving performance. Because he only needs to record the details of the statement being executed on the Master, and information about the context in which the statement was executed.

Disadvantages: Because he is a record of statement execution, so, in order to make these statements on the slave side can also be executed correctly, then he must also record each statement at the time of execution of some relevant information, namely the context information, to ensure that all statements on the slave side glass of execution can be implemented and in the master when the same results. In addition, due to the rapid development of Mysql, a lot of new functions are constantly added, so that the replication of Mysql has encountered no small challenge. The more complex the content involved in natural replication, the more likely bugs will appear. At the statement level, there are a number of cases that may cause mysql replication problems, mainly when certain functions or functions are used to modify data, such as: The sleep() function cannot be copied exactly in some versions, and the use of last_insert_id() in stored procedures may result in inconsistent ids on the slave and master, and so on. Since row level logs changes on a per-row basis, similar problems do not occur.

MySQL does not support Row Level replication until MySQL 5.1.5 supports Row Level replication. Since 5.0, MySQL replication has solved a number of problems that occurred in older versions of MySQL that did not replicate correctly. However, the appearance of stored procedure brings a new challenge to MySQL replication. In addition to Statement Level and Row Level, MySQL provides a third replication mode, Mixed Level, which is actually a combination of the two modes. In Mixed mode, MySQL will treat the log form of the record according to each specific Query Statement executed, that is, between Statement and Row. The Statment level in the new version is the same as before, and only records statements executed. Not all changes to the Mysql database will be recorded in the Row Level mode. For example, when a table structure changes, the statement mode will be recorded in the statement mode. If the Query statement is indeed an UPDATE or DELETE statement that modifies data, all row changes are logged.

This article is not over yet! The rest (Replication common architecture, Replication scaffolding implementation, and summary) will be updated in the next article!

Keep an eye out for my personal column, Mysql Performance Tuning, and keep an eye out for Mysql performance tuning articles!