Cabbage Java self study room covers core knowledge

This article belongs to the MyCat sharding (database and table splitting) series: basic concepts, read/write separation, cluster and transactions, and route distribution.

1. Read/write separation

1.1. Several schemes of MySQL master-slave replication

Database read-write separation is an essential function for large systems or Internet applications with high traffic.

From the database point of view, for most applications the basic driver for moving from a centralized to a distributed architecture is not a storage bottleneck but a computation bottleneck, that is, a SQL query bottleneck. Under normal circumstances an INSERT completes within tens of milliseconds, while many SELECT statements take seconds or even minutes to return results, and complex SQL can consume as much server CPU as a busy loop. On a system without read/write separation, a few complex queries during peak hours can max out the database server's CPU, drag the whole system down, and in severe cases crash the database. Therefore, from the perspective of protecting the database, a single-node deployment with no master-slave replication should be avoided.

For MySQL, the standard read/write separation setup is a master-slave mode: one write node (Master) followed by multiple read nodes. The number of read nodes depends on the load of the system, usually one to three, as shown in the following figure:

MySQL supports more master-slave replication topologies, as shown below, but bidirectional master-slave synchronization and circular topologies are generally not used:

MySQL master-slave replication works as follows:

  • The first step happens on the master: before each transaction completes its data update, the master writes the data-change events to its binary log. MySQL writes binary log events in the order in which transactions are committed, not in the order in which individual statements are executed. After the events are written to the binary log, the master tells the storage engine it can complete the commit.

  • Next, the standby copies the master's binary log into its local relay log. The standby first starts a worker thread, called the I/O thread, which opens a normal client connection to the master; the master then starts a special binlog dump thread (this thread has no corresponding SQL command) that reads events from the master's binary log. It does not poll for events: if it catches up with the master, it sleeps until the master signals that new events have arrived. The standby's I/O thread writes the events it receives into the relay log.

  • The last step is performed by the standby's SQL thread, which reads events from the relay log and replays them on the standby, thereby bringing the standby's data up to date. When the SQL thread has caught up with the I/O thread, the relay log is usually already in the system cache, so the overhead of reading it is very low. Events executed by the SQL thread can also be written to the standby's own binary log, which is useful for scenarios covered later.

This replication architecture decouples fetching events from replaying them, allowing the two processes to proceed asynchronously: the I/O thread works independently of the SQL thread. But the architecture also limits the replication process. Most importantly, queries that run concurrently on the primary can only be serialized on the standby, because a single SQL thread replays events from the relay log. As we will see, this is a performance bottleneck for many workloads; while there are some solutions, most users are still constrained by the single thread.
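The serialization bottleneck can be illustrated with a small model (all names here are invented for illustration; this is not MySQL or Mycat code): transactions that overlap on the primary are replayed one after another by the standby's single SQL thread, so the standby's replay time approaches the sum of transaction durations rather than the duration of the slowest one.

```java
import java.util.List;

public class ReplayModel {

    // On the primary, concurrent transactions overlap, so the wall time is
    // roughly the duration of the slowest transaction.
    public static long primaryWallTimeMs(List<Long> txDurationsMs) {
        return txDurationsMs.stream().mapToLong(Long::longValue).max().orElse(0L);
    }

    // On the standby, a single SQL thread replays relay-log events serially,
    // so the wall time is the sum of all transaction durations.
    public static long standbyReplayTimeMs(List<Long> txDurationsMs) {
        return txDurationsMs.stream().mapToLong(Long::longValue).sum();
    }
}
```

With three 40 ms transactions, the primary finishes in roughly 40 ms of wall time while the standby needs roughly 120 ms of replay time; the gap is the replication lag that multi-threaded replication aims to remove.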

MySQL 5.6 provides a GTID-based multi-threaded replication solution, in which each database (schema) is replicated by its own SQL thread. This significantly reduces MySQL master/slave synchronization delay and, combined with Mycat sharding, can keep the synchronization delay of a very large table to a minimum.

In addition, GTID eliminates the need to locate binlog positions by file name and physical offset, and enables better automatic DR (disaster recovery) switchover. This is good news for operations staff: in the traditional approach you have to find the binlog file and POS point, whereas with MySQL 5.6 you only need to know the master's IP address, port, account, and password; synchronization is automatic, and MySQL uses its internal GTID mechanism to find the synchronization point.

Even with concurrent replication, the data of primary and secondary databases cannot be synchronized instantaneously. Hence there are enhanced solutions such as Galera for MySQL, Percona XtraDB Cluster, and MariaDB Galera Cluster. These are multi-master synchronous replication setups that can read and write on any node, manage membership automatically, remove failed nodes and admit new nodes automatically, and genuinely provide row-level concurrent replication, among other powerful capabilities.

A Galera for MySQL cluster trades data-write speed for maximum concurrent data-access capacity, similar to the global table in Mycat, and guarantees that several valid copies of the data exist at the same time, giving very high reliability. To some extent it can therefore replace Oracle in certain key scenarios. Among open-source middleware, currently only Mycat fully supports the Galera for MySQL cluster mode.

1.2. MySQL replication formats

MySQL master-slave replication is not perfect, and several of its problems have existed for a long time. MySQL supports three replication formats:

  • Statement-based replication (SBR);
  • Row-based replication (RBR);
  • Mixed-mode replication (MBR).

Statement-based replication is the oldest and has long been the default; the other two formats were introduced in MySQL 5.1.

Advantages of RBR:

  • Any situation can be replicated, which is the safest way to replicate;
  • It is the same approach used by the replication techniques of most other database systems;
  • In most cases, replication is much faster for tables on the slave server that have primary keys.

Disadvantages of RBR:

  • Binlog is much bigger;
  • Complex rollbacks contain a large amount of data in the binlog;
  • When an UPDATE statement is executed on the primary, every changed row is written to the binlog, whereas SBR writes the statement only once. This can cause heavy concurrent writes to the binlog.
  • You cannot see from the binlog which SQL statements were replicated.

Advantages of SBR:

  • Long history and mature technology;
  • The binlog file is smaller;
  • The binlog records all database changes and can be used to audit the database;
  • The binlog can be used for point-in-time restore, not just replication;
  • The secondary server version can be higher than that of the primary server.

Disadvantages of SBR:

  • Not all UPDATE statements can be replicated, especially those containing nondeterministic operations;
  • Updates that require a full table scan (a WHERE clause with no index) must hold more row-level locks than under RBR;
  • For complex statements, resource consumption on the slave server is more severe, whereas in RBR mode only the changed records are touched;
  • Data tables must be almost identical to those on the master server, otherwise replication errors may occur;
  • Executing complex statements consumes more resources when errors occur.

The choice of replication format affects replication efficiency, server load, and even data consistency. At present there is no good objective way to evaluate which format suits a given system better; Mycat hopes to give more scientific suggestions through an intelligent tuning module in the future.

You can run the show slave status command to obtain the state of master/slave synchronization. Besides showing whether synchronization is currently working, its most important indicator is Seconds_Behind_Master, which reports the current master-slave data synchronization delay in seconds. From a DBA's point of view this metric is not simply "how many seconds behind", and interested readers can study its exact semantics, but for applications it is fine to treat it as the synchronization lag. Note that when master-slave synchronization stops and is then restarted, this value may be tens of seconds or more, depending on how long synchronization was stopped; during that window a lot of data is unsynchronized. When the value is close to zero, the master-slave delay is minimal. We can collect this indicator over time, analyze the delay curve of our database, and derive a reasonable threshold from the curve: when the master-slave delay is below the threshold, we consider the slave to be in sync and data can safely be read from it. Mycat will support this optimization in the future, allowing applications to read the expected data from the slave more reliably.
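The threshold idea can be sketched in a few lines (a hypothetical helper, not Mycat code). Note that Seconds_Behind_Master is NULL when replication is not running; that case is modeled here as a negative value:

```java
public class SlaveDelayCheck {

    // Hypothetical helper: decide whether a slave is safe to read from,
    // given its reported lag and a threshold derived from the delay curve.
    public static boolean safeToRead(long secondsBehindMaster, long thresholdSeconds) {
        if (secondsBehindMaster < 0) {
            // Stands in for Seconds_Behind_Master == NULL:
            // replication is not running, never read possibly stale data.
            return false;
        }
        return secondsBehindMaster <= thresholdSeconds;
    }
}
```

For example, with a threshold of 100 seconds, a lag of 30 is acceptable while a lag of 120 (or a broken replication thread) disqualifies the slave.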

1.3. Read and write separation supported by Mycat

  1. Mycat is not responsible for any data synchronization problems.
  2. Configure read/write separation in Mycat:

<dataHost name="localhost1" maxCon="1000" minCon="10" balance="1" writeType="0"
          dbType="mysql" dbDriver="native">
    <heartbeat>select user()</heartbeat>
    <!-- can have multi write hosts -->
    <writeHost host="hostM1" url="localhost:3306" user="root" password="123456">
        <!-- can have multi read hosts -->
        <readHost host="hostS1" url="localhost2:3306" user="root" password="123456" weight="1" />
    </writeHost>
</dataHost>

Or:

<dataHost name="localhost1" maxCon="1000" minCon="10" balance="1" writeType="0"
          dbType="mysql" dbDriver="native">
    <heartbeat>select user()</heartbeat>
    <!-- can have multi write hosts -->
    <writeHost host="hostM1" url="localhost:3306" user="root" password="123456" />
    <writeHost host="hostS1" url="localhost:3307" user="root" password="123456" />
</dataHost>

In the first mode, if the write host goes down the read host under it also becomes unavailable; in the second mode, service can continue on the remaining host. All statements inside a transaction are routed to the write node, so do not wrap read operations in transactions. If the read delay is large, use the read/write separation mode that switches based on master/slave delay, or force the statement to the write node.
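The routing rule just described can be sketched as follows (a simplified illustration with invented names, not Mycat's actual router): statements inside a transaction and all writes go to the write node, and only plain reads may be balanced to a read node.

```java
public class ReadWriteRouter {

    public enum Target { WRITE, READ }

    // Simplified rule from the text: anything inside a transaction, and any
    // non-read statement, goes to the write node; plain reads may go to a
    // read node when balancing is enabled.
    public static Target route(String sql, boolean inTransaction, boolean balanceEnabled) {
        String head = sql.trim().toLowerCase();
        boolean isRead = head.startsWith("select") || head.startsWith("show");
        if (inTransaction || !isRead || !balanceEnabled) {
            return Target.WRITE;
        }
        return Target.READ;
    }
}
```

This also makes the advice above concrete: a SELECT that runs inside a transaction loses the benefit of read/write separation, because the transaction pins it to the write node.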

Forcing routing to the master or a slave

A query SQL statement can use the /*balance*/ annotation to decide whether it is routed to a read node or the write node (mandatory read/write routing was added in 1.6).

Force routing to a slave:

/*!mycat:db_type=slave*/ select * from travelrecord;
/*#mycat:db_type=slave*/ select * from travelrecord;

Force routing to the master:

/*#mycat:db_type=master*/ select * from travelrecord;
/*!mycat:db_type=master*/ select * from travelrecord;

Switch according to master/slave delay

<dataHost name="localhost1" maxCon="1000" minCon="10" balance="0" writeType="0"
          dbType="mysql" dbDriver="native" switchType="2" slaveThreshold="100">
    <heartbeat>show slave status</heartbeat>
    <!-- can have multi write hosts -->
    <writeHost host="hostM1" url="localhost:3306" user="root" password="123456" />
    <writeHost host="hostS1" url="localhost:3316" user="root" password="123456" />
</dataHost>

1.4. Read/write separation bound to the MySQL primary/secondary replication status is supported, which makes reads safer and more reliable. The configuration is as follows:

Set the MyCAT heartbeat check statement to show status like 'wsrep%' and define a new dataHost attribute, switchType="3". This enables the read/write separation and switchover mechanism bound to the replication status of a MySQL Galera cluster: Mycat's heartbeat mechanism detects the cluster replication delay and stops routing load to a node if the delay is too large or the cluster has node problems.

<dataHost name="localhost1" maxCon="1000" minCon="10" balance="0" writeType="0"
          dbType="mysql" dbDriver="native" switchType="3">
    <heartbeat>show status like 'wsrep%'</heartbeat>
    <writeHost host="hostM1" url="localhost:3306" user="root" password="123456" />
    <writeHost host="hostS1" url="localhost:3316" user="root" password="123456" />
</dataHost>

Alternatively, set the MyCAT heartbeat check statement to show slave status and define two new dataHost attributes: switchType="2" and slaveThreshold="100". This enables the read/write separation and switchover mechanism bound to the MySQL replication status. Mycat's heartbeat checks the Seconds_Behind_Master, Slave_IO_Running, and Slave_SQL_Running fields of show slave status to determine the current synchronization state and the replication delay. When Seconds_Behind_Master > slaveThreshold, the read/write separation filter excludes the slave machine, preventing it from serving long-stale data. When the master node goes down, the switchover logic checks whether Seconds_Behind_Master on the slave is 0; if it is, master and slave are in sync and the switchover is safe.
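The switchType="2" filtering rule can be sketched as follows (the field names mirror the output of show slave status, but the classes are invented for illustration, not Mycat's implementation):

```java
public class SlaveStatusFilter {

    // Snapshot of the relevant fields from `show slave status`.
    public static class SlaveStatus {
        public final boolean ioRunning;         // Slave_IO_Running == "Yes"
        public final boolean sqlRunning;        // Slave_SQL_Running == "Yes"
        public final long secondsBehindMaster;  // -1 stands in for NULL

        public SlaveStatus(boolean ioRunning, boolean sqlRunning, long secondsBehindMaster) {
            this.ioRunning = ioRunning;
            this.sqlRunning = sqlRunning;
            this.secondsBehindMaster = secondsBehindMaster;
        }
    }

    // A slave may serve reads only when both replication threads run and
    // the delay stays within slaveThreshold.
    public static boolean readable(SlaveStatus s, long slaveThreshold) {
        return s.ioRunning && s.sqlRunning
                && s.secondsBehindMaster >= 0
                && s.secondsBehindMaster <= slaveThreshold;
    }

    // Switching writes to this slave after the master dies is safe only
    // when it has fully caught up (Seconds_Behind_Master == 0).
    public static boolean safeToPromote(SlaveStatus s) {
        return s.ioRunning && s.sqlRunning && s.secondsBehindMaster == 0;
    }
}
```

Note the asymmetry: a slave lagging 10 seconds can still serve reads under a 100-second threshold, but it is not yet a safe switchover target for writes.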

2. HA and clustering

2.1. Several schemes of MySQL high availability

First, let’s look at several options for MySQL high availability:

Primary/secondary replication + Read/write separation

For applications with less strict real-time data requirements, it is enough to add slaves on inexpensive PC servers and spread the read pressure across multiple slave machines, thereby removing the read-performance bottleneck of a single database server. After all, in most database applications the read pressure is far greater than the write pressure. This largely solves the database bottleneck of many small and medium-sized websites, and even some large websites use similar solutions.

MySQL Cluster

A MySQL Cluster consists of a group of computers, each running multiple processes, including MySQL servers, NDB Cluster data nodes, management servers, and (possibly) specialized data-access programs. NDB is an in-memory storage engine with high availability and good data consistency. A MySQL Cluster requires at least four physical hosts, two of which are management nodes, to achieve full redundancy and fault tolerance. MySQL Cluster is not widely used. Besides its architecture and the limited range of suitable workloads, another important reason is that installation, configuration, and management are relatively complicated: dozens of steps in total, requiring a DBA several hours to complete. Restarting a MySQL Cluster database by hand takes 46 manual commands and 2.5 hours of DBA time, whereas MySQL Cluster Manager needs only one command; however, MySQL Cluster Manager is only available as part of the commercial MySQL Cluster Carrier Grade Edition (CGE) and must be purchased. If the application's SQL mostly accesses data by primary key and contains some JOIN operations, rather than full-table scans and JOINs that return tens of thousands of rows, it is suitable for Cluster; otherwise it is not. As a result, NDB is unsuitable for most business scenarios, and it also has security issues.

HeartBeat + double active replication

Heartbeat is a component of the Linux-HA project. At its core it consists of two parts: heartbeat monitoring and resource takeover. If no heartbeat packet is received from the peer within a specified period, the peer is considered failed, and the resource-takeover module starts to take over the resources or services that were running on the peer host.

HeartBeat + DRBD + MySQL

DRBD is open-source cluster software that mirrors block devices over the network, automatically keeping disks on two different servers in sync. Compared with binlog-based synchronization, DRBD synchronizes at the lower disk level.

Lvs + Keepalived + double master replication

LVS is a virtual server cluster system that implements simple load balancing on Linux. Keepalived is layer-3/4/5 switching software designed for failover between master and standby machines. Together they form a widely used load-balancing and high-availability solution, most commonly applied in web systems.

MariaDB + Galera

The Galera mode is a newer high-availability solution whose advantages were mentioned earlier. It has few disadvantages: no XA support, no LOCK TABLE support, and only the InnoDB engine is supported.

2.2. Mycat – MySQL high availability solution

As a proxy-layer middleware, the high availability of a Mycat system involves both the high availability of Mycat itself and that of the back-end MySQL. The MySQL high-availability schemes described in the previous sections can be used to ensure the availability of the back-end MySQL services connected to Mycat. In most cases it is recommended to use a standard MySQL master-slave replication setup and let Mycat perform automatic master-slave switchover for the back-end MySQL nodes.

As shown in the figure, the MySQL nodes use a master-slave replication configuration; in Mycat's dataHost the primary node is configured as the writeHost and the secondary node as the readHost. Mycat periodically sends heartbeat checks to all writeHost and readHost nodes in a dataHost. Under normal conditions, Mycat uses the first writeHost as the write node and sends all DML SQL to it; if read/write separation is enabled, queries are sent to the readHost (and possibly the writeHost) according to the read/write separation policy. When two or more writeHosts are configured in a dataHost and the first writeHost fails its heartbeat check three times (the default), Mycat automatically switches to the next available writeHost for DML SQL statements, and records the index of the current writeHost in the conf/dnindex.properties file (the first is 0, the second is 1, and so on). Note that this file must not be deleted or changed unless you thoroughly understand what it does and why.
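The switchover bookkeeping can be sketched like this (an in-memory illustration; real Mycat persists the index to conf/dnindex.properties, for which a map stands in here, and the class and method names are invented):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative failover bookkeeping: after a number of consecutive failed
// heartbeats the current writeHost is abandoned and the next one takes
// over, and the new index is persisted.
public class WriteHostFailover {

    private final int hostCount;
    private final int failureLimit;
    private int activeIndex = 0;
    private int consecutiveFailures = 0;
    private final Map<String, String> persisted = new HashMap<>(); // dnindex.properties stand-in

    public WriteHostFailover(int hostCount, int failureLimit) {
        this.hostCount = hostCount;
        this.failureLimit = failureLimit;
    }

    public void onHeartbeat(boolean ok) {
        if (ok) {
            consecutiveFailures = 0;
            return;
        }
        if (++consecutiveFailures >= failureLimit) {
            activeIndex = (activeIndex + 1) % hostCount; // next available writeHost
            consecutiveFailures = 0;
            persisted.put("localhost1", String.valueOf(activeIndex));
        }
    }

    public int activeIndex() {
        return activeIndex;
    }

    public String persistedIndex(String dataHost) {
        return persisted.getOrDefault(dataHost, "0");
    }
}
```

A single failed heartbeat does not trigger a switch; only a run of consecutive failures (three by default in Mycat) does, which protects against transient network hiccups.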

A natural question is what to do with the original write node after it recovers from a failure, once Mycat has switched to another writeHost.

The recommendation is to leave the current state unchanged: after the failed MySQL node recovers, reconfigure it as a slave node following the new master, resynchronize its data, and then add it back to Mycat.

2.3. Mycat’s own high availability scheme

With MySQL covered, let's look at the high availability of Mycat itself. Since Mycat is essentially stateless middleware (apart from the dnindex.properties file recorded during master/slave switchover), it is easy to deploy Mycat as a cluster and provide a high-availability solution. The mycat-balance component was originally planned for Mycat load balancing, but owing to a lack of volunteers and of validation in production, it is not recommended for the time being. The official suggestion is to use a hardware load balancer or the software HAProxy. Compared with LVS, HAProxy is much simpler to use, has rich features, is free and open source, and offers very good stability, comparable to that of LVS.

The following figure shows the high-availability solution composed of HAProxy + Mycat cluster + MySQL master/slave:

If you are also concerned about HAProxy's own stability and single point of failure, you can use Keepalived's floating VIP feature to strengthen the setup:

3. Transaction support

3.1. Database transactions in Mycat

Currently, Mycat does not provide strong cross-shard transaction consistency. Transaction integrity is guaranteed within a single database: if any shard fails while a cross-database transaction is executing, all shards can be rolled back; but once the commit instruction has been issued, success on every shard can no longer be guaranteed. Because the chance of a shard failing exactly at commit time is small, this behavior is called weak XA.
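The weak-XA behavior can be sketched as follows (Shard and FakeShard are invented stand-ins, not Mycat classes): a failure before commit rolls everything back, while a failure during the commit loop can only be recorded, not undone.

```java
import java.util.ArrayList;
import java.util.List;

// "Weak XA" sketch: a failure while shards are still executing rolls every
// executed shard back; once the commit loop starts, a failing shard can no
// longer be undone and is only reported.
public class WeakXa {

    public interface Shard {
        void execute();
        void commit();
        void rollback();
        String name();
    }

    // Returns the names of shards whose commit failed (empty on full success).
    public static List<String> run(List<? extends Shard> shards) {
        List<Shard> executed = new ArrayList<>();
        for (Shard s : shards) {
            try {
                s.execute();
                executed.add(s);
            } catch (RuntimeException e) {
                // Pre-commit failure: every shard that already executed rolls back.
                for (Shard done : executed) {
                    done.rollback();
                }
                throw e;
            }
        }
        List<String> commitFailures = new ArrayList<>();
        for (Shard s : shards) {
            try {
                s.commit(); // a failure here cannot be undone any more
            } catch (RuntimeException e) {
                commitFailures.add(s.name());
            }
        }
        return commitFailures;
    }

    // Minimal in-memory shard for demonstration.
    public static class FakeShard implements Shard {
        private final String name;
        private final boolean failOnCommit;
        public boolean rolledBack = false;

        public FakeShard(String name, boolean failOnCommit) {
            this.name = name;
            this.failOnCommit = failOnCommit;
        }

        @Override public void execute() { }
        @Override public void commit() {
            if (failOnCommit) {
                throw new RuntimeException(name + " failed at commit");
            }
        }
        @Override public void rollback() { rolledBack = true; }
        @Override public String name() { return name; }
    }
}
```

The asymmetry between the two loops is exactly the "weak" part: the first loop has a real undo path, the second does not.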

3.2. XA Transaction principle

Distributed Transaction Processing (DTP) refers to a collection of programs or program segments that perform certain functions on one or more resources, such as databases or files. The key to distributed transaction processing is that there must be a way to know everything a transaction is doing anywhere, and the commit-or-rollback decision must produce a uniform result (all commit or all roll back). The X/Open organization (now The Open Group) defined the distributed transaction processing model. The X/Open DTP model (1994) includes the application program (AP), the transaction manager (TM), the resource manager (RM), and the communication resource manager (CRM). Typically, the transaction manager is transaction middleware, the resource manager is a database, and the communication resource manager is messaging middleware.

The XA specification defines the interface between TM and RM. The protocol only defines the xa_ and ax_ series of function prototypes, function descriptions, constraints, and implementation specifications; it says nothing about which protocol RM and TM use to communicate. Well-known databases such as Oracle and DB2 implement the XA interface and can act as RMs, and transaction middleware such as Tuxedo and TXSeries can connect to these data sources through the XA protocol. JTA (Java Transaction API) is a programming model conforming to X/Open DTP, and its interactions between the transaction manager and the resource managers also use XA.

The following two images show XA success and failure respectively. The first is the flow chart of XA transaction success:

Then, there is the flow chart of XA transaction failure:

The key to XA transactions lies in the TM component; the difficult technical points are as follows:

  • During the second (commit) phase, if RM1 has committed but RM2 has not, TM must coordinate: when RM2 recovers, the transaction that was prepared but not committed must be re-committed, or the transaction that was rolled back must be rolled back as well.
  • TM therefore has to record the status of each XA transaction and its progress on every RM, and this log must be stored somewhere reliable so that remedial work can be done after an XA transaction exception.

TM must store transaction information such as the XID and which RMs have completed; this information can only be discarded after all RMs have committed or rolled back.
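A minimal sketch of that bookkeeping (illustrative names, not a real TM): each XID maps to the per-RM state, recovery must re-drive anything still prepared, and the record may only be discarded once no RM remains in the PREPARED state.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the TM-side log described above. In a real TM this log must be
// durable (written to reliable storage), not an in-memory map.
public class TmLog {

    public enum RmState { PREPARED, COMMITTED, ROLLED_BACK }

    private final Map<String, Map<String, RmState>> log = new HashMap<>();

    public void prepared(String xid, String rm) {
        log.computeIfAbsent(xid, k -> new HashMap<>()).put(rm, RmState.PREPARED);
    }

    public void finished(String xid, String rm, RmState outcome) {
        log.get(xid).put(rm, outcome);
    }

    // During recovery the TM must re-drive every RM still in PREPARED.
    public boolean needsRecovery(String xid) {
        return log.getOrDefault(xid, Map.of()).containsValue(RmState.PREPARED);
    }

    // The record may be thrown away only once every RM has a final outcome.
    public boolean discardIfComplete(String xid) {
        if (needsRecovery(xid)) {
            return false;
        }
        log.remove(xid);
        return true;
    }
}
```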

3.3. XA transaction issues and MySQL limitations

The obvious problem with XA transactions is timeouts: if an RM fails, the entire transaction can only wait, which ripples through the system, slows everything down, and can eventually make it unusable. The two-phase commit also greatly lengthens XA transactions, so they cannot support high-concurrency workloads.

The way to avoid XA transactions is usually final consistency.

For example, suppose the last step of a business flow adds 300 yuan to a user's account. To reduce pressure on the DB, the operation is first put into a message queue, and a back-end consumer takes the message from the queue and updates the DB. How do we guarantee that the message is not consumed twice, or that re-consumption still yields the correct result? One way is to include the current version of the user's account row in the message and compare versions when updating, applying the change only if they match: if the user originally had 500 yuan, the message sets the balance to 800 instead of adding 300. Another way is to keep a table of consumed messages, recording each message ID: within the update transaction, first check whether the message has been processed; if not, update the database by adding 300; otherwise the message has already been consumed and is discarded.

Both approaches must be guaranteed idempotent by the program. Strictly speaking, using a message queue to achieve eventual consistency is still flawed, because the message queue and the database being updated are two different resources; there is still a chance that a queue failure loses the "add 300 yuan" message (sophisticated message-queue products can avoid this, but some risk remains). The second option does not have this problem, because the new table lives in the same database as the transaction that updates the balance.
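The deduplication-table approach can be sketched in memory as follows (a stand-in for the real queue plus database; class and method names are invented): the set of consumed message ids plays the role of the deduplication table that shares a transaction with the balance update.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// In-memory sketch: consumed message ids are recorded alongside the balance
// (conceptually in the same database transaction), so a redelivered message
// cannot credit the account twice.
public class IdempotentConsumer {

    private final Map<String, Long> balances = new HashMap<>();
    private final Set<String> consumedMessageIds = new HashSet<>();

    public IdempotentConsumer(String user, long initialBalance) {
        balances.put(user, initialBalance);
    }

    // Returns true if the message was applied, false if it was a duplicate.
    public boolean consume(String messageId, String user, long amount) {
        if (!consumedMessageIds.add(messageId)) {
            return false; // already consumed: discard
        }
        balances.merge(user, amount, Long::sum);
        return true;
    }

    public long balance(String user) {
        return balances.get(user);
    }
}
```

Redelivering the same message id leaves the balance untouched, which is exactly the idempotence the text asks the program to guarantee.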

MySQL XA transactions have a long-standing flaw:

The primary and secondary databases of MySQL are synchronized by copying the binlog, and the binlog acts as the coordinator of internal XA transactions inside MySQL. MySQL optimizes this by not writing a prepare record to the binlog, only a commit record. Suppose all participating nodes finish the prepare phase and the server crashes before XA COMMIT: if crash recovery decides to commit the transaction, the distributed transaction is committed on the primary, but because nothing was written to the binlog during the prepare phase, the transaction's changes are missing from the binlog and are never replicated to the standby, so primary and standby become inconsistent.

4. Mycat SQL interception mechanism

SQL interception is a useful advanced technique. You can write a Java class that rewrites the SQL passed into MyCAT before MyCAT executes it. This technique enables some special functions:

  • Capture and record some special SQL;
  • Record SQL lookup exceptions;
  • Rewriting SQL for performance reasons, such as changing the order of query conditions or adding paging limits;
  • Enforce read mode for some SELECT SQL to separate reads from writes (many transaction frameworks have difficulty keeping SELECT SQL out of transactions);
  • Later Mycat intelligent optimization, intercept all SQL for intelligent analysis, automatically monitor node load, automatically optimize routes, and provide database optimization suggestions.

The principle of SQL interception is to intercept the SQL before routing, perform the extra processing, and then continue with routing, as shown in the following figure:

The default interceptor implements escaping conversion of MySQL escape characters. Besides the default, there is currently only one other built-in interceptor, which captures and records SQL.

4.1. The default interceptor

<system>
    <property name="sqlInterceptor">io.mycat.interceptor.impl.DefaultSqlInterceptor</property>
</system>

Source code implementation:

/**
 * Escape mysql escape letter.
 */
@Override
public String interceptSQL(String sql, int sqlType) {
    if (sqlType == ServerParse.UPDATE || sqlType == ServerParse.INSERT
            || sqlType == ServerParse.SELECT || sqlType == ServerParse.DELETE) {
        return sql.replace("\\'", "''");
    } else {
        return sql;
    }
}

4.2. Capture and record SQL interceptor configurations

<system>
    <property name="sqlInterceptor">io.mycat.interceptor.impl.StatisticsSqlInterceptor</property>
    <property name="sqlInterceptorType">select,update,insert,delete</property>
    <property name="sqlInterceptorFile">E:/mycat/sql.txt</property>
</system>
  • sqlInterceptorType: the SQL types to intercept;
  • sqlInterceptorFile: the path of the file where intercepted SQL is saved.

If you need to implement your own SQL interception, simply write your own class and configure it:

  1. Define a custom class that implements SQLInterceptor and returns the rewritten SQL from interceptSQL.
  2. Put the implemented class in the catlet directory, either as class files or as a jar.
  3. Configure it in the configuration file:

<system>
    <property name="sqlInterceptor">your.custom.SqlInterceptorClass</property>
    <!-- other configuration -->
</system>
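As a sketch of such a custom class, an interceptor might look like the following. The interceptSQL(String, int) signature mirrors the default-interceptor source shown earlier; the class name, the placeholder value standing in for ServerParse.SELECT, and the LIMIT-appending rule are all invented for illustration:

```java
// Hypothetical custom interceptor: appends a defensive LIMIT to unbounded
// SELECTs, one of the "rewriting SQL for performance" uses listed above.
public class LimitingSqlInterceptor {

    // Placeholder standing in for ServerParse.SELECT (actual value may differ).
    public static final int SELECT_TYPE = 7;

    public String interceptSQL(String sql, int sqlType) {
        String normalized = sql.trim().toLowerCase();
        if (sqlType == SELECT_TYPE && !normalized.contains(" limit ")) {
            return sql.trim() + " LIMIT 1000"; // cap unbounded result sets
        }
        return sql;
    }
}
```

Queries that already carry a LIMIT, and non-SELECT statements, pass through unchanged.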
