Summary of Mycat

What is Mycat? By definition and classification, it is an open source distributed database system. It is a Server that implements the MySQL protocol. Front-end users can regard it as a database agent, which is accessed by MySQL client tools and command line. The back-end can use the MySQL Native protocol to communicate with multiple MySQL servers, and can also use the JDBC protocol to communicate with most mainstream database servers. Its core function is to divide tables and libraries, that is, a large table is horizontally divided into N small tables, which are stored in the back-end MySQL server or other databases.

Mycat is no longer a simple MySQL agent. Its backend can support mainstream databases such as MySQL, SQLServer, Oracle, DB2 and PostgreSQL, as well as MongoDB storage in a new NoSQL mode. More types of storage will be supported in the future. From the end user’s point of view, regardless of the storage mode, Mycat is a traditional database table, which supports standard SQL statements for data operation. In this way, for the front-end business system, it can greatly reduce the difficulty of development and improve the speed of development.

For DBAs, Mycat can be understood as follows:

Mycat is the MySQL Server, and the MySQL Server connected to Mycat is like the storage engine of MySQL, such as InnoDB, MyISAM, etc. Therefore, Mycat does not store data, data is stored in the backend MySQL. So data reliability and transactions are guaranteed by MySQL.

For software engineers, Mycat can be understood as follows:

Mycat is a database server that is approximately equivalent to MySQL. You can connect to Mycat the same way you connect to MySQL (except for the port, the default Mycat port is 8066, not 3306, so you need to add port information to the connection string). In most cases, You can use Mycat with your familiar object mapping framework.

For architects, Mycat can be understood as follows:

Mycat is a powerful database middleware, not only can be used for read/write separation, table and library separation, disaster backup, but also can be used for multi-tenant application development, cloud platform infrastructure, make your architecture very adaptable and flexible.

Principle of Mycat

One of the most important verbs in the principle of Mycat is “intercept”. It intercepts THE SQL statement sent by the user. First, it does some specific analysis on the SQL statement, such as sharding analysis, routing analysis, read/write separation analysis, cache analysis, and then sends the SQL to the real database at the back end.

The returned result is processed properly and finally returned to the user.

The Orders table is divided into three shards datanodes (dn for short). These shards are distributed between two MySQL servers (datanodes =database@datahost), so you can shard from one to N servers. Sharding rule is a typical string enumeration sharding rule. The definition of a rule is Sharding Column + RuleFunction, where the sharding field is PROV and the sharding function is string enumeration.

When Mycat receives an SQL, it will first parse the SQL, find the table involved, and then look at the definition of the table. If there is a sharding rule, it will get the value of the SHARding field in THE SQL, and match the sharding function to get the shard list corresponding to the SQL, and then send the SQL to these shards to execute. Finally, the result data returned by all fragments is collected and processed, and output to the client. Select * from Orders where prov=? For example, find prov= Wuhan, according to the sharding function, Wuhan returns DN1, so THE SQL is sent to MySQL1, to get the query result on DB1, and return to the user.

Select * from Orders where prov in (‘ wuhan ‘, ‘Beijing’); select * from Orders where prov in (‘ Wuhan ‘, ‘Beijing’); But usually in business, our SQL will have the Order By and Limit page turning syntax, which involves the secondary processing of the result set on the Mycat side.

Common concept

Logic library (schema)

Usually for practical applications, there is no need to know the existence of middleware, business developers only need to know the concept of database, so database middleware can be regarded as a logical library composed of one or more database clusters.

Logical table ( table )

Since there are logical libraries, there will be logical tables. In a distributed database, for applications, the tables that read and write data are logical tables. A logical table can be divided into one or more shard libraries after data is shard, or it can be composed of only one table without data shard.

Subdivision table

Shard tables refer to tables with large data that need to be shelled into multiple databases. In this way, each shard has a part of the data, and all the shards constitute the complete data.

The subdivision table

Not all tables in a database are large, and some tables may not need to be shard. Non-shard refers to tables that do not need to be shard.

ER table

Relational database is based on entity-Relationship Model, which describes things and relationships in the real world. ER table in Mycat is derived from this. According to this idea, a data sharding strategy based on e-R relationship is proposed, in which the records of the child Table and the associated parent Table are stored in the same data sharding, that is, the child Table depends on the parent Table, and the Table Group is adopted to ensure that data Join will not operate across libraries.

Table Group is a good idea to solve cross-fragment data join, and it is also an important rule of data partitioning planning.

Global table

In a real business system, there are often a large number of tables similar to dictionary tables, which rarely change. Dictionary tables have the following characteristics:

• Infrequent changes;

• There is little overall change in data volume;

• Data size is small, rarely exceeding hundreds of thousands of records.

For such tables, in the case of sharding, when the business table is sharded because of the scale, the association between the business table and these affiliated dictionary tables becomes a thorny problem. Therefore, Mycat solves the join of such tables through data redundancy, that is, all the sharding has a copy of the data. So define dictionary tables or some tables that conform to dictionary table features as global tables.

Data redundancy is a good way to solve cross-fragmented data join and another important rule in data partitioning planning.

Shard node (dataNode)

After data is sharded, a large table is divided into different shard databases. The database where each table shard resides is the dataNode.

The host node (dataHost)

After data segmentation, each shard node (dataNode) may not have an exclusive machine. There can be multiple shard databases on the same machine, so that the machine where one or more shard nodes (Datanodes) reside is the dataHost. In order to avoid the limit on the number of concurrent hosts on a single node, As far as possible, slice nodes (Datanodes) with high read/write pressure should be evenly placed on different node hosts (datahosts).

Fragmentation rules (rule)

As mentioned above, when a large table is divided into several sharding tables, certain rules are required. In this way, the rules for dividing data into certain sharding tables according to certain business rules are sharding rules. It is very important to select appropriate sharding rules for data sharding, which will greatly avoid the difficulty of subsequent data processing.

Global serial number (sequence)

After data is sharded, the primary key constraint in the original relational database cannot be used under distributed conditions. Therefore, external mechanisms need to be introduced to ensure the unique identification of data. Such mechanism to ensure the unique identification of global data is global sequence.

Mycat installation and deployment

MyCAT is written and developed using the JAVA language. Before using MyCAT, you need to install the JAVA Runtime environment (JRE). Because MyCAT uses some features in JDK7, it must be run on JDK7 or later. You are advised to use Mysql 5.5 or later. 5.6 is recommended.

Enter the github.com/MyCATApache… Or dl.mycat. IO /, select a package suitable for your operating system from the list, download it, and unpack it in a directory.

Mycat is a standard Java program that can be unpacked and run. Run mycat in Linux and startup_nowrap.bat in Windows

In Mycat’s directory:

The bin program directory, which houses the startup scripts, provides the nowrap script commands in addition to the version packaged as a service.

Note: mycat support command {the console | start | stop | restart | status | dump}

Server. XML is the Mycat server parameter adjustment and user authorization configuration file, schema. XML is the logical library definition and table and sharding definition configuration file, rule-xml is the sharding rule configuration file. The specific parameters of the sharding rule are stored in a separate file, which is also in this directory. To modify the configuration file, you need to restart Mycat or reload port 9066.

The lib directory contains jar files that mycat depends on.

Logs are stored in logs/mycat.log, one file per day. Log configuration is in conf/log4j. XML.

An admin command

MyCAT itself has the same management and monitoring mode as other databases. You can use the Mysql command line to log in to the management port (9066) and execute the corresponding SQL for management. You can also use JDBC for remote connection management.

Login: Currently, mycat has two ports, 8066 data port and 9066 management port. Login is performed through 9066 management port, similar to mysql server login.

Mysql -h127.0.0.1 -utest -ptest-p9066 [-dmycat]

-h follows the host, that is, the host address specified by mycat. The local can be 127.0.0.1 remote Requires a remote IP address

-u Mycat server. XML specifies the logical library user

-p Logical library password configured in Mycat server. XML

-p is followed by the default port 9066. Note that P is uppercase

-d Mycat server. XML logical library configured

show @@help; You can view all commands

reload @@config; This command is used to update the configuration file. For example, after the schema. XML file is updated, you can run this command on the CLI to update the configuration file without restarting the configuration file.

show @@database; This command is used to display the list of MyCAT databases, which correspond to the schema child nodes of the schema.xml configuration file

show @@datanode; This command is used to display the list of data nodes of MyCAT, corresponding to datanodes in the schema. XML configuration file, where NAME indicates the NAME of datanodes. DataHost indicates the value of the corresponding dataHost attribute, that is, the dataHost. ACTIVE indicates the number of ACTIVE connections. IDLE indicates the number of IDLE connections. SIZE indicates the total number of connections.

Run the following command to find the dataNode list in the corresponding schema:

show @@datanode where schema = ?

The show @@version command is used to obtain the version of MyCAT

The show @@connection command is used to query the front-end connection status of Mycat, that is, the connection between an application and Mycat

Kill @@connection ID,id,id Used to kill a connection.

Show @@backend Displays back-end connection status.

show @@sql ; **** records the execution of SQL requests sent by users to mycat-server through local port 8066

information

show @@sql.slow ; **** (reload @@SQLslow =0; This command is the same as the “how to set slow SQL time threshold” command.

show @@sql.sum ; **** Statistics of SQL commands executed