1. Specific ways to scale MySQL

As the business keeps growing, we need to choose appropriate solutions to cope with the increasing data volume and access pressure.

The main ways to scale a database are: business separation, master-slave replication, and splitting databases and tables (sharding). This article focuses on splitting databases and tables.

(1) Business separation

The article “Big Data and High Concurrency Solutions for Large Web applications” also explains why the business should be split.

In the early days of a business, many applications adopt a centralized architecture to speed up rollout and iteration. As the business grows, the system becomes increasingly complex and hard to maintain, development efficiency drops, resource consumption rises, and improving performance by simply adding hardware becomes more and more expensive.

Therefore, a good architecture design at the early stage is an important guarantee for the later expansion of the system.

For example, an e-commerce platform contains modules such as users, goods, reviews, and orders. The simplest approach is to create four tables, Users, Shops, Comment, and Order, in a single database.

However, as the business grows and traffic increases, the business has to be split: each module gets its own database, and different business modules access different databases. The original dependence on one database becomes a dependence on four databases, so the four databases share the load and the overall throughput of the system naturally improves.

(2) Master/slave replication

The figure above shows how data is synchronized between a MySQL Master and a Slave.

The Slave obtains the binary log from the Master and then replays the operations recorded in the log on its local copy of the data. Because Master-Slave replication is asynchronous, the data on the Slave may lag behind the Master, so only eventual consistency of the data can be guaranteed.

(3) Splitting databases and tables (sharding)

We know that no matter how good its configuration is, every machine has physical limits. When our application reaches or exceeds the limit of a single machine, we can only turn to other machines for help or keep upgrading the hardware; the more common solution is to add machines and share the load.

We also have to ask: as the business logic grows, can our machines scale linearly to meet demand? Splitting databases and tables can immediately improve the performance of the system. Other reasons for sharding are not covered here; this article focuses on the concrete implementation strategies, described in the following sections.

2. Implementation strategy for table sharding

Keywords: user ID, table capacity

Most database designs and business operations are tied to a user ID, so routing by user ID is the most common sharding strategy. The user ID runs through the whole system as an important field. By routing on the user ID we can both query conveniently and distribute the data evenly across the shards. (Of course, you can also split tables by category, etc.; there are many possible routing policies.)

Assume that the order table `order` holds the users' order data; the SQL script is as follows (for demonstration only, some details are omitted):

CREATE TABLE `order` (
  `order_id` bigint(32) primary key auto_increment,
  `user_id` bigint(32),
   ...
) 

When the data volume becomes large and the table needs to be split, the first step is to decide how many tables the data should be evenly distributed across, that is, the table capacity.

Assume 100 tables are used for storage. When storing data, we first take the user ID and route to the corresponding table according to user_id % 100, as shown in the following diagram:

For example, if user_id = 101, we can query the data with the following SQL statement:

select * from order_1 where user_id = 101

Here order_1 comes from 101 % 100 = 1 and refers to table 1 of the order tables after splitting.
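Expressed in code, this routing is just a modulo operation followed by building the physical table name. Below is a minimal sketch in Java (the table count of 100 and the order_N naming follow the example above; the class name is made up for illustration):

public class OrderTableRouter {

    // Assumed table capacity from the example above: order_0 ... order_99
    private static final int TABLE_COUNT = 100;

    // Index of the physical table that holds this user's orders
    public static int tableIndex(long userId) {
        return (int) (userId % TABLE_COUNT);
    }

    // Physical table name, e.g. userId = 101 -> "order_1"
    public static String tableName(long userId) {
        return "order_" + tableIndex(userId);
    }
}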

Note:

In actual development, if MyBatis is used as the persistence layer, it already supports table sharding well. For example, the SQL above can be implemented with MyBatis as follows:

Interface definition:

/**
 * @param tableNum the number (suffix) of the specific table
 * @param userId   the user ID
 * @return the user's orders
 */
public List<Order> getOrder(@Param("tableNum") int tableNum, @Param("userId") int userId);

XML configuration mapping file:

<select id="getOrder" resultMap="BaseResultMap">
    select * from order_${tableNum}
    where user_id = #{userId}
  </select>

Note that ${tableNum} splices the parameter directly into the SQL string (unlike #{}, which is bound as a prepared-statement parameter), which is what allows the table name to vary.
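On the caller side, the service layer computes the table number first and then passes it to the mapper together with the user ID. A hedged sketch (findOrdersByUser and orderMapper are hypothetical names; getOrder is the mapper method defined above):

// Hypothetical service method; orderMapper is a MyBatis mapper containing getOrder above
public List<Order> findOrdersByUser(int userId) {
    int tableNum = userId % 100;   // same modulo routing as before
    return orderMapper.getOrder(tableNum, userId);
}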

Note:

In addition, in actual development the user ID is more likely to be generated from a UUID; in that case we can hash the UUID to an integer value and then perform the modulo operation.
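A simple way to do this, shown here only as a sketch (Math.floorMod keeps the index non-negative even when hashCode() returns a negative value):

import java.util.UUID;

public class UuidRouter {

    private static final int TABLE_COUNT = 100;   // assumed table capacity, as above

    // Hash the UUID string to an int, then take the modulus to get the table index
    public static int tableIndex(UUID userId) {
        return Math.floorMod(userId.toString().hashCode(), TABLE_COUNT);
    }
}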

3. Implementation strategy for database sharding

Table sharding solves the query efficiency problem when a single table holds a large amount of data, but it does not improve the concurrent processing capacity of the database, because the split tables still live in one database and are therefore still limited by that database's I/O performance.

Therefore, to distribute the database I/O load evenly, splitting the data across multiple databases obviously solves the performance problem of a single database.

The implementation of the database-sharding strategy is very similar to that of table sharding; the simplest way is to route by taking the modulus.

As in the example above, the user ID is taken modulo the number of databases to pick a specific database. The keywords are again:

User ID and database capacity

The routing diagram is as follows:

In the figure, the database capacity (number of databases) is 100.

Similarly, if the user ID is a UUID, hash it first and then take the modulus.
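In code, the only difference from table routing is what the index selects: here it picks a physical database (a DataSource) instead of a table name. A minimal sketch (the DataSource map and the database count of 100 are assumptions for illustration):

import java.util.Map;
import javax.sql.DataSource;

public class OrderDatabaseRouter {

    private static final int DB_COUNT = 100;              // assumed database capacity
    private final Map<Integer, DataSource> dataSources;   // one entry per physical database

    public OrderDatabaseRouter(Map<Integer, DataSource> dataSources) {
        this.dataSources = dataSources;
    }

    // Picks the physical database that holds this user's data
    public DataSource route(long userId) {
        return dataSources.get((int) (userId % DB_COUNT));
    }
}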

4. Implementation strategy for combined database and table sharding

As described above, table sharding solves the query performance problem of massive data in a single table, while database sharding relieves the concurrent access pressure on a single database.

Sometimes we need to address both problems at the same time; in that case we split tables and split databases together, so that the system gains both greater concurrent processing capacity and better single-table query performance. This is combined database and table sharding.

Compared with the preceding two types of routing policies, a common routing policy is as follows:

1. intermediate variable = user_id % (number of databases * number of tables per database)
2. database index = floor(intermediate variable / number of tables per database)
3. table index = intermediate variable % number of tables per database

For example, if there are 256 databases, each containing 1024 tables, and a user with user_id = 262145, the routing policy gives:

1. intermediate variable = 262145 % (256 * 1024) = 1
2. database index = floor(1 / 1024) = 0
3. table index = 1 % 1024 = 1

That is, user_id = 262145 will be routed to table 1 of database 0.
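The three routing steps translate directly into code. Here is a sketch with the same assumed capacities (256 databases, 1024 tables each); for user_id = 262145 it yields database 0 and table 1, matching the calculation above:

public class ShardRouter {

    private static final int DB_COUNT = 256;        // number of databases, as in the example
    private static final int TABLES_PER_DB = 1024;  // tables per database, as in the example

    public static int dbIndex(long userId) {
        long middle = userId % (DB_COUNT * TABLES_PER_DB);  // intermediate variable
        return (int) (middle / TABLES_PER_DB);              // integer division acts as floor
    }

    public static int tableIndex(long userId) {
        long middle = userId % (DB_COUNT * TABLES_PER_DB);
        return (int) (middle % TABLES_PER_DB);
    }

    public static void main(String[] args) {
        long userId = 262145L;
        // Prints: db=0, table=1
        System.out.println("db=" + dbIndex(userId) + ", table=" + tableIndex(userId));
    }
}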

The schematic diagram is as follows:

5. Summary of sharding strategies

There are many routing strategies to choose from; the one above, based on the user ID, is relatively simple. Other approaches include partitioning by ID ranges (number segments) or routing by hash. Interested readers can explore them on their own.

As mentioned above, if the user ID is generated as a UUID, we need to hash it first and then take the modulus; in fact, hashing is itself a sharding strategy. When using hash routing, keep its pros and cons in mind. Advantage: data is distributed evenly. Disadvantages: data migration is troublesome, and data cannot be allocated according to each machine's performance.

With the operations above, query performance and concurrency are improved, but there are still issues to watch out for. For example: transactions that previously stayed within one database become distributed transactions; because records are spread across different databases and tables, multi-table join queries become difficult, and data cannot be queried without the routing field; and if we later need to scale the system further (that is, change the routing policy), it becomes very inconvenient and requires another round of data migration.

6. Summary

Above, we learned how to implement read/write splitting and database/table sharding. Does that mean we can now build a scalable, high-performance, high-concurrency website? Obviously not yet! A large website relies on far more than these techniques; they are only the most basic building blocks, and there are many concrete details we have not covered, such as database cluster management, load balancing, disaster recovery, failover, transaction management, and so on. There is still much to learn and research.

In a word:

The way ahead is so long without ending, yet high and low I’ll search with my will unbending.

The road ahead is beautiful and bright. We will embark on a new journey in 2019, undaunted by any setbacks.