A relational database easily becomes the system bottleneck: the storage capacity, number of connections, and processing power of a single server are all limited. Once a single table reaches roughly 10 million rows or 100 GB, many operations slow down severely, and because queries span so many dimensions, adding replicas and optimizing indexes no longer helps enough. At that point it is time to consider sharding, whose purpose is to reduce the load on each database and shorten query time.

The core of a distributed database is data sharding, together with locating and merging the data once it has been sharded. Sharding stores data across multiple databases so that each one holds less. Adding hosts relieves the performance problems of a single database and so improves overall database performance.

Data can be split in two ways, depending on the type of split: vertical segmentation and horizontal segmentation.

Vertical segmentation

Divide the data into different databases and servers by function.

When a site has just been created, it may get only dozens or hundreds of visitors a day. A single database holding all the tables on one ordinary server is probably enough, and the developers are happy and confident: because every table is in the same database, any query can join freely. But as traffic grows, reads and writes keep increasing and the load on the database climbs toward its limit. People then think of adding servers and building a cluster, but a second problem appears: the amount of data is also growing rapidly.

At this point, consider separating reads from writes and placing different data in different databases according to business function. In a large, bloated database, most tables are unrelated to one another or never need to be joined, so in theory they can live on different servers. For example, the user favorites and blog databases can be placed on two separate servers. This is called vertical partitioning (the name does not really matter).
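As a minimal sketch of the idea, vertical partitioning can be pictured as a lookup from business function to server. The host names and the database_for helper below are hypothetical and chosen only for illustration; a real application would keep this mapping in configuration.

```python
# Hypothetical mapping from business function to database server.
# The host names are placeholders, not taken from any real deployment.
DATABASE_BY_FUNCTION = {
    "favorites": "favorites-db.example.internal",
    "blog": "blog-db.example.internal",
}


def database_for(function_name: str) -> str:
    """Return the server that holds every table for one business function."""
    try:
        return DATABASE_BY_FUNCTION[function_name]
    except KeyError:
        raise ValueError(f"no database configured for {function_name!r}") from None
```

Because each function has its own server, a query that touches only blog tables never competes with favorites traffic. The cost is that joins across the two are no longer possible inside one database.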

What do you do when the blog or favorites data itself keeps growing? That leads to another practice, called horizontal splitting.

Horizontal segmentation

The rows of one table are divided among several databases, all of which have the same table structure. The split should follow a rule, guided by whoever produces the data: the data above is generated by users, so it is divided by user ID. Given the rule, you can work out in advance which database holds a given piece of data.

In fact, many large sites have gone through both a vertical and a horizontal splitting stage. When to do each can be judged from experience; there are no hard and fast rules.

In the blog example, the data can be divided by the parity of the userId: odd IDs go to database A and even IDs go to database B.

This way, the userId alone tells you which database holds a user's blog data. You can also split by userId % 10, or route with a well-known hash algorithm.
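The three routing rules mentioned above can be sketched as follows. This is an illustration under assumptions, not any particular site's implementation: the function names are hypothetical, the shard count of 10 mirrors the userId % 10 example, and MD5 is only one possible choice of hash.

```python
import hashlib


def shard_by_parity(user_id: int) -> str:
    """Odd IDs go to database A, even IDs to database B."""
    return "A" if user_id % 2 == 1 else "B"


def shard_by_modulo(user_id: int, shard_count: int = 10) -> int:
    """Return a shard index in the range 0 .. shard_count - 1."""
    return user_id % shard_count


def shard_by_hash(user_key: str, shard_count: int = 10) -> int:
    """Hash a non-numeric key (assumed here to be a string) to a shard index.

    A stable digest is used instead of Python's built-in hash(), which is
    randomized per process for strings and would route inconsistently.
    """
    digest = hashlib.md5(user_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

For example, shard_by_parity(7) returns "A" and shard_by_modulo(1234) returns 4. Whichever rule is used, it has to be deterministic, so that the same user is routed to the same database on every request.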

When I first looked at the architecture of the Mobile Home, I found that they use:

Horizontal segmentation: Horizontal segmentation of data.
