In the previous article, we talked about the master-slave cluster of database clusters, namely read-write separation, and also mentioned that read-write separation actually shares the load of access, but does not solve the load of storage.
Storage pressure simply means that with the evolution of the system, the increase of demand, the number of tables may gradually increase, for example, a period of time the last new function has to add a table. In addition, as the number of users increases, the number of rows in the user table will definitely increase, and the data in the order table will definitely increase over time. When the amount of data reaches tens of millions or even hundreds of millions, read/write separation cannot be satisfied, and the read/write performance deteriorates seriously.
That is, the resources of a server such as CPU, memory, IO, disk, etc., are limited, so this time the sub-library sub-table is on!
depots
To be clear, let’s say you have a database server, and there are two tables in the database: the user table and the order table. Now, if you want to separate your databases, you need to buy two machines, put two databases on each machine, and put one database for users and one database for orders
This split the storage load between the two servers, but it creates new problems, so as things get more complex, there are new problems.
1, join table query problem is join, before in a database can use JOIN with a SQL statement can be linked table query to get the desired results, but now divided into multiple databases, so join is not used. For example, if you want to check the order information of users registered after 2019, you need to go to the user table in database A to query the information registered after 2019, and then get the user ID, and then take the ID to the order table in database B to find the order information, and then join the information back. So I’m going to have to write a little bit more code.
2, transaction problems do database basically cannot do without transaction, but now different database transaction is not the previous simple local transaction, but distributed transaction, and the introduction of distributed transaction also improves the complexity of the system, and some efficiency is not high will affect the performance such as Mysql XA. There are also distributed transactions based on message-oriented middleware and so on that will not be covered here.
table
We have already made a separate database, but now the situation is that there are too many data in our table, so your company’s products become popular accidentally. For example, if all users have only one table, they cannot bear it, so this is the scoring table. They are divided into vertical sub-table and horizontal sub-table.
The vertical table
Vertical partition means that the image point is like the y axis of the coordinate axis, the x axis is cut in half, corresponding to our table is let’s say we have 10 columns in our table, and now we cut it down, and we divide it into two tables, one table with 3 columns, the other table with 7 columns.
The vertical partition is suitable for splitting tables that are not used and take up a lot of space.
Take the headlines of the user information, such as the user table only user ID, nickname, mobile phone number, personal profile of these four fields. But the mobile phone number and personal profile this kind of information belongs to not very commonly used, occupy the space is not small, personal profile some people wrote a tuo. So I split out the cell phone number and my profile.
That vertical partition table impact is before as long as one query, now need two queries to get the complete user table information before the partition table.
The level of table
The idea of a horizontal scale is that points are like the X-axis of the coordinate axis, with the Y-axis cut in half (not just one cut, but several). Let’s take a user table and say we have 50 million rows in the user table, and we cut five times, and we split five tables, 10 million rows in each table.
Horizontal table is suitable for the user table many rows, the general single table row number more than 50 million score table, if the single table data is more complex that may be 20 million or even 10 million score, this depends on the actual situation, some tables are very simple May 100 million rows do not have to score. So pay attention when a table has tens of millions of rows. If there are no performance problems, you can wait. Don’t rush to split tables, because split tables can cause a lot of problems.
The horizontal table problem is more annoying than the vertical table problem.
You have to think about how to cut, and the advanced point is called routing, right
1, according to the ID is the range of routing, for example, the ID value of 1~ 9.999 million put a table, 10 ~1999 put a table, and so on. This has to try, because the scope of the big, the possible performance and problems, the scope of the small. That watch has to be dead.
The advantage of this method is that it is easy to cut ah, simple and crude, the new data table will not affect the previous data, the previous data do not need to move.
For example, if id is hashed, 1500 is mod 8 = 4, so this entry is placed in user_4, 2011 is mod 8 = 3, so this entry is placed in user_3. The advantage of this kind of cent method is cent is very even, basically the data of every table is about the same, but later new data and score table how to do, the data before have to move, more vexed!
3, create a routing table to store the routing relationship, or take the user table for example, is to create a routing table, store the userId and table number, indicating that the userId is the user table. This way is also easy, and then you have to separate tables and then change the routing table, migrate some of the data. However, this method results in two queries per query, and if the routing table is too large, the routing table becomes a bottleneck again!
Let’s talk about the query problem.
For example, if you want to look up the first 100 users, you have to order by the registration time for each table in the horizontal score and take 100, and then compare the 100 results for each table to get the final result. First of all, the operation becomes troublesome. The things that used to be done by an order BY become complicated now, and another factor to consider is the problem of time. If you split 20 tables, you have to execute 20 orders by, and if it is executed in serial, the time cost is also a problem!
The realization of sub – library sub – table
Specific implementation is also divided into program code packaging, database middleware packaging. This is more difficult to implement than read-write separation, but the comparison between the two types of encapsulation was discussed in the section on read-write separation and will not be repeated here.
conclusion
Yes, it will introduce a lot of problems. Therefore, the architecture design should follow the principle of evolution. Nothing can be achieved overnight, and different architectures can be adapted to different scenarios.
In software, simple enough is good, technology is not noble or cheap, it is not superior to use distributed, the more complex system maintenance cost and difficulty is higher, the greater the probability of problems. The evolution of this architecture is often user-driven, almost by necessity.
Basically, the standalone database can support 100,000 users. Therefore, in general, the hardware will be upgraded, the database configuration will be optimized, the code will be optimized, and the Redis will be introduced. Only do these more complicated things if you really can’t.
If there are mistakes welcome to correct! Personal public account: Yes training level guide