Preface
Glacier has been through the complete research and development process of an e-commerce system as it grew from zero to hundreds of millions of users. As the business developed and changed, the e-commerce system evolved, and so did the accurate real-time product recommendation built on a big data platform. For the evolution of the MySQL database architecture, readers can refer to “From zero to tens of millions of users, how did I optimize the MySQL database step by step?” For the architecture evolution of the system itself, refer to “What architecture evolution has the system undergone from the initial stage to supporting hundreds of millions of flows?” During development, as the data volume keeps growing, a single database with a single table can no longer meet the storage requirements, and at that point the databases and tables need to be split. So what ideas and techniques do large Internet companies usually apply when splitting databases and tables? Today I am going to share them with you.
Note: This article has been included in github.com/sunshinelyz… and gitee.com/binghe001/t…
Splitting databases and tables
As a service keeps growing, a single database with a single table eventually cannot hold all of the data. The storage solution is then to spread the data across different tables, in different databases, on different servers. Splitting databases and tables effectively relieves the pressure of data storage, and it becomes unavoidable once the stored data reaches a certain scale. Mastering the ideas and techniques behind it helps you solve data-splitting problems in practical work.
Next, we look at table splitting and database splitting in turn, along with some ideas and techniques for each.
Table splitting
Table splitting, in its most straightforward sense, means dividing one table into multiple tables, which can live in the same database or in different databases. Of course, we first need to know under what circumstances table splitting is needed. In my view, it is worth using once a single table holds millions to tens of millions of records.
Types of table splitting
1. Vertical table splitting
Content that would otherwise sit in the same table is deliberately divided into multiple tables. (By “otherwise”, I mean content that, according to third normal form, belongs in the same table of a relational database.)
Table splitting technique: separate the data according to how active it is, because data with different levels of activity is handled differently.
Case study:
Take a blog system. The article title, author, category, creation time, and so on change rarely but are queried often, and they should be read with good real-time performance. We call this cold data. The number of views, the number of comments, and similar frequently changing values we call active data. Therefore, when designing the database structure, table splitting should be considered, and the first step is vertical table splitting.
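As a minimal sketch of the blog case above, the cold columns and the active columns can be placed in two tables that share the article ID. The table names, column names, and sample row are hypothetical, and SQLite is used here only as a runnable stand-in for MySQL.

```python
import sqlite3

# Hypothetical schema: names are illustrative, SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cold data: rarely updated, frequently read.
    CREATE TABLE blog_article (
        id         INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        author     TEXT NOT NULL,
        category   TEXT,
        created_at TEXT
    );
    -- Active data: updated on almost every page view or comment.
    CREATE TABLE blog_article_stats (
        article_id    INTEGER PRIMARY KEY REFERENCES blog_article(id),
        view_count    INTEGER NOT NULL DEFAULT 0,
        comment_count INTEGER NOT NULL DEFAULT 0
    );
""")

# Sample row, made up for the example.
conn.execute(
    "INSERT INTO blog_article (id, title, author, category, created_at) "
    "VALUES (1, 'Sharding notes', 'example author', 'MySQL', '2021-01-01')"
)
conn.execute("INSERT INTO blog_article_stats (article_id) VALUES (1)")

# A page view only touches the small, active table.
conn.execute(
    "UPDATE blog_article_stats SET view_count = view_count + 1 WHERE article_id = 1"
)

# Reading the full picture joins the two halves on the shared key.
row = conn.execute(
    "SELECT a.title, s.view_count, s.comment_count "
    "FROM blog_article a JOIN blog_article_stats s ON s.article_id = a.id "
    "WHERE a.id = 1"
).fetchone()
print(row)
```

The trade-off is that reads needing both halves now require a join or two queries, in exchange for keeping frequent updates away from the cold table.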
After splitting the table vertically in this way:
(1) First, different storage engines can be used. Cold data can use MyISAM for better query performance. Active data can use InnoDB for better update speed.
(2) Second, more slave databases can be configured for cold data, because most operations on it are queries and this speeds them up. For active data, the master database can be split horizontally into relatively more tables.
In fact, for some especially active data you can also consider caches such as Memcached or Redis, for example accumulating changes up to a certain amount before updating the database. NoSQL databases such as MongoDB are another option. These are only examples.
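The idea of accumulating changes before updating the database can be sketched as follows. This is only an illustration under stated assumptions: the class name, the threshold, and the `flush_fn` callback are hypothetical, and an in-process `Counter` stands in for a cache such as Redis.

```python
from collections import Counter


class BufferedViewCounter:
    """Accumulates view counts in memory and writes them out in batches.

    The in-process Counter is a stand-in for an external cache;
    flush_fn is a hypothetical callback that would update the database.
    """

    def __init__(self, flush_threshold, flush_fn):
        self.flush_threshold = flush_threshold
        self.flush_fn = flush_fn
        self.pending = Counter()

    def record_view(self, article_id):
        self.pending[article_id] += 1
        # Only touch the database once enough views have piled up.
        if self.pending[article_id] >= self.flush_threshold:
            self.flush(article_id)

    def flush(self, article_id):
        count = self.pending.pop(article_id, 0)
        if count:
            self.flush_fn(article_id, count)


# Record the "database writes" in a list so the effect is visible.
db_writes = []
counter = BufferedViewCounter(
    flush_threshold=100,
    flush_fn=lambda article_id, count: db_writes.append((article_id, count)),
)
for _ in range(250):
    counter.record_view(42)

print(db_writes)            # batched writes issued so far
print(counter.pending[42])  # views still waiting in the buffer
```

Here 250 views result in two batched writes instead of 250 single updates. A real deployment would also need a periodic flush so that buffered counts are not lost or left stale.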
2. Horizontal table splitting
A large table is split horizontally into several tables with the same structure, for example user_1, user_2, and so on. The table structure is exactly the same, but rows are assigned to tables according to a specific rule, such as partitioning by user ID.
Table splitting technique: split according to data volume, so that no single table grows too large and each table keeps its processing capacity, for example for queries.
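One common rule for partitioning by user ID is a simple modulo. The sketch below assumes integer user IDs and a fixed number of tables; the function name is hypothetical, and the `user_N` naming follows the example above.

```python
def user_table_for(user_id: int, table_count: int) -> str:
    """Return the name of the table that holds the given user.

    Assumes non-negative integer IDs and a fixed table_count.
    Tables are numbered from 1 to match the user_1, user_2 naming.
    """
    if table_count <= 0:
        raise ValueError("table_count must be positive")
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return f"user_{user_id % table_count + 1}"


for uid in (0, 7, 8):
    print(uid, "->", user_table_for(uid, table_count=4))
```

Modulo routing spreads sequential IDs evenly, but changing `table_count` later moves most rows to a different table, so the number of tables is usually chosen with future growth in mind.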
Case study:
Take the same blog system as above. When the number of blog posts is large, horizontal splitting should be used to reduce the pressure on each single table and improve performance. Suppose the cold data table for blog posts is split into 100 tables and 1 million users are browsing at the same time. With a single table, that table would receive all 1 million requests. After splitting, each table receives roughly 10,000 requests (the distribution can never be perfectly even, so this is only an assumption), which reduces the pressure considerably.
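The arithmetic in this case study can be checked with a small simulation. It assumes, purely for illustration, that each request targets a uniformly random blog ID and that tables are chosen by ID modulo 100; real traffic is usually more skewed.

```python
import random
from collections import Counter

total_requests = 1_000_000
table_count = 100

# Assumption: every request hits a uniformly random blog ID.
rng = random.Random(0)
per_table = Counter(
    rng.randrange(10**9) % table_count for _ in range(total_requests)
)

print("average per table:", total_requests / table_count)
print("busiest table:", max(per_table.values()))
print("quietest table:", min(per_table.values()))
```

The average is exactly 10,000 requests per table, while the busiest and quietest tables land somewhat above and below it, which matches the remark that the split is never perfectly even.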
Note: database replication solves the read access problem, not the problem of massive concurrent writes. To solve that, consider MySQL data sharding.
Data sharding
As the name implies, data sharding scatters the data: the data on one host is divided across several hosts to reduce the load on any single host. There are two ways to do it. The first is database splitting, that is, multiple databases: tables are assigned to different databases according to business module, so each database holds different tables. The second is table splitting: the data is divided across hosts according to certain business rules or logic, and the tables on every host are the same, which is somewhat similar to Oracle table partitioning.
Database splitting and table splitting
Database splitting is also called vertical partitioning, and it is relatively simple to implement. The important thing is to break the business down carefully. When splitting databases, the interactions between the business modules should be clear, so that the programs written later do not need too many cross-database reads and writes.
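A minimal sketch of splitting databases by business module is a lookup from module to database, plus a check that flags operations spanning more than one database. The module names and database names below are hypothetical.

```python
# Hypothetical mapping from business module to the database that owns it.
MODULE_TO_DATABASE = {
    "user": "user_db",
    "order": "order_db",
    "product": "product_db",
}


def database_for(module: str) -> str:
    """Return the database that stores the tables of a business module."""
    try:
        return MODULE_TO_DATABASE[module]
    except KeyError:
        raise ValueError(f"no database configured for module {module!r}") from None


def is_cross_database(modules) -> bool:
    """True if one operation would touch more than one database."""
    return len({database_for(m) for m in modules}) > 1


print(database_for("order"))
print(is_cross_database(["user", "order"]))
print(is_cross_database(["user"]))
```

A check like `is_cross_database` can help during design reviews to spot operations that would need cross-database reads and writes.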
Table splitting is also called horizontal partitioning. It is more complex than vertical partitioning, but it solves a problem that vertical partitioning cannot: very frequent reads and writes on a single table. In that case the table is split according to certain business rules. (PS: for example, with the concept of membership level in an Internet BBS forum, the table can be split by membership level.) In this way the pressure on a single table is reduced, and the problem of frequent interaction between modules can also be solved.
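Splitting by membership level is a rule-based split rather than a hash-based one. The sketch below assumes integer levels starting at 1 and three hypothetical level ranges; the table names and boundaries are made up for the example.

```python
from bisect import bisect_right

# Hypothetical ranges: levels 1-3, levels 4-6, and levels 7 and above.
# Each entry is the first level of the next range.
LEVEL_BOUNDARIES = [4, 7]
MEMBER_TABLES = ["member_low", "member_mid", "member_high"]


def member_table_for(level: int) -> str:
    """Return the table that stores members of the given level."""
    if level < 1:
        raise ValueError("membership level must be at least 1")
    # bisect_right counts how many boundaries the level has reached.
    return MEMBER_TABLES[bisect_right(LEVEL_BOUNDARIES, level)]


for level in (1, 3, 4, 6, 7, 10):
    print(level, "->", member_table_for(level))
```

Unlike modulo routing, a rule like this does not guarantee evenly sized tables, so the ranges need to reflect how members are actually distributed across levels.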
The advantages of database splitting are that it is simple to implement, the boundaries between databases are clear, and it is easy to maintain. The disadvantages are that it does not suit frequent cross-database operations and it cannot solve the problem of a single table holding too much data.
The advantage of table splitting is that it addresses the shortcomings of database splitting. Its disadvantages are exactly the advantages of database splitting: table splitting is more complex to implement, especially defining the splitting rules, writing the programs, and handling later maintenance such as splitting and migrating the database.
Practical application
In practice, Internet companies generally split databases first and then split tables, combining the two so that each makes up for the other's weaknesses. This makes the most of MySQL's ability to scale out, but the drawback is that the architecture becomes large and complex, and the application code becomes more complicated as well.
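Combining the two can be sketched as a two-level lookup: first pick the database, then the table inside it. The function name, the `user_db_N` and `user_N` naming, and the modulo rule are assumptions for illustration only.

```python
def locate_user(user_id: int, db_count: int, tables_per_db: int):
    """Return (database name, table name) for a user.

    Assumes non-negative integer IDs. The slot is computed over all
    tables in all databases, then split into a database index and a
    table index, both shown 1-based in the names.
    """
    if db_count <= 0 or tables_per_db <= 0:
        raise ValueError("db_count and tables_per_db must be positive")
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    slot = user_id % (db_count * tables_per_db)
    db_index = slot // tables_per_db
    table_index = slot % tables_per_db
    return f"user_db_{db_index + 1}", f"user_{table_index + 1}"


for uid in (0, 5, 8):
    print(uid, "->", locate_user(uid, db_count=2, tables_per_db=4))
```

With 2 databases of 4 tables each there are 8 slots in total, so user 8 wraps around to the same location as user 0.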
The program should see only one access point. The most common solution is to use an intermediate proxy layer that manages all the data sources. For example, you can use the Mycat middleware, which Glacier was deeply involved in, or the open-source ShardingSphere middleware.
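The single-access-point idea can be illustrated with a small facade that hides which shard holds a row. This is not how Mycat or ShardingSphere are implemented or configured; the class, its methods, and the `users` table are hypothetical, and in-memory SQLite databases stand in for separate MySQL servers.

```python
import sqlite3


class ShardedUserStore:
    """Single entry point that hides the shard layout from the caller."""

    def __init__(self, shard_count: int):
        # Each in-memory SQLite database stands in for one MySQL server.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for conn in self.shards:
            conn.execute(
                "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
            )

    def _shard_for(self, user_id: int):
        # Routing rule kept in one place: user ID modulo shard count.
        return self.shards[user_id % len(self.shards)]

    def save(self, user_id: int, name: str) -> None:
        conn = self._shard_for(user_id)
        conn.execute(
            "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
            (user_id, name),
        )
        conn.commit()

    def find(self, user_id: int):
        row = self._shard_for(user_id).execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


store = ShardedUserStore(shard_count=3)
store.save(7, "alice")
store.save(8, "bob")
print(store.find(7), store.find(8), store.find(9))
```

The calling code only talks to `store`, so the routing rule can change without touching business logic, which is the same benefit a proxy layer offers at the infrastructure level.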
OK, that is all for today. I am Glacier. See you next time!