With the development of the Internet and the continuous expansion of user scale, the performance problems of Mysql single library and single table will also be exposed.
The following factors directly affect database performance:
The amount of data
Although MySQL can store 1 billion levels of data in a single table, if the database hardware is not very good, or the design of the data table structure is not optimized in place, when the amount of data in a single table reaches tens of billions, the performance will be compromised.
Disk I/o
Disk IO can often be a bottleneck, especially for systems running databases. Data is read from disk to memory, to CPU caches and registers, then processed, and finally written back to disk.
Therefore, under concurrent pressure, all requests accessing the same node can have a significant impact on disk IO.
Database connection
Database connection is a very rare resource. If there is data related to users, goods and orders in a database, database connection may become a bottleneck when a large number of users operate at the same time.
In order to improve the processing performance of online services, it is necessary to consider the Mysql database and table.
Common schemes:
The vertical table
That is, based on column field split “large table split multiple small table”. Split a large number of infrequently used, long data length fields from the same table into extended tables.
Extended table can have many forms, single table form, wide table form, KV table form, and so on.
Vertical depots
Too many tables with different business dimensions in the same library. For example, a library contains users, items, orders, and expense tables all at once. When an e-commerce website provides external services, users’ behaviors may operate these tables at the same time, and the disk space, disk IO and memory of a single library will have a great impact.
As shown above, split it across multiple servers to free up resources on a single server.
Horizontal sub-database sub-table
The data of a single table is shard to multiple servers, and each server has a corresponding library and table, but the data set in the table is different.
In this way, the performance bottleneck and pressure of single machine and single library can be effectively alleviated, and bottlenecks such as I/O, connection number, and hardware resources can be broken.
There are several common horizontal splitting strategies:
Time slicing
Sharding by month, sharding by quarter and so on, you can do hot and cold data. Paging and merge sort can be solved by using UNION ALL, of course, provided that the amount of data in the result set of the child table is not too large.
Hash modulus
Suppose there is an order table order, divide it into 4 tables, order0, order1, order2, order3, routing rule = orderId % 4; When orderId=3, the corresponding is order3.
The scope of fragmentation
From 1-10000 a table, 10001-20000 a table.
Geographic fragmentation
One watch in Beijing, one watch in Shanghai and one watch in Nanjing.
The problems introduced after the sub – library sub – table
Distributed transaction problem
If we do a vertical branch or horizontal branch, it will inevitably involve the problem of cross-library SQL execution, which causes the Internet community’s old difficult problem -” distributed transaction “. So how do you solve this problem?
- Use distributed transaction middleware
- Use MySQL’s built-in Transaction consistency scheme for cross-libraries (XA).
- Can you avoid cross-library operations (such as putting users and goods in the same library)?
Problems with cross-library joins
After the database is divided into tables, the associated operations between tables will be limited. We cannot join tables in different sub-databases or with different granularity of sub-tables. As a result, the services that can be completed by one query may need to be completed by multiple queries.
Rough workaround: Global tables: base data, copies for all libraries. Field redundancy: So that some fields do not need to be queried by join. System layer assembly: query all separately, and then assemble, more complex.
The problem of combining and sorting result sets
Because we store the data in different libraries and tables, when we query the specified data list, the data comes from different sub-libraries or sub-tables, which will inevitably lead to the problem of result set combination and sorting.
If every query requires sorting, merging, and so on, performance will suffer significantly.
How to divide, can solve all the above problems?
Distributed ID, horizontal sub-library, vertical sub-table
Take the scenario of a customer order library as an example:
Distributed order ID
Use the snowflake algorithm to generate a 64-bit distributed ID as the unique ID of the order. As shown below:
Such an order ID can be directly located in the local database.
Go language version: github.com/HaroldHoo/i…
User ID module, horizontal branch library
Before placing an order, the user ID is modeled as a part of the above distributed order ID, and the order is created after the distributed order ID is generated.
In this way, the order produced by the same user will always be in the same database, can meet the same user in the order dimension of the result set combination, sorting, paging.
Divide the table vertically according to different business dimensions
In a similar fashion to the previous step, tables for other business dimensions, such as expense details, user tables, and rating tables, can be placed in the same library.
Through the above steps, the problems of cross – library transaction and cross – library join will also be solved.
Welcome to pay attention to the public account “Architecture Watch”, love technology, like reading, keen on sharing and summarizing, I hope to share every good article with you on the way of growth. RSVP 666, you have the resources you want.