One trick to play a billion level Mysql storage

Mysql is one of the most commonly used databases in the industry, especially for small and medium-sized enterprises, Mysql is definitely the first choice.

Why use Mysql, other distributed databases don’t smell good?

Isn’t there a better database than Mysql?

Of course not, from the paper data of each database, there are countless databases better than mysql. Why does Mysql have such a strong market share?

I think there are several reasons

Free, open source
Ecological perfect
Long term, almost everyone does
Balanced performance (unlike many databases that either read, write, or do not support complex queries)
The cost is low

Let me find a word to describe Mysql. I think it’s neutral. A single Mysql database can meet the read and write needs of most enterprises.

Why not use a distributed database?

I think this question is like asking: Why not eat minced meat?

In the long run, distributed database is definitely the mainstream, but it is not mature enough at the present stage, and talents in distributed database are very scarce.

Large companies are starting to use distributed databases because they have their own middleware teams that are capable of solving problems.

But for the vast majority of enterprises, the distributed database is absolutely not their play, this requires professional operation and maintenance personnel and specialized RESEARCH and development team to solve the problems encountered, small and medium-sized enterprises are difficult to afford such human costs.

In the face of rapidly increasing data volume, can Mysql meet storage query requirements?

The answer is yes, it is too much to say, we only use Mysql to support 2 billion order storage query, and the performance is still relatively good, far from reaching the storage limit

I think this number is very convincing, can achieve this number of companies absolutely not many.

How does Mysql store massive data

Before we talk about this question, LET me introduce our previous data scale and storage scheme

The data in the order table is 200 million +, there are 100+ fields, 20+ index, there is no sub-database sub-table, but also to meet the business needs (see here may have students think I brag force, if I had not seen with my own eyes, I also don’t believe it!)

Why can such a large amount of data be supported by a single table?

Because crazy heap machine configuration, our machine configuration: 128 core, 256G, TB SSD.

After that, we reconstructed the order and divided the order table into nearly 20 tables. Then, we divided 16 tables by using the order number as the sub-table key, and there was only one index in all the tables.

Splitting tables is easy, but it can cause some serious problems

How to check conditional query?
How do I do pagination?

We used ES to solve the problem, but I’m not going to talk about this solution.

Our brother department uses Mysql to solve the problem. They have orders of over 2 billion yuan. Currently, they are divided into 8 libraries without separate tables.

Both of our solutions have a core idea:

Type on the blackboard, underline, remember, test to test!

Create an index for an index

We’ll recall that InnoDB actually applies this idea.

So let’s see how this idea works, right

Assume an e-commerce system, and we classify it according to the order number.

Suppose we have the following two tables

order

id order_number user_id

order_product

Id order_number produc_id

Both the order table and the order goods table are sorted according to order_number. As we can see, it is very easy to query the order details through order_number, but it is very difficult for us to do other queries.

For example, IF I want to find all the orders of a user, how do I find them? If there are 1024 tables, do we have to check all 1024 tables to know how many orders the user has? Of course, no one would do that.

We can create another index table with the sub-key user_id, so that all the information about a user falls into one table, and then we can get all the orders in one query, and we can also do paging queries through this index table.

After we get the order number, we can find all orders accurately by the order number.

You can do it by time, you can do it by product, and so on, but the idea is to index an index

In this way, we can create multiple index tables to achieve more complex queries.

Of course, this solution seems stupid, due to mysql’s natural flaws, and in the long run, mysql is doomed.

One trick to play a billion level Mysql storage

Why use Mysql, other distributed databases don’t smell good?

In the face of rapidly increasing data volume, can Mysql meet storage query requirements?

How does Mysql store massive data

Related Posts

Understand the TCP/IP model and TCP/IP protocol group – related streaming protocols and default ports

Two open source Spring Boot + Vue front-end separation projects can now be played online

A list of useful Python “tricks”, in alphabetical order