Side of the Day - iN mysql, I can store 1 billion mobile phone numbers. Consider storage space and query efficiency.

Reference since the problem: www.zhihu.com/question/43…

Firstly, the hypothesis is proposed:

Phone numbers are not updated, only inserted and deleted.
The query includes checking whether a certain mobile phone number exists and obtaining all mobile phone numbers in a certain number segment

Assume that the table has only one field, the phone number, set as the primary key. If there is no primary key and no unique index, InnoDB will automatically generate a hidden primary key column, wasting space.

MyISAM or InnoDB

MyISAM looks better than InnoDB if inserts and deletes are infrequent and phone numbers are pre-loaded dictionary tables rather than generated by active user registration. Because MyISAM does not involve transactions, updates are table-level locks. InnoDB is used for user-triggered inserts and deletes.

The field type

Consider three types, BigInt, Char, and Varchar

The default row format of these types in InnoDB engine is:

For bigINT, if the value is not NULL, it occupies 8 bytes. The first byte is the sign bit, and the rest bits store digits. The digits range from -2^63 to 2^63-1 = -9223372036854775808 to 9223372036854775807. If NULL, no storage space is occupied.
For fixed-length fields, the data can be directly stored without the length information. If the length is insufficient, the data can be supplemented. For char, add 0x20, which corresponds to a space.
There is a list of variable-length field lengths at the beginning of the data, so vARCHAR only needs to save the actual data and does not need to populate additional data. Because of this feature, variable-length field updates typically mark the old record as deleted and add a new record at the end to fill the updated record. This increases the update speed, but increases storage fragmentation.

Since the phone number is not updated, and the length of the phone number varies from country to country, and may have special characters, the range matching of the character type under the default encoding and sorting rules also meets our requirements, so to save space, the VARCHAR type is used.

The index type

Since range queries are involved, it is best to use the default B+ tree index instead of a Hash index

partition

This data volume is relatively large, need to use partition. Phone can be used as a partition key and can be partitioned by range, for example:

PARTITION BY RANGE COLUMNS( phone ) ( PARTITION p0 VALUES LESS THAN ('13100000000'), PARTITION P1 VALUES LESS THAN ('13200000000'),... PARTITION pn VALUES LESS THAN MAXVALUE );Copy the code

You can also partition by hash, for example:

PARTITION BY HASH( phone )
PARTITIONS 64;
Copy the code

This makes it quicker to check whether a mobile phone number exists, since a table is divided into many smaller tables. And if multiple small tables are involved in MySQL can also be multithreaded concurrent lookup, efficiency improved a lot. If you consider retrieving all mobile numbers in a number segment, it is best to partition by range to make the logical query scope smaller. But hash partitioning data can be more balanced than range partitioning.

Note that the best number of HASH partitions is 2 to the n. Because mod to 2^n is equivalent to taking and to 2^n – 1, it increases the efficiency of computing partitions during queries

Further optimization

To query the existence of a mobile phone number, a layer of Bloom filter can be added to the upper layer of the database to improve efficiency.

At the same time, to improve the accuracy, you can pass the number segment. Different number segments use different Bloom filters. At the same time as it is inserted into the database, it is placed into the Bloom filter. If the test does not exist in the Bloom filter, it definitely does not exist. To reduce the error probability of Bloom filter, you can use more Bloom filters and set the cross range, for example, 13000000000 13200000000 bloom filter A, 13100000000 13300000000 Bloom filter B, 13211111111 is verified by Bloom filter A and bloom filter B.

Daily brush, easy to improve skills, get a variety of offers:

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Side of the Day – iN mysql, I can store 1 billion mobile phone numbers. Consider storage space and query efficiency.

MyISAM or InnoDB

The field type

The index type

partition

Further optimization

Side of the Day – iN mysql, I can store 1 billion mobile phone numbers. Consider storage space and query efficiency.

MyISAM or InnoDB

The field type

The index type

partition

Further optimization

Related Posts

Celery(ii) : Use details

43 Spring/SpringMVC interview questions, which ones do you not know?

Zookeeper interview questions (with answers) : 36 questions each are very high frequency, you can answer how many?