How to index a string?

Introduction

Most modern login systems support signing in with an email address or a mobile phone number. How should you index a string column such as an email address or phone number to get the best performance?
This article explores how to index a string for the best performance in MySQL.
This article was first published on the author's WeChat official account [code ape technology column]. Original writing is not easy, so if you like it, please follow. Thank you!
Chen will cover the topic in four parts: **what a prefix index is**, **how a prefix index compares with an ordinary index**, **how to build a prefix index with the best performance**, and **the effect of prefix indexes on covering indexes**.

The prefix index

As the name implies, a **prefix index** uses only the first part of a column value as the index key. For columns with long values, such as `BLOB`, `TEXT`, and long `VARCHAR`, a prefix index is effectively mandatory (MySQL requires a prefix length when indexing `BLOB` and `TEXT` columns). An index needs storage space too, and the longer the indexed value, the harder the index is to maintain.
For example, we add a prefix index on the email column of the `user` table as follows:

alter table user add index index1(email(7));
The above statement indexes the first seven characters of the email.

Prefix indexes compared with ordinary indexes

Let's index the entire `email` column and its first 7 characters separately to see how they differ in performance. The index statements are as follows:

alter table user add index index1(email);

alter table user add index index2(email(7));
Suppose the `user` table contains the following rows, with columns id, name, email: `(1, "Chen", "chenmou1993@xxx")`, `(2, "zhang", "chenmou1994@xxx")`, `(3, "area", "chenmou1995@xxx")`, `(4, "wang", "chenmou1996@xxx")`.
The index trees corresponding to index1 and index2 are shown below:

![](https://p1-jj.byteimg.com/tos-cn-i-t2oaga2asx/gold-user-assets/2020/4/2/17139f61d1c39a6a~tplv-t2oaga2asx-image.image)

![](https://p1-jj.byteimg.com/tos-cn-i-t2oaga2asx/gold-user-assets/2020/4/2/17139f61d21873ae~tplv-t2oaga2asx-image.image)

If the following query is executed, how does MySQL use each index to find the row?

select * from user where email='chenmou1995@xxx';

[1] Execution process with the ordinary index (index1)

Find the record whose index value is `chenmou1995@xxx` in the index1 tree and obtain `id=3`.
Look up the row with primary key `id=3` in the primary key index, confirm that its email value matches, and add the row to the result set.
Take the next record in the index1 tree, find that it no longer satisfies `email='chenmou1995@xxx'`, and end the loop.

In this process, the primary key index only needs to fetch data once, so the system considers that only one row is scanned.

[2] Execution process with the prefix index (index2)

Find the first record whose index value is `chenmou` in the index2 tree; it points to `id=1`.
Look up the row with primary key `id=1` in the primary key index. Its email value is not `chenmou1995@xxx`, so this row is discarded.
Take the next record in the index2 tree. Its value is still `chenmou`, so fetch the row by its primary key and compare the full email value again.
Repeat the previous step until the value read from index2 is no longer `chenmou`, then end the loop.

In this process, the primary key index is accessed four times, that is, four rows are scanned.

Comparing the two queries, it is easy to see that **using a prefix index may cause the query to read more data.**
But for this query, what if the prefix index is created with a length of 11? Then only one record has the prefix `chenmou1995`, and the lookup goes straight to `id=3`. The index still takes less space than one on the whole column, and the number of scanned rows drops back to one.
The conclusion: **with a prefix index, as long as the length is chosen well, you can save space without adding much extra query cost.**
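The two execution processes above can be imitated with a small Python model. This is only a sketch under simplifying assumptions: the "index" is a sorted list of (key, id) pairs rather than a real B+ tree, and the names `build_index` and `lookup` are made up for illustration. It counts how many times the primary key index would be visited for the four sample rows.

```python
# Toy model of the lookups described above; not how InnoDB is implemented.
rows = {
    1: "chenmou1993@xxx",
    2: "chenmou1994@xxx",
    3: "chenmou1995@xxx",
    4: "chenmou1996@xxx",
}


def build_index(rows, prefix_len=None):
    """Return sorted (key, id) pairs; prefix_len=None indexes the whole value."""
    return sorted((email[:prefix_len], row_id) for row_id, email in rows.items())


def lookup(rows, index, target, prefix_len=None):
    """Return (matching ids, number of primary-key lookups) for select *."""
    key = target[:prefix_len]
    matches, table_lookups = [], 0
    for indexed_key, row_id in index:
        if indexed_key != key:
            continue  # a real B+ tree would seek directly to the first match
        table_lookups += 1          # go back to the primary key index
        if rows[row_id] == target:  # compare the full email value
            matches.append(row_id)
    return matches, table_lookups


target = "chenmou1995@xxx"
full_result = lookup(rows, build_index(rows), target)
prefix7_result = lookup(rows, build_index(rows, 7), target, 7)
prefix11_result = lookup(rows, build_index(rows, 11), target, 11)
print(full_result, prefix7_result, prefix11_result)
```

With these rows, the whole-column index and the 11-character prefix each need one primary-key lookup, while the 7-character prefix needs four, because all four emails share the prefix `chenmou`.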
So how do you build a prefix index that gives the best performance? Read on.

How to build a prefix index with the best performance

From the comparison above we can conclude that **the higher the selectivity of the prefix index, the better**, because higher selectivity means fewer duplicate key values.
Calculating the selectivity is simple: count how many distinct prefixes of each length exist in the table. The SQL is as follows:

select count(distinct left(email,4)) as L4, count(distinct left(email,5)) as L5, count(distinct left(email,6)) as L6, count(distinct left(email,7)) as L7 from user;
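The same measurement can be sketched in Python on a sample of values. The helper names `prefix_selectivity` and `shortest_prefix` and the 0.95 threshold are assumptions chosen for illustration, not part of MySQL; selectivity here means the number of distinct prefixes divided by the number of distinct full values.

```python
def prefix_selectivity(values, lengths):
    """Map each prefix length to distinct_prefixes / distinct_values."""
    distinct_total = len(set(values))
    return {n: len({v[:n] for v in values}) / distinct_total for n in lengths}


def shortest_prefix(values, threshold=0.95):
    """Return the shortest prefix length whose selectivity reaches threshold.

    The threshold is a hypothetical acceptance level; pick one that suits
    how many extra row lookups your workload can tolerate.
    """
    distinct_total = len(set(values))
    max_len = max(len(v) for v in values)
    for n in range(1, max_len + 1):
        if len({v[:n] for v in values}) / distinct_total >= threshold:
            return n
    return max_len


emails = [
    "chenmou1993@xxx",
    "chenmou1994@xxx",
    "chenmou1995@xxx",
    "chenmou1996@xxx",
]
selectivity = prefix_selectivity(emails, [4, 5, 6, 7, 11])
best_length = shortest_prefix(emails)
print(selectivity, best_length)
```

For the four sample emails, every prefix length from 4 to 7 has a selectivity of 0.25, and the shortest length that separates all values is 11. On real data you would run the `count(distinct left(...))` query against the table instead of loading values into memory.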
However, some data has poor prefix selectivity. Take the Chinese resident ID number: it has 18 digits, of which the first 6 are the address code, so people from the same county generally share the same first 6 digits. A prefix index of length 6 on the ID number would therefore have very low selectivity.
Following the previous approach, you may need a prefix index of length 12 or more to reach acceptable selectivity.
However, the longer the prefix you choose, the more disk space the index takes, the fewer index values fit on one data page, and the less efficient the search becomes.
So, if we can be sure the business only ever needs equality lookups by ID number, is there another approach that takes less space while achieving the same query efficiency? Here is one brief way to solve this problem. It is certainly not the only one:

[1] Reverse order storage

If you store the ID number reversed, each query can be written like this, where 'id' stands for the ID number being looked up:

select field_list from t where id_card = reverse('id');

Because the last six digits of an ID number do not follow a shared pattern the way the address code does, those six digits (the first six of the reversed value) probably provide enough selectivity. In practice, of course, don't forget to verify this with the `count(distinct ...)` method.
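A quick Python sketch shows why reversing helps. The four ID numbers below are made up for illustration (they share one address code and are not real ID numbers), and `distinct_prefixes` and `find` are hypothetical helper names.

```python
# Made-up 18-digit values that share the same 6-digit address code.
id_numbers = [
    "110101199001011234",
    "110101199203154321",
    "110101198507209876",
    "110101199912315678",
]


def distinct_prefixes(values, n):
    """Count distinct prefixes of length n."""
    return len({v[:n] for v in values})


# Store each value reversed, as the id_card column would hold it.
stored = [s[::-1] for s in id_numbers]

forward = distinct_prefixes(id_numbers, 6)  # all share the address code
backward = distinct_prefixes(stored, 6)     # last six digits come first


def find(stored_values, id_number):
    """Equality lookup: reverse the input the same way before comparing."""
    needle = id_number[::-1]
    return [v for v in stored_values if v == needle]


hits = find(stored, "110101199203154321")
print(forward, backward, hits)
```

Here a 6-character prefix of the original values yields a single distinct key, while the same prefix length on the reversed values separates all four. The trade-off is that reversed storage only supports equality lookups, not range queries on the original value.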

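The effect of prefix indexes on covering indexes

A prefix index makes the covering index optimization unusable. Consider the following query:

select id,email from user where email='chenmou1995@xxx';
With the whole-column index index1, this query can be answered from the index alone, because the index already holds both email and the primary key id. With the prefix index index2, MySQL must go back to the table to verify the full email value, so the covering index optimization cannot be used. This holds even when the prefix is long enough to contain the entire value, because the system cannot tell whether the stored prefix was truncated.
In other words, a prefix index cannot benefit from the covering index optimization, which is a factor to consider when deciding whether to use one.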
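The difference can be modelled with a short Python sketch, assuming a query that selects only `id` and `email`. The function name `select_id_email` is hypothetical, and the model only counts primary-key lookups; it does not reproduce how MySQL stores or scans an index.

```python
rows = {
    1: "chenmou1993@xxx",
    2: "chenmou1994@xxx",
    3: "chenmou1995@xxx",
    4: "chenmou1996@xxx",
}


def select_id_email(rows, target, prefix_len=None):
    """Model: select id, email from user where email = target.

    Returns (result rows, number of primary-key lookups).
    prefix_len=None models an index on the whole column.
    """
    result, table_lookups = [], 0
    for row_id, email in sorted(rows.items()):
        if email[:prefix_len] != target[:prefix_len]:
            continue  # index entry does not match the search key
        if prefix_len is None:
            # The index entry holds the full email and the id: no table access.
            result.append((row_id, email))
        else:
            # The entry may be truncated, so the full value is read from the row.
            table_lookups += 1
            if email == target:
                result.append((row_id, email))
    return result, table_lookups


target = "chenmou1995@xxx"
covered = select_id_email(rows, target)
prefixed = select_id_email(rows, target, prefix_len=len(target))
print(covered, prefixed)
```

Both calls return the same row, but the whole-column index needs no primary-key lookup, while the prefix index needs one even though its prefix length equals the full length of the value.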

Conclusion

How to index a string column deserves some thought. Chen gives the following suggestions:

If the strings are very short, index the whole column.
When using a prefix index, analyze its selectivity first. The higher the selectivity, the better.
When using a prefix index, remember that the covering index optimization no longer applies.
