This is the 16th day of my participation in the November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021

MySQL > create a prefix index for a string field. If you want to add a prefix index to a string field, you can add a prefix index to a string field.

It is not difficult to conclude that as long as the joint index is highly differentiated, there will be fewer repetitions, which will reduce the number of times we go back to the table. So we can determine how long prefixes to use by calculating the selectivity of each length.

So let’s first figure out the selectivity of the complete column.

Select count(distinct email)/count(*) from t_User;Copy the code

Assume that the selectivity of the complete column we calculated is 1, so the closer the prefix we selected is to 1, the higher the differentiation of the prefix index is.

select count(distinct left(email,2)/count(*) as s2),
       count(distinct left(email,3)/count(*) as s3),
       count(distinct left(email,4)/count(*) as s4),
       count(distinct left(email,5)/count(*) as s5),
Copy the code

Assuming s2 = 0.8, S3 = 1, S4 = 1, s5 = 1, we can choose prefix length 3 according to the principle that the selectivity is closer to the complete column selectivity and the character is shorter and occupies less space.

The effect of prefix indexes on overwrite indexes

We learned from the previous chapter that the use of a prefix index may increase the number of scans, but we will find that the effect is more than that.

Let’s look at SQL:

select id,email from t_User where email='[email protected]';
Copy the code

If we use index1, the full string index, because the email index tree leaves already have ids, so we use overwrite index, unordered back to the table query. But if we use a prefix index, we need to go back to the table to determine whether the value of email is the same as the value of the email after where.

Even if we change the length of index2 to 18, InnoDB still has to go back to the ID index, because the engine is not sure if the provided prefix index truncates the full information, so we lose the advantage that overwriting indexes give us.

So what else can we do to index a string field?

  • Using hash fields

We can create a second integer field on the table to hold the mailbox verification code and create an index on this field.

alter table t_User add emial_crc int unsigned, add index(emial_crc);
Copy the code

Then each time a new record is inserted, the crc32() function is used to get the checksum to fill the new field. The where part of the query determines whether the values of the segment emails are exactly the same because the checksum code may conflict, which means that two different emails may get the same result from crc32().

select id,email,name from t_User where email_crc=crc32('[email protected]') and email = '[email protected]';

Copy the code

This makes the length of the index 4 bytes.

summary

  1. Create the full index directly, which may take up more space.

  2. Create prefix indexes, which save space, but increase the number of query scans, and cannot use overwrite indexes.

  3. Creating hash field indexes provides stable query performance and requires extra storage and calculation costs. However, range query is not supported.