For long strings, you can create indexes in the following ways: (1) prefix index (2) string reversal + prefix index (3) Hash field + index to hash field (4) Field splitting (one field can be split into more than two)

Suppose there is now an email field in the table User, which is a long string. If there is no index on the email field, this statement can only do a full table scan

select f1,f2 from User where email = '[email protected]';
Copy the code

Create an index on the email field – Create an index1 index that contains the entire string

alter table User add index index1(email);
Copy the code

– The created index2 index contains only the first six bytes of the string

alter table User add index index2(email(6));
Copy the code

Prefix indexes take up less space, but increase the number of scans. Using a prefix index with a defined length can save space without adding too much to the query cost. Focus on differentiability when building an index; higher differentiability means fewer duplicate keys

We can look at this value by selecting prefixes of different lengths, such as a prefix index of 4 to 7 bytes:

select 
	count(distinct left(email,4)) as L4,
	count(distinct left(email,5)) as L5,
	count(distinct left(email,6)) as L6,
	count(distinct left(email,7)) as L7,
from User;
Copy the code

The effect of prefix indexes on overwrite indexes

Select id,email from User where email = ‘[email protected]’; This statement only requires that the ID and email fields be returned. If the entire string is used as the index, the overwrite index is triggered without the need to query back to the table ID index. If you use a prefix index, you have to go back to the table ID index to get the full value of the email field

So if you use prefixed indexes, you give up the optimization of query performance by overwriting indexes.

Use reverse order and hash to construct new fields

If we extract the prefix index of the ID card number, we will find that the number in front of the repeat rate is very high. Select field_list from t where id_card = reverse(‘ input_ID_card_string ‘); 2. Hash field Create an integer field on the table to store the verification code of the ID card and create an index on this field.

alter table t add id_card_crc int unsigned,add index(id_card_crc);
Copy the code

Then each time a new record is inserted, the crc32() function is used to get the checksum to fill the new field. The query also determines whether the ID_card is exactly the same because the checksum code may conflict, meaning that two different ID numbers may get the same result from crc32().

select field_list from t where id_card_crc = crc32('input_id_card_string') and id_card = 'input_id_card_string';
Copy the code

The similarities and differences between the two devices are as follows: Same: range query is not supported, but equivalent query is supported. 1. Extra space is occupied. Hash requires an additional field. 2. CPU consumption: The reverse function hash is called every time in reverse order. Crc32 function is called. 3. Query efficiency: The hash field query performance is relatively stable and the probability of conflict is small. It can be considered that the average number of scanned rows in each query is 1. The reverse order storage mode still uses the prefix index mode, but also increases the number of scanned lines

Conclusion: