MySQL full-text index

While reviewing database indexes recently, I came across fulltext, the full-text index. Full-text indexes are rarely used in day-to-day business code, but I found them interesting, so I spent some time studying them and am recording my notes here.

Introduction

Concept

Most of the queries we need can be done by comparing values, filtering ranges, and so on. However, if you want to filter queries by keyword matching, you need queries based on similarity rather than precise numerical comparisons. Full-text indexes are designed for this scenario.

Why use a full-text index, you might ask, when like + % already does fuzzy matching? like + % is fine for small amounts of text, but it is impractical for searching large amounts of text. On large data sets a full-text index can be many times faster than like + %; the two are not even in the same order of magnitude. The trade-off is that full-text indexes may have accuracy problems.
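
To make the comparison concrete, here is a minimal sketch of the two query styles side by side. The table articles, its index content_fulltext, and the search word are hypothetical names chosen for illustration, and the InnoDB engine assumes MySQL 5.6 or later.

```sql
-- Hypothetical table, used only for this comparison
CREATE TABLE articles (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  content TEXT NOT NULL,
  PRIMARY KEY (id),
  FULLTEXT KEY content_fulltext (content)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Fuzzy matching: the leading % means an ordinary index cannot help,
-- so every row has to be examined
SELECT id FROM articles WHERE content LIKE '%database%';

-- Keyword matching through the full-text index
SELECT id FROM articles WHERE MATCH(content) AGAINST('database');

-- EXPLAIN shows how each query is executed
EXPLAIN SELECT id FROM articles WHERE content LIKE '%database%';
EXPLAIN SELECT id FROM articles WHERE MATCH(content) AGAINST('database');
```

The two queries are not equivalent, which is one source of the accuracy difference: the like version matches any substring (so it also finds "databases"), while the match version works on whole indexed words.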

You may not have noticed full-text indexing, but you should be familiar with at least one full-text indexing technique: the various search engines. Although search engines index huge amounts of data and are often not backed by relational databases, the basic principles of full-text indexing are the same.

Version support

Before we begin, let's go over which MySQL versions, storage engines, and data types support full-text indexes:

Prior to MySQL 5.6, only the MyISAM storage engine supported full-text indexing;
In MySQL 5.6 and later, both the MyISAM and InnoDB storage engines support full-text indexing;
Full-text indexes can be created only on columns whose data types are CHAR, VARCHAR, TEXT, and their variants.

When testing or using full-text indexes, check to see if your version of MySQL, storage engine, and data type support full-text indexes.
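
These three checks can be run from any MySQL client. The sketch below assumes a table named fulltext_test (the example table name used later in this article) in the currently selected database; substitute your own table name.

```sql
-- 1. Server version (InnoDB full-text indexes need 5.6 or later)
SELECT VERSION();

-- 2. Storage engine of the table
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'fulltext_test';

-- 3. Data types of the table's columns
SELECT COLUMN_NAME, DATA_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'fulltext_test';
```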

Operating full-text indexes

Index operations are easy to look up, but they are repeated here for completeness.

Create

Create a full-text index when the table is created

create table fulltext_test (
 id int(11) NOT NULL AUTO_INCREMENT,
 content text NOT NULL,
 tag varchar(255),
 PRIMARY KEY (id),
 FULLTEXT KEY content_tag_fulltext(content,tag)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Create full-text indexes on existing tables

create fulltext index content_tag_fulltext
 on fulltext_test(content,tag);

Create a full-text index using the SQL statement ALTER TABLE

alter table fulltext_test
 add fulltext index content_tag_fulltext(content,tag);
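
Whichever of the three statements is used, it can be worth confirming that the index actually exists. A small sketch, assuming the fulltext_test table and the currently selected database:

```sql
-- Lists every index on the table; full-text indexes show FULLTEXT as the index type
SHOW INDEX FROM fulltext_test;

-- The same information, restricted to full-text indexes
SELECT INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME, INDEX_TYPE
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'fulltext_test'
  AND INDEX_TYPE = 'FULLTEXT';
```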

Modify

There is no real way to modify a full-text index; just drop it and create it again.

Delete

Drop a full-text index directly using DROP INDEX

drop index content_tag_fulltext
 on fulltext_test;

Drop a full-text index using the SQL statement ALTER TABLE

alter table fulltext_test
 drop index content_tag_fulltext;

Using full-text indexes

Unlike fuzzy matching with like + %, full-text search has its own syntax, built on the match and against keywords, for example:

select * from fulltext_test 
 where match(content,tag) against('xxx xxx');

Note: The columns specified in the match() function must be exactly the same as the columns specified in the full-text index; otherwise an error is reported and the full-text index cannot be used. This is because the full-text index does not record which column a keyword came from. To run a full-text search on a single column, create a separate full-text index for that column.
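
A short sketch of this rule, assuming the fulltext_test table and its content_tag_fulltext(content,tag) index from the previous section. The index name content_fulltext is a hypothetical name introduced here, and the search word is a placeholder.

```sql
-- Works: the column list matches the index content_tag_fulltext(content, tag)
SELECT * FROM fulltext_test
WHERE MATCH(content, tag) AGAINST('xxx');

-- Expected to fail in natural language mode with error 1191
-- (Can't find FULLTEXT index matching the column list),
-- because no full-text index covers exactly (content):
-- SELECT * FROM fulltext_test WHERE MATCH(content) AGAINST('xxx');

-- Add a separate single-column index, after which the query is valid
ALTER TABLE fulltext_test ADD FULLTEXT INDEX content_fulltext (content);

SELECT * FROM fulltext_test
WHERE MATCH(content) AGAINST('xxx');
```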

Testing the full-text index

Adding test data

With this knowledge, you can test out full-text indexes.

First, create the test table and insert the test data

create table test (
 id int(11) unsigned not null auto_increment,
 content text not null,
 primary key(id),
 fulltext key content_index(content)
) engine=MyISAM default charset=utf8;
insert into test (content) values ('a'),('b'),('c');
insert into test (content) values ('aa'),('bb'),('cc');
insert into test (content) values ('aaa'),('bbb'),('ccc');
insert into test (content) values ('aaaa'),('bbbb'),('cccc');

Execute the following query using the full text index syntax

select * from test where match(content) against('a');
select * from test where match(content) against('aa');
select * from test where match(content) against('aaa');

Intuitively, the search for a should return four records (a, aa, aaa, and aaaa), yet none of these queries returns a single record. A result appears only when the following query is executed:

select * from test where match(content) against('aaaa');

Only then is the aaaa record found.

Why is that? There can be several reasons; the most common is the minimum search length. In addition, when testing full-text indexes, the test table should contain at least four records, otherwise unexpected results may occur.

MySQL full-text indexes have two variables: minimum search length and maximum search length. Words shorter than the minimum search length or longer than the maximum search length are not indexed. In plain English, for a full-text search to find a term, the length of the term must fall within the range defined by these two variables.

The default values for both can be viewed using the following command

show variables like '%ft%';

You can see the names and default values of these two variables in both MyISAM and InnoDB storage engines

// MyISAM
ft_min_word_len = 4;
ft_max_word_len = 84;
// InnoDB
innodb_ft_min_token_size = 3;
innodb_ft_max_token_size = 84;

As you can see, the default minimum search length is 4 for MyISAM and 3 for InnoDB. In other words, by default a MySQL full-text index only indexes words whose length is at least 4 (MyISAM) or at least 3 (InnoDB).
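
To see how this applies to the test data, the following sketch compares the length of each stored word with the current minimums. It assumes the test table created above, where every row holds a single word, and a server version that has the innodb_ft_min_token_size variable (5.6 or later). It looks only at the minimum length and ignores the maximum length and stopwords.

```sql
-- Current minimum lengths for both engines
SELECT @@ft_min_word_len AS myisam_min,
       @@innodb_ft_min_token_size AS innodb_min;

-- 1 means the word is long enough to be indexed, 0 means it is skipped
SELECT content,
       CHAR_LENGTH(content) AS word_length,
       CHAR_LENGTH(content) >= @@ft_min_word_len AS long_enough_for_myisam,
       CHAR_LENGTH(content) >= @@innodb_ft_min_token_size AS long_enough_for_innodb
FROM test
ORDER BY id;
```

With the default values shown above, only the four-character words pass the MyISAM check, which is consistent with aaaa being the only search that returned a result.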

Configuring the minimum search length

These full-text index parameters cannot be changed at runtime; they must be set in the MySQL configuration file. To change the minimum search length to 1, open the MySQL configuration file /etc/my.cnf and add the following content to [mysqld]

[mysqld]
innodb_ft_min_token_size = 1
ft_min_word_len = 1

Then restart the MySQL server and repair the full-text index. Note that after modifying the parameter, you must repair the index, otherwise the parameter will not take effect.

There are two ways to repair the index. You can use the following command:

repair table test quick;

Alternatively, drop the index and re-create it. After that, a, aa, and aaa can all be found.
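
The drop-and-re-create route can be sketched as follows, assuming the test table and its content_index index from above. As far as I know, repair table applies to MyISAM tables, so for an InnoDB table this second route is the one to use.

```sql
-- Confirm that the new values are active after the restart
SELECT @@ft_min_word_len, @@innodb_ft_min_token_size;

-- Rebuild the full-text index by dropping and re-creating it
ALTER TABLE test DROP INDEX content_index;
ALTER TABLE test ADD FULLTEXT INDEX content_index (content);

-- Short words should now be searchable
SELECT * FROM test WHERE MATCH(content) AGAINST('aa');
```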

One question remains, though: when searching for the keyword a, why do aa, aaa, and aaaa not appear in the results? Before answering it, let's look at the two types of full-text index.

Two types of full-text index

Natural language full-text index

By default, or when the IN NATURAL LANGUAGE MODE modifier is used, the match() function performs a natural language search on the text collection. The examples above are all natural language full-text searches.

The natural language search engine calculates the relevance of each document to the query. Relevance is based on the number of matched keywords and the number of times each keyword appears in the document. The less often a word appears across the whole index, the higher the relevance of a match on it. Conversely, very common words are not searched at all: if a word appears in more than 50% of the records, a natural language search ignores it. This is why, as mentioned above, the test table should contain at least four records.
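
The relevance value can be inspected by putting the same match() expression in the select list. This is a minimal sketch against the test table from above; the column alias score is my own choice.

```sql
-- MATCH ... AGAINST in the select list returns the relevance as a number;
-- rows that do not match get a relevance of 0 and are filtered out by WHERE
SELECT id,
       content,
       MATCH(content) AGAINST('aaaa' IN NATURAL LANGUAGE MODE) AS score
FROM test
WHERE MATCH(content) AGAINST('aaaa' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC;
```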

This mechanism is easy to understand. Suppose each row of a table stores an article. Common words, modal particles, and the like appear in nearly every article, so searching for them is meaningless. What is worth searching for are the words that carry special meaning for particular articles, because those words distinguish one article from another.
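
The 50% rule can be tried out with a small table. The sketch below uses a hypothetical table threshold_demo and made-up rows, and it assumes the MyISAM engine as in the test table; as far as I know, the 50% threshold is a MyISAM behaviour and InnoDB full-text indexes do not apply it.

```sql
CREATE TABLE threshold_demo (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  content TEXT NOT NULL,
  PRIMARY KEY (id),
  FULLTEXT KEY content_index (content)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4;

INSERT INTO threshold_demo (content) VALUES
  ('mysql tutorial'),
  ('mysql index'),
  ('mysql engine'),
  ('postgres tutorial');

-- mysql appears in 3 of 4 rows (75%), so a natural language search
-- is expected to treat it as too common and return nothing
SELECT * FROM threshold_demo WHERE MATCH(content) AGAINST('mysql');

-- postgres appears in 1 of 4 rows (25%), so it is expected to match
SELECT * FROM threshold_demo WHERE MATCH(content) AGAINST('postgres');

-- Boolean mode does not apply the 50% threshold
SELECT * FROM threshold_demo
WHERE MATCH(content) AGAINST('mysql' IN BOOLEAN MODE);
```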

Boolean full-text index

In Boolean search, we can customize the relevance of a particular term being searched in the query, and when writing a Boolean search query, we can customize the search with some prefix modifiers.

MySQL has built-in modifiers, listed in the value of the ft_boolean_syntax variable, which appears in the output of the show variables query above. A few of them are explained here:

+ The word must be included
- The word must not be included
> Raises the relevance of the word, so matching results rank higher
< Lowers the relevance of the word, so matching results rank lower
* The asterisk is a wildcard and can only be appended to the end of a word
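
A few queries showing these modifiers in use. This is a sketch against the fulltext_test table and its (content,tag) index; the search words mysql, oracle, innodb, and myisam are placeholders I picked, not data from the article.

```sql
-- Must contain mysql, must not contain oracle
SELECT * FROM fulltext_test
WHERE MATCH(content, tag) AGAINST('+mysql -oracle' IN BOOLEAN MODE);

-- Must contain mysql and either innodb or myisam;
-- rows with innodb rank higher than rows with myisam
SELECT id, content,
       MATCH(content, tag) AGAINST('+mysql +(>innodb <myisam)' IN BOOLEAN MODE) AS score
FROM fulltext_test
WHERE MATCH(content, tag) AGAINST('+mysql +(>innodb <myisam)' IN BOOLEAN MODE)
ORDER BY score DESC;

-- Wildcard: any indexed word that starts with inno
SELECT * FROM fulltext_test
WHERE MATCH(content, tag) AGAINST('inno*' IN BOOLEAN MODE);
```

The explicit order by matters in the second query, because Boolean searches do not sort rows by relevance on their own.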

The problem mentioned above can be solved with a Boolean full-text query. With the following command, a, aa, aaa, and aaaa are all returned.

select * from test where match(content) against('a*' in boolean mode);

Conclusion

Okay, we’re almost done. It’s time to summarize.

MySQL's full-text index initially supported only English, because English words are separated by spaces, which makes word segmentation straightforward. Asian languages such as Chinese, Japanese, and Korean do not put spaces between words, which creates certain limitations. However, MySQL 5.7.6 introduced an ngram full-text parser to address this issue, and it is available for both the MyISAM and InnoDB engines.
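
A minimal sketch of an ngram full-text index, assuming MySQL 5.7.6 or later and the default ngram_token_size of 2. The table ngram_test, the index name content_ngram, and the sample rows are hypothetical.

```sql
CREATE TABLE ngram_test (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  content TEXT NOT NULL,
  PRIMARY KEY (id),
  -- WITH PARSER ngram splits the text into fixed-size character sequences
  FULLTEXT KEY content_ngram (content) WITH PARSER ngram
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO ngram_test (content) VALUES
  ('全文索引'),
  ('存储引擎'),
  ('数据类型');

-- Intended to match the first row, whose text contains the two-character token 索引
SELECT * FROM ngram_test WHERE MATCH(content) AGAINST('索引');
```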

In fact, the MyISAM storage engine's full-text index support has a number of limitations, such as the performance impact of table-level locking, data file corruption, and recovery after a crash, which make MyISAM's full-text index unsuitable for many application scenarios. Therefore, in most cases the recommendation is to use another solution, such as Sphinx, Lucene, or other third-party tools, or to use the InnoDB storage engine's full-text index.

A few points to remember

Check version support before using full-text indexes;
A full-text index is many times faster than like + %, but there may be accuracy problems;
If a large amount of data needs to be full-text indexed, load the data first and create the index afterwards;
For Chinese, use MySQL 5.7.6 or later, or a third-party tool.
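
The advice to load data before creating the index can be sketched like this. The table bulk_test and its rows are hypothetical, and the three inserted rows stand in for a large import.

```sql
-- 1. Create the table without a full-text index
CREATE TABLE bulk_test (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  content TEXT NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- 2. Load the data (placeholder rows)
INSERT INTO bulk_test (content) VALUES
  ('first document'),
  ('second document'),
  ('third document');

-- 3. Build the full-text index once, after the data is in place
ALTER TABLE bulk_test ADD FULLTEXT INDEX content_index (content);
```

Building the index once at the end avoids updating it for every inserted row during the load.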


Source: blog.csdn.net/mrzhouxiaof…
