In MySQL, we often use WHERE conditions for value equality comparison and range filtering. However, we also use keywords for searches, which are filtered based on how relevant the search value is to other values. MySQL full text search is used in this scenario. This article explains how to use full-text search.

Full-text indexing requires special query syntax. Full text retrieval can be performed with or without indexes, but the speed of matching can be improved when indexes are present. The index of a full-text index is stored in a special structure to find the content in the document that contains the corresponding search keyword. In our daily life, the most common full-text search is the web search engine. Although web search engines have a huge amount of data and don’t usually use relational databases, the principles are similar.

Full-text indexing supports retrieval through character-based (CHAR, VARCHAR, and TEXT columns), as well as Natural Language Mode (default) and Boolean Mode. For example, when we search for “database engine”, the content including “database”, “engine” and “database engine” will be retrieved. The implementation of full-text indexing is very limited and complex. However, because it is built into the MySQL server and can meet the requirements of many applications, it is widely used.

In versions prior to MySQL5.6, only the MyISAM storage engine supported full-text indexing. To create a full-text index, specify that the column is marked as a full-text index, as shown in the content column below.

CREATE TABLE t_news (
	id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
	content TEXT,
	author VARCHAR(32),
	title VARCHAR(128),
  FULLTEXT (content)
) ENGINE=InnoDB;
Copy the code

MySQL 5.6 does not support Chinese search very well before, so we need to do word segmentation by ourselves and then divide paragraphs into words before storing them in the database. Ngram is built in after MySQL5.7.6. Ngram supports setting the length of the participle and can split Chinese into different words by length (not very smart, but for most scenarios). You can use the MySQL global variable ngram_token_size to set the word segmentation length. The default value is 2. The value can be 1 to 10. For the example above, you need to specify a participle to build a full-text index.

CREATE TABLE t_news (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
	content TEXT,
	author VARCHAR(32),
	title VARCHAR(128),
  FULLTEXT KEY idx(content) WITH PARSER ngram
) ENGINE=InnoDB;
Copy the code

Insert a data test.

INSERT INTO `t_news` 
(`id`, `content`, `author`, `title`) 
VALUES ('1'.'I have a database and an engine.'.'Island code farmer'.'Database Engine');

Copy the code

Simple fuzzy search can be done with LIKE, while full-text search requires the following statement:

SELECT * FROM t_news 
WHERE MATCH (content) AGAINST ('Data engine' IN NATURAL LANGUAGE MODE)
Copy the code

This way you can retrieve what you just inserted in a way that you can’t do with LIKE. It is also supported to use correlation sort to insert a single piece of data:

INSERT INTO `t_news`
(`id`, `content`, `author`, `title`) 
VALUES (2.'I have a database.'.'Island code farmer'.'Database')
Copy the code

Then perform the sort query:

SELECT *.MATCH (content) AGAINST ('Data engine' ) AS relevance
FROM t_news 
WHERE MATCH (content) AGAINST ('Data engine' ) 
ORDER BY relevance ASC
Copy the code

Match values are presented as a list of queries for sorting using their aliases. The higher the relevance, the greater the relevance value, and therefore can be used for sorting. So stakes are irrelevant, so relevance is 0.

Boolean mode can do more control, such as using the + sign to retain the match and using the – sign to exclude the match, the following matches the data, excluding the data containing the engine. See MySQL’s official documentation for more operators: full-text index operators.

SELECT * FROM t_news 
WHERE MATCH (content) AGAINST ('+ data * - Engine ' IN BOOLEAN MODE);
Copy the code