preface

We all know that InnoDB’s use of “%xx” in fuzzy queries results in index invalidation, but this is sometimes the case. There are many other requirements, such as search engines requiring full-text searches based on user data, e-commerce sites requiring full-text searches based on user query criteria, You may need to look up items in detail, which is not something a B+ tree index does very well.

Numerical comparisons, range filtering, and so on can do most of the queries we need. However, if you want to filter queries by keyword matching, you need similarity-based queries rather than precise numerical comparisons, and full-text indexes are designed for this scenario.

Full-text Search (FULL-text Search) is a technique to find out any information in a whole book or article stored in a database. It can obtain the relevant chapter, section, paragraph, sentence, word and other information in the full text as required, and also can carry out various statistics and analysis.

In the early days of MySQL, InnoDB did not support full-text search technology. Since MySQL 5.6, InnoDB has supported full-text search.

Inverted index

Inverted indexes are commonly used for full-text retrieval. Inverted indexes, like B+Tree, are also an index structure. It stores a mapping between a word and its location in one or more documents in a secondary table, usually using associative arrays, and has two representations:

  • inverted file index: {word, id of the document in which the word is located}
  • full inverted index: {word, (the document id of the word, then the location in the specific document)}

May I know what is the inverted File index array in the figure above, and I can see that the word “code” exists in Document 1 and 4. Thus, it is easy to store and perform full-text query. The Documents containing the query keywords can be obtained directly according to Documents. While the Full Inverted Index stores pairs (Position), so its inverted index stores as shown in the figure below, for example, the key word “code” exists in the sixth word of Document 1 and the eighth word of document 4.

In contrast, the Full Inverted Index takes up more space, but can better locate data and extend some other search features.

The full text retrieval

Creating a full-text index

SQL > alter table create full text index;
CREATE TABLE table_name ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, author VARCHAR(200), 
title VARCHAR(200), content TEXT(500), FULLTEXT full_index_name (col_name) ) ENGINE=InnoDB;
Copy the code

Input query statement:

SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_TABLES
WHERE name LIKE 'test/%';
Copy the code

The above six index tables constitute an inverted index and are called secondary index tables. When the incoming document is tokenized, individual words are fully sorted and partitioned in six index tables, with positional information and associated DOC_ID, according to the character set sorting weight of the first character of the word.

SQL > alter table create full text index;
CREATE FULLTEXT INDEX full_index_name ON table_name(col_name);
Copy the code

Using full-text indexes

MySQL database supports full-text query, full-text index can only be used on InnoDB or MyISAM tables, and can only be used to create char, VARCHAR,text columns.

The syntax is as follows:

MATCH(col1,col2,...) AGAINST(expr[search_modifier])
search_modifier:
{
    IN NATURAL LANGUAGE MODE
    | IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
    | IN BOOLEAN MODE
    | WITH QUERY EXPANSION
}
Copy the code

Full-text searches are performed using the MATCH() AGAINST() syntax, where MATCH() takes a comma-separated list naming the columns to be searched. AGAINST() receives a string to search for and an optional modifier for the type of search to perform. Full-text search is divided into three types: natural language search, Boolean search, query extension search, the following will introduce the various query modes.

Natural Language

Natural Language search interprets the search string as a phrase in Natural human Language, and MATCH() defaults to the Natural Language pattern, which means to query documents with specified keywords.

Next, I will use demo to better understand Natural Language

SELECT
    count(*) AS count 
FROM
    `fts_articles` 
WHERE
    MATCH ( title, body ) AGAINST ( 'MySQL' );
Copy the code

The above statement, query the title, body ‘MySQL’ keywords contained in the column number. The above statement could also be written like this:

SELECT
    count(IF(MATCH ( title, body ) 
    against ( 'MySQL' ), 1, NULL )) AS count 
FROM
    `fts_articles`;
Copy the code

Although both statements give the same result, internally the second SQL is faster because the first SQL (where indexed query) also requires correlation sorting statistics, which the second does not.

Correlations can also be queried using SQL statements:

SELECT
    *,
    MATCH ( title, body ) against ( 'MySQL' ) AS Relevance 
FROM
    fts_articles;
Copy the code

Correlation is calculated based on the following four conditions:

  • Whether word appears in the document
  • The number of occurrences of Word in a document
  • The number of words in the index column
  • How many documents contain this word

For full text retrieval with InnoDB storage engine, the following factors need to be considered:

  • The query word is in the StopWord column, and the query for that string is ignored
  • Is the character length of the query word in the range [innodb_ft_min_token_size,innodb_ft_max_token_size]

If the word is in stopword, the word is not queried, such as the word ‘for’ is queried, the result is as follows:

SELECT
    *,
    MATCH ( title, body ) against ( 'for' ) AS Relevance 
FROM
    fts_articles;
Copy the code

As you can see, ‘for’ appears in document 2,4, but because it is stopword, its correlation is 0

The innodb_ft_MIN_token_size and innodb_ft_max_token_size parameters control the length of the InnoDB engine query character, If the length is less than innodb_fT_MIN_token_size or greater than innodb_ft_max_token_size, the search for this word is ignored. In the InnoDB engine, the default value of the parameter innodb_ft_min_token_size is 3 and the default value of innodb_ft_max_token_size is 84

Boolean

Boolean searches use the rules of a special query language to interpret the search string, which contains the word to be searched for, and can also contain operators specifying requirements, such as the fact that a word must or must not be present in the matching line, or that its weight should be higher or lower than usual.

For example, the following statement asks for documents that have the string “Pease” but no “hot”, where + and – indicate, respectively, that the word must or must not exist.

select * from fts_test where MATCH(content) AGAINST('+Pease -hot' IN BOOLEAN MODE);
Copy the code

Boolean Fulltext search supports the following types:

  • +: indicates that the word must exist
  • -: Indicates that the word must not exist
  • (no operator)Indicates that the word is optional, but more relevant if present
  • @distanceThe unit of distance is byte. Such full-text retrieval query is also known as Proximity Search, for exampleMATCH(context) AGAINST('"Pease hot"@30' IN BOOLEAN MODE)Statement indicates that the string Pease must be within 30 bytes of hot
  • >: increases the correlation when the word appears
  • <: decreases the relevance when the word appears
  • ~: indicates that the word is allowed to occur, but the correlation is negative
  • *: indicates a word that starts with the word, such as lik*, which can be lik, like, or likes
  • "2.

Here are some demos to see how Boolean Mode is used.

Not: + –

SELECT
    * 
FROM
    `fts_articles` 
WHERE
    MATCH ( title, body ) AGAINST ( '+MySQL -YourSQL' IN BOOLEAN MODE );
Copy the code

MySQL > query for information containing ‘MySQL’ but not ‘YourSQL

Demo2: no operator

SELECT
    * 
FROM
    `fts_articles` 
WHERE
    MATCH ( title, body ) AGAINST ( 'MySQL IBM' IN BOOLEAN MODE );
Copy the code

SQL > query ‘MySQL IBM’ without ‘+’, ‘-‘, meaning word is optional, if present, its relevance will be higher

Demo3: @

SELECT
    * 
FROM
    `fts_articles` 
WHERE
    MATCH ( title, body ) AGAINST ( '"DB2 IBM"@3' IN BOOLEAN MODE );
Copy the code

The above statement represents that the distance between the words “DB2” and “IBM” is within 3 bytes

Demo4: > <

SELECT
    * 
FROM
    `fts_articles` 
WHERE
    MATCH ( title, body ) AGAINST ( '+MySQL +(>database <DBMS)' IN BOOLEAN MODE );
Copy the code

Query for rows containing ‘MySQL’, ‘database’, and ‘DBMS’, but rows not containing ‘DBMS’ are more relevant than rows containing ‘DBMS’.

demo5: ~

SELECT
    * 
FROM
    `fts_articles` 
WHERE
    MATCH ( title, body ) AGAINST ( 'MySQL ~database' IN BOOLEAN MODE );
Copy the code

Query a row containing ‘MySQL’, but if the row also contains ‘database’, the correlation is reduced.

Demo6: *

SELECT
    * 
FROM
    `fts_articles` 
WHERE
    MATCH ( title, body ) AGAINST ( 'My*' IN BOOLEAN MODE );
Copy the code

Query for rows whose keyword contains ‘My’.

Demo7:”

SELECT
    * 
FROM
    `fts_articles` 
WHERE
    MATCH ( title, body ) AGAINST ( '"MySQL Security"' IN BOOLEAN MODE );
Copy the code

Query for rows containing the exact phrase ‘MySQL Security’.

Query Expansion

A query extension search is a modification of a natural-language search that is usually performed when the query term is too short and the user needs implied knowledge. For example, with the word database, the user might want to query more than just the document containing database, It may also refer to words that include MySQL, Oracle, RDBMS, You can then use the Query Expansion MODE to open the implied knowledge of full-text retrieval by adding WITH Query Expansion/IN NATURAL LANGUAGE MODE WITH to the Query statement QUERY EXPANSION enables Blind QUERY EXPANSION (also known as automatic Relevance Feedback), which is divided into two phases.

  • Stage 1: Full-text index query based on searched words
  • The second stage: according to the segmentation generated in the first stage, a full-text retrieval query is conducted again

Let’s take a look at an example of how Query Expansion is used.

Create FULLTEXT INDEX title_body_index on fts_articles(title,body);Copy the code
SELECT * FROM 'fts_articles' WHERE MATCH(title,body) AGAINST('database'); SELECT * FROM' fts_articles' WHERE MATCH(title,body) AGAINST('database');Copy the code

Query results before using Query Expansion are as follows:

SELECT * FROM 'fts_articles' WHERE MATCH(title,body) AGAINST('database' WITH Query expansion);Copy the code

Query results with Query Expansion are as follows:

Because full text retrieval with Query Expansion can lead to many irrelevant queries, users may need to be very cautious when using it.

Delete full text Index

Mysql > delete full text index
DROP INDEX full_idx_name ON db_name.table_name;
Copy the code
Alter table alter table alter table alter table alter table alter table alter table
ALTER TABLE db_name.table_name DROP INDEX full_idx_name;
Copy the code

Source: juejin. Cn/post / 6989871497040887845


Recommend 3 original Springboot +Vue projects, with complete video explanation and documentation and source code:

Build a complete project from Springboot+ ElasticSearch + Canal

  • Video tutorial: www.bilibili.com/video/BV1Jq…
  • A complete development documents: www.zhuawaba.com/post/124
  • Online demos: www.zhuawaba.com/dailyhub

【VueAdmin】 hand to hand teach you to develop SpringBoot+Jwt+Vue back-end separation management system

  • Full 800 – minute video tutorial: www.bilibili.com/video/BV1af…
  • Complete development document front end: www.zhuawaba.com/post/18
  • Full development documentation backend: www.zhuawaba.com/post/19

【VueBlog】 Based on SpringBoot+Vue development of the front and back end separation blog project complete teaching

  • Full 200 – minute video tutorial: www.bilibili.com/video/BV1af…
  • Full development documentation: www.zhuawaba.com/post/17

If you have any questions, please come to my official account [Java Q&A Society] and ask me