1. Data insertion optimization

1.1 Data preparation

Create a test table with a FULLTEXT index:
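The original DDL is not reproduced here, so the following is only a sketch: it reuses the table name that appears later in section 2 (t_song_info_test_test), while the column names and the FULLTEXT index on the song name column are assumptions.

# Assumed schema; column names are illustrative.
CREATE TABLE t_song_info_test_test (
    id BIGINT NOT NULL AUTO_INCREMENT,
    fsong_name VARCHAR(255) NOT NULL DEFAULT '',    # song name, full-text indexed
    fsinger_name VARCHAR(255) NOT NULL DEFAULT '',  # singer name
    PRIMARY KEY (id),
    FULLTEXT KEY idx_song_name (fsong_name)         # FULLTEXT index used in section 2
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;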

1.2 Optimization of important parameters

1.2.1 The buffer pool

Before inserting data, let’s first look at the current database environment:

show global variables like 'innodb_buffer_pool_size'; # total buffer pool size

show global variables like 'innodb_buffer_pool_instances'; # number of buffer pool instances

We can see that the test environment's song database has a 128M buffer pool, split across eight buffer pool instances.

The buffer pool bridges the speed gap between the CPU and disk I/O. When data is written, the pages in the buffer pool are modified first (a page is the smallest management unit of the InnoDB storage engine) and then flushed to disk at a certain frequency. Multiple buffer pool instances reduce contention for internal resources and increase the database's concurrent processing capability. The total buffer pool size and the number of instances are therefore important indicators of database processing performance. Increasing the buffer pool size could speed up the inserts, but since this is a shared test environment, it will not be changed here.
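For reference only (not executed against this shared environment), the total buffer pool size can be resized online since MySQL 5.7.5; a sketch:

# Shown for reference, not run here: resize the buffer pool online (MySQL 5.7.5+).
SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;  # 4G, rounded to chunk boundaries
# innodb_buffer_pool_instances is not dynamic; changing it requires editing the
# configuration file and restarting the server.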

1.2.2 innodb_flush_log_at_trx_commit and sync_binlog

innodb_flush_log_at_trx_commit: the flush-to-disk policy for the redo log. The default value is 1, meaning the redo log is flushed to disk on every transaction commit. This is the safest policy, but when a large amount of data is inserted, the frequent disk I/O greatly reduces the insertion speed.

sync_binlog: the flush-to-disk policy for the binary log. The default value is 0, meaning MySQL does not force the binary log to disk itself and leaves the timing to the operating system; in many master/slave architectures this parameter is set to a value greater than 0 for durability, which also greatly affects the insertion speed.
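If durability can be temporarily relaxed for the duration of the bulk load, the two parameters could be loosened and then restored along these lines (the values shown are illustrative, not a recommendation for production):

# Relax the flush policies before the bulk insert (illustrative values).
SET GLOBAL innodb_flush_log_at_trx_commit = 2;  # write redo log on commit, fsync about once per second
SET GLOBAL sync_binlog = 0;                     # let the operating system decide when to sync the binlog

# ... run the bulk insert ...

# Restore stricter settings afterwards (use whatever your replication setup requires).
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
SET GLOBAL sync_binlog = 1;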

1.3 Inserting Data

Write a stored procedure for bulk inserts:
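The original procedure body is not shown, so the version below is a sketch that reproduces the behaviour relied on in section 2: roughly one in three rows carries the token 100008 in the song name. The procedure name, column values, and batch size are assumptions.

DELIMITER //
# Sketch: insert p_count rows, about one third of them containing the token 100008.
CREATE PROCEDURE insert_song_test(IN p_count INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    SET autocommit = 0;                          # commit in batches rather than per row
    WHILE i < p_count DO
        INSERT INTO t_song_info_test_test (fsong_name, fsinger_name)
        VALUES (
            IF(i % 3 = 0, CONCAT('song 100008 - ', i), CONCAT('song - ', i)),
            CONCAT('singer - ', i % 1000)
        );
        SET i = i + 1;
        IF i % 10000 = 0 THEN
            COMMIT;                              # commit every 10,000 rows
        END IF;
    END WHILE;
    COMMIT;
END //
DELIMITER ;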

Then call the stored procedure:
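Assuming the sketch above, a call for 10 million rows (the figure used in section 2) would look like:

CALL insert_song_test(10000000);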

2. Full-text index

When querying rows that contain 100008, use LIKE:
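A query of roughly this form, using the table and column names assumed in 1.1:

SELECT * FROM t_song_info_test_test
WHERE fsong_name LIKE '%100008%'
LIMIT 10;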

The first 10 rows come back quickly because LIKE performs a full table scan that stops as soon as 10 matching rows have been found. Looking back at the stored procedure in 1.3, roughly 33% of the rows contain 100008, so scanning about 30 rows is enough to return the requested data.

However, when querying items 1,000,000 to 1,000,010 the query is very slow, because about 3 million rows have to be scanned to produce the result set:
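For example:

SELECT * FROM t_song_info_test_test
WHERE fsong_name LIKE '%100008%'
LIMIT 1000000, 10;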

To test this idea, I insert two rows into the t_SONG_info_test_test table:
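Something along these lines (everything except the 200008 token is illustrative):

INSERT INTO t_song_info_test_test (fsong_name, fsinger_name)
VALUES ('test song 200008 a', 'test singer'),
       ('test song 200008 b', 'test singer');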

Now 200008 appears only twice among the 10 million + 2 rows. Execute the query statement:
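The exact statement is not reproduced here; presumably it is the same kind of LIKE query, now forced to scan essentially the whole table to find the two rows, contrasted with a lookup through the FULLTEXT index. Sketches of both:

# LIKE has to scan all 10 million + 2 rows to find the two matches.
SELECT * FROM t_song_info_test_test
WHERE fsong_name LIKE '%200008%'
LIMIT 10;

# The FULLTEXT index maps the token 200008 directly to the two rows.
SELECT * FROM t_song_info_test_test
WHERE MATCH (fsong_name) AGAINST ('200008');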

3. Points to note

3.1 The default minimum word length for full-text retrieval is 4

Words shorter than the minimum length are not added to the full-text index, so searching for them returns no results. The threshold is controlled by ft_min_word_len (default 4) for MyISAM full-text indexes and innodb_ft_min_token_size (default 3) for InnoDB.
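The thresholds can be checked with the following variables:

SHOW VARIABLES LIKE 'ft_min_word_len';           # MyISAM full-text indexes, default 4
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';  # InnoDB full-text indexes, default 3
# Both are read-only at runtime: change them in the configuration file, restart,
# and rebuild the FULLTEXT index for the new value to take effect.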

3.2 Word segmentation is not intelligent

Full-text search segments text on spaces or obvious delimiters such as ",". It does not support Chinese word segmentation the way Elasticsearch does, but since MySQL 5.7.6 the built-in ngram parser lets InnoDB full-text indexes segment Chinese text into tokens of a configurable length; see the MySQL 5.7 Chinese full-text search tutorial at www.jianshu.com/p/c48106149… for details.
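A minimal sketch of an ngram-based index (table and column names here are illustrative only):

# Illustrative only: a FULLTEXT index using the built-in ngram parser for Chinese text.
# Token length is controlled by ngram_token_size (default 2).
CREATE TABLE t_cn_song_demo (
    id BIGINT NOT NULL AUTO_INCREMENT,
    fsong_name VARCHAR(255) NOT NULL DEFAULT '',
    PRIMARY KEY (id),
    FULLTEXT KEY idx_cn_name (fsong_name) WITH PARSER ngram
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;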