Understanding indexes: Index optimization

Recently I needed to modify an existing storage structure, which meant weighing query conditions against query efficiency. I read several articles on indexes and HBase, reviewed the relevant background, and wrote up my understanding in light of the project's requirements.

If you missed it, you can review the first three articles:

Index structure and data location process
Query procedure and advanced query
Execution plans explained in detail

The explain command, covered in detail in the previous article, pinpoints where a performance problem occurs. The next step is to fix it by tuning the indexes.

Part of this content is excerpted from other bloggers' posts. Links are at the end, with thanks for their excellent analysis.

Common optimization methods

The leftmost-prefix principle of composite indexes

A composite index follows the leftmost-prefix rule: the index is used only when the query conditions include its leading columns. If the conditions do not start from the leftmost column of the index, the index cannot be used.

For example, a composite index (a, b, c) on three columns can speed up queries on a, on (a, b), and on (a, b, c), but not queries that skip the leading column, such as b alone or (b, c).

In addition, when creating a composite index, put the most selective column on the left.
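The rule can be checked with explain. The following is a minimal sketch, assuming a hypothetical table t and index idx_abc (neither appears in the original project); the plan MySQL actually picks depends on version and data, so treat the comments as expectations to verify rather than guaranteed output.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE t (
  id INT PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  INT NOT NULL,
  d  VARCHAR(32),
  KEY idx_abc (a, b, c)
);

-- These conditions start at the leftmost column, so idx_abc is a candidate.
EXPLAIN SELECT * FROM t WHERE a = 1;
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;

-- These skip column a, so idx_abc cannot be used to seek to the matching rows.
EXPLAIN SELECT * FROM t WHERE b = 2;
EXPLAIN SELECT * FROM t WHERE b = 2 AND c = 3;
```

Comparing the key and key_len columns of the explain output across these statements shows how many columns of the index each query uses.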

Do not use functions and operations on columns

Do not use functions on columns; this will invalidate the index and result in a full table scan.

For example, the following SQL statement:

select * from artile where YEAR(create_time) <= '2018'; 

Even if create_time is indexed, the whole table is scanned. The calculation can be moved to the business layer, which not only saves database CPU but also makes better use of the query cache.
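One way to keep the column bare is to turn the function call into a range on the column itself. This is a sketch that assumes create_time is a DATE or DATETIME column with an index on it; the boundary value would normally be computed in the application.

```sql
-- Function applied to the indexed column: the index on create_time cannot be used for the lookup.
SELECT * FROM artile WHERE YEAR(create_time) <= 2018;

-- Same rows, expressed as a range on the bare column, so the index is usable.
SELECT * FROM artile WHERE create_time < '2019-01-01';
```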

Negative conditions cannot use the index

Negative conditions include !=, <>, not in, not exists, not like, and so on.

select * from artile where status != 1 and status != 2;

You can use in to optimize:

select * from artile where status in (0, 3);

Using a covering index

A covering index is one that contains every column the query needs. The data can be read from the index itself, without following row locators back to the table, which greatly improves performance.

You can add an extra column to the index purely so that it is covered, even if that column does nothing for filtering.
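A small sketch of the idea, assuming a hypothetical user table with name and phone columns; the index name idx_phone_name is made up for the example.

```sql
-- With an index on phone only, this query must go back to the table row to fetch name.
SELECT name FROM user WHERE phone = '12345678901';

-- Adding name to the index lets the query be answered from the index alone.
CREATE INDEX idx_phone_name ON user (phone, name);

-- When an index covers the query, EXPLAIN typically shows "Using index" in the Extra column.
EXPLAIN SELECT name FROM user WHERE phone = '12345678901';
```

The trade-off is a wider index, which costs more space and more work on writes.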

Avoid implicit type conversion

An implicit conversion occurs when the types on the two sides of a query condition do not match. It may invalidate the index and cause a full table scan.

If the phone field is of type VARCHAR, the following SQL cannot use the index:

select * from user where phone=12345678901;

It can be optimized as:

select * from user where phone='12345678901';

Range columns can use the index

The range conditions are <, <=, >, >=, and between.

An index can be used for at most one range column. If a query condition contains two range columns, the index cannot be used for both of them.
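To illustrate, here is a sketch with a hypothetical orders table; all names are assumptions made for the example. The equality on user_id and the range on created_at can narrow the index scan, while the second range on amount can only filter rows after that.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE orders (
  id         INT PRIMARY KEY,
  user_id    INT NOT NULL,
  created_at DATETIME NOT NULL,
  amount     DECIMAL(10, 2) NOT NULL,
  KEY idx_user_created_amount (user_id, created_at, amount)
);

-- user_id: equality. created_at: first range column. amount: second range column.
EXPLAIN SELECT * FROM orders
WHERE user_id = 42
  AND created_at >= '2018-01-01' AND created_at < '2019-01-01'
  AND amount > 100;
```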

Do not create indexes on fields that are frequently updated or have low selectivity

Every update modifies the B+ tree, so indexing frequently updated fields significantly degrades database performance.

For low-selectivity attributes such as gender, an index is pointless: it cannot filter data effectively, and performance is close to that of a full table scan.

Selectivity can be computed as count(distinct column_name)/count(*). A column is generally worth indexing when the result is 80% or more.

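Index columns should not allow null

Some databases (Oracle, for example) do not store null values in single-column indexes, and do not store composite-index entries whose columns are all null. MySQL's InnoDB does index null values, but conditions such as != and not in never match null rows, so if indexed columns allow null you may get an unexpected result set.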

Avoid using or to join conditions

Avoid joining conditions with or in the where clause wherever possible, because it can invalidate the index and cause a full table scan. Although newer versions of MySQL can still hit the index, optimizing the query costs more CPU than the equivalent in.
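When the or branches test the same column, the rewrite is mechanical. A sketch using the artile table from the earlier examples, assuming status is indexed:

```sql
-- or on a single column ...
SELECT * FROM artile WHERE status = 0 OR status = 3;

-- ... returns the same rows as in, which is the form recommended here.
SELECT * FROM artile WHERE status IN (0, 3);
```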

Fuzzy query

A fuzzy query with a leading wildcard (like '%xx') cannot use the index; one without a leading wildcard (like 'xx%') can.
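The difference shows up in explain. This sketch assumes a hypothetical index on user.name; the sample values are arbitrary.

```sql
-- Leading wildcard: the prefix is unknown, so the index cannot narrow the search.
EXPLAIN SELECT * FROM user WHERE name LIKE '%son';

-- No leading wildcard: the fixed prefix 'John' can be scanned as an index range.
EXPLAIN SELECT * FROM user WHERE name LIKE 'John%';
```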

Optimization cases

Use a deferred join or subquery to optimize deep-pagination scenarios

MySQL does not skip the offset rows. It fetches offset+N rows, discards the first offset rows, and returns the remaining N.

When offset is very large this is extremely inefficient. Either cap the total number of pages that can be returned, or rewrite the SQL for pages beyond a certain threshold.

You can first locate the range of ids you need, then join back to the table:

select a.* from table1 a, (select id from table1 limit 1000000, 10) b where a.id = b.id;

If you know only one row will be returned, limit 1 can improve efficiency

You know there is only one result, but the database does not, so tell it explicitly and let it stop scanning after the first match.
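For example, assuming the application guarantees that phone is unique in a hypothetical user table but the column has no unique index:

```sql
-- Without limit 1 the scan continues after the first match, looking for more rows.
SELECT * FROM user WHERE phone = '12345678901' LIMIT 1;
```

If the column already has a unique index, the database can stop after one match on its own and limit 1 adds little.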

How to create indexes

where a=1 and b=1  
where b=1  
where b=1 order by time desc

Create idx_ab (a,b) and idx_b_time (b,time).

The MySQL query optimizer automatically reorders the conditions in the where clause to match a suitable index, so for the first SQL, idx_ba (b,a) would also work.
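Written out as DDL, the two indexes look like this. The table name t is a placeholder, since the original does not name the table.

```sql
-- Serves: where a=1 and b=1
CREATE INDEX idx_ab ON t (a, b);

-- Serves: where b=1, and where b=1 order by time desc (the index also provides the sort order).
CREATE INDEX idx_b_time ON t (b, `time`);
```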

Multiple value matching and range matching

Given a composite index (emp_no, title, from_date), can the following SQL use the index, and if so, how many of its columns?

select * from employee.titles
where emp_no between '10001' and '10010'
and title = 'Software Engineer'
and from_date between '2008-01-01' and '2018-01-01';

This SQL appears to use two range queries, but the between on emp_no is actually equivalent to in. In other words, emp_no is really a multi-value exact match.

Be careful to distinguish multi-value matches from range matches in MySQL, otherwise MySQL's behavior will be confusing.

Leftmost matching principle for a composite index

If you create a composite index (a,b,c), can the following statements use the index? If so, which columns are used?

where a = 3 and b = 5
where a = 3 and c = 4 and b = 5
where a = 3 and b > 10 and c = 7
where a = 3 and b like 'xx%' and c = 7
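The original leaves these as an exercise. The sketch below states what the leftmost-prefix rule described in this article predicts for each condition, assuming a hypothetical table t with an index idx_abc (a, b, c). Actual explain output (key_len in particular) can differ by MySQL version, so the comments are expectations to check, not recorded results.

```sql
-- Equality on a and b: columns a, b are used.
EXPLAIN SELECT * FROM t WHERE a = 3 AND b = 5;

-- Condition order does not matter; the optimizer reorders it: columns a, b, c are used.
EXPLAIN SELECT * FROM t WHERE a = 3 AND c = 4 AND b = 5;

-- The range on b ends the prefix match: columns a, b are used, c is only a filter.
EXPLAIN SELECT * FROM t WHERE a = 3 AND b > 10 AND c = 7;

-- A prefix like behaves as a range on b (this case assumes b is a string column): columns a, b are used.
EXPLAIN SELECT * FROM t WHERE a = 3 AND b LIKE 'xx%' AND c = 7;
```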

Create indexes based on selectivity

Select * from product where status=1;

SELECT user_id,title,content FROM `comment`
WHERE status=1 AND product_id=1
LIMIT 0, 5;

You can create a composite index on status and product_id, but first compute the selectivity of each column:

SELECT COUNT(DISTINCT status) /COUNT(*) AS audit_rate,
COUNT(DISTINCT product_id)/COUNT(*) AS product_rate
FROM comment;

Generally, product_id has the higher selectivity, so you can create the following index:

CREATE INDEX idx_productID_Status ON comment(product_id, status);

Sort field index

View the latest 20 login records of a user, sorted by time:

select * from login_history where uid = $uid order by create_time desc limit 20;

After a composite index on (uid, create_time) was created, the sort order became part of the index structure, and the database load dropped sharply.
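A sketch of that index and a way to check it; the index name idx_uid_create_time and the literal uid value are assumptions made for the example.

```sql
-- Equality column first, sort column second.
CREATE INDEX idx_uid_create_time ON login_history (uid, create_time);

-- If the index provides the ordering, the Extra column should not show "Using filesort".
EXPLAIN SELECT * FROM login_history
WHERE uid = 1001
ORDER BY create_time DESC
LIMIT 20;
```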

Reference article:

MySQL index and optimization combat
Considerations for index use