Wanting to truly master indexes is not a pipe dream. Many people read the theory over and over, yet still can't apply it or solve real problems, and it weighs on them. The reason is simple: you haven't genuinely mastered and used it, haven't thought about application scenarios, haven't thought about the underlying data structure.

No matter how much theory you memorize, you still can't solve a problem with it. So the theory is only the first step; only by applying it skillfully in real work do you really get it. Don't be a paper tiger throwing its weight around. How do you avoid that? Read on.

01

What is index optimization?

Index optimization means optimizing how indexes are created and used. Done well, it greatly improves lookup efficiency, reduces maintenance costs, and meets business requirements. But some readers will say: isn't an index ready to use the moment it's created? Why optimize it? With that attitude, an index is useful at best, never at its best.

And if simply creating indexes were enough, why would your Leader and interviewers grill you on index internals, principle details, and technical trade-offs until you're exhausted and in a cold sweat? Why put you through all that?

An analogy: why is instant coffee different from Starbucks coffee?

The latter cares about the whole process: picking the beans, grinding them, filtering, adding fresh milk, cocoa powder, sugar and so on, like a work of art. Instant coffee is just stir and drink. What's in it? Who cares, as long as it tastes like coffee.

What's the benefit of caring about the process?

If you want to improve the taste, you can go straight to the flavoring step. If you want to reduce cost, you can optimize the production steps and ingredients. Index optimization is the same: it considers the whole process of creating and using an index. If you don't understand that process, you can't change anything; you're stuck, like pining after someone who ignores you.

02

Why optimize indexes?

Before optimizing, you need to know what can go wrong. Pour a cup of tea and let's walk through it.

An index is essentially a B+ tree built from column data. As data volume grows, the following problems appear:

  • Long column values are costly to maintain

Column data can be indexed as a single column or as multiple columns (a composite index).

If you create a single-column index on a long character column, such as a BLOB, TEXT, or long VARCHAR column, index lookups can be slow, because the full column value is what gets stored as an element in each B+Tree node. This brings the following drawbacks:

  1. If a single value is too long, each node holds fewer elements, so the tree needs more levels to accommodate the whole table. The tree gets taller, and every lookup pays for it.

  2. During an index lookup, MySQL takes the value from the WHERE condition and compares it against the stored elements character by character, judging by encoded value (ASCII for Latin characters). For Chinese, the characters are first converted to their Unicode code values and the resulting values are compared. The longer the value, the more comparisons per element.

  3. As mentioned earlier, index maintenance is done automatically by MySQL background threads on the whole tree. An insert involves: 1. searching top-down for the leaf position; 2. splitting nodes bottom-up when a node overflows. With long elements, splits happen more often, so maintenance costs rise and the index takes more space.

That's why it matters which field you choose to index.

If you switch to a composite (multi-column) index, several fields are stored together in each entry, so one overly long column inflates the whole entry. Just like one struggling member instantly lowers the team's average score!

  • The index itself suffers from index fragmentation

So what is index fragmentation? It's a once-compact index spread across far more pages than it needs.

That's the danger of fragments: easy to produce, messy to clean up. Suppose all the data in the index could fit in 6 pages, but fragmentation spreads it over 12. A query scanning the index then reads 12 pages instead of 6, doubling the IO.

How terrible. On top of that, fragmented pages reduce the efficiency of the buffer cache and take up more memory. More pages also inflate the database files, requiring more storage space and slowing down backup and restore.

So how does fragmentation come about, and how do you fix it? Don't worry, we'll get to that later.

  • Redundant and duplicate indexes, which add little value

The last thing you want is duplicated work: time-consuming, laborious, and it proves nothing about your ability.

So what is a duplicate index?

Duplicate indexes are indexes of the same type (meaning the same underlying structure) created on the same columns in the same order. Avoid creating them, and remove them as soon as they are discovered. Every index created is one more structure MySQL must maintain separately, and one more option the optimizer must weigh when planning a query, which hurts performance. After all, creating an index is like planting a tree; would you plant the same tree twice in the same spot?

Maybe some people say: a unique index and a normal index aren't the same type, so are they still duplicates? If that's your understanding, you need to read my last post.

Analysis of index Principle

Lotus Boy Nezha, public account: Lotus Boy Nezha
Big-tech interview: you can recite index internals, but the hard part is understanding the principles

For example:

CREATE TABLE test (
  ID INT NOT NULL PRIMARY KEY,
  A INT NOT NULL,
  B INT NOT NULL,
  UNIQUE(ID),
  INDEX(ID)
) ENGINE=InnoDB;

All of these index types are B+Trees underneath, so yes, they are the same type: the UNIQUE(ID) and INDEX(ID) here duplicate the primary key. And no, a unique or normal index is not looked up any faster than the primary key. Only by switching to a genuinely different structure, such as FULLTEXT or HASH, would they stop being duplicates.

What about redundant indexes? A redundant index is a close cousin of the duplicate index, but not quite the same thing. Just as a table sometimes carries redundant columns, a table's indexes can carry redundant copies of each other.

For example, if you already have index (A, B), then also creating index (A) is redundant, because (A) is just a prefix of (A, B): the index (A, B) can already serve any query that index (A) could. But index (B, A), or index (B), is not redundant. Why?

Because an index starting with B is not a leftmost prefix of index (A, B). In addition, indexes of other types (such as hash indexes or full-text indexes) are never redundant with a B-Tree index, no matter which columns they cover.

To summarize: think of it as a prefix relationship; an index whose columns form the leftmost prefix of another index is redundant. Can hash or full-text indexes ever be redundant? Looking forward to your reply.
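To make that prefix rule concrete, here is a minimal sketch in Python (the index definitions are made up; this is the rule as stated above, not how MySQL itself checks):

```python
def is_redundant(candidate, existing):
    """A B-Tree index is redundant if its column list is a leading
    (leftmost) prefix of another index's column list."""
    return (
        len(candidate) <= len(existing)
        and list(existing[: len(candidate)]) == list(candidate)
    )

# (A) is a leftmost prefix of (A, B): redundant.
print(is_redundant(("A",), ("A", "B")))      # True
# (B) does not start the column list of (A, B): not redundant.
print(is_redundant(("B",), ("A", "B")))      # False
# (B, A) uses the same columns but a different order: not redundant.
print(is_redundant(("B", "A"), ("A", "B")))  # False
```

The same check, applied pairwise across a table's B-Tree indexes, is essentially what the duplicate-index tools mentioned later automate.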

You may also, in a moment of enthusiasm, create an index that no query's WHERE condition ever ends up using. It only wastes space and resources, and should be decisively deleted; otherwise it's a drag on performance.

It's like knowing it's going to rain and still not bringing an umbrella. An index, like a person, has to show its worth to earn its keep. There's another trick for everyone!

03

What is your understanding of index optimization?

My understanding divides into two parts: cost and application skill. An index exists to speed up lookups; if the cost of using it is so high that it slows queries down, that defeats its original purpose. And if it is never used at the SQL level, or used badly, it isn't doing its job; no matter how good a tool is, it is worthless unless applied. Achieving higher performance at a limited cost is where an index best shows its value, while a useless index only hurts performance. Application skill and cost control go hand in hand, just as technology exists to make the product better and the product in turn drives the technology.

04

How to start index optimization?

  • To index a long character column, there are the following solutions:

    • Prefix index and index selectivity

By default an index covers the entire character column. Instead, you can index only the first few characters, which shrinks the index space and improves index efficiency. But there's a catch here. What is it?

That is, it reduces index selectivity.

Index selectivity is the ratio of distinct index values (also known as the cardinality) to the total number of records in the table (#T), ranging from 1/#T to 1. The more selective an index is, the more efficient the query, because a highly selective index lets MySQL filter out more rows during a lookup.

An example of poor index selectivity: your girlfriend is one of twin sisters, and cases of mistaken identity happen all the time!

With very low selectivity, looking inside the index is close to doing a full table scan. Why? Because the B+Tree holds so many identical values that each one must be taken out and compared against your WHERE condition until the right rows are found. That's why the primary key and unique indexes have the best selectivity: theirs is exactly 1, and that gives the best performance.

So how do you choose this prefix?

The trick is to choose a prefix long enough to be selective, but not too long. Evaluate the column's selectivity, then pick a prefix whose selectivity comes close to that of the full column. Like this:

-- selectivity of the full column: distinct values / total rows
SELECT COUNT(DISTINCT col) / COUNT(*) FROM table;
-- selectivity of a num-character prefix: increase num step by step
-- until the ratio approaches the one above
SELECT COUNT(DISTINCT LEFT(col, num)) / COUNT(*) FROM table;

Eventually you get the most appropriate length for the prefix index.
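The two queries above can be tried end to end. Here is a small sketch using Python's sqlite3 as a stand-in for MySQL (the city table and its values are made up; SQLite's substr plays the role of MySQL's LEFT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (city TEXT)")
cities = ["Shanghai", "Shenyang", "Shenzhen", "Beijing", "Baoding", "Shanghai"]
conn.executemany("INSERT INTO t VALUES (?)", [(c,) for c in cities])

# Full-column selectivity: distinct values / total rows.
full = conn.execute(
    "SELECT COUNT(DISTINCT city) * 1.0 / COUNT(*) FROM t"
).fetchone()[0]

# Grow the prefix length until its selectivity approaches the full column's.
for n in range(1, 9):
    prefix = conn.execute(
        "SELECT COUNT(DISTINCT substr(city, 1, ?)) * 1.0 / COUNT(*) FROM t",
        (n,),
    ).fetchone()[0]
    if prefix >= full:
        break  # n characters are as selective as the whole column

print("full:", round(full, 3), "best prefix length:", n)
```

In MySQL you would then create the prefix index with something like `ALTER TABLE t ADD KEY (city(5));` (SQLite itself has no prefix indexes; the sketch only performs the measuring step).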

    • Pseudo hash index

A hash index? What's that? Let me explain.

This is not MySQL's built-in hash index; I mean a custom, application-maintained one. You add a column to your table specifically to store a hash of the long value. The hash is stored as a small integer, which takes far less index space than a long string, and lookups still go through the usual B+Tree search, comparing hash values instead of strings. All you have to do is include the hash computation in the WHERE clause manually.

So what’s this hash value?

Compute it from the inserted data with a hash function, and store the result in the extra column. Prefer a function that produces a compact numeric result, otherwise it costs too much computing and storage. For example: CRC32.
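A minimal sketch of the idea, using Python's zlib.crc32 with sqlite3 standing in for MySQL (the pseudohash table and column names are invented). The application stores the CRC32 of a long URL in a narrow integer column, indexes that column, and repeats the full value in the WHERE clause to weed out hash collisions:

```python
import sqlite3
import zlib

def crc32(s: str) -> int:
    return zlib.crc32(s.encode("utf-8"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pseudohash (id INTEGER PRIMARY KEY, url TEXT, url_crc INTEGER)"
)
# The index is on the small integer, not on the long string.
conn.execute("CREATE INDEX idx_url_crc ON pseudohash (url_crc)")

url = "http://www.example.com/some/very/long/path"
# The application computes the hash on insert (in MySQL a trigger could do this).
conn.execute(
    "INSERT INTO pseudohash (url, url_crc) VALUES (?, ?)", (url, crc32(url))
)

# Look up by the narrow hash column, and re-check the full url
# because CRC32 can collide.
row = conn.execute(
    "SELECT id, url FROM pseudohash WHERE url_crc = ? AND url = ?",
    (crc32(url), url),
).fetchone()
print(row)
```

The `AND url = ?` part is essential: a 32-bit hash will eventually collide, so the full value must always accompany the hash in the condition.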

  • Index column order

The same B+Tree mechanics apply to a composite index, but now the order of the index columns matters: a query must match the leftmost index columns to hit the index at all. If the order doesn't line up, the index is not used. The results of indexed queries are also returned sorted by that column order.

So how do you choose the right order?

It depends on the situation. The rule of thumb is to place the most selective columns first, while also avoiding random I/O and extra sort steps. But if the values a query filters on match a large fraction of the table's rows, an index won't help much; queries like that should be restructured at the application layer to filter on other, more selective criteria.

  • Optimizing sorts

By default, queried data is sorted in memory before being returned to the client. That's fine for a small result set, but what if it's millions of rows?

If instead the storage engine can hand back the rows already in index order, the explicit ORDER BY step on the result set can be skipped entirely, improving query efficiency.

For example, you can put a low-selectivity column first and create the index (sex, rating) for this query:

SELECT <cols> FROM profiles WHERE sex = 'M' ORDER BY rating LIMIT 100000, 10;

Does sorting by index have any requirements?

Yes: the index columns must appear in exactly the same order as the ORDER BY clause, and all columns must be sorted in the same direction (all ascending or all descending), for an index scan to produce the ordering. If the query joins multiple tables, index-based sorting works only when every field in the ORDER BY clause comes from the first table. And the ORDER BY clause must satisfy the leftmost-prefix requirement, just like a lookup.

One extra point to note: a query can use an index for the lookup and still fail to use it for the sort.

For example, with the composite index (is_role, role_name, role_status):

EXPLAIN SELECT * FROM tp_user_role WHERE is_role = 1 AND role_status = 1;

Because the query skips role_name, the composite order isn't satisfied: index-ordered retrieval doesn't happen, though the index is still triggered, using only its first column. That's a real loss for this query, because after matching on is_role the lookup can't narrow further on role_status inside the index; the remaining filtering has to happen in memory via the WHERE condition.

Pay special attention to the leftmost-match rule; interviews test it constantly! Remember: matching starts from the leftmost column and proceeds field by field in order. If a middle column is skipped, the comparison chain breaks and only the columns before the gap are used.
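The leftmost-match behaviour in the example above can be sketched as a tiny helper (Python, hypothetical names): given a composite index's columns and the set of columns carrying equality conditions, it counts how many leading index columns the lookup can actually use:

```python
def usable_prefix(index_cols, where_cols):
    """Count how many leading columns of a composite index a query can
    use: matching stops at the first index column missing from WHERE."""
    used = 0
    for col in index_cols:
        if col not in where_cols:
            break  # the chain is broken; later columns cannot be used
        used += 1
    return used

idx = ("is_role", "role_name", "role_status")
# role_name is skipped, so only is_role is usable; role_status must be
# filtered later, at the server layer.
print(usable_prefix(idx, {"is_role", "role_status"}))               # 1
# With all three leading columns present, the whole index is usable.
print(usable_prefix(idx, {"is_role", "role_name", "role_status"}))  # 3
```

This is a simplification (range conditions and sort directions add more rules), but it captures why skipping a middle column wastes the rest of the index.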

  • Reduce redundant and duplicate indexes

Redundant indexes usually appear when a new index is added to a table. Since every added index slows down INSERT, UPDATE, DELETE, and other write operations, when a new indexing need comes up, prefer extending an existing index over adding one outright.

So what is an extended index?

Extending an index means modifying an existing index, typically appending columns to it, instead of creating a brand-new index whose leftmost column repeats an existing one.

For example, adding a new index (A, B) alongside an existing index (A) leaves (A) redundant; better to extend (A) into (A, B) in place. If a query filters on B alone, create index (B) instead. And if a query only needs a few columns, say (ID, A), you can consider a covering index. That's a big part of why composite indexes are used so much.

What, you don't know about covering indexes? Then read my last article:

Covering Indexes

Lotus Boy Nezha, public account: Lotus Boy Nezha
Big-tech interview: you can recite index internals, but the hard part is understanding the principles

Redundant indexes are unnecessary in most cases, and you should try to extend existing indexes rather than create new ones. But sometimes a redundant index is justified for performance reasons, because extending an existing index could make it so large that it hurts the other queries that use it.

For example, suppose state_id is an indexed column, the table holds 1,000,000 rows, and each state_id value covers roughly 20,000 rows:

SELECT state_id, city, address FROM userinfo WHERE state_id = 5;

So how can you improve the performance of this query? The simplest way is to extend the index to (state_id, city, address) so that it covers the query:

ALTER TABLE userinfo DROP KEY state_id, ADD KEY state_id_2 (state_id, city, address);
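The covering effect can be observed with Python's sqlite3 standing in for MySQL (EXPLAIN QUERY PLAN is SQLite's rough analogue of MySQL's EXPLAIN, and this userinfo table is a made-up miniature of the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userinfo (state_id INTEGER, city TEXT, address TEXT)")
conn.execute("CREATE INDEX state_id_2 ON userinfo (state_id, city, address)")
conn.executemany(
    "INSERT INTO userinfo VALUES (?, ?, ?)",
    [(i % 10, f"city{i}", f"addr{i}") for i in range(100)],
)

# All selected columns live inside the index, so the plan never touches
# the table itself; SQLite reports this as a "COVERING INDEX" search.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT state_id, city, address FROM userinfo WHERE state_id = 5"
).fetchall()
print(plan[0][3])  # e.g. 'SEARCH userinfo USING COVERING INDEX state_id_2 (state_id=?)'
```

In MySQL, the same situation shows up as "Using index" in the Extra column of EXPLAIN output.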

The remedy for redundant and duplicate indexes is to locate them and delete them. You can find them by writing some fairly involved queries against the INFORMATION_SCHEMA tables.

But there are two simpler ways. Some of the views in common_schema by Shlomi Noach can locate them, as can pt-duplicate-key-checker from Percona Toolkit, which analyzes table structures to find redundant and duplicate indexes.

For large servers, the external tools are usually more appropriate: querying the INFORMATION_SCHEMA tables can itself cause performance problems when the server holds a large amount of data or a large number of tables.
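As a toy illustration of what such a tool does, here is a sketch against SQLite's catalog (sqlite_master and PRAGMA index_info standing in for INFORMATION_SCHEMA; the table and index names are made up) that groups indexes by their ordered column lists to expose duplicates:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.execute("CREATE INDEX idx_a ON t (a)")
conn.execute("CREATE INDEX idx_a_dup ON t (a)")   # duplicate of idx_a
conn.execute("CREATE INDEX idx_a_b ON t (a, b)")  # also makes idx_a redundant

# Group index names by their ordered column tuples; a shared key means duplicates.
by_cols = defaultdict(list)
for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 't'"
):
    cols = tuple(r[2] for r in conn.execute(f"PRAGMA index_info({name})"))
    by_cols[cols].append(name)

dups = {cols: sorted(names) for cols, names in by_cols.items() if len(names) > 1}
print(dups)  # {('a',): ['idx_a', 'idx_a_dup']}
```

Detecting *redundant* (prefix) indexes takes one more step: compare each column tuple against the leading slice of every longer one, as in the prefix-check sketch earlier.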

 

  • Index and lock overhead

InnoDB supports both row locks and table locks, and uses row locks by default, while MyISAM only has table locks. Indexes let queries lock fewer rows, which reduces lock contention, increases concurrency, and in turn improves query performance. So how do these locks get taken?

InnoDB row locks are taken during the index scan: they do their job when the storage engine itself can filter out the unwanted rows. If the index can't filter them out, the rows are passed up to the server layer, where the WHERE condition is applied; by then, locking those extra rows can no longer be avoided.

Some will ask: if the query goes through an index, hasn't it already minimized the locked rows? Why would rows get locked unnecessarily?

First, a B+Tree lookup is a range search; if the condition can't pin the range down precisely, the whole scanned range gets locked. For example, take a table with the composite index (is_role, role_name, role_status):

Suppose the scan covers rows 1 to 4 but row 3 doesn't actually match the query; row 3 is still locked. That is locking an invalid row.

It happens because the engine first walks the index range, then hands the rows to the server layer, and only afterwards applies the rest of the WHERE condition.

Any other transaction touching row 3 will now block on that lock.

So if your query locks 10,000 rows but only needs 100 of them, you're losing badly. Before MySQL 5.1, these locks were only released when the transaction committed; since 5.1, InnoDB can release the locks on rows filtered out at the server layer, but some lock conflicts still occur.

So what’s the solution?

Push the filtering down so it doesn't depend on the server layer's WHERE clause: keep the query to data the index lookup itself can resolve, which often means using a covering index.

  • Reduce index fragmentation

Before tackling index fragmentation, we first need to know what causes it.

  • A primary key too random to grow sequentially, such as a UUID

By default, InnoDB writes the B+Tree in primary-key order, which keeps the index compact and avoids splits. If your primary key is random, the data lands on disk discontinuously, pages split, and fragmentation accumulates.

  • Delete operation

Deletes cause fragmentation mainly by breaking the compactness of pages. When the database deletes data, it only sets a deletion mark inside the page; the data isn't actually erased from disk, similar to a soft delete. New data written later can directly overwrite the marked space, making better use of it. Now you know why, after you delete a record and insert a new one, the auto-increment primary key keeps growing rather than reusing the old value.

Given that, the page structure has to be reorganized to make the data compact again. You can run OPTIMIZE TABLE, or dump and re-import the table. For storage engines that don't support OPTIMIZE TABLE, rebuild the table with a no-op ALTER TABLE that simply sets the storage engine to its current value:

ALTER TABLE <table> ENGINE=<engine>;

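The reclaiming effect can be demonstrated with Python's sqlite3, where VACUUM plays the role of OPTIMIZE TABLE / the rebuilding ALTER (the file name and table here are made up for the demo):

```python
import os
import sqlite3
import tempfile

# SQLite analogue of OPTIMIZE TABLE: deleting rows only marks their space
# as reusable; VACUUM rebuilds the file so the pages are compact again.
path = os.path.join(tempfile.mkdtemp(), "frag_demo.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO t (payload) VALUES (?)",
    [("x" * 200,) for _ in range(5000)],
)
conn.execute("DELETE FROM t WHERE id > 100")  # most pages now hold only dead space
conn.commit()  # VACUUM cannot run inside an open transaction

before = conn.execute("PRAGMA freelist_count").fetchone()[0]
conn.execute("VACUUM")  # rebuild the database file compactly
after = conn.execute("PRAGMA freelist_count").fetchone()[0]

print("free pages before:", before, "after:", after)
```

Before the VACUUM, the freed pages sit on a free list inside the file (fragmentation); after it, the file is rewritten compactly and the free list is empty.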
  • Avoid casts

If the types on the two sides of a query condition don't match, the implicit cast can invalidate the index and cause a full table scan.

If the phone field is of type VARCHAR, the following SQL cannot use the index:

select * from user where phone=12345678888;

Can be optimized as:

select * from user where phone='12345678888';

Conclusion

  1. The basics of index optimization:

    1. MySQL's optimization cost is measured by the number of rows a query scans

    2. Index maintenance has its own cost

  2. The key to index optimization: when MySQL executes a query plan through an index, it first compares the index columns in order, then reads the matching data, and finally applies the remaining WHERE conditions.

  3. This article optimized at the level of the index's underlying structure; the next article will move on to SQL statement optimization, since in the end an index depends on how it is used.

Good articles keep coming. Each one is genuinely hard to write, but I work my way through it. A share or a follow is support. Thank you!