What is an index

An index is a structure that sorts the values of one or more columns of a database table. Using an index allows quick access to specific information in a database table.

For example, in elementary school, when we use a dictionary to search for words, we search according to the table of contents, which is the function of an index. Without a directory, we would have to start from the first character in order to find a character. So an index is a tool to speed up retrieval.

Index structure in InnoDB

So let’s go back to the B+ tree

Before introducing database indexes, let’s review B+ trees. Why are we reviewing them? Because InnoDB index implementation is in the form of B+ tree implementation.

You can check out the relevant materials for the concept of B+ trees. Here we review some of its characteristics suitable for use as database index structures

  1. The middle node of a k subtree contains K elements (k-1 elements in a B tree). Each element does not hold data, but is used only for indexing. All data is stored in the leaf node. Intermediate nodes do not store data. Therefore, more data is stored in the same space, and the index range is larger. Therefore, more information can be obtained in a single disk access, effectively reducing the I/O access to the disk. Because the data is stored only on the leaf node, the performance per lookup is very stable.
  2. All leaf nodes contain information of all elements and Pointers to records containing these elements, and the leaf nodes themselves are linked in large order according to the size of the key word. Mysql is a relational database, so interval access is a common scenario. Child nodes are linked in a linked list according to their size, which can effectively improve the efficiency of interval access.

Let’s take a look at the figure below to familiarize ourselves with the search process for B+ trees.

First disk access

Second disk access

Third disk access

The most basic index – primary key index

The data in the database is formed into a singly linked list by primary key in the data page. Each data page has a certain size, and if one data page is full, multiple data pages are needed to store data and link the data pages together through linked lists. Assume that a data page can now store up to two data. So let’s guess how the data might be stored in the database as follows.

But in the above structure our lookups can only be stored by starting a global scan of linked list points to find the data we need.

Can we increase the query efficiency by adding some indexes to the data structure to form a B+ tree? The answer, of course, is yes. So let’s convert it to the following figure.

Isn’t there two data stores on one page? Why are there three of them? Please ignore this detail because this is a B + tree, the middle node may store more data. (The root problem is that I was lazy and reused the original diagram)

The data of the above leaf nodes are arranged in the page according to the size of a certain value, which is usually the primary key of our database. When building a table, the primary key is usually the clustering index of the table, and the data in the table is arranged in the data page according to the clustering index from small to large.

Now let’s think about, if we insert a number of 8, what happens to this structure?

We first find the location where 8 should be stored and find that the page is full. In order to ensure the orderliness of our child nodes, we can only split the page. This will obviously reduce efficiency. Therefore, we usually use the database default increment sequence as the primary key. But a primary key index is not necessarily a clustered index. (I’ll leave this question to the students themselves.)

Normal index

A single-field index, by its name, means that only one field is indexed. A joint index is a combination of multiple fields. Let’s look at the table structure below

CREATE TABLE `tb_predict_user_info` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Add primary key ID, season ID',
  `user_id` bigint(20) NOT NULL DEFAULT '0' COMMENT 'user ID',
  `app_id` int(11) NOT NULL DEFAULT '0' COMMENT 'User's appID',
  `bonus_amount` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Total amount of user red envelope, in minutes',
  `is_del` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Soft delete flag,0: not deleted 1: deleted',
  `create_time` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) COMMENT 'Record creation time',
  `update_time` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3) COMMENT 'Record last update time',
  PRIMARY KEY (`id`),
  KEY `index_user` (`user_id`)
  KEY `index_user_app` (`app_id`,`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COMMENT='User Information Table';
Copy the code
  • Index_user is a single-field index
  • Index_user_app is a federated index

What is the storage structure in the database for the table above?

For each index, InnoDB builds a B+ tree to index the data. However, for space saving reasons, only the leaf nodes of the clustered index store the complete data, while the other leaf nodes of the B+ tree only store the data of the corresponding index field.

The row format in the database can be simplified as shown in the figure below

Record_type: indicates the data type

  • 0: indicates common user records
  • 1: records of directory entries
  • 2: minimum record
  • 3: indicates the maximum record

The index page is shown below

The data page is shown below

Use of indexes in queries in InnoDB

As mentioned above, data is stored in the database as a B+ tree. So how do we look up data?

Is the process of searching by primary key index simply understood as the following process?

  1. Suppose you find data with a primary key of 10000.
  2. Because the data on the same page is already sorted, binary lookup is possible within the page.
  3. Find the page with the next-level index by binary lookup in the root.
  4. And so on to find the leaf node.
  5. Locate the target data in the leaf node.

Innodb has as many B+ trees as it has indexes. How to solve this problem?

The answer, of course, is no repetition! Only the leaf node of the primary key index (clustered index) holds all the data, while other indexes do not hold all the data, only the value of the index and the primary key of the record.

What about the process of ordinary indexing?

This is similar to the primary key index search, except that the leaf node does not hold all the recorded data, so it needs to be searched again based on the primary key. This process is commonly referred to as back table (back table: to go back to the table for a second query).

How do you avoid returning to the table?

The leaf node does not store all the fields of the record, but does store the values of the index fields.

If all the fields in our query field can be overwritten by the index fields used, then we can avoid the back table.

Does the range query go to the index?

All we have said above is to specify a certain index condition for the query, but in daily development we will inevitably encounter a range query (>,<,! =) etc, search for relevant information on the Internet, some people say not to go to the index, some people say to go to the index (the network is open, we need to identify the authenticity of their own), then the scope of query in the end do not go to the index? If we use the index, what role does the index play in our query at this point?

So let’s go back to the B+ tree. Let’s think for a moment about what happens if you’re looking for something in a B+ tree with an index greater than 5000.

  1. Because in the DATABASE B+ tree, the data is arranged from the largest to the smallest.
  2. We look for the leaf node with the value id=5000, and then sequentially traverse the next pointer to find the values that match the criteria
  3. Find other fields by going back to the table.

SQL > select * from table_name; SQL > select * from table_name; SQL > select * from table_name; SQL > select * from table_name;

And then there’s something like! So does this go to the index? Let’s think about it, let’s find one through B+ tree! = How do I manipulate the data of a value? 1, find the value equal to this value first, then find the rest of the data?

Does it feel like a full table scan? We can assume that this operation is less efficient than a full table scan. No table uses an index, so this type of query does not use an index.

In addition to where conditions, indexes are used in order by fields.

Select * from tb_table order by id ASC limit 100 How to query this time which? 1, because the data in the table is sorted by ID from largest to smallest. 2. So we just need to find the one with the smallest ID, and then iterate through the values that match the number of conditions in sequence. 3. If you need to perform table back operations

Order by must be indexed?

Order by = order by = order by Select * from table order by userid desc, tradeDay ASC; select * from table order by userid desc, tradeDay ASC; Will the index still go at this time? The old way, let’s look at this query and see how we should do it in a B+ tree.

Select * from B+; select * from B+; select * from B+;

Let’s look at a situation

Select * from table where user_id =? Select * from table where trade_day =? Select * from table where user_id =? , the query condition is only trade_day but not user_id.

Will the above two queries go to the index? Again, we’re going to start with B plus trees. 1. Our index is sorted by user_id and trade_day. 2, our query condition is user_id, so as trade_day does not use, so not go to the index to query? 3, our query condition is trade_day, first according to…… Ahhhh, can’t get to index ME, I can’t think of ahhhh.

We use user_id and trade_day in the WHERE condition, but the order is reversed in the WHERE condition. Yes, the query optimizer will do that for us.

Let’s look at the table below

CREATE TABLE `tb_user_info` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Add primary key ID, season ID',
  `user_id` bigint(20) NOT NULL DEFAULT '0' COMMENT 'user ID',
  `user_id_c` varchar(200) NOT NULL DEFAULT ' ' COMMENT 'user ID',
  PRIMARY KEY (`id`),
  KEY `index_user` (`user_id`)
  KEY `index_userc` (`user_id_c`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COMMENT='User Information Table';
Copy the code

The above table has two main columns user_id of type int and user_ID_c of type vARCHar and related indexes. We have some data in our table

Let’s look at the following query statements. A:select * from tb_user_info where user_id = 0;

B:select * from tb_user_info where user_id = ‘0’;

C:select * from tb_user_info where user_id = ‘abc’;

D:select * from tb_user_info where user_id_c = 0;

E:select * from tb_user_info where user_id_c = ‘0’;

F:select * from tb_user_info where user_id_c = ‘abc’;

What is the result of the above 6 statements, and whether the above query is indexed? A: Select * from index_user where id=1;

B: Select * from index_user where id=1;

C: Select * from index_user where id=1.

D: The index_user_c index is used to query the data whose ID is 1

E: The index_user_c index is used to query the data whose ID is 1

F: The query result is empty and the index_user_c index is deleted

Is there a question here?

1, I have built the index, why sometimes go, sometimes not go?

= ‘ABC’ =0;

Here we introduce a concept: implicit conversion

Mysql > select * from ‘SQL’; select * from ‘SQL’; select * from ‘SQL’;

  1. When at least one of the two parameters is NULL, the comparison result is also NULL, except when the pair 1 is used. When you compare two nulls, it returns 1, and in both cases you don’t need to do any type conversion because both of the arguments are strings, so you compare them as strings, no type conversion
  2. Both parameters are integers and are compared as integers without type conversion
  3. Hexadecimal values are treated as binary strings when compared with non-numbers
  4. One parameter is TIMESTAMP or DATETIME, and the other parameter is a constant, which is converted to TIMESTAMP and one parameter is of type decimal, and if the other parameter is decimal or an integer, Integers are converted to decimal for comparison, or decimal to float for comparison if the other argument is a floating-point number
  5. In all other cases, the two arguments are converted to floating point numbers and compared

In view of the above principle we look at our question just now

1, I have built the index, why sometimes go, sometimes not go? Did you notice that all indexes are of the same type or that the database type is int? There is nothing wrong with a consistent type going to the index, so why is it possible not to go to the index after a type conversion? Float sort: 3, 21′ 21′,’3′,’3′ . So as I understand it, it’s easy to assume that when an implicit cast occurs, if the cast is of a database field type, then the index does not take effect. (I don’t know if that’s right, welcome Diss)

= ‘ABC’ =0; Type inconsistency will be converted ah! How much is’ ABC ‘converted to a number? It can’t be converted, of course it’s the default 0, so you get the idea.

Conclusion:

If the query condition is XXX and the index is XXX, how can I optimize the query?