The index

There are several common types of indexes

The common types of indexes are hash index, ordered array index, binary tree index, hop table and so on. This paper focuses on the index structure of InnoDB, the default storage engine of MySQL.

InnoDB index structure

In InnoDB, the index structure is implemented through a multi-way search tree, B+ tree. In a B+ tree only leaves store data, and all leaves form a linked list. InnoDB maintains a bidirectional linked list.

You might be wondering, why use B+ trees instead of binary trees or B trees?

First of all, we know that accessing a disk requires accessing a specified block, and accessing a specified block requires disk rotation and arm movement. This is a time-consuming process. Increasing the tree height means that you need to make more disk accesses, so the n-fork tree is used. The reason for using a B+ tree is that a range lookup will be re-searched every time, and a B+ tree can make full use of the linked list of leaf nodes.

When you build a table, you may add multiple indexes, and InnDB will create a B+ tree for each index to store indexes.

For example, at this time we set up a simple test table

create table test(
  id int primary key,
  a int not null.name varchar.index(a)
)engine = InnoDB;
Copy the code

At this point InnDB will create two B+ index trees for us

One is the primary key cluster index, and the other is the secondary index of the normal index. Here I will directly post the above texture of MySQL Brief article (because I am too lazy to draw a picture).

You can see that the value of the leaf node above the secondary index only stores the value of the primary key, while the value of the leaf node above the cluster index of the primary key stores the value of the entire record.

Back to the table

So this is where the concept of a back table comes into play, where we’re doing a query

select name from test where a = 30;
Copy the code

MySQL > select * from a; MySQL > select * from a; MySQL > select * from a; MySQL > select * from A; MySQL > select * from A; MySQL > select * from A; MySQL > select * from A;

So let’s summarize what is the return table? MySQL calls back the table when it finds the corresponding primary key on the secondary index and uses the primary key to find the desired data on the cluster index.

Index maintenance

We know that the index takes up space, index can improve our query speed but also can not be abused.

For example, if we use the id number as the primary key in the user table, the leaf node of each secondary index takes about 20 bytes, compared to 4 bytes for the integer primary key and 8 bytes for the bigint. That is, if I maintain an index list of 4 Gigabytes followed by an integer, it will be 20 gigabytes with id.

So we can reduce the index space by reducing the index size.

Of course, B+ trees undergo some necessary maintenance when deleted or inserted to maintain index orderliness (in InnoDB deletion marks nodes as “reusable” to reduce structural changes).

Such as an increase in a node may encounter data page is full, this time you need to do pages divided, it is a relatively lengthy work, and split can also lead to data pages utilization become low, such as the original store three data when data page again to add a page needs to be done, At this point, the existing four data are divided into two data pages, which reduces data page utilization.

Cover index

In some cases, InnoDB performs an operation called overwrite index to improve efficiency and reduce table returns.

Let’s say we do a select operation here

select id from test where a = 1;
Copy the code

At this time, it is obvious that we can directly obtain the value of ID by using the index of A, so there is no need to return the table, we use the overwrite index at this time.

Simply put, an overwrite index is an operation that allows us to get the data we need when we go to the secondary index without having to go back to the table again.

Joint index

Now let’s create a new student table

CREATE TABLE `stu` (
  `id` int(11) NOT NULL.`class` int(11) DEFAULT NULL.`name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `class_name` (`class`.`name`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 
Copy the code

We use class(class number) and name to create a joint index. What is the use of this joint index, you may ask? We can understand it by combining the above coverage index. For example, at this time, we have a demand that we need to find the corresponding student name by the class number.

select name from stu where class = 102;
Copy the code

At this time, we can directly find the student name in the secondary index without the need to return to the table.

In general, well-designed indexes and full use of overridden indexes can greatly improve the speed of retrieval.

Left-most prefix rule

This is based on the joint index, which is a matching rule of the joint index.

At this time, we changed the above requirements slightly. At this time, we had a student who was late, but he only wrote his name Zhang SAN instead of his class when the guard recorded the information, so we need to find the corresponding class number through the student’s name.

select class from stu where name = 'Joe';
Copy the code

At this point, instead of using our federated index, we do a full table scan.

Why is that? Because of the leftmost matching rule. We can draw a simple picture to understand this.

And of course the leftmost matching rule has these rules

  • The optimizer will change the order of a full value match, which means that the order of your full value match is not in the same order as the original union index, and the optimizer will adjust it for you.
  • Index matching starts at the leftmost point, if not, a full table scan is performed. For example, if you design a joint index (a,b, C), then you can use (a),(a,b),(a,b,c) and you can use (b),(b,c),(c) without the index.
  • Index is deindexed when a range match is encountered. Let’s say you do a select operation like this
select * from stu where class > 100 and name = 'Joe';
Copy the code

At this point InnoDB will drop the index and perform a full table scan, because InnoDB will not know how to traverse the index, so it will perform a full table scan.

An index pushdown

I dug a hole for you. Back to the table was required prior to MySQL5.6, but an optimization called index push-down was implemented later.

select * from stu where class > 100 and name = 'Joe';
Copy the code

How do you optimize it? We abandoned the index because of the left-most matching rule, and then we will check the name by returning to the table. At this time, we should do something like this

But with the index push down, it looks like this, and then li Si and Xiao Ming don’t go back to the table anymore.

InnoDB will still push down indexes to optimize performance if the left-most matching rule is terminated due to range queries.

Some best practices

When do you need to create an index?

  • Indexes should be created for fields that are frequently queried.
  • When a query is associated with multiple tables, the associated fields should be indexed.
  • The sort field in the query should be indexed.
  • Statistics or grouping fields need to be indexed.

Which cases do not need to create indexes

  • Table records are small.
  • Often add, delete, change and check the table.
  • Fields that are frequently updated.
  • The WHERE condition uses a low field.
  • When the field is large.

other

  • Try to select a highly differentiated column as the index.
  • Avoid functional manipulation of indexes, and pay attention to implicit type conversions and character encoding conversions.
  • Expand the index as much as possible, do not create new indexes. For example, if you want to add (a,b) to a table that already has an index of A, you only need to modify the original index.
  • Think overwrite index, index push down, left-most match.

The lock

Global lock

MySQL provides a method for adding a global read lock by using Flush tables with read lock (FTWRL). You can use this command when you need to make the entire library read-only, after which the following statements from other threads will be blocked: data update statements (adding, deleting, or modifying data), data definition statements (including table creation, table structure modification, etc.), and update class transaction commit statements.

This is typically used during a full logical backup to ensure that no other thread can update the database.

In MVCC, the operation to get a consistent view is provided to make backup very easy. If you want to learn more about MVCC, you can refer to my other article. Do you really know MVCC? How about doing it by hand? .

Table locks

Meta Data Lock (MDL) Metadata Lock

The MDL lock is used to ensure that only one thread can make table structure changes to the table.

How can I put it? MDL is divided into MDL write lock and MDL read lock. The locking rule is as follows

  • Added when a thread performs CRUD operations on a tableMDL read lock
  • Added when a thread makes a table structure change operation on a tableMDL write lock
  • Write locks and read locks, write locks and write locks are mutually exclusive, read locks are not mutually exclusive

lock tables xxx read/write;

Lock tables T1 read, T2 write; lock tables T1 read, T2 write; This statement blocks statements that write to or write to T1 or T2 from other threads. In addition, thread A can only read T1 and read T2 before executing unlock tables. It is not allowed to write to T1, so it is not allowed to access other tables.

This table lock is a way to handle concurrency, but row locks are commonly used in InnoDB.

Row locks

We know that MySQL’s default storage engine prior to version 5.5 was MyISAM, and there are two major differences between MyISAM and InnoDB

  • The transaction
  • Row locks

Row locks are our topic for today. If you don’t know about transactions, you can learn about them.

The implementation lock is two locks, you can think of as write lock (exclusive lock X lock) and read lock (shared lock S lock)

  • Shared lock (S lock) : An exclusive lock that allows one transaction to read a row, preventing other transactions from acquiring the same data set. Also called read lock: A read lock is shared. Multiple customers can read the same resource at the same time, but other customers are not allowed to modify it.

  • Exclusive locks (X locks) : Allows transactions to acquire exclusive locks on update data, preventing other transactions from acquiring shared read locks and exclusive write locks on the same data set. Also called write locks: Write locks are exclusive and block other write and read locks.

But line lock can also cause a very headache problem, that is deadlocks.

If transaction A writes to row 100 and transaction B writes to row 101, transaction A wants to modify row 101 and transaction B wants to modify row 100, then possession and wait leads to deadlock problems, and deadlock problems can only be detected and prevented.

Next, the key lock

MVCC and row locking cannot solve the illusion problem, so InnoDB uses a GAP lock (GAP lock), which works with row locking to form a next-key lock to solve the illusion problem.

However, because of its locking rules, some locking scope is extended to reduce the concurrency of the database. The specific locking rules are as follows:

  • The basic unit of lock is next-key lock is a combination of line lock and GAP lock.
  • Objects accessed during the lookup are locked.
  • A next-key lock is degraded to a row lock when a unique index is locked.
  • The next-key lock is degraded to a gap lock when the last value does not meet the equivalence condition.
  • A range query on a unique index accesses up to the first value that does not satisfy the condition.

MVCC to solve the illusion of the idea is more complex, here do not do too much verification.

conclusion

I’ve given you a lot of best practices for MySQL indexes, and they’re all based on principles, and InnoDB is basically a modified VERSION of the B+ tree, and the structure for storing indexes. Know these things and you’ll be in your element.

InnoDB actually uses row locks, MVCC locks and next-key locks to achieve transaction concurrency control.

For MySQL, the most important is lock and index, because the content is too much, this article only do some introduction and simple analysis, if you want to know more about the corresponding article can be viewed.