Original address:www.cnblogs.com/bypp/p/7755…
Introduce a,
1. What is an index?
In general application systems, the read/write ratio is about 10:1, and insert operations and general update operations rarely appear performance problems. In the production environment, we encounter the most, but also the most prone to problems, or some complex query operations, so the optimization of query statements is obviously a top priority. When it comes to expediting queries, indexes have to be mentioned.
2. Why have an index?
An index, also known as a “key” in MySQL, is a data structure used by the storage engine to quickly find records. Indexes are critical for good performance, especially as the amount of data in a table increases, and their impact on performance becomes more important. Index optimization is probably the most effective way to optimize query performance. Indexes can easily improve query performance by several orders of magnitude. An index is equivalent to a list of words in a dictionary. If you want to look up a word, if you do not use a list of words, you need to look up hundreds of pages page by page.
Second, the principle of index
A principle of indexing
The purpose of an index is to improve search efficiency in the same way that we use a table of contents to consult a book: locate the chapter, then locate the section under that chapter, and then find the page number. Similar examples are: look up the number of trains, flights, etc
The essence is to filter out the desired results by constantly narrowing the range of data to be retrieved, while turning random events into sequential events. In other words, with this indexing mechanism, we can always lock the data in the same search method.
The same is true for databases, but it is obviously much more complex, because there are not only equivalent queries, but also range queries (>, <, between, in), fuzzy queries (like), union queries (OR) and so on. How should the database choose to deal with all these problems? If we think back to the dictionary example, can we break up the data into segments and query it in segments? The simplest is if 1000 pieces of data, 1 to 100 into the first paragraph, 101 to 200 into the second paragraph, 201 to 300 into the third paragraph…… If you look up the 250 data, just look for the third paragraph, and you eliminate 90% of the invalid data in one fell swoop. But what if it’s 10 million records? What’s a good number? Those of you with a little algorithmic foundation will think of search trees, which have an average complexity of lgN and good query performance. But here we are missing a key point, the complexity model is based on the cost of the same operation every time. And database implementation is more complex, on the one hand, the data is stored in the disk, on the other hand in order to improve the performance, every time can put some data into memory to calculate again, because we know that the cost of access disk access memory is probably about one hundred thousand times, so a simple search tree is difficult to meet the application of complex scenes.
Two disk IO and prefetch
Considering the disk IO is a very expensive operation, computer operating system made some optimization, when an I/o, not only the current disk address data, but the adjacent data are read into memory buffer, because local pre-reading sex principle tells us that when the computer access to an address data, the adjacent will soon be access to data. Each IO read is called a page. The specific data size of a page depends on the operating system, and it is usually 4K or 8K. In other words, when we read the data in a page, we actually only have one IO. This theory is very helpful for the data structure design of the index.
The data structure of the index
Any kind of data structure is not without foundation generation, will have its background and usage scenarios, we summarize, now we need to what could be done this kind of data structure, actually very simple, that is: each time to find the data to disk I/o control in a small number of orders of magnitude, it is best to constant orders of magnitude. So we wondered if a highly controllable multi-path search tree could meet the requirements? And so the B + tree was born.
As shown above, it is a b + tree, about the definition of b + tree can see b + tree, said only a few key here, light blue piece of what we call a disk block, you can see every disk block contains several data items (shown in blue) and pointer (shown in yellow), such as disk blocks 1 contains data item 17 and 35, contains a pointer (P1, P2, P3, P1 indicates a disk block smaller than 17, P2 indicates a disk block between 17 and 35, and P3 indicates a disk block larger than 35. Real data exists in leaf nodes 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99. Non-leaf nodes only store the real data and only the data items that guide the search direction, such as 17 and 35, which do not really exist in the data table.
# # # b + tree search process As shown, if you want to search a data item 29, so you can take 1 by the disk loaded into memory, disk blocks at this point one IO, using binary search in memory between 29 in the 17th and 35, locking disk 1 piece of P2 Pointers, memory for a very short time (compared to a disk I/o) can be neglected, By disk 1 piece of P2 pointer disk address the disk block 3 by disk loaded into memory, in the second IO, 29 between 26 and 30, locking disk blocks of the 3 P2 pointer, pointer loading disk blocks through eight into memory, in the third IO, binary search to find memory do 29 at the same time, the end of the query, a total of three IO. The reality is that a 3-tier B + tree can represent millions of data. If millions of data lookups take only three IO, the performance improvement is huge. If there is no index and each data item takes one IO, the total number of IO will be millions, which is obviously very, very expensive.
1. Index fields should be as small as possible: through the above analysis, we know that the number of I/O depends on the height of b+ number h, assuming that the current data table is N, the number of data items in each disk block is M, then h=㏒(m+1)N, when the amount of data N is certain, the larger M, the smaller H; The size of a disk block is the size of a data page. The size of a disk block is fixed. If the space occupied by data items is smaller, the number of data items is larger and the height of the tree is lower. That’s why each data item, or index field, should be as small as possible; for example, int is 4 bytes, less than half of bigInt8 bytes. This is why b+ trees require real data to be placed in leaf nodes rather than inner nodes, where the number of data items in a disk block drops dramatically, causing the tree to grow taller. When the data item is equal to 1 it will degenerate into a linear table. 2. The left-most matching feature of an index (i.e. matching from left to right) : When the data items of b+ tree are compound data structures, such as (name,age,sex), the search tree is built on the order of b+ numbers from left to right. For example, when retrieving data such as (Tom,20,F), the b+ tree will preferentially compare names to determine the next direction of search. If the name is the same, then age and sex are compared in turn to obtain the retrieved data. However, when data such as (20,F) without name comes, b+ tree does not know which node to search next, because name is the first comparison factor when the search tree is established, and search must be conducted according to name first to know where to search next. For example, when (three,F) data to search, b+ tree can use name to specify the search direction, but the next field age is missing, so we can only find the data whose name is equal to three, and then match the data whose gender is F, this is a very important property, that is, the left-most matching feature of the index.
Mysql > select * from ‘Mysql’
A, functionality,
Mysql > select * from primary key; mysql > select * from primary key; mysql > select * from primary keyCopy the code
MySQL > select * from ‘MySQL’
Index category 1. Common index index: accelerated search 2. Unique index Primary key Index: accelerated search + constraint (not empty and unique) Unique index: accelerated search + constraint (unique) 3. Joint index - Primary key(ID,name): joint primary key index - Unique (id,name): joint unique index -index(id,name): joint common index 4. Fulltext: Works best when searching for long articles. 5. Spatial indexCopy the code
For example, let's say you are making a membership card system for a store. This system has a membership table 4 with the following fields: 5 Member id INT 6 Member name VARCHAR(10) 7 Member ID card number VARCHAR(18) 8 Member phone number VARCHAR(10) 9 Member address VARCHAR(50) 10 Member remarks information TEXT 11 12 So this Select PRIMARY key, PRIMARY key, PRIMARY key, PRIMARY key, PRIMARY key, PRIMARY key, PRIMARY key, PRIMARY key, PRIMARY key, PRIMARY key FULLTEXT 17 member note information, if you need to build an index, you can choose full-text search. 18 works best when searching for long articles. 19 is used for short text, if only one or two lines, plain INDEX will work. 20 In fact, for full-text search, we will not use the index of MySQL. Instead, we will choose third-party software such as Sphinx to do full-text search exclusively. SPATIAL index: 21, 22 #Copy the code
The two main types of indexes are hash and btree
Select * from btree; select * from btree; select * from btree; Innodb supports transaction, row-level locking, b-tree, full-text, and other indexes, but does not support Hash indexes. MyISAM does not support transactions, table-level locking, b-tree, full-text and other indexes, but does not support Hash indexes. Memory does not support transactions, table-level locking, b-tree and Hash indexes, and full-text indexes. NDB supports transactions, row-level locking, and Hash indexes, but does not support b-tree and full-text indexes. Archive does not support transactions, table-level locking, and indexes such as B-tree, Hash, and full-text.Copy the code
Create/drop index syntax
Select * from table_name where table_name = '1' and table_name = '1'; , 4 field names 2 data types [integrity constraints...] , 5 [UNIQUE | FULLTEXT | SPATIAL] INDEX | KEY 6 [INDEX name] (field name [(length)] [ASC | DESC]) 7); 8 9 10 # method 2: CREATE to CREATE an INDEX ON the existing table 11 CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX INDEX name 12 ON the table name, field name [(length)] [ASC | DESC]); # 3: The ALTER TABLE create indexes on the existing TABLE 16 ALTER TABLE TABLE name ADD [UNIQUE | FULLTEXT | SPATIAL] 17 INDEX name INDEX (field name [(length)] [ASC | DESC]); 18 19 # DROP INDEX ON table name;Copy the code
Be apt to help document the help create a help the create index = = = = = = = = = = = = = = = = = = 1. Create table s1(id int,# primary key #id int index); create table s1(id int,# id int index); Select name char(20), age int, email varchar(30) #primary key(id) # index(id); Create index name on s1(name); Create unique age on s1(age); Alter table s1 add primary key(id); Create index name on s1(id,name); create index name on s1(id,name); Drop index id on s1; drop index id on s1; drop index name on s1; Drop index age on s1; Alter table s1 drop primary key; alter table s1 drop primary key; Alter primary key alter primary key alter primary keyCopy the code
Help to check the
Test indexes
1, preparation,
Create table s1(id int, name varchar(20), gender char(6), email varchar(50)); Delimiter $$# $CREATE PROCEDURE auto_insert1() BEGIN DECLARE I int default 1; while(i<3000000)do insert into s1 values(i,concat('egon',i),'male',concat('egon',i,'@oldboy')); set i=i+1; end while; END$$#$$END delimiter; Show create procedure auto_insert1\G #4. Call the stored procedure call auto_insert1();Copy the code
2. Test query speed without indexes
Mysql > select * from s1 where id=333; +------+---------+--------+----------------+ | id | name | gender | email | +------+---------+--------+----------------+ | 333 | egon333 | male | [email protected] | | 333 | egon333 | f | alex333@oldboy | | 333 | egon333 | f | alex333@oldboy | + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + rows in the set (0.32 SEC) mysql > select * from s1 the where email='egon333@oldboy'; . . Rows in set (0.36 SEC)Copy the code
3. Add indexes
Select * from t1 where age > 5 select * from t1 where age > 5 Create index idx on s1(id); create index idx on s1(id); create index idx on s1(id); The system scans all the data in the table and creates an index structure using the ID as the data item, which is stored in the table on the hard disk. Note that innoDB table indexes are stored in s1.ibd, while myISam table indexes have a separate file table1.MYICopy the code
6. Use indexes correctly
Overwrite indexesCopy the code
Select * from s1 where id=123; The SQL hits the index, but does not overwrite it. Use id=123 to locate the id's location on the disk, or in the table, in the data structure of the index. But if we select *, we need other fields besides id, which means that it's not enough to get the ID from the index structure, we need to use that ID to find the value of the other fields in the row where the ID is, and that takes time. Obviously, if we just select ID, we subtract the agony. Select id from s1 where id=123; This is the overwrite index, hit the index, and from the data structure of the index directly fetch id on the hard disk address, very fastCopy the code
Second, the joint index
Index merge
Create index ne on s1(name,email) create index ne on s1(name,email); Select * from s1 where name='egon'; select * from s1 where name='egon'; select * from s1 where name='egon' and email='adf'; Select * from s1 where name='egon'; select * from s1 where email='adf'; select * from s1 where name='egon' and email='adf'; At first glance, it may seem that index merges are better: you can hit more cases, but in fact, if name='egon' and email='adf', combined indexes are more efficient than index merges, and if single criteria are used, index merges make more senseCopy the code
If we want to use indexes to achieve the desired effect of increasing query speed, we must follow the following principles when adding indexes
Copy the code
Create index ix_name_email on s1(name,email,) Select * from s1 where name='egon'; Select * from s1 where name='egon' and email='asdf'; Select * from s1 where email='[email protected]'; Mysql > select * from (a,b,c,d); mysql > select * from (a,b,c,d); mysql > select * from (a,b,c,d); If the index (a,b,d,c) can be used, the order of a,b,d can be arbitrarily adjusted. #2.= and in can be out of order, such as a = 1 and b = 2 and c = 3. Create (a,b,c) indexes in any order, mysql's query optimizer will help you optimize to the form #3 that the indexes recognize. Try to select columns with high distinction as indexes. The formula of distinction is count(DISTINCT Col)/count(*), which indicates the proportion of fields that are not duplicated. The larger the ratio is, the fewer records we scan. So one might ask, is there any empirical value to this ratio? This value is also difficult to determine in different application scenarios. Generally, we require more than 0.1 join fields, that is, an average of 10 records per scan #4. From_unixtime (create_time) = '2014-05-29'; from_unixtime(create_time) = '2014-05-29'; from_unixtime(create_time) = '2014-05-29'; Obviously the cost is too great. Create_time = unix_timestamp(' 2014-05-29 ');Copy the code
Left-most prefix demonstration
mysql> select * from s1 where id>3 and name='egon' and email='[email protected]' and gender='male'; Empty set (0.39 SEC) mysql> create index idx on s1(id,name,email,gender); Query OK, 0 rows affected (15.27 SEC) Records: 0 Duplicates: 0 Warnings: 0 mysql> select * from s1 where id>3 and name='egon' and email='[email protected]' and gender='male'; Empty set (0.43 SEC) mysql> drop index idx on s1; Query OK, 0 rows affected (0.16 SEC) Records: 0 Duplicates: 0 Warnings: 0 mysql> create index idx on s1(name,email,gender,id); Query OK, 0 rows affected (15.97 SEC) Records: 0 Duplicates: 0 Warnings: 0 mysql> select * from s1 where id>3 and name='egon' and email='[email protected]' and gender='male'; The Empty set (0.03 SEC)Copy the code
Index (id,age,email,name) 4 id 5 ID age 6 ID email 7 ID name 8 9 email 10 mysql> select count(*) from s1 where id=3000; 11 + -- -- -- -- -- -- -- -- -- -- + 12 | count (*) | 13 + -- -- -- -- -- -- -- -- -- -- 15 + + 14 | 1 | -- -- -- -- -- -- -- -- -- -- + 16 1 row in the set (0.11 SEC) 17 18 mysql > create index xxx on s1(id,name,age,email); 19 Query OK, 0 rows affected (6.44 SEC) 20 Records: 0 Duplicates: 0 Warnings: 0 21 22 mysql> select count(*) from s1 where id=3000; 23 + -- -- -- -- -- -- -- -- -- - + 24 | count (*) | 25 + -- -- -- -- -- -- -- -- -- -- + | | 1 26 and 27 + -- -- -- -- -- -- -- -- -- -- + 28 1 row in the set (0.00 SEC) 29 30 mysql > select count(*) from s1 where name='egon'; 31 + -- -- -- -- -- -- -- -- -- -- + | 32 count (*) 33 | + -- -- -- -- -- -- -- -- -- -- 35 + + 34 299999 | | -- -- -- -- -- -- -- -- -- -- + 36 1 row in the set (0.16 SEC), 37, 38 mysql > select count(*) from s1 where email='[email protected]'; 39 + -- -- -- -- -- -- -- -- -- -- + | 40 count (*) | 41 + -- -- -- -- -- -- -- -- -- - 43 + + 42 | 1 | -- -- -- -- -- -- -- -- -- -- + 44 1 row in the set (0.15 SEC) 45 46 mysql > select count(*) from s1 where id=1000 and email='[email protected]'; 47 + -- -- -- -- -- -- -- -- -- -- + 48 | count (*) 49 | + -- -- -- -- -- -- -- -- -- -- + | 0 | 51 + 50 -- -- -- -- -- -- -- -- -- -- + 52 1 row in the set (0.00 SEC) 53, 54 mysql > select count(*) from s1 where email='[email protected]' and id=3000; 55 + -- -- -- -- -- -- -- -- -- -- + 56 | count (*) | 57 + -- -- -- -- -- -- -- -- -- -- + | | 0 58 59 + -- -- -- -- -- -- -- -- -- -- + 60 1 row in the set (0.00 SEC)Copy the code
If an index cannot be hit, note:
- like '%xx' select * from tb1 where email like '%cn'; Select * from tb1 where reverse(email) = 'wupeiqi'; - or select * from tb1 where nid = 1 or name = '[email protected]'; Select * from tb1 where nid = 1 or name = 'seven'; select * from tb1 where name = 'seven'; Select * from tb1 where nid = 1 or name = '[email protected]' and email = 'alex' select * from tb1 where email = 999; Normal index is not equal to will not go index -! = select * from tb1 where email ! Select * from tb1 where nid! = 123 - > select * from tb1 where email > 'alex' Select * from tb1 where nid > 123 select * from tb1 where num > 123 select * from tb1 where num > 123 - order by select name from s1 order by email desc; Select * from s1 order by email desc; select * from s1 order by email desc; Select * from tb1 order by nid desc; select * from tb1 order by nid desc; - Left-most prefix of combined index if combined index is: (name,email) name and email -- use index name -- use index email -- do not use index -count (1) or count(column) instead of count(*) -- create index XXXX on TB (title(19)) #textCopy the code
- Avoid using select * -count (1) or count(column) instead of count(*) - when creating a table char instead of vARCHar when possible - table field order fixed-length fields preference - Composite index instead of multiple single-column indexes (often when using multiple conditional queries) - - Use a short index - Use a JOIN instead of a sub-query - Note that the condition types must be the same when joining tables - Index hash values (with few repetitions) are not suitable for index construction. For example, gender is not suitableCopy the code
Basic steps of slow query optimization
Run SQL_NO_CACHE to check whether the cache is slow. Set SQL_NO_CACHE to 1. Select * from table where the smallest number of entries is returned. Select * from table where the smallest number of entries is returned. Select * from table where the smallest number of entries is returned. Mysql > alter table SQL > alter table SQL > alter table SQL > alter table SQL > alter table SQL > alter table SQL > order by limit 5. Refer to the principles for index construction when adding indexes. 6Copy the code