Preface
The first time I came into contact with MySQL was three years ago. At that time, MySQL was still in version 4.x and lacked many features: stored procedures, views, and triggers, not to mention complex capabilities such as distributed transactions. But since 5.0 (October 2005), MySQL has gradually moved into the enterprise database category. Enterprise-class features such as replication, clustering, partitioning, and distributed transactions make MySQL fully applicable to enterprise application environments today (many Internet companies use it as their database server; cost saving is a factor, but that would be unimaginable without solid capabilities behind it). Although MySQL still has shortcomings, such as limited support for replication, partitioning, and query optimization, it is a good enough DBMS, and it is open source. Recently, out of curiosity, I did some research on MySQL and accumulated some material, which I intend to divide into two parts: the first part mainly discusses MySQL optimization; if there is time, a later part will analyze the MySQL source code. If you are a MySQL expert, I hope you will not hesitate to offer corrections; if you are new to MySQL, I hope this is useful to you.
MySQL architecture and concepts
MySQL logical architecture
The top layer is not specific to MySQL; all network-based client/server applications need connection handling, authentication, security management, and so on. The middle layer is the core of MySQL, including query parsing, analysis, optimization, and caching. It also provides cross-storage-engine functionality such as stored procedures, triggers, and views. At the bottom are the storage engines, which are responsible for storing and retrieving data. The server interacts with the various storage engines through the storage engine API.
Query Optimization and Execution
MySQL parses a user's query into an internal data structure, the parse tree, and applies various optimizations, such as rewriting the query, choosing the order in which to read tables, and selecting which indexes to use. The query optimizer does not care which storage engine a table uses, but the storage engine does influence how the server optimizes the query: the optimizer asks the storage engine for parameters, the execution cost of an operation, statistics, and so on. Before parsing a query, the server checks the query cache, which stores SELECT statements and their corresponding result sets. If a query's result is already in the cache, the server skips parsing, optimization, and execution entirely and simply returns the cached result to the user, which greatly improves system performance.
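As a quick illustration, you can inspect the cache configuration and its hit statistics (these variable and status names are those of MySQL 5.x; later versions removed the query cache):

SHOW VARIABLES LIKE 'query_cache%';  -- cache size, type, and limits
SHOW STATUS LIKE 'Qcache%';          -- hits, inserts, free memory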
MySQL provides concurrency control at two levels: the server level and the storage engine level. (1) Table-level locking: MySQL provides table locks independently of the storage engine; for an ALTER TABLE statement, for example, the server takes a table-level lock. (2) Row-level locking: the InnoDB and Falcon storage engines provide row-level locking, and BDB supports page-level locking. InnoDB's concurrency control mechanism is discussed in detail in the next section. It is also worth mentioning that some MySQL storage engines (such as InnoDB and BDB) do not rely on locking alone but combine it with an MVCC mechanism, a multiversion two-phase locking protocol, to implement transaction concurrency control; read-only transactions then need not wait for locks, which improves concurrency. Note: concurrency control is one of the core technologies of a DBMS (and indeed of an OS) and has a critical impact on system performance; it is discussed in more detail later.
In MySQL, InnoDB and BDB support transaction processing. This part mainly discusses InnoDB transaction processing (BDB's transaction processing is also quite complex; I have read its source code in detail and may discuss it later). A transaction is a logical unit of processing made up of a group of SQL statements. A transaction has the following four properties, usually called the ACID properties (Jim Gray discusses transactions in detail in Transaction Processing: Concepts and Techniques).
(1) Atomicity: a transaction is an atomic unit of work; either all of its modifications to the data are performed, or none of them are.
(2) Consistency: data must be in a consistent state at both the beginning and the end of a transaction. This means all relevant integrity rules must hold for the transaction's modifications, and at the end of the transaction all internal data structures (such as B-tree indexes or doubly linked lists) must also be correct.
(3) Isolation: the database system provides an isolation mechanism that lets transactions execute in an "independent" environment, unaffected by concurrent operations. Intermediate states of a transaction are invisible to the outside world, and vice versa.
(4) Durability: once a transaction completes, its modifications to the data are permanent and survive even system failure.
Because transactions execute concurrently, the following well-known problems can arise:
(1) Lost updates: when two or more transactions select the same row and then update it based on the originally selected value, lost updates occur because each transaction is unaware of the others; the last update overwrites the updates made by the other transactions.
(2) Dirty reads: a transaction is modifying a record, and before it completes and commits, the record is in an inconsistent state; another transaction then reads the same record. Left uncontrolled, the second transaction reads this "dirty" data and may process it further, creating a dependency on uncommitted data. The phenomenon is aptly called a "dirty read".
(3) Non-repeatable reads: a transaction re-reads data it has read before, only to find that the data has been changed or that some records have been deleted. This is called a non-repeatable read.
(4) Phantom reads: a transaction re-runs a query with the same conditions as before, only to find that another transaction has inserted new rows satisfying those conditions. This is called a "phantom read".
1.3.3 Transaction isolation levels. The SQL2 standard defines four isolation levels, set with the statement:
SET TRANSACTION ISOLATION LEVEL [READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SERIALIZABLE]
These differ somewhat from the isolation degrees proposed by Jim Gray: READ UNCOMMITTED corresponds to degree 1 (browse); READ COMMITTED to degree 2 (cursor stability); REPEATABLE READ to degree 2.9999 (isolation without phantom protection); and SERIALIZABLE to degree 3 (full isolation). The SQL2 standard defaults to full isolation (degree 3). The problems possible at each level are as follows:
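The table that followed here was an image; the standard SQL92 summary of which problems each level permits is:

Isolation level     Dirty read      Non-repeatable read     Phantom read
READ UNCOMMITTED    possible        possible                possible
READ COMMITTED      not possible    possible                possible
REPEATABLE READ     not possible    not possible            possible
SERIALIZABLE        not possible    not possible            not possible

In MySQL the level can also be set per session, for example:

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;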
By comparison, Oracle provides only two of the standard isolation levels, READ COMMITTED and SERIALIZABLE, plus its own READ ONLY level. SQL Server supports a "snapshot" isolation level in addition to the four defined by ISO/ANSI SQL92, though strictly speaking it is an MVCC-based implementation that approximates SERIALIZABLE. MySQL supports all four isolation levels; the default is REPEATABLE READ, though the implementation has its own characteristics, such as MVCC consistent reads at some isolation levels. The Chinese database DM also supports all four levels, with READ COMMITTED as its default.
There are two types of InnoDB row-level locks:
(1) Shared lock (S): allows a transaction to read a row, and prevents other transactions from obtaining an exclusive lock on the same data.
(2) Exclusive lock (X): allows a transaction to update or delete a row, and prevents other transactions from obtaining either a shared read lock or an exclusive write lock on the same data. In addition, InnoDB supports multiple granularity locking, which allows records and whole tables to be locked at the same time. To make this work, InnoDB introduces intention locks on tables: (1) Intention shared lock (IS): the transaction intends to set shared locks on rows; before it can set an S lock on a row, it must acquire an IS lock on the table. (2) Intention exclusive lock (IX): the transaction intends to set exclusive locks on rows; before it can set an X lock on a row, it must acquire an IX lock on the table. For example: SELECT ... LOCK IN SHARE MODE sets S locks on rows of table T, so the transaction must first acquire an IS lock on T; SELECT ... FOR UPDATE sets X locks, so the transaction must first acquire an IX lock on T. InnoDB's lock compatibility matrix is as follows:
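The matrix itself was an image in the original; as documented in the MySQL manual, it is:

        X           IX          S           IS
X       conflict    conflict    conflict    conflict
IX      conflict    compatible  conflict    compatible
S       conflict    conflict    compatible  compatible
IS      conflict    compatible  compatible  compatible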
If a transaction's requested lock mode is compatible with the lock currently held, InnoDB grants the lock; otherwise the transaction waits for the current lock to be released. Intention locks only block requests by other transactions for whole-table locks (for example, LOCK TABLES ... WRITE); their main purpose is to signal that the transaction is locking, or is about to lock, records in the table. Deadlock is an important problem for lock-based concurrency control. Consider an example of a deadlock:
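The example was an image in the original; a minimal two-session sketch that produces a deadlock (assuming a table t with rows id = 1 and id = 2):

-- session 1
BEGIN;
SELECT * FROM t WHERE id = 1 FOR UPDATE;  -- X lock on row 1

-- session 2
BEGIN;
SELECT * FROM t WHERE id = 2 FOR UPDATE;  -- X lock on row 2

-- session 1
UPDATE t SET c = c + 1 WHERE id = 2;      -- blocks, waiting for session 2

-- session 2
UPDATE t SET c = c + 1 WHERE id = 1;      -- would wait for session 1: deadlock
-- InnoDB detects the cycle and rolls one transaction back with
-- ERROR 1213 (40001): Deadlock found when trying to get lock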
Non-locking consistent read is one of MySQL's important features. InnoDB uses its MVCC mechanism to present a query with a snapshot of the database at a point in time: the query sees the changes made by transactions that committed before that point, and none of the changes made by later or uncommitted transactions. However, the query does see changes made by earlier statements within its own transaction. For example:
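The example was an image; a reconstruction under REPEATABLE READ, assuming a table t that is initially empty:

-- session 1
BEGIN;
SELECT * FROM t;   -- empty set; this first read establishes the snapshot

-- session 2
INSERT INTO t VALUES (1);
COMMIT;

-- session 1
SELECT * FROM t;   -- still empty: the committed insert is invisible
COMMIT;
SELECT * FROM t;   -- returns the row: a new transaction takes a new snapshot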
If the transaction isolation level is REPEATABLE READ (the default), all consistent reads within a transaction read the snapshot established by the transaction's first read. You can obtain a fresher snapshot by committing the current transaction and issuing the query again, as shown above. If the transaction isolation level is READ COMMITTED, each consistent read within a transaction sets and reads its own fresh snapshot, as follows:
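Again a reconstruction of the missing figure, this time at READ COMMITTED, with t initially empty:

-- session 2
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;
SELECT * FROM t;   -- empty set

-- session 1
INSERT INTO t VALUES (1);
COMMIT;

-- session 2
SELECT * FROM t;   -- returns the row: each read takes a fresh snapshot
COMMIT;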
Note that session 2's reads are not repeatable. Consistent read is the default mode in which InnoDB processes SELECT statements at the READ COMMITTED and REPEATABLE READ isolation levels. A consistent read takes no locks on the tables it accesses, so other connections are free to modify those tables at the same time. At the REPEATABLE READ level, a consistent read gives the query a view of the database as of a point in time: if another transaction deletes a row and commits after that point, you do not see the row as deleted, and inserts and updates are treated the same way. Unlike some other DBMSs, InnoDB thereby avoids phantoms at the REPEATABLE READ isolation level. Consistent reads do not work over DROP TABLE or ALTER TABLE. Also, if the innodb_locks_unsafe_for_binlog variable is set and the transaction isolation level is not SERIALIZABLE, then INSERT INTO ... SELECT, UPDATE ... (SELECT), and CREATE TABLE ... SELECT statements that do not specify FOR UPDATE or LOCK IN SHARE MODE use consistent reads, in which case the query does not lock tuples of the source table; otherwise, InnoDB uses locks.
1.3.6 SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE. In some cases a consistent (non-locking) read is not what you want; you can then use a locking read instead. InnoDB supports two types of locking reads:
(1) SELECT ... LOCK IN SHARE MODE: sets an S lock on the tuples it reads.
(2) SELECT ... FOR UPDATE: sets an X lock on the index records it scans, blocking other connections' SELECT ... LOCK IN SHARE MODE statements and, at certain transaction isolation levels, their reads. InnoDB uses a two-phase locking protocol: all locks are held until the transaction commits or rolls back, and they are then released automatically by the system. In addition, MySQL supports LOCK TABLES and UNLOCK TABLES, but these are implemented in the server layer, not in the storage engine. They have their uses, but they are no substitute for a transactional storage engine; if you need transaction capability, use a transactional storage engine. To see why locking reads are needed, suppose you want to insert a new tuple into a child table and ensure that it has a parent record in the parent table. If you use a consistent read to check the parent table, you may indeed see the parent row, but can you safely insert the child row? No, because between your read of the parent table and your insert, another connection might delete the parent record. (A consistent read is consistent only within the transaction; with respect to the current database state it might well be called an "inconsistent read".) In this case you should use SELECT ... LOCK IN SHARE MODE, which sets an S lock on the tuples it reads, preventing other connections from deleting or updating them. Alternatively, if you want to read and then update the same rows, use SELECT ... FOR UPDATE, which reads the latest data and sets an X lock on the tuples it reads. In that case SELECT ... LOCK IN SHARE MODE is a bad idea: if two transactions both do so and then try to update, a deadlock results. Note: SELECT ... FOR UPDATE locks tuples only when autocommit is off (that is, under manual commit); with autocommit on, the matching tuples are not locked.
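A minimal sketch of the parent/child scenario just described (table and column names are illustrative, not from the original):

BEGIN;
SELECT * FROM parent WHERE id = 42 LOCK IN SHARE MODE;   -- S lock: parent cannot be deleted
INSERT INTO child (parent_id, name) VALUES (42, 'c1');   -- now safe
COMMIT;                                                  -- locks released on commit

-- if the parent row must also be updated (say, a child counter),
-- take the X lock up front to avoid the deadlock described above:
BEGIN;
SELECT n_children FROM parent WHERE id = 42 FOR UPDATE;
UPDATE parent SET n_children = n_children + 1 WHERE id = 42;
INSERT INTO child (parent_id, name) VALUES (42, 'c2');
COMMIT;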
InnoDB has the following types of row-level locks:
(1) Record lock: locks an index record. InnoDB implements row-level locking by locking index entries rather than the record instances themselves. If a table has no index defined, InnoDB creates a hidden clustered index and uses it to lock records (more on the relationship between indexes and locks in the next chapter).
(2) Gap lock: locks the gap between index records, or the gap before the first index record or after the last one.
(3) Next-key lock: a record lock on the index record combined with a gap lock on the gap before it. By default, InnoDB transactions run at the REPEATABLE READ isolation level with the innodb_locks_unsafe_for_binlog system variable disabled. In this case, InnoDB uses next-key locks for searches and index scans, which prevents phantoms. When InnoDB searches or scans a table's index, it sets S or X locks on the index records it encounters, so its row-level locks are really index-record locks; in addition, it locks the gap before each of those index records. That is, a next-key lock is an index-record lock plus a gap lock on the gap preceding the index record. If one connection holds an S or X lock on record R of an index, no other connection can insert a new index record immediately before R in index order. Suppose an index contains the values 10, 11, 13, and 20. Its next-key locks cover the following intervals (a round bracket marks an open endpoint, a square bracket a closed one): (negative infinity, 10], (10, 11], (11, 13], (13, 20], (20, positive infinity). For the last interval, the next-key lock covers the gap above the largest value in the index, up to the "supremum" pseudo-record, whose value is higher than any value actually in the index; the supremum is not a real index record, so this next-key lock in effect locks only the gap following the largest index value. A gap lock is not needed when querying a unique value on a unique index. For example, if the id column has a unique index, the following query uses only an index-record lock on the tuple with id = 100 and does not prevent other connections from inserting into the preceding gap: SELECT * FROM child WHERE id = 100; If id is not indexed, or the index is not unique, the statement also locks the preceding gap.
In the example above, the phantom problem can arise. If the unique lookup is changed to a range query, the result is as follows (using the index from the example):
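The original showed two sessions as an image; a sketch consistent with the commentary below (assume t has a column i with a non-unique index i_index and contains a row with i = 4):

-- session 2 (REPEATABLE READ)
BEGIN;
SELECT * FROM t WHERE i >= 4 FOR UPDATE;  -- next-key locks around i = 4

-- session 1
INSERT INTO t VALUES (3);   -- blocks: the gap before 4 is locked
INSERT INTO t VALUES (5);   -- blocks: the gap after 4 is locked too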
As you can see, session 2's next-key locks block all inserts before and after i = 4. Furthermore, if the index i_index is dropped, InnoDB has to scan the hidden clustered index instead and locks every record it examines, so inserts anywhere in the table are blocked.
In addition, for INSERT, transactions need not wait for one another as long as they are not inserting at the same position within the same index gap; this is achieved through insertion intention gap locks. For example, if an index contains the values 4 and 7, and two independent transactions insert 5 and 6 respectively, then although both take a gap lock on the gap between 4 and 7, they do not block each other. This improves insert concurrency.
If you set the transaction isolation level to READ COMMITTED or enable the innodb_locks_unsafe_for_binlog system variable, gap locking is disabled: searches and index scans then use gap locks only for foreign-key constraint checks and duplicate-key checks. These settings have another side effect: MySQL releases the record locks on rows that do not match the WHERE condition. Also, for UPDATE statements InnoDB then performs "semi-consistent" reads, returning the latest committed version to the server so it can decide whether the row matches the WHERE condition; this opens the door to non-repeatable reads and phantoms.
1.3.8 Using next-key locks to prevent phantoms. Example 1-4 above illustrates the phantom problem. SELECT statements that use next-key locks can solve the phantom problem, but in Example 1-4 there was always a unique index, which made the SELECT statements use index-record locks rather than gap locks.
The pluggable storage engine architecture is one of MySQL's most important features and is what sets it apart from other DBMSs. MySQL supports many storage engines to meet different application requirements, including MyISAM, InnoDB, BDB, MEMORY, MERGE, and NDB Cluster; among these, InnoDB, BDB, and NDB Cluster provide transaction support. MySQL's default storage engine is MyISAM, but you can specify another engine when creating a table, and different tables in the same database can use different storage engines (a very powerful and unusual feature). You can run SHOW TABLE STATUS to see which storage engine a table uses; for example, to inspect the user table of the mysql database:
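The output was an image; it would look roughly like this (values illustrative, output abridged):

mysql> SHOW TABLE STATUS LIKE 'user' \G
*************************** 1. row ***************************
           Name: user
         Engine: MyISAM
     Row_format: Dynamic
           Rows: 6
 Avg_row_length: 59
    Data_length: 356
Max_data_length: 4294967295
   Index_length: 2048
      Data_free: 0
 Auto_increment: NULL
     Check_time: NULL
...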
Name: the name of the table.
Engine: the storage engine the table uses.
Row_format: the record format. MyISAM supports three storage formats: static (fixed-length) tables (the default), dynamic tables, and compressed tables. Static tables contain only fixed-length fields such as CHAR and INTEGER; dynamic tables may contain variable-length fields such as VARCHAR or BLOB.
Rows: the number of records in the table.
Avg_row_length: the average record length, in bytes.
Data_length: the total size of the table's data, in bytes.
Max_data_length: the maximum amount of data the table can hold, in bytes.
Index_length: the disk space consumed by the indexes.
Data_free: for a MyISAM table, space that is allocated but currently unused; it includes space left by previously deleted records and can be reused by INSERT operations.
Auto_increment: the next AUTO_INCREMENT value.
Check_time: when the table was last checked with CHECK TABLE or myisamchk.
1.4.1.1 MyISAM, MySQL's default storage engine, is a compromise between performance and features: it offers full-text indexing, data compression, and spatial (GIS) data support, but it supports neither transactions nor row-level locking. In general, MyISAM is better suited to query-heavy workloads; if you have many inserts and deletes, use InnoDB. Each MyISAM table consists of three files: (1) .frm: the table definition file (also used by the other storage engines); (2) .MYD: the data file; (3) .MYI: the index file. You can specify the paths of the data file and the index file at table creation time with DATA DIRECTORY and INDEX DIRECTORY, so they can live in different directories. In addition, the MyISAM storage format is cross-platform: you can copy the data and index files from an Intel platform to a PPC or SPARC platform. In 5.0, MyISAM tables with variable-length records handle 256TB of data by default, using 6-byte pointers to data records; earlier versions used 4-byte pointers by default and could therefore handle only 4GB of data. All versions can raise the pointer size up to 8 bytes. To change the pointer size of a MyISAM table, set MAX_ROWS and AVG_ROW_LENGTH:

CREATE TABLE mytable (
  a INTEGER NOT NULL PRIMARY KEY,
  b CHAR(18) NOT NULL
) MAX_ROWS = 1000000000 AVG_ROW_LENGTH = 32;

In this example, MySQL will be able to store at least 32GB of data in the table. You can view the table information:
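The output was an image; abridged, it would look roughly like this (values illustrative):

mysql> SHOW TABLE STATUS LIKE 'mytable' \G
*************************** 1. row ***************************
           Name: mytable
         Engine: MyISAM
     Row_format: Fixed
           Rows: 0
Max_data_length: 98784247807
 Create_options: max_rows=1000000000 avg_row_length=32
...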
As you can see, Create_options lists the options given at table creation, and the maximum amount of data this table can hold is about 91GB. You can change the pointer size with ALTER TABLE, but that rebuilds the table and its indexes, which can take a long time.
1.4.1.2 MyISAM features.
(1) Locking and concurrency: MyISAM has only table-level locks, not row-level locks, so it is not suitable for write-heavy workloads; however, it supports concurrent inserts, a very important and useful feature.
(2) Automatic repair: MySQL supports automatic checking and repair of MyISAM tables.
(3) Manual repair: you can use CHECK TABLE to check a table's status and REPAIR TABLE to repair it.
(4) Indexes: you can create indexes on the first 500 characters of BLOB and TEXT columns. MyISAM also supports full-text indexing, but only on CHAR, VARCHAR, and TEXT columns.
(5) Delayed key writes: if DELAY_KEY_WRITE is specified when a MyISAM table is created, MySQL does not write changed index data to disk at the end of a query but keeps the changes in the key buffer; the index data is flushed to disk only when the cache fills or the table is closed.
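A brief illustration of the maintenance statements and the create option just mentioned (logtable is a hypothetical example):

CHECK TABLE mytable;    -- reports the table's status
REPAIR TABLE mytable;   -- rebuilds a damaged MyISAM table

CREATE TABLE logtable (
  msg VARCHAR(200) NOT NULL,
  KEY (msg)
) ENGINE = MyISAM DELAY_KEY_WRITE = 1;  -- index changes kept in the key buffer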
1.4.2 InnoDB. InnoDB is a high-performance transactional storage engine (besides it, BDB also supports transaction processing). Its main features are described below.
1.4.2.1 Tablespaces. InnoDB stores tables and indexes in one of two ways:
(1) Shared tablespace storage: the table definition is in the .frm file, and the data and indexes are stored in the tablespace defined by innodb_data_home_dir and innodb_data_file_path.
(2) Multiple tablespaces: the table definition is still in the .frm file, but each InnoDB table and its indexes live in their own file (.ibd), so each table has its own tablespace. Multiple tablespaces benefit users who want to move specific tables onto separate physical disks, or who want to quickly restore a backup of a single table without interrupting the use of the other InnoDB tables. You can enable multiple tablespaces by adding this line to the [mysqld] section of my.cnf:

[mysqld]
innodb_file_per_table

After the server restarts, InnoDB stores each newly created table in its own file tbl_name.ibd in the database directory the table belongs to. This is similar to what MyISAM does, except that MyISAM splits a table into a data file tbl_name.MYD and an index file tbl_name.MYI, whereas InnoDB keeps data and indexes together in the single .ibd file; the tbl_name.frm file is created as usual. If you remove the innodb_file_per_table line from my.cnf and restart the server, InnoDB creates new tables in the shared tablespace again. innodb_file_per_table affects only table creation: started with the option, the server creates new tables as .ibd files but can still read tables in the shared tablespace; started without it, new tables go into the shared tablespace, but any tables previously created with multiple tablespaces remain accessible. Note that InnoDB always needs the shared tablespace; .ibd files alone are not enough for InnoDB to run, because the shared tablespace (the familiar ibdata files) holds InnoDB's internal data dictionary and undo logs.
1.4.2.2 Foreign key constraints. InnoDB is the only MySQL storage engine that supports foreign keys. When a foreign key is created, InnoDB requires an index on the referenced columns of the parent table, and it automatically creates the needed index on the referencing table. As discussed earlier, InnoDB combines its MVCC mechanism with next-key locking to implement the various transaction isolation levels. If the innodb_locks_unsafe_for_binlog variable is set and the transaction isolation level is not SERIALIZABLE, then INSERT INTO ... SELECT, UPDATE ... (SELECT), and CREATE TABLE ... SELECT statements that do not specify FOR UPDATE or LOCK IN SHARE MODE use consistent reads (see above), in which case the query does not lock tuples of the source table; otherwise, InnoDB uses locks.
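A minimal sketch of an InnoDB foreign key (names illustrative):

CREATE TABLE parent (
  id INT NOT NULL PRIMARY KEY
) ENGINE = InnoDB;

CREATE TABLE child (
  id INT NOT NULL PRIMARY KEY,
  parent_id INT NOT NULL,
  FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE = InnoDB;   -- InnoDB creates an index on parent_id if one does not exist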
MySQL indexing and optimization
Indexes have a critical impact on query speed, and understanding them is the starting point of database performance tuning. Consider a table with 10^6 records in a DBMS whose page size is 4KB, holding 100 records per page. Without an index, a query must scan the whole table: in the worst case, if none of the data pages are in memory, 10^4 pages must be read, and if those pages are scattered randomly on disk, that means 10^4 random I/Os. At about 10ms per disk I/O (ignoring data transfer time), the query takes about 100s (real behavior is much, much better, but the order of magnitude stands). With a B-tree index, only about log100(10^6) = 3 page reads are needed, so the worst case costs about 30ms. That is what indexes are for: very often, when an SQL query is slow, the first thing to ask is whether an index would help. Now, down to business:
1. Choosing data types for indexes
MySQL supports many data types, and selecting the right data type to store data has a significant impact on performance. In general, the following guidelines can be followed:
(1) Smaller data types are usually better: smaller types need less space on disk, in memory, and in the CPU cache, and are faster to process. (2) Simple data types are better: integers are cheaper to process than character data, because string comparison is more complex. In MySQL, use the built-in date and time types rather than strings to store times, and use an integer type to store IP addresses. (3) Avoid NULL where possible: declare columns NOT NULL unless you really need to store NULL. Nullable columns are harder for MySQL to optimize, because they complicate indexes, index statistics, and comparisons; replace NULL with 0, a special value, or the empty string. 1.1 Choosing identifiers. Choosing the right type for an identifier column is important: consider not only the storage type but also how MySQL computes and compares it. Once the type is chosen, make sure all related tables use the same type. (1) Integers: usually the best choice for identifiers, because they are fast to process and can use AUTO_INCREMENT.
(2) Strings: avoid string identifiers if you can; they take more space and are slower to process. Moreover, strings are generally random, so their positions in the index are random too, which leads to page splits, random disk access, and clustered index fragmentation (for storage engines that use clustered indexes).
For any DBMS, indexes are the most important factor in optimization.
With small amounts of data, missing the right index does not matter much, but as the data volume grows, performance degrades dramatically. If several columns are indexed together (a composite index), the order of the columns is very important: MySQL can search efficiently only on a leftmost prefix of the index. For example, given a composite index it1c1c2(c1, c2), the query SELECT * FROM t1 WHERE c1 = 1 AND c2 = 2 can use the index, and so can SELECT * FROM t1 WHERE c1 = 1; but SELECT * FROM t1 WHERE c2 = 2 cannot, because it lacks the index's leading column: c1 must be fixed to some value before the index can be used to look up c2.
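You can verify this with EXPLAIN (reusing t1 and the index name from the example above):

CREATE TABLE t1 (
  c1 INT NOT NULL,
  c2 INT NOT NULL,
  KEY it1c1c2 (c1, c2)
);

EXPLAIN SELECT * FROM t1 WHERE c1 = 1 AND c2 = 2;  -- key: it1c1c2
EXPLAIN SELECT * FROM t1 WHERE c1 = 1;             -- key: it1c1c2 (leftmost prefix)
EXPLAIN SELECT * FROM t1 WHERE c2 = 2;             -- key: NULL (index unusable)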
2.1 Types of indexes. Indexes are implemented in the storage engine, not in the server layer, so the available indexes differ from engine to engine, and not all storage engines support all index types.
2.1.1 B-Tree indexes. Suppose the following table exists:
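The table definition was lost with the original figure; a plausible reconstruction consistent with the columns discussed below:

CREATE TABLE People (
  last_name  VARCHAR(50) NOT NULL,
  first_name VARCHAR(50) NOT NULL,
  dob        DATE        NOT NULL,
  KEY (last_name, first_name, dob)
);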
Its index contains the last_name, first_name, and dob columns of every row in the table. Its structure is roughly as follows:
The index stores values in the order of the indexed columns. You can use a B-tree index to look up exact keys, key ranges, and key prefixes, provided the search uses a leftmost prefix of the index. (1) Match the full value: specify values for all columns in the index; for example, the index above lets you find a Cuba Allen born on 1960-01-01. (2) Match a leftmost prefix: you can use the index to find everyone with last name Allen, using only the first column of the index. (3) Match a column prefix: for example, you can use the index to find everyone whose last name starts with J; this also uses only the first column. (4) Match a range of values: you can use the index to find people whose last names fall between Allen and Barrymore, again using only the first column. (5) Match one part exactly and a range on another part: you can use the index to find everyone whose last name is Allen and whose first name starts with K. (6) Index-only queries: a B-tree index can also answer queries that access only the indexed columns, without reading the rows (see covering indexes below). Because the nodes of a B-tree are stored in sorted order, the index can be used both for lookups (finding particular values) and for ORDER BY (sorted retrieval). Of course, B-tree indexes have the following limitations:
- The query must start from the leftmost column of the index. This has been mentioned many times. For example, you can’t use an index to look up people born on a particular day.
- An index column cannot be skipped. For example, you can’t use an index to find someone whose last name is Smith and who was born on a certain day.
- The storage engine cannot use columns to the right of the first range condition in the index. For example, if your query is WHERE last_name = 'Smith' AND first_name LIKE 'J%' AND dob = '1976-12-23', only the first two columns of the index are used, because LIKE is a range condition.
In MySQL, only the Memory storage engine explicitly supports hash indexes; they are the default index type for Memory tables, although Memory tables can also use B-tree indexes. The Memory engine supports non-unique hash indexes, something rare in the database world: if several values have the same hash code, the index stores their row pointers in a linked list in the same hash entry. Suppose we create the following table:
CREATE TABLE testhash (
fname VARCHAR(50) NOT NULL,
lname VARCHAR(50) NOT NULL,
KEY USING HASH(fname)
) ENGINE=MEMORY;
Suppose the table contains a few rows and the index uses a hash function f(); the index structure then looks roughly like this:
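The rows and the diagram were images; a reconstruction consistent with the lookup described next (f('Peter') = 8784 is given in the text; the other names and hash values are assumptions):

INSERT INTO testhash (fname, lname) VALUES
  ('Arjen', 'Lentz'), ('Baron', 'Schwartz'),
  ('Peter', 'Zaitsev'), ('Vadim', 'Tkachenko');

-- assumed hash values:
--   f('Arjen') = 2323, f('Baron') = 7437, f('Peter') = 8784, f('Vadim') = 2458
--
-- index layout (slots are ordered; the rows they point to are not):
--   slot 2323 -> pointer to row 1
--   slot 2458 -> pointer to row 4
--   slot 7437 -> pointer to row 2
--   slot 8784 -> pointer to row 3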
The slots are ordered, but the records are not. For the query SELECT lname FROM testhash WHERE fname = 'Peter'; MySQL computes the hash of 'Peter' and uses it to look up the row pointer in the index: since f('Peter') = 8784, MySQL finds 8784 in the index and obtains the pointer to record 3. Because the index itself stores only very short values, it is very compact. The hash value does not depend on the column's data type: the index of a TINYINT column is the same size as the index of a long string column. Hash indexes have the following restrictions:
(1) Since the index contains only hash codes and record pointers, MySQL cannot use it to avoid reading the records. (Accessing records in memory is very fast, though, so this usually does not hurt performance much.)
(2) Hash indexes cannot be used for sorting.
(3) Hash indexes do not support partial key matching, because the hash value is computed from the entire index value.
(4) Hash indexes support only equality comparisons such as =, IN(), and <=>; a condition like WHERE price > 100 cannot use the index to speed up the query.
MyISAM supports spatial indexes, which are mainly used for geospatial data types, such as GEOMETRY.
Full-text Index is a special index type of MyISAM, mainly used for full-text search.
3. High-performance indexing policies
Clustered indexes ensure that tuples with similar key values are stored physically close together (so clustered indexes are a poor fit for string keys, especially random strings, which cause a great deal of record movement), and a table can have only one clustered index. Because indexes are implemented by the storage engine, not all engines support clustered indexes; at present only solidDB and InnoDB do. The structure of a clustered index is roughly as follows:
Note: the leaf pages contain complete tuples, while the inner node pages contain only the indexed column (here an integer). Some DBMSs let users specify the clustered index column, but no MySQL storage engine supports this so far. InnoDB clusters on the primary key; if you do not specify a primary key, InnoDB substitutes a unique, non-null index, and if no such index exists it defines a hidden primary key and clusters on that. In general, a DBMS stores the actual data inside the clustered index, which then serves as the foundation for the other, secondary indexes.
To better understand clustered versus non-clustered indexes, or primary versus secondary indexes (MyISAM does not support clustered indexes), let's compare InnoDB's data layout with MyISAM's:
Assume primary key values between 1 and 10,000, inserted in random order and then optimized with OPTIMIZE TABLE; col2 is assigned random values between 1 and 100, so it contains many duplicates. (1) MyISAM's data layout is very simple: it stores records on disk in insertion order, as follows:
Note: the row numbers on the left start from 0. Because the tuples are fixed-size, MyISAM can easily compute any row's byte offset from the start of the table. The index structure on the primary key is as follows:
Note: MyISAM does not support clustered indexes. Each leaf node of the index contains only the row number, and the leaf nodes are stored in col1 order. Now look at the index structure on col2:
In fact, in MyISAM the primary key is no different from any other index: it is simply a unique, non-null index named PRIMARY. (2) InnoDB stores data in the form of a clustered index, so its data layout is quite different. It stores the table roughly as follows:
Note: each leaf node of the clustered index contains the primary key value, the transaction ID and rollback pointer (used for transactions and MVCC), and the remaining columns (such as col2). Compared with MyISAM, InnoDB's secondary indexes differ markedly from its clustered index: the leaves of a secondary index contain primary key values rather than row pointers. This makes secondary indexes cheaper to maintain when rows move or data pages split, since InnoDB does not need to update row pointers in the indexes. The structure is roughly as follows:
A comparison of the clustered-index (InnoDB) and non-clustered (MyISAM) table layouts:
3.1.2 Inserting rows in primary key order (InnoDB)
If you use InnoDB and have no particular clustering requirement, it is a good idea to use a surrogate key, one that is independent of your application's data. The simplest way is an AUTO_INCREMENT column: it guarantees sequential insertion and improves the performance of joins on the primary key. Random clustered primary keys should be avoided as far as possible; string primary keys, for example, are a bad choice because they make inserts random.
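A minimal sketch of a surrogate key (names illustrative):

CREATE TABLE orders (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- surrogate key: inserts append in order
  order_no CHAR(32) NOT NULL,               -- the natural, random identifier kept as data
  PRIMARY KEY (id),
  UNIQUE KEY (order_no)
) ENGINE = InnoDB;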
An index that contains all the data needed to satisfy a query is called a covering index. Covering indexes are a very powerful tool that can greatly improve query performance. Reading only the index instead of the data has several advantages:
(1) Index entries are usually smaller than records, so MySQL accesses less data;
(2) Indexes are stored sorted by value, so range access needs less I/O than randomly fetching records;
(3) Most storage engines cache indexes better than data; MyISAM, for example, caches only indexes;
(4) Covering indexes are especially useful for InnoDB tables, because InnoDB organizes data in a clustered index: if a secondary index contains all the data a query needs, there is no need to look anything up in the clustered index. Not every index can cover a query; only B-tree indexes store the corresponding values. Different storage engines also implement covering indexes differently, and some do not support them at all (Memory and Falcon do not). For an index-covered query, EXPLAIN shows "Using index" in the Extra column. For example, the inventory table in the sakila sample database has a composite index (store_id, film_id); for queries that access only those two columns, MySQL can use the index as a covering index, as follows:
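The missing illustration would look roughly like this (output abridged; sakila's index is named idx_store_id_film_id):

mysql> EXPLAIN SELECT store_id, film_id FROM sakila.inventory \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: inventory
         type: index
          key: idx_store_id_film_id
        Extra: Using index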
In most engines, an index covers a query only if every column the query accesses is part of the index. InnoDB is not limited in this way: its secondary indexes store primary key values in their leaf nodes, so a secondary index can also cover queries that refer to the primary key. For example, sakila.actor uses InnoDB and has an index on last_name, so that index can cover queries that also access actor_id, such as:
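Again a sketch of the missing example (output abridged):

mysql> EXPLAIN SELECT actor_id, last_name
    -> FROM sakila.actor WHERE last_name = 'HOPPER' \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: ref
          key: idx_actor_last_name
        Extra: Using where; Using index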
In MySQL, there are two ways to produce an ordered result set: filesort, or scanning an index in order. Sorting by index is very fast, and the same index can serve both the lookup and the sort. An index can be used for sorting when the index's column order matches the ORDER BY columns and all columns are sorted in the same direction (all ascending or all descending). If the query joins multiple tables, the index can be used only if all the ORDER BY columns belong to the first table; in all other cases MySQL uses filesort.
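Using the People table and its (last_name, first_name, dob) index sketched earlier:

-- can sort via the index (same columns, same order, one direction):
SELECT * FROM People ORDER BY last_name, first_name;

-- falls back to filesort: mixed directions
SELECT * FROM People ORDER BY last_name ASC, first_name DESC;

-- falls back to filesort: skips the leading index column
SELECT * FROM People ORDER BY first_name;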
When MySQL cannot use an index to sort, it sorts the data itself with its own sort algorithm (quicksort) in memory (the sort buffer). If the data does not fit in memory, it splits the data into chunks on disk, sorts each chunk, and then merges the sorted chunks into one ordered result set (in effect, an external sort). For filesort, MySQL has two algorithms:
(1) Two-pass algorithm: first retrieve the sort fields together with row pointers that can locate the full rows, and sort them in the configured amount of memory (set by sort_buffer_size); once sorting is complete, read the required columns via the row pointers. Note: this is the algorithm used before 4.1. It accesses the data twice; the second pass in particular causes a large amount of random I/O. On the other hand, its memory overhead is low.
(2) Single-pass algorithm: read all required columns at once, sort in memory, and output the result directly. Note: this algorithm has been used since MySQL 4.1. It reduces I/O and is efficient, but its memory overhead is larger: if you fetch columns you do not actually need, a great deal of sort memory is wasted. In MySQL 4.1 and later, the max_length_for_sort_data parameter controls which algorithm is chosen: when the total size of the columns to fetch exceeds max_length_for_sort_data, MySQL uses the first algorithm; otherwise it uses the second. To get the best sort performance, prefer the second algorithm by selecting only the columns the query actually needs. When sorting the result of a join, if the ORDER BY refers only to columns of the first table, MySQL sorts that table first and then joins, and EXPLAIN shows "Using filesort"; in all other cases MySQL must put the join result into a temporary table and sort after the join completes, and EXPLAIN shows "Using temporary; Using filesort".
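You can adjust the threshold and watch whether sorts spill to disk (the variable and status names are real MySQL 4.1+/5.x settings; the value is illustrative):

SET SESSION max_length_for_sort_data = 4096;   -- rows up to 4KB use the single-pass sort
SHOW SESSION STATUS LIKE 'Sort_merge_passes';  -- greater than 0 means on-disk merge sorting
-- if it keeps growing, raise sort_buffer_size or trim the SELECT list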
Indexes matter to InnoDB because they let queries lock fewer tuples. This is important because in MySQL 5.0 InnoDB's row locks are not released until the transaction commits. There are two reasons to lock fewer rows: first, although InnoDB's row locks are very efficient and have low memory overhead, they still carry some cost; second, locking tuples you do not need increases lock contention and reduces concurrency. InnoDB locks only the tuples it actually accesses, and indexes reduce the number of tuples InnoDB has to access. However, that works only if the unwanted rows can be filtered out at the storage engine layer. If the index does not let InnoDB do the filtering, the MySQL server has to apply the WHERE clause to the rows InnoDB returns, and by then it is too late to avoid locking them: InnoDB has already locked those tuples, and the server cannot unlock them. Here's an example:
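A reconstruction of the missing example (the sakila database as above):

SET AUTOCOMMIT = 0;
BEGIN;
SELECT actor_id FROM sakila.actor
WHERE actor_id < 4 AND actor_id <> 1
FOR UPDATE;
-- returns actor_id 2 and 3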
This query returns only rows 2 and 3, yet it takes exclusive locks on rows 1 through 3. InnoDB locks tuple 1 because MySQL's plan uses the index only for the range access, not for filtering:
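The plan (abridged; reconstructed from the description that follows):

mysql> EXPLAIN SELECT actor_id FROM sakila.actor
    -> WHERE actor_id < 4 AND actor_id <> 1 FOR UPDATE \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: range
          key: PRIMARY
         rows: 3
        Extra: Using where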
The range access type and "Using where" indicate that the storage engine starts at the beginning of the index and fetches rows until actor_id < 4 becomes false; the server cannot tell InnoDB to skip tuple 1. To prove that row 1 is locked, open a second connection and do the following:
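In the second connection (a reconstruction):

SET AUTOCOMMIT = 0;
BEGIN;
SELECT actor_id FROM sakila.actor
WHERE actor_id = 1 FOR UPDATE;
-- hangs: row 1 is locked by the first connection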
That query hangs until the first connection's transaction commits and releases the lock (this behavior is necessary for statement-based replication). As the example shows, InnoDB can lock tuples it does not actually need even when it uses an index. Worse, if a query cannot use an index at all, MySQL performs a full table scan and locks every tuple, whether it is needed or not.