1. Cluster index

  1. If the table has a primary key set, the primary key is the cluster index
  2. If the table has no primary key, the first NOT NULL and UNIQUE column is used as the cluster index by default
  3. If none of the above is present, a hidden ROW_ID is created as the cluster index by default
  4. InnoDB’s clustered index leaves store rows (actually pages, a page contains multiple rows of data) 1
  • advantages
  1. A clustered index is also better than a non-clustered index when you need to retrieve a range of data.
  2. It is theoretically faster to find target data through a clustered index than a non-clustered index because a non-clustered index has one more target record addressing, that is, one more I/O, to locate the corresponding primary key.
  3. 3. Queries that use overridden index scans can directly use primary key values in page nodes
  4. disadvantages
  5. The insert speed depends heavily on the insert order. The primary key order is the fastest way to insert. Otherwise, pages will split and performance will be severely affected. Therefore, for InnoDB tables, we generally define an incremented ID column primary key.
  6. Updating the primary key is expensive because it causes the updated row to move. Therefore, for InnoDB tables, we generally define primary keys as not updatable.
  7. Secondary index access requires two index lookups, the first to find the primary key value and the second to find the row data based on the primary key value.
  8. The leaf nodes of secondary indexes store primary keys rather than row Pointers (non-clustered indexes store Pointers or addresses). This reduces the maintenance of secondary indexes in the event of row movement or data page splitting, but allows the secondary indexes to occupy more space.
  9. Using clustering index insert new value than using the clustering index insert new values to slow a lot, because insert to ensure that the primary key can’t heavy compound, determine the primary key can’t repeat, adopt the way under the different index can make a big performance gap, clustering index iterate through all the leaf nodes, the clustering index also judge all leaf nodes, However, the leaf node of the cluster index has a record value in addition to the primary key, and the record size is often much larger than the primary key. This can result in expensive I/O for clustered indexes to determine whether the primary key carried by the new record is duplicated


Two, back table query

The index B+ tree is scanned twice to locate the cluster index value based on the value of the common index and then locate the row record data based on the value of the cluster index, which has lower performance than scanning the index tree once

Index coverage

You can get all the column data you need for SQL in a single index tree without going back to the table, which is faster

Four,Page divided

The mysql database is stored in the B+ tree in order (so the index itself is ordered). The mysql database is stored in the B+ tree. The default size of a data page is 16K. You can also customize the size of a data page. If a data page is full, mysql will request a new data page to store data. When a data page is full, the mysql database can directly apply for a new data page to write. If the primary key is a non-incremented ID, mysql will need to place the inserted data in the right place each time to ensure that the index is in order. When data is inserted into a data page that is almost full or already full, the new data page will be full, so mysql needs to apply for a new data page, because some data from the previous data page may be moved to the new data page. This results in page splitting, a process of moving a lot of data that can seriously affect insert efficiency

MySQL has six types of log files: Redo logs, undo logs, binary logs, errorlogs, slow Query logs, and General log Relay logs The log) https://juejin.cn/post/6844903809118912525

Redo logs are unique to InnoDB; Binlog is implemented in the Server layer of MySQL and is available to all engines.

A redo log is a physical log of what was changed on a data page. The binlog is the logical log that records the original logic of the statement, such as “add 1 to the C field on the line ID=2”.

Redo logs are written in a redo log cycle. Binlog can be appended. “Appending” means that the binlog file will be switched to the next one after being written to a certain size without overwriting the previous log

Procedure For updating data

The executor first finds the engine and fetches the line ID=2. ID is the primary key, and the engine uses the tree search directly to find this row. If the row ID=2 is already in memory, it is returned directly to the executor. Otherwise, you need to read into the memory from disk and then return.

The executor takes the row given by the engine, adds 1 to it, for example, N+1 to get a new row of data, and then calls the engine interface to write the new row of data.

The engine updates the data to memory and logs the update to the redo log. The redo log is in the prepare state. The executor is then told that the execution is complete and the transaction can be committed at any time.

The executor generates a binlog of this operation and writes the binlog to disk.

The executor calls the commit transaction interface of the engine, and the engine changes the redo log to the commit state