My official account is MarkerHub and my website is Markerhub.com


Small Hub's note:

This article collects a good number of MySQL optimization points. Well worth bookmarking!


  • wolearn
  • juejin.im/post/59b11ba151882538cb1ecbd0

Preface

This article focuses on the relational database MySQL. For key-value databases, see:

www.jianshu.com/p/098a870d8…

We first briefly review MySQL's basic concepts, then cover optimization in two stages: table creation and querying.

1 Basic Concepts

1.1 Logical Architecture

  • Layer 1: the client connects to the server and sends the SQL statements to execute

  • Layer 2: the server parses and optimizes the SQL, generates the final execution plan, and executes it

  • Layer 3: the storage engine, responsible for storing and retrieving the data

1.2 Locks

Databases handle concurrency through locking mechanisms: shared (read) locks and exclusive (write) locks. Read locks are non-blocking, so multiple clients can read the same resource at the same time. Write locks are exclusive: they block other read and write locks. Optimistic and pessimistic locking also deserve a brief mention.

  • Optimistic locking is typically used when contention for the data is low and reads outnumber writes; it is usually implemented with version numbers or timestamps.

  • Pessimistic locking is usually used when contention for the data is high; the data is locked for every operation.
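A minimal sketch of the optimistic approach, assuming a hypothetical `account` table with a `version` column:

```sql
-- Read the row and remember the version it had
SELECT balance, version FROM account WHERE id = 1;   -- say version = 3

-- Apply the change only if nobody bumped the version in the meantime
UPDATE account
SET balance = balance - 100, version = version + 1
WHERE id = 1 AND version = 3;
-- If the affected-row count is 0, another writer won; re-read and retry.
```

The pessimistic equivalent would lock the row up front, e.g. with `SELECT ... FOR UPDATE` inside a transaction.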

Locking data also requires choosing a lock granularity, i.e. a locking strategy:

  • Table locks lock the entire table. They have the least overhead but increase lock contention.

  • Row locks lock individual rows. They have the highest overhead but allow the most concurrency.

However, the real implementation of MySql’s storage engine is not simple row-level locking, but generally implements multi-version concurrency control (MVCC). MVCC is a variant of row-level locking, which avoids locking operations in most cases and costs less. MVCC is implemented by saving a point-in-time snapshot of the data.

1.3 Transactions

Transactions guarantee that a group of operations is atomic: either all succeed or all fail, and on failure everything done so far is rolled back. MySQL runs in autocommit mode by default, so unless a transaction is explicitly opened, every statement is executed as its own transaction.
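A minimal sketch of an explicit transaction (the `account` table and transfer amounts are illustrative):

```sql
START TRANSACTION;

UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;

-- Either both updates take effect, or neither does
COMMIT;   -- or ROLLBACK; to undo both
```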

The isolation level controls which changes made within a transaction are visible inside that transaction and to other transactions. There are four standard isolation levels:

  • Read Uncommitted: changes made in a transaction are visible to other transactions even before they are committed, so a transaction can read uncommitted data, known as a dirty read.

  • Read Committed: a transaction sees only changes that were already committed; changes made before a transaction commits remain invisible to other transactions. This level is also called non-repeatable read, because reading the same record twice within one transaction may return different results.

  • Repeatable Read: reading the same record multiple times within one transaction always returns the same result. (This is MySQL's default level.)

  • Serializable: the highest isolation level; it forces transactions to execute serially.
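The level can be inspected and changed per session, for example:

```sql
-- Show the current level (the variable is tx_isolation before MySQL 8.0,
-- transaction_isolation from 8.0 onwards)
SELECT @@transaction_isolation;

-- Change it for the current session only
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```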

1.4 Storage Engines

InnoDB engine, the most important and most widely used storage engine. It is designed to handle a large number of short-term transactions with high performance and automatic crash recovery features.

The MyISAM engine does not support transactions or row-level locking, and it cannot recover safely after a crash.

2 Optimization during creation

2.1 Schema and data type optimization

Integers

TinyInt, SmallInt, MediumInt, Int, and BigInt use 8, 16, 24, 32, and 64 bits of storage respectively. Marking a column Unsigned disallows negative values and roughly doubles the upper limit for positive numbers.

Real numbers

  • Float,Double, supports approximate floating point operations.

  • Decimal, for storing exact decimals.

Strings

  • VarChar, stores variable-length strings. One or two extra bytes are required to record the length of the string.

  • Char: fixed length. Suitable for storing fixed length strings, such as MD5 values.

  • Blob and Text are designed to store very large amounts of data, in binary and character form respectively.

Date and time types

  • DateTime, which stores a wide range of values in 8 bytes.

  • TimeStamp (recommended), which stores a UNIX timestamp in 4 bytes.

Optimization tips

  • Use the corresponding data type whenever possible. For example, don’t store time as a string, store IP as an integer.

  • Choose a smaller data type. TinyInt instead of Int.

  • Identifier columns: prefer integers to strings, since strings take more space and compare more slowly.

  • Schemas automatically generated by ORM systems are not recommended. They often have problems such as not paying attention to data types, using large VarChar types, and improper index utilization.

  • Real-world schemas mix normalized and denormalized designs. High redundancy gives fast queries but slow inserts and updates; low redundancy gives fast inserts and updates but slower queries.

  • Create completely separate summary/cache tables that periodically regenerate the data behind time-consuming operations. For summaries that must be exact, combine the stored historical result with the latest records to keep queries fast.

  • During data migration, you can use a shadow table to change the name of the original table to save historical data without affecting the use of the new table.
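To illustrate the "store IPs as integers" tip, MySQL's built-in INET_ATON / INET_NTOA functions convert between dotted-quad text and an unsigned 32-bit integer (the `access_log` table here is hypothetical):

```sql
CREATE TABLE access_log (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ip INT UNSIGNED NOT NULL            -- 4 bytes instead of a 15-char string
);

INSERT INTO access_log (ip) VALUES (INET_ATON('192.168.1.10'));

SELECT INET_NTOA(ip) FROM access_log;  -- convert back when reading
```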

2.2 Indexes

An index contains the value of one or more columns. MySql only makes efficient use of the left-most prefix column of the index. Advantages of indexes:

  • Reduce the amount of data scanned by query

  • Avoid sorting and temporary tables

  • Change random I/O to sequential I/O (sequential I/O is more efficient than random I/O)

B-Tree

The most common index type. Data is stored in a B-Tree structure (in practice usually a B+Tree, where each leaf node holds a pointer to the next leaf, making leaf traversal easy). B-Tree indexes support full key-value lookups, key range scans, key prefix lookups, and sorting.

B-Tree index restrictions:

  • If you do not start the query at the leftmost column of the index, you cannot use the index.

  • Columns in an index cannot be skipped. With an index on (A, B, C), a query filtering on A and C can use only the A part of the index.

  • If the query contains a range condition on a column, no column to its right in the index can be used to optimize the query.
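A sketch of these restrictions, assuming a hypothetical composite index on (last_name, first_name, age):

```sql
CREATE INDEX idx_name_age ON person (last_name, first_name, age);

-- Uses the index: the query starts at the leftmost column
SELECT * FROM person WHERE last_name = 'Smith' AND first_name = 'Anna';

-- first_name is skipped, so only the last_name part of the index helps
SELECT * FROM person WHERE last_name = 'Smith' AND age = 30;

-- Cannot use the index: the leftmost column is missing
SELECT * FROM person WHERE first_name = 'Anna';

-- Range on first_name: age, to its right, can no longer use the index
SELECT * FROM person
WHERE last_name = 'Smith' AND first_name > 'A' AND age = 30;
```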

Hash indexes

A hash index can be used only when the query matches all index columns exactly. The storage engine computes a hash code over the index columns, stores the hash codes in the index, and keeps a pointer to each data row.

Hash index limits:

  • Can’t be used for sorting

  • Partial matching is not supported

  • Only equality comparisons such as = and IN () are supported; range operators such as < and > are not.

Optimization tips

  • Note the scope and limitations of each index.

  • Index columns cannot be used if they are part of an expression or passed as arguments to a function.

  • For extremely long strings, you can use a prefix index that selects the appropriate prefix length based on the selectivity of the index.

  • When using multi-column indexes, conditions on the columns can be combined with AND and OR.

  • Duplicate indexes are unnecessary; for example, an index on (A, B) already covers an index on (A).

  • Indexes are particularly useful for WHERE conditional queries and group by queries.

  • Place the range query at the end of the conditional query to prevent the right index invalidation problem caused by the range query.

  • Indexes should not be long strings, and index columns should not be null.
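Two of the tips above sketched in SQL (the `user` table and its columns are illustrative):

```sql
-- Bad: the indexed column is wrapped in a function, so the index is unusable
SELECT * FROM user WHERE YEAR(created_at) = 2023;

-- Better: rewrite as a range over the bare column
SELECT * FROM user
WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01';

-- Prefix index for a long string column: index only the first 10 characters,
-- with the length chosen so the prefix stays selective
ALTER TABLE user ADD INDEX idx_email (email(10));
```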

3 Query optimization

3.1 Three important indicators of query quality

  • Response time (service time plus queue time)

  • Rows scanned

  • Rows returned

3.2 Querying optimization points

  • Avoid querying unneeded columns, such as using SELECT * to return all columns.

  • Avoid querying irrelevant rows

  • Chunked queries. Break one heavy server task into smaller pieces spread over time. For example, instead of deleting 10,000 rows in one statement, delete them in 10 batches, pausing briefly after each batch so server resources are freed for other tasks in between.

  • Decompose joined queries. Splitting one multi-table join into several single-table queries reduces lock contention and is often efficient. Because MySQL connections and disconnections are lightweight operations, splitting one query into several does not in itself create an efficiency problem.

  • Note that COUNT(column) counts only non-NULL values, so use COUNT(*) to count total rows.

  • GROUP BY is efficient, but the select list should contain only the grouping columns (and aggregates), not other columns.

  • Deferred join for associated queries. First narrow the result set using the query conditions, then join to fetch the remaining columns.

  • LIMIT paging optimization. Perform a covering scan on the index first, then self-join on the index column to fetch the other columns. For example:

SELECT s1.id, s1.name, s1.age
FROM student s1
INNER JOIN (SELECT id FROM student ORDER BY age LIMIT 50, 5) AS s2 ON s1.id = s2.id;

  • UNION queries deduplicate by default. If deduplication is not needed, use UNION ALL, which is more efficient.
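The batched-delete tip above can be sketched like this (table name, cutoff, and batch size are illustrative); the application repeats the statement, pausing between rounds, until the affected-row count reaches zero:

```sql
DELETE FROM message
WHERE created_at < '2023-01-01'
LIMIT 1000;
```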

Additional notes

From Xiao Bao

  1. If a value in the condition has a different type from the column in the table structure, MySQL adds an implicit conversion function, and a column used as a function argument cannot use its index.

  2. If a LIKE pattern has no fixed prefix, i.e. the value starts with %, the index cannot be used.

  3. Two features added in version 5.7:

Generated columns: a column whose value is computed from other columns

CREATE TABLE triangle (sidea DOUBLE, sideb DOUBLE, area DOUBLE AS (sidea * sideb / 2));
INSERT INTO triangle(sidea, sideb) VALUES(3, 4);
SELECT * FROM triangle;

+-------+-------+------+
| sidea | sideb | area |
+-------+-------+------+
|     3 |     4 |    6 |
+-------+-------+------+

Supports JSON data and provides built-in functions

CREATE TABLE json_test (name JSON);
INSERT INTO json_test VALUES('{"name1": "value1", "name2": "value2"}');
SELECT * FROM json_test WHERE JSON_CONTAINS_PATH(name, 'one', '$.name1');

(JSON_CONTAINS tests for a contained value; to test whether a path such as $.name1 exists, JSON_CONTAINS_PATH is the function to use.)
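Note 1 above, implicit conversion defeating an index, can be illustrated with a hypothetical `phone` column stored as VARCHAR:

```sql
-- phone is VARCHAR: comparing it to a number makes MySQL cast every row's
-- phone to a number, so the index on phone cannot be used
SELECT * FROM user WHERE phone = 13800000000;

-- String-to-string comparison keeps the index usable
SELECT * FROM user WHERE phone = '13800000000';
```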

From JVM expert – da

Focus on using EXPLAIN for performance analysis

EXPLAIN SELECT settleId FROM Settle WHERE settleId = "3679"

  • Select_type, with several values: simple (a simple SELECT without union or subquery), primary (when subqueries exist, the outermost SELECT), union (the second or later SELECT in a union that does not depend on the outer query), dependent union (the second or later SELECT in a union that depends on the outer query)

  • Type, with several values: system (the table has only one row, a system table; a special case of const), const (constant lookup), ref (non-unique index lookup), eq_ref (unique index or primary key lookup), range (index range scan), index (full scan over the index), all (full table scan)

  • Possible_keys, the indexes in the table that might be used

  • Key, the index actually chosen

  • Key_len, the length of the index used

  • Rows, the estimated number of rows to scan; larger is worse

  • Extra, with several values: Using index (the result is read from the index alone, faster than scanning the table), Using where (a WHERE clause filters rows), Using filesort (sorting is done in memory or on disk), Using temporary (a temporary table is used, e.g. to sort the query results)

