“Do not use SELECT *” has almost become a golden rule used by MySQL. Even ali Java Development Manual explicitly states that * is not allowed to be used as the list of fields in the query, which makes this rule authoritative.
However, I tend to use SELECT * directly during development for two reasons:
- Because of the simplicity, the development efficiency is very high, and if the late frequent addition or modification of fields, SQL statements do not need to change;
- I think it’s a bad habit to optimize too early, unless at the beginning you can determine what fields you actually need in the end and index them properly; Otherwise, I choose to optimize my SQL only when I run into trouble, as long as the trouble isn’t fatal.
However, we should always know why it is not recommended to use SELECT * directly, and this article gives four reasons.
1. Unnecessary disk I/O
We know that MySQL essentially stores user records on disk, so a query operation is an act of disk I/O (provided that the record to be queried is not cached in memory).
The more fields you query, the more data you need to read, which increases the DISK I/O overhead. This is especially true when certain fields are of type TEXT, MEDIUMTEXT, or BLOB.
The use ofSELECT *
Will MySQL take up more memory?
Theoretically not, because instead of storing the entire result set in memory and passing it to the client all at once, each row fetched from the storage engine is written to a memory space called net_buffer, the size of which is controlled by the system variable net_buffer_length. The default is 16KB; When the net_buffer is full, data is written to the socket send buffer in the memory space of the local network stack and sent to the client. After the socket send buffer is successfully sent (the client finishes reading data), the net_buffer is cleared, and the next line is read and written.
That is, by default, the maximum amount of memory used by the result set is net_BUFFer_LENGTH, with no additional memory for a few more fields.
2. Increase network latency
The socket send buffer is used to send data to the socket buffer. The socket send buffer is used to send data to the socket buffer. The socket send buffer is used to send data to the socket buffer.
This overhead is obvious if MySQL and the application are not on the same machine. Even if the MySQL server and client are on the same machine and still use TCP, communication takes extra time.
3. Overwrite index cannot be used
To illustrate this, we need to build a table
CREATE TABLE `user_innodb` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL.`gender` tinyint(1) DEFAULT NULL.`phone` varchar(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `IDX_NAME_PHONE` (`name`.`phone`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Copy the code
We created a table user_InnoDB with storage engine as InnoDB, and set id as primary key, and created a joint index for name and phone, and finally randomly initialized 500W+ data into the table.
InnoDB automatically creates a B+ tree called a primary key index (also called a cluster index) for primary key ids. The most important feature of this B+ tree is that the leaf nodes contain the complete user record, which looks something like this.
If we execute this statement
SELECT * FROM user_innodb WHERE name = 'Cicada Mu Wind';
Copy the code
Use EXPLAIN to view the statement execution plan:
It is found that the SQL statement uses the IDX_NAME_PHONE index, which is a secondary index. The leaf node of the secondary index looks like this:
InnoDB storage engine will find the name of the leaf node in the secondary index according to the search criteria, but the secondary index only records the name, phone and primary key ID field (why we use SELECT *). So InnoDB needs to take the primary key ID and look up the complete record in the primary key index, a process called table-back.
If the leaf node of the secondary index has all the data we want, we don’t need to return the table. Yes, this is the overwrite index.
For example, we just want to search the name, phone, and primary key fields.
SELECT id.name, phone FROM user_innodb WHERE name = "Cicada Mu Wind";
Copy the code
Use EXPLAIN to view the statement execution plan:
It can be seen that the column “Extra” shows Using index, indicating that our query list and search criteria only contain columns belonging to a certain index. In other words, the overwritten index is used, which can directly abandon the back table operation and greatly improve the query efficiency.
4. May slow down the JOIN query
We create two tables T1, t2 do join operations to illustrate the following problem, and insert 100 data into t1 and 1000 data into T2.
CREATE TABLE `t1` (
`id` int NOT NULL.`m` int DEFAULT NULL.`n` int DEFAULT NULL,
PRIMARY KEY (`id`))ENGINE=InnoDB DEFAULT;
CREATE TABLE `t2` (
`id` int NOT NULL.`m` int DEFAULT NULL.`n` int DEFAULT NULL,
PRIMARY KEY (`id`))ENGINE=InnoDB DEFAULT;
Copy the code
If we execute the following statement
SELECT * FROM t1 STRAIGHT_JOIN t2 ON t1.m = t2.m;
Copy the code
Here I used STRAIGHT_JOIN to force the T1 table as the driven table and the T2 table as the driven table
For join queries, the driven table is accessed only once, whereas the driven table is accessed many times, depending on the number of entries in the driven table that match the query record. Now that the driver and driven tables are forcibly identified, let’s talk about the nature of joining two tables:
t1
As a driver table, performs pairs against the filter criteria of the driver tablet1
Table query. Because there is no filter condition, which is fetcht1
All data in the table;- For each record in the result set obtained in the previous step, the records are respectively put into the driven table, and the matching records are searched according to the connection filtering conditions
In pseudocode the whole process looks like this:
// t1Res is the result set filtered against driver table T1
for (t1Row : t1Res){
// t2 is the complete driven table
for(t2Row : t2){
if(satisfy join condition && satisfy t2 filter condition){send to client}}}Copy the code
This approach is the simplest, but also the least performing, and is called nested-loopJoin (NLJ). How to speed up the connection?
One way is to create an index, the best is in the driver table (t2) connection condition of fields involved in creating indexes, after all, was the driver table needs to be queried many times, and access to the driver table is essentially just a single table query (for t1 result set set, each connection query conditions of t2 is dead).
Since we use indexes, to avoid repeating the mistake of not being able to use overwrite indexes, we should also try not to SELECT * directly, but instead use the actual fields as query columns and create appropriate indexes for them.
But if we don’t use indexes, does MySQL really do join queries like nested circular queries? Of course not, this kind of nested circular query is too slow!
Prior to MySQL8.0, MySQL provided block-based Nested Loop Join (BLJ) methods, and MySQL8.0 introduced hash Join methods, both of which were proposed to solve a problem, That is, to minimize the number of accesses to the driven table.
Both methods use a fixed-size memory area called the Join Buffer, which stores several records in the result set of the driven table (the only difference between the two methods is the form of storage). In this way, when the records of the driven table are loaded into memory, Matching the records of multiple driven tables in the Join buffer at one time can significantly reduce the I/O cost of the driven table and the cost of repeatedly loading the driven table from disk, since the matching process is completed in memory. The process of using join buffer is shown in the figure below:
If we look at the execution plan for the join query above, we see that a Hash JOIN was actually used (provided that no index was created for the join query field of the T2 table, otherwise the index would have been used and the Join buffer would not have been used).
In the best case, the Join buffer is large enough to hold all the records in the result set of the driven table, so that only one access to the driven table is required to complete the join operation. We can configure this using the system variable join_BUFFer_size, which is 256KB by default. If not, the result sets of the driver table must be placed into the Join buffer in batches. After the join buffer is compared in memory, the join buffer is emptied and the next set of result sets is loaded until the join is complete.
Here we go! Not all columns from the driver table are added to the Join Buffer. Only columns from the query list and filter criteria are added to the Join buffer, so once again, it is better not to use * as the query list, but to put the columns we care about in the query list. This also places more records in the Join buffer, reducing the number of batching, which naturally reduces the number of visits to the driven table.
Recommended reading
-
Some things you must know to use MySQL indexes well
-
This time, understand MySQL indexes thoroughly
-
MySQL primary key query why is so fast
-
How is an SQL update statement executed?
-
How is an SQL query executed?