Why does everyone say SELECT * is inefficient?

  • First, the causes of low efficiency

      1. Unnecessary columns increase data transmission time and network overhead
      2. Large, unneeded fields such as VARCHAR, BLOB, and TEXT add I/O operations
      3. Losing the chance for the MySQL optimizer's “covering index” optimization
  • Second, index knowledge extension

    • Joint index (a, b, c)
    • The advantages of joint indexes
    • Is it better to build as many indexes as possible?
  • Third, takeaways


Interviewer: “Chen, tell me about the SQL optimization methods you commonly use.”
Chen Xiaoha: “There are so many. For example, don't use SELECT *, because its query efficiency is low. Blah blah blah…”
Interviewer: “Why not use SELECT *? Where exactly is it inefficient?”
Chen Xiaoha: “Well, SELECT * FROM table_name…”
Interviewer: “Hmm…”
Chen Xiaoha: “Emmm… that's all.”
Chen Xiaoha: “…??”
Interviewer: “Hmm… Well, is there anything else you want to ask me?”
Chen Xiaoha: “Ask you a hammer! Give me back my resume!”

Whether at work or in interviews, we have all heard “do not use SELECT * in SQL” so often that we are tired of it. Even so, most people's understanding stays at a shallow level, and few get to the bottom of it and explore the underlying principles.

Without further ado, this article takes an in-depth look at why and where “SELECT *” is inefficient.

“This article is dense with content! Bring your own tea, and bookmark it if you don't have time to read it right now.” (Advice from a programmer who was beaten up by his tech manager for years.)

First, the causes of low efficiency

The MySQL section of the development manual describes the rule like this:

4-1. [Mandatory] In table queries, never use * as the query field list; the fields you need must be listed explicitly. Explanation:

  • It increases the query parser's parsing cost.
  • Adding or removing fields can easily become inconsistent with the resultMap configuration.
  • Useless fields add network overhead, especially fields of the text type.

The development manual outlines several reasons, so let's take a closer look:

1. Unnecessary columns increase data transmission time and network overhead

  1. With “SELECT *”, the database has to resolve more objects, fields, permissions, attributes, and other related metadata. When SQL statements are complex and hard parses are frequent, this puts a heavy burden on the database.
  2. It increases network overhead. “*” sometimes drags in useless, large text fields such as log and IconMD5 by mistake, and the amount of data transferred grows many times over. This overhead is especially obvious when the DB and the application are not on the same machine.
  3. Even if the MySQL server and the client are on the same machine, as long as they still communicate over TCP, the transfer takes extra time.
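To get a feel for how much a single large text column can inflate a result set, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in (not MySQL), and the table app_event, its columns, and the helper payload_bytes are all made up for illustration. It only counts the bytes of the returned values, so treat it as a rough proxy for transfer size rather than a real network measurement.

```python
import sqlite3

# Hypothetical table: two small columns plus one large text column ("log").
# SQLite is used here only as a convenient stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_event (id INTEGER PRIMARY KEY, name TEXT, log TEXT)")
conn.executemany(
    "INSERT INTO app_event (name, log) VALUES (?, ?)",
    [(f"event-{i}", "x" * 10_000) for i in range(100)],
)


def payload_bytes(sql):
    """Rough size of a result set: sum of the UTF-8 length of every value."""
    total = 0
    for row in conn.execute(sql):
        total += sum(len(str(value).encode("utf-8")) for value in row)
    return total


star_bytes = payload_bytes("SELECT * FROM app_event")
narrow_bytes = payload_bytes("SELECT id, name FROM app_event")
print(f"SELECT *       : {star_bytes} bytes")
print(f"SELECT id, name: {narrow_bytes} bytes")
```

In this toy setup, almost all of the bytes returned by SELECT * come from the log column that the caller may never look at.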

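2. Large, unneeded fields such as VARCHAR, BLOB, and TEXT add I/O operations

To be precise, when such a value is too long to fit in the row's page, the excess data is stored in separate overflow pages (in InnoDB's COMPACT row format, only a 768-byte prefix of the column stays in the page), so reading the record adds I/O operations. (MySQL InnoDB)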

3. Losing the chance for the MySQL optimizer's “covering index” optimization

SELECT * rules out the use of a covering index, yet the covering index strategy of the MySQL optimizer is a fast, efficient query optimization that is highly recommended in the industry.

For example, take a table t(a, b, c, d, e, f) where a is the primary key and b has an index.

There are then two B+ trees, the clustered index and the secondary index, which store (a, b, c, d, e, f) and (a, b) respectively. If the WHERE condition can filter out some records through the index on column b, the query goes through the secondary index first. If the user only needs the data in columns a and b, the secondary index alone is enough to return what the user asked for.

If the user uses SELECT * and fetches columns that are not needed, the rows are first filtered through the secondary index, and then all columns are fetched through the clustered index. That is an extra B+ tree lookup, so it is bound to be slower.

A secondary index holds far less data than the clustered index. In many cases, a covering index (where the secondary index alone supplies every requested column) can be served entirely from memory without reading the disk, while the clustered index data may well be on disk (depending on the buffer pool size and hit rate). In that case one path is a memory read and the other is a disk read, and the speed difference is significant, roughly an order of magnitude.
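The difference between the two queries can be seen in a query plan. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of MySQL, the index name idx_b and the helper query_plan are invented here, and SQLite's planner is not InnoDB. The idea carries over, though: when every requested column is already in the secondary index, the planner can skip the lookup into the full row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table t(a, b, c, d, e, f): a is the primary key, b has a secondary index.
# SQLite is a stand-in here; idx_b is a hypothetical index name.
conn.execute(
    "CREATE TABLE t (a INTEGER PRIMARY KEY, b INT, c INT, d INT, e INT, f INT)"
)
conn.execute("CREATE INDEX idx_b ON t (b)")


def query_plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


narrow_plan = query_plan("SELECT a, b FROM t WHERE b = 5")
star_plan = query_plan("SELECT * FROM t WHERE b = 5")
print("SELECT a, b:", narrow_plan)
print("SELECT *   :", star_plan)
```

In SQLite the first plan mentions a covering index and the second does not. In MySQL, the usual sign of a covering index is “Using index” in the Extra column of EXPLAIN.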

Second, index knowledge extension

Secondary indexes came up in the previous section. This section extends the topic with one kind of index in particular, the joint index.

Joint index (a, b, c)

The joint index (a, b, c) effectively gives you three indexes: (a), (a, b), and (a, b, c).

We can think of a joint index as the level 1, level 2, and level 3 directories of a book. For index (a, b, c), a is the level 1 directory, b is the level 2 directory under a, and c is the level 3 directory under b. To use a directory, you must first go through its parent directory, except for the level 1 directory.
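The directory analogy can be sketched in a few lines of Python. This is a simplified model, not how InnoDB is implemented: the joint index is just a sorted list of (a, b, c) tuples, keys are assumed to be integers, and prefix_lookup and scan_for_b are hypothetical helpers written for this example. A lookup that starts from the leftmost column can binary-search the sorted order, while a lookup on b alone cannot.

```python
from bisect import bisect_left

# A joint index on (a, b, c) modeled as a list of tuples kept in sorted order.
index_abc = sorted(
    (a, b, c) for a in range(1, 4) for b in range(1, 4) for c in range(1, 4)
)


def prefix_lookup(index, prefix):
    """Binary-search the entries that start with `prefix` (integer keys assumed)."""
    upper = prefix[:-1] + (prefix[-1] + 1,)  # smallest tuple past the prefix
    lo = bisect_left(index, prefix)
    hi = bisect_left(index, upper)
    return index[lo:hi]


def scan_for_b(index, b_value):
    """No leftmost column given: every entry has to be inspected."""
    return [entry for entry in index if entry[1] == b_value]


by_a = prefix_lookup(index_abc, (2,))       # uses the (a) prefix
by_a_b = prefix_lookup(index_abc, (2, 3))   # uses the (a, b) prefix
by_b_only = scan_for_b(index_abc, 3)        # b alone: full scan
print(len(by_a), len(by_a_b), len(by_b_only))
```

Entries with the same a sit next to each other, and within them entries with the same b sit next to each other, which is why (a) and (a, b) work as prefixes. Entries with the same b are scattered across the whole list, so searching by b alone has to look at everything.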

The advantages of joint indexes

1) Lower overhead

Creating one joint index (a, b, c) is equivalent to creating three indexes: (a), (a, b), and (a, b, c). Every additional index adds write overhead and disk space. For tables with a lot of data, using a joint index can greatly reduce that overhead!

2) Covering index

Given the joint index (a, b, c), consider the following SQL:

SELECT a, b, c FROM table_name WHERE a = 'xx' AND b = 'xx';

MySQL can then get the data directly by traversing the index, without going back to the table, which saves a lot of random I/O. Reducing I/O, especially random I/O, is a major optimization strategy for DBAs. So in real applications, a covering index is one of the main ways to improve performance.

3) High efficiency

The more columns the index has, the fewer rows are left after filtering through it. Take a table with 10 million (1000W) rows and the following SQL:

SELECT col1, col2, col3 FROM table_name WHERE col1 = 1 AND col2 = 2 AND col3 = 3;

Assumption: each condition narrows the data down to 10% of the rows.

  • A. With only a single-column index, the index narrows the data to 10 million × 10% = 1 million rows. MySQL then has to go back to the table and find the rows with col2 = 2 and col3 = 3 among those 1 million, and then sort, paginate, and so on.
  • B. With a joint index on (col1, col2, col3), the index narrows the data to 10 million × 10% × 10% × 10% = 10,000 rows. The efficiency gain is easy to imagine!
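The arithmetic in cases A and B can be checked with a few lines of Python. The 10% figure is the assumption stated above, and the function name rows_after_index is made up for this sketch.

```python
TOTAL_ROWS = 10_000_000   # "1000W" rows
KEEP_ONE_IN = 10          # each condition keeps 10% of the rows, i.e. 1 in 10


def rows_after_index(total_rows, indexed_conditions):
    """Rows left after the index has applied `indexed_conditions` filters."""
    remaining = total_rows
    for _ in range(indexed_conditions):
        remaining //= KEEP_ONE_IN
    return remaining


single_column = rows_after_index(TOTAL_ROWS, 1)   # index on (col1) only
joint_index = rows_after_index(TOTAL_ROWS, 3)     # index on (col1, col2, col3)
print(single_column, joint_index, single_column // joint_index)
```

Under this assumption, the single-column index leaves 100 times as many rows to check against the table as the joint index does.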

Is it better to build as many indexes as possible?

The answer, of course, is no.

  • Tables with little data do not need indexes; building one only adds extra index overhead.
  • Do not index columns that are rarely referenced. Since they are seldom used, an index on them does not do much good.
  • Avoid indexing columns that are updated frequently, because this slows down inserts and updates.
  • Fields whose values repeat heavily and are evenly distributed gain little from an index (for example, a gender field with only male and female is not suitable for indexing).
  • Indexes must be maintained whenever data changes, so more indexes mean higher maintenance cost.
  • More indexes also require more storage space.
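One common rule of thumb for the “repeated, evenly distributed values” point is selectivity: the number of distinct values divided by the number of rows. This measure is not from the list above, it is added here as an assumption to make the gender example concrete, and the sample columns are invented.

```python
def selectivity(values):
    """Distinct values divided by total rows; 1.0 means every row is unique."""
    return len(set(values)) / len(values)


# Hypothetical sample columns from a 1,000-row table.
gender = ["male", "female"] * 500
user_id = list(range(1000))

gender_selectivity = selectivity(gender)
user_id_selectivity = selectivity(user_id)
print(f"gender : {gender_selectivity:.3f}")
print(f"user_id: {user_id_selectivity:.3f}")
```

A value close to 0, as for gender, means an index lookup still leaves about half the table to read, so the index is unlikely to pay for its maintenance cost.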

Finally

I wrote this article mainly because summaries of this topic online are rare, scattered, and not very rigorous. It is a fairly detailed summary for myself and for you, and it is worth remembering. Explain it to the interviewer this way and he will have nothing to pick on.