To prepare for deployment, create sample tables and sample data

‐‐ Example table:CREATE TABLE `t1` (
	`id` INT ( 11 ) NOT NULL AUTO_INCREMENT,
	`a` INT ( 11 ) DEFAULT NULL,
	`b` INT ( 11 ) DEFAULT NULL.PRIMARY KEY ( `id` ),
	KEY `idx_a` ( `a` ) 
) ENGINE = INNODB DEFAULT CHARSET = utf8;
CREATE TABLE t2 LIKEt1; ‐ insert some sample data into the T1 table1Ten thousand rowsDROP PROCEDURE
IF
	EXISTS insert_t1;

delimiter;;
CREATE PROCEDURE insert_t1 () BEGIN
	DECLARE
		i INT;
	SET i = 1;
	WHILE
			( i < = 10000 ) DO
			INSERT INTO t1 ( a, b )
		VALUES
			( i, i );
		SET i = i + 1;
	END WHILE;
END;;
delimiter;
CALLinsert_t1 (); ‐‐ Insert into table T2100rowsDROP PROCEDURE
IF
	EXISTS insert_t2;

delimiter;;
CREATE PROCEDURE insert_t2 () BEGIN
	DECLARE
		i INT;
	SET i = 1;
	WHILE
			( i < = 100 ) DO
			INSERT INTO t2 ( a, b )
		VALUES
			( i, i );
		SET i = i + 1;
	END WHILE;
END;;

delimiter;
CALL insert_t2 ();
Copy the code

Paging optimization

  • This is a paging query for the 10 items after the 10,000th item. In the end, it seems that only 10 pieces of data are displayed in the final query. In fact, 10010 pieces of data are queried here, and the last 10 pieces of data are returned for display. Therefore, the efficiency of paging query for a large table is very slow.
select * from employees limit 10000.10;
Copy the code
  • Optimization 1: If the incrementing ID is continuous and uninterrupted, then you can use the ID to check the results before paging. However, this is not practical in the actual scenario, because the deletion deletion will lead to the discontinuous deletion of the auto-increment ID, resulting in inconsistent results.
select * from employees where id > 10000 limit 10;
Copy the code
  • The mysql optimizer decided that it would be more expensive to scan the entire index and find rows without an index (traversing multiple index trees) than to successfully scan a full table, so the index query was not used. So the optimal sort here is to return as few fields as possible, so the primary key is first queried, and then the data rows are queried based on the primary key ID. Using filesort; Using index;
EXPLAIN select * from employees ORDER BY name limit 90000.5;
EXPLAIN select a.* from employees a INNER JOIN (select id from employees ORDER BY name limit 90000.5) b on b.id = a.id;
Copy the code

There are two common algorithms for table association in mysql

Nested Loop Join algorithm (BLJ)

  • Read the rows from the first table (the driver table) one row at a time, fetch the related fields from this row, fetch the qualified rows from the other table (the driven table) of the associated fields, and fetch the result set of the two tables;
EXPLAIN select * from t1 inner join t2 on t1.a= t2.a;
Copy the code
  • From the above execution results, it can be seen that T2 is the driver table and T1 is the driven table. The driver table is executed first (if the ids of the execution plan results are the same, the SQL is executed from the top down). The optimizer generally selects small tables as the driver table, so when using inner Join, the driver table is not necessarily the first one.
  • When left join is used, the left table is the driver table and the right table is the driven table. When right join is used, the right table represents the driver table and the left table is the driven table. When join is used, mysql will use small tables as channel tables and large tables as driven tables.
  • The NLJ algorithm is used. In general join statements, if the execution plan does not contain Using join buffer, the NLJ algorithm is used.
  • So the above SQL execution process:
    • First query the data in T2 (filter conditions, if any), then search in T1 according to keyword A, take out the data satisfying the conditions in T1 and the results in T2 for collection return, and then repeat the previous steps.
    • In the whole process, T2 will scan data 100 times, and then according to the condition A, T1 will also scan (a of t2 data will scan t1 data), so T1 will also scan 100 times, so the whole process will scan 200 rows.
  • If the table is not indexed, then the NLJ algorithm is inefficient and mysql uses BNL algorithm instead.

A Block Nested-loop Join algorithm

  • Read the data of the driver table to the Join_Buffer, scan the driver table, and compare each row of the driver table with the data in the Join_Buffer.
EXPLAIN select * from t1 inner join t2 on t1.b= t2.b;
Copy the code
  • Using join buffer

    • The data from T2 is put into join_Buffer, and then the data from T1 is compared with the join_buffer. Finally, the data set that meets the conditions is returned.
    • In the whole process, t1 and T2 are all full table scans, so 10100 data are scanned. In addition, the data in the Join_buffer is unordered, so 100 judgments are made on t1 data. Therefore, the number of judgments in memory is 100 x 10000.
  • If the number of entries in the join_buffer is larger than the number of entries in the join_buffer table, mysql will use the number of entries in the join_buffer table. If the number of entries in the join_buffer table is larger than the number of entries in the join_buffer table, mysql will use the number of entries in the join_buffer table. In the join_buffer, only 1000 entries can be added into the join_buffer. T1 then emptying the join_buffer and emptying the remaining 1000 entries into T2. T1 then emptying the join_buffer and emptying the join_buffer again. So there’s an extra scan of T1.

  • If the BLJ algorithm is used here, then 1 million disk scans is very slow, so BNL algorithm is used here.

  • So mysql association optimization

    • Mysql join (NLJ); mysql join (NLJ); Then the small table drives the big table.
    • The definition of a small table is that the table with small data amount after filtering the associated table is a small table and should be used as a driver table.

In and Exsits optimization

  • In principle, small tables drive large tables.
Example 1:select * from A where id in (select id fromB) Example 2:select * from A where exists (select 1 from B where B.id = A.id)
Copy the code
  • In: When the data set of table B is smaller than that of table A, in takes precedence over EXISTS
  • Exists: When the data set of table A is smaller than that of table B, exists takes precedence over IN
  • Exists () only returns true or false. So select * in a subquery can be changed to select 1. Mysql officially ignores the select field in a subquery, so there is no difference.
  • The actual execution of a neutron query may be optimized rather than a comparison as we understand it.

Count (*) optimization

-- Temporarily turn off the mysql query cache in order to see the true time of multiple SQL executions
set global query_cache_size=0;
set global query_cache_type=0;

EXPLAIN select count(1) from employees;
EXPLAIN select count(id) from employees;
EXPLAIN select count(name) from employees;
EXPLAIN select count(*) from employees;
Copy the code

  • The results of the four operation plans are all the same, indicating that the efficiency is similar:
    • Count (*) = count(1) > count(field) > count(id); Count (id) > count(id); count(id) > count(id);
    • Field no index: count(*) = count(1) > count(id) > count(field); Count (id) > count(id); count(id) > count(id);
    • Count (1) is similar to count(field), but count(1) doesn’t fetch data, and count(field) does fetch data, so count(1) is theoretically faster than count(field).
  • Note: When using count(*), mysql does not take out all the fields. It is optimized to not add values by row, so you do not need to replace count(*) with count(field) or count(1).
  • Note: For count(ID), the secondary index stores less data than the primary key index, so the retrieval performance is better. Mysql internal optimization (after 5.7).

Mysql Data type

Numeric types

type The size of the Range (signed) Range (unsigned) role
tinyint 1 byte (128, 127) (0, 255) Small integer
SMALLINT 2 – (-32 768, 32 767) (0, 65, 535) Large integer value
MEDIUMINT 3 bytes (-8 388 608, 8 388 607) (0, 16, 777, 215) Large integer value
INT or an INTEGER 4 bytes (-2 147483 648, 2 147483 647) (0, 4 294 967 295) Large integer value
BIGINT 8 bytes (-9 233 372 036 854 775 808, 9 223 372 036 854 775 807) (0, 18 446 744 073 709 551 615) Maximum integer value
FLOAT 4 bytes (-3.402 823 466e +38, 1.175 494 351 E38), 0, (1.175 494 351 e-38, 3.402 823 466 351 E+38) 0, (1.175 494 35e-38, 3.402 823 466e +38) Single precision floating point value
DOUBLE 8 bytes (1.797 693 134 862 315 7 E+308, 2.225 073 858 507 201 4 e-308), 0, (2.225 073 858 507 201 4e-308), 1.797 693 134 862 315 7 E+308) 0, (2.225 073 858 507 201 4 E308, 1.797 693 134 862 315 7 E+308) Double precision floating point value
DECIMAL For DECIMAL(M,D), M+2 if M>D otherwise D+2 Depends on the values of M and D Depends on the values of M and D Small numerical
  • Optimization Suggestions
    1. If the integer does not have a negative number, such as an ID, you are advised to specify an UNSIGNED type. The size can be doubled.
    2. You are advised to use TINYINT instead of ENUM, BITENUM, and SET.
    3. Avoid using integer widths (see the end of the documentation). That is, do not specify field widths like INT(10). Use INT instead.
    4. DECIMAL is best suited for storing data that requires high accuracy and is used for computation, such as prices. But when using the DECIMAL type, pay attention to the length setting.
    5. It is recommended to use the integer type to operate and store real numbers by multiplying the real numbers by their corresponding multiples.
    6. Integers are often the best data type because they are fast and can use AUTO_INCREMENT.

Time to type

type Size (bytes) Range (signed) format use
DATE 3 The 1000-01-01 to 1000-01-01 YYYY-MM-DD Date value
TIME 3 ‘- 838:59:59’ to ‘838:59:59’ HH:MM:SS Time value or duration
YEAR 1 From 1901 to 2155 YYYY Year value
DATETIME 8 1000-01-01 00:00:00 to 9999-12-31 23:59:59 YYYY-MM-DD HH:MM:SS Mix date and time values
TIMESTAMP 4 1970-01-01 00:00:00 to 2038-01-19 03:14:07 YYYYMMDDhhmmss Mix date and time values, time stamps
  • Optimization Suggestions
    1. The minimum time granularity MySQL can store is seconds.
    2. It is recommended to use the DATE data type to store dates. The default date format in MySQL is YYYY-MM-DD.
    3. Use MySQL’s built-in DATE, TIME, and DATETIME types to store TIME instead of using strings.
    4. When the data formats are TIMESTAMP and DATETIME, you can use CURRENT_TIMESTAMP as the default (after MySQL5.6) and MySQL will automatically return the exact time the record was inserted.
    5. TIMESTAMP is a UTC TIMESTAMP and is related to the time zone.
    6. DATETIME is stored as an integer YYYYMMDD HH:MM:SS, no matter where the time zone is.
    7. Unless there is a special need, companies generally recommend using TIMESTAMP, it is more space saving than DATETIME, but companies like Ali will generally use DATETIME, because there is no need to worry about the future time limit of TIMESTAMP.
    8. Sometimes people save Unix timestamps as integer values, but this usually doesn’t do any good. This format is not convenient to handle, and we don’t recommend it.

String type

type Size (bytes) use
CHAR 0-255 bytes A fixed-length string, char(n) When the actual length of the inserted string is less than n, Spaces are added for supplementary saving. Trailing Spaces are removed during retrieval.
VARCHAR 0-65535 bytes A variable length string. N in varchar(n) represents the maximum column length. Spaces are not added when the actual length of the inserted string is less than N
TINYBLOB 0-255 bytes A binary string of up to 255 characters
TINYTEXT 0-255 bytes Short text string
BLOB A scale of 0-65 to 535 bytes Long text data in binary form
TEXT A scale of 0-65 to 535 bytes Long text data
MEDIUMBLOB 0-16 777 215 bytes Medium length text data in binary form
MEDIUMTEXT 0-16 777 215 bytes Medium length text data
LONGBLOB 0-4 294 967 295 bytes Large text data in binary form
LONGTEXT 0-4 294 967 295 bytes Maximum text data
  • Optimization Suggestions
    1. If the length of a string differs greatly, use VARCHAR. The string is short and all values are close to one length using CHAR.
    2. CHAR and VARCHAR apply to any alphanumeric combination that includes people’s names, zip codes, phone numbers, and up to 255 characters in length. Do not use VARCHAR for numbers that are to be computed, as this may cause computation-related problems. In other words, the accuracy and completeness of the calculation may be affected.
    3. Use BLOB and TEXT as little as possible. If you do, consider storing the BLOB and TEXT fields in separate tables and associating them with ids.
    4. The BLOB series stores binary strings, independent of character sets. The TEXT series stores non-binary strings and is related to the character set.
    5. Neither BLOB nor TEXT can have default values.

INT Display width

  • We often use a command to create a table and specify a length, as shown below. But the length here is not the maximum length stored by the TINYINT type, but the maximum length displayed.
1 CREATE TABLE `user` (2 `id` TINYINT(2) UNSIGNED
3 );
Copy the code
  • The id field of the user table is of type TINYINT, and the maximum value that can be stored is 255. Therefore, when storing data, if the stored value is less than or equal to 255, such as 200, although it is more than 2 bits, but does not exceed the length of the TINYINT type, it can be stored normally; If the saved value is greater than 255, such as 500, MySQL will automatically save it to the maximum value of type TINYINT 255.
  • When querying data, regardless of the value of the query result, the actual output. In TINYINT(2), ZEROFILL is used when a query result needs to be preceded by a zero. For example:
1 `id` TINYINT(2) UNSIGNED ZEROFILL
Copy the code
  • Thus, if the query result is 5, the output will be 05. If you specify TINYINT(5), the output will be 00005, but the actual value stored will be 5, and the data stored will not exceed 255, but MySQL will output the data with a 0 in front.
  • In other words, in MySQL commands, the type lengths of fields such as TINYINT(2) and INT(11) do not affect data insertion, and are only useful when using ZEROFILL, which prefixes query results with 0.

Other optimization methods

  • Query the total number of rows in a mysql table: SQL queries without WHERE perform best if the myISAM storage engine is used, since MyISam stores the total number of rows on disk without any additional statistical calculations. Innodb calculates the total number of rows in real time.
show table status like 'employees';
Copy the code

  • Show table status gives you the total number of rows, and it’s also very high-performance.

The last

  • Study with an open mind and make progress together