Note: This note is fromDesign and optimization of high-performance scalable MySQL database architectureSort it out by yourself.
I. Project Description: The background of this note is the e-commerce platform project. E-commerce project involves the most technical aspects among various project types due to its characteristics of high concurrency, multi-threading and high energy consumption. Therefore, this paper takes it as an example:
1. Main functional modules:
2. For the establishment of a project, from requirement analysis to the decision of technology stack, the business execution is inseparable from the operation of the database in the persistence layer. The specifications of database library and table design have a great influence on the efficiency and availability of the project. As follows, the relevant specifications of the database are analyzed and explained:
Database design specifications include:
Database naming conventions; Basic design specification of database; Database index design specification; Database field design specification; Database SQL development specification; Code of conduct for database operations.Copy the code
2. Database naming conventions
1. All database object names must be lowercase and underlined
Different database names DbName DbName Different Table names Table TableCopy the code
2, all database object names do not use MySQL reserved keyword
select id , username , `from` ,age from tb_user
Copy the code
Note: If reserved words must be used, then the ‘ ‘symbol is added to the reserved words. The database will judge them as normal characters during compilation to avoid errors
3, the database object name to be able to see the name of the meaning, and good group, not more than 32 characters
Example: User database user_DB User account table user_AccountCopy the code
Temporary library tables must be prefixed with TMP and suffixed with date
Backup libraries, backup tables must be prefixed with bak and suffixed with dateCopy the code
5. All column names and column types that store the same data must be the same
Iii. Basic design specification of database
1. All tables must use Innodb storage engine
5.6Later default engines support transactions, row-level locking, better recovery, and better performance at high concurrencyCopy the code
2. Use UTF8 for database and table character sets
Unified character set can avoid garbled characters caused by character set conversion in MySQL UTF8 Character set Chinese character points3Two bytes, occupied by ASCII codes1bytesCopy the code
All tables and fields need to be commented
Add comments to tables and columns using comment clausesCopy the code
4. Try to control the amount of data in a single table, and it is recommended to control it within 5 million
500The MySQL database is not limited; You can use historical data archiving, sub-database sub-table and other means to control the data volumeCopy the code
How many thousands of data can MySQL store?
This limitation depends on the storage Settings and file system.Copy the code
5, use MySQL partition table carefully
A partitioned table consists of multiple files physically and a single table logically. Select partition keys carefully. Cross-partition query efficiency may be lowerCopy the code
6. Try to separate hot and cold data and reduce the width of the table
Reduce disk I/O to ensure the memory cache hit ratio of hot data. Use the cache more efficiently and avoid reading useless cold data into the same tableCopy the code
7. Do not create reserved fields in the table
The reserved field cannot be named by name. The reserved field cannot confirm the data type of the storage. Therefore, the table will be locked if the reserved field type is changedCopy the code
8, prohibit the storage of pictures, files and other binary data in the database
9. Do not do database stress tests online
10. Do not directly connect the development environment and test environment to the production environment database
4. Database index design specification
Indexes are important for database query performance note: Do not abuse indexesCopy the code
1. Limit the number of indexes in each table. You are advised to limit the number of indexes in each table to 5
More indexes are not always better! Indexes can improve efficiency and can reduce efficiency by prohibiting the creation of separate indexes for each column in a single tableCopy the code
2. Every Innodb table must have a primary key
Do not use columns that are frequently updated as the primary key. Do not use multi-column primary keys Do not use UUID,MD5,HASH, and string columns as the primary key. You are advised to use the self-increasing ID value as the primary keyCopy the code
3. Suggestions for common index columns
SELECT, the UPDATE,DELETEThe statementWHEREThe columns in the clause are contained inORDER BY,GROUP BY,DISTINCTField in multiple tablesJOINThe associated columnCopy the code
4, how to select the index column order
Put the most differentiated columns at the far left of the union index and try to put the columns with the smallest field lengths at the far left of the union index and the columns that are used most frequently at the left of the union indexCopy the code
5. Avoid creating redundant and duplicate indexes
primaryThe key (id), the index (id),unique index(id) -- duplicate indexIndex (a,b,c); index(a,b);- The union index has duplicate, resulting in redundancy
Copy the code
6. Override indexes are preferred for frequent queries
Overwrite index: The query goes to the index and needs to return data that happens to be the constituent fields of the index, in other wordsselectIs called an overwrite index. Benefits: Avoid secondary index lookups in Innodb tables and change random IO to sequential IO to speed up query efficiencyCopy the code
7, try to avoid using foreign keys
Foreign keys can be used to ensure referential integrity of data, but it is recommended to implement foreign keys on the business side, which can affect parent and child table writes and thus reduce performanceCopy the code
5. Database field design specification
1. Select the smallest data type that meets storage requirements
Converts a string to a numeric type to store INET_ATON('255.255255.255.`) = 4294967295
INET_NTOA( 4294967295) = `255.255255.255.`
Copy the code
2. Select the smallest data type that meets storage requirements
Use unsigned integer to store SIGNED for non-negative dataINT - 214748.~2147483647
UNSIGNED INT 0 ~ 4294967295
Copy the code
3. Select the smallest data type that meets storage requirements
VARCHARThe N in (N) represents the number of characters, not bytes. UTF8 is used to store Chinese charactersVarchar(255) = 765Too many bytes consume more memoryCopy the code
4. Avoid TEXT and BLOB data types
Make a proposal toBLOBOr the TEXT column is separated into a separate extended table TEXT orBLOBType can only use prefix indexCopy the code
5. Define all columns as NOT NULL if possible
The indexNULLColumns require extra space to store, so take up more space to compare and calculate againstNULLValue to do special processingCopy the code
6. Avoid using ENUM data types
This parameter is required when modifying the ENUM valueALTERStatement ENUM OREDR of typeBYThe operation efficiency is low. Additional operations are required. Do not use numeric values as enumeration values of enUmsCopy the code
7. Avoid storing date data in strings (bad practice)
disadvantages1: Cannot use the date function to calculate and compare disadvantages2Storing dates as strings takes up more spaceCopy the code
8. Use TIMESTAMP or DATETIME to store time
TIMESTAMP 1970- 01- 01 00:00:01 ~ 2038- 01- 19 03:14:07
TIMESTAMPTake up4Byte andINTSame, but more thanINTreadableCopy the code
SQL development specification
1. You are advised to use precompiled statements for database operations
Benefits of using precompiled statements:
Pass only parameters, than passSQLStatements are more efficient; The same statement can be parsed once and used multiple times to improve processing efficiencyCopy the code
Avoid implicit conversions of data types
Implicit conversions cause index invalidationselect name,phone from customer where id = `111`
Copy the code
3. Make full use of existing indexes on the table
Avoid using double%Number query condition. Such as alike `%123%', will result in index invalidationSQLOnly one column in the matching index can be used for range queriesCopy the code
4, the program connected to different databases using different accounts, cross-library query is prohibited
Reduce the degree of service coupling for database migration and database and table outflow to avoid security risks caused by excessive permissionsCopy the code
5, do not use SELECT *, must use SELECT < field list > query
Consuming more CPU and IO and network bandwidth resources that are not available to cover indexes can reduce the impact of table structure changesCopy the code
6. Avoid subqueries and optimize them for JOIN operations
The result set of the subquery cannot use the index subquery. Temporary table operations may occur. If the subquery data is large, the efficiency is severely affected and excessive CPU and I/O resources are consumedCopy the code
7. Avoid joining too many tables
everyJOINA table occupies too much memory (join_BUFFer_SIZE). Temporary table operations occur, affecting query efficiency. MySQL allows maximum association61Table, recommended no more than5aCopy the code
8. Reduce the number of interactions with the database
A database is more suitable for batch operations. Combining multiple identical operations together improves processing efficiencyalter table t1 add column c1 int,change column c2 int.Copy the code
Disallow random sorting using order by rand()
A. Random sort loads all the eligible data in the table into memory for sorting; B. Random sorting will consume a lot of CPU and IO and memory resources. It is recommended to obtain a random value in the program and then obtain data from the databaseCopy the code
10. The WHERE clause forbids function conversions and calculations on columns
Functional conversions or calculations on columns can make indexes unavailablewhere date(createtime)=`20160901`
where createtime > =`20160901` and createtime <`20160902`
Copy the code
11. Use UNION ALL instead of UNION when it is clear there will be no duplicates
UNIONAll data is put into temporary tables before de-redoUNION ALLResults and de-redo operations are no longer performedCopy the code
Vii. Code of conduct for database operation
1. Batch write operations over 1 million lines should be performed several times in batches
A large number of operations may cause serious master-slave delay in binlog asrowFormat generates a large number of logs to avoid major errorsCopy the code
2. For large tables, run pt-online-schema-change to modify the table structure
Avoid master/slave delays due to qualified changes Avoid locking tables while table fields are being modifiedCopy the code
3. Do not grant super permission to the account used by the program
Because when the maximum connection limit is reached, it is still allowed1A user with super permission can only connect to the account that handles the problemCopy the code
4, for the program to connect to the database account, follow the principle of minimum permissions
The program can only use the database account under one DB, and the account that is not allowed to be used by the cross-library program is not allowed in principledroppermissionsCopy the code
Hash, RANGE, and LIST partitions
MySQL > alter table table_name
1, definition: database table partition is one of the basic design specifications of the database, partition table in the physical performance of multiple files, in the logical performance of a table;
2, the disadvantages of table partitioning: be careful to select partition keys, the wrong operation may lead to cross-partition query efficiency.
You are advised to manage big data in a physical table.
Check whether the MySQL server supports partitioned tables
Use SHOW PLUGINS; Mysql > select * from mysql;
Query results in the “partition | ACTIVE | STORAGE ENGINE | NULL | the GPL” this line represents the current database to database partition table operations.
4, the physical structure of ordinary database table and partition table physical structure difference:
Partition SQL statement inside a statementpartition by hash(customer_id) partitions 4; Partition by what partition? Hash: Customer_id: partitions by key4:4In the partition diagram, the physical structure of a common table is on the left, and the physical structure of a database table after partition is on the right.Copy the code
Hash partition table (Hash partition)
1. Features of HASH partitioning
Stores rows in different partitions of the table according to the value of MOD (partition key, partition number) so that data is evenly distributed among partitions.
Note: The HASH partition’s key must be an INT or can be converted to an INT by a function.
Create HASH partition
use hash;
CREATE TABLE `hash`.`customer_login_log`(
customer_id int UNSIGNED not null,
login_time TIMESTAMP,
login_ip int UNSIGNED,
login_type TINYINT NOT NULL
) PARTITION by hash(login_ip) PARTITIONS 6;- Partitions are performed based on IP addresses
Copy the code
Alter table customer_login_log;Customer_login_log Physical structure of a common non-partitioned tableAlter table customer_login_log alter table customer_login_log alter table customer_login_log
INSERT INTO customer_login_log(customer_id,login_time,login_ip,login_type)
VALUES (1,now(),11111.1);
Copy the code
SQL > alter table table_name;
RANGE partition table (partition by RANGE)
1. RANGE partition features:
The RANGE partition stores rows in different partitions of the table based on the RANGE of partition keys. The ranges of partitions must be contiguous but not overlapping.
Note: The VALUES LESS THAN attribute is used by default, meaning that each partition does not include the specified value
Create table RANGE;
create table `customer_login_log`(
`customer_id` int(10) UNSIGNED not null,
`login_time` TIMESTAMP not null,
`login_ip` int(10) UNSIGNED not null,
`login_type` TINYINT(4) NOT NULL
) ENGINE=INNODB
PARTITION BY RANGE( customer_id)(
PARTITION P0 VALUES LESS THAN (10000),
PARTITION P1 VALUES LESS THAN (20000),
PARTITION P2 VALUES LESS THAN (30000),
PARTITION P3 VALUES LESS THAN MAXVALUE
)
Copy the code
Partition scope description:
If the data range is 30000 to 40000, an error message is displayed if the data range is not 40000. However, when the MAXVALUE partition of p3 exists in the figure, data without specified partition range will be inserted into P3
3. Use scenario of RANGE partition
The partition key is of the date or time type. The partition key is included in all queries to periodically clear historical data by partition range
List partition (press partition key to select partition)
1, LIST partition features
Definition: LIST Partition is partitioned according to the LIST of partition key values. Like range partition, the LIST of each partition cannot be repeated
Note: Each row of data must be able to find the partition list; otherwise, data insertion fails
Select * from LIST where id = 1;
create table `customer_login_log_list`(
`customer_id` int(10) UNSIGNED not null,
`login_time` TIMESTAMP not null,
`login_ip` int(10) UNSIGNED not null,
`login_type` TINYINT(4) NOT NULL
) ENGINE=INNODB
PARTITION BY LIST (login_type)(
PARTITION P0 VALUES IN (1.3.5.7.9),
PARTITION P1 VALUES IN (2.4.6.8))Copy the code
Insert the value of a partition key that contains an uncreated partition, and return an error:
INSERT INTO customer_login_log_list(customer_id,login_time,login_ip,login_type)VALUES(100,now(),1.10)
Copy the code
Error screenshot:Partition p0 stores the value of login_type: 1,3,5,7,9. P1 stores data whose login_type is 2,4,6, and 8. The login_type of the inserted data is 10, which is not included in the p0 or P1 login_type range. Therefore, the data fails to be inserted, and an error message is displayed.
SQL execution plan and paging query optimization, partition key statistics
1. Analysis of execution plan
What does the execution plan tell us?
How can SQL query the contents of a scanned data row execution plan using an index to join the execution order of a query?
Analysis and examples of implementation plan contents
1, the ID column
The meaning of the ID column in the execution plan:
The data in the ID column is a group of numbers, indicating the order in which SELECT statements are executed. If the IDS are the same, the higher the ID value, the higher the priority and the earlier the SELECT statements are executedCopy the code
View the execution plan:
EXPLAIN SELECT
c.`category_name`,
a.`product_name`,
b.`title`
FROM
product_info a
JOIN product_comment b ON a.`product_id` = b.`product_id`
JOIN product_category c ON c.`category_id` = a.`one_category_id`;
Copy the code
Screenshots:Complex SQL view execution plan:
EXPLAIN
select title
from product_comment
WHERE product_id in (
SELECT max(product_id)
from product_info
WHERE one_category_id in (select min(category_id) from product_category)
)
Copy the code
Screenshots:
Id for1.2.3
Copy the code
Group query SQL query execution plan:
EXPLAIN
SELECT title
from (
SELECT one_category_id,max(product_id)AS pid
from product_info
GROUP BY one_category_id
) a JOIN product_comment b on a.pid = b.`product_id`
Copy the code
Screenshots:
Id for two1, a2
Copy the code
2, SELECT_TYPE columns
Execution Plan Case 1:
EXPLAIN
SELECT title
FROM product_comment
WHERE
product_id IN (
SELECT max(product_id)
FROM product_info
WHERE
one_category_id IN (
SELECT min(category_id)
FROM product_category
)
);
Copy the code
3, TABLE columns
Action: Specifies the name of the table in which the output row resides
<unionM,N>The value can be queried by the ids M and NunionThe resulting result set<derivedN>/<subqueryN>The result of the query with ID NCopy the code
Execution Plan Case 1:
EXPLAIN
select product_category.`category_name`,product_info.`product_name`,product_comment.`title`
from product_info
JOIN product_comment on product_info.`product_id`=product_comment.`product_id`
JOIN product_category on product_category.`category_id`=product_info.`one_category_id`
Copy the code
4, PARTITIONS are listed
Effect: For partitioned tables, the queried partition ID is NULL for non-partitioned tablesCopy the code
Execution plan Cases:
EXPLAIN
SELECT *
FROM `crn`.`customer_login_log`
where customer_id = 1
Copy the code
User login logs, for example, should use the user id of the user table as a condition of partition for data storage and archiving, this makes the same user all the data written to the same partition interval, to avoid the query log log will be carried out on the big table query for other users log on to log in the process of filtering and lead to efficiency loss!
5, the TYPE column
6, Extra columns
7, POSSIBLE_KEYS columns
MySQL can use these indexes to optimize queries
② The indexes on the columns involved in the query column will be listed, but may not be used
8, the KEY columns
① The query optimizer optimizes the actual index used in the query
② If no index is available, NULL is displayed
3. If an overwrite index is used in the query, the index only appears in the Key column
9, KEY_LEN columns
① Indicates the maximum length of the index field
② The length of Key_len is calculated from the field definition, not the actual length of the data
10, Ref
Indicates which columns or constants are used to find values on index columns
11, Rows of columns
① indicates the number of rows to be read estimated by MySQL based on index statistics
② The size of the Rows value is a statistical sampling result, which is not very accurate
12, Filtered column
① Represents the percentage of the number of rows returned to the number of rows to be read
② The greater the value of Filtered column, the better
③ Values of the Filtered column depend on statistics
Limitations on the implementation of the plan
1. Failed to show the effects of stored procedures, triggers, and UDFs on queries
2. Stored procedures cannot be analyzed using EXPLAIN
3. Earlier versions of MySQL only supported analysis of SELECT statements
15. Example of optimizing paging query
Requirements: create a federated index according to audit_STATUS and product_id. It is necessary to specify which value is placed to the left of the federated index. It is most appropriate to use product_id as the index to the left of the combined index.
Create an execution plan:
EXPLAIN
SELECT customer_id,title,content
from product_comment
where audit_status=1
AND product_id=199727
limit 0.5
Copy the code
Preliminary optimization to create a federated index:
CREATE INDEX idx_productID_auditStats on product_comment(product_id,audit_status)
Copy the code
Execution results of the optimized execution plan after adding indexes:The query efficiency is significantly improved. As can be seen from the type column, the original all-table scan query is reduced to non-unique index query.
Further optimize paging queries
SELECT t.customer_id,t.title,t.content
from (
SELECT `comment_id`
from product_comment
where product_id=199727 AND audit_status=1 LIMIT 0.5
) a JOIN product_comment t
ON a.comment_id = t.comment_id;
Copy the code
Optimized note: The comment_ID of the corresponding data is obtained by paging query. This query does not return other fields. By default, it can be queried by primary key index, which is extremely efficient. Metn_id is then used as a temporary sub-table to match comment_id with product_comment. In this case, comment_id is directly used to query comment_id and return other comment_id fields. This query method can save a lot of RESOURCES on I/O. When the amount of data is tens of thousands, the efficiency is not greatly affected.
How do I delete duplicate data
Delete the repeated comments for the same order and the same item from the comment table, save only the earliest one
1. Step 1: Check whether there are repeated comments on the same item of an order
select order_id,product_id,COUNT(*)
from product_comment
GROUP BY order_id,product_id HAVING COUNT(*)>1
Copy the code
Step 2: Backup the product_COMMENT table
Create backup table:
CREATE TABLE `mc_productdb`.bak_product_comment_200815
LIKE `mc_productdb`.product_comment;
Copy the code
Synchronize table data:
INSERT INTO `mc_productdb`.`bak_product_comment_200815`
select * from `mc_productdb`.`product_comment`;
Copy the code
3. Step 3: Delete repeated comments on the same order
DELETE a
FROM product_comment a
JOIN (
SELECT order_id,product_id,MIN(comment_id) AS comment_id
FROM product_comment
GROUP BY order_id,product_id
HAVING COUNT(*)> =2
) b ON a.order_id=b.order_id AND a.product_id=b.product_id
AND a.comment_id> b.comment_id
Copy the code
Xvii. Conduct inter-zone statistics
Demand: count the number of people who consume more than 1000 yuan, 800 to 1000 yuan, 500 to 800 yuan, and less than 500 yuan
SELECT count(CASE WHEN IFNULL(total_money,0) > =1000 THEN a.customer_id END) AS 'more than 1000'
,count(CASE WHEN IFNULL(total_money,0) > =800 AND IFNULL(total_money,0)<1000
THEN a.customer_id END) AS '800-1000'
,count(CASE WHEN IFNULL(total_money,0) > =800 AND IFNULL(total_money,0)<800
THEN a.customer_id END) AS '500-800'
,count(CASE WHEN IFNULL(total_money,0) <500 THEN a.customer_id END) 'less than 500'
from mc_userdb.customer_login a
LEFT JOIN
( SELECT customer_id,SUM(order_money) AS total_money
from mc_orderdb.`order_master` GROUP BY customer_id) b
ON a.customer_id=b.customer_id
Copy the code
Catch the problem SQL
Core: Optimize queries with execution plans
How do YOU find SQL that needs to be optimized? Answer: Slow query logs
Enable the mysql slow query log functionset global slow_query_log_file = /sql_log/slow_log.log;
set global log_queries_not_using_indexes = on; Not indexedSQLlogset global long_query_time = 0.001; How long does the crawl take to executeSQL(in seconds)set global low_query_log = on;
Copy the code