Database naming specification
• All database object names must use only lowercase letters and underscores
• Database object names must not use MySQL reserved keywords (if a table name does contain a keyword, it must be quoted with backticks in queries)
• Database object names must be self-descriptive, so their purpose can be understood from the name
• Temporary tables must be prefixed with tmp_ and suffixed with the date; backup tables must be prefixed with bak_ and suffixed with the date (timestamp)
• Columns that store the same data must have the same name and type in every table (they are typically used as join columns; if the joined column types differ, MySQL performs implicit type conversion during the query, which invalidates indexes on those columns and reduces query efficiency)
Basic database design specifications
1. All tables must use the InnoDB storage engine
Unless there is a special requirement that InnoDB cannot meet (such as column storage or spatial data), all tables must use the InnoDB storage engine (MyISAM was the default before MySQL 5.5; InnoDB has been the default since 5.5).
InnoDB supports transactions and row-level locking, recovers from crashes better, and performs better under high concurrency.
2. Use UTF8 for the database and table character sets
Using a single character set avoids garbled text caused by character set conversion; comparing columns with different character sets requires a conversion that can invalidate indexes. If the database needs to store emoji, use the utf8mb4 character set.
3. All tables and fields need to be commented
Add comments to tables and columns with COMMENT clauses, and maintain a data dictionary from the start.
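A minimal sketch of points 1–3 taken together, using a hypothetical customer table (the table and column names are illustrative, and utf8mb4 is chosen here only as an example character set):
-- Hypothetical example: InnoDB engine, utf8mb4 character set, comments on the table and every column
CREATE TABLE customer (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
  name       VARCHAR(64)     NOT NULL COMMENT 'Customer name',
  phone      VARCHAR(20)     NOT NULL COMMENT 'Contact phone number',
  created_at DATETIME        NOT NULL COMMENT 'Record creation time',
  PRIMARY KEY (id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4 COMMENT = 'Customer master table';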
4. Limit the amount of data in a single table to less than 5 million rows.
5 million rows is not a hard limit of MySQL, but beyond that, modifying the table structure and performing backup and recovery become serious problems.
The amount of data can be controlled by archiving historical data (for log data), splitting databases and tables (for business data), and so on.
5. Use the MySQL partition table with caution
A partitioned table is physically stored as multiple files but appears logically as a single table.
If the partition key is not chosen carefully, cross-partition queries can be very inefficient.
It is recommended to manage large data sets by splitting them into separate physical tables instead.
6. Try to separate hot and cold data and reduce the width of the table
MySQL limits each table to a maximum of 4096 columns, and the size of each row cannot exceed 65535 bytes.
A narrower table reduces disk I/O and keeps the memory cache hit ratio of hot data high (the wider the table, the more memory it takes to load rows into the buffer pool, and the more I/O it consumes);
it also makes more efficient use of the cache and avoids reading useless cold data;
columns that are often used together should be placed in the same table (to avoid extra joins).
7. Do not create reserved fields in the table
Reserved fields are hard to name descriptively, so their purpose cannot be understood from the name.
Because the data a reserved field will eventually store is unknown, an appropriate data type cannot be chosen for it.
Changing the type of a reserved field later locks the table.
8. Do not store large binary data such as pictures and files in the database
Large files make the data volume grow rapidly in a short time, and reading them from the database generates a large number of random I/O operations.
Files are usually stored on a file server, and the database stores only the file's address information.
9. Do not perform database stress tests online
10. Do not connect to the production database from the development or test environment
Database field design specification
1. Select the smallest data type that meets storage requirements
The reason:
The larger a column is, the more space its index requires, so fewer index entries fit on a single page, more I/O is needed to traverse the index, and index performance gets worse.
Methods:
A. Convert a string to a number for storage, for example, converting an IP address to an integer
MySQL provides two methods to handle IP addresses
• inet_aton converts an IPv4 address string to an unsigned integer
• inet_ntoa converts that unsigned integer back to an IPv4 address string
Before inserting data, use inet_aton to convert the IP address into an integer to save space. When displaying the data, use inet_ntoa to convert the integer back into an address, as sketched below.
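A short sketch of this conversion, assuming a hypothetical access_log table with an ip INT UNSIGNED column:
-- Store the address as an unsigned integer, convert back when displaying
INSERT INTO access_log (ip) VALUES (INET_ATON('192.168.1.10'));
SELECT INET_NTOA(ip) AS ip_address FROM access_log;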
B. For non-negative data (such as auto-increment IDs and integer IP addresses), prefer unsigned integer types
The reason:
For the same number of bytes, an unsigned type can store values twice as large as a signed one: UNSIGNED INT covers 0 to 4294967295, while signed INT covers -2147483648 to 2147483647.
Also note that the N in VARCHAR(N) is the number of characters, not the number of bytes; with utf8, VARCHAR(255) can take up to 765 bytes, and declaring a larger length than needed consumes more memory.
2. Avoid the TEXT and BLOB data types; the most common TEXT type can store up to 64 KB of data
A. It is recommended to split BLOB or TEXT columns into separate extension tables
MySQL memory temporary tables do not support large data types such as TEXT and BLOB. If a query involves such columns, a disk temporary table is used instead of a memory temporary table for sorting, and MySQL also has to perform a second lookup to fetch the large values, which hurts SQL performance. This does not mean such types must never be used.
If you must use them, split the BLOB or TEXT columns into a separate extension table, never query with SELECT * but only the columns you actually need, and never read a TEXT column when it is not required.
B. TEXT or BLOB types can only use prefix indexes
Because MySQL limits the length of index fields, only prefix indexes can be created on TEXT columns, and TEXT columns cannot have default values.
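A sketch of a prefix index, assuming a hypothetical article table with a TEXT column content (the prefix length 200 is an arbitrary illustration):
-- Only the first 200 characters of the TEXT column are indexed
ALTER TABLE article ADD INDEX idx_content (content(200));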
3. Avoid the ENUM type
Modifying the list of ENUM values requires an ALTER TABLE statement.
ORDER BY on an ENUM column is inefficient and requires extra operations.
Do not use numeric values as ENUM enumeration values.
4. Define all columns as NOT NULL if possible
The reason:
Indexes on NULL-able columns need extra space to record the NULL flag, so they take up more space.
NULL values require special handling in comparisons and calculations, as shown below.
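A quick illustration of that special handling, using a hypothetical table t with column c:
SELECT * FROM t WHERE c = NULL;   -- always returns an empty set: = never matches NULL
SELECT * FROM t WHERE c IS NULL;  -- the form actually required for NULL checks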
5. Use TIMESTAMP(4 bytes) or DATETIME (8 bytes) to store the time
TIMESTAMP stores times from 1970-01-01 00:00:01 to 2038-01-19 03:14:07.
TIMESTAMP takes 4 bytes, the same as INT, but is more readable than INT.
Values outside the TIMESTAMP range should be stored as DATETIME.
It is common to see dates stored as strings, which is incorrect:
• Disadvantage 1: date functions cannot be used for calculation and comparison
• Disadvantage 2: storing dates as strings takes up more space
6. Finance-related amount class data must be of type DECIMAL
• Approximate floating-point types: FLOAT, DOUBLE
• Exact numeric type: DECIMAL
The DECIMAL type is an exact fixed-point number that does not lose precision in calculations.
The space it occupies is determined by the declared width: every 9 digits take 4 bytes, and the decimal point takes 1 byte.
It can also be used to store integer values larger than BIGINT allows.
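A sketch of an exact amount column, assuming a hypothetical orders table (the precision 18,2 is only an example):
-- DECIMAL(18,2): up to 16 integer digits and 2 decimal places, stored exactly; never FLOAT/DOUBLE for money
ALTER TABLE orders ADD COLUMN amount DECIMAL(18,2) NOT NULL DEFAULT 0.00 COMMENT 'Order amount';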
Index design specification
1. Limit the number of indexes on a single table; no more than five is recommended
More indexes are not always better! Indexes can improve efficiency, but they can also reduce it.
Indexes speed up queries, but they slow down inserts and updates, and in some cases even queries.
When the MySQL query optimizer decides how to optimize a query, it evaluates every index that could be used, based on the available statistics, in order to generate the best execution plan. If many indexes can serve the same query, the optimizer needs more time to generate the execution plan, which reduces query performance.
2. Do not create a separate index for each column in the table
Before version 5.6, a SQL statement could use only one index per table. Since 5.6, even with the index merge optimization, this is still not as good as a single well-designed composite index.
3. Every Innodb table must have a primary key
InnoDB uses index-organized tables: the data is stored in the same logical order as the index. A table can have multiple indexes, but its rows can be stored in only one order.
Innodb organizes tables by primary key index order
• Do not use UUID, MD5, hash, or string columns as the primary key (sequential growth cannot be guaranteed)
• An auto-increment ID is recommended as the primary key, as in the sketch below
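A minimal sketch of such a primary key on a hypothetical login-history table:
CREATE TABLE user_login (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Sequentially growing surrogate key',
  user_id    BIGINT UNSIGNED NOT NULL COMMENT 'Who logged in',
  login_time DATETIME        NOT NULL COMMENT 'When the login happened',
  PRIMARY KEY (id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;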
4. Common index column suggestions
• Columns that appear in the WHERE clause of SELECT, UPDATE, or DELETE statements
• Columns that appear in ORDER BY, GROUP BY, or DISTINCT
• Columns that satisfy both 1 and 2 do not need separate indexes; it is usually better to create a composite index over them
5. How to select the index column order
The purpose of creating an index is to reduce random I/O and improve query performance. The fewer rows an index leaves after filtering, the less data has to be read from disk.
• Place the most selective column at the leftmost position of the composite index (selectivity = number of distinct values in the column / total number of rows)
• Try to place the column with the smallest field length at the leftmost position (a smaller field length means more index data fits on one page)
• Place the most frequently used columns toward the left of the composite index (so fewer indexes need to be created); see the sketch below
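A sketch under the assumption that customer_id in the hypothetical orders table is highly selective while status has only a handful of distinct values:
-- The more selective column goes on the left of the composite index
ALTER TABLE orders ADD INDEX idx_customer_status (customer_id, status);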
6. Avoid creating redundant and duplicate indexes (increases the time for the query optimizer to generate the execution plan)
• Duplicate indexes: primary key(id), index(id), unique index(id)
• Redundant indexes: index(a,b,c), index(a,b), index(a)
7. Prefer covering indexes for frequent queries
Covering index: an index that contains all the fields a query needs (WHERE, SELECT, ORDER BY, GROUP BY).
Benefits of covering indexes:
• Avoid a second lookup on the primary key for InnoDB tables: InnoDB stores data in clustered-index order, and secondary indexes keep the row's primary key value in their leaf nodes. When querying through a secondary index, after the key value is found, InnoDB normally has to do a second lookup through the primary key to fetch the data actually needed. With a covering index, all the required data can be read from the secondary index itself, avoiding the second lookup on the primary key, reducing I/O operations and improving query efficiency.
• Turn random I/O into sequential I/O to speed up queries: because a covering index is stored in key order, an I/O-intensive range lookup needs far less I/O than randomly reading each row from disk, so random disk reads can be converted into sequential index lookups (see the sketch below).
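A sketch of a covering index on the hypothetical customer table from earlier: the query below can be answered entirely from the index, with no lookup back to the clustered index.
ALTER TABLE customer ADD INDEX idx_phone_name (phone, name);
-- Both the WHERE column and the selected column are contained in the index
SELECT name FROM customer WHERE phone = '13800000000';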
8. Index SET specification
Try to avoid using foreign key constraints
• Foreign key constraints are not recommended, but the join columns between tables must be indexed
• Foreign keys can ensure referential integrity, but it is recommended to enforce it on the application side
• Foreign keys affect write operations on both the parent and the child table, reducing performance
Database SQL development specification
1. You are advised to use precompiled statements to perform database operations
Precompiled (prepared) statements can reuse execution plans, reduce the time spent parsing SQL, and also prevent the SQL injection problems that dynamic SQL brings.
Passing only parameters is more efficient than passing the whole SQL statement each time.
The same statement can be parsed once and used multiple times to improve processing efficiency.
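A sketch using MySQL's server-side prepared-statement syntax (application drivers usually expose the same idea through their own APIs); the customer table is the hypothetical one used above:
-- The statement is parsed and planned once; only the parameter changes between executions
PREPARE get_customer FROM 'SELECT name, phone FROM customer WHERE id = ?';
SET @id = 111;
EXECUTE get_customer USING @id;
DEALLOCATE PREPARE get_customer;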
2. Avoid implicit conversions of data types
Implicit conversions can invalidate indexes as follows:
select name,phone from customer where id = '111';
3. Make full use of the indexes that already exist on the table
Avoid query conditions with a leading wildcard, such as a like '%123%'; the leading % prevents the index from being used.
A SQL statement can apply a range query to only one column of a composite index. For example, with a composite index on columns a, b, and c, if the query condition contains a range query on column a, the index cannot be used on columns b and c.
Therefore, when defining a composite index, if a column needs range lookups, place that column at the right end of the index.
Use LEFT JOIN or NOT EXISTS to optimize NOT IN operations, since NOT IN usually invalidates indexes as well; see the sketch below.
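A sketch of the NOT IN rewrite, assuming a hypothetical blacklist table keyed by customer_id:
-- Instead of: SELECT id FROM customer WHERE id NOT IN (SELECT customer_id FROM blacklist);
SELECT c.id
FROM customer c
LEFT JOIN blacklist b ON b.customer_id = c.id
WHERE b.customer_id IS NULL;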
4. The database should be designed with future extensions in mind
5. Programs must connect to different databases with different accounts, and cross-database queries are prohibited
• Leave room for database migration and for splitting databases and tables
• Reduce business coupling
• Avoid security risks caused by excessive permissions
6. SELECT * is prohibited; queries must use SELECT <field list>
The reason:
• SELECT * consumes more CPU, I/O, and network bandwidth
• SELECT * cannot use covering indexes
• An explicit field list reduces the impact of table structure changes
7. Disallow INSERT statements without a list of fields
Such as:
insert into t values ('a','b','c');
You should use:
insert into t(c1,c2,c3) values ('a','b','c');
8. Avoid subqueries and optimize them to join operations
Generally, a subquery can be converted into a join for optimization only when it is an IN clause and the subquery is simple SQL (containing no UNION, GROUP BY, ORDER BY, or LIMIT clauses).
Reasons for poor sub-query performance:
The result set of a subquery cannot use an index. It is usually stored in a temporary table, and neither memory temporary tables nor disk temporary tables have indexes, so query performance suffers, especially for subqueries that return large result sets.
Because subqueries generate many unindexed temporary tables, they consume excessive CPU and I/O resources and produce large numbers of slow queries.
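A sketch of such a rewrite, assuming the hypothetical orders table is related to customer by customer_id:
-- Subquery form:
-- SELECT name FROM customer WHERE id IN (SELECT customer_id FROM orders WHERE amount > 100);
-- Join form:
SELECT DISTINCT c.name
FROM customer c
JOIN orders o ON o.customer_id = c.id
WHERE o.amount > 100;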
9. Avoid joining too many tables
MySQL uses a join buffer whose size is set by the join_buffer_size parameter.
Each additional table joined in the same SQL statement gets its own join buffer, so the more tables a statement joins, the more memory it uses.
If many queries in the application join multiple tables while join_buffer_size is set unreasonably, the server can easily run out of memory, which affects the stability of the database server.
In addition, join operations may create temporary tables, which affects query efficiency. MySQL allows at most 61 tables in one join; it is recommended to join no more than 5.
10. Reduce the number of interactions with the database
A database is more suitable for handling batch operations. Combining multiple identical operations improves processing efficiency.
11. Use IN instead of OR when testing the same column against multiple values; see the sketch below
The number of values in an IN list should not exceed 500. IN can use indexes effectively, while OR in most cases cannot.
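A trivial illustration on the hypothetical customer table:
SELECT name FROM customer WHERE id IN (1, 2, 3);
-- rather than: WHERE id = 1 OR id = 2 OR id = 3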
12. Disallow random sorting using order by rand()
ORDER BY RAND() loads all rows that match the condition into memory and then sorts them by a randomly generated value, possibly generating one random value per row. If the matching data set is large, this consumes a lot of CPU, I/O, and memory.
The recommended approach is to generate a random value in the program and then fetch the data from the database using that value, for example as sketched below.
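One possible sketch (a user variable stands in for the application code here, and it assumes the ids of the hypothetical customer table are reasonably dense):
-- Instead of: SELECT * FROM customer ORDER BY RAND() LIMIT 1;
SET @rand_id = FLOOR(1 + RAND() * (SELECT MAX(id) FROM customer));
SELECT * FROM customer WHERE id >= @rand_id ORDER BY id LIMIT 1;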
13. The WHERE clause disallows function conversions and calculations on columns
Performing a function conversion or calculation on a column prevents the index from being used.
Not recommended:
where date(create_time)='20190101'
Recommendation:
where create_time >= '20190101' and create_time < '20190102'
14. Use UNION ALL instead of UNION when it is clear there will be no duplicates
• UNION puts all the data from both result sets into a temporary table before deduplicating it
• UNION ALL does not deduplicate the result sets (see the sketch below)
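A sketch with two hypothetical yearly archive tables whose rows cannot overlap:
-- No duplicates are possible, so UNION ALL skips the dedup/temporary-table step
SELECT id, amount FROM orders_2018
UNION ALL
SELECT id, amount FROM orders_2019;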
15. Split complex large SQL into multiple small SQL
• A large SQL statement is logically complex and needs a lot of CPU to compute
• In MySQL, a single SQL statement can use only one CPU core
• Splitting a large SQL statement lets the pieces run in parallel, improving processing efficiency
Code of conduct for database operations
1. Perform batch write operations (UPDATE, DELETE, INSERT) of more than 1 million rows in multiple smaller batches
Large batch operations can cause serious master-slave delays
In a master/slave environment, large batch write operations cause serious replication lag: they generally take a long time to execute on the master, and only after they finish there are they executed on the slaves, so the slaves fall far behind the master.
Binlogs in ROW format generate a large amount of log data
Large batch writes produce large amounts of binlog. The ROW binlog format in particular records every changed row, so the more data is changed, the more log is generated and the longer it takes to transmit and replay, which is another cause of master-slave delay.
Avoid large transaction operations
Performing a large number of data changes in a single transaction locks a large number of rows in the table, causing heavy blocking that has a significant impact on MySQL performance.
In particular, long-lasting blocking can use up all available database connections and prevent other applications in the production environment from connecting, so make sure bulk writes are split into batches.
2. Run the pt-online-schema-change command to modify the table structure for large tables
• Avoid master/slave delays caused by modifying large tables
• Avoid locking the table while its fields are being modified
Changes to the structure of large tables must be made with caution; they can cause serious table locking, which is unacceptable, especially in production environments.
pt-online-schema-change first creates a new table with the same structure as the original, applies the structure change to the new table, copies the data from the original table into it, and adds triggers to the original table so that new changes keep being copied to the new table. After all the data has been copied, it renames the new table to the original name and drops the original table, breaking one large DDL operation into many small batches.
3. Do not grant super permission to the account used by the program
• Even when the maximum number of connections is reached, one extra connection is still allowed for a user with super privileges
• Super privileges should be reserved only for the account the DBA uses to handle problems
4. Follow the principle of minimum permissions for application database access accounts
• The database account used by an application can only be used within one database and must not be used across databases
• In principle, the account used by an application must not have the DROP privilege