Database naming specification

  • All database object names must be lowercase, with words separated by underscores;
  • MySQL reserved keywords are not allowed in database object names (if a name does contain a keyword, it must be enclosed in backticks when queried);
  • Database object names should be self-explanatory and preferably no longer than 32 characters;
  • Temporary tables must be prefixed with tmp_ and suffixed with a date; backup tables must be prefixed with bak_ and suffixed with a date (timestamp).
  • Columns that store the same data must have the same name and type. Such columns are usually join columns; if the joined columns have inconsistent types, MySQL performs an implicit data type conversion, which invalidates the indexes on those columns and reduces query efficiency.

Basic database design specifications

1. All tables must use Innodb storage engine

All tables must use the InnoDB storage engine unless there is a special requirement (i.e. a feature InnoDB cannot provide, such as column storage or spatial data). MyISAM was the default engine before MySQL 5.5; InnoDB has been the default since 5.5.

InnoDB supports transactions and row-level locking, recovers better after failures, and performs better under high concurrency.

2. Use UTF8 for the database and table character sets

A unified character set avoids garbled text caused by character-set conversion. Comparing columns of different character sets requires a conversion first, which invalidates indexes. If the database needs to store emoji, use the utf8mb4 character set.

3. All tables and fields need to be commented

Add comments to tables and columns using COMMENT clauses, and maintain the data dictionary from the start.
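
As a minimal sketch tying rules 1 to 3 together (the table, columns, and comment texts below are illustrative assumptions, not part of the specification):

create table customer_address (
  id int unsigned not null auto_increment comment 'primary key',
  customer_id int unsigned not null comment 'customer id',
  address varchar(255) not null default '' comment 'address text',
  primary key (id),
  key idx_customer_id (customer_id)
) engine = InnoDB default charset = utf8mb4 comment = 'customer address table';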

4. Limit the number of rows in a single table to less than 5 million.

Five million rows is not a hard limit for MySQL, but beyond that size, modifying the table structure, backing up, and restoring become serious problems.

You can control the amount of data by archiving historical data (for log data), splitting databases and tables (for business data), and so on.

5. Use the MySQL partition table with caution
  • Partitioned tables physically represent multiple files and logically represent a single table.
  • Choose the partition key carefully; cross-partition queries can be less efficient.
  • It is recommended to manage large data volumes with physically separate tables instead.
6. Try to separate hot and cold data and reduce the width of the table
  • MySQL limits each table to a maximum of 4096 columns, and the size of each row cannot exceed 65535 bytes.
  • Reduce disk I/O and keep the memory cache hit ratio of hot data high (the wider the table, the more memory it takes to load it into the buffer pool and the more I/O it consumes);
  • Make more efficient use of caching to avoid reading useless cold data;
  • Columns that are often used together are placed in one table (to avoid additional joins).
7. Do not create reserved fields in the table
  • Reserved fields are difficult to name meaningfully.
  • The data type a reserved field will eventually store cannot be known in advance, so an appropriate data type cannot be chosen.
  • Changes to the reserved field type lock the table.
8. Do not store large binary data such as pictures and files in the database

When files are large, the amount of data grows rapidly in a short period of time, and reading them from the database generates a large number of random I/O operations.

Files are usually stored on a file server, and the database stores only the file's address information.

9. Do not perform database stress tests online
10. Do not connect to the production database from the development or test environment

Database field design specification

1. Select the smallest data type that meets storage requirements

The reason:

  • The larger a column is, the more space its index needs, so fewer index nodes fit in a single page, more I/O is required to traverse the index, and index performance gets worse.

Methods:

A. Convert strings to numbers for storage, for example converting an IP address to an integer

MySQL provides two functions to handle IP addresses:

  • INET_ATON converts an IPv4 address string to an unsigned integer
  • INET_NTOA converts the unsigned integer back to the dotted address string

Before inserting data, use INET_ATON to convert the IP address into an integer to save space; when displaying data, use INET_NTOA to convert the integer IP back into an address.
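
For example (the table and column names below are illustrative assumptions):

insert into login_log (user_id, ip) values (1, INET_ATON('192.168.1.10'));
select user_id, INET_NTOA(ip) as ip from login_log where ip = INET_ATON('192.168.1.10');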

B. For non-negative data (such as auto-increment IDs and integer IP addresses), unsigned integers are preferred

The reason:

An unsigned type provides twice the positive range of the corresponding signed type in the same storage space.

UNSIGNED INT ranges from 0 to 4294967295.

In VARCHAR(N), N represents the number of characters, not the number of bytes; with the utf8 character set, storing 255 characters means VARCHAR(255) can take up to 765 bytes. Declaring a length larger than necessary consumes more memory.

2. Avoid TEXT and BLOB data types; the most common TEXT type can store up to 64 KB of data
A. It is recommended to split BLOB or TEXT columns into separate extension tables

MySQL in-memory temporary tables do not support large data types such as TEXT and BLOB. If a query involves such data, in-memory temporary tables cannot be used and sorting must fall back to on-disk temporary tables. MySQL also has to perform a second lookup for this kind of data, which makes SQL performance worse. That said, it does not mean these data types must never be used.

If you must use them, split the BLOB or TEXT columns into separate extension tables, never use SELECT * but extract only the necessary columns, and never query a TEXT column when it is not needed.

B. TEXT and BLOB types can only use prefix indexes

Because MySQL limits the length of index fields, TEXT columns can only use prefix indexes, and TEXT columns cannot have default values.
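
For example, a prefix index on a TEXT column might look like this (the table, column, and prefix length of 100 are assumptions):

alter table article add index idx_content (content(100));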

3. Avoid the ENUM type

Modifying the list of ENUM values requires an ALTER TABLE statement

The ORDER BY operation of ENUM type is inefficient and requires additional operations

Do not use numeric values as ENUM enumeration values

4. Define all columns as NOT NULL if possible

The reason:

Indexing a NULL column requires extra space to store the NULL values, so it takes up more space

NULL values require special handling in comparisons and calculations

5. Use TIMESTAMP(4 bytes) or DATETIME (8 bytes) to store the time

TIMESTAMP stores times in the range 1970-01-01 00:00:01 to 2038-01-19 03:14:07

TIMESTAMP takes 4 bytes, the same as INT, but is more readable than INT

Values beyond the TIMESTAMP range should be stored as DATETIME

It is common, but incorrect, to use strings to store date-type data

  • Disadvantage 1: date functions cannot be used for calculation and comparison (see the example below)
  • Disadvantage 2: storing dates as strings takes up more space
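
For instance, with a DATETIME column (the table and column names are assumptions), date functions and range comparisons work directly:

select order_id from orders where create_time >= '2019-01-01 00:00:00' and create_time < '2019-02-01 00:00:00';
select datediff(now(), create_time) from orders where order_id = 1;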

6. Finance-related amount class data must be of type DECIMAL

  • Imprecise floating-point types: FLOAT, DOUBLE
  • Exact numeric type: DECIMAL

The DECIMAL type is an exact fixed-point number that does not lose precision during calculation

The space it occupies is determined by the declared width; every 9 digits require 4 bytes, and the decimal point takes one byte

It can also be used to store integer data larger than BIGINT allows
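
A minimal sketch (the table, column name, and precision are assumptions):

alter table orders add column amount decimal(18,2) not null default 0.00 comment 'order amount';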

Index design specification

1. Limit the number of indexes in a single table; no more than five is recommended
  • More indexes are not always better! Indexes can improve efficiency, but they can also reduce it.
  • Indexes can increase query efficiency, but they also reduce insert and update efficiency, and in some cases, query efficiency.
  • When the MySQL query optimizer decides how to optimize a query, it evaluates every usable index based on statistics in order to generate the best execution plan. If many indexes can be used for the same query, this increases the time the optimizer spends generating the plan and also reduces query performance.
2. Do not create a separate index for each column in the table

Before version 5.6, a SQL query could use only one index per table. Since 5.6, even with index-merge optimization, performance is still not as good as a single composite index.

3. Every Innodb table must have a primary key

InnoDB is an index-organized table: data is stored in the same logical order as the index. Each table can have multiple indexes, but the data itself can be stored in only one order.

Innodb organizes tables by primary key index order

  • Do not use frequently updated columns as primary keys, and avoid multi-column primary keys (which are equivalent to a joint index)
  • Do not use UUID, MD5, HASH, or string columns as primary keys (sequential growth cannot be guaranteed)
  • It is recommended to use an auto-increment ID as the primary key
4. Common index column suggestions

  • Columns that appear in the WHERE clause of SELECT, UPDATE, or DELETE statements
  • Columns contained in ORDER BY, GROUP BY, or DISTINCT
  • Do not build a separate index on every column matching 1 and 2; it is usually better to create a joint index on the fields from 1 and 2 (a sketch follows below)
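
For example, given a query like the one below (the table, columns, and index name are assumptions), a joint index covering the WHERE and ORDER BY columns usually works better than separate single-column indexes:

select order_id, amount from orders where customer_id = 10 and status = 1 order by create_time desc;
alter table orders add index idx_customer_status_ctime (customer_id, status, create_time);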

5. How to select the index column order

The purpose of creating an index is to reduce random I/O and improve query performance. The more data an index can filter out, the less data needs to be read from disk.

  • The most discriminating column is placed at the far left of the joint index (discrimination = number of distinct values in the column / total number of rows in the column)
  • Try to place columns with small field lengths at the far left of the joint index (a smaller field length means more index data fits on a page, which gives better I/O performance)
  • The most frequently used columns are placed to the left of the union index (so that fewer indexes can be created)
6. Avoid creating redundant and duplicate indexes (increases the time for the query optimizer to generate the execution plan)
  • Duplicate indexes: Primary key(ID), index(ID), unique index(ID)
  • Examples of redundant indexes: index(a,b,c), index(a,b), index(a)
7. Covering indexes are preferred for frequent queries

Covering index: an index that contains all the fields of the query (WHERE, SELECT, ORDER BY, GROUP BY)

Benefits of covering indexes:

  • Avoid secondary lookups on InnoDB tables: InnoDB stores data in clustered-index order, and secondary indexes store the row's primary key value in their leaf nodes. When we query data through a secondary index, after finding the key value MySQL must perform a second lookup through the primary key to obtain the data we really need. With a covering index, all of the data can be obtained from the secondary index's key values, avoiding the second lookup on the primary key, reducing I/O operations, and improving query efficiency.
  • Random I/O can be turned into sequential I/O to speed up queries: because a covering index is stored in key-value order, an I/O-intensive range query reads far less than fetching each row randomly from disk, so using a covering index also turns random disk reads into sequential index lookups (a small sketch follows this list).
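
A small sketch (the table and columns are assumptions): if a query only needs customer_id and status, an index on (customer_id, status) covers it, and EXPLAIN shows "Using index" in the Extra column:

alter table orders add index idx_customer_status (customer_id, status);
explain select customer_id, status from orders where customer_id = 10;
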
8. Index setting specification

Try to avoid using foreign key constraints

  • Foreign key constraints are not recommended, but indexes must be built on the columns used to join tables (see the example below)
  • Foreign keys can be used to ensure the referential integrity of data, but it is recommended to enforce this on the business side
  • Foreign keys can affect writes to parent and child tables and thus degrade performance
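
For example, instead of declaring a foreign key constraint, just index the join column (the table and column names are assumptions):

alter table orders add index idx_customer_id (customer_id);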

Database SQL development specification

1. You are advised to use precompiled statements to perform database operations
  • Precompiled (prepared) statements can reuse execution plans, reduce the time needed to parse SQL, and avoid the SQL injection problems that dynamic SQL introduces (a sketch follows this list).
  • Passing only parameters is more efficient than passing SQL statements.
  • The same statement can be parsed once and used multiple times to improve processing efficiency.
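
A minimal sketch using MySQL prepared statements directly (the table and column are assumptions); application code would normally do the same through its driver's parameter binding:

prepare stmt from 'select name, phone from customer where id = ?';
set @id = 111;
execute stmt using @id;
deallocate prepare stmt;
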
2. Avoid implicit conversions of data types

Implicit conversion invalidates indexes, as in the following example:

select name,phone from customer where id = '111';
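
If id is an integer column (an assumption here), write the literal as a number so the index can be used:

select name,phone from customer where id = 111;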

3. Make full use of indexes that already exist on the table

Avoid query conditions with a % on both sides, such as a like '%123%'; a leading % prevents the index on the column from being used (a trailing-only pattern such as a like '123%' can still use it).

A SQL statement can use a range query on only one column of a composite index. For example, if there is a joint index on columns a, b, and c, and the query condition contains a range query on column a, the index cannot be used on columns b and c.

When defining a joint index, if column a is used for range lookups, place column a at the rightmost position of the joint index. Use LEFT JOIN or NOT EXISTS to optimize NOT IN operations, since NOT IN also usually invalidates index usage.
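
A sketch of the NOT IN rewrite (the table and column names are assumptions):

-- original form
select id from customer where id not in (select customer_id from orders);
-- rewritten with NOT EXISTS
select c.id from customer c where not exists (select 1 from orders o where o.customer_id = c.id);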

4. The database should be designed with future extensions in mind
5. Programs connect to different databases using different accounts; cross-database queries are forbidden
  • Leave room for database migration and table splitting
  • Reduce service coupling
  • Avoid security risks caused by excessive permissions
6. SELECT * is forbidden; you must use SELECT <field list> for queries

The reason:

  • Consumes more CPU, I/O, and network bandwidth resources
  • Cannot use covering indexes
  • Reduces the impact of table structure changes
7. Disallow INSERT statements without a list of fields

Such as:

insert into t values ('a','b','c');

You should use:

insert into t(c1,c2,c3) values ('a','b','c');

8. Avoid subqueries and optimize them to join operations

Generally, a subquery can be converted into a join for optimization only when the subquery is an IN clause and the subquery is simple SQL (without UNION, GROUP BY, ORDER BY, or LIMIT clauses).

Reasons for poor sub-query performance:

The result set of a subquery cannot use an index. It is usually stored in a temporary table, and neither in-memory nor on-disk temporary tables have indexes, so query performance is affected, especially for subqueries that return large result sets.

Because subqueries generate large numbers of temporary tables without indexes, they consume excessive CPU and I/O resources and produce many slow queries.
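
A sketch of rewriting an IN subquery as a join (the table and column names are assumptions):

-- subquery form
select name from customer where id in (select customer_id from orders where status = 1);
-- join form
select distinct c.name from customer c join orders o on o.customer_id = c.id where o.status = 1;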

9. Avoid joining too many tables

MySQL has a join buffer for join operations, and its size can be set with the join_buffer_size parameter.

In MySQL, each additional table joined in the same SQL statement is allocated an additional join buffer, so the more tables a single statement joins, the more memory it occupies.

If many programs run multi-table joins while join_buffer_size is set unreasonably, server memory can easily be exhausted, which affects the stability of the database server.

In addition, join operations may create temporary tables, which affects query efficiency. MySQL allows at most 61 tables to be joined, and it is recommended to join no more than 5.

10. Reduce the number of interactions with the database

A database is more suitable for handling batch operations. Combining multiple identical operations improves processing efficiency.

11. When checking multiple values of the same column, use IN instead of OR

The number of values in an IN list should not exceed 500. IN can use indexes effectively, while OR in most cases cannot.
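
For example (the table and column are assumptions):

-- instead of
select id from orders where status = 1 or status = 2 or status = 3;
-- prefer
select id from orders where status in (1, 2, 3);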

12. Disallow random sorting using order by rand()

ORDER BY RAND() loads all rows matching the condition into memory and then sorts them there by a randomly generated value, producing a random value for every row. If the matching data set is large, this consumes a lot of CPU, I/O, and memory resources.

The recommended approach is to generate a random value in the program and then fetch the data from the database by that value.
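
A rough sketch of that approach (the table and the literal value are assumptions; the random value between the table's minimum and maximum id would normally be computed in application code):

set @r = 12345;
select id, name from customer where id >= @r order by id limit 1;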

13. The WHERE clause disallows function conversions and calculations on columns

Functional conversions or calculations on columns result in unusable indexes

Not recommended:

where date(create_time) = '20190101'

Recommended:

where create_time >= '20190101' and create_time < '20190102'

14. Use UNION ALL instead of UNION when it is clear there will be no duplicates
  • UNION puts all the data from the two result sets into a temporary table and then deduplicates it
  • UNION ALL does not deduplicate the result set
15. Split complex large SQL into multiple small SQL
  • Large SQL: logically complex SQL that requires a large amount of CPU to compute
  • In MySQL, only one CPU can be used for each SQL calculation
  • SQL splitting can be executed in parallel to improve processing efficiency

Code of conduct for database operations

1. Perform large batch write operations (UPDATE, DELETE, INSERT) of more than 1 million rows in multiple batches

Large volume operations can cause significant master-slave delays

In a master/slave environment, large batch operations may cause serious master/slave latency. Large write operations generally take a long time to execute, and they start executing on the slave libraries only after they finish on the master, which results in a long delay between the master and the slaves.

Binlogs in ROW format generate a large volume of log data

Large batch writes produce large amounts of binlog, especially in ROW format, because ROW format records every changed row. The more data changed at once, the more log is produced, and the longer the log takes to transmit and replay, which is another cause of master-slave delay.

Avoid large transaction operations

Large batch data changes necessarily run inside a single transaction, which locks a large amount of data in the table, causes massive blocking, and has a significant impact on MySQL performance.

In particular, long blocking can use up all of the database's available connections, preventing other applications in the production environment from connecting to the database, so be sure to perform bulk writes in batches.
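
A sketch of batching a large delete (the table, condition, and batch size are assumptions); the application repeats the statement until it affects zero rows:

delete from operation_log where create_time < '2019-01-01' limit 10000;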

2. Run the pt-online-schema-change command to modify the table structure for large tables
  • Avoid master/slave delays caused by large table changes
  • Avoid locking tables while table fields are being modified

Changes to large table data structures must be made with caution. Serious table locking operations, especially in production environments, are not tolerated.

pt-online-schema-change first creates a new table with the same structure as the original, changes the table structure on the new table, and copies the data from the original table into it; triggers added to the original table keep copying newly written data into the new table as well. After all data has been copied, the new table is renamed to the original table name and the original table is dropped. The original DDL operation is thus broken down into many small batches.
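
A sketch of a typical invocation (the database, table, and column are assumptions; check the Percona Toolkit documentation for the exact options in your version):

pt-online-schema-change --alter "add column remark varchar(64) not null default ''" D=mydb,t=orders --execute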

3. Do not grant super permission to the account used by the program
  • When the maximum number of connections is reached, MySQL still reserves one connection for a user with the SUPER privilege
  • The SUPER privilege should be reserved only for the account used to handle problems
4. Follow the principle of minimum permissions for application database access accounts
  • The database account used by a program may only be used within one database and must not query across databases (see the sketch after this list)
  • Accounts used by programs are not allowed to have the DROP permission in principle
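
A minimal sketch of the least-privilege idea (the account, host, password, and database names are assumptions):

create user 'app_user'@'10.0.0.%' identified by 'change_me';
grant select, insert, update, delete on appdb.* to 'app_user'@'10.0.0.%';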