background
Due to my position, I was responsible for formulating the specifications for back-end group database. As the specifications for all product lines, after several editions of revision, the text below was finally formed.
Specification in the whole back-end implementation also has half a year time, for the whole team in the development stage to reduce inappropriate table construction sentences, wrong SQL, wrong index has a positive significance, so share it for everyone’s reference.
The following is divided into three parts of table building protocol, SQL protocol, index protocol, each part of each has mandatory, suggested two levels, we in reference, according to their own company to weigh.
First, set up the table protocol
[Mandatory] (1) Storage engine must use InnoDB
InnoDB supports transaction, row-level locking, concurrency, CPU and memory cache page optimization for higher resource utilization.
[Mandatory] (2) Each table must have a primary key ID, and this primary key ID uses the increment of the primary key (as short as possible), except in a separate database and separate table environment
InnoDB organizes data in a way that requires a primary key, and if the primary key ID is monotonically increasing, it can effectively improve insert performance, avoid excessive page splitting, reduce table fragmentation and improve space utilization. In a separate database and table environment, the primary key values in each table need to be uniformly allocated to avoid primary key duplication in the whole logical table.
(3) UtF8MB4 character set must be used
Mysql utF-8 is not a “true UTF-8”, but utF8MB4 is a “true UTF-8”.
[Mandatory] (4) Chinese annotations must be added to database tables and table fields
Interpretation: Don’t be lazy.
[Mandatory] (5) The database name, table name, field name are lowercase, underline style, no more than 32 characters, must see the name to know the meaning, prohibit pinyin English mixed
Interpretation: Convention.
(6) The number of columns in a single table must be less than 30. If the number exceeds 30, split the table
Interpretation: Too many columns in a single table makes it too expensive for Mysql server to handle the mapping between InnoDB returns.
[Mandatory] (7) Prohibit the use of foreign keys. If there are foreign key integrity constraints, application control is required
UPDATE and DELETE operations involve associated tables, affecting SQL performance and even causing deadlocks.
(8) You must define the field as NOT NULL and provide a default value
Reading:
NULL columns complicate index/index statistics/value comparisons and are more difficult to optimize for MySQL;
This type of Msql requires special internal processing, which increases the complexity of database processing records. Under the same conditions, when there are more empty fields in the table, the processing performance of the database will be reduced a lot;
NULL values require more storage space, and NULL columns in each row of a table or index require additional space to identify.
[Mandatory] (9) Disable reserved words, such as DESC, RANGE, and MARCH. For details, see Mysql official reserved words
[Mandatory] (10) If the stored strings are nearly equal in length, CHAR is used as a fixed-length string.
Interpretation: Can reduce space debris, save storage space.
[Suggestion] (11) Consider using TIMESTAMP instead of DATETIME in some scenarios
Reading:
Both types of TIMESTAMP can express the time in the format “YYYY-MM-DD HH: MM: SS”. TIMESTAMP takes only 4 bytes and can be stored in the range (1970-2038).
The DATETIME type takes up 8 bytes, is time-zone insensitive and can be stored in the range (1001-9999) years.
- (12) Beware of automatically generated schemas. It is recommended that all schemas be written manually
Interpretation: Don’t put too much trust in some database clients.
Second, SQL protocol
[Suggestion] (1) To make full use of cache, do not allow the use of custom functions, storage functions, and user variables
If a query contains any user-defined functions, storage functions, user variables, temporary tables, or system tables in the Mysql library, the query result will not be cached. The functions NOW() or CURRENT_DATE(), for example, return different results depending on the query time.
[Mandatory] (2) Specify the required columns in the query, instead of directly using “*” to return all columns
Reading:
Reading unwanted columns increases CPU, IO, and NET consumption.
Invalid utilization of an overwrite index.
[Mandatory] (3) Attribute implicit conversion is not allowed
What would happen if we added an index to the phone number column and then executed the following SQL? explain SELECT user_name FROM parent WHERE phone=13812345678; It is obvious that the index does not take effect, and the whole table will be scanned.
[Suggestion] (4) Use functions or expressions on attributes of WHERE conditions
Mysql cannot parse this expression automatically and cannot use indexes.
[Mandatory] (5) Prohibit the use of foreign keys and cascading, all foreign key concepts must be solved in the application layer
Interpretation: Foreign keys and cascading update are suitable for low-concurrency single machine, not for distributed, high-concurrency cluster. Cascading update is strongly blocked, and there is the risk of database update storm. Foreign keys affect database insert speed.
[Suggestion] (6) Avoid using OR as a join condition in the WHERE clause
According to the situation, UNION ALL can be used instead of OR.
[Mandatory] (7) Fuzzy query starting with % is not allowed
A fuzzy query starting with % does not use the index. A fuzzy query starting with % does not use the index.
Index protocol
[Suggestion] (1) Avoid creating indexes on columns with frequent updates and low differentiation
Interpretation: Creating separate indexes for less differentiated columns has little optimization effect, but frequent updates can make index maintenance more expensive.
[Mandatory] (2) No more than five JOIN tables are allowed. The data types of the fields to be joined must be absolutely consistent. When multiple tables are used for associated query, ensure that the associated fields have indexes
Interpretation: Too many joins on tables makes it harder for Mysql’s optimizer to weigh an “optimal” execution plan (possible as a factorial of the number of tables), as well as whether the associated fields are consistent in type, length, character encoding, and so on.
(3) In a joint index, if the distinction of the first column is equal to 1, then there is no need to create a joint index
Interpretation: Index data that can be fully located through the first column, so the later part of the joint index is not needed.
[Mandatory] (4) When creating a joint index, the field with higher differentiation must be placed on the left
Interpretation: The more differentiated columns are placed on the left, effectively filtering out useless data in the first place. To improve the efficiency of the index, we write SQL in Mapper WHERE conditions have multiple conditions, need to see if the current table has a ready-made joint index directly used, pay attention to the order of each condition as far as possible and the order of the index.
[Suggestion] (5) Use the overwrite index to carry out the query operation, avoid back to the table
Overwrite query is a query that only needs to get the required DATA through the index, and does not need to query back to the table again, so the efficiency is relatively high. When we use the EXPLAIN result, the extra column will appear: “using index”. It is also important not to use “SELECT *”, otherwise it is almost impossible to use an overwrite index.
[Suggestion] (6) When creating an index on a long VARCHAR field, such as VARCHAR(100), specify the index length. It is not necessary to create an index for the whole field
In general, for string data, if the length of the index is 20, the distinction will be more than 90%, so we can consider creating an index with the length of 20, rather than a full-field index. SELECT COUNT(DISTINCT LEFT(lesson_code, 20))/COUNT(*) FROM lesson; To determine the degree of text discrimination if the lesson_code field is 20 characters long.
[Suggestion] (7) If there is an ORDER BY scenario, please pay attention to the ORDER of the index. The fields at the end of the ORDER BY are part of the combined index and are placed at the end of the index combination ORDER to avoid file_sort, which affects query performance.
Reading:
Suppose there is a query condition WHERE a=? and b=? The ORDER BY c. If there is an index a_b_c, you can use index sort.
Counter example :WHERE a>10 ORDER BY b; WHERE a>10 ORDER BY B; Index A_b cannot be sorted.
[Suggestion] (8) In where, the index column cannot be part of an expression, nor can it be a parameter of a function
If an index is added to a column, Mysql cannot parse the column as part of an expression or as a parameter to a function, and the index does not take effect.
[Suggestion] (9) When we use the range query in the WHERE condition, the index can be used for a maximum of one range condition, if more than one range condition, the following index will not be used
Mysql can use the first leftmost range query in multiple range criteria, but cannot use the later range query.
[Suggestion] (10) When multiple tables are connected externally, the associated field types of the tables must be the same
When two tables are joined, if the column type is not identical, the index will not take effect. Complete consistency includes but is not limited to field type, field length, character set, collection, etc.