Code out of efficiency, code out of quality.
Between the lines of the code is the blood in the life of the software, the improvement of quality is to step on the pit as little as possible, put an end to the repeated pit, effectively improve the quality awareness. In addition, modern software architecture requires collaborative development. Efficient collaboration means to reduce collaborative costs and improve communication efficiency. As we all know, traffic laws are ostensibly designed to limit the right to drive, but in fact, to protect the personal safety of the public. Imagine if there were no speed limit, no traffic lights, who would dare to drive on the road. For software, proper specifications and standards are not about eliminating the creativity and elegance of code content, but about limiting excessive personalization, working together in a universally agreed and unified way, and improving collaboration efficiency.
The MySQL database
(1) Table construction protocol
1. [mandatory] A yes or no field must be named is_xxx and the data type is unsigned tinyint (1 indicates yes, 0 indicates no). Note: Any field that is non-negative must be unsigned.
For example, the field name for logical deletion is is_deleted. 1 indicates that the field is deleted, and 0 indicates that the field is not deleted.
Unsigned tinyint (deletedFlag); unsigned tinyint ()
2. [Mandatory] The table name and field name must contain lowercase letters or digits. Do not start with digits or contain only digits between two underscores (_). Database field names can be expensive to change because they cannot be pre-published, so they need to be considered carefully.
Example: getter_admin, task_config, level3_name
Counterexamples: GetterAdmin, taskConfig, Level_3_NAME
[Opinion] Good rules, strictly follow
3. [mandatory] Do not use plural nouns in the table name.
Note: the name of the table should only represent the contents of the table, not the number of entities, corresponding to the name of the DO class.
[Opinion] Good rules, strictly follow
4. [Mandatory] Disable reserved words, such as desc, range, match, delayed, etc. Please refer to the official reserved words of MySQL.
[Opinion] Good rules, strictly follow
5. [Mandatory] The primary key index name is pk_. Unique index name uk_ field name; The common index name is the idX_ field name. Pk_ is the primary key; Uk_ stands for unique key; Idx_ is short for index.
[Opinion] Good rules, strictly follow
6. [Mandatory] Decimal type is decimal; float and double are prohibited. Note: Float and double can be stored with a loss of precision, and can be compared incorrectly. If you store data beyond the range of Decimal, it is recommended to store the data separately as integers and decimals.
[Opinion] Good rules, strictly follow
7. [Mandatory] If the stored strings are nearly equal in length, char is used as a fixed-length string.
[Opinion] Good rules, strictly follow
8. [Mandatory] VARCHar is a string of variable length. Storage space is not allocated in advance.
If the storage length is greater than 5000, set the field type to TEXT, create an independent table, and use the primary key to map the table to avoid affecting the index efficiency of other fields.
[Opinion] Good rules, strictly follow
9. [Mandatory] The table must have three columns: ID, gmT_CREATE, gmT_modified.
Note: ID must be the primary key, the type is unsigned Bigint, and the step size is 1. Gmt_create and GMT_modified are date_time types.
Use create_time and modify_time
10. The best way to name a table is to add “business name _ Function of the table”.
Example: tiger_task/tiger_reader/mpp_config
[Opinion] Good rules, strictly follow
11. [Recommended] The library name should be the same as the application name.
[Opinion] Good rules, strictly follow
12. [Recommended] If the meaning of a field is changed or the status of a field is appended, you need to update the field comment in time.
[Opinion] Good rules, strictly follow
13. [Recommendation] Fields allow for some redundancy to improve query performance, but data consistency must be considered. Redundant fields should follow:
1) Fields that are not frequently modified.
2) VARCHAR is not a long field, let alone a text field.
Example: The category name is used frequently and the field length is short. The category name is basically unchanged. The category name can be stored in the associated table redundantly to avoid associated query.
[Opinion] Good rules, strictly follow
14. [recommendation] you are advised to divide databases and tables only when the number of rows in a single table exceeds 5 million or the capacity of a single table exceeds 2GB.
Note: Do not create a separate table from the database if the data volume is not expected to reach this level in three years.
[Opinion] Good rules, strictly follow
15. Appropriate character storage length not only saves database table space, saves index storage, but also improves retrieval speed.
Positive examples: in the following table, unsigned values can avoid the error of negative numbers and expand the range of representation.
object |
The age range |
type |
Said the scope of |
people |
Within a 150 – year – old |
unsigned tinyint |
Unsigned values: 0 to 255 |
turtle |
Hundreds of years old |
unsigned smallint |
Unsigned value: 0 to 65535 |
Dinosaur fossils |
Tens of millions of years |
unsigned int |
Unsigned values: 0 to about 4.29 billion |
The sun |
About 5 billion years |
unsigned bigint |
Unsigned values: 0 to approximately 10 ^ 19 |
[Opinion] Good rules, strictly follow
(2) Index protocol
1. [Mandatory] A unique index must be created for a field that has a unique feature, even if it is a combination of multiple fields.
Note: Do not assume that the unique index affects the insert speed, the speed loss can be ignored, but the increase in search speed is significant;
In addition, even with perfect validation control at the application layer, as long as there is no unique index, according to Murphy’s law, there will be dirty data.
[Opinion] Good rules, strictly follow
2. [Force] Prohibit join if more than three tables are used. The data types of the fields to be joined must be absolutely consistent. When multiple tables are used for associated query, ensure that the associated fields have indexes.
Note: Pay attention to table index, SQL performance even if double table JOIN.
[Opinion] Good rules, strictly follow
3. [Mandatory] When creating an index for a VARCHAR field, the index length must be specified. It is not necessary to create an index for all fields.
Note: Index length and distinction is a pair of contradictory body, generally for string type data, length of 20 index, the distinction will be as high as 90%,
Use count(distinct left(column name, index length))/count(*) to determine this.
[Opinion] Good rules, strictly follow
4. [Mandatory] Page search is strictly prohibited from left blur or full blur, if necessary, please go to the search engine to solve.
Note: The index file has the left-most prefix matching feature of b-tree. If the left value is not determined, the index cannot be used.
[Opinion] Good rules, strictly follow
5. If there is an order by scenario, please take advantage of the orderality of the index.
The last field of order BY is part of the composite index and placed at the end of the index combination order to avoid file_sort, which affects query performance.
Where a= where? and b=? order by c; Index: a_b_c
WHERE a>10 ORDER BY b; WHERE a>10 ORDER BY b; Index A_b cannot be sorted.
[Opinion] Good rules, strictly follow
6. [Recommended] Use overwrite indexes to perform query operations to avoid returning to the table.
Explanation: If a book needs to know the title of chapter 11, will it turn to the page corresponding to chapter 11? Just browse the directory, this directory is to play the role of overwriting the index.
Example: Can create index types: primary key index, unique index, normal index, and overwrite index is a query effect, with explain results, extra column will appear: using index.
[Opinion] Good rules, strictly follow
So here’s a table:
Copy the code
CREATE TABLE `user_group` (
`id` int(11) NOT NULL auto_increment,
`uid` int(11) NOT NULL,
`group_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `uid` (`uid`),
KEY `group_id` (`group_id`),
) ENGINE=InnoDB AUTO_INCREMENT=750366 DEFAULT CHARSET=utf8
Copy the code
If you look at AUTO_INCREMENT, that’s not a lot of data, 750,000. Then a simple query:
SELECT SQL_NO_CACHE uid FROM user_group WHERE group_id = 245;
Easy, right? Here’s the weird part:
If you use MyISAM as the storage engine, the query time is only 0.01s, but with InnoDB it is about 0.15s
This difference is not a big deal, but the real business needs are more complex than this, resulting in a big gap: MyISAM only 0.12s, InnoDB only 2.2s. Finally, the crux of the problem is located in this SQL.
Explain results are:
+—-+————-+————+——+—————+———-+———+——-+——+——-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+————+——+—————+———-+———+——-+——+——-+
| 1 | SIMPLE | user_group | ref | group_id | group_id | 4 | const | 5544 | |
+—-+————-+————+——+—————+———-+———+——-+——+——-+
It looks like I’m already using the index, and this SQL statement is so simple that I can’t optimize it anymore. Finally, I asked Gaston, a former colleague, for diagnosis. He thought: In data distribution, group_id is the same in many cases, UID hash is uniform, and the effect of adding index is not so good, but he still suggested me to try adding a multi-column index:
ALTER TABLE user_group ADD INDEX group_id_uid (group_id, uid);
Then the unthinkable happened… The performance of this SQL query has been greatly improved to around 0.00s. The optimized SQL, combined with real business requirements, was reduced from 2.2 seconds to 0.05 seconds.
Explain again:
+—-+————-+————+——+———————–+————–+———+——-+——+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+————+——+———————–+————–+———+——-+——+————-+
| 1 | SIMPLE | user_group | ref | group_id,group_id_uid | group_id_uid | 4 | const | 5378 | Using index |
+—-+————-+————+——+———————–+————–+———+——-+——+————-+
Originally is this called covering index (covering index), MySQL only need through the index can return the required data query, and do not have to check after the index to query data, so that is quite fast!! But at the same time, it also requires that the queried fields must be covered by the Index. In Explain, if there is “Using Index” in the output Extra information, it means that this query uses the covered Index.
However, there is an unexplained question as to why MyISAM is so much faster and InnoDB so much slower without overwriting indexes.
7. [Recommendation] Use deferred association or subquery to optimize the super multi-page scenario.
MySQL does not skip the offset row. Instead, MySQL takes the offset+N row and returns N rows.
Either control the total number of pages returned, or SQL overwrite the number of pages that exceed a specific threshold.
SELECT a.* FROM table 1 a, (SELECT id FROM table 1 where condition LIMIT 100000,20) b where a.id=b.id
[Opinion] Good rules, strictly follow
8. [Recommendation] Target of SQL performance optimization: at least range level, requirement is ref level, conSTS is the best.
Note: 1) Consts has at most one matching row (primary key or unique index) in a single table. Data can be read in the optimization stage.
2) ref means normal index is used.
3) range Retrieves the index in range.
Counter example: explain table results, type=index, index physical file full scan, speed is very slow, this index level comparison range is also low, and full table scan is nothing.
[Opinion] Good rules, strictly follow
9. When building a composite index, the most distinguishable index is at the far left.
If a=? and b=? , the value of column A is almost unique, so only the idX_A index is needed.
Note: If non-equal sign and equal sign mixed judgment condition exists, please place the equal sign condition column in front when building the index.
Such as: where a >? and b=? So even if A is more differentiated, B must be placed at the top of the index.
[Opinion] Good rules, strictly follow
10. [Recommendation] Prevent implicit conversion caused by different field types, resulting in index invalidation.
[Opinion] Good rules, strictly follow
11. Avoid the following extreme misunderstandings when creating indexes:
1) Better have too many than too few. Mistaking a query for an index.
2) Better than all. Mistaken indexes can consume space and significantly slow down updates and additions.
3) Resist unique indexes. Mistaken business uniqueness requires a “look – up – plug” approach at the application layer.
[Opinion] Good rules, strictly follow
(3) SQL statement
1. [mandatory] Do not use count(column name) or count(constant) instead of count(*). Count (*) is a standard row count syntax defined by SQL92.
Note: Count (*) counts rows with a NULL value, while count(column name) does not count rows with a NULL value.
[Opinion] Good rules, strictly follow
2. 【 mandatory 】count(distinct col) Count the number of distinct rows in this column except NULL.
Note that count(DISTINCT COL1, col2) If one of the columns is all NULL, 0 is returned even if the other column has a different value.
[Opinion] Good rules, strictly follow
3. [Mandatory] If all values in a column are NULL, count(col) returns 0, but sum(col) returns NULL. Therefore, pay attention to the NPE problem when using sum().
SELECT IF(ISNULL(sum (g)),0, sum (g)) FROM table;
[Opinion] Good rules, strictly follow
4. [mandatory] use ISNULL() to check whether the value ISNULL. Note: A direct comparison of NULL to any value is NULL.
Note: 1) NULL<>NULL returns NULL, not false.
2) NULL=NULL returns NULL instead of true.
3) NULL<>1 returns NULL, not true.
[Opinion] Good rules, strictly follow
5. [Mandatory] When writing paging query logic in code, if count is 0, it should be returned directly to avoid executing subsequent paging statements.
[Opinion] Good rules, strictly follow
6. [Mandatory] Do not use foreign keys and cascading. All foreign key concepts must be solved at the application layer.
Note (Concept description) Student_id in the student table is the primary key, and student_id in the score table is the foreign key.
If the student_ID in the student table is updated and the student_id in the score table is updated, the cascading update is performed.
Foreign keys and cascading update are suitable for low-concurrency single machine, but not for distributed and high-concurrency cluster.
Cascading update is strongly blocked, and there is the risk of database update storm. Foreign keys affect database insert speed.
[Opinion] Good rules, strictly follow
7. [Mandatory] Prohibit the use of stored procedures. Stored procedures are difficult to debug and expand, not to mention portability.
[Opinion] Good rules, strictly follow
8. [Mandatory] When deleting or modifying records in data revision, select first to avoid misdeletion and execute update statements only after confirmation.
[Opinion] Good rules, strictly follow
9. If the in operation can be avoided, it should be avoided. If it really cannot be avoided, the number of set elements following in should be carefully evaluated and controlled within 1000.
[Opinion] Good rules, strictly follow
10. If there is a need for globalization, all characters are stored and represented in UTF-8 encoding. Note the difference of character statistics function.
SELECT LENGTH(” easy to work “); Return to 12;
SELECT CHARACTER_LENGTH(” CHARACTER_LENGTH “); Return to 4
If you want to use emoticons, use UTFMB4 for storage, and note the difference between this and utF-8 encoding.
[Opinion] Good rules, strictly follow
TRUNCATE TABLE is faster than DELETE and uses fewer system and transaction log resources. However, TRUNCATE has no transaction and does not trigger, which may cause accidents. Therefore, it is not recommended to use this statement in code development.
TRUNCATE TABLE is functionally the same as a DELETE statement without a WHERE clause.
[Opinion] Good rules, strictly follow
(4) ORM mapping
1. [Mandatory] In table query, do not use * as the query field list, which fields must be clearly indicated.
Note: 1) Increase query analyzer parsing cost.
2) Adding or subtracting fields is likely to be inconsistent with resultMap configuration.
[Opinion] Good rules, strictly follow
2. [Mandatory] POJO class Boolean attributes cannot add IS, but database fields must add IS_, requiring mapping between fields and attributes in resultMap.
Note: See defining POJO classes and database field definitions, adding mappings in <resultMap> is required.
In the code generated by MyBatis Generator, the corresponding modification is required.
[comment] Rule in doubt, use flag to end field
3. [Mandatory] Do not use resultClass when returning parameters, even if all class attribute names correspond to database fields one by one, they need to be defined.
Conversely, there must be a corresponding table for each table. Configure the mapping relationship, so that the field and DO class decouple, easy maintenance.
[Opinion] Good rules, strictly follow
4. [mandatory] SQL. XML configuration parameters use: #{}, #param# do not use ${} this way is easy to create SQL injection.
[Opinion] Good rules, strictly follow
IBATIS queryForList(String statementName,int start,int size) is not recommended.
Its implementation is to get all records of SQL statements corresponding to statementName in the database, and then get start,size subsets through subList.
Map<String, Object> Map = new HashMap<String, Object>();
map.put(“start”, start);
map.put(“size”, size);
[Opinion] Good rules, strictly follow
6. [Mandatory] Do not use HashMap and Hashtable as the output of the query result set.
ResultClass = “Hashtable”, the field name and attribute value will be placed, but the value type is not controllable.
[Opinion] Good rules, strictly follow
7. [Mandatory] When updating a table record, the gmT_MODIFIED field must be updated to the current time.
[Opinion] Good rules, strictly follow
Update table set c1=value1, C2 = Value2,c3=value3;
That’s not right. When executing SQL, do not update unchanged fields,
One is error-prone;
Second, low efficiency;
Third, add binlog storage.
[Opinion] Good rules, strictly follow
Don’t abuse @transactional transactions.
Transactions affect the QPS of the database,
Other places where transactions are used need to consider various rollback schemes, including cache rollback, search engine rollback, message compensation, statistical correction, and so on.
[Opinion] Good rules, strictly follow
<isEqual> <isEqual> <isEqual> <isEqual> <isEqual> <isEqual> <isEqual> <isEqual>
<isNotEmpty> is executed when the value is not null and null. <isNotNull> is executed when the value isNotNull.
[Opinion] Good rules, strictly follow