MySQL performance
Maximum data volume
Talking about performance without stating the data volume and the concurrency level is meaningless. MySQL does not limit the maximum number of records in a table; that limit comes from the operating system's file size limit.
According to the Alibaba Java Development Manual, splitting databases and tables is recommended only when a single table exceeds 5 million rows or 2GB in size. Performance is determined by a combination of factors; setting aside business complexity, the factors in descending order of impact are hardware configuration, MySQL configuration, table design, and index optimization. The figure of 5 million is a reference value, not an iron rule. The blogger has operated a single table with more than 400 million rows of data, where paging through the latest 20 records took 0.6 seconds, with an SQL statement roughly like this:
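A minimal sketch of that keyset-pagination query; the table and column names are hypothetical placeholders, and prePageMinId is a parameter bound by the application (MyBatis-style placeholder shown only for illustration):

    SELECT field_1, field_2
    FROM big_table                 -- hypothetical table with 400+ million rows
    WHERE id < #{prePageMinId}     -- minimum id of the previous page, supplied by the application
    ORDER BY id DESC
    LIMIT 20;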
prePageMinId is the minimum id of the previous page of records. Although the query speed was acceptable at the time, it will inevitably struggle as the data continues to grow. Splitting databases and tables is a big job with a long timeline and high risk, so the current structure should be optimized as far as possible first, for example by upgrading hardware or migrating historical data, and sharding should only be done when there is no other way out. Readers interested in splitting tables can look up the basic ideas behind table sharding.
Maximum concurrency
The number of concurrent requests is how many requests the database can process at the same time, determined by max_connections and max_user_connections. max_connections specifies the maximum number of connections to a MySQL instance, with an upper limit of 16384. max_user_connections specifies the maximum number of connections for a single database user. MySQL allocates buffers for each connection, so more connections mean more memory consumption. If the connection limit is set too high, the hardware cannot cope; if it is set too low, the hardware is not fully utilized. Generally, the ratio of the peak number of connections actually used to max_connections should exceed 10%, calculated as follows:
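One way to check this ratio from the MySQL console, using the standard Max_used_connections status counter (the 10% rule of thumb follows the text above):

    SHOW VARIABLES LIKE 'max_connections';          -- configured connection limit
    SHOW GLOBAL STATUS LIKE 'Max_used_connections'; -- peak number of connections actually used
    -- connection ratio = Max_used_connections / max_connections * 100%
    -- if the ratio stays well below 10%, max_connections is likely set too high for the workload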
It is advisable to keep query time under 0.5 seconds. This is an empirical value derived from the three-second rule in user experience: if an operation gets no response within 3 seconds, the user becomes impatient or even leaves. Since response time = client UI rendering time + network request time + application processing time + database query time, 0.5 seconds is roughly the 1/6 of that budget left for the database.
Implementation principles
Compared to NoSQL databases, MySQL is a delicate creature. It is like the girl in PE class: a small quarrel makes her fall out with her classmates (hard to scale out), she is out of breath after running a few steps (small capacity and low concurrency), and she is often unwell and asks for leave (too many SQL constraints). Nowadays everything is distributed, and scaling an application is much easier than scaling a database, so the principle is: less work for the database, more work for the application. Make full use of indexes, but do not abuse them; indexes also consume disk and CPU. Using database functions to format data is not recommended; leave formatting to the application. Foreign key constraints are not recommended; data accuracy should be guaranteed by the application. In write-heavy, read-light scenarios, unique indexes are not recommended; enforce uniqueness in the application. Add appropriate field redundancy or create intermediate tables, computing intermediate results in the application and trading space for time. Extremely time-consuming transactions must not be executed; split them into smaller transactions in the application. For important tables (such as the order table), anticipate load and data growth trends and optimize in advance.
Data table design
Data types
When choosing a data type, the simpler the better, and the less space the better. If the range is sufficient, use tinyint, smallint, or mediumint instead of int. Use char for strings of a fixed length; when varchar is sufficient, do not use text. For high precision, use decimal, or store an integer in a bigint, for example by multiplying an amount with two decimal places by 100 before saving. Use timestamp instead of datetime whenever possible.
timestamp takes less space than datetime and is stored in UTC with automatic time zone conversion.
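Purely as an illustration, a hypothetical table definition applying these guidelines (all names and columns are made up for the example):

    CREATE TABLE product (
        id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
        status     TINYINT UNSIGNED NOT NULL,     -- small value range, so tinyint instead of int
        code       CHAR(8) NOT NULL,              -- fixed-length string, so char
        name       VARCHAR(64) NOT NULL,          -- bounded variable length, so varchar instead of text
        price_cent BIGINT UNSIGNED NOT NULL,      -- amount * 100 stored as an integer for precision
        created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- timestamp instead of datetime
        PRIMARY KEY (id)
    );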
Avoid NULL values
MySQL still occupies space for a field that is NULL, and NULLs make indexes and index statistics more complicated. Updating a NULL value to a non-NULL value cannot be done in place, which tends to cause index page splits and hurts performance. Where possible, replace NULL with a meaningful default value, which also avoids SQL statements containing IS NOT NULL.
Text type optimization
Because a text field stores a large amount of data, the table size grows very quickly, which hurts query performance on the other fields. It is recommended to extract the text column into a sub-table and associate it with the main table through the business primary key.
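A hypothetical sketch of that split, with made-up table and column names; the large text column lives in its own table keyed by the business primary key of the main table:

    CREATE TABLE article (
        id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        title VARCHAR(128) NOT NULL,
        PRIMARY KEY (id)
    );

    CREATE TABLE article_content (
        article_id BIGINT UNSIGNED NOT NULL,  -- business primary key of the main table
        content    TEXT NOT NULL,             -- large text kept out of the main table
        PRIMARY KEY (article_id)
    );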
Index optimization
Index classification
Plain index: the most basic kind of index.
Composite index: an index built on multiple fields to speed up compound query conditions.
Unique index: like a plain index, but the indexed column values must be unique; NULL values are allowed.
Composite unique index: the combination of column values must be unique.
Primary key index: a special unique index that uniquely identifies a record in the table; NULL values are not allowed, and it is usually created through the primary key constraint.
Full-text index: used for querying large volumes of text; both InnoDB and MyISAM support full-text indexes since MySQL 5.6. Because of limited query accuracy and scalability, more companies choose Elasticsearch.
Index optimization
Paging queries matter: MySQL will not use an index if a query touches more than roughly 30% of the table's data. Keep the number of indexes on a single table to no more than five, and the number of columns per index to no more than five. Strings can be indexed by prefix; limit the prefix length to 5-8 characters. There is no point indexing columns with very low selectivity, such as a deleted flag or gender. Make good use of covering indexes, for example:
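A sketch of a covering index, with a hypothetical member table and columns; because the composite index contains both the filter column and the selected column, the query can be answered from the index alone, without reading the table rows:

    -- composite index covering both the filter column and the selected column
    CREATE INDEX idx_login_name_nick_name ON member (login_name, nick_name);

    -- this query reads only the index, never the table rows
    SELECT login_name, nick_name FROM member WHERE login_name = 'xxx';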
Batch processing
When the blogger was young, he saw a fish pond with a small outlet for releasing water, and all kinds of debris floated on the surface. Duckweed and leaves always made it through the outlet, while branches blocked other objects, sometimes getting stuck and needing to be cleared by hand. MySQL is the fish pond, the maximum concurrency and network bandwidth are the outlet, and user SQL statements are the floating objects. Queries without paging parameters, or UPDATE and DELETE operations that affect large amounts of data, are the branches; they need to be broken up and processed in batches. For example, take an SQL statement that marks all expired coupons as unavailable:
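A hypothetical version of that statement; the coupon table, its columns, the status codes, and the bound date are placeholders:

    -- single statement touching every expired coupon at once
    UPDATE coupon
    SET status = 0                      -- 0 = unavailable (assumed convention)
    WHERE expire_date <= '2024-01-01'   -- current date, bound by the application
      AND status = 1;                   -- 1 = available (assumed convention)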
If a large number of coupons need to be updated to the unavailable state, executing this single SQL statement may block other SQL. It can be processed in batches, roughly as follows:
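The source calls for batch-processing pseudocode; shown instead is a minimal SQL-only sketch relying on MySQL's UPDATE ... LIMIT, with the application repeating the statement until zero rows are affected (the batch size of 1000 is arbitrary):

    -- executed in a loop by the application until 0 rows are affected
    UPDATE coupon
    SET status = 0
    WHERE expire_date <= '2024-01-01'
      AND status = 1
    LIMIT 1000;   -- update at most 1000 rows per round to keep locks short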
Optimizing the <> operator
In general, the <> operator cannot use an index. As an example, consider querying orders whose amount is not 100:
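A sketch with a hypothetical orders table; the first form usually cannot use an index on amount, while a common workaround is to split it into two range conditions that can:

    -- <> (or !=) usually cannot use the index on amount
    SELECT id FROM orders WHERE amount != 100;

    -- common rewrite: two range conditions, each able to use the index on amount
    SELECT id FROM orders WHERE amount < 100
    UNION ALL
    SELECT id FROM orders WHERE amount > 100;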
Under the InnoDB engine, OR cannot use composite indexes, for example:
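A sketch with hypothetical columns; the OR form misses the composite index, and a common rewrite is to split the query into a UNION (assuming separate indexes exist on mobile_no and user_id, each branch can use one):

    -- OR cannot use a composite index on (mobile_no, user_id)
    SELECT id, product_name FROM orders WHERE mobile_no = '13800000000' OR user_id = 100;

    -- rewritten as a UNION so each branch can be matched to its own index
    SELECT id, product_name FROM orders WHERE mobile_no = '13800000000'
    UNION
    SELECT id, product_name FROM orders WHERE user_id = 100;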
IN is suitable when the outer table is large and the subquery table is small; EXISTS is suitable when the outer table is small and the subquery table is large. As the query optimizer keeps improving, the performance is almost the same in many scenarios. You can also try rewriting the query as a join, as follows:
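A sketch with hypothetical orders and user tables, showing the IN subquery and an equivalent join form:

    -- subquery form
    SELECT id FROM orders WHERE user_id IN (SELECT id FROM user WHERE level = 'VIP');

    -- rewritten as a join
    SELECT o.id
    FROM orders o
    JOIN user u ON o.user_id = u.id
    WHERE u.level = 'VIP';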
Avoid computation on indexed columns: performing a function or calculation on a column in the query condition usually invalidates the index. For example, to query today's orders:
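A sketch with a hypothetical orders table and create_time column; wrapping the column in date_format defeats the index, whereas a range condition on the raw column can use it:

    -- the function call on create_time prevents index use
    SELECT id FROM orders WHERE date_format(create_time, '%Y-%m-%d') = '2024-01-01';

    -- rewritten as a range on the raw column, so an index on create_time can be used
    SELECT id FROM orders
    WHERE create_time BETWEEN '2024-01-01 00:00:00' AND '2024-01-01 23:59:59';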
Unless you really need every column in the table, avoid SELECT *: it triggers a full table scan and cannot make effective use of the index.
LIKE optimization
LIKE is used for fuzzy queries, for example (the field is indexed):
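A sketch with hypothetical names; a leading wildcard misses the index, while a trailing-only wildcard can use it (for mandatory infix matching, a full-text index can be tried, and Elasticsearch is the usual heavier-weight answer):

    -- leading % wildcard: the index on field cannot be used
    SELECT column_1 FROM t WHERE field LIKE '%keyword%';

    -- prefix match only: the index on field can be used
    SELECT column_1 FROM t WHERE field LIKE 'keyword%';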
JOIN is implemented with the Nested Loop Join algorithm: the driving table's result set is used as the base data, each row in that result set is used as a filter condition to look up rows in the next table, and the results are then merged. If there are multiple joins, the previous result set is used as the loop data to query the following table again. Both the driving and driven tables should add as many query conditions as possible, satisfying them in the ON clause rather than relying on WHERE, and a small result set should drive a large one. Add an index on the driven table's join field; when an index cannot be created, set a sufficiently large join_buffer_size. Do not join more than three tables; try adding redundant fields instead.
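A sketch under assumed table names, illustrating an index on the driven table's join column and a session-level join_buffer_size for the case where no index can be used:

    -- index on the driven table's join column so each loop iteration becomes an index lookup
    CREATE INDEX idx_orders_user_id ON orders (user_id);

    -- the small user result set drives the larger orders table
    SELECT u.id, o.id
    FROM user u
    JOIN orders o ON o.user_id = u.id AND o.status = 1
    WHERE u.level = 'VIP';

    -- if no usable index exists on the join column, enlarge the join buffer for this session
    SET SESSION join_buffer_size = 4 * 1024 * 1024;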
Limit optimization
LIMIT is used for paging queries, and performance deteriorates as the offset grows. The solution is roughly as follows:
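A sketch with a hypothetical orders table; the deep-offset form forces MySQL to walk past all the skipped rows, narrowing the range by id first confines that walk to the primary key index, and a plain range works when the condition is only the primary key:

    -- deep offset: MySQL must read and discard the first 1,000,000 rows
    SELECT * FROM orders ORDER BY id DESC LIMIT 1000000, 10;

    -- locate the boundary id through the (much cheaper) primary key index first,
    -- then fetch only the 10 full rows that are actually needed
    SELECT * FROM orders
    WHERE id <= (SELECT id FROM orders ORDER BY id DESC LIMIT 1000000, 1)
    ORDER BY id DESC
    LIMIT 10;

    -- when the query condition is only the primary key, a range can be used directly
    SELECT id FROM orders WHERE id BETWEEN 1000000 AND 1000010 ORDER BY id DESC;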
Other databases
As a back-end developer, it is important to be proficient in MySQL or SQL Server as the storage core, and also to take an active interest in NoSQL databases, which are already mature and widely adopted and can solve performance bottlenecks in specific scenarios.