When it comes to databases, the first question is usually how to optimize, how to make queries faster. The best approach is to be as thoughtful as possible from the very beginning of the database design; for an existing project, you have to start from optimization instead. Below I share some of my understanding and experience, from design through optimization.




1: Database design

A good database design scheme often yields twice the result with half the effort for database performance. Database design includes the database architecture and the design of the business tables.

1) Database architecture

Design different architectures based on data volume and traffic. Whatever suits your situation is the best choice.

Single instance: all data reads and writes go to a single database instance (backup instances not counted). This suits small internal enterprise systems. The disadvantage is that it only works for small data volumes; the advantage is strong data consistency.




Vertical split, multiple instances: different services go to different instances. This also suits the case where each service has a small amount of data and the services are independent of each other, with no cross-service associations.




Read/write separation, master-slave architecture: with a master-slave structure, the master library absorbs the write pressure while the slave libraries share the read pressure. This method is suitable for applications that do not require real-time data consistency.




Sharding (splitting databases and tables), cluster structure: suitable for applications with heavy reads and writes. This is the most complex to implement; you need to decide how to route data, and later capacity expansion is also relatively troublesome. At the design stage, choose an appropriate routing policy, such as range or hash, based on your understanding of the business.
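
As a minimal sketch of hash routing (the table layout, the modulus of 4, and the ids are assumptions for illustration): orders are spread across four physical tables by user id, and the application or a middleware such as Mycat sends each statement to the table numbered user_id % 4.

CREATE TABLE orders_0 (
  order_id   BIGINT        NOT NULL PRIMARY KEY,
  user_id    BIGINT        NOT NULL,
  amount     DECIMAL(10,2) NOT NULL,
  created_at DATETIME      NOT NULL,
  KEY idx_user (user_id)
) ENGINE=InnoDB;
-- orders_1, orders_2, orders_3 have an identical structure.

-- A query for a single user is routed only to the shard user_id % 4:
SELECT order_id, amount FROM orders_1 WHERE user_id = 10001;  -- 10001 % 4 = 1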




Cloud database: Alibaba Cloud RDS, etc. Suitable for companies without a professional database operations and maintenance team; very convenient and simple to use.

2) Business table design

Common business table classification:

(1) Configuration table

This type of table usually stores basic configuration or dictionary information for the business. The amount of data is generally small and modifications are infrequent; access is mostly Select queries.

(2) Status table

This kind of table usually stores the state information of business entities, such as the common user information table, order information table, etc. The amount of data is directly related to the number of entities; for example, the user table of an APP usually has about as many records as the APP has registered users. Status tables change frequently: Insert, Update, and Select operations all occur, and whether Delete is needed depends on business requirements.

(3) Log table

This kind of table is usually used to record the behavior or change history of some entity in the business system, such as a user login log table or a recharge record table. The data volume is usually large; if the business state changes frequently and a lot of change information is recorded, both the data volume and the insert performance requirements are high. Operations on log tables are mainly Inserts, with occasional queries by individual services. The high-performance write architecture (one of MySQL's special-requirement architectures) is mainly aimed at the needs of this kind of table.

(4) Archive table

This table is used to archive or separate out the data of the above three kinds of OLTP business tables. On the one hand, it keeps the data size of the online business tables under control and preserves their performance; on the other hand, the archive supports queries against historical data. The amount of data in an archive table depends on the size of the online table and the archiving period. Apart from the data loading done by the archiving process, operations on an archive table are mainly Select queries; the archive is essentially a read-only table.

(5) Statistical data table

When the business needs offline statistical analysis, the raw data of the various online tables and archive tables is moved into the online OLAP statistical analysis system through an ETL process; the resulting tables are the statistical data tables. Such tables are usually very large, since an OLAP platform aggregates data from multiple online business systems. Apart from the data flow itself, operations on statistical data tables are mainly reads by the various statistical analysis programs.

(6) Statistical results table

The statistical result table holds the results produced when the statistical analysis processes read the statistical data tables and aggregate them according to some logic. The amount of result data is usually relatively small. Apart from the result-loading step, operations on the statistical result table are mainly Select queries from the access interface.

Sorting business tables into these types gives a rough map of the whole business system, so you have a clear picture of what you are dealing with.

Here are some lessons from designing tables:

Split a table with many fields into multiple tables: for a table with many fields, if some fields are used infrequently, you can separate them out into another table, because in a table with a large amount of data the rarely used fields still slow everything down, as sketched below.
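
A minimal sketch of such a vertical split (the table and column names are made up):

-- Hot columns that almost every query touches stay in the main table.
CREATE TABLE user (
  user_id  BIGINT      NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name     VARCHAR(64) NOT NULL,
  phone    VARCHAR(20) NOT NULL
) ENGINE=InnoDB;

-- Rarely used, wide columns move to a side table keyed by the same primary key.
CREATE TABLE user_extra (
  user_id  BIGINT NOT NULL PRIMARY KEY,
  bio      TEXT,
  settings TEXT
) ENGINE=InnoDB;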

Adding intermediate tables: For tables that need frequent joint queries, you can create intermediate tables to improve query efficiency.

Adding redundant fields: redundant fields improve query speed, but they can also cause problems. For example, if the value of a redundant field is changed in one table, the associated tables must be updated in sync, or the data becomes inconsistent. Weigh the performance gain against this maintenance cost when deciding whether to design in redundant fields.

Define all fields as NOT NULL unless you really need to store NULL.

Estimate the data volume in advance and design the table splitting up front; do not wait until you are forced to split. As a rule of thumb, keep a single table around the ten-million-row level (the performance inflection point in the MySQL 5.x era was roughly 10 to 20 million rows, but this needs to be verified against your actual workload). When a single table grows past that point, splitting it is the most common way to recover performance. The splitting strategy can be vertical (for example, putting orders in different states into different tables) or horizontal (for example, splitting orders into different tables by month). If you split tables in the business layer, the logic becomes complicated and scattered; you can introduce table-splitting middleware such as Mycat to hide the details behind the split and let the business layer query the split tables as if they were a single table. (If page views are not high but a table holds a lot of data, a partitioned table is also an option and is relatively simple to implement, as sketched below.)
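
A minimal sketch of a monthly RANGE partition on an assumed orders table (names and dates are illustrative):

CREATE TABLE orders (
  order_id   BIGINT   NOT NULL,
  user_id    BIGINT   NOT NULL,
  created_at DATETIME NOT NULL,
  PRIMARY KEY (order_id, created_at)  -- the partition column must be part of every unique key
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p201801 VALUES LESS THAN (TO_DAYS('2018-02-01')),
  PARTITION p201802 VALUES LESS THAN (TO_DAYS('2018-03-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);
-- Queries that filter on created_at only touch the matching partitions (partition pruning).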

Choose a uniform character set. MySQL sets character set defaults in a "ladder" fashion: each database and each table has its own default, inherited layer by layer, and the default at the lowest level is what ultimately applies to the objects you create. Conversion between different character sets and collation rules can introduce extra overhead and hurt database performance, so set the same character set explicitly at every level.
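
For example, the character set can be fixed explicitly at both levels (utf8mb4 here is just one common choice, and the shop/product names are made up):

CREATE DATABASE shop DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

CREATE TABLE shop.product (
  product_id BIGINT       NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title      VARCHAR(128) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;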

Set primary keys and indexes properly.

Primary keys come in two kinds: auto-increment primary keys and business primary keys.

Auto-increment primary key: high write efficiency, high query efficiency, and good disk utilization. However, since online services rarely query by the auto-increment column directly, each such query has to go through two index trees (the secondary index first, then the primary key index).

Business primary key: write efficiency, query efficiency, and disk utilization are all lower. However, the primary index itself can act as a covering index, and in some cases a non-primary-key index can complete the query in a single lookup.

Generally, the business primary key is used.
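
A minimal sketch of the two styles (table and column names are made up for illustration):

-- Auto-increment primary key: the business key gets a secondary index,
-- so a lookup by order_no walks two index trees (idx_order_no, then the PK).
CREATE TABLE order_auto (
  id       BIGINT        NOT NULL AUTO_INCREMENT PRIMARY KEY,
  order_no VARCHAR(32)   NOT NULL,
  amount   DECIMAL(10,2) NOT NULL,
  UNIQUE KEY idx_order_no (order_no)
) ENGINE=InnoDB;

-- Business primary key: a lookup by order_no hits the clustered index directly.
CREATE TABLE order_biz (
  order_no VARCHAR(32)   NOT NULL PRIMARY KEY,
  amount   DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;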

Indexes fall into three categories.

Primary key index: InnoDB automatically creates an index on the table's primary key; the data structure is a B+Tree.

Non-primary key indexes: indexes on non-primary key columns are secondary indexes (a query through them may need to look up two index trees).

Composite index: a composite index, also called a multi-column index, has a key that contains multiple fields. When sorting, the first column is compared first, and the second column is compared only when the first columns are equal, and so on.
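
A quick sketch of the leftmost-prefix rule, assuming a trade table with a composite index (the names are illustrative):

ALTER TABLE trade ADD INDEX idx_group_amount (user_group_id, trade_amount);

-- Can use the index (leftmost column present):
SELECT * FROM trade WHERE user_group_id = 3;
SELECT * FROM trade WHERE user_group_id = 3 AND trade_amount > 100;

-- Cannot use it (leftmost column missing):
SELECT * FROM trade WHERE trade_amount > 100;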

2: Database optimization (mysql)

Speaking of MySQL optimization, it helps to first understand how MySQL works internally, so that you can deeply understand how SQL statements are executed. The following figure shows the logical architecture of MySQL.




The MySQL client/server communication protocol is "half-duplex": at any given moment, either the server is sending data to the client or the client is sending data to the server; the two cannot happen at the same time. Once one end starts sending a message, the other end must receive the entire message before it can respond, so a message cannot be cut into small pieces and sent independently, and there is no flow control. The client sends the query to the server in a single packet, which is why max_allowed_packet must be set appropriately when query statements are long. When the server responds, the client must receive the entire result; it cannot simply take the first few rows and tell the server to stop sending. Therefore, in practice it is a very good habit to keep queries simple, return only the necessary data, and reduce the size and number of packets exchanged. This is also one of the reasons to avoid SELECT * and to add a LIMIT clause to queries.
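
For reference, the packet limit can be inspected and raised like this (the 64 MB value is only a placeholder):

SHOW VARIABLES LIKE 'max_allowed_packet';
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;  -- 64 MB, applies to new connections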

1) SQL optimization. Analyze the execution plan with EXPLAIN and SHOW PROFILES to identify problems and optimize accordingly. Among all techniques, creating efficient indexes is one of the most effective.
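
Typical usage looks like this (the query is just a placeholder):

EXPLAIN SELECT * FROM orders WHERE user_id = 10001;

SET profiling = 1;
SELECT * FROM orders WHERE user_id = 10001;
SHOW PROFILES;             -- lists recent statements with their durations
SHOW PROFILE FOR QUERY 1;  -- per-stage breakdown of statement 1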

Multi-column index and index order:

When multiple indexes would be intersected (multiple AND conditions), a single index that contains all the related columns is generally better than several independent single-column indexes.

Creating indexes on highly selective fields allows MySQL to filter out more rows during a query. For a multi-column index, which field comes first depends on selectivity: put the more selective column first to improve query efficiency. For example, in a composite index (user_group_id, trade_amount), if the user group column is more selective than an order's trade amount, it should come first.
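
Selectivity can be estimated directly; the closer the ratio is to 1, the more selective the column (trade is an assumed table name):

SELECT COUNT(DISTINCT user_group_id) / COUNT(*) AS sel_group,
       COUNT(DISTINCT trade_amount)  / COUNT(*) AS sel_amount
FROM trade;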

Covering index: if an index contains (covers) the values of all the fields the query needs, there is no need to go back to the table for the row. This is called a covering index. Covering indexes are a very useful tool and can greatly improve performance, because a query that only needs to scan the index reads far less data and skips the second lookup into the clustered index.
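
A minimal sketch of a covering index (names assumed): the index contains both the filter column and the selected column, so EXPLAIN reports "Using index" and no lookup back to the table is needed.

ALTER TABLE orders ADD INDEX idx_user_amount (user_id, amount);
SELECT user_id, amount FROM orders WHERE user_id = 10001;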

Optimize joined queries: let the small table drive the large table (use the smaller result set as the driving table).

Replace subqueries with joins as much as possible, because with a join MySQL does not need to create a temporary table in memory to fulfil the same logic.
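
For example, an IN subquery can often be rewritten as a join (using the same sample film tables as the LIMIT example further down):

-- Subquery version:
SELECT * FROM film WHERE film_id IN (SELECT film_id FROM film_actor WHERE actor_id = 1);

-- Join version:
SELECT film.* FROM film
INNER JOIN film_actor USING(film_id)
WHERE film_actor.actor_id = 1;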

Ensure that any expression in GROUP BY or ORDER BY refers to columns from only one table, so that MySQL can use an index to optimize it.

Optimizing LIMIT pagination: a common problem is a very large offset, such as LIMIT 10000, 20, where MySQL has to fetch 10020 rows and discard the first 10000 just to return 20, which is very expensive. The simplest optimization is to use a covering index scan wherever possible instead of querying all columns, and then join back as needed to return the full rows. For large offsets the efficiency gain is huge. Consider the following query:

SELECT film_id, description FROM film ORDER BY title LIMIT 50, 5;

Revised:

SELECT film.film_id, film.description
FROM film INNER JOIN (
    SELECT film_id FROM film ORDER BY title LIMIT 50, 5
) AS tmp USING(film_id);

Optimize UNION: unless you really need the server to de-duplicate, always use UNION ALL. Without the ALL keyword, MySQL adds a DISTINCT to the temporary table, which forces a uniqueness check over the whole temporary table and can be very expensive.
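
For instance, when duplicates do not matter (the table names are made up):

SELECT order_id FROM orders_2018
UNION ALL
SELECT order_id FROM orders_2019;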

Avoid writing SQL that prevents indexes from being used (concrete examples follow the list):

(1) Negative conditions (such as !=, NOT IN, NOT EXISTS) cannot use an index

(2) Leading-wildcard fuzzy queries cannot use an index, e.g. WHERE name LIKE '%XX'

(3) Fields with little data differentiation (low cardinality) should not be indexed

(4) A calculation or function applied to the indexed column prevents the index from being used

(5) Queries that do not satisfy the leftmost prefix of a composite index cannot use it

(6) An implicit type cast on an indexed column forces a full table scan

(7) If you know only one row will be returned, LIMIT 1 can improve efficiency
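
A few concrete instances of the patterns above, assuming a user table where phone is an indexed VARCHAR column:

SELECT * FROM user WHERE status != 1;                       -- (1) negative condition
SELECT * FROM user WHERE name LIKE '%john';                 -- (2) leading wildcard
SELECT * FROM user WHERE DATE(created_at) = '2018-01-01';   -- (4) function on the indexed column
SELECT * FROM user WHERE phone = 13800000000;               -- (6) implicit cast: VARCHAR column compared to a number
SELECT * FROM user WHERE phone = '13800000000' LIMIT 1;     -- (7) stop after the first match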

2) Set MySQL parameters reasonably so the server runs as efficiently as possible.

thread_pool_size: if the primary storage engine is InnoDB, the optimal setting is probably between 16 and 36, most commonly 24 to 36.

thread_pool_stall_limit: ensures that the server is not completely blocked by stalled or long-running statements. Setting it too high can leave threads blocked and cause performance problems.

tmp_table_size: increases the size of in-memory temporary tables (used, for example, by ORDER BY and GROUP BY). Raising this value lets MySQL build larger in-memory heap tables, which speeds up such queries. Even so, optimize the queries themselves first, and make sure the temporary tables generated during a query fit in memory, so that oversized temporary tables are not spilled to disk-based MyISAM tables.
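
For instance, the temporary-table limits can be raised at runtime (the 64 MB figure is only a placeholder to validate against your own workload; the thread_pool_* variables exist only when the thread pool plugin is enabled and are normally set in the configuration file):

SET GLOBAL tmp_table_size      = 64 * 1024 * 1024;  -- 64 MB
SET GLOBAL max_heap_table_size = 64 * 1024 * 1024;  -- in-memory temp tables are capped by the smaller of the two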