preface

There is no real preface. I mainly wanted to write an article about overall MySQL optimization, but it turned out longer than expected, so it will be split into several parts. Have you learned it yet?

Let's start with the levels and ideas of optimization 👇

Optimization of levels and ideas

Looking at the whole system and architecture, optimization can be divided into levels, and the potential payoff decreases as you move up: a small adjustment at the lowest level, product requirements, can have a geometrically amplified effect compared with tuning at the highest level, the hardware.

Product demand

The source of every technical implementation is a product requirement, and a single requirement change at the bottom can be magnified into large-scale changes at the top of the pyramid.

For example, if the product adds the ability for every user to like every item, the underlying amount of stored data can be on the order of users × items, which quickly drives up storage hardware requirements.

Another example: because product managers do not always understand the technology, they sometimes raise headache-inducing requirements, such as demanding that all data be updated in real time, without knowing how much cost and difficulty this adds. In such cases the technical side should propose alternatives that tune the requirement, such as quasi-real-time or "perceived real-time" data for the specific business scenario, which satisfies the need while removing or reducing the complexity.

A few business-acceptable adjustments at the product level are often worth more than all the SQL and hardware optimization that comes later.

Hence the saying that engineers must understand the business. This does not mean engineers need to be able to drive the business or the product model; it means understanding the needs behind the product and, while implementing them, helping to optimize and adjust the product.

System Architecture Design

The database is an important part of the whole system, but it is not the whole system. Before considering database design, we should know where the database sits in the overall architecture.

What data should be stored in the database

First of all, content that is not really relational data, such as binary files, pictures and large blocks of text, should not be stored in the database.

In a narrower sense, data falls into two categories: online transaction processing (OLTP) and online analytical processing (OLAP). OLTP is the domain of traditional relational databases and mainly handles basic, day-to-day transactions, such as listing a product or closing an order in our mall. OLAP is the domain of data-warehouse systems: it supports complex analytical operations and is mainly used for enterprise decision-making and management analysis.

MySQL is a relational database designed for OLTP. It can handle OLAP workloads only as long as the data volume is small. For OLAP, a simple setup uses Elasticsearch and Druid as heterogeneous storage, while a heavier setup relies on HBase and the big-data stack.

What data can be buffered to reduce the strain on the database

How do we reduce the pressure on the database? The simplest optimization is to reduce how often we hit it: even if the data must ultimately live in MySQL, does it have to be written in real time, and is a temporary window of inconsistency acceptable?

In the "like" example above, each like does not need to be written to MySQL in real time. Likes can first be accumulated in memory as a buffer and then flushed to MySQL in batches after a certain amount of time or a certain number of updates.
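
A minimal sketch of such a flush, assuming a hypothetical item_like_count(item_id, like_count) table; the counts are accumulated in application memory and written out in one batched upsert:

-- flush buffered like counts in one statement (hypothetical table and values)
INSERT INTO item_like_count (item_id, like_count)
VALUES (1, 3), (2, 5), (3, 1)
ON DUPLICATE KEY UPDATE like_count = like_count + VALUES(like_count);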

What data can be cached to reduce database stress

Similar to the buffer above, data that is rarely updated can be kept as a cache in memory, so that users read it from memory and the number of direct database accesses goes down.

Which data to pick, and how to keep the cached data in sync with the database, is more a matter of policy.

Library table design

During development and design there is a stage for designing the database tables. From a developer's point of view this is extremely important, because much of the time after the design is spent writing CRUD code against these tables.

Library table design happens at two levels.

One is the logical level: splitting tables according to business needs. This is where normalization meets denormalization, because while normal forms eliminate transitive dependencies, business queries must not require too many table joins, so a certain degree of redundancy has to be kept.

The other is the physical level: which storage engine to choose during development, the naming conventions for table and field names, and which data types to use for table fields.

SQL and index optimization

The SQL and index optimization stage is the one developers are most familiar with, namely the CRUD stage. In fact CRUD is not simple at all: operations staff and DBAs deal with piles of slow-query logs and even deadlock logs every day, and they know locks inside out.

System configuration/hardware

MySQL has a large number of system parameters. Most can be left at their defaults, but a small number probably need to be tuned to the actual situation.

By adjusting MySQL's system parameters to the current state of the server, the available resources can be used sensibly and MySQL's performance reasonably improved.
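
As a small, hedged example, the InnoDB buffer pool size is one of the most commonly inspected and tuned parameters (the right value depends entirely on the memory available on your server):

-- inspect a commonly tuned parameter
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- on recent versions the buffer pool can be resized online; the value below is only an example
SET GLOBAL innodb_buffer_pool_size = 2 * 1024 * 1024 * 1024;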

Data preparation

sakila

We always need a batch of sample data for testing. MySQL officially provides Sakila, a sample database simulating a DVD rental store, which is a good choice for the experiments we are going to run.

Go to dev.mysql.com/doc/sakila/… to download it.

Unzip it, import it into MySQL, and view the ER model of the sample database with Workbench.
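
The import typically looks like the following, assuming the standard sakila-schema.sql and sakila-data.sql files from the official archive and the mysql command-line client:

-- load the schema first, then the data
SOURCE /path/to/sakila-schema.sql;
SOURCE /path/to/sakila-data.sql;
USE sakila;
SHOW FULL TABLES;   -- should list the sakila tables and views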

Storage engine selection

The most common and most widely used engines are MyISAM and InnoDB. There are others, such as Memory and Merge, but they are used so rarely that they fall outside the scope of this discussion.

MyISAM

MyISAM was the default storage engine before MySQL 5.5. Its advantage is fast reads, but it supports neither transactions nor foreign keys.

Storage structure

Each MyISAM table consists of three files on disk with the same name as the table, differing only in extension:

  • .frm: stores the table definition
  • .MYD: stores the data
  • .MYI: stores the indexes

The .frm file defines the table structure and is not really part of the storage engine. The .MYI file stores the indexes, which only hold pointers to the data pages. The .MYD and .MYI files can be placed in different directories to spread the I/O and speed up access.

Locking and concurrency

MyISAM has no dirty reads or phantom reads, because it locks the entire table, the so-called table-level lock.

For concurrent reads, all sessions can acquire a shared (read) lock on the table, and they do not interfere with each other.

When writing data, an exclusive (write) lock is taken on the whole table, and all other operations, reads and writes alike, have to wait.

Corrupted tables and repair

Many years ago I frequently ran into corrupted MyISAM tables. There are quite a few possible causes: a server crash, a disk failure, the mysqld process being killed in the middle of a write…

A few commands are commonly used to deal with it:

check table `actor`;      -- check the table
repair table `actor`;     -- repair the table
optimize table `actor`;   -- optimize the table

InnoDB

InnoDB is the default storage engine from MySQL 5.5 onward. It supports transactions and can keep data consistent under high concurrency.

Storage structure

InnoDB separates the table definition from the data and indexes into two kinds of files:

  • .frm: stores the table definition
  • .ibd: stores the data and indexes

The .frm file holds the table definition and exists regardless of the storage engine.

InnoDB can store data and indexes either in a shared tablespace or in per-table (exclusive) tablespaces, controlled by innodb_file_per_table.

innodb_file_per_table = ON means per-table tablespaces: each table has its own .ibd file holding that table's data and indexes. This configuration is more flexible, individual tables can be physically migrated, and performance and efficiency are generally somewhat better.

innodb_file_per_table = OFF means a shared tablespace: the data and indexes of all tables in every database are stored in a single file named ibdata1. Keeping all data and indexes in one file over a long period can leave large holes in the tablespace that are hard to reclaim.
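
A quick way to check and switch the setting (a sketch; changing it only affects tables created afterwards):

SHOW VARIABLES LIKE 'innodb_file_per_table';
SET GLOBAL innodb_file_per_table = ON;   -- existing tables keep their current layout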

Locking method

InnoDB adds intention locks automatically, without user intervention. For single UPDATE, INSERT or DELETE statements, InnoDB automatically places exclusive (X) locks on the rows involved. Autocommit is on by default, so each SQL statement is wrapped in its own transaction by default.

InnoDB does not add any locks to normal Select statements.

Shared/exclusive locks

  1. A transaction can read a row only after it has acquired the shared (S) lock on that row
  2. A transaction can modify or delete a row only after it has acquired the exclusive (X) lock on that row
        S            X
  S     compatible   conflict
  X     conflict     conflict

That is:

  • Multiple transactions can each acquire the S lock on a row and read it in parallel
  • Only one transaction at a time can hold the X lock on a row; writing is mutually exclusive with both reading (locking reads) and writing

The problem with locking alone is that reads and writes are mutually exclusive, so multiple threads cannot run fully in parallel; if InnoDB worked only this way it would be extremely inefficient.

When one transaction is updating a row and another only needs to read it, we would like the read not to block. This is exactly what MVCC (multi-version concurrency control, keeping multiple versions of the data) solves.
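
A hedged two-session sketch against the sakila actor table, showing that a plain read is not blocked by an uncommitted update while a locking read is:

-- session 1: update a row and leave the transaction open
BEGIN;
UPDATE actor SET last_name = 'GUINESS' WHERE actor_id = 1;

-- session 2: a plain SELECT is not blocked, it reads the previous version (MVCC snapshot)
SELECT first_name, last_name FROM actor WHERE actor_id = 1;

-- session 2: a locking read, by contrast, waits until session 1 commits or rolls back
SELECT first_name, last_name FROM actor WHERE actor_id = 1 FOR UPDATE;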

Row-level locks

InnoDB supports row-level locks, but not every query gets one. A query must hit an index to use row-level locks; otherwise it automatically degrades to a table-level lock.

InnoDB uses row locks whether the index is a primary key, a unique index, or an ordinary index. Don't forget to check the execution plan with EXPLAIN: even if the condition uses an indexed column, MySQL may decide that a full table scan is more efficient and skip the index, in which case InnoDB falls back to a table-level lock.

Also, because InnoDB's row locks are actually locks on index entries, sessions that access different rows can still conflict if they use the same index key.

Let's test this with the Sakila sample library:
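
A hedged sketch of such a test on the sakila customer table (customer_id is the primary key), run in two sessions:

-- session 1: lock one row via the primary key
BEGIN;
SELECT * FROM customer WHERE customer_id = 10 FOR UPDATE;

-- session 2: a different row is not blocked
BEGIN;
SELECT * FROM customer WHERE customer_id = 11 FOR UPDATE;

-- session 2: the same row blocks until session 1 commits or rolls back
SELECT * FROM customer WHERE customer_id = 10 FOR UPDATE;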


Gap locks

From the example above we can see that so-called row-level locking does not necessarily lock exactly one row of records: depending on the condition it locks part of a range rather than mapping onto a single row, hence the name gap lock.

Such as:

SELECT * FROM customer WHERE store_id >= 2 AND store_id <= 10 LOCK IN SHARE MODE;

It locks the store_id range from 2 to 10, and no record with store_id in that range can be inserted while the lock is held.
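
A hedged illustration from a second session (store_id = 2 and address_id = 1 are valid foreign-key values in sakila); the insert falls inside the locked range and waits, or eventually times out:

-- session 2: blocked by the gap/next-key locks taken by the share-mode read above
INSERT INTO customer (store_id, first_name, last_name, address_id, create_date)
VALUES (2, 'TEST', 'USER', 1, NOW());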

Comparison and selection

Comparison
  • Storage structure: MyISAM stores each table in its own set of files; InnoDB stores data and indexes together, in either a shared tablespace or per-table tablespaces
  • Transaction support: MyISAM does not support transactions; InnoDB does
  • Locking: MyISAM uses table-level locks; InnoDB provides row-level locks, but not unconditionally, and may fall back to table locks when the SQL cannot be confined to an index range
  • Table primary key: MyISAM allows a table with no primary key, and indexes store the row address; in InnoDB, if there is no primary key or unique index, a hidden 6-byte primary key is generated automatically
  • Row count: MyISAM stores the total row count and returns it directly; InnoDB has to traverse the table to count rows
  • CRUD workload: MyISAM is the better choice for SELECT-heavy workloads; InnoDB is the better choice when there are lots of updates/inserts
  • Foreign keys: MyISAM does not support them; InnoDB does
  • Indexes: MyISAM uses non-clustered indexes that hold only row addresses; InnoDB uses a clustered index, where the index is the data stored in order, so caching the index also caches the data
  • Clearing a table: MyISAM simply rebuilds the table; InnoDB deletes rows one by one, which is slow
Selection

InnoDB has been MySQL's default storage engine since 5.5.x. If there are no special requirements, just choose InnoDB.

Of course, MyISAM can still serve special needs, such as businesses with many reads and few writes (news sites, blogs), or a master-slave read/write-splitting setup where the replicas use MyISAM and the primary uses InnoDB.

Library table design

In general, database table design goes through the following stages:

  1. Requirements analysis: gain a full understanding of the product's storage needs: what data needs to be stored, what its characteristics are, how the data will be read and written, and what the requirements on response time are.
  2. Do logical design and physical design

Logical design

Once the requirements are understood, the core of logical design lies in normalization and deliberate denormalization. A poor logical design will seriously hurt developers' coding efficiency and the system's execution efficiency.

The first normal form

The first normal form (1NF) says that the columns of a table must be atomic and cannot be decomposed further. Basically, anything you can put into a MySQL column satisfies the first normal form, with exceptions, for example:

Suppose a single column stores a whole collection such as {name, age, score}. That does not satisfy 1NF: every column must be indivisible, not an array or collection, so it should be split into separate name, age and score fields. In practice, though, putting such a collection into one field happens all the time in modern development: the database just treats it as one field, and the collection is actually taken apart in the front end or the business layer, so we do not need to stick rigidly to the normal form.

The second normal form

The second normal form (2NF) builds on the first: to satisfy 2NF you must first satisfy 1NF. 2NF additionally requires that every row in a table be uniquely identifiable.

In other words, once 1NF holds, a violation of 2NF can only appear when the primary key is composite. If the primary key consists of two columns, no attribute may depend on only one of them; otherwise the table is not in 2NF. A common way to understand it: every field in the table should depend on the same field, the primary key.

👆 Hard to follow? Don't worry, I find it hard to parse too. The plain-language version: make sure each table describes only one thing.

Here’s a popular example:

The table above satisfies the first normal form, but it is badly designed: it describes two things at once, student information and course information. "Credit" depends on "course", while "name" and "age" depend on "student number". The consequences are:

  1. Data redundancy: If there are N students taking the same course, the student’s information is repeated N times
  2. Update exception: If credits for a course are adjusted, all rows in the database table related to that course must be updated, otherwise data inconsistencies will occur.
  3. Insert anomaly: if a new course is created but no one has enrolled yet, it cannot be inserted into the table, because the key field student id is missing.
  4. Delete anomaly: if a course is dropped, all related rows must be deleted, but that also wipes out the record of which students took it.

Make adjustments:

An intermediate (junction) table is added between students and courses, which solves the problems above: each table now describes only one thing.
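
A minimal sketch of what the decomposition could look like (the table and column names are hypothetical, chosen to match the example):

CREATE TABLE student (
  student_id INT PRIMARY KEY,
  name       VARCHAR(50),
  age        TINYINT UNSIGNED
);

CREATE TABLE course (
  course_id INT PRIMARY KEY,
  name      VARCHAR(50),
  credit    DECIMAL(3,1)
);

-- the junction table: which student takes which course
CREATE TABLE student_course (
  student_id INT,
  course_id  INT,
  PRIMARY KEY (student_id, course_id)
);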

The third normal form

The third normal form (3NF) builds on the second: non-key attributes must not depend on other non-key attributes. In other words, a table should not contain non-primary-key information that is already held in another table, and there must be no dependency between non-key fields.

Bad example: the department address depends on the department, which is itself a non-key field.

Improved:
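
A minimal sketch of the improved design, assuming hypothetical employee/department tables that match the example:

CREATE TABLE department (
  dept_id INT PRIMARY KEY,
  name    VARCHAR(50),
  address VARCHAR(100)
);

CREATE TABLE employee (
  emp_id  INT PRIMARY KEY,
  name    VARCHAR(50),
  dept_id INT          -- reference the department instead of copying its address
);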

Denormalized design

Normalization avoids data redundancy, reduces the space the database takes up, and eases the burden of maintaining data integrity. But the higher the normal form, the more tables the design ends up with, which can force business queries to join many tables.

So, from a business perspective, denormalization has to be considered as well. Denormalization deliberately violates the three normal forms for the sake of performance and efficiency by allowing a small amount of data redundancy; in other words, it trades space for time.

Comparing the pros and cons of normalization and denormalization:

  • Advantages of normalization:
    1. Less data redundancy
    2. Normalized tables are smaller than denormalized ones
    3. Updates are faster on normalized tables than on denormalized ones
  • Disadvantages of normalization:
    1. More join queries
    2. Indexes are harder to optimize
  • Advantages of denormalization:
    1. Fewer table joins
    2. Indexes are easier to optimize
  • Disadvantages of denormalization:
    1. More data redundancy
    2. Modifying data means touching more tables

Understand the normal forms but don't be bound by them; knowing the pros and cons of normalization and denormalization so they can be applied flexibly to real business is what matters most.

Physical design

Once the logical design is done, we move on to the actual development phase of creating databases and tables: choosing an appropriate storage engine and character set for each table, standardizing table and field naming, selecting field types, creating the relevant foreign-key constraints…

Select the appropriate storage engine

We compared the characteristics of InnoDB and MyISAM in the previous section. Which engine to pick should follow the business characteristics: if there is no specific requirement, choose InnoDB; if a table is very clearly read-heavy and write-light, such as a news portal or a blog site, MyISAM can be considered; in a master-slave read/write-splitting setup, MyISAM replicas can also be considered as appropriate.

I once used MyISAM at one of the three major portals, on version 5.5.x, because for a portal site reads were an order of magnitude higher than writes. In a test on 5.5 the read efficiency of MyISAM was much higher than InnoDB's, but along the way there were corrupted tables that frequently needed manual repair. I have rarely used MyISAM since.

Suitable character set selection

A character set is a set of symbols and their encodings. A collation is the set of rules used to compare characters within a character set, that is, the character set's sorting rules. MySQL can organize characters using a variety of character sets and collations.

When creating a database or table, you need to specify its character set; utf8mb4 is usually the choice.

utf8mb4

MySQL's utf8 supports UTF-8 characters of at most three bytes, i.e. only the Basic Multilingual Plane of Unicode (U+0000 to U+FFFF). That covers most characters (control characters, Latin, Chinese, Japanese and Korean) but not all of them. utf8mb4 ("most bytes 4") is a superset of and fully compatible with utf8; it uses up to 4 bytes per character and can therefore store more characters, such as emoji.

How to choose between utf8mb4_unicode_ci and utf8mb4_general_ci

Besides being stored, characters also need to be compared and sorted, which involves the collation that goes with the character set. The collations corresponding to utf8mb4 are utf8mb4_unicode_ci and utf8mb4_general_ci.

The choice mainly comes down to two aspects, sorting accuracy and performance:

  • accuracy

utf8mb4_unicode_ci sorts and compares based on the standard Unicode collation algorithm and sorts accurately across languages. utf8mb4_general_ci does not implement the Unicode collation rules, and for certain special languages or characters the sort order may not be what you expect.

  • performance

utf8mb4_general_ci is faster when comparing and sorting; utf8mb4_unicode_ci is slightly slower in special cases because the Unicode collation implements a more complex algorithm in order to handle special characters correctly.

In theory utf8mb4_general_ci performs better, but in practice the performance of today's machines is nowhere near the deciding factor for query efficiency; SQL statements, indexes and their optimization are what really matter. The recommendation is utf8mb4_unicode_ci: correct character ordering matters, and it guarantees the ordering will not be scrambled for particular queries.
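
For example, the character set and collation can be specified when the database and tables are created (a sketch with hypothetical names):

CREATE DATABASE mall DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

CREATE TABLE `comment` (
  id      BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  content VARCHAR(500)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_unicode_ci;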

Select the appropriate field data type

One rule: when several data types could be used for a column, prefer integer types first, then real (fractional) types, then date or binary types, and finally character types. Among types at the same level, prefer the one that takes less space.

Number type field selection

Integer types

MySQL has 5 integer types: TINYINT, SMALLINT, MEDIUMINT, INT and BIGINT, occupying 1, 2, 3, 4 and 8 bytes respectively.

TINYINT, for example, takes 1 byte (8 bits) and can represent 256 values: signed, it covers -128 to 127; unsigned, it covers 0 to 255. If you don't need negative numbers, declare the column UNSIGNED to double the positive range.

The misunderstood int(M)

In int(M), M specifies only the display width of the integer; it has nothing to do with how much space int(M) occupies. int(2), int(3) and int(4) all take 4 bytes of storage on disk.

If you only need small two-digit numbers, just use TINYINT.
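
A quick sketch illustrating that the display width does not change storage:

-- both columns occupy 4 bytes; (M) only affects display padding together with ZEROFILL
-- (the display width itself is deprecated in recent MySQL versions)
CREATE TABLE int_demo (
  a INT(2),
  b INT(11)
);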

Real type

The FLOAT and DOUBLE types use standard floating-point arithmetic and store approximate values, which can produce unexpected results due to loss of precision.

The MySQL DECIMAL data type stores exact values. We often use DECIMAL for columns that must stay accurate and precise, such as monetary data in accounting systems.

To define columns of data type DECIMAL, use the following syntax

column_name DECIMAL(P,D);

  • P is the precision, the total number of significant digits; P ranges from 1 to 65.
  • D is the number of digits after the decimal point; D ranges from 0 to 30 and must be less than or equal to (<=) P.
DECIMAL for monetary data

Monetary data such as prices, salaries and account balances often use the DECIMAL type. If you are designing a database that handles money, the following syntax is a reference:

amount DECIMAL(19,2);

If you follow Generally Accepted Accounting Principles (GAAP), the currency column must keep at least 4 decimal places so that rounding errors never exceed $0.01. In that case define the column with 4 decimal places:

amount DECIMAL(19,4);
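
A quick hedged illustration of why DECIMAL rather than FLOAT is used for money (hypothetical table name; the exact FLOAT output depends on the version, but precision is lost):

CREATE TABLE price_demo (
  f_price FLOAT,
  d_price DECIMAL(19,4)
);
INSERT INTO price_demo VALUES (123456.789, 123456.789);
SELECT f_price, d_price FROM price_demo;
-- f_price comes back rounded (single-precision FLOAT keeps about 7 significant digits),
-- while d_price stays exactly 123456.7890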

Character type field selection

VARCHAR and CHAR behave differently in different storage engines; since we mostly use InnoDB, we will only discuss InnoDB here.

VARCHAR

In varchar(M), M is the number of characters, not bytes; many people mistake it for bytes. How many bytes that is depends on the character set: under utf8 a character can take up to 3 bytes, so varchar(10) can occupy up to 30 bytes for its content.

  • The VARCHAR type stores variable-length strings and is the most common string type. It is more space-efficient than fixed-length types because it only uses the space it needs: even if we declare varchar(55) but store only 10 characters, only those 10 characters (plus the length prefix) are stored.
  • If the maximum length of the varchar is 255 bytes or less, one extra byte records the length; if it is more than 255 bytes, two extra bytes are used.

So in terms of storage there is little difference between varchar(10) and varchar(200) as long as the content stays within 255. However, in some scenarios, such as in-memory temporary tables, MySQL handles varchar columns as if they were fixed-length to optimize performance, so define a length close to what the business actually needs.

VARCHAR suits the following scenarios:

  • The maximum length of the string column is much larger than its average length
  • The column is rarely updated, since updating variable-length values can force MySQL to split pages
  • A complex character set such as UTF-8 is used, where different characters take different numbers of bytes
CHAR

The CHAR type is fixed length: MySQL always allocates enough space based on the length of the string defined.

CHAR suits short strings where all values are close to the same length. For example, CHAR is great for storing MD5 values, or fixed-length codes such as UUIDs or mobile phone numbers. If the inserted data is shorter than the declared length, CHAR pads the rest with spaces.

Why CHAR is efficient but may waste more space

For example, with CHAR(1), a single-byte character takes exactly one byte, whereas VARCHAR(1) takes two bytes because one extra byte stores the length.

As we can see, varchar puts 1 to 2 bytes in front of each value to record its length, plus a marker at the end of the value to indicate where the field ends. When MySQL reads the data it first reads the length and then reads that many bytes, so when scanning, the disk heads have to do more work for varchar columns than for char columns.

CHAR, by contrast, has a fixed length, so MySQL knows exactly how many bytes to read without any length prefix: reads are simpler and faster, but any unused space is wasted on padding. That is why CHAR is efficient yet can waste more space.
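
A small sketch of a typical CHAR vs VARCHAR split (hypothetical table and columns):

CREATE TABLE user_account (
  id           BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  password_md5 CHAR(32),     -- fixed-length MD5 hex digest
  uuid         CHAR(36),     -- fixed-length UUID string
  phone        CHAR(11),     -- fixed-length mobile number
  nickname     VARCHAR(50)   -- variable length, sized close to the real business need
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;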

Date and time type field selection

  • Time precision

Since version 5.6.4, the time types support fractional seconds, with up to 6 decimal places, i.e. microsecond precision.

  • Time range

TIMESTAMP can only represent times from 1970-01-01 to 2038-01-19, while DATETIME covers 1000-01-01 to 9999-12-31.

  • Time zone difference

TIMESTAMP converts the inserted time from the client's current time zone to UTC (Coordinated Universal Time) for storage, and converts it back to the client's time zone when queried. DATETIME does no conversion: what goes in is what comes out.

  • Bytes occupied

The numbers commonly found online (TIMESTAMP takes 4 bytes, DATETIME takes 8 bytes) only apply to versions before 5.6.4. From 5.6.4 onward, the non-fractional part of DATETIME takes just 5 bytes, plus 0 to 3 bytes for fractional seconds if present; for TIMESTAMP the non-fractional part takes 4 bytes plus 0 to 3 bytes for the fractional part.

DATETIME is the preferred choice: it supports a much larger date range (the year-2038 problem will be making noise soon enough) and does no time-zone conversion, even though it costs roughly one byte more than TIMESTAMP; on modern hardware that is not the bottleneck.
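
One common pattern, as a sketch (note that DEFAULT CURRENT_TIMESTAMP on DATETIME columns requires MySQL 5.6.5 or later):

CREATE TABLE order_demo (
  id         BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  created_at DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  updated_at DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3)
) ENGINE = InnoDB;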

A few principles for fields

Try to represent strings with numbers
  • Store IP addresses as integers. MySQL provides two functions for this:

    INET_ATON() converts an IPv4 address to an unsigned integer (an INT UNSIGNED, 4 bytes)

    INET_NTOA() converts the integer back to an IP address

  • If a monetary amount does not require arbitrary precision and only needs two decimal places, storing it as an integer is recommended (see the sketch after this list):

    100.01 yuan -> 10001 cents
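
Both points in a quick sketch:

SELECT INET_ATON('192.168.1.1');   -- 3232235777
SELECT INET_NTOA(3232235777);      -- '192.168.1.1'
-- money: store 100.01 yuan as 10001 cents in an INT/BIGINT column and convert at the edges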

Select the smallest data type possible and specify the field length

In general, use the smallest data type that can store the data correctly. Smaller data types are generally faster because they take up less space on disk, in memory and in the CPU cache, and require fewer CPU cycles to process.

Use NOT NULL whenever possible

Columns should usually be declared NOT NULL unless you really need to store NULL values. Queries that involve nullable columns are harder for MySQL to optimize, because nullable columns complicate indexes, index statistics and value comparisons, and an indexed nullable column costs an extra byte per index record. That said, the performance gain from changing an existing nullable column to NOT NULL is usually small.

Field comments should be complete

If there is no documentation and no data dictionary, field comments are the last line of defense.