MySQL Logical Architecture

The top-level services are not unique to MySQL; most web-based client/server tools or services have a similar architecture. Such as connection handling, authentication, security, and so on.

The layer 2 architecture is the interesting part of MySQL. Most of MySQL’s core service functions are in this layer, including query resolution, analysis, optimization, caching, and all of the built-in functions (for example, date, time, math, and encryption functions). All of the cross-storage engine functions are implemented in this layer: stored procedures, triggers, views, and so on.

The third layer contains the storage engine. The storage engine is responsible for storing and extracting data from MySQL. Like the various file systems under GNU/Linux, each storage engine has its advantages and disadvantages. The server communicates with the storage engine through apis. These interfaces mask the differences between storage engines and make these differences transparent to the upper-layer query process. The storage engine API contains dozens of low-level functions that perform operations such as “start a transaction” or “extract a row based on a primary key.” But the storage engine does not parse the SQL, and the different storage engines do not communicate with each other, but simply respond to requests from the upper-layer server.

Connection management and security

Each client connection will have a thread in the server process, and queries for this connection will only be executed in this single thread, which can only take turns running in the CPU. The server is responsible for caching threads, so there is no need to create or destroy threads for each new connection. When the client (application) connects to the MySQL server, the server must authenticate it. Authentication is based on the user name, original host information, and password.

Optimization and execution

MySQL parses queries, creates internal data structures (parse trees), and then optimizes them in a variety of ways, including rewriting queries, determining the order in which tables are read, and selecting appropriate indexes. The user can influence its decision process through a special hint optimizer. The optimizer can also be asked to explain the various elements of the optimization process so that users can see how the server makes optimization decisions and provide a reference base for refactoring queries and schemas and modifying configurations to make the application run as efficiently as possible. The optimizer doesn’t care what storage engine the table uses, but the storage engine does have an impact on optimizing the query. The optimizer asks the storage engine for information on capacity or the cost of a specific operation, as well as statistics on table data. For SELECT statements, the server checks the Query Cache before parsing the Query. If a Query is found in the Query Cache, the server does not have to perform the entire process of Query parsing, optimization, and execution. Instead, the server directly returns the result set in the Query Cache.

The storage engine

InnoDB storage engine

InnoDB is MySQL’s default transactional engine and the most important and widely used storage engine. It is designed to handle a large number of short-lived transactions, which are mostly committed normally and rarely rolled back. InnoDB’s performance and automatic crash recovery features make it popular for non-transactional storage requirements as well. MVCC is used to support high concurrency, and four standard isolation levels are implemented. In addition, the next-key locking strategy is adopted to prevent phantom reading. Gap locking allows InnoDB to lock not only the rows involved in the query, but also gaps in the index to prevent phantom rows from being inserted.

MyISAM storage engine

In MySQL5.1 and earlier, MyISAM was the default storage engine. MyISAM offers a number of features, including full-text indexing, compression, spatial functions (GIS), and so on, but MyISAM does not support transaction and row-level locking, and has the unquestionable drawback of not being able to safely recover from a crash.

The data type

MySQL supports a wide variety of data types, and choosing the right data type is critical to achieving high performance. No matter what type of data you store, a few simple rules can help you make a better choice.

  • Smaller is usually better: In general, you should try to use the smallest data type that can store data correctly. Smaller data types are generally faster because they take up less disk, memory, and CPU cache, and require fewer CPU cycles to process.
  • Simple is good: Operations with simple data types generally require fewer CPU cycles. For example, integer manipulation is less expensive than character manipulation because character sets and collation rules (collation rules) make character comparison more complex than integer comparison.
  • Try to avoid NULL: If the query contains nullable columns, it is more difficult for MySQL to optimize because nullable columns complicate indexes, index statistics, and value comparisons. Nullable columns use more storage space and require special handling in MySQL. When nullable columns are indexed, one extra byte is required for each index record

Integer types

There are two types of numbers: whole numbers and real numbers.

The integer

If you store integers, you can use these types of integers:

  • TINYINT: 8-bit storage space
  • SMALLINT: 16-bit storage space
  • MEDIUMINT: 24-bit storage space
  • INT: 32-bit storage space
  • BIGINT: 64-bit storage space

Values stored range from -2(n-1) to 2(n-1), where N is the number of bits of storage space. The integer type has an optional UNSIGNED property that allows no negative values and doubles the maximum number of positive numbers. For example, tinyint. UNSIGNED can be stored in a range of 0-255, while TINYINT can be stored in a range of -128-127. Signed and unsigned types use the same storage space and have the same performance. MySQL can specify widths for integer types, such as INT(11), but for most applications this doesn’t make sense: it doesn’t limit the legal range of values, just the number of characters that some of MySQL’s interactive tools can display. INT (1) and INT (20) are the same for storage and computation.

The set of real Numbers

A real number is a number with a fractional part. However, they’re not just for storing decimal parts; You can also use DECIMAL to store integers larger than BIGINT. MySQL supports both exact and imprecise types.

The DECIMAL type is used to store exact decimals and supports exact calculations. You can specify the maximum number of digits allowed before and after the decimal point. This affects column space consumption. For example, DECIMAL (18,9) stores 9 numbers on either side of the DECIMAL point, using a total of 9 bytes: 4 bytes for the number before the DECIMAL point, 4 bytes for the number after the DECIMAL point, and 1 byte for the DECIMAL point itself. The DECIMAL type allows up to 65 digits.

Floating-point types typically use less space than DECIMAL to store the same range of values. FLOAT is stored in four bytes. DOUBLE takes up eight bytes and has greater precision and scope than FLOAT. As with integer types, you can only choose the storage type. MySQL uses DOUBLE as the type for internal floating-point calculations.

Because of the extra space and computation overhead, you should try to use DECIMAL only for precise calculations of decimals — for example, for storing financial data. However, for large amounts of data, consider using BIGINT instead of DECIMAL, multiplying the monetary units that need to be stored by the number of DECIMAL places. Assuming that you want to store financial data to 1/10,000th accuracy, you can multiply all amounts by a million and store the result in BIGINT, which avoids both the imprecision of floating-point storage and the high cost of DECIMAL precision.

String type

MySQL supports multiple string types, and there are many variations of each type.

VARCHAR

The VARCHAR type is used to store variable-length strings and is the most common string data type. It is more space-efficient than fixed-length types because it uses only as much space as is necessary (for example, shorter strings use less space). VARCHAR requires 1 or 2 extra bytes to record the length of the string: if the maximum length of the column is less than or equal to 255 bytes, only 1 byte is used, otherwise 2 bytes are used. VARCHAR saves storage space, so it also helps performance. However, because the row is lengthened, the UPDATE may make the row longer than it originally was, resulting in additional work. If a row takes up more space and there is no more space to store on the page, InnoDB splits the page to make the row fit on the page. Using VARCHAR is appropriate in the following situations:

  • The maximum length of a string column is much larger than the average length
  • Column updates are rare, so fragmentation is not a problem
  • Complex character sets like UTF-8 are used, where each character is stored in a different number of bytes.

CHAR

The CHAR type is fixed length: MySQL always allocates enough space based on the length of the string defined. When storing a CHAR value, MySQL removes all trailing whitespace. CHAR values are padded with Spaces as needed for comparison. CHAR is good for storing very short strings, or all values close to the same length. For example, CHAR is good for storing the MD5 value of a password because it is a fixed-length value.

CHAR is also better than VARCHAR for frequently changing data because fixed-length CHAR types are less prone to fragmentation. For very short columns, CHAR is also more storage efficient than VARCHAR. For example, using CHAR(1) to store only Y and N values requires only one byte if you use a single-byte character set to generate 5, but VARCHAR(1) requires two bytes because there is an extra byte of record length.

BINARY and VARBINARY

Types similar to CHAR and VARCHAR, which store binary strings. Binary strings are very similar to regular strings, except that binary strings store bytecode instead of characters. The padding is also different: MySQL populates BINARY with \0 (zero bytes) instead of Spaces, and does not remove padding values during retrieval. These types are useful when you need to store binary data and you want MySQL to use bytecodes instead of characters for comparison. The advantages of binary comparison are not limited to case sensitivity. MySQL compares BINARY strings one byte at a time and by the value of the byte. Therefore, binary comparisons are much simpler than characters, and therefore much faster.

BLOB and TEXT types

A string data type designed to store very large amounts of data in binary and character formats. In fact, they belong to two different families of data types: the character types are TINYTEXT, SMALLTEXT, TEXT, MEDIUMTEXT, and EONGTEXT; The corresponding binary types are TINYBLOB, SMALLBLOB, BLOB, MEDIUMBLOB,LONGBLOB. BLOB is a synonym for SMALLBLOB, and TEXT is a synonym for SMALLTEXT.

Unlike other types, MySQL treats each BLOB and TEXT value as a separate object. Storage engines usually do special processing for storage. When BLOB and TEXT values become too large, InnoDB uses a dedicated “external” storage area, where each value takes 1 to 4 bytes to store a pointer in a row, and then stores the actual value in the external storage area.

The only difference between BLOB and the TEXT family is that BLOB types store binary data without collation or character sets, whereas TEXT types have character sets and collations.

Date and time types

DATETIME

This type can hold a wide range of values, from 1001 to 9999 years, with an accuracy of seconds. It encapsulates the date and time into an integer format YYYYMMDDHHMMSS, irrespective of time zone. Use 8 bytes of storage space. By default, MySQL displays DATETIME values in a sortable, unambiguous format, such as “2008-01-16 22:37:08”. This is the date and time representation as defined by the ANSI standard.

TIMESTAMP

The TIMETAMP type holds the number of seconds since midnight GMT on January 1, 1970, which is the same as the UNIX timestamp. TIMESTAMP uses only four bytes of storage, so it has a much smaller range than DATETIME: it can only represent the years from 1970 to 2038.

Except for special behavior, TIMESTAMP should generally be used as much as possible because it is more spatially efficient than DATETIME. Sometimes people intercept Unix times as integer values, but that doesn’t bring any benefit. The integer format for saving time cuts is usually cumbersome, so we don’t recommend it.

Bit data type

MySQL has a few storage types that use compact bits to store data. All of these bit types, regardless of the underlying storage format and processing, are technically strings.

BIT

You can use the BIT column to store one or more true/false values in a column. BIT(1) defines a field containing a single BIT, BIT(2) stores two bits, and so on. The maximum length of the BIT column is 64 bits. In InnoDB, each BIT column is stored with a minimum integer type that is sufficient for storage, so storage space is not saved. MySQL treats BIT as a string type, not a number type. When the value of BIT(1) is retrieved, the result is a string containing binary 0 or 1 values, rather than the ASCII “0” or “1”. So we think the BIT type should be used with caution. For most applications, it is best to avoid using this type. If you want to store a true/false value in the storage space of a bit, another way is to create a CHAR (0) column that can be null. This column can hold either a NULL value (NULL) or a string of length zero (empty string).

SET

If you need to store many true/false values, consider merging these columns into a SET data type, which is represented within MySQL as a collection of packaged bits. This makes efficient use of storage space. The main disadvantage is that it is expensive to change the column definition: ALTER TABLE is required, which is a very expensive operation for large tables. In general, there is no index lookup on the SET column.

Table design

While there are some general good or bad design principles, there are also some problems caused by MySQL’s implementation mechanism, which means it is possible to make specific mistakes that only happen under MySQL. Such as:

Too many columns

MySQL’s storage engine API works by copying data in row buffer format between the server layer and storage engine layer, and decoding the cached contents into columns at the server layer. Converting encoded columns from row buffers to row data structures can be very costly. The cost of conversion depends on the number of columns.

Too many connections

The so-called entity-property-value (EAV) design pattern is a common bad design pattern that doesn’t work properly, especially with MySQL. MySQL limits each association operation to a maximum of 61 tables, but EAV databases require many self-associations. We’ve seen quite a few EAV databases end up exceeding this limit. In fact, the cost of parsing and optimizing queries can also become an issue for MySQL in many cases where there are fewer than 61 tables associated. As a rough rule of thumb, it is best to associate a single query within 12 tables if you want fast and concurrent query execution.

A foreign key

Foreign key restrictions put constraints into MySQL, which is better for scenarios where foreign keys must be maintained. However, this also introduces additional complexity and index consumption, as well as increased interaction between multiple tables, leading to more locks and contention in the system. Foreign keys can be thought of as an additional feature to ensure system integrity, but if you’re designing a high-performance system, foreign keys can become bloated. Many people do not use foreign keys when they are more concerned with the performance of the system, and maintain them through the application.

Simple rules:

  • Try to avoid excessive design, such as table design that results in extremely complex queries, or table design with many columns (many means between a little and a lot).
  • Use small and simple data types that are appropriate, and you should avoid NULL values as much as possible unless you have a specific need in your real data model.
  • Try to use the same data type to store similar or related values, especially columns to use in an association condition.
  • Watch out for variable-length strings, which can lead to pessimistic maximum-length memory allocation in temporary tables and sorts.
  • Use integer definitions for identifying columns whenever possible.
  • Avoid using features that MySQL has abandoned, such as specifying the precision of floating point numbers, or the display width of integers.
  • Be careful with ENUM and SET. While they are convenient to use, don’t abuse them or they can sometimes become a trap. It’s best to avoid using bits.

Paradigm and antiparadigm

Advantages and disadvantages of paradigms

When asking for help with performance problems, it is often recommended to stylize the design of database tables, especially in write-intensive scenarios. This is usually good advice. Formalization often pays off for the following reasons:

  • Stylized update operations are usually faster than antistylized.
  • When the data is well formalized, there is little or no duplicate data, so less data needs to be modified.
  • Canonical tables are usually smaller and fit better in memory, so operations are faster to perform.
  • Less redundant data means fewer DISTINCT or GROUPBY statements are needed to retrieve list data.

The disadvantage is that:

  • Associations are usually required. Slightly more complex queries may require at least one association, and perhaps more, on a normalized table.
  • Some index policies may be invalidated. For example, a stereotype might store columns in different tables that could have belonged to the same index if they were in one table.

The advantages and disadvantages of antiparadigm

Antistereotype design is a good way to avoid associations because all the data is in one table. If no associated tables are required, the worst-case scenario for most queries — even if the table does not use an index — is a full table scan. This can be much faster than correlation when the data is larger than memory, because it avoids random I/O.

Fully formalized and fully antiformalized design are concepts that often need to be mixed in practice, possibly using partially formalized design, cached tables, and other techniques.

MySQL Benchmark

A benchmark is a stress test designed for a system. The usual goal is to master the behavior of the system. Benchmarks can do the following,

  • Verify some assumptions based on the system and verify that these assumptions are true. Reproduce some abnormal behavior in the system to resolve these exceptions.
  • Test the current operating status of the system. If you don’t know the current performance of your system, you can’t be sure how well certain optimizations are working. Historical benchmark results can also be used to analyze and diagnose unpredictable problems.
  • Simulate a higher load than the current system to identify scalability bottlenecks that the system may encounter as the pressure increases.
  • Plan for future business growth. Benchmarks can assess what kind of hardware will be needed, what capacity of the network will be required, and other related resources under the project’s future load. This helps reduce the risk of system upgrades and major changes.
  • Test your application’s ability to adapt to a variable environment. For example, benchmarks can be used to discover how a system performs at random concurrency spikes, or across servers with different configurations. Benchmarks can also test a system’s ability to handle different data distributions.

Strategies for benchmarking

There are two main strategies for benchmarking: testing the system as a whole, and testing MySQL alone. These two strategies are also known as full-stack and single-component benchmarks. There are several reasons for doing integrated testing for the entire system, rather than testing MySQL alone:

  • Testing the entire application, including the Web server, application code, network, and database, is very useful because users are not just interested in the performance of MySQL itself, but the performance of the application as a whole.
  • MySQL is not always a bottleneck, as the overall testing shows.
  • The impact of caching between parts can only be found when tested as a whole. Integrated testing of the whole application reveals more about the true performance of the application than testing of individual components.

Integrated test tools

http_load

You can provide multiple urls from a single input file, and http_LOAD selects randomly from these urls for testing. You can also customize http_load to test on a time ratio rather than just maximum request processing capacity.

JMeter

JMeter is a Java application that can load other applications and test their performance. Although it is designed to test Web applications, it can also be used to test other applications such as FTP servers, or to test database queries through JDBC. JMeter is much more complex than EITHER AB or HTTP Load. For example, it can simulate real user access more flexibly by controlling parameters such as warm-up time. JMeter has a graphing interface (with built-in graphical processing capabilities) and can record tests and then replay the results offline.

Single component test tool

mysqlslap

Mysqlslap simulates server load and outputs timing information. The number of concurrent connections can be executed and SQL statements can be specified (either on the command line or written to a parameter file). If no SQL statement is specified, mysqlSLAP automatically generates a SELECT statement for the query schema.

MySQL Benchmark Suite (sql-bench)

The MySQL distribution also provides a benchmark suite of its own that can be used to compare tests across different database servers. It is single-threaded and primarily used to test how fast the server can execute queries. The results show which types of operations are performed faster on the server.

sysbench    

A multithreaded system pressure measurement tool. It can evaluate system performance based on various factors that affect database server performance. For example, you can test file I/O, operating system schedulers, memory allocation and transfer speeds, POSIX threads, and database servers. Sysbench supports the Lua scripting language, and Lua can be very flexible for a variety of test scenarios. Sysbench is a versatile testing tool that supports hardware testing for MySQL, operating systems, and hardware.

reference

High performance MySQL