Transfer: https://www.cnblogs.com/daxian2012/articles/2767989.html
Today, database operations are becoming a performance bottleneck for the entire application, especially for Web applications. Database performance is not just something for DBAs to worry about, it’s something for us programmers to worry about. When we design the structure of the database table, we need to pay attention to the performance of the data operation when we operate the database (especially the SQL statement when looking up the table). Here, we won’t talk much about SQL statement optimization, but only for MySQL, the most widely used database on the Web. Hopefully these optimization tips are useful to you.
1. Optimize your query for query caching
Most MySQL servers have query caching enabled. This is one of the most effective ways to improve performance and is handled by MySQL’s database engine. When many of the same queries are executed multiple times, the query results are placed in a cache so that subsequent queries can access the cache results without manipulating the table.
The main problem here is that it’s easy for programmers to overlook this. Because some of our queries will make MySQL not use caching. Take a look at the following example:
The difference between the above two SQL statements is the CURDATE() function. MySQL’s query cache does not apply to this function. Therefore, SQL functions such as NOW() and RAND() and others do not enable query caching because their returns are variable. So, all you need is a variable instead of a MySQL function to enable caching.
EXPLAIN your SELECT query
Use the EXPLAIN keyword to let you know how MySQL handles your SQL statements. This can help you analyze performance bottlenecks in your query or table structure.
The EXPLAIN query results will also tell you how your index primary keys are utilized, how your tables are searched and sorted… Wait, wait, wait.
Pick one of your SELECT statements (the most complex one with multiple table joins is recommended) and add the keyword EXPLAIN in front of it. You can do this using phpMyadmin. Then, you’ll see a table. In the following example, we forgot to add the group_id index and have a table join:
When we index the group_id field:
We can see that the first result shows 7883 rows searched, and the second one just searches rows 9 and 16 of the two tables. Looking at the Rows columns allows us to identify potential performance issues.
3. Use LIMIT 1 when only one row is required
Sometimes when you query a table, you already know that the result will only have one result, but because you might need to fetch the cursor, or you might check the number of records returned.
In this case, adding LIMIT 1 can increase performance. In this case, the MySQL database engine will stop searching after it finds a piece of data, rather than continuing to search for the next piece of data that matches the record.
The following example, just to find out if there are “Chinese” users, is obviously more efficient than the previous one. (Note that the first is Select * and the second is Select 1)
4. Index the search field
Indexes do not have to be assigned to primary keys or unique fields. If you have a field in your table that you want to search frequently, index it.
From the above you can see that the search string “last_name LIKE ‘a%'” is about 4 times worse than that of the search string (last_name LIKE ‘a%’).
In addition, you should also know what searches cannot use the normal index. For example, when you need to search for a word in a large article, such as: “WHERE post_content LIKE ‘%apple%'”, the index may be meaningless. You may need to use the MySQL full-text index or create your own index (e.g., search keywords or tags).
5. Join tables with similar types of examples and index them
If your application has a lot of JOIN queries, you should make sure that the JOIN fields in both tables are indexed. In this way, MySQL internally starts the SQL statement that optimizes the Join for you.
Furthermore, the fields used to Join should be of the same type. For example, if you want to Join DECIMAL fields with an INT field, MySQL cannot use their indexes. For those strings, the same character set is required. (The character set of the two tables may be different)
6. Don’t ORDER BY RAND()
Want to scramble the returned rows? Pick a random number, right? I don’t know who invented this, but many novices enjoy it. But what you don’t understand is how terrible the performance problem is.
If you really want to scramble the rows returned, there are N ways you can do this. Using it like this only degrades your database performance exponentially. The problem here is that MySQL has to execute the RAND() function (which is CPU intensive) to remember rows for each row and then sort them. Even if you use Limit 1, it doesn’t help.
The following example picks a record at random
7. Avoid SELECT *
The more data is read from the database, the slower the query becomes. Also, if your database server and WEB server are two separate servers, this can increase the load of network traffic.
Therefore, you should form a good habit of taking what you need.
8. Always set an ID for each table
Each table in the database should have an ID as its primary key, preferably an INT (UNSIGNED is recommended) and an AUTO_INCREMENT flag on it.
Even if your Users table has a field with a primary key called “email”, do not make it a primary key. Using the VARCHAR type as a primary key degrades performance. In addition, in your program, you should use table ids to construct your data structures.
In addition, there are some operations that require primary keys under the MySQL data engine. In these cases, the performance and Settings of primary keys become very important, for example, clustering, partitioning…
There is only one exception here, and that is the “foreign key” of an “associated table”, that is, the primary key of this table is formed by the primary key of several individual tables. We call this a “foreign key.” For example, if there is a “student table” with a student ID and a “course table” with a course ID, then the “score table” is the “association table”, which is associated with the student table and the course table. In the score table, the student ID and the course ID are called “foreign keys”, which together form the primary key.
9. Use ENUM instead of VARCHAR
ENUM types are very fast and compact. In fact, it holds a TINYINT, but it looks like a string. This makes it perfect for making a list of options with this field.
If you have a field such as “gender”, “country”, “ethnicity”, “status” or “department” and you know that the values of these fields are finite and fixed, then you should use ENUM instead of VARCHAR.
MySQL also has a “suggestion” (see # 10) that tells you how to reorganize your table structure. When you have a VARCHAR field, this advice tells you to change it to an ENUM type. You can get advice using PROCEDURE ANALYSE().
Get advice from PROCEDURE ANALYSE()
PROCEDURE ANALYSE() will let MySQL analyze your fields and their actual data for you, and will give you useful advice. These suggestions will only be useful if there is actual data in the table, because you need data to make big decisions.
For example, if you create an INT field as your primary key and don’t have much data, PROCEDURE ANALYSE() suggests changing the field’s type to MEDIUMINT. Or if you use a VARCHAR field, you might get a suggestion to change it to ENUM because there isn’t much data. These suggestions may not be accurate because there is not enough data.
In PHPMyAdmin, you can view these suggestions by clicking “Choose Table Structure” while viewing the table
It is important to note that these are only suggestions and will only become accurate as you add more data to your table. Always remember that you are the one who makes the final decision.
11. Use NOT NULL whenever possible
Unless you have a very specific reason to use NULL, you should always keep your fields NOT NULL. This may seem controversial, but read on.
First, ask yourself how much difference is there between “Empty” and “NULL” (0 and NULL if it’s INT)? If you think there is no difference between them, then you should not use NULL. You know what? In Oracle, NULL is the same as Empty!
Don’t assume NULL doesn’t require space, it requires extra space, and your program will be more complex when you make comparisons. Of course, this is not to say that you can’t use NULL, the reality is complicated, and there will still be situations where you need to use NULL.
12. Prepared Statements
Prepared Statements, much like stored procedures, are collections of SQL Statements that run in the background, and there are many benefits to be gained from using Prepared Statements, both in terms of performance and security issues.
Prepared Statements can check some of the variables you have bound to protect your program from “SQL injection” attacks. Of course, you can check your variables manually; however, manual checking is problematic and often forgotten by programmers. This problem is better when we use some framework or ORM.
In terms of performance, this gives you a significant performance advantage when the same query is used multiple times. You can define some parameters for these Prepared Statements and MySQL will only parse them once.
Although the latest version of MySQL uses binary schema in transmitting Prepared Statements, this makes network transmission very efficient.
Of course, there are situations where we need to avoid using Prepared Statements because they do not support query caching. But it is said to be supported after version 5.1.
To use Prepared statements in PHP, you can refer to their manual: the mysqli extension or use a database abstraction layer such as PDO.
13. Unbuffered query
Normally, when you execute an SQL statement in your script, your program will stop there until no SQL statement is returned, and then your program will continue to execute. You can change this behavior with unbuffered queries.
Mysql_unbuffered_query () sends an SQL statement to MySQL rather than automatically fethch and cache the results as mysql_query() does. This saves a lot of memory, especially for queries that produce a lot of results, and you don’t need to wait for all the results to return, just the first row of data, before you start working on the results.
However, there are some limitations. Either you read all the rows, or you call mysql_free_result() to clear the result before the next query. Also, mysql_num_rows() or mysql_data_seek() will not be available. So, you need to think carefully about whether to use unbuffered queries.
14. Save the IP address as an UNSIGNED INT
Many programmers create a VARCHAR(15) field to hold the IP as a string rather than an integer. If you store it with an integer, it only takes 4 bytes, and you can have fields of fixed length. Also, this gives you a query advantage, especially if you need to use WHERE conditions like IP between ip1 and ip2.
We must use UNSIGNED INT because IP addresses use the entire 32-bit UNSIGNED integer.
For your query, you can use INET_ATON() to turn a string IP into an integer, and INET_NTOA() to turn an integer into a string IP. In PHP, there are also functions like ip2long() and long2ip().
Fixed length watches are faster
If all fields in a table are “fixed length”, the entire table is considered “static” or “fixed-length”. For example, the table does not have the following types of fields: VARCHAR, TEXT, BLOB. As long as you include one of these fields, the table is no longer a “fixed-length static table” and the MySQL engine handles it in a different way.
Fixed length tables improve performance because MySQL searches faster, and because these fixed lengths make it easier to calculate the offset of the next data, the read is naturally faster. If the field is not fixed length, the program needs to find the primary key each time it has to find the next one.
Also, fixed-length tables are easier to cache and rebuild. The only side effect, however, is that fixed-length fields waste some space, because they have to allocate that much space whether you use them or not.
Using the “vertical split” technique (see next article), you can split your table into two, one of fixed length and one of indefinite length.
16. Vertical segmentation
Vertical partitioning is a method of dividing a table in a database into several tables by column, which can reduce the complexity of the table and the number of fields for optimization purposes. (BEFORE, I worked on a project in a bank, and I saw a table with more than 100 fields, which was horrible)
Example 1: In the Users table there is a field called the home address. This field is optional, in contrast, and you do not need to read or write this field very often except for personal information when operating in the database. So why not put him on another list? This is going to give you better performance for your table, and if you think about it, most of the time, for my user table, only user ID, user name, password, user role and so on are used frequently. Smaller tables will always have better performance.
Example 2: You have a “last_login” field that is updated every time the user logs in. However, each update causes the query cache for this table to be cleared. So, you can put this field in another table so that you don’t have to read the user ID, user name, and user role over and over again, because the query cache will add a lot of performance.
In addition, you need to note that you will not often Join the partitioned columns in the table, otherwise the performance will be worse than without partitioning, and there will be a drop in the number of poles.
17. Split large DELETE or INSERT statements
If you need to perform a large DELETE or INSERT query on an online site, you need to be very careful that your operation does not bring your entire site to a halt. Because these two operations will lock the table, the table is locked, other operations can not enter.
Apache has many child processes or threads. So, it works pretty efficiently, and our servers don’t want too many child processes, threads, and database links, which take up a lot of server resources, especially memory.
If you lock your table for a period of, say, 30 seconds, the amount of access processes/threads, database links, and open files accumulated in those 30 seconds for a highly trafficed site may not only cause you to Crash your WEB service, but also cause your entire server to Crash immediately.
So, if you have a big process and you’re sure you must split it up, using LIMIT conditions is a good way to do it. Here is an example:
18. The smaller the column, the faster
Hard disk operations are probably the most significant bottleneck for most database engines. So, making your data compact can be very helpful in this case, because it reduces access to your hard drive.
Refer to the MySQL documentation Storage Requirements for all data types.
If a table has only a few columns (such as dictionary tables, configuration tables), there is no reason to use ints as primary keys. MEDIUMINT, SMALLINT, or smaller TinyInts are more economical. If you don’t need to keep track of the time, DATE is much better than DATETIME.
Of course, you also need to leave enough room for expansion, otherwise you will die a horrible death if you do this later. See Slashdot’s example (November 06, 2009), where a simple ALTER TABLE statement took more than 3 hours because there were 16 million entries in it.
19. Select a correct storage engine
There are two storage engines in MySQL, MyISAM and InnoDB, and each has its pros and cons. MySQL: InnoDB or MyISAM? Discuss and discuss the matter.
MyISAM is good for some applications that require a lot of queries, but it’s not good for a lot of writes. Even if you update a single field, the entire table is locked, and no other process, even the reader process, can operate until the read is complete. In addition, MyISAM is super fast for computations like SELECT COUNT(*).
InnoDB tends to be a very complex storage engine, slower than MyISAM for small applications. He argues that it supports “row locking”, so it is better when there are more writes. It also supports more advanced applications, such as transactions.
Here is the MySQL manual
Target = “_blank” MyISAM Storage Engine
InnoDB Storage Engine
20. Use an Object Relational Mapper
Using An Object Relational Mapper (ORM), you can expect reliable performance gains. Everything an ORM can do can also be written manually. However, this requires a high-level expert.
The most important aspect of ORM is “Lazy Loading”, that is, it only does the value when it needs to be done. However, you need to be careful of the side effects of this mechanism, as it is possible to reduce performance by creating many, many small queries.
ORM can also package your SQL statements into a transaction, which is much faster than executing them individually.
Currently, personal favorite of PHP’s ORM is: Doctrine.
21. Watch out for permalink
The purpose of permalink is to reduce the number of times MySQL links are recreated. When a link is created, it remains connected forever, even after the database operation has ended. Moreover, after reusing its child process from our Apache — that is, the next HTTP request will reuse the Apache child process and reuse the same MySQL link.
Mysql_pconnect ()
In theory, that sounds pretty good. But from personal (and most people’s) experience, this feature creates more headaches. You only have a limited number of links, memory problems, file handles, etc.
Also, Apache runs in an extremely parallel environment, creating many, many more processes. This is why the permalink mechanism doesn’t work well. Before you decide to use permalink, you need to think about the architecture of your entire system.
Add: mysql enforces and disallows an index
Mysql > select * from user where id = ‘PRI’;
Such as:
select * from table force index(PRI) limit 2; (Enforces primary keys)
select * from table force index(ziduan1_index) limit 2; (Force index “ziduan1_index”)
select * from table force index(PRI,ziduan1_index) limit 2; (Forced use of indexes PRI and ziduan1_index)
Mysql > disallow index PRI (primary key PRI)
Such as:
select * from table ignore index(PRI) limit 2; (Do not use primary keys)
select * from table ignore index(ziduan1_index) limit 2; (Disallow index “ziduan1_index”)
select * from table ignore index(PRI,ziduan1_index) limit 2; (Disallow index “PRI,ziduan1_index”)