To make PHP more efficient, programmers not only need to write logical, efficient code, but also need to be able to optimize their query statements. Although we cannot control how fast the database itself reads and writes, extensions for data-storage servers such as Memcache, MongoDB, and Redis can help PHP achieve faster access, so it is worth understanding these extensions.
Optimization for large storage
Database master/slave replication and read/write separation
1. The master records changes in its binary log; the slave copies the master's binary log into its own relay log and replays it to reproduce the master's data. Master/slave replication can be used for database load balancing, database backup, and read/write separation.

2. On the master, enable binary logging and give the server a unique ID in my.cnf:

```ini
[mysqld]
log-bin=mysql-bin  # enable the binary log
server-id=102      # unique ID of this server
```

3. On the slave, do the same with a different ID:

```ini
[mysqld]
log-bin=mysql-bin  # enable the binary log
server-id=226      # unique ID of this server, must differ from the master's
```

4. On the master, grant a replication account:

```sql
GRANT REPLICATION SLAVE ON *.* TO 'slavename'@'IP' IDENTIFIED BY 'root';
```

5. On the slave, point it at the master:

```sql
CHANGE MASTER TO MASTER_HOST='masterip', MASTER_USER='masteruser', MASTER_PASSWORD='masterpasswd';
```

6. Then run the START SLAVE command to start replication.

Do not forget to restart the server after each configuration change; you can then run SHOW MASTER STATUS on the master and SHOW SLAVE STATUS on the slave to check the replication state. Read/write separation depends on MySQL middleware such as mysql-proxy or Atlas: by configuring such middleware, reads are served by the slaves, reducing the burden on the master.
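Read/write splitting can also be done at the application layer. Below is a minimal PHP sketch of the routing idea; the DSNs, credentials, and class name are hypothetical, and production setups normally rely on the middleware mentioned above instead.

```php
<?php
// Minimal application-level read/write splitting (hypothetical DSNs and
// credentials): SELECTs go to a random slave, everything else to the master.
class SplitPdo
{
    private PDO $master;
    private array $slaves = [];

    public function __construct(string $masterDsn, array $slaveDsns, string $user, string $pass)
    {
        $this->master = new PDO($masterDsn, $user, $pass);
        foreach ($slaveDsns as $dsn) {
            $this->slaves[] = new PDO($dsn, $user, $pass);
        }
    }

    public function query(string $sql): PDOStatement
    {
        $isRead = stripos(ltrim($sql), 'SELECT') === 0;
        $pdo = ($isRead && $this->slaves)
            ? $this->slaves[array_rand($this->slaves)]
            : $this->master;
        return $pdo->query($sql);
    }
}

$db = new SplitPdo(
    'mysql:host=master-ip;dbname=app',
    ['mysql:host=slave1-ip;dbname=app', 'mysql:host=slave2-ip;dbname=app'],
    'user',
    'password'
);
$rows = $db->query('SELECT id, name FROM users LIMIT 10')->fetchAll();
```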
Database sharding

When the amount of data in a table becomes very large, the pressure on indexes, caches, and everything else grows accordingly. Sharding the database, so that the data is stored across multiple database servers or multiple tables, reduces the query pressure. There are three approaches: vertical sharding, horizontal sharding, and joint sharding.
Vertical sharding: when there are many tables, the tables that are closely related (for example, belonging to the same module and often joined together) are grouped and placed on different master/slave servers.
Horizontal sharding: when there are not many tables but a table holds a very large amount of data, a hash algorithm can be used to split the table into several tables placed on different servers, speeding up queries (see the sketch after this list). Horizontal sharding differs from table partitioning in the storage medium used.
Joint sharding: more commonly, both the number of tables and the amount of data in each table are very large, so the two approaches are combined and the database is divided into a distributed matrix for storage. Each of these optimizations is extensive and profound enough to deserve its own article; understand and remember these approaches so that when the need arises you can choose among them purposefully and achieve an efficient database.
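As a concrete illustration of horizontal sharding, here is a small PHP sketch that routes rows to one of several tables by hashing the shard key. The table name, shard count, and column are hypothetical assumptions.

```php
<?php
// Route a user's rows to one of SHARD_COUNT tables by hashing the shard key.
// The same function must be used for every read and write of that user.
const SHARD_COUNT = 4;

function shardTable(int $userId): string
{
    $shard = crc32((string) $userId) % SHARD_COUNT; // stable hash -> shard number
    return "user_messages_{$shard}";                // user_messages_0 .. user_messages_3
}

$table = shardTable(10086);
$sql   = "SELECT * FROM {$table} WHERE user_id = ?";
```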
Index optimization
In MySQL, indexes are a storage-engine-level concept, and different storage engines implement them differently. The following discusses the index implementations of the MyISAM and InnoDB storage engines.
MyISAM index implementation
The MyISAM engine uses a B+Tree as its index structure, and the data field of each leaf node stores the address of a data record. Here is a schematic of a MyISAM index:
Figure 1 shows the primary index of a MyISAM table, assuming we use Col1 as the primary key. You can see that MyISAM's index file only holds the addresses of the data records. In MyISAM, there is no structural difference between the primary index and secondary indexes, except that the primary index requires a unique key while keys in a secondary index may repeat. If we create a secondary index on Col2, its structure looks like this:
It is also a B+Tree, with the data field holding the address of the data record. The index retrieval algorithm in MyISAM is therefore to search the index with the B+Tree search algorithm; if the specified key exists, the value of its data field is extracted, and the corresponding data record is then read using that value as the address.
MyISAM’s index is also called “non-clustered” to distinguish it from InnoDB’s clustered index.
InnoDB index implementation
InnoDB also uses B+Tree as an index structure, but the implementation is quite different from MyISAM.
The first big difference is that InnoDB’s data files are themselves index files. MyISAM index files and data files are separate, and the index file only holds the address of the data record. In InnoDB, the table data file itself is an index structure organized by B+Tree, and the data field of the leaf node of this Tree holds complete data records. The key of this index is the primary key of the table, so the InnoDB table data file itself is the primary index.
Figure 3 is a diagram of the InnoDB main index, which is also the data file. You can see that the leaf nodes contain the complete data records. This kind of index is called a clustered index. Because InnoDB's data files are themselves clustered by primary key, InnoDB requires every table to have a primary key (MyISAM does not require one). If none is explicitly specified, MySQL automatically selects a column that can uniquely identify each record as the primary key; if no such column exists, MySQL generates an implicit field for the InnoDB table to serve as the primary key. This field is 6 bytes long and of long integer type.
The second difference from the MyISAM index is that an InnoDB secondary index stores in its data field the primary-key value of the corresponding record rather than an address. In other words, all of InnoDB's secondary indexes reference the primary key as their data field. For example, Figure 4 shows a secondary index defined on Col3:
Here the ASCII codes of the English characters serve as the comparison criterion. The clustered index implementation makes searching by primary key very efficient, but a secondary-index search requires retrieving the index twice: first the secondary index is searched to obtain the primary key, and then the primary key is used to retrieve the record from the primary index.
Understanding how different storage engines implement indexes is very helpful for using and optimizing indexes properly. Knowing InnoDB's index implementation, for example, makes it easy to understand why overly long fields are not recommended as primary keys: all secondary indexes reference the primary index, so a long primary key makes every secondary index too large.
Likewise, using a non-monotonic field as the primary key is a bad idea in InnoDB, because the InnoDB data file itself is a B+Tree: a non-monotonic primary key forces the data file to split and adjust constantly to maintain the B+Tree properties as new records are inserted, which is very inefficient. Using an auto-increment field as the primary key is a good choice, as the sketch below illustrates.
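Here is a hedged sketch of a table definition following this advice; the table and columns are hypothetical. The auto-increment id keeps inserts appending at the right edge of InnoDB's clustered B+Tree, while the random order number is indexed separately rather than used as the primary key.

```php
<?php
// Hypothetical schema illustrating the primary-key advice above.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
$pdo->exec(
    'CREATE TABLE IF NOT EXISTS orders (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- monotonic: insert-friendly clustered key
        order_no CHAR(32) NOT NULL,               -- random value: index it, do not cluster on it
        created_at DATETIME NOT NULL,
        PRIMARY KEY (id),
        UNIQUE KEY uk_order_no (order_no)
    ) ENGINE=InnoDB'
);
```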
Data query optimization
Behind every time-consuming query you will find unnecessary extra operations, extra operations repeated many times, operations executed too slowly, and so on. The purpose of query optimization is to reduce and eliminate the time these operations consume.
First, optimize data access
The most basic reason for poor query performance is accessing too much data. Inefficient queries are therefore generally analyzed in two steps:
First, check whether the application is retrieving more data than it needs; this usually means too many rows are accessed, but sometimes it means too many columns. Second, check whether the MySQL server layer is analyzing far more rows than necessary.
1.1 Whether unnecessary data is requested from the database
When accessing the database, only required rows and columns should be requested. Requesting extra rows and columns can consume CPU and memory resources of the MySQL server and increase network overhead.
1. When paging, use LIMIT so MySQL returns only the rows needed, rather than returning all rows and letting the application filter out the unneeded ones (see the sketch after this list).
2. Avoid using SELECT * mindlessly, whether joining multiple tables or reading from a single table.
3. When some data is used many times, consider caching it to avoid querying MySQL on every use, as the sketch below shows.
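The sketch below combines points 1 and 3: LIMIT the query to a single page and cache the page in Redis so repeated requests skip MySQL entirely. The table, key name, and 60-second TTL are assumptions for illustration.

```php
<?php
// Cache-aside pagination sketch (hypothetical table and key names).
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'password');

function getArticlesPage(PDO $pdo, Redis $redis, int $page, int $perPage = 20): array
{
    $key = "articles:page:{$page}";
    $cached = $redis->get($key);
    if ($cached !== false) {
        return json_decode($cached, true);       // cache hit: no MySQL round trip
    }
    $stmt = $pdo->prepare('SELECT id, title FROM articles ORDER BY id DESC LIMIT ? OFFSET ?');
    $stmt->bindValue(1, $perPage, PDO::PARAM_INT);
    $stmt->bindValue(2, ($page - 1) * $perPage, PDO::PARAM_INT);
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $redis->setex($key, 60, json_encode($rows)); // cache the page for 60 seconds
    return $rows;
}

$firstPage = getArticlesPage($pdo, $redis, 1);
```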
1.2 Whether MySQL is scanning extra records: let MySQL query the data in the most appropriate way
With MySQL, the simplest way to measure query overhead is by three metrics: response time, number of rows scanned, and number of rows returned. The main consideration here is to improve the scanning method, that is, the way the data is queried.
Data can be queried via a full table scan, an index scan, a range scan, a unique index lookup, a constant reference, and so on. These methods run from slow to fast, and the number of rows scanned goes from large to small. The type column in EXPLAIN output reflects which method a query uses, for example:
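A quick way to inspect the access type from PHP; the table and column here are hypothetical.

```php
<?php
// Inspect the access type of a query (hypothetical table and column).
// Common type values from slow to fast: ALL (full table scan), index,
// range, ref, eq_ref, const.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'password');
$plan = $pdo->query("EXPLAIN SELECT * FROM users WHERE email = 'a@example.com'")
            ->fetch(PDO::FETCH_ASSOC);
echo $plan['type']; // "ALL" without an index on email, "ref" once one exists
```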
You can usually improve the way you query data by adding appropriate indexes to minimize the number of scanned rows and speed up the query.
For example, if you find that a query scans a large number of rows but returns only a few, you can consider a covering index, which places all the needed columns in the index. This allows the storage engine to return the result without going back to the table to retrieve the corresponding rows.
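A hedged covering-index sketch, assuming a hypothetical orders table: once every selected column lives in the index, EXPLAIN reports "Using index" in its Extra column and no row lookup occurs.

```php
<?php
// Covering-index sketch (hypothetical schema).
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'password');
$pdo->exec('ALTER TABLE orders ADD INDEX idx_user_status (user_id, status)');

// Both selected columns live in idx_user_status, so the storage engine can
// answer from the index alone without going back to the table:
$stmt = $pdo->prepare('SELECT user_id, status FROM orders WHERE user_id = ?');
$stmt->execute([42]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```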
Second, methods of restructuring queries
When designing queries, consider whether a complex query needs to be split into several simpler ones. I have heard a rule of thumb many times: do not put in the application what can be done in the database; databases are much more powerful than we think. An Oracle guru gave me this rule back when I was writing SQL on Oracle at China Asset Management. I later carried it over to MySQL and suffered for it.
Of course, that is because Oracle and MySQL do not have the same processing logic, and nowadays the costs of network communication, query parsing, and optimization are not as high as they used to be. Again, a rule of thumb only holds within a certain context.
Decomposing complex queries:
You can split a large query into several smaller queries, each of which performs only a small portion of the overall query task and returns only a small portion of the results at a time.
Deleting old data is a good example.
If a large delete is performed in a single statement, it may lock a lot of data at once, fill up the transaction log, exhaust system resources, and block many small but important queries. Splitting a large delete into several smaller deletes spreads the pressure on the server across multiple operations, minimizes the impact on MySQL performance, reduces lock wait time during the delete, and reduces the delay of MySQL master/slave replication. I use this method all the time.
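A minimal sketch of such a chunked delete, assuming a hypothetical messages table; the batch size of 10,000 rows and the 0.2-second pause are tunable assumptions.

```php
<?php
// Delete old rows in small batches instead of one huge statement, pausing
// between batches so locks stay short and replication can keep up.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'password');
do {
    $deleted = $pdo->exec(
        "DELETE FROM messages WHERE created_at < '2015-01-01' LIMIT 10000"
    );
    usleep(200000); // 0.2 s pause between batches
} while ($deleted > 0);
```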
Another example is decomposing a join query: run a single-table query for each table involved, then join the results in the application. That is what a long-time Alibaba colleague did when I wrote code with him at a previous company. At the time I silently looked down on it, because I had that rule of thumb in mind (what can be done in the database should not be put in the application; the database is much more powerful than we think), and I never used two SQL statements to do what one could.
Of course, another reason to do this is to keep the data-retrieval logic as separate as possible from the business code, making it easier to switch databases later. Is that actually how it works out? Not necessarily. That ignorance made me suffer, and I later had to break up most of the SQL I had written because of performance problems.
Restructuring queries by decomposing joins has the following advantages:
1. Caching becomes more efficient. Many applications can easily cache the result object of a single-table query.
2. Breaking down the query and executing the pieces one at a time reduces lock contention.
3. Joining at the application layer makes it easier to split the database and achieve high performance and scalability.
4. The queries themselves may also become more efficient, and redundant record reads can be reduced. Joining at the application layer means the application reads each record only once, whereas a join inside the database may access some of the data repeatedly; from this point of view, such refactoring can also reduce network and memory consumption.
5. Further, this amounts to implementing a hash join in the application instead of using MySQL's nested-loop join, and in some scenarios a hash join is far more efficient. A sketch follows.
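Here is a hedged PHP sketch of such a decomposed join, assuming hypothetical posts and users tables: fetch the posts, collect the distinct user IDs, fetch those users in one query, and join them in a PHP hash map.

```php
<?php
// Decompose a join: two single-table queries plus a hash join in PHP,
// instead of SELECT ... FROM posts JOIN users ON posts.user_id = users.id.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'password');

$posts = $pdo->query('SELECT id, user_id, title FROM posts LIMIT 100')
             ->fetchAll(PDO::FETCH_ASSOC);

$users = [];
$userIds = array_values(array_unique(array_column($posts, 'user_id')));
if ($userIds) {
    $in = implode(',', array_fill(0, count($userIds), '?'));
    $stmt = $pdo->prepare("SELECT id, name FROM users WHERE id IN ({$in})");
    $stmt->execute($userIds);
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $u) {
        $users[$u['id']] = $u;   // hash map: id => user row, each user read once
    }
}
foreach ($posts as &$post) {     // the "hash join" at the application layer
    $post['author'] = $users[$post['user_id']] ?? null;
}
unset($post);
```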
Optimization of database design
1. The database design should conform to the third normal form; for query convenience, a certain amount of data redundancy is acceptable.

2. Prefer small, simple data types: int > date/time > enum, char > varchar > blob. When selecting a type, consider substitutions; for example, an IP address can be converted to an unsigned int with the ip2long() function (see the sketch after this list).

3. For char(n) columns, keep the value of n as small as the data allows.

4. Partitioning. MySQL supports the RANGE, LIST, HASH, and KEY partition types, of which RANGE is the most commonly used. The partitioning syntax looks like this:

```sql
CREATE TABLE tablename (
    -- column definitions
) ENGINE=InnoDB CHARSET=utf8                     -- choose the storage engine (InnoDB/MyISAM) and charset
PARTITION BY RANGE (column) (                    -- partition by RANGE (or by a predefined LIST)
    PARTITION partName VALUES LESS THAN (n)      -- name the partition and set its bound (or IN (...) for LIST)
);
```

5. Differences between MyISAM and InnoDB:

Storage structure: MyISAM stores each table as three files on disk; InnoDB keeps all tables in the same data file, which is generally 2 GB.

Transaction support: MyISAM provides no transaction support; InnoDB does.

Locking: MyISAM supports only table-level locks; InnoDB supports transactions and row-level locking.

Full-text indexing: MyISAM supports FULLTEXT full-text indexes (not for Chinese, so the Sphinx full-text engine is used instead); InnoDB did not support them before MySQL 5.6.

Row count: MyISAM stores the table's total row count, so COUNT(*) is fast; InnoDB does not store it and must recount.

Foreign keys: MyISAM does not support them; InnoDB does.
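As a small illustration of point 2, here is how the ip2long() conversion works in PHP; the sprintf('%u', ...) step guards against negative values on 32-bit systems.

```php
<?php
// Store an IPv4 address as INT UNSIGNED (4 bytes) instead of CHAR(15).
$ip     = '192.168.1.100';
$asInt  = ip2long($ip);            // 3232235876
$stored = sprintf('%u', $asInt);   // '%u' avoids negatives on 32-bit PHP
$back   = long2ip((int) $stored);  // '192.168.1.100'
```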
A few MySQL tips
1. Write SQL keywords in uppercase. First, it makes keywords easy to distinguish from the objects they operate on; second, MySQL converts statements to uppercase when executing them, so writing uppercase by hand can increase query efficiency (although very slightly).
2. You can reset a table's auto-increment counter with ALTER TABLE tablename AUTO_INCREMENT=N.
3. Adding the ZEROFILL attribute to an int column pads the displayed value with leading zeros up to the column's display width.
4. Update (rebuild) indexes when needed; after large-scale data changes, rebuilding the relevant indexes keeps queries efficient.
5. When writing SQL statements, you can create a file with the .sql suffix in your IDE; the IDE will recognize the SQL syntax, which makes writing easier. More importantly, if your database is ever lost you will still have this file, and you can run mysql -uusername -ppassword databasename < filename.sql in the file's directory to execute the whole file's SQL (note that there is no space between -u or -p and the username and password).
I hope the above content helps you. Many PHPers run into problems and bottlenecks as they advance, and writing too much business code can leave you without a sense of direction. I have compiled some material covering, among other things: distributed architecture, high scalability, high performance, high concurrency, server performance tuning, TP6, Laravel, Redis, Swoft, Kafka, MySQL optimization, shell scripting, Docker, microservices, Nginx, and more, all of which I am happy to share with you for free.