Recently, THERE is a requirement to modify the existing storage structure, which involves consideration of query conditions and query efficiency. I read several articles related to index and HBase, recalled relevant knowledge, and discussed my understanding and summary based on project requirements.
The general table of contents is as follows. The first three sections were introduced in the previous part, which analyzed why the index was fast, summarized its advantages and classification, as well as the evolution process of the index. The middle part will mainly introduce index analysis methods and common index optimization:
-
Why you need an index
-
Category of index
-
MySQL > alter database alter database
-
MySQL > select * from database;
-
HBase is introduced
-
HBase Storage Structure
-
HBase Index
-
Business requirements and design
Part of the content is excerpted from several bloggers’ posts, with links at the end thanking them for their excellent analysis.
Through the introduction of the middle part, you will know:
-
MySQL > select * from ‘MySQL’;
-
Advanced query related concepts
-
Explain command details
-
Index optimization recommendations
MySQL > select * from ‘MySQL’;
To better optimize the query, we must first understand its overall query process, from the client to send the query request, to receive the query result, MySQL server has done a lot of work.
Logical architecture
The overall logical architecture of MySQL is divided into three layers, namely the client layer, the core service layer and the storage engine layer.
The top layer is the client layer, such as connection processing, authorization, security and other functions.
The middle layer is the core services of MySQL, including query parsing, analysis, optimization, caching, built-in functions (e.g., timing, math, encryption, etc.), in addition, all cross-storage engine functions are implemented in this layer: stored procedures, triggers, views, etc.
The lowest layer is the storage engine, responsible for data storage and extraction, and the service layer in the middle communicates with the storage engine through apis that mask the differences between the different storage engines.
Specific Execution process
Take a look at how MySQL optimizes and executes queries. A lot of query optimization is just following a few principles to make the MySQL optimizer work the way it’s supposed to.
Let’s start with the overall process:
-
The client sends a query SQL to the server;
-
The server checks the query cache first and returns the results stored in the cache immediately if a hit is made.
-
SQL parsing and preprocessing are performed on the server, and then the optimizer generates the corresponding execution plan.
-
The query execution engine invokes the storage engine’s API to execute the query according to the execution plan generated by the optimizer.
-
Returns the result to the client;
1. Client/server communication protocol
The communication protocol between the MySQL client and server is “half duplex” : at any one time, either the server sends data to the client or the client sends data to the server, which means that there is no traffic control.
Client with a single packet will query requests sent to the server, the server response to the user data is usually a lot of, is composed of multiple data packets, it is important to note when the server response to a client request, the client must complete receiving the returned results, not just only take a few front as a result, then stop the server to send.
2. Query the cache
If the query cache is open, the query statement is checked to see if it matches the data in the query cache, and if so, the result in the cache is directly returned after the user permissions are checked.
The query cache system keeps track of every table involved in the query, and MySQL must invalidate all caches of the corresponding table during any write operation. If the query cache is very large or fragmented, this operation can be very costly to the system.
In addition, any query statement must be checked before it is started. Even if the SQL statement will never hit the cache, if the query result can be cached, it will be cached after execution, resulting in additional system consumption.
Therefore, be careful when enabling caching. Caching improves system performance only when it saves more resources than it consumes. You can set query_cache_type to DEMAND.
3. Syntax parsing and preprocessing
The SQL statement is parsed by keywords to generate a parse tree. Pre-processing further checks whether the parse tree is valid according to MySQL rules.
4. Query optimization
A query can be executed in many ways, and the optimizer’s job is to find the best execution plan. MySQL uses a cost-based optimizer, which tries to predict the cost of a query using a certain execution plan and selects the one with the least cost.
5. Query the execution engine
The storage engine interface provides a lot of functionality, but there are only a few dozen interfaces underneath that do most of the work of a query like building blocks.
6. Return the result to the client
The return of the result set to the client is an incremental and gradual process, so that the server does not have to store too many results and consume too much memory, and the client can get the results in the first place.
SELECT execution order
Let’s look at the order in which the SQL query is executed. Each step generates a virtual temporary table as input to the next step.
The standard SQL syntax is as follows:
SELECT DISTINCT < select_list >FROM < left_table > < join_type >JOIN < right_table > ON < join_condition >WHERE < where_condition >GROUP BY < group_by_list >HAVING < having_condition >ORDER BY < order_by_condition >LIMIT < limit_number >
Copy the code
But this is the order of execution:
FROM <left_table>ON <join_condition> <join_type> JOIN <right_table>WHERE <where_condition>GROUP BY <group_by_list>HAVING <having_condition>SELECT DISTINCT <select_list>ORDER BY <order_by_condition>LIMIT <limit_number>
Copy the code
1.FROM
When multiple tables are involved, the output of the left table is used as the input of the right table, and a virtual table VT1 is generated:
-
Calculate the Cartesian product (CROSS JOIN) of the two associated tables to generate the virtual table vt1-j1.
-
Based ON virtual table vt1-j1, the virtual table VT1-j2 is generated by filtering out all rows that meet ON predicate conditions.
-
If LEFT,RIGHT, and FULL join is used, the main table (reserved table) columns that do not meet the ON condition will be added to vt1-j2, resulting in virtual table vt1-j3.
2.WHERE
Temporary tables generated during VT1 are filtered, and columns that satisfy the WHERE clause are inserted into the VT2 table:
-
If there is an external join, ON filters the associated table. The main table returns all columns. If there is no external join, the effect is the same.
-
Filtering for the main table should be placed in WHERE;
-
For associated tables, ON is used for conditional query followed by join, and WHERE is used for conditional query followed by join.
3.GROUP BY
This clause groups the tables generated in VT2 BY the columns in GROUP BY to generate the VT3 table.
Columns used in subsequent processing statements, such as SELECT and HAVING, must be included in GROUP BY. For those that are not present, aggregate functions must be used.
4.HAVING
Different groups in VT3 table are filtered only for grouped data. Clauses that meet the HAVING condition are added to VT4 table.
5.SELECT
This clause processes the elements in the SELECT clause to generate the VT5 table:
-
Evaluate the expression in the SELECT clause to generate vt5-j1;
-
DISTINCT: A temporary memory table VT5-J2 is created, which is the same as the virtual table VT5-J1. The difference is that a unique index is added to the DISTINCT columns to divide the duplicate data.
6.ORDER BY
From the tables in VT5-j2, the VT6 table is generated BY sorting the results according to the conditions of the ORDER BY clause, which is the only place where aliases in SELECT can be used.
7.LIMIT
Selects the specified row starting at the specified location from the VT6 virtual table obtained in the previous step.
Advanced query related concepts
This section describes common advanced query concepts.
Join queries
A number of tables in accordance with a specified condition for data splicing, SQL will join query into four categories: internal join, external join, natural join, cross join, which natural join and cross join rarely used, but more introduction.
1. Inner join
Obtain each record from the left table and match it with all records in the right table. The matching record must meet the conditions in both the left and right tables. The matching record is retained; otherwise, the matching record is not retained.
2. Left /right join
There are two types of external connections:
-
Left join: left outer join (left join), where the left table is the primary table.
-
Right join: right outer join (right join), with the right table as the primary table;
Take a certain table as the main table, fetch all records inside, no matter whether the condition can match, the main table will be retained, and then connect with another table, if not match, other table fields are NULL.
The subquery
A query is performed on top of a query result, that is, a SELECT statement contains another SELECT statement.
According to the location of the sub-query, it can be divided into:
-
From subquery: the subquery follows From;
-
Where subquery: subquery Where condition;
-
Exists subquery: a subquery occurs in exists.
Here are a few examples:
Find all employees whose department name prefix is “xiaomi” :
SELECT name, sex, sal FROM emp WHERE no in (SELECT no FROM dept WHERE name LIKE '%');
Copy the code
View the salaries of all employees and sort them by salary:
SELECT name, sal FROM (SELECT name, sal FROM emp ORDER BY sal);
Copy the code
The joint query
The query will be repeated, the result will be concatenated, the number of fields will not increase, each SELECT statement must obtain the same number of fields.
The syntax is as follows:
Select statement 1Union [union option]Select statement 2...
Copy the code
The Union options:
-
All: keep All;
-
Distinct: deduplication, default option.
Write much again, add one more, medium to be continued…
Reference article:
-
MySQL Optimization Principle
-
Step by step: SQL parsing order
END
If you feel that you have gained something after reading it, please click “like”, “follow” and add the official account “Ingenuity Zero” to check out more wonderful history!!