1. What makes a MySQL query slow? Most Internet applications are read-heavy: reads far outnumber writes, and while the business logic mostly lives in the write path, the requirement for reads is above all speed. So what are the common causes of a slow query?
1.1 Indexes When the data volume is not huge, most slow queries can be fixed with indexes; conversely, most slow queries are also caused by unreasonable indexes.
MySQL indexes are based on the B+ tree. MySQL indexes are based on the B+ tree. MySQL indexes are based on the B+ tree.
Speaking of the leftmost prefix: it is really the usage rule for composite (multi-column) indexes. A well-designed composite index can noticeably speed up queries. Why?
Because of index condition pushdown. If the query conditions are covered by a composite index, say (A, B), MySQL checks whether the rows matching A also satisfy B directly inside the index, reducing the number of table lookups.
At the same time, if every column the query needs happens to be contained in the composite index, it is a covering index and no table lookup is needed at all. These index rules are widely known and applied in real development. The more common question is: why is a query slow even though an index was built?
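To make index condition pushdown and table lookups concrete, here is a toy Python sketch (my own illustration, not MySQL internals): a composite index (A, B) modeled as a sorted list of (A, B, primary_key) entries, comparing how many table lookups each strategy needs.

```python
# Toy model of a composite index (A, B): a sorted list of
# (A, B, primary_key) entries -- a sketch, not MySQL internals.
index_ab = sorted([
    (10, 1, "pk1"), (10, 2, "pk2"), (10, 3, "pk3"),
    (20, 1, "pk4"), (20, 5, "pk5"),
])

def lookups_without_icp(a, b):
    # Without pushdown: match on A in the index, then go back to the
    # table for every candidate row to check B -- one lookup per A-match.
    candidates = [e for e in index_ab if e[0] == a]
    return len(candidates)

def lookups_with_icp(a, b):
    # Index condition pushdown: filter on B inside the index first,
    # so only rows passing both predicates trigger a table lookup.
    candidates = [e for e in index_ab if e[0] == a and e[1] == b]
    return len(candidates)

print(lookups_without_icp(10, 2))  # 3 table lookups
print(lookups_with_icp(10, 2))     # 1 table lookup
```

A covering query (one that only needs A, B, or the primary key) would need zero lookups, since everything is already in the index entries.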
1.1.1 What causes an index to fail? A query that is slow despite an index usually means the index is invalidated (not used), which you can confirm with EXPLAIN. Common causes of index failure:
!= or <> or OR in the WHERE clause, or an expression/function applied to the indexed column (on the left of the comparison)
A LIKE pattern beginning with %
A string value written without quotes (forcing implicit conversion)
An indexed column with too little selectivity, such as gender
Not matching the leftmost prefix
As to why these practices fail, the mature MySQL has its own ideas.
If MySQL had to give a reason, it would again be the same B+ tree.
Function operation
When an expression or function is applied to the left side of the comparison in WHERE — for example, column A is an indexed string and the condition is where length(A) = 6 — the engine would have to find A starting from the value 6, and you can imagine it getting lost at the very first level of the tree: the index is ordered by A, not by length(A).
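A small Python sketch of the same point (a toy model, not MySQL): an index is just keys in sorted order, so an exact match can binary-search, while a predicate on length(A) cannot exploit the ordering and must examine every key.

```python
import bisect

# Indexed string column A, stored in B+-tree (i.e. sorted) order.
keys = sorted(["alice", "bob", "carol", "daniel", "edward", "frank"])

# WHERE A = 'carol': the ordering lets us binary-search, O(log n) keys.
pos = bisect.bisect_left(keys, "carol")
assert keys[pos] == "carol"

# WHERE length(A) = 6: the tree is ordered by A, not by length(A),
# so every key must be examined -- effectively a full scan.
matches = [k for k in keys if len(k) == 6]
print(matches)  # ['daniel', 'edward']
```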
Implicit conversion
Implicit type conversion and implicit character-set conversion cause the same problem.
Implicit type conversion rarely happens when you go through a typed framework such as jOOQ.
Implicit character-set conversion can show up in join queries, when the joined columns have the same type but different character sets.
Destroying the order
A LIKE pattern that starts with % (for example, select * from t where name like '%xxx') throws away the index's sort order, so the index cannot be used.
As for columns with too little selectivity, such as gender, the index fails for a different reason, covered next.
1.1.3 Why not index a gender column? Why shouldn't low-selectivity columns be indexed? Blind guess: because it is inefficient — sometimes so inefficient it achieves nothing at all.
A non-clustered (secondary) index requires a table lookup for every hit. Suppose the table has 100 rows and you index the gender column: scanning 51 "male" index entries and then doing 51 table lookups is no better than just running a full table scan.
So the InnoDB engine will not use the index in this kind of scenario: roughly, once a value accounts for about 30% of the table, the optimizer prefers a full scan. Try it yourself if you are curious.
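The intuition behind that cutoff can be shown with an illustrative cost model (the constants here are made up for illustration; the real optimizer's cost model is more involved): a full scan reads rows sequentially, while a secondary-index plan pays a random-I/O table lookup per matching row.

```python
# Illustrative cost model, NOT the real MySQL optimizer:
# random table lookups are weighted 4x a sequential read.
SEQ_COST, RANDOM_COST = 1, 4

def best_plan(total_rows, matching_rows):
    full_scan = total_rows * SEQ_COST
    index_plan = matching_rows * (SEQ_COST + RANDOM_COST)
    return "index" if index_plan < full_scan else "full scan"

print(best_plan(100, 5))   # selective value -> "index"
print(best_plan(100, 51))  # gender-like 51% match -> "full scan"
```

With these weights the break-even point is at 20% selectivity; the exact threshold depends on the cost constants, which is why the 30% figure is only a rule of thumb.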
1.1.4 Simple rules for using indexes well Most slow queries come down to indexes, so how should you build and use them? A few simple rules:
Index condition pushdown: what if gender is unsuitable as an index on its own, but queries still filter on it? If it appears in multi-condition queries, include it in a composite index and let index condition pushdown do the filtering.
Covering index: also built on a composite index; when all the information the query needs is already contained in the index, no table lookup is required.
Prefix indexes: for strings, you can index only the first N characters to avoid unnecessary overhead. If you really need full keyword search, it may be better to hand it to something more suitable, such as ES.
Do not apply functions or expressions to indexed columns.
Weigh index maintenance costs for write-heavy, read-light tables and for frequently updated columns.
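For the prefix-index rule above, the practical question is how long the prefix should be. A common approach (sketched here in Python; in MySQL you would run the equivalent COUNT(DISTINCT LEFT(col, n)) queries) is to measure the selectivity of the first N characters and pick the shortest N that is close to full selectivity:

```python
# Hypothetical sample of an indexed string column.
emails = ["ada@x.com", "adam@x.com", "bob@y.com", "bobby@y.com", "carol@z.com"]

def prefix_selectivity(values, n):
    # Fraction of rows distinguishable by the first n characters.
    return len({v[:n] for v in values}) / len(values)

for n in (1, 3, 5):
    print(n, prefix_selectivity(emails, n))
# n=5 already reaches selectivity 1.0, so a 5-character
# prefix index would distinguish every sampled row.
```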
Sometimes an index looks right at first glance, but things don't go according to plan, as in "XXX is indexed — why is querying by it still slow?"
This is the moment to be confident: my code can't be buggy, it must be MySQL. And MySQL might indeed be at fault.
This usually happens when you have built a pile of indexes, and the optimizer, instead of using the one you expect, picks a less selective one, scanning far too many rows. There are basically two causes:
Wrong statistics: run ANALYZE TABLE x to recollect them. Optimizer misjudgment: use FORCE INDEX, or rewrite the statement to guide the optimizer, or add/drop indexes to steer it around. But in my humble experience, the more likely cause is that you created unnecessary indexes. Does anyone really think MySQL isn't that smart?
Besides the index-related causes above, there are also the following causes, which are rarer or harder to diagnose.
1.2 Waiting for an MDL lock MySQL takes metadata locks (MDL) automatically: a statement accessing a table takes an MDL read lock, and changing the table structure takes an MDL write lock. Read locks and write locks are mutually exclusive, as are write locks with each other.
When a statement is blocked this way, show processlist reports its state as Waiting for table metadata lock.
1.3 Waiting for table flush This happens when a flush tables command is itself blocked by another statement and in turn blocks your select. Running show processlist, you will find the state Waiting for table flush.
1.4 Waiting for a row lock Another transaction holds a write lock on the same row and has not committed.
1.5 Long undo chains InnoDB's default isolation level is repeatable read. Imagine a scenario: transaction A starts, then transaction B begins performing a large number of updates. B commits first; when A reads the row, its consistent (snapshot) read must apply undo logs one by one until it reconstructs the value as it was before transaction B — and a long undo chain makes that read slow.
1.6 Large tables In MySQL without secondary development, a table with hundreds of millions of rows must be considered large. At that scale, even with good indexes and queries, frequent aggregations can hit I/O or CPU bottlenecks, and even simple queries get slower.
In addition, each InnoDB B+-tree node (page) is 16 KB, and a three-level tree can theoretically hold roughly 20 million rows. The InnoDB buffer pool caches table and index pages; if the index data is large, the cache hit ratio suffers. The buffer pool also evicts pages with an LRU algorithm, so queries over old or cold data can crowd out hot data.
Hence the common optimizations for large tables: sharding (splitting databases and tables) and read/write separation.
1.6.1 Database and table sharding scheme
Split the database, or split the table? It depends on the specific case.
With a disk or network I/O bottleneck, split databases and split tables vertically. With a CPU bottleneck (low query efficiency), split horizontally. Horizontal sharding spreads the rows of one dataset across more database tables; vertical sharding splits databases by business domain and tables by column.
Tools include Sharding-Sphere, TDDL, and Mycat. Before starting, first estimate the number of databases and tables, define the sharding rules and choose the sharding key, then develop and migrate the data, and plan for capacity expansion.
The problem
In practice, writes are not the hard part; the main problems are unique ID generation, non-shard-key queries, and capacity expansion.
For unique IDs there are many options: database auto-increment, Snowflake, number segments, various GUID algorithms, and so on. Non-shard-key queries are usually solved with a mapping table plus a covering index, or by pairing with another datastore. Expansion difficulty depends on the sharding strategy: range sharding expands easily, while modulo sharding forces data migration. A combined range-plus-modulo scheme (modulo first, then range) can avoid migration to some degree. And of course there are also issues such as transaction consistency and cross-database joins.
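The "modulo first, then range" idea can be sketched as a tiny routing function. This is an assumed layout for illustration (4 database shards, tables growing in 1,000,000-id ranges), not a prescription:

```python
# Composite sharding sketch: modulo picks the database shard
# (even load spread); range picks the table within the shard,
# so new ids land in new tables and old rows never migrate.
GROUPS = 4           # assumed number of database shards
RANGE_SIZE = 1_000_000  # assumed rows per table range

def route(user_id):
    group = user_id % GROUPS       # modulo -> database shard
    table = user_id // RANGE_SIZE  # range  -> table within the shard
    return f"db_{group}.user_{table}"

print(route(42))          # db_2.user_0
print(route(5_000_001))   # db_1.user_5
```

Note that `route` must stay stable forever once data is written; changing GROUPS is exactly the migration problem that pure modulo sharding suffers from.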
1.6.2 Read/Write Separation Why separate reads and writes?
Splitting tables relieves the CPU bottleneck of large tables, and splitting databases relieves the I/O bottleneck; together they relieve storage pressure. But they do not relieve query load.
If the database QPS is still high and reads far exceed writes, consider read/write separation: on top of a master-slave setup, read load is shared across replicas, avoiding high single-machine load, while also ensuring high availability and achieving load balancing.
The problem
The main problems are stale reads and the routing mechanism.
A stale read is the master-slave replication-lag problem, which is a topic of its own. Routing decides whether a statement goes to the master or a replica: you can switch in code based on the statement type, or use middleware.
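The statement-type switch mentioned above can be sketched in a few lines. This is a minimal toy router (real middleware also has to handle transactions, hints, and replication lag):

```python
import itertools
import re

# Assumed replica pool; reads round-robin across it.
replicas = itertools.cycle(["replica-1", "replica-2"])

def route(sql):
    # Writes go to the master; everything else goes to a replica.
    verb = re.match(r"\s*(\w+)", sql).group(1).lower()
    if verb in ("insert", "update", "delete", "replace"):
        return "master"
    return next(replicas)

print(route("UPDATE user SET name = 'ada' WHERE id = 1"))  # master
print(route("SELECT * FROM user WHERE id = 1"))            # replica-1
```

A statement inside an open transaction would normally be pinned to the master regardless of type; that is one of the cases this sketch deliberately omits.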
1.7 Summary
The above lists common causes and solutions for slow query in MySQL and describes common methods to deal with large data scenarios.
Sharding and read/write separation are designed for big-data or high-concurrency scenarios, and also improve system stability and scalability. But not every problem is best solved this way.
2. ES Some query problems are better handed to Elasticsearch. So let's talk about ES.
ES is a near-real-time distributed search engine built on Lucene. Typical uses include full-text search, NoSQL JSON document storage, log monitoring, and data collection and analysis.
Outside data engineering, the common uses are full-text search and logs. ES is often paired with Logstash and Kibana, together known as the ELK stack. Let's look at how log search works.
Here is a search in our company's log system: open Kibana and, on the Discover page, type a quoted query like "XXX".
This action can be replaced in the Dev Tools console with:
GET yourIndex/_search
{
"from" : 0, "size" : 10,
"query" : {
"match_phrase" : {
"log" : "xxx"
}
}
}
What does this mean? The quotes in Discover and match_phrase in the console both indicate a phrase match: only documents containing all the search terms, in the same positions relative to each other, are kept.
In MySQL terms, ES maps as index -> type -> document, roughly database -> table -> row. Since 7.0, type has been deprecated, so for now treat the index as the table.
You can view basic cluster information in the Dev Tools console with the following commands, or use curl instead.
GET /_cat/health?v&pretty — cluster health status
GET /_cat/shards?v — shard status
GET yourindex/_mapping — index mapping structure
GET yourindex/_settings — index settings structure
GET /_cat/indices?v — all indexes on the current node
The important ones are mapping and settings. The mapping can be understood as the structure definition of a MySQL table, while settings control things such as the number of shards and replicas.
Here is a partial mapping structure of a log index. ES defaults strings to the text type and also defines a keyword subfield for them; the difference is that text is analyzed (tokenized) while keyword is not.
"******": {
"mappings": {
"doc": {
"properties": {
"appname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
2.3 Why are ES queries fast? And what does tokenization do? Look at how ES indexing works and you'll see.
ES is built on an inverted index. What does that mean? A traditional (forward) index maps document ID to content; an inverted index flips this, using the words or terms as the keys and the document IDs as the records.
Below is a schematic of the ES inverted index, consisting of the Term Index, Term Dictionary, and Posting List.
Ada and Sara in the picture are called terms — the words produced by tokenization. If you remove the Term Index from the diagram, doesn't it look a bit like MySQL? The Term Dictionary is like a secondary index, but MySQL keeps its index on disk, so retrieving a term costs several random disk accesses.
On top of the Term Dictionary, ES adds a Term Index: term prefixes stored in memory as an FST, which quickly locates a term's offset within the Term Dictionary. Moreover, both the FST and the block storage of the Term Dictionary save memory and disk space.
Now you can see why it's fast: the Term Index lives in memory, and it is an index of the terms — an index of the Term Dictionary itself.
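The whole chain — tokenize, sort terms into a dictionary, keep a small prefix index in memory — can be sketched in Python. This is a toy model (a dict of first letters standing in for the FST), not Lucene:

```python
import bisect
from collections import defaultdict

# Toy corpus and inverted index: term -> posting list of doc ids.
docs = {1: "ada loves search", 2: "sara loves logs", 3: "ada writes logs"}
postings = defaultdict(list)
for doc_id, text in docs.items():
    for term in text.split():            # stand-in for tokenization
        postings[term].append(doc_id)

# Term Dictionary: sorted terms, binary-searchable like on-disk blocks.
term_dict = sorted(postings)

# Term Index: first letter -> offset into the Term Dictionary,
# playing the role of the in-memory FST of term prefixes.
term_index = {}
for offset, term in enumerate(term_dict):
    term_index.setdefault(term[0], offset)

def search(term):
    start = term_index.get(term[0], 0)   # jump close via the term index
    pos = bisect.bisect_left(term_dict, term, lo=start)
    if pos < len(term_dict) and term_dict[pos] == term:
        return postings[term]            # posting list = matching docs
    return []

print(search("ada"))   # [1, 3]
print(search("logs"))  # [2, 3]
```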
That said, ES is not faster than MySQL for every query. Searches fall into two broad classes.
2.3.1 Tokenized search The ES index stores the tokenized terms in sorted order. Take "Ada": in MySQL, LIKE '%da%' scans the whole table, but ES can locate the matching term quickly.
2.3.2 Exact retrieval Here there is actually little difference: the Term Index's advantage disappears, and ES still has to locate the position in the Term Dictionary. MySQL may even be slightly faster when a covering index avoids the table lookup.
2.4 When to use ES So, as mentioned earlier, in which business query scenarios should you use ES? I see two.
2.4.1 Fuzzy string queries Keyword-based fuzzy queries over string columns are a disaster in MySQL but a piece of cake for ES. A concrete scenario: fuzzy search over message content in a message table, i.e. searching chat history.
Note, however: if you need general keyword search like a broad search engine, rather than the phrase matching used for logs, you need Chinese tokenization; the most widely used tokenizer is IK. IK installation is not detailed here.
What does that mean?
Tokenization
Querying the log with the quoted phrase "I'm such a smart cookie" returns only exact phrase matches.
Remove the quotes, and the match falls back to individual characters — "I", "can", "really"... — returning a flood of results that ignores Chinese word boundaries. The expected tokenization would be more like "I", "can", "really", "smart cookie", with matching done against those tokens.
This is because ES's default tokenization strategy handles Chinese poorly: it is built around English words, which have spaces between them. It is also one reason Chinese search in much foreign software is not great.
To diagnose this, you can test the current index's tokenization in the console with the command below.
POST yourindex/_analyze
{ "field": "yourfield", "text": "I am really a smart cookie" }
2.4.2 Combined queries When the data volume is large and the tables are wide, throwing every field into ES to index makes no sense, while with MySQL alone you are left with sharding and read/write separation. So why not combine them?
ES + MySQL: put the searchable fields into ES, query ES for the matching IDs, then fetch the full rows from MySQL by ID, which is fast.
ES + HBASE: if you want to skip sharding entirely, you may abandon MySQL for a distributed database such as HBASE. This kind of NoSQL store offers massive storage capacity, easy expansion, and fast rowkey-based reads.
The above idea is the classic index and data store isolation scheme.
Of course, the bigger the business, the more prone to accidents, will also face more problems. With ES as the index layer, data synchronization, timing, mapping design, high availability, and more need to be considered.
After all, compared to a pure logging system, logs can wait, but users cannot.
This section briefly introduces why ES is fast and where this fast can be used. Now you can open the Kibana console and give it a try.
To use ES from a Java project, Spring Boot support makes it essentially out of the box once the ES environment is up: add one dependency, and basic CRUD works fine.
3. HBASE HBASE came up above. So what is HBASE?
3.1 Storage structure Relational databases such as MySQL are row-oriented.
The following figure shows the HBASE table model structure.
The row key is the primary key, sorted lexicographically. The timestamp is the version number. info and area are column families, which group the table's columns. name and age belong to one column family and can be added dynamically. A cell is a concrete value.
3.2 OLTP and OLAP Data processing falls roughly into two classes: On-Line Transaction Processing (OLTP) and On-Line Analytical Processing (OLAP).
OLTP is the main application of traditional relational database, mainly for basic, daily transaction processing.
OLAP is the main application of data warehouse systems: it supports complex analysis, focuses on decision support, and produces intuitive, understandable query results. Column-oriented storage suits OLAP; row-oriented storage suits OLTP. HBASE, however, is not OLAP: it has no transactions, and it is really column-family-oriented rather than column-oriented. Few people use HBASE for OLAP.
3.3 RowKey HBASE table design hinges on rowkey design, because HBASE supports only three query methods:
1. Single-row get by rowkey 2. Range scan by rowkey 3. Full table scan
HBASE does not support complex queries.
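Why rowkey access is the fast path follows from the sorted storage. A toy Python model (a sorted list standing in for HBASE's lexicographically ordered rows; not the real storage engine) shows the first two query methods:

```python
import bisect

# Toy HBASE table: rows kept sorted by rowkey (lexicographic order).
rows = sorted([
    ("user#001", {"info:name": "ada"}),
    ("user#002", {"info:name": "bob"}),
    ("user#003", {"info:name": "carol"}),
    ("user#010", {"info:name": "dan"}),
])
keys = [k for k, _ in rows]

def get(rowkey):
    # 1. Single-row get: binary search on the sorted rowkeys.
    i = bisect.bisect_left(keys, rowkey)
    return rows[i][1] if i < len(keys) and keys[i] == rowkey else None

def scan(start, stop):
    # 2. Range scan over [start, stop): one seek, then sequential reads.
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return rows[lo:hi]

print(get("user#002"))                               # {'info:name': 'bob'}
print([k for k, _ in scan("user#002", "user#010")])  # ['user#002', 'user#003']
```

Anything else — a filter on a value, for instance — degenerates into method 3, the full table scan, which is why rowkey design decides everything.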
3.4 Application scenarios HBASE is not suited to ad-hoc real-time queries. It fits write-intensive scenarios: writes are fast, and reads are fine for single rows or small ranges, but of course only by rowkey. Its throughput and reliability are very high, and it has no single point of failure.
4. Summary In my opinion, software development is incremental: technology serves the project, and appropriateness matters more than novelty or complexity.
How do you make a query fast? The best thing to do is to look for the cause in your own system first, solve the current problem, and avoid creating new ones.
Most of the schemes listed here are only sketched as to their concrete implementation. In practice, both MySQL sharding and ES integration involve many details and hard problems; engineers should keep that in mind and keep practicing.