
Who doesn't want their queries to come back fast?

1. What is it like when a MySQL query is slow?

Most Internet applications read far more than they write, while the business logic concentrates on the write path. The main requirement for reads is speed. So how does a perfectly good query end up slow?

1.1 Indexes

When the data volume is not huge, most slow queries can be fixed with indexes — and most slow queries are caused by unreasonable indexes in the first place.

MySQL indexes are based on the B+ tree. MySQL indexes are based on the B+ tree. MySQL indexes are based on the B+ tree.

The left-most prefix rule is really about how composite indexes are used. A well-chosen composite index can markedly speed up queries. Why?

Because of index condition pushdown. If the query conditions are contained in a composite index — say (A, B) — MySQL checks, inside the index itself, whether records satisfying A also satisfy B, which reduces the number of table lookups. And if the columns being selected all happen to be contained in the composite index, it is a covering index: no table lookup is needed at all. These index rules are generally known and get applied in real development. The more common question is the opposite: why is a query slow even though an index was built?
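Before moving on, a minimal sketch of pushdown and covering indexes, under assumed names — idx_name_age stands in for the composite index (A, B) above:

```sql
-- Hypothetical table for illustration only.
CREATE TABLE user_info (
  id   INT PRIMARY KEY,
  name VARCHAR(32),
  age  INT,
  city VARCHAR(32),
  INDEX idx_name_age (name, age)
);

-- Index condition pushdown: the age filter is checked inside the index,
-- so fewer rows need a table lookup.
SELECT * FROM user_info WHERE name LIKE 'Li%' AND age = 10;

-- Covering index: every selected column is in the index, so no table lookup at all.
SELECT name, age FROM user_info WHERE name = 'Li mou';
```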

1.1.1 What Causes Index Failure

If a query is slow even though an index exists, the index is most likely not being used ("index failure"), which you can confirm with EXPLAIN. Common causes of index failure:

  1. Using !=, <>, OR, or an expression or function on the left side of the WHERE condition

  2. LIKE patterns that start with %

  3. Comparing a string column without quotes (implicit type conversion)

  4. Indexing a column with very low selectivity, such as gender

  5. Failing to match the left-most prefix of a composite index

MySQL has its own reasons for each of these.

1.1.2 Why these causes break the index

If you ask MySQL for a reason, the answer is still the same B+ tree.

Function operation

When you apply an expression or function to the left side of a WHERE condition — say column A is an indexed string and you write WHERE length(A) = 6 — the B+ tree is ordered by A's values, not by length(A). Trying to navigate from the value 6 down into the tree, you can imagine getting lost at the very first level.
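A small sketch of the same point, with invented names — only the second form keeps the tree navigable:

```sql
SELECT * FROM user_info WHERE length(name) = 6;  -- function on the column: index unusable, full scan
SELECT * FROM user_info WHERE name = 'Li mou';   -- plain comparison: walks the B+ tree normally
```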

Implicit conversion

Implicit type conversions and implicit character-set conversions can both cause this problem (a sketch of the type-conversion case follows the bullets).

  • Implicit type conversions rarely arise when you use a framework such as jOOQ.

  • Implicit character-set conversions can occur in join queries, when the join columns have the same type but different character sets.
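A sketch of implicit type conversion, assuming phone is an indexed VARCHAR column:

```sql
SELECT * FROM user_info WHERE phone = 13800000000;    -- number literal: MySQL casts the column, index unused
SELECT * FROM user_info WHERE phone = '13800000000';  -- string literal: index used
```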

Breaking the ordering

Likewise, a LIKE pattern that starts with % breaks the ordering the B+ tree relies on: the index is sorted from the leading characters, so with no known prefix MySQL cannot navigate the tree and falls back to scanning.

Fields with very low selectivity, such as gender, are another matter — their index is ineffective for a different reason, covered next.

1.1.3 Why not index the gender field

Why not index low-selectivity columns? A blind guess: because it is inefficient. And it is indeed inefficient — sometimes it achieves nothing at all.

A secondary (non-clustered) index requires table lookups. Suppose a table has 100 rows and you index the gender column: scanning for "male" hits, say, 51 index entries, and each one then needs a table lookup — another 51 reads. You might as well run a full table scan.

That is why InnoDB will skip the index in such scenarios: roughly, once one value accounts for about 30% of the table, the optimizer prefers a full scan. Try it yourself if you are interested.

1.1.4 Are there simple, effective ways to index?

Since most slow queries trace back to indexes, how do you create and use indexes well? A few simple rules, with a sketch following the list:

  • Index condition pushdown: what if gender alone is unsuitable for an index but still shows up in queries? For multi-condition queries, fold it into a composite index and let pushdown filter inside the index.

  • Covering index: also a composite index — when the index already contains every column the query needs, no table lookup happens at all.

  • Prefix index: for strings, index only the first N characters to avoid unnecessary overhead. If you genuinely need keyword search, handing it to something more suitable, such as ES, is probably better.

  • Never apply operations or functions to indexed columns in predicates.

  • Weigh index maintenance costs for write-heavy, read-light tables and for frequently updated columns.
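The sketch promised above, with all names invented:

```sql
-- Composite index: gender alone is a poor index, but pushdown makes it useful here.
ALTER TABLE user_info ADD INDEX idx_gender_city (gender, city);

-- Prefix index: only the first 8 characters of the string are indexed.
ALTER TABLE user_info ADD INDEX idx_email_prefix (email(8));

-- Covered by idx_gender_city: both selected columns live in the index, no table lookup.
SELECT gender, city FROM user_info WHERE gender = 1 AND city = 'Beijing';
```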

1.1.5 What if MySQL picks the wrong index?

Sometimes the indexes are set up and look right at first glance, yet things do not go to plan — as in, "column XXX has an index; why is querying by it still slow?"

This is the moment to be confident: my code can't be buggy — it must be MySQL. And MySQL might indeed be the problem here.

This usually happens when a table has many indexes and the query has many conditions: instead of the index you expect, the optimizer picks a less selective one and scans far too many rows. There are basically two causes (one-line fixes are sketched after the list):

  • Inaccurate statistics: run ANALYZE TABLE x to regather them.

  • Optimizer misjudgment: use FORCE INDEX, rewrite the statement to guide the optimizer, or add/drop indexes to work around it.
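Hypothetical one-liners for each fix (index and table names invented):

```sql
ANALYZE TABLE user_info;                         -- regather the statistics the optimizer relies on

SELECT * FROM user_info FORCE INDEX (idx_city)   -- pin the index when the optimizer misjudges
WHERE city = 'Beijing' AND age > 20;
```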

In my humble experience, though, the more likely cause is that you created unnecessary indexes. Does anyone really think MySQL is less smart than they are?

Beyond indexes, the following causes are less common or harder to pin down.

1.2 Waiting for an MDL lock

MySQL 5.5 introduced the metadata lock (MDL). CRUD operations on a table automatically take an MDL read lock; changing the table structure takes an MDL write lock. Read and write locks are mutually exclusive, as are write locks with each other.

If a DDL statement is blocked while holding — or waiting for — the MDL write lock, every subsequent query on that table queues up behind it. In SHOW PROCESSLIST, those sessions show the state Waiting for table metadata lock.
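A minimal two-session sketch of how this plays out, with an invented table name:

```sql
-- session 1: a long transaction holds the MDL read lock on t until commit
BEGIN;
SELECT * FROM t LIMIT 1;

-- session 2: this DDL now waits for the MDL write lock
ALTER TABLE t ADD COLUMN c INT;

-- session 3: even a plain read queues behind the DDL,
-- showing "Waiting for table metadata lock" in SHOW PROCESSLIST
SELECT * FROM t;
```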

1.3 Waiting for flush

FLUSH itself is fast; usually the FLUSH command was itself blocked by another statement, and in turn blocked the SELECT. SHOW PROCESSLIST will show the state Waiting for table flush.

1.4 Waiting for row locks

Some other transaction holds a write lock on the row and has not committed.
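A minimal sketch, again with invented names — session 2 hangs until session 1 finishes:

```sql
-- session 1: takes a write (exclusive) lock on the row and does not commit
BEGIN;
UPDATE account SET balance = balance - 100 WHERE id = 1;

-- session 2: blocks on the row lock until session 1 commits or rolls back
SELECT * FROM account WHERE id = 1 FOR UPDATE;
```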

1.5 Current reads

InnoDB's default isolation level is repeatable read. Imagine this scenario: transaction A begins, then transaction B performs a large number of updates on the same rows and commits first. A's consistent (snapshot) read must then apply the undo log version by version until it reaches the value as it was before transaction B — and that traversal takes time.

1.6 Large table scenario

In stock MySQL (no custom modifications), a table with hundreds of millions of rows definitely counts as large. Even with well-designed indexes and queries, frequent aggregations can hit IO or CPU bottlenecks, and even simple queries slow down.

Besides, each InnoDB B+ tree node is a 16 KB page, so a three-level tree can in theory hold roughly 20 million rows. We also know the innodb_buffer_pool caches table and index pages; if the index data is huge, the cache hit rate suffers. And since the buffer pool evicts pages with an LRU algorithm, queries over old, cold data can push the hot data out.
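A rough back-of-the-envelope for that "about 20 million rows at three levels" figure, assuming a bigint primary key, a 6-byte page pointer, and roughly 1 KB per row:

```
non-leaf fan-out      ≈ 16 KB / (8 B key + 6 B pointer) ≈ 1170
rows per leaf page    ≈ 16 KB / 1 KB per row            = 16
three-level capacity  ≈ 1170 × 1170 × 16                ≈ 21.9 million rows
```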

So the common optimizations for large tables are splitting databases and tables (sharding) and read/write separation.

1.6.1 Splitting databases and tables

Plan

Split the database or split the table? That calls for case-by-case analysis.

  • If the bottleneck is disk or network IO, split databases and split tables vertically.

  • If the bottleneck is CPU, queries are the inefficient part: split tables horizontally.

Horizontal splitting shards the data, spreading the original rows across more databases and tables.

Vertical splitting divides databases by business domain and tables by column.

Tools include ShardingSphere, TDDL, and Mycat. To get started, first estimate the number of databases and tables, design the sharding rules and choose the sharding key, then develop, migrate the data, and plan for future expansion.

Problems

In practice, writes are not the hard part. The main problems are unique ID generation, queries that miss the sharding key, and capacity expansion (a small routing sketch follows this list).

  • There are many ways to generate unique IDs: database auto-increment, Snowflake, number segments, and a host of GUID-style algorithms.

  • Queries without the sharding key are usually solved with a mapping table plus a covering index, or by pairing with another datastore.

  • Expansion depends on the sharding strategy: range sharding expands easily, while modulo (hash) sharding forces data migration. A hybrid — range plus modulo, or modulo first and range second — can avoid migration to a degree.
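The routing sketch referenced above — a minimal, hypothetical modulo scheme in Java, 4 databases × 8 tables for 32 slots; every name is invented:

```java
public class ShardRouter {
    // Map a sharding key to "database.table" using a fixed 32-slot modulo scheme.
    static String route(long userId) {
        int slot = (int) (userId % 32); // stable slot in [0, 31]
        int db = slot / 8;              // database index: 0..3
        int table = slot % 8;           // table index within that database: 0..7
        return "db_" + db + ".user_" + table;
    }

    public static void main(String[] args) {
        System.out.println(route(123456789L)); // prints db_2.user_5
    }
}
```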

Of course, there are also issues such as transactional consistency and cross-database joins.

1.6.2 Read/write Separation

Why read-write separation

Splitting tables addresses the CPU bottleneck and splitting databases the IO bottleneck, and together they relieve storage pressure. What they do not relieve is read pressure.

If the database QPS is still high and reads far outnumber writes, consider read/write separation: built on master-slave replication, it spreads the read load, avoids overloading a single machine, and buys high availability along the way.

Problems

The main problems are stale reads and the routing mechanism.

  • Stale reads: this is the master-slave replication lag problem.

  • Routing: deciding whether a statement goes to the master or a replica. You can switch in code based on the statement type, or use middleware.

1.7 Summary

The above covered the common causes of and fixes for slow MySQL queries, plus the usual ways of handling large-data scenarios.

Sharding and read/write separation are aimed at big-data and high-concurrency scenarios, and at improving stability and scalability. But not every problem is best solved that way.

2. How to evaluate ElasticSearch

As mentioned earlier, you can use ES for keyword queries. So let’s talk about ES.

2.1 What it can do

ES is a near-real-time distributed search engine built on Lucene. Typical uses include full-text search, NoSQL JSON document storage, log monitoring, and data collection and analysis.

Outside of data engineering, the common uses are full-text search and logs. ES usually ships alongside Logstash and Kibana — the ELK stack. Let's start with logs.

Here is a search against our company's log system: open Kibana's Discover page and type a query in the form "XXX".

This action can be replaced in the Dev Tools console with:

```
GET yourIndex/_search
{
  "from": 0, "size": 10,
  "query": {
    "match_phrase": {
      "log": "xxx"
    }
  }
}
```

What does that mean? The quotes in Discover and match_phrase in the console both declare a phrase match: only documents containing all the search terms, in the same positions and order as the phrase, are kept.

2.2 Structure of ES

Before ES 7.0, the storage hierarchy Index -> Type -> Document corresponded to database -> table -> row. Since 7.0, Type is deprecated; for now, think of an index as a table.

You can view basic information in the Dev Tools console with the commands below (plain curl works too).

  1. GET /_cat/health?v&pretty: view cluster health

  2. GET /_cat/shards?v: view shard status

  3. GET yourindex/_mapping: view the index's mapping structure

  4. GET yourindex/_settings: view the index's settings

  5. GET /_cat/indices?v: list all indexes on the current node

The ones to focus on are mapping and settings. A mapping is roughly the schema definition of a MySQL table; settings govern things like the number of shards and replicas.

Here is a partial mapping structure of a log index. By default ES maps strings to the text type and defines a keyword subfield for them; the difference is that text is tokenized while keyword is not.

"******": {  
    "mappings": {  
      "doc": {  
        "properties": {  
          "appname": {  
            "type": "text",  
            "fields": {  
              "keyword": {  
                "type": "keyword",  
                "ignore_above": 256  
              }  
            }  
Copy the code

2.3 Why is ES query fast?

What does tokenization ("word segmentation") mean here? Look at how ES builds its index and you'll see.

ES is built on the inverted index. Meaning what? A traditional (forward) index uses the document ID as the key and the content as the record. An inverted index flips that around: it finds documents from the values they contain — the word or content is the key, and the document IDs are the records.

Below is a schematic of the ES inverted index, made up of the Term Index, the Term Dictionary, and the Posting List.

"Ada" and "Sara" in the figure are terms — words produced by tokenization. Take the Term Index away and the picture looks rather like MySQL, with the Term Dictionary playing the role of a secondary index. But MySQL keeps its index on disk, where retrieving a term costs several random disk accesses.

On top of the Term Dictionary, ES adds the Term Index: term prefixes held in memory as an FST, which quickly locates a term's offset within the Term Dictionary. Better still, the FST and the block-stored Term Dictionary both save memory and disk space.

Now the speed makes sense: the Term Index sits in memory, and it is an index over the index of terms — the Term Dictionary.

That said, ES is not faster than MySQL for every query. Searches fall into two broad categories.

2.3.1 Retrieval after tokenization

ES's index stores tokenized, sorted terms. Take "Ada": in MySQL, LIKE '%da%' sweeps the whole table, while ES can locate the term quickly.

2.3.2 Exact retrieval

Here the two are actually not far apart: the Term Index's advantage is largely gone, though it is still used to locate the term's position in the Term Dictionary. MySQL may even be slightly faster when a covering index avoids the table lookup.

2.4 When to use ES

As mentioned, when is ES the right choice for query scenarios in a business system? Two cases, I think.

2.4.1 Full-text retrieval

Keyword-based fuzzy matching over strings is a disaster for MySQL but a piece of cake for ES. A concrete scenario: fuzzy search over the contents of a message table — that is, searching chat records.

Note, though: if you need search-engine-style keyword queries rather than log-style phrase matching, Chinese text needs a proper tokenizer. The most widely used is IK; its installation is not covered here.

What does that mean?

Tokenization

Query the logs with the quoted phrase "I'm such a smart cookie" and you only get exact phrase matches.

Remove the quotes, and you get everything that matches any single default token — "I", "can", "true"… — which obviously returns far too much and ignores Chinese semantics. The tokenization you actually want is something like "I", "can", "really", "smart cookie", with matching done against those words.

The cause: ES's default analyzer is unfriendly to Chinese. It was designed around English, where spaces separate words, so Chinese degrades into single-character tokens. This is also one reason the Chinese search experience in a lot of foreign software is poor.

To test the current index's tokenization, run the console command below.

```
POST yourindex/_analyze
{
  "field": "yourfield",
  "text": "I am really smart"
}
```

2.4.2 Combination Query

When the data volume is large enough and the table wide enough, throwing every field into ES to index makes no sense, and with MySQL alone you are left with sharding and read/write separation. Why not combine them?

1. ES + MySQL

Put the queryable fields, plus the ID, into ES and tokenize them there; keep the full record in MySQL for fast retrieval by ID.
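A minimal sketch of the flow, with invented index and field names — ask ES only for the matching IDs:

```
POST message_index/_search
{
  "_source": ["id"],
  "query": {
    "match": { "content": "smart cookie" }
  }
}
```

The IDs in the response then feed a SELECT … WHERE id IN (…) against MySQL for the full records.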

2. ES + HBASE

If you would rather do away with sharding altogether, consider dropping MySQL for a distributed database such as HBASE: NoSQL with massive storage capacity, easy expansion, and quick lookups by rowkey.

This is the classic scheme of separating the index from the data store.

Of course, the bigger the business, the more can go wrong and the more problems you face. With ES as the index layer, you must also think about data synchronization, timing, mapping design, high availability, and more.

After all, unlike in a pure logging system: logs can wait, users cannot.

2.5 summary

This section briefly covered why ES is fast and where that speed pays off. Now open the Kibana console and give it a try.

For Java projects, Spring Boot support makes ES essentially out of the box once the ES environment itself is up — a single dependency, and basic CRUD just works.
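For example, a minimal sketch with Spring Data Elasticsearch (the spring-boot-starter-data-elasticsearch dependency); the document and field names are invented:

```java
import java.util.List;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// Maps to an ES index; basic CRUD comes from the repository below.
@Document(indexName = "message_index")
class Message {
    @Id
    private String id;
    @Field(type = FieldType.Text) // analyzed full-text field
    private String content;
    // getters and setters omitted for brevity
}

// Spring generates the implementation, including a full-text query on content.
interface MessageRepository extends ElasticsearchRepository<Message, String> {
    List<Message> findByContent(String keyword);
}
```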

3. HBASE

HBASE has been mentioned before. What is HBASE?

3.1 Storage Structure

Relational databases such as MySQL are row-oriented:

| Name | Primary school | Middle school | University |
| --- | --- | --- | --- |
| Li mou | XX primary school | YY middle school | NULL |

HBASE is column-oriented (strictly speaking, column-family-oriented). Stored by column, the table above becomes:

| Name | School name |
| --- | --- |
| Li mou | XX primary school |
| Li mou | YY middle school |

The following figure shows the HBASE table model structure.

The Row Key is the primary key and is sorted lexicographically. The TimeStamp is the version number. info and area are column families, which split the table so that each family's columns are stored together. name and age belong to one column family, and columns can be added dynamically. A Cell is one concrete value.

3.2 OLTP and OLAP

Data processing splits roughly into two categories: on-line transaction processing (OLTP) and on-line analytical processing (OLAP).

  • OLTP is the bread and butter of traditional relational databases: basic, day-to-day transaction processing.

  • OLAP is the domain of data warehouses: complex analysis with a focus on decision support, delivering intuitive, understandable query results.

Column-oriented stores suit OLAP and row-oriented stores suit OLTP. HBASE, however, is awkward to place: it has no transactions, so it is not OLTP, and being column-family-oriented rather than truly columnar, hardly anyone uses it for OLAP either.

3.3 RowKey

How well an HBASE table is designed comes down to its RowKey design, because HBASE supports only three ways of querying:

  1. Single-row lookup by RowKey

  2. Range scan by RowKey

  3. Full table scan

HBASE does not support complex queries.
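A hypothetical HBase shell sketch of the three access paths (table and RowKey names invented):

```
get 'user', 'u_00042'                                        # 1. single-row lookup by RowKey
scan 'user', {STARTROW => 'u_00042', STOPROW => 'u_00100'}   # 2. range scan over RowKeys
scan 'user'                                                  # 3. full table scan
```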

3.4 Application Scenarios

HBASE is not a fit for real-time, ad-hoc fast queries. It suits write-intensive scenarios: writes are fast, and reads are fine for single rows or small ranges — by RowKey only, of course. In exchange, its throughput and reliability are very high, with no single point of failure.

4. Summary

Personally, I think software development is incremental: technology serves the project, and fit matters more than novelty or sophistication.

So how do you make queries fast? The best you can do is find your own pain point, solve the problem at hand — and then face the new problems that solution brings.

Most of the schemes in this article were only sketched at the implementation level. In reality, both MySQL sharding and wiring ES into a business involve plenty of thorny details; anyone doing the engineering should know that and learn by doing.
