This is the seventh day of my August challenge

Today I received a SQL query from a colleague on the operations team. It was a little complicated, and in particular running EXPLAIN on it would not finish for a long time, so our backend team helped look into it, and in doing so stumbled onto a deeply hidden production problem.

```sql
select a.share_code,
       a.generated_time,
       a.share_user_id,
       b.user_count,
       b.order_count,
       a.share_order_id,
       b.rewarded_amount
from t_risk_share_code a,
     (select count(distinct r.user_id)  user_count,
             count(distinct r.order_id) order_count,
             s.rewarded_amount,
             r.share_code
      from t_order s, t_order_rel r
      where r.order_id = s.id
        and r.type = 1
        and r.share_code = 'I just shared order code'
      group by r.share_code) b
where a.share_code = b.share_code
  and a.type = 1
```

First, we noticed that even just running EXPLAIN on this SQL was slow, which suggests that some subquery is actually being executed (materialized) during the EXPLAIN. So the first step was to break out the subquery and analyze it step by step on its own:

```sql
select count(distinct r.user_id)  user_count,
       count(distinct r.order_id) order_count,
       max(s.rewarded_amount),
       r.share_code
from t_order s, t_order_rel r
where r.order_id = s.id
  and r.type = 1
  and r.share_code = 'I just shared order code'
group by r.share_code
```

Running EXPLAIN on this subquery alone is fast, and the result shows:

t_order is being full-table scanned, even though its primary key is id and the join condition is on that primary key.

According to the official documentation, there are several possible reasons for a full table scan:

  1. The table is so small that a table scan is cheaper than using an index. That is not the case here: both tables are very large, with tens of millions of rows.
  2. There is no suitable index for the WHERE or ON conditions. That is not the case either: for this SQL, both tables have appropriate indexes covering the JOIN ON and WHERE conditions.
  3. An indexed column is compared against constant values, and MySQL's index statistics estimate that the condition matches most of the rows in the table. In that case, using the secondary index first would still require fetching the matching rows back from the table, loading most of the table's pages into memory through random reads, so it is actually faster to scan the whole table directly. This is not our case either: t_order and t_order_rel have a one-to-many relationship, and the number of rows hit here is not large.
  4. The cardinality (dispersion) of the column is too low. Dispersion is the number of distinct values divided by the number of rows, so it is at most 1. The InnoDB engine does not compute this value in real time, and the estimate can be inaccurate (especially after column values are updated and rows move within their pages). For unique or primary key columns, though, no estimation is needed: the ratio is simply 1. If the dispersion is too low, too many rows are hit and we are back in the third case. The SQL we are optimizing here joins on the primary key, so this is not the case either. (A sketch of how to inspect these statistics follows.)
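As a side note, a quick way to sanity-check case 4 is to look at the statistics the optimizer actually holds. A minimal sketch, assuming you just want to eyeball the numbers (the indexes listed are whatever the tables really define):

```sql
-- Cardinality here is InnoDB's sampled estimate, not an exact count.
SHOW INDEX FROM t_order;
SHOW INDEX FROM t_order_rel;

-- The same statistics are also exposed through information_schema:
SELECT table_name, index_name, column_name, cardinality
FROM information_schema.statistics
WHERE table_schema = DATABASE()
  AND table_name IN ('t_order', 't_order_rel');
```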

While none of the above reasons applies to this SQL, going through them is a good reminder of a few optimizations we make to avoid full table scans in general (a combined SQL sketch follows the source excerpt below):

  1. To make the SQL execution plan analyzer more accurate, for the fourth case we may need to run ANALYZE TABLE on certain tables periodically during off-peak hours, to keep the analyzer's statistics accurate.
  2. In OLTP queries, use FORCE INDEX to pin the index we want, because the database's execution plan analysis is not perfect and sometimes picks the wrong index. This is a pit we run into regularly with middleware-based sharding such as Sharding-JDBC, and with native distributed databases such as TiDB.
  3. For MySQL, we set --max_seeks_for_key=10000 (the default value is extremely large), which caps the number of rows the execution plan analyzer assumes an index lookup may have to examine, making indexes look cheaper. The principle is simple; see the source:

sql_planner.cc

```cpp
double find_cost_for_ref(const THD *thd, TABLE *table, unsigned keyno,
                         double num_rows, double worst_seeks) {
  // Compare the estimated number of rows to examine with max_seeks_for_key
  // and take the smaller one, i.e. max_seeks_for_key caps num_rows.
  num_rows = std::min(num_rows, double(thd->variables.max_seeks_for_key));
  if (table->covering_keys.is_set(keyno)) {
    // We can use only index tree
    const Cost_estimate index_read_cost =
        table->file->index_scan_cost(keyno, 1, num_rows);
    return index_read_cost.total_cost();
  } else if (keyno == table->s->primary_key &&
             table->file->primary_key_is_clustered()) {
    const Cost_estimate table_read_cost =
        table->file->read_cost(keyno, 1, num_rows);
    return table_read_cost.total_cost();
  } else
    return min(table->cost_model()->page_read_cost(num_rows), worst_seeks);
}
```

This value must not be set too small, though; otherwise, when several candidate indexes are available, their row estimates all get clamped to the same cap, and the optimizer may end up choosing the index that actually scans the most rows.
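To make the three points above concrete, here is a minimal sketch of each; the index name, lookup value, and variable scope are illustrative assumptions, not taken from our production setup:

```sql
-- 1. Refresh optimizer statistics during off-peak hours (case 4 above).
ANALYZE TABLE t_order, t_order_rel;

-- 2. In OLTP queries, pin the index we know is right (hypothetical id value).
SELECT * FROM t_order FORCE INDEX (PRIMARY) WHERE id = '202108070001';

-- 3. Cap the row estimate the planner assumes for an index lookup
--    (can also be set at startup via --max_seeks_for_key=10000).
SET GLOBAL max_seeks_for_key = 10000;
```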

At this point EXPLAIN alone was not enough, so we had to turn to optimizer_trace. The reason we did not use optimizer_trace from the start is that it has to actually execute the entire SQL statement in order to collect all the useful information.

```sql
## Turn on optimizer trace for this session
set session optimizer_trace="enabled=on";
## Execute the SQL to be analyzed
select .....
## Read the trace
SELECT * FROM information_schema.OPTIMIZER_TRACE;
```

From the trace, we find that the SQL actually being executed is:

```sql
SELECT * FROM `t_order_rel` `r`
WHERE ((`r`.`order_id` = CONVERT(`s`.`id` USING utf8mb4))
  AND (`r`.`type` = 1)
  AND (`r`.`share_code` = 'B2MTB6C'))
```

It turns out the character encodings of the two tables' join columns are not the same! When joining, MySQL wraps one side in a conversion, CONVERT(`s`.`id` USING utf8mb4). Wrapping an indexed column in a function defeats the index, just as date(create_time) < "2021-8-1" cannot use an index on create_time while create_time < "2021-8-1" can (see the sketch below). Comparing columns of different types has the same effect, because MySQL silently inserts a type conversion function. This is a common trap with MySQL's implicit conversions.
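A minimal illustration of that rule, assuming t_order has an index on a create_time column (the column and the date are just examples):

```sql
-- A function wrapped around the indexed column: the index on create_time cannot be used.
SELECT * FROM t_order WHERE DATE(create_time) < '2021-8-1';

-- A plain range comparison against the raw column: the index on create_time can be used.
SELECT * FROM t_order WHERE create_time < '2021-8-1';
```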

Why does the encoding differ? The default encoding of this t_order_rel table is different from the other tables: because some of its fields need to store emoji, the whole table was created with a utf8mb4 default. Moreover, this table is only used for record keeping; it has no OLTP traffic, only the occasional OLAP query from operations colleagues, which is why the problem had never surfaced before. The fix is sketched below.
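A hypothetical sketch of the fix; the column type, length, and target charset/collation are assumptions for illustration (in practice you would match whatever t_order.id actually uses):

```sql
-- Align t_order_rel.order_id with the charset/collation of t_order.id
-- so the join no longer needs an implicit CONVERT().
ALTER TABLE t_order_rel
  MODIFY order_id VARCHAR(32)
  CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
```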

After changing the field encoding, the SQL finally stopped doing a full table scan. Going forward, we will also pay attention to the following (a table definition sketch follows the list):

  1. Specify the default encoding at the database level, do not specify a different default per table, and only override the encoding on the specific fields that need it.
  2. In JOIN ... ON and WHERE clauses, make sure both sides of every comparison have the same type (and, for string columns, the same character set).
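A minimal sketch of point 1; the database, table, and column names here are made up for illustration:

```sql
-- Set the default encoding once, at the database level.
CREATE DATABASE demo_db DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

-- Tables inherit the database default; only the field that truly needs
-- emoji support overrides it.
CREATE TABLE demo_db.t_example (
  id         BIGINT PRIMARY KEY,
  share_code VARCHAR(32) NOT NULL,
  nickname   VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci
) ENGINE = InnoDB;
```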

Search for "my programming cat" on WeChat and follow the official account: a little every day, improve your skills the easy way, and win all kinds of offers.