As data volumes grow year over year and concurrency grows exponentially, SQL performance has become one of the primary concerns in the design and development of IT systems, and the main source of database performance problems: roughly 80% of database performance problems are caused by SQL. Faced with ever more SQL performance problems, how to diagnose them, and how to review SQL in advance, is a question more and more programmers must consider.
This article presents the core ideas of SQL optimization. The author distills eight years of full-time SQL tuning experience to demystify SQL optimization, so that front-line engineers no longer lose sleep over slow SQL or dread talking about it, and can finally master SQL tuning skills. Let's begin with the concepts you must understand before optimizing any SQL.
The number of distinct values in a column is called the column's cardinality. Take a gender column: it contains only male and female, so its cardinality is 2. The cardinality of a primary key column equals the total number of rows in the table. Whether the cardinality is high or low determines the data distribution of the column.
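To make the definition concrete, here is a toy sketch (plain Python, not Oracle; the rows are made up) that counts distinct values per column:

```python
# Toy illustration of cardinality: the number of distinct values in a column.
rows = [("SMITH", "M"), ("ALLEN", "M"), ("ADAMS", "F"), ("CLARK", "F")]

gender_cardinality = len({gender for _, gender in rows})  # distinct genders
name_cardinality = len({name for name, _ in rows})        # distinct names

print(gender_cardinality)  # 2 -> low cardinality, like the gender column
print(name_cardinality)    # 4 -> equals the row count, like a primary key
```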
For our test table, the cardinalities of the owner and object_id columns are shown below.
```sql
SQL> select count(distinct owner), count(distinct object_id), count(*) from test;

COUNT(DISTINCTOWNER) COUNT(DISTINCTOBJECT_ID)   COUNT(*)
-------------------- ------------------------ ----------
                  29                    72462      72462
```
The cardinality of the owner column is only 29, while the cardinality of the object_id column is 72,462, equal to the total number of rows. The data distribution of the owner column is as follows.
```sql
SQL> select owner, count(*) from test group by owner order by 2 desc;

OWNER                  COUNT(*)
-------------------- ----------
SYS                       30808
PUBLIC                    27699
SYSMAN                     3491
ORDSYS                     2532
APEX_030200                2406
MDSYS                      1509
XDB                         844
OLAPSYS                     719
SYSTEM                      529
CTXSYS                      366
WMSYS                       316
EXFSYS                      310
SH                          306
ORDDATA                     248
OE                          127
DBSNMP                       57
IX                           55
HR                           34
PM                           27
FLOWS_FILES                  12
OWBSYS_AUDIT                 12
ORDPLUGINS                   10
OUTLN                         9
BI                            8
SI_INFORMTN_SCHEMA            8
ORACLE_OCM                    8
SCOTT                         7
APPQOSSYS                     3
OWBSYS                        2

29 rows selected.
```
The data distribution for the owner column is extremely uneven, so we run the following SQL.
```sql
select * from test where owner='SYS';
```
SYS has 30,808 rows. Querying 30,808 rows out of 72,462 means returning 42.5% of the rows in the table.
```sql
SQL> select 30808/72462*100 "Percent" from dual;

   Percent
----------
42.5160774
```
Do you think the above query should use an index? Now let's switch to a different query.
```sql
select * from test where owner='SCOTT';
```
SCOTT has only 7 rows. Querying 7 rows out of 72,462 means returning 0.009% of the rows in the table.
```sql
SQL> select 7/72462*100 "Percent" from dual;

   Percent
----------
.009660236
```
Ask yourself: should a query that returns 0.009% of the rows in the table use an index?
If you are not familiar with indexes, don't worry; we cover them in detail in a later chapter. If you can't answer the question above, here is a rule of thumb: if a query returns less than 5% of the rows in the table, it should use an index; if it returns more than 5%, it should use a full table scan. We state this 5% rule of thumb here only so that beginners who cannot yet answer the question above do not get stuck on it.
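The 5% rule of thumb can be sketched as a tiny decision function (an illustrative analogy only; this is not how the CBO actually costs plans):

```python
def access_path(rows_returned: int, total_rows: int, threshold: float = 0.05) -> str:
    """Rule of thumb from the text: use an index below 5%, full scan above."""
    return "index" if rows_returned / total_rows < threshold else "full table scan"

print(access_path(7, 72462))      # SCOTT rows: 0.009% of the table -> index
print(access_path(30808, 72462))  # SYS rows: 42.5% of the table -> full table scan
```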
Now you have the following query statement.
```sql
select * from test where owner=:B1;
```
In this statement, :B1 is a bind variable; any value can be passed in, so the query may use either an index or a full table scan.
Now we can draw a conclusion: if a column has a low cardinality, its data distribution is likely to be uneven, and because of that uneven distribution, a SQL query on the column may use either an index or a full table scan. To check a column's data distribution, run: select column, count(*) from table group by column order by 2 desc;
If a SQL statement accesses a single table, it can use an index, a full table scan, or a materialized view scan. In the absence of materialized views, single-table access uses either an index or a full table scan. Now recall the rule of thumb: return less than 5% of the table and use the index; return more than 5% and use a full table scan. By now you should already have an idea of how to optimize single-table access.
Let’s look at the following query.
```sql
select * from test where object_id=:B1;
```
No matter what value is passed in for object_id, the query should use the index.
Consider the following query statement.
```sql
select * from test where object_name=:B1;
```
No matter what value is passed in for object_name, should the query use the index?
To answer that, check the data distribution of object_name. Writing this, the author is half-tempted to retitle this section "Data Distribution." Always pay attention to column data distribution in your daily work!
The selectivity of a column is its cardinality divided by the total number of rows, multiplied by 100%.
When doing SQL optimization, it doesn't make sense to look at the cardinality of a column alone; cardinality is only meaningful when compared with the total number of rows, which is why we introduce the concept of selectivity.
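The formula is easy to check by hand. This sketch reproduces the numbers from the test table's statistics, using the same round-to-two-decimals convention as the dictionary query later in this section:

```python
def selectivity(cardinality: int, num_rows: int) -> float:
    """Selectivity = cardinality / total rows * 100, rounded to 2 decimals."""
    return round(cardinality / num_rows * 100, 2)

print(selectivity(29, 72462))     # owner column       -> 0.04
print(selectivity(44236, 72462))  # object_name        -> 61.05
print(selectivity(72462, 72462))  # object_id (unique) -> 100.0
```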
Let's look at the cardinality and selectivity of each column of the test table. To compute selectivity, we must first gather statistics; statistics are covered in detail in Chapter 2. The following script gathers statistics for the test table.
```sql
SQL> BEGIN
       DBMS_STATS.GATHER_TABLE_STATS(ownname          => 'SCOTT',
                                     tabname          => 'TEST',
                                     estimate_percent => 100,
                                     method_opt       => 'for all columns size 1',
                                     no_invalidate    => FALSE,
                                     degree           => 1,
                                     cascade          => TRUE);
     END;
     /

PL/SQL procedure successfully completed.
```
The following script is used to see the cardinality and selectivity for each column in the Test table.
```sql
SQL> select a.column_name,
            b.num_rows,
            a.num_distinct Cardinality,
            round(a.num_distinct / b.num_rows * 100, 2) selectivity,
            a.histogram,
            a.num_buckets
       from dba_tab_col_statistics a, dba_tables b
      where a.owner = b.owner
        and a.table_name = b.table_name
        and a.owner = 'SCOTT'
        and a.table_name = 'TEST';

COLUMN_NAME       NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
--------------- ---------- ----------- ----------- --------- -----------
OWNER                72462          29         .04 NONE                1
OBJECT_NAME          72462       44236       61.05 NONE                1
SUBOBJECT_NAME       72462         106         .15 NONE                1
OBJECT_ID            72462       72462         100 NONE                1
DATA_OBJECT_ID       72462        7608        10.5 NONE                1
OBJECT_TYPE          72462          44         .06 NONE                1
CREATED              72462        1366        1.89 NONE                1
LAST_DDL_TIME        72462        1412        1.95 NONE                1
TIMESTAMP            72462        1480        2.04 NONE                1
STATUS               72462           1           0 NONE                1
TEMPORARY            72462           2           0 NONE                1
GENERATED            72462           2           0 NONE                1
SECONDARY            72462           2           0 NONE                1
NAMESPACE            72462          21         .03 NONE                1
EDITION_NAME         72462           0           0 NONE                0

15 rows selected.
```
Think: What columns must be indexed?
Some say high-cardinality columns; some say columns in the WHERE condition. Neither answer is complete. How high is "high"? You can't tell without comparing the cardinality with the total number of rows. For example, if a column's cardinality is in the tens of thousands but the table has billions of rows, is that cardinality still high? This is the fundamental reason for introducing selectivity.
When the selectivity of a column is greater than 20%, the data distribution of that column is fairly even. In the test table, object_name and object_id both have selectivity greater than 20%; the object_name column's selectivity is 61.05%. Let's look at its data distribution (printing only the first 10 rows for demonstration).
```sql
SQL> select *
       from (select object_name, count(*)
               from test
              group by object_name
              order by 2 desc)
      where rownum <= 10;

OBJECT_NAME          COUNT(*)
------------------ ----------
COSTS                      30
SALES                      30
SALES_CHANNEL_BIX          29
COSTS_TIME_BIX             29
COSTS_PROD_BIX             29
SALES_TIME_BIX             29
SALES_PROMO_BIX            29
SALES_PROD_BIX             29
SALES_CUST_BIX             29
DBMS_REPCAT_AUTH            5

10 rows selected.
```
As the query results show, the data distribution of the object_name column is quite even. Now let's run the following SQL.
```sql
select * from test where object_name=:B1;
```
No matter what value is passed in for object_name, at most 30 rows are returned.
So, what columns must be indexed? When a column appears in a WHERE condition, has no index, and has a selectivity greater than 20%, it must be indexed to improve SQL query performance. Of course, if the table has only a few hundred rows, there is no need to create an index.
This brings up the first point at the heart of SQL optimization: only large tables can cause performance problems.
Someone might say, "I have a very small table, only a few hundred rows, but it is under constant DML, which causes hot blocks and performance problems." We won't dwell on this; it is an application design problem, not a SQL optimization problem.
Now we will share the first automatic optimization script in this article with you through experiments.
Grab the columns that must be indexed (readers are invited to modify this script appropriately for use in production).
First, the column must appear in a WHERE condition. How do we find out which columns of a table appear in WHERE conditions? There are two methods: one is to mine V$SQL_PLAN; the other is the following script.
Perform the following stored procedure to refresh the database monitoring information.
```sql
begin
  dbms_stats.flush_database_monitoring_info;
end;
```
After running the above command, run the following query to see which columns of the table appear in WHERE conditions.
```sql
select r.name owner,
       o.name table_name,
       c.name column_name,
       equality_preds,    ---- equality predicate counts: where a.object_id = 10
       equijoin_preds,    ---- equijoin predicate counts: where a.id = b.id
       nonequijoin_preds, ---- non-equijoin predicate counts
       range_preds,       ---- range predicate counts: > >= < <= between and
       like_preds,        ---- LIKE predicate counts
       null_preds,        ---- is null / is not null predicate counts
       timestamp
  from sys.col_usage$ u, sys.obj$ o, sys.col$ c, sys.user$ r
 where o.obj# = u.obj#
   and c.obj# = u.obj#
   and c.col# = u.intcol#
   and r.name = 'SCOTT'
   and o.name = 'TEST';
```
Here are the experimental steps.
We start by running a query whose WHERE condition references the owner and object_id columns.
```sql
SQL> select object_id, owner, object_type
       from test
      where owner = 'SYS'
        and object_id < 100
        and rownum <= 10;

 OBJECT_ID OWNER                OBJECT_TYPE
---------- -------------------- -----------
        20 SYS                  TABLE
        46 SYS                  INDEX
        28 SYS                  TABLE
        15 SYS                  TABLE
        29 SYS                  CLUSTER
         3 SYS                  INDEX
        25 SYS                  TABLE
        41 SYS                  INDEX
        54 SYS                  INDEX
        40 SYS                  INDEX

10 rows selected.
```
Then refresh the database monitoring information.
```sql
SQL> begin
       dbms_stats.flush_database_monitoring_info;
     end;
     /

PL/SQL procedure successfully completed.
```
Then we check which columns of the test table appear in WHERE conditions.
```sql
SQL> select r.name owner, o.name table_name, c.name column_name
       from sys.col_usage$ u, sys.obj$ o, sys.col$ c, sys.user$ r
      where o.obj# = u.obj#
        and c.obj# = u.obj#
        and c.col# = u.intcol#
        and r.name = 'SCOTT'
        and o.name = 'TEST';

OWNER      TABLE_NAME COLUMN_NAME
---------- ---------- ------------------------------
SCOTT      TEST       OWNER
SCOTT      TEST       OBJECT_ID
```
Next we query for columns with selectivity greater than or equal to 20%.
```sql
SQL> select a.owner,
            a.table_name,
            a.column_name,
            round(a.num_distinct / b.num_rows * 100, 2) selectivity
       from dba_tab_col_statistics a, dba_tables b
      where a.owner = b.owner
        and a.table_name = b.table_name
        and a.owner = 'SCOTT'
        and a.table_name = 'TEST'
        and a.num_distinct / b.num_rows >= 0.2;

OWNER      TABLE_NAME COLUMN_NAME   SELECTIVITY
---------- ---------- ------------- -----------
SCOTT      TEST       OBJECT_NAME         61.05
SCOTT      TEST       OBJECT_ID             100
```
Finally, we verify that these columns are not already indexed.
```sql
SQL> select table_owner, table_name, column_name, index_name
       from dba_ind_columns
      where table_owner = 'SCOTT'
        and table_name = 'TEST';

no rows selected
```
Combining the above scripts, we get the fully automated optimization script.
```sql
SQL> select owner,
            column_name,
            num_rows,
            Cardinality,
            selectivity,
            'Need index' as notice
       from (select b.owner,
                    a.column_name,
                    b.num_rows,
                    a.num_distinct Cardinality,
                    round(a.num_distinct / b.num_rows * 100, 2) selectivity
               from dba_tab_col_statistics a, dba_tables b
              where a.owner = b.owner
                and a.table_name = b.table_name
                and a.owner = 'SCOTT'
                and a.table_name = 'TEST')
      where selectivity >= 20
        and column_name not in (select column_name
                                  from dba_ind_columns
                                 where table_owner = 'SCOTT'
                                   and table_name = 'TEST')
        and column_name in
            (select c.name
               from sys.col_usage$ u, sys.obj$ o, sys.col$ c, sys.user$ r
              where o.obj# = u.obj#
                and c.obj# = u.obj#
                and c.col# = u.intcol#
                and r.name = 'SCOTT'
                and o.name = 'TEST');

OWNER      COLUMN_NAME     NUM_ROWS CARDINALITY SELECTIVITY NOTICE
---------- ------------- ---------- ----------- ----------- ----------
SCOTT      OBJECT_ID          72462       72462         100 Need index
```
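Outside the database, the decision logic of the script above boils down to a set difference plus a selectivity filter. A minimal sketch with hypothetical inputs (the numbers mirror the test table, but the dictionaries themselves are made up for illustration):

```python
# Hypothetical per-column cardinalities, standing in for dba_tab_col_statistics.
cardinality = {"OWNER": 29, "OBJECT_NAME": 44236, "OBJECT_ID": 72462, "STATUS": 1}
num_rows = 72462

where_columns = {"OWNER", "OBJECT_ID"}  # columns captured from sys.col_usage$
indexed_columns = set()                 # dba_ind_columns returned no rows

need_index = sorted(
    col for col in where_columns - indexed_columns
    if cardinality[col] / num_rows * 100 >= 20  # selectivity >= 20%
)
print(need_index)  # ['OBJECT_ID'] -- OWNER is filtered out (selectivity 0.04%)
```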
As mentioned earlier, when a column has a low cardinality, its data distribution is usually uneven. Uneven data distribution means that a query on the column may call for a full table scan for some values and an index scan for others, which can easily lead to a wrong execution plan.
If histogram statistics are not collected for a column with a low cardinality, the cost-based optimizer (CBO) considers the column data distribution to be balanced.
Let’s take the test table as an example to explain the histogram with experiments.
The clause method_opt => 'for all columns size 1' means that no histogram is gathered for any column.
```sql
SQL> BEGIN
       DBMS_STATS.GATHER_TABLE_STATS(ownname          => 'SCOTT',
                                     tabname          => 'TEST',
                                     estimate_percent => 100,
                                     method_opt       => 'for all columns size 1',
                                     no_invalidate    => FALSE,
                                     degree           => 1,
                                     cascade          => TRUE);
     END;
     /

PL/SQL procedure successfully completed.
```
HISTOGRAM = NONE indicates that no histogram has been collected for the column.
```sql
SQL> select a.column_name,
            b.num_rows,
            a.num_distinct Cardinality,
            round(a.num_distinct / b.num_rows * 100, 2) selectivity,
            a.histogram,
            a.num_buckets
       from dba_tab_col_statistics a, dba_tables b
      where a.owner = b.owner
        and a.table_name = b.table_name
        and a.owner = 'SCOTT'
        and a.table_name = 'TEST';

COLUMN_NAME       NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
--------------- ---------- ----------- ----------- --------- -----------
OWNER                72462          29         .04 NONE                1
OBJECT_NAME          72462       44236       61.05 NONE                1
SUBOBJECT_NAME       72462         106         .15 NONE                1
OBJECT_ID            72462       72462         100 NONE                1
DATA_OBJECT_ID       72462        7608        10.5 NONE                1
OBJECT_TYPE          72462          44         .06 NONE                1
CREATED              72462        1366        1.89 NONE                1
LAST_DDL_TIME        72462        1412        1.95 NONE                1
TIMESTAMP            72462        1480        2.04 NONE                1
STATUS               72462           1           0 NONE                1
TEMPORARY            72462           2           0 NONE                1
GENERATED            72462           2           0 NONE                1
SECONDARY            72462           2           0 NONE                1
NAMESPACE            72462          21         .03 NONE                1
EDITION_NAME         72462           0           0 NONE                0

15 rows selected.
```
The owner column has a very low cardinality; now let's query it.
```sql
SQL> set autot trace
SQL> select * from test where owner='SCOTT';

7 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  2499 |   236K|   289   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| TEST |  2499 |   236K|   289   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OWNER"='SCOTT')
```
Note that the query owner='SCOTT' actually returned 7 rows, but the CBO estimated Rows as 2,499, which is far from accurate. Since returning 7 rows out of 72,462 should use an index, let's create an index on the owner column.
```sql
SQL> create index idx_owner on test(owner);

Index created.
```
Let's run the query again.
```sql
SQL> select * from test where owner='SCOTT';

7 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3932013684

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |  2499 |   236K|    73   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST      |  2499 |   236K|    73   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_OWNER |  2499 |       |     6   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='SCOTT')
```
Now we query owner='SYS'.
```sql
SQL> select * from test where owner='SYS';

30808 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3932013684

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |  2499 |   236K|    73   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST      |  2499 |   236K|    73   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_OWNER |  2499 |       |     6   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='SYS')
```
Note that the query owner='SYS' returns 30,808 rows. Should fetching 30,808 rows out of 72,462 use an index? Obviously it should be a full table scan. In other words, this execution plan is wrong.
Why is the execution plan for owner='SYS' wrong? Because the owner column has a very low cardinality (29) while the table has 72,462 rows. As emphasized above, when no histogram has been collected for a column, the CBO assumes the column's data is evenly distributed. Precisely because the CBO assumes the owner data is evenly distributed, its Rows estimate is always 2,499 no matter what value owner equals. So where did 2,499 come from? Here is the answer.
```sql
SQL> select round(72462/29) from dual;

ROUND(72462/29)
---------------
           2499
```
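In other words, without a histogram the estimate is simply the total row count divided by the number of distinct values, regardless of the literal being queried:

```python
# Without a histogram, the CBO assumes a uniform distribution:
# estimated rows = num_rows / num_distinct, whatever the literal value is.
num_rows, num_distinct = 72462, 29
estimated_rows = round(num_rows / num_distinct)
print(estimated_rows)  # 2499, for owner='SYS' and owner='SCOTT' alike
```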
Now you know that the Rows in an execution plan are not real; they are computed from statistics and a set of mathematical formulas. Unfortunately, many DBAs still don't realize that the Rows in an execution plan are only estimates.
When doing SQL optimization, we often need to help the CBO compute more accurate Rows. Note: more accurate, not exact. The CBO cannot obtain exact Rows, because statistics are usually not gathered with a 100% sample; and even with a 100% sample, the data in the table changes all the time. Moreover, the formulas used to compute Rows have their own limitations, so the CBO will never compute exact Rows.
If CBO could calculate the exact Rows every time, then we would only need to worry about business logic, table design, SQL writing, and how to build indexes, and we would no longer have to worry about SQL executing the wrong plan.
Oracle 12c's new SQL Plan Directives resolve, to some extent, the SQL performance problems caused by inaccurate Rows estimates. SQL Plan Directives are not discussed further in this article.
In order for the CBO to choose the correct execution plan, we need to gather histogram statistics on the owner column, telling the CBO that the column's data is not evenly distributed, so that it can consult the histogram when estimating Rows. Let's now gather a histogram on the owner column.
```sql
SQL> BEGIN
       DBMS_STATS.GATHER_TABLE_STATS(ownname          => 'SCOTT',
                                     tabname          => 'TEST',
                                     estimate_percent => 100,
                                     method_opt       => 'for columns owner size skewonly',
                                     no_invalidate    => FALSE,
                                     degree           => 1,
                                     cascade          => TRUE);
     END;
     /

PL/SQL procedure successfully completed.
```
Look at the histogram information for the OWNER column.
```sql
SQL> select a.column_name,
            b.num_rows,
            a.num_distinct Cardinality,
            round(a.num_distinct / b.num_rows * 100, 2) selectivity,
            a.histogram,
            a.num_buckets
       from dba_tab_col_statistics a, dba_tables b
      where a.owner = b.owner
        and a.table_name = b.table_name
        and a.owner = 'SCOTT'
        and a.table_name = 'TEST';

COLUMN_NAME       NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM  NUM_BUCKETS
--------------- ---------- ----------- ----------- ---------- -----------
OWNER                72462          29         .04 FREQUENCY           29
OBJECT_NAME          72462       44236       61.05 NONE                 1
SUBOBJECT_NAME       72462         106         .15 NONE                 1
OBJECT_ID            72462       72462         100 NONE                 1
DATA_OBJECT_ID       72462        7608        10.5 NONE                 1
OBJECT_TYPE          72462          44         .06 NONE                 1
CREATED              72462        1366        1.89 NONE                 1
LAST_DDL_TIME        72462        1412        1.95 NONE                 1
TIMESTAMP            72462        1480        2.04 NONE                 1
STATUS               72462           1           0 NONE                 1
TEMPORARY            72462           2           0 NONE                 1
GENERATED            72462           2           0 NONE                 1
SECONDARY            72462           2           0 NONE                 1
NAMESPACE            72462          21         .03 NONE                 1
EDITION_NAME         72462           0           0 NONE                 0

15 rows selected.
```
Now let's rerun the SQL above to check whether the execution plan is still wrong and whether the Rows estimates are still off.
```sql
SQL> select * from test where owner='SCOTT';

7 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3932013684

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |     7 |   679 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST      |     7 |   679 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_OWNER |     7 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='SCOTT')

SQL> select * from test where owner='SYS';

30808 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 30808 |  2918K|   290   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| TEST | 30808 |  2918K|   290   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OWNER"='SYS')
```
After gathering the histogram on the owner column, the Rows estimated by the CBO are essentially accurate, and once the Rows estimates are right, the execution plan can hardly go wrong.
You may wonder why Rows are estimated so accurately once a histogram is in place. What exactly does gathering a histogram do? Gathering a histogram on the owner column is equivalent to running the following SQL.
```sql
select owner, count(*) from test group by owner;
```
The histogram information is the result of the above SQL, stored in the data dictionary. That way, whatever value of owner we query for, the CBO can compute correct Rows, because the histogram records how many rows each value has.
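Conceptually, a frequency histogram is just that group-by result kept as a lookup table. A toy sketch (Python, with a made-up subset of the owner data):

```python
from collections import Counter

# A frequency histogram is conceptually the stored result of
# "select owner, count(*) from test group by owner".
owners = ["SYS"] * 30808 + ["PUBLIC"] * 27699 + ["SCOTT"] * 7
histogram = Counter(owners)

# With the histogram, the per-value estimate is exact instead of num_rows/NDV.
print(histogram["SCOTT"])  # 7
print(histogram["SYS"])    # 30808
```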
SQL that uses bind variables against columns with histograms is subject to bind variable peeking. Bind variable peeking is a well-worn topic that we won't rehash here. Oracle 11g introduced Adaptive Cursor Sharing, which largely solved the bind peeking problem, but Adaptive Cursor Sharing introduced some new problems of its own, which we also won't discuss.
What should we do when we encounter SQL with bind variables? It's actually quite simple: for the bound column (here written as col), run

```sql
select col, count(*) from test group by col order by 2 desc;
```
If the column data is evenly distributed, the SQL is basically fine; if the column data is unevenly distributed, we need to gather histogram statistics on the column.
There are many other histogram topics, such as histogram types and the number of buckets, which this article does not cover. In our view, it is enough for the reader to know that a histogram's purpose is to help the CBO estimate more accurate Rows for low-cardinality columns with uneven data distribution.
Which columns need histograms? A column that appears in WHERE conditions should have a histogram gathered when its selectivity is less than 1% and no histogram has been gathered yet. Note: never gather histograms on columns that do not appear in WHERE conditions; doing so is a complete waste of database resources.
Let’s share the second fully automated optimization script in this article.
It grabs the columns for which histograms must be created (you may adapt the script for production use).
```sql
SQL> select a.owner,
            a.table_name,
            a.column_name,
            b.num_rows,
            a.num_distinct,
            trunc(a.num_distinct / b.num_rows * 100, 2) selectivity,
            'Need Gather Histogram' notice
       from dba_tab_col_statistics a, dba_tables b
      where a.owner = 'SCOTT'
        and a.table_name = 'TEST'
        and a.owner = b.owner
        and a.table_name = b.table_name
        and a.num_distinct / b.num_rows < 0.01
        and (a.owner, a.table_name, a.column_name) in
            (select r.name owner, o.name table_name, c.name column_name
               from sys.col_usage$ u, sys.obj$ o, sys.col$ c, sys.user$ r
              where o.obj# = u.obj#
                and c.obj# = u.obj#
                and c.col# = u.intcol#
                and r.name = 'SCOTT'
                and o.name = 'TEST')
        and a.histogram = 'NONE';

OWNER TABLE COLUM   NUM_ROWS NUM_DISTINCT SELECTIVITY NOTICE
----- ----- ----- ---------- ------------ ----------- ----------------------
SCOTT TEST  OWNER      72462           29         .04 Need Gather Histogram
```
When an index is created on a column, the index stores the column's key values and the ROWIDs of the corresponding rows. Fetching rows from the table via the ROWIDs recorded in the index is called going back to the table (table access by ROWID). Each such access is a single-block read, because one ROWID points to one block. If a query has to go back to the table too many times, SQL performance suffers badly, and the CBO may abandon the index scan altogether.
When doing SQL optimization, always pay attention to the number of trips back to the table, and especially to the physical I/O those trips incur!
Remember the faulty execution plan in Section 1.3?
```sql
SQL> select * from test where owner='SYS';

30808 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3932013684

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |  2499 |   236K|    73   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST      |  2499 |   236K|    73   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_OWNER |  2499 |       |     6   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='SYS')
```
The TABLE ACCESS BY INDEX ROWID step in the plan is the back-to-table operation. Every row returned by the index requires one trip back to the table, each a single-block read (one ROWID corresponds to one block). This SQL returned 30,808 rows, so it went back to the table 30,808 times.
Consider: is the time in the execution plan above spent in the index scan or in the trips back to the table?
To find out, let's experiment in SQL*Plus. To eliminate the effect of arraysize on logical reads, set arraysize to 5000. Arraysize controls how many rows the Oracle server sends to the client per fetch; the default is 15. If a block holds 150 rows, the block will be read 10 times, because only 15 rows are sent to the client per fetch, which inflates the logical read count. With arraysize set to 5000, a block is no longer read multiple times.
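The inflation can be modeled directly: every fetch round trip revisits the current block, so a block holding more rows than arraysize is read multiple times. A rough sketch of that arithmetic:

```python
import math

def block_reads(rows_in_block: int, arraysize: int) -> int:
    """Approximate times one block is read when fetched in arraysize chunks."""
    return math.ceil(rows_in_block / arraysize)

print(block_reads(150, 15))    # default arraysize: the block is read 10 times
print(block_reads(150, 5000))  # arraysize 5000: the block is read once
```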
```sql
SQL> set arraysize 5000
SQL> set autot trace
SQL> select owner from test where owner='SYS';

30808 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 373050211

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |  2499 | 14994 |     6   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| IDX_OWNER |  2499 | 14994 |     6   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("OWNER"='SYS')

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         74  consistent gets
          0  physical reads
          0  redo size
     155251  bytes sent via SQL*Net to client
        486  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      30808  rows processed
```
As you can see from the above experiment, the index scan only took 74 logical reads.
```sql
SQL> select * from test where owner='SYS';

30808 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3932013684

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |  2499 |   236K|    73   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST      |  2499 |   236K|    73   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_OWNER |  2499 |       |     6   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='SYS')

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        877  consistent gets
          0  physical reads
          0  redo size
    3120934  bytes sent via SQL*Net to client
        486  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      30808  rows processed

SQL> set autot off
SQL> select count(distinct dbms_rowid.rowid_block_number(rowid)) blocks
       from test
      where owner = 'SYS';

    BLOCKS
----------
       796
```
With the table access (TABLE ACCESS BY INDEX ROWID), the same query took 877 logical reads. Where do those 877 logical reads come from?
The 30808 rows returned by the SQL are stored in 796 data blocks. Accessing those 796 data blocks costs 796 logical reads; add the 74 logical reads of the index scan, plus 7 extra reads caused by the fetch size (7 = CEIL(30808/5000), since arraysize is 5000), and the total comes to 877 logical reads.
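The arithmetic can be checked directly. This is just a sketch: the figures are the ones reported by the trace above, and note that 30808/5000 must be rounded up to get the 7 fetches.

```python
import math

# Figures taken from the autotrace output above
index_gets = 74       # consistent gets for the bare index range scan
data_blocks = 796     # distinct blocks holding the 30808 SYS rows
rows = 30808
arraysize = 5000      # SQL*Plus fetch size set earlier

# Each additional fetch re-visits the current block, so the number of
# fetches, CEIL(30808/5000) = 7, shows up as extra consistent gets
extra_gets = math.ceil(rows / arraysize)

total = data_blocks + index_gets + extra_gets
print(total)  # 877
```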
So we can tell that most of this SQL's performance cost is indeed in the table access (the "return to the table" step)!
Even worse: if the 30808 rows were spread across 30808 different blocks and the table were not cached in the buffer cache, fetching them through the index would take 30808 physical I/Os, which would be terrible.
This is the origin of the rule of thumb that if a query returns less than about 5% of the table's rows it should go through an index, and if it returns more than 5% it should use a full table scan. The root cause is the cost of returning to the table: the more rows returned, the more times the table must be revisited, and the worse performance gets.
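The effect can be sketched with a toy I/O model. All the numbers below are hypothetical assumptions for illustration, not Oracle's real cost formulas (those are covered in Chapter 6 of the book): a full table scan reads every block once with multiblock I/O, while in the worst case (a bad clustering factor) an index scan pays one single-block table read per returned row.

```python
# Toy I/O model (all numbers hypothetical, not Oracle internals)
table_blocks = 10_000            # table size in blocks
multiblock_read_count = 16       # blocks fetched per I/O on a full scan
full_scan_ios = table_blocks // multiblock_read_count   # 625 read requests

# Worst case (bad clustering factor): every row fetched via the index
# costs one single-block read of the table; index block reads are ignored
def worst_case_index_ios(rows_returned):
    return rows_returned

rows_in_table = 1_000_000
print(full_scan_ios)                 # 625
print(worst_case_index_ios(500))     # 500: 0.05% of rows, the index wins
print(worst_case_index_ios(50_000))  # 50000: 5% of rows, the full scan wins
```

With poor clustering the crossover point is far below 5%, which is why "5%" is only a rule of thumb, not a law.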
TABLE ACCESS BY INDEX ROWID BATCHED, introduced in Oracle 12c, improves the performance of TABLE ACCESS BY INDEX ROWID to a certain extent. Batched table access is not discussed in this article.
Which SQL statements must return to the table?
Select * from table where ...
Such SQL always has to return to the table, which is why we should prohibit the use of SELECT *. Then which SQL statements do not need to return to the table?
Select count(*) from table
Such SQL does not need to return to the table.
When every column the query references is contained in the index, there is no need to return to the table. This is why we often build composite indexes: a covering index eliminates the table access and improves query performance.
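The covering-index idea can be demonstrated in any database with a plan display. The sketch below uses SQLite via Python's sqlite3 module purely as a stand-in for the Oracle behavior described above; the table and index names are made up, and SQLite's plan wording differs from Oracle's, but the concept is identical: when the index contains all referenced columns, the plan shows a "COVERING INDEX" and no table access.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (owner TEXT, object_id INT, object_name TEXT)")
con.execute("CREATE INDEX idx_owner_objid ON test (owner, object_id)")

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output is the readable detail
    return con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]

# All referenced columns are in the index: covering index, no table access
print(plan("SELECT object_id FROM test WHERE owner = 'SYS'"))

# SELECT * also needs object_name, so the engine must go back to the table
print(plan("SELECT * FROM test WHERE owner = 'SYS'"))
```

The first plan mentions a covering index; the second uses the same index but must still fetch each row from the table.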
When a SQL statement has several filter conditions but only some of the filter columns are indexed, the plan shows TABLE ACCESS BY INDEX ROWID with a filter on the table access: rows are fetched from the table and then re-filtered. In this case, create a composite index covering all the filter columns to eliminate the re-filtering after the table access and improve query performance.
How to choose the columns of a composite index is a complex question that is discussed repeatedly later in the book.
This article is excerpted from the Core Ideas of SQL Optimization
“SQL Optimization Core Ideas”
By Luo Bingsen et al.
Click on the cover to buy the paper book
Structured Query Language (SQL) is a powerful database language. Based on relational algebra, it is functionally rich yet simple, convenient, and flexible to use, and it has become the standard language of relational databases. This book aims to guide readers in mastering SQL optimization skills so as to improve database performance. It is written against Oracle and is suitable for readers at all levels.
The book is aimed at front-line engineers, operations engineers, database administrators, and system designers and developers; both beginners and readers with some background will benefit from it.
Table of contents
Chapter 1 Concepts You Must Understand for SQL Optimization 1
1.1 CARDINALITY 1
1.2 SELECTIVITY 3
1.3 HISTOGRAM 7
1.4 TABLE ACCESS BY INDEX ROWID 13
1.5 CLUSTERING FACTOR 15
1.6 Relationships between Tables 19
Chapter 2 Statistics
2.1 What Are Statistics 21
2.2 Setting Major Statistics Parameters 24
2.3 Checking Whether Statistics Are Stale 32
2.4 Extended Statistics 37
2.5 Dynamic sampling 42
2.6 Customizing statistics Collection Policies 47
Chapter 3 Execution Plans 49
3.1 Common Methods for Obtaining an Execution Plan 49
3.1.1 Using AUTOTRACE to View the Execution Plan 49
3.1.2 Using EXPLAIN PLAN FOR to View the Execution Plan 52
3.1.3 Viewing the Execution Plan with A-Time 54
3.1.4 Viewing the Execution Plan of SQL That Is Running 56
3.2 Customizing an Execution Plan 57
3.3 How to Create Indexes by Reading the Execution Plan 59
3.4 Reading the Execution Plan with the Cursor-Movement Method 63
Chapter 4 ACCESS PATH 67
4.1 Common Access Paths 67
4.1.1 TABLE ACCESS FULL 67
4.1.2 TABLE ACCESS BY USER ROWID 71
4.1.3 TABLE ACCESS BY ROWID RANGE 71
4.1.4 TABLE ACCESS BY INDEX ROWID 72
4.1.5 INDEX UNIQUE SCAN 72
4.1.6 INDEX RANGE SCAN 73
4.1.7 INDEX SKIP SCAN 74
4.1.8 INDEX FULL SCAN 75
4.1.9 INDEX FAST FULL SCAN 77
4.1.10 INDEX FULL SCAN (MIN/MAX) 80
4.1.11 MAT_VIEW ACCESS FULL 83
4.2 Single-block Read vs. Multi-Block Read 83
4.3 Why index scans are sometimes slower than full table scans 84
4.4 Impact of DML on Index Maintenance 84
Chapter 5 Table Joins 86
5.1 NESTED LOOPS 86
5.2 HASH JOIN 90
5.3 SORT MERGE JOIN 93
5.4 CARTESIAN JOIN 95
5.5 SCALAR SUBQUERY 98
5.6 SEMI JOIN 100
5.6.1 Semi-join Equivalent Rewriting 100
5.6.2 Controlling the Semi-join Execution Plan 101
5.6.3 Questions for the Reader 103
5.7 ANTI JOIN 104
5.7.1 Anti-join Equivalent Rewriting 104
5.7.2 Controlling the Anti-join Execution Plan 105
5.7.3 Questions for the Reader 108
5.8 FILTER 108
5.9 IN and EXISTS: Which Is Faster
5.10 Nature of SQL Statements 111
Chapter 6 Cost Calculation 112
6.1 Is it necessary to optimize SQL according to COST 112
6.2 Cost calculation of full table scan 112
6.3 Index range scanning cost calculation 116
6.4 Core thought of SQL Optimization 119
Chapter 7 Query Transformations You Must Master 120
7.1 Subquery Unnesting 120
7.2 View Merging 125
7.3 Predicate Pushing 129
Chapter 8 Tuning Techniques 133
8.1 Viewing the Actual Cardinality (Rows) 133
8.2 Use UNION instead of OR 134
8.3 Paging Statement Optimization 135
8.3.1 Single-table Paging 135
8.3.2 Multi-table Join Paging 150
8.4 Using Analytic Functions to Optimize Self-joins 153
8.5 Optimizing Joins Between a Very Large Table and a Very Small Table 154
8.6 Optimizing Joins Between Two Very Large Tables 155
8.7 LIKE Statement Optimization 159
8.8 DBLINK Optimization 161
8.9 Splitting a Table by ROWID 167
8.10 Three-Part SQL Splitting 169
Chapter 9 SQL Optimization Case Studies 170
9.1 Composite Index Optimization Case 170
9.2 Histogram Optimization Case 173
9.3 The NL Driven Table Cannot Use INDEX SKIP SCAN 177
9.4 Pay Attention to Relationships Between Tables When Optimizing SQL
9.5 Optimization case of INDEX FAST FULL SCAN 179
9.6 Paging Statement Optimization 181
9.7 ORDER BY List Optimization Case 183
9.8 Semi-join Reverse-Driving the Main Table: Case 1 185
9.9 Semi-join Reverse-Driving the Main Table: Case 2 187
9.10 Performance Problems Caused by Unbalanced Connection Column Data Distribution 192
9.11 Classic Cases of Filter Optimization 198
9.12 Tree Query Optimization Case 202
9.13 Local Index Optimization 204
9.14 Scalar Subquery Optimization Cases 206
9.14.1 Case one 206
9.14.2 Case two 207
9.15 Associated Update Optimization Case 211
9.16 An Outer Join with an OR Join Condition Can Only Use NL 213
9.17 Treat Your Head as the CBO 217
9.18 Extended Statistics Optimization 221
9.19 Optimizing wmsys.WM_CONCAT with the LISTAGG Analytic Function 227
9.20 INSTR Non-equijoin Optimization Case 230
9.21 REGEXP_LIKE Non-equijoin Optimization Case 233
9.22 ROW LEVEL SECURITY Optimization Case 237
9.23 Subquery Unnesting Optimization Case 240
9.24 Subquery Unnesting Optimization Case 247
9.25 Predicate Pushing Failure Caused by a Wrong Outer Join 252
9.26 Predicate Pushing Optimization Case 262
9.27 Optimizing SQL with CARDINALITY 268
9.28 Optimizing SQL with Wait Events 272
Chapter 10 Automated SQL Auditing 281
10.1 Catching Tables Whose Foreign Keys Have No Index
10.2 Catching Columns That Need Histograms 282
10.3 Catching Columns That Must Be Indexed 283
10.4 Catching SQL That Uses SELECT *
10.5 Catching SQL with Scalar Subqueries 285
10.6 Catching SQL That Uses Custom Functions 286
10.7 Catching SQL That Accesses the Same Table Repeatedly 287
10.8 Catching SQL That Runs FILTER 288
10.9 Catching Nested Loop SQL That Returns Many Rows 290
10.10 Catching NL SQL Whose Driven Table Is Fully Scanned 292
10.11 Catching SQL That Runs TABLE ACCESS FULL 293
10.12 Catching SQL That Runs INDEX FULL SCAN 294
10.13 Catching SQL That Runs INDEX SKIP SCAN 295
10.14 Finding Which SQL Statements Reference an Index 297
10.15 Catching SQL That Runs a Cartesian Product 298
10.16 Catching SQL with a Wrong Sort Merge Join 299
10.17 Catching PL/SQL with LOOP Inside LOOP 301
10.18 Catching SQL That Uses a Low-Selectivity Index
10.19 Catching SQL for Which a Composite Index Can Be Created (Re-filtering After Table Access)
10.20 Catching SQL for Which a Composite Index Can Be Created (Table Access Reads Only a Few Columns) 306
Today's interaction
Which kinds of articles from Asynchronous Books interest you, and why? Leave a comment and share this post to your WeChat Moments before 5 p.m. on April 27. The editors will draw 5 lucky readers to receive two paper books and three 100-yuan Asynchronous Community e-book vouchers (the most-liked comment automatically wins one).