Databases are a big topic. There are many common database systems: relational databases such as MySQL and Oracle, non-relational databases, and graph databases. Database performance depends on many layers, from the underlying storage hardware, the operating system, and database configuration parameters, up through the database architecture, table structure, application-level connection pool settings, and SQL indexes.

Database architecture

To analyze the MySQL database, we first need to understand the system architecture of MySQL, as shown in the following figure:

As the architecture diagram shows, the MySQL system is divided into an application layer, a MySQL service layer, and a storage engine layer.

  1. Application layer: the top layer of the MySQL architecture. As in other client-server systems, it mainly handles connection processing, user authentication, and security management.

  2. MySQL service layer: the core of MySQL Server, providing all the logical functions of the MySQL database.

  3. Storage engine layer:

The storage engine is the subsystem of MySQL that deals directly with files, and it is also the most distinctive part of MySQL. The feature that most sets MySQL apart from other databases is its pluggable storage engine architecture: each engine implements a custom file-access mechanism on top of the abstract file-access interface provided by MySQL AB.

The physical files include the redo log, undo log, binlog, error log, query log, slow log, data files, and index files.

SQL runtime procedure

After learning the database architecture, for performance analysis we also need to know what these modules do and the path a specific SQL statement travels at runtime. After an SQL statement reaches the system through the connection pool, it first enters the SQL interface module, which determines the statement type. The Parser module then checks syntax and semantics, and the Optimizer module decides which index to use, the execution order, and so on, producing the execution plan. Finally the server looks for the data in the cache; on a cache miss, it goes through the file system to disk.
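The Optimizer's final decisions can be observed with EXPLAIN. A minimal sketch, assuming a hypothetical table `users` with an index on `name`:

```sql
-- Hypothetical table, for illustration only
CREATE TABLE users (
  id   INT PRIMARY KEY,
  name VARCHAR(64),
  INDEX idx_name (name)
);

-- EXPLAIN shows what the Optimizer module decided:
-- which index is used (key), the access type, and the estimated rows scanned.
EXPLAIN SELECT id, name FROM users WHERE name = 'alice';
-- type=ref with key=idx_name indicates an index lookup rather than a full scan
```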

Performance analysis Basic monitoring indicators

Understanding the MySQL architecture and execution process is not enough. To find MySQL's problems during performance analysis, we first need some basic knowledge and the corresponding monitoring tools.

The first two schemas you need to know are information_schema and performance_schema. information_schema stores metadata about all the tables, columns, indexes, permissions, configuration parameters, status parameters, and so on in the database. The familiar `show processlist` command, for example, reads from the `processlist` table in this schema. performance_schema exposes the database's resource consumption at runtime; it collects information at low cost and provides a wealth of performance data.
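As a sketch, both schemas can be queried like ordinary tables (the table and column names below are the standard MySQL 5.6+ ones):

```sql
-- Equivalent to SHOW PROCESSLIST, but filterable with WHERE:
SELECT id, user, db, time, state, info
FROM information_schema.processlist
WHERE command <> 'Sleep';

-- Top statements by cumulative execution time, from performance_schema:
SELECT digest_text, count_star, sum_timer_wait
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 10;
```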

Two commands worth knowing are `show global variables;` and `show global status;`. The former shows the configured parameter values, and the latter shows the runtime status counters. However, these commands simply list information without any statistical analysis, so next we introduce two good monitoring tools.
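For example, using standard MySQL variable and status names:

```sql
-- Configured parameter values:
SHOW GLOBAL VARIABLES LIKE 'max_connections';
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';

-- Runtime counters accumulated since server startup:
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Slow_queries';
```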

Global analysis: mysqlReport

The output of show status provides the raw reference data for finding performance bottlenecks, but it is only a flat listing, from which problems are hard to spot at a glance. mysqlReport does not simply list the data as show status does; instead it combines these reference values into optimization reference points, so that adjustments can be made based on each point's value and its measurement standard.

Installing mysqlReport on Linux

  1. yum -y install perl-DBD-MySQL  # dependency package

  2. yum -y install perl-DBI  # dependency package

  3. yum -y install mysqlreport

These three steps set up the tool on a Linux system. Now you can analyze the database health.

Using mysqlReport

Usage is fairly simple; execute directly:

mysqlreport --user Tesla --password xxx@2015 --host 127.0.0.1 --no-mycnf --flush-status --outfile ./result.txt

This saves a health report for the entire database to result.txt in the current directory.

To view the full command parameters:

mysqlreport --help

Mysqlreport result analysis

Database operation report and query sort report

This table reflects overall database usage: 608 operations per second is somewhat high. The Slow counter is important, but the slow-query threshold is set to 10 s, which is far too long.

The DMS section shows the proportion of each kind of SQL statement in the database. In this example SELECT dominates, so SQL optimization should start with SELECT statements, where it will have the most immediate effect.

Select and sort report

The data in this report is highly valuable, and the problem is immediately visible: Scan (full table scans) shows 48 full table scans per second, which is far too many. These statements need to be revised, and they are the focus of our later optimization.

InnoDB buffer pool report

innodb_buffer_pool_size defines the maximum memory cache for InnoDB table data and index data, and it is a very important setting in MySQL. Here usage has reached 100%, so the InnoDB buffer pool must be enlarged. The read hit rate is 92.57%; the larger the better, ideally approaching 100%. The low value here is related to the undersized InnoDB buffer pool.

Connection report

This shows that the database connections are fully adequate.

Table lock report

Waited indicates how many queries had to wait for a table lock; Immediate indicates how many obtained the lock immediately; each is followed by a percentage.

Waiting is almost always bad for a database, so the Waited value should be as small as possible. The most telling figure is the third field (Waited as a percentage of all table locks); here it is 0.00%, which is very good: no table-lock contention occurred.

Temporary table report

When EXPLAIN shows Using temporary during SQL analysis, it means a temporary table is needed to hold intermediate data during the query; we should avoid this by using proper indexes. MySQL can create temporary tables in memory, on disk, or in temporary files, and creating them on disk is obviously the most expensive. If the proportion of disk temporary tables is high, increase tmp_table_size.
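The disk-versus-memory temporary table ratio can be checked directly; a sketch using the standard status counters:

```sql
-- Internal temporary tables created in memory vs. spilled to disk:
SHOW GLOBAL STATUS LIKE 'Created_tmp_tables';
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';

-- If Created_tmp_disk_tables / Created_tmp_tables is high,
-- consider raising tmp_table_size and max_heap_table_size together,
-- since the smaller of the two is what takes effect.
```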

Other global information can be reviewed in the rest of the report.

Global analysis results

The following problems, identified from the mysqlReport results, need to be solved:

  1. Total database operations exceed 600 per second. For an Intranet system this is somewhat too frequent; we can cut unnecessary database operations.

  2. The slow query log is not enabled, and the threshold is set as long as 10 s. Usually a statement taking more than 100 ms deserves optimization, so a much shorter threshold should be set in order to analyze slow queries.

  3. Full table scans occur 48.5 times per second; the specific SQL statements need to be analyzed.

  4. InnoDB buffer pool usage is 100%: the configured size is too small, and the cache needs to be enlarged.

The pt-query-digest tool

pt-query-digest is the preferred tool for analyzing MySQL, because it can work from logs (slow log, general log, and binlog), from the processlist, and from tcpdump captures. It can also write its analysis to a file or into a table. The analysis process first parameterizes the conditions of each query, then groups the parameterized queries and reports each group's execution time, count, and proportion. With these results, problems can be found and optimized.

Installation method

Download www.percona.com/downloads/p…

Installation: CentOS dependency packages

yum -y install perl-TermReadKey perl-Time-HiRes perl-IO-Socket-SSL.noarch
pt-query-digest --help


The reports pt-query-digest generates from slow/bin logs are well organized and complete. Because the slow log is not enabled on the internal network, here we analyze its binlog instead.

Method of use

First convert the binlog to text, then analyze it:

mysqlbinlog --no-defaults -vv --base64-output=DECODE-ROWS mysql-bin.000818 > mysql-bin.000818.txt
pt-query-digest --type=binlog mysql-bin.000818.txt > 818.report.log


Filter out full table scan statements

Set log_queries_not_using_indexes = ON in the database settings; statements that perform full table scans will then be written to the slow query log. Note that SQL statements taking longer than long_query_time are also recorded in the slow log, whether or not they use an index.
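A sketch of enabling this at runtime, without a restart (all three are standard MySQL system variables):

```sql
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;              -- seconds
SET GLOBAL log_queries_not_using_indexes = ON;

-- Verify the setting took effect:
SHOW GLOBAL VARIABLES LIKE 'log_queries_not_using_indexes';
```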

Steps for profiling: viewing detailed execution information

Step 1: set profiling = 1;  // turn on the profiling feature

Step 2: execute the statement  // run the statement found in the slow log

Step 3: show profiles;  // find the ID of the statement executed in step 2

Step 4: show profile all for query ID;  // show the profiling results
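Putting the four steps together, a minimal session sketch (the statement and query ID here are illustrative):

```sql
SET profiling = 1;                        -- step 1: enable profiling

SELECT * FROM project_permissions         -- step 2: run the suspect statement
WHERE project_id = 2076;

SHOW PROFILES;                            -- step 3: note the Query_ID, e.g. 1

SHOW PROFILE ALL FOR QUERY 1;             -- step 4: per-stage timings, CPU, block I/O
```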

Modify the table structure to add an index; the index name is usually the table name plus the field name.

show index from project_permissions;
ALTER table project_permissions ADD INDEX idex_project (project_id);
-- general forms:
ALTER table tableName ADD INDEX indexName (columnName);
create INDEX indexName on tableName (columnName);

Analysis of very frequently executed statements with full table scans:

1)

explain SELECT project_id, modified_time, name, permissions, isGroup FROM project_permissions WHERE project_id=2076;


Solution: project_id is a string column, and the numeric literal 2076 forces MySQL to perform an implicit type conversion, which prevents the index from being used; change the query value to a string type.
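A sketch of the before/after, assuming project_id is a VARCHAR column:

```sql
-- Numeric literal: MySQL must cast the column, so the index is skipped
EXPLAIN SELECT project_id, name FROM project_permissions
WHERE project_id = 2076;     -- expected type=ALL (full table scan)

-- String literal matching the column type: the index can be used
EXPLAIN SELECT project_id, name FROM project_permissions
WHERE project_id = '2076';   -- expected type=ref via the project_id index
```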

2)

explain SELECT id, model_name, model_type, job_id, properties, gmt_create, owner, last_execution_model, gmt_modified, published, status, module_id from mlstudio_model where job_id=13788;


Solution: add an index on job_id:

ALTER table mlstudio_model ADD INDEX index_model(job_id);

3)

explain SELECT id, name, user_id, property, gmt_create, gmt_modified, appstatus, execution_info FROM mlstudio_deployed_notebooks WHERE appstatus in (10,140,20,120) ORDER BY gmt_modified desc;


Analysis and solution: the table has only a little over 200 records and no index, so the full table scan is not a high priority; however, the property and execution_info fields are large. If the property field is not needed by the caller, it is better not to select it in the query.

4)

explain select id, algorithm_id, version, create_time, modify_time, module_id, shared, type, source_algorithm_version_id from ti_user_algorithm_version where module_id = 813;



Solution: the table currently has few records and short fields, but an index is still added:

ALTER table ti_user_algorithm_version ADD INDEX index_algorithm(module_id)


5)

explain select id, gmt_create, gmt_modified, name, type, description, checked, permission, user_id, nick_name, config_file_name, config_file_res, module_res, module_dependencies, job_type, user_coded, has_model, icon, module_jars from mlstudio_modules where module_res=0 and type>0 and type<1001 and job_type=2;


There are not many records and the field values are relatively short, so the query footprint is small; 625 rows have little impact.

6)

explain SELECT id, name, type, gmt_create, owner, gmt_modified, published, status, module_id, properties from mlstudio_dataset where module_id = 229;


There are not many records and the field values are relatively short, so the query footprint is small; 55 rows have little impact. If module_id is given an index it covers these queries; otherwise they can be left as they are.

7)

explain select algorithm_id from ti_user_favorite where user_id = 'jianfehuang' and algorithm_id = 101;

create index algorithm on ti_user_algorithm_favorite (user_id, algorithm_id);

Solution: create a composite index. After it was created, query speed improved noticeably; the table currently holds 196 records.

8)

explain select cid, cname, cdesc, cicon, clevel, cparent, cvisible, group_concat(mid order by mname), sum(mpermission) as public_num from (select mmc.id as cid, mmc.name as cname, mmc.desc as cdesc, mmc.icon as cicon, mmc.level as clevel, mmc.parent_id as cparent, mmc.visible as cvisible, mmc.order_num as corder, mm.id as mid, mm.name as mname, mm.permission as mpermission from mlstudio_module_category mmc left join mlstudio_modules mm on mmc.id = mm.type) as t group by cid, cname, cdesc, cicon, clevel, cparent, cvisible order by corder;


9)

select queuequota0_.id as id1_1_, queuequota0_.cpu as cpu2_1_, queuequota0_.gmt_create as gmt_crea3_1_, queuequota0_.gpu_map as gpu_map4_1_, queuequota0_.jizhi_business_flag as jizhi_bu5_1_, queuequota0_.memory as memory6_1_, queuequota0_.name as name7_1_, queuequota0_.gmt_modified as gmt_modi8_1_, queuequota0_.uuid as uuid9_1_ from queue_quota queuequota0_ where queuequota0_.name = 'g_teG_TESlaml_appGroup04';

Full-table-scan analysis: the table is currently small, with only 155 rows, so the performance impact is small. If the data volume is expected to grow, consider adding an index.

10)

select task0_.id as id1_0_, task0_.admin_group as admin_gr2_0_, task0_.alert_group as alert_gr3_0_, task0_.business_flag as business4_0_, task0_.gmt_create as gmt_crea5_0_, task0_.creator as creator6_0_, task0_.description as descript7_0_, task0_.flag as flag8_0_, task0_.modifier as modifier9_0_, task0_.name as name10_0_, task0_.project_id as project11_0_, task0_.props as props12_0_, task0_.type as type13_0_, task0_.gmt_modified as gmt_mod14_0_, task0_.view_group as view_gr15_0_ from tj_task task0_ where task0_.project_id in (1157, 1913, 2078);

Full-table-scan analysis: the tj_task table is currently small, with only 9 rows, so the performance impact is small. If the data volume is expected to grow, consider adding an index.

The following query slows down as a project accumulates more workflows, with a significant impact on performance:

select flow_id, max(id * 1000 + status) % 1000 as last_user_drive_status from mlstudio_execution_jobflow where (drive_type = 1 or drive_type is null) and project_id in (24529) group by flow_id

The problems are that it scans a large amount of data, copies rows to a temporary table, and performs a filesort during execution.

It was rewritten as:

select f.flow_id, f.status from mlstudio_model_flow t inner join mlstudio_execution_jobflow f on t.last_jobflow_id = f.id where t.project_id in (24529)


MySQL tuning: setting innodb_buffer_pool_size

Query online configuration:

show global variables like 'innodb_buffer_pool_size';
show global status like 'Innodb_buffer_pool_pages_data';
show global status like 'Innodb_page_size';
show global status like 'Innodb_buffer_pool_pages_total';

Intranet query data result:

Innodb_buffer_pool_pages_total | 8191
Innodb_buffer_pool_pages_data | 8116
Innodb_page_size | 16384
innodb_buffer_pool_size | 134217728


Tuning reference calculation method:

val = Innodb_buffer_pool_pages_data / Innodb_buffer_pool_pages_total * 100%
If val > 95%, increase innodb_buffer_pool_size; if val < 95%, innodb_buffer_pool_size can be reduced.
Recommended value: Innodb_buffer_pool_pages_data * Innodb_page_size * 1.05 / (1024*1024*1024) GB

Calculated for the internal network: 8116 / 8191 ≈ 99%, so this value needs to be increased.
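The arithmetic above can be done in SQL itself; a sketch plugging in the internal-network values:

```sql
-- Usage ratio: pages_data / pages_total
SELECT 8116 / 8191 * 100 AS pct_used;    -- about 99%, above the 95% threshold

-- Recommended minimum in GB: pages_data * page_size * 1.05 / 1024^3
SELECT 8116 * 16384 * 1.05 / POW(1024, 3) AS recommended_gb;
-- about 0.13 GB of data currently cached; since the 128M pool is
-- saturated at 100%, it was raised to 2G instead (see below)
```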

Database configuration changes: /etc/my.cnf, modified in the test environment

1. Enable the slow query log with a 1-second threshold; recording slow queries affects database performance by about 1%

slow_query_log = ON
long_query_time = 1


2. Increase the InnoDB buffer pool

innodb_buffer_pool_size = 2G  # set to 2G

3. Enlarge the temporary table size, currently 64M

tmp_table_size = 256M
max_heap_table_size = 256M


Conclusion

This article has briefly introduced methods of database optimization through global analysis with two tools. mysqlReport integrates and computes the show status reference data into optimization reference points, so that adjustments can be made according to each point's value and its measurement standard.

pt-query-digest analyzes MySQL from logs (slow log, general log, and binlog), the processlist, and tcpdump, and its results can be used to identify problems and optimize them. With these two tools we can tune the MySQL configuration at the database layer, and also identify slow statements and optimize their execution with the help of profiling.