Chapter 5 Optimizing Query Performance (4)
Annotation option
You can specify one or more comment options for the query optimizer in the SELECT, INSERT, UPDATE, DELETE, or TRUNCATE table commands. The comment option specifies the options that the query optimizer uses during compiling the SQL query. Typically, the comment option is used to override the system-wide default configuration for a particular query.
grammar
The syntax /*#OPTIONS */(with no space between /* and #) specifies a comment option. Comment options are not comments; It specifies a value for the query optimizer. Comment OPTIONS are specified using JSON syntax, usually “key:value” pairs, for example: /*#OPTIONS {“optionName”:value} */. Support for more complex JSON syntax, such as nested values.
Comment options are not comments; It may not contain any text other than JSON syntax. Contains non-JSON text in /*… */ Separator causes SQLCODE -153 error. InterSystems SQL does not validate the contents of the JSON string.
The #OPTIONS keyword must be specified in uppercase letters. JSON syntax should not use Spaces in curly braces. If SQL code is enclosed in quotes, such as dynamic SQL statements, the quotes in JSON syntax should be double quotes. For example, myquery=”SELECT Name FROM sample. MyTest /*#OPTIONS {“”optName””:””optValue””} */”.
The /*#OPTIONS */ comment option can be specified anywhere in the SQL code where comments can be specified. The comment option is always displayed as a comment at the end of the statement text displayed.
You can specify multiple /*#OPTIONS */ comment OPTIONS in your SQL code. They are displayed in the returned statement text in the specified order. If multiple comment options are specified for the same option, the option value specified by last is used.
The following comment options are documented:
-
/*#OPTIONS {"BiasAsOutlier":1} */
-
/ * # OPTIONS {" DynamicSQLTypeList ":" 10,1,11}"
-
/*#OPTIONS {"NoTempFile":1} */
According to
/*#OPTIONS */ comment OPTIONS are displayed at the end of the SQL statement text, regardless of where they are specified in the SQL command. Some of the displayed /*#OPTIONS */ comment OPTIONS are not specified in the SQL command, but are generated by the compiler’s preprocessor. /*#OPTIONS {“DynamicSQLTypeList”:… } * /
/*#OPTIONS */ comment OPTIONS are displayed in the Show Plan statement text, cached query query text, and SQL statement statement text.
Create a separate cache query for only the different queries in the /*#OPTIONS */ comment OPTIONS.
Parallel query processing
Parallel query prompts instruct the system to perform parallel query processing when running on a multiprocessor system. This can greatly improve the performance of some types of queries. The SQL optimizer determines whether a particular query can benefit from parallel processing and performs parallel processing when appropriate. Specifying parallel query hints does not enforce parallel processing for every query, only those queries that might benefit from parallel processing. This option is not valid if the system is not multiprocessor. To determine the number of processors on the current SYSTEM, use the % system.util.numberofcpus () method.
Parallel query processing can be specified in two ways:
- System-wide, by setting
auto parallel
Options. - In each query
FROM
Clause%PARALLEL
The keyword.
Parallel query offices are supposed to be used for SELECT queries. It should not be used for insert, update, or delete operations.
System-wide parallel query processing
You can configure system-wide automatic parallel query processing using one of the following options:
- On the PORTAL, select System Administration, then Configuration, then SQL and Object Settings, and finally SQL.
View or change the check box for executing queries in a single process. Note that the default value of this check box is unchecked, which means that parallel processing is enabled by default.
- call
$SYSTEM.SQL.Util.SetOption()
The methods are as follows:SET status=$SYSTEM.SQL.Util.SetOption("AutoParallel",1,.oldval)
.
The default value is 1(automatic parallel processing activation). To determine the current Settings, call $SYSTEM. SQL. CurrentSettings (), it will appear as a % of the PARALLEL options enable automatic hints.
Note that changing this configuration setting clears all cache queries in all namespaces.
When activated, the automatic parallel query prompt instructs the SQL optimizer to apply parallel processing to any query that might benefit from this processing. In IRIS 2019.1 and later, automatic parallel processing is enabled by default. Users upgrading from IRIS 2018.1 to IRIS 2019.1 need to explicitly activate automatic parallel processing.
One option the SQL optimizer uses to determine whether to perform parallel processing on a query is an automatic parallel threshold. If activated the range of automatic parallel processing SYSTEM (the default), you can use the $SYSTEM. SQL. Util. SetOption () method sets the automatic threshold value of the optimization of the parallel processing to integer values, as shown below: SET the status = $SYSTEM. SQL. Util. SetOption (” AutoParallelThreshold “, n,. Oldval). N The higher the threshold, the less likely this feature is to be applied to queries. This threshold is used for complex optimization calculations, but can be considered as the minimum number of tuples that must reside in an accessed map. The default value is 3200. The minimum value is 0. To determine the current Settings, call $SYSTEM. SQL. CurrentSettings (), it shows the % of the PARALLEL option automatically prompt threshold.
When automatic parallel processing is enabled, queries executed in a sharding environment will always be executed with parallel processing, regardless of the parallel threshold.
Parallel query processing for a specific query
The optional %PARALLEL keyword is specified in the FROM clause of the query. It recommends that IRIS across systems process queries in parallel using multiple processors, if applicable. This can significantly improve the performance of queries that use one or more COUNT, SUM, AVG, MAX, or MIN aggregate functions and/or groupby clauses, as well as many other types of queries. These are typically queries that process large amounts of data and return small result sets. For example, SELECT AVG(SaleAmt) FROM %PARALLEL User.AllSales GROUP BY Region can use PARALLEL processing.
One-line queries that specify only aggregate functions, expressions, and subqueries perform parallel processing, with or without themGROUP BY
Clause. However, a “multi-line” query that specifies both a single field and one or more aggregate functions does not perform parallel processing unless it containsGROUP BY
Clause. For example,SELECT Name,AVG(Age) FROM %PARALLEL Sample.Person
Parallel processing is not performed, howeverSELECT Name,AVG(Age) FROM %PARALLEL Sample.Person GROUP BY Home_State
Perform parallel processing.
If a query specifying %PARALLEL is compiled in runtime mode, all constants are interpreted in ODBC format.
Specifying %PARALLEL may degrade the performance of some queries. Running %PARALLEL queries on a system with multiple concurrent users can degrade overall performance.
Parallel processing can be performed when querying views. However, even if the %parallel keyword is explicitly specified, no parallel processing is performed on queries that specify %VID.
%PARALLEL
The subquery
%PARALLEL for SELECT queries and their subqueries. Insert command subqueries cannot use %PARALLEL.
%PARALLEL is ignored when applied to subqueries related to peripheral queries. Such as:
SELECT name,age FROM Sample.Person AS p
WHERE 30<(SELECT AVG(age) FROM %PARALLEL Sample.Employee where Name = p.Name)
Copy the code
%PARALLEL is ignored when applied to subqueries that contain complex predicates, or predicates optimized for complex predicates. Predicates considered complex include the FOR SOME and FOR SOME %ELEMENT predicates.
Parallel query processing is ignored
Regardless of how the AUTO PARALLEL option is set, or whether the %PARALLEL keyword is present in the FROM clause, it is possible that some queries will use linear processing rather than PARALLEL processing. After optimizing the query, InterSystems IRIS decides whether to use parallel processing for the query and apply other query optimization options, if specified. RIS can determine that the optimized form of query is not suitable for parallel processing, even if the user-specified form of query seems to benefit from parallel processing. You can use Show Plan to determine if and how InterSystems IRIS partitions queries for parallel processing.
Specifying %PARALLEL does not perform PARALLEL processing in the following cases. The query executed successfully without issuing an error, but no parallelization was performed:
- The query contains
FOR
Certain predicates. - This query contains one
TOP
Clause and oneORDER BY
Clause.
This clause combination optimizes the fastest time to the first line without using parallel processing. Adding the FROM clause %NOTOPOPT optimization-option keyword optimizes the fastest retrieval of the complete result set. If the query does not contain an aggregate function, the combination of %PARALLEL and %NOTOPOPT performs PARALLEL processing of the query.
- A query that contains a left outer join or inner join where the ON clause is not an equality condition. For example,
FROM %PARALLEL Sample.Person p LEFT OUTER JOIN Sample.Employee e ON p.dob > e.dob.
This is because SQL optimization converts this type of connection into a full external connection.
For full external connections, %PARALLEL is ignored.
%PARALLEL
and%INORDER
Optimizations cannot be used at the same time;
If both are specified, %PARALLEL is ignored.
- The query references a view and returns a view ID (
%VID
). - If a table is
BITMAPEXTENT
The index,COUNT(*)
Do not use parallel processing. %PARALLEL
Tables defined using standard data stores.
Its use with custom storage formats may not be supported. %PARALLEL does not support global temporary tables or tables with extended global reference stores.
%PARALLEL
For queries that can access all rows of a table, using row-level security (ROWLEVELSECURITY
) defined tables cannot perform parallel processing.%PARALLEL
Used to store data in a local database.
It does not support global nodes mapped to remote databases.
Shared memory considerations
For parallel processing, IRIS supports multiple interprocess queues (IPQs). Each IPQ processes a single parallel query. It allows parallel work unit subprocesses to send rows back to the main process so that the main process does not have to wait for the unit of work to complete. This enables parallel queries to return the first row of data as quickly as possible without having to wait for the entire query to complete. It also improves the performance of aggregate functions.
Parallel query execution uses shared memory from the universal Memory heap (GMHEAP). If parallel SQL query execution is used, users may need to increase the GMHEAP size. In general, the memory requirement for each IPQ is 4 x 64K = 256K. InterSystems IRIS breaks down a parallel SQL query into the number of CPU cores available. Therefore, the user needs to allocate this much extra GMheap:
<Number of concurrent parallel SQL requests> x <Number cores> x 256 = <required size increase (in kilobytes) of gmheap>
Copy the code
Note that this formula is not 100% accurate, because a parallel query can produce subqueries that are also parallel. Therefore, it is wise to allocate more extra GMHeap than this formula specifies.
Failure to allocate enough GMheap will result in an error reported to messages.log. SQL queries may fail. Other errors may occur when other subsystems try to allocate GMheap.
To view an instance’s GMHeap usage, especially IPQ usage, select System Operation on the main page of the management portal, then Select System Usage, and then click the Shared Memory Heap Usage link;
To change the size of the common memory heap or GMHEAP (sometimes referred to as the Shared memory Heap or SMH), select System Management, then Configuration, then Additional Settings, and finally Advanced Memory from the portal home page;
Cache query considerations
If you are running a cached SQL query, using %PARALLEL, and you do something to clear the cached query when the query is initialized, then the query may report a
error from a worker job. The typical situation that causes a cached query to be purged is to call $system.sql.purge () or recompile the class referenced by the query. Recompiling the class automatically clears any cached queries associated with the class.
If this error occurs, running the query again may execute successfully. Removing %PARALLEL from the query avoids this error.
SQL statement and plan status
SQL queries using %PARALLEL can produce multiple SQL statements. The planned state of these SQL statements is Unfrozen/Parallel. The query whose schedule status is Frozen or Parallel cannot be frozen by user operations.
Generate a report
You can use the report generation tool to submit query performance reports to InterSystems Worldwide Response Center (WRC) Customer Support for analysis. You can run the generate report tool from the management portal in either of the following ways:
- You must first obtain the WRC tracking number from the WRC. WRC can be contacted from the ADMIN portal using the Contact button at the top of each ADMIN portal page. Enter this trace number in the WRC numbering field. You can use this trace number to report the performance of a single query or multiple queries.
- In the SQL Statement area, enter the query text. An X icon will display in the upper right corner. You can use this icon to clear the SQL statement area. When the query is complete, select the Save query button. The system generates a query plan and collects runtime statistics for the specified query. Regardless of system-wide runtime statistics Settings, the report generation tool always collects module-level statistics using collection option 3: Record queries. Because collecting statistics at this level can take time, it is strongly recommended that you select the Run Save query process in the background check box. By default, this check box is selected.
When a background task starts, the tool displays “Please wait… To disable all fields on the page and display a new view process button. Clicking the View Process button opens the Process Details page in a new TAB. On the process details page, you can view the process and pause, Resume, or terminate the process. The status of the process is reflected on the Save query page. When the process completes, the currently saved query table will be refreshed, the View Process button will disappear, and all fields on the page will be enabled.
- Perform Step 2 for each query.
Each query will be added to the current saved Queries table. Note that this table can contain queries with the same WRC trace number or queries with different trace numbers. After all queries are complete, proceed to Step 4.
For each query listed, you can select the Details link. This link opens a separate page that displays the complete SQL statement, properties (including the WRC tracking number and IRIS software version), and a query plan containing performance statistics for each module.
- To delete individual queries, select the check boxes for those queries from the Currently Saved Queries table, and then click the Clear button.
- To delete all queries associated with the WRC trace number, select a row from the currently saved query table. WRC numbers are displayed in the WRC number area at the top of the page. If you then click the Clear button, all queries for that WRC number will be deleted.
- Use the query check box to select the query to report to WRC. To select all queries associated with a WRC trace number, select a row from the currently saved query table instead of using a check box. In both cases, you can select the Generate Report button. The report generation tool creates an XML file that contains query statements, query plans with runtime statistics, class definitions, and SQL INT files associated with each selected query.
If you select a query associated with a single WRC trace number, the generated file will have a default name, such as WRC12345.xml. If you select a query associated with multiple WRC trace numbers, the generated file will have the default name wrCmultiply.xml.
A dialog box appears asking you to specify where to save the report. Once the report is saved, you can click the Mail to link to send it to WRC customer support. Attach files using the Attach/insert function of the mail client.