Abstract: When the intermediate result set of a query is too large, it is spilled to disk and temporary data files are generated. This article presents two schemes for limiting the amount of data written to these temporary files, so that normal business operations are not affected.

This article is shared from "How to limit the amount of temporary data files GaussDB(DWS) writes to disk". The author is wangxiaojuan8.

In some SQL statements, an intermediate result set is too large to hold in memory and has to be spilled to disk for temporary storage (for example, when an aggregation over a large amount of data produces an intermediate result that does not fit in memory). If the temporary data files generated by the spill take up too much space, normal business data can no longer be written and the disk becomes read-only.

In this scenario, you can limit the amount of intermediate result data that a user's statements may spill to disk in either of the following ways. If the limit is exceeded, an error is reported and the statement is terminated, which prevents the temporary data files from occupying too much space:

1. Scheme 1: Set a limit on the amount of temporary file data each thread can spill to disk

2. Scheme 2: Set a spill space limit on intermediate result sets for each user

Scheme 1: Set a limit on the amount of temporary file data each thread can spill to disk

The GUC parameter temp_file_limit limits the amount of temporary file data that each thread can spill to disk. temp_file_limit is a SUSET-type parameter. Its value is an integer, expressed in KB. -1 indicates that there is no limit. Default value: -1.

1. How to set temp_file_limit

You can use the gs_guc tool to apply the setting globally, as follows:

gs_guc reload -Z coordinator -Z datanode -N all -I all -c "temp_file_limit = 1024"

2. Formula for calculating the temp_file_limit value

You can roughly calculate the value of temp_file_limit with the following formula: temp_file_limit = expected total spill volume / number of concurrent spilling threads

The total spill volume can generally be set to 20% of the available space and adjusted according to what the user can tolerate. The number of concurrent spilling threads is the number of query threads that spill intermediate temporary data to disk at the same time while services are running. As the amount of data stored in the database grows, the value of temp_file_limit should be adjusted accordingly.
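The formula above can be sketched as a small calculation. This is only an illustration: the function name, the 500 GB of available space, and the 50 concurrent spilling threads are hypothetical values chosen for the example, not figures from a real cluster.

```python
def temp_file_limit_kb(available_kb: int, concurrent_threads: int,
                       spill_percent: int = 20) -> int:
    """Rough per-thread limit in KB: spill budget / concurrent spilling threads."""
    if concurrent_threads <= 0:
        raise ValueError("concurrent_threads must be positive")
    if not 0 < spill_percent <= 100:
        raise ValueError("spill_percent must be in (0, 100]")
    # Spill budget: a share of the available space (20% by default).
    budget_kb = available_kb * spill_percent // 100
    # Divide the budget evenly among the threads expected to spill at once.
    return budget_kb // concurrent_threads


# Hypothetical inputs: 500 GB available, 50 threads spilling concurrently.
available_kb = 500 * 1024 * 1024
print(temp_file_limit_kb(available_kb, 50))  # 2097152 KB, i.e. 2 GB per thread
```

The result is in KB, the unit temp_file_limit expects, so it can be passed to gs_guc as is.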

Note: This parameter limits the amount of temporary file data spilled per thread. If a query runs with multiple threads and the amount spilled by any single thread exceeds this parameter, the query exits with an error. If no single thread exceeds the limit, no error is reported, even when the total amount spilled by all threads exceeds the limit.
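The per-thread behavior described in the note can be modeled in a few lines. This is a simplified model of the rule as stated here, not GaussDB(DWS) code: the function name is hypothetical, and treating the boundary as "strictly greater than the limit" is an assumption.

```python
def query_hits_limit(per_thread_spill_kb, limit_kb):
    """Return True if any single thread spills more than limit_kb.

    limit_kb == -1 means no limit, as with temp_file_limit.
    """
    if limit_kb == -1:
        return False
    # Only individual threads are checked; the sum across threads is ignored.
    return any(kb > limit_kb for kb in per_thread_spill_kb)


limit_kb = 1024
# Three threads spill 600 KB each: 1800 KB in total, but no thread is over the limit.
print(query_hits_limit([600, 600, 600], limit_kb))  # False
# One thread alone spills 1500 KB, so the query fails.
print(query_hits_limit([200, 1500], limit_kb))      # True
```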

3. Example

Take the customer_demographics table in the TPC-DS 1X data set as an example. The SQL query is not pushed down, so the intermediate result set is spilled to disk only on the CN.

postgres=# show temp_file_limit;
 temp_file_limit
-----------------
 1MB
(1 row)
postgres=# set enable_stream_operator=off;
SET
postgres=# explain select * from customer_demographics c1, customer_demographics c2 order by c1.cd_demo_sk;
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 id |                                operation                                 |    E-rows     | E-width |     E-costs
----+--------------------------------------------------------------------------+---------------+---------+------------------
  1 | ->  Sort                                                                 | 3689472640000 |     112 | 2329492473090.72
  2 |    ->  Nested Loop (3,4)                                                 | 3689472640000 |     112 | 36894726400.00
  3 |       ->  Data Node Scan on customer_demographics "_REMOTE_TABLE_QUERY_" |       1920800 |      56 | 0.00
  4 |       ->  Data Node Scan on customer_demographics "_REMOTE_TABLE_QUERY_" |       1920800 |      56 | 0.00
(6 rows)
postgres=# select * from customer_demographics c1, customer_demographics c2 order by c1.cd_demo_sk;
ERROR:  temporary file size exceeds temp_file_limit (1024kB)

Scheme 2: Set a spill space limit on intermediate result sets for each user

1. How to set the spill space limit on a user's intermediate result sets

There are two ways to set the spill space quota for a user's intermediate result sets:

  1. Specify SPILL SPACE in CREATE USER to set the intermediate result spill quota for a new user

CREATE USER user_name… SPILL SPACE 'spillspacelimit';

  2. Specify SPILL SPACE in ALTER USER to change the intermediate result spill quota of an existing user

ALTER USER user_name… SPILL SPACE 'spillspacelimit';

For example:

CREATE USER u1 PASSWORD 'abcd@1234' SPILL SPACE 'unlimited'; -- Create user u1 with an unlimited intermediate result spill quota

ALTER USER u1 SPILL SPACE '1G'; -- Change the intermediate result spill quota of user u1 to 1G

Description:

  1. This setting applies across all nodes. That is, if the total amount of data that an SQL statement spills on the CN and all DNs in the cluster exceeds the limit, the statement terminates with an error.

  2. When an intermediate result is spilled to disk, the user's recorded amount of spilled temporary file data increases accordingly. When a temporary file is deleted, the recorded amount decreases.

  3. This setting is user-level. If the same user runs multiple queries concurrently, the amounts of intermediate result data spilled by all of those queries are accumulated.
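The three points above amount to one running counter per user. The sketch below is a toy model of that accounting, not GaussDB(DWS) internals: the class and method names are hypothetical, the error text simply mirrors the message shown in the examples in this article, and treating '1G' as 1024 MiB is an assumption.

```python
class SpillSpaceExceeded(Exception):
    """Raised when a user's accumulated spill data would exceed the quota."""


class UserSpillQuota:
    """Toy per-user spill counter shared by all nodes and concurrent queries."""

    def __init__(self, limit_bytes=None):
        self.limit_bytes = limit_bytes  # None models 'unlimited'
        self.used_bytes = 0

    def spill(self, nbytes):
        """Record spilled data; fail if the user's total would exceed the quota."""
        if self.limit_bytes is not None and self.used_bytes + nbytes > self.limit_bytes:
            raise SpillSpaceExceeded("spill space is out of user's spill space limit")
        self.used_bytes += nbytes

    def release(self, nbytes):
        """Record deletion of a temporary file; the counter goes back down."""
        self.used_bytes = max(0, self.used_bytes - nbytes)


MIB = 1024 * 1024
u1 = UserSpillQuota(limit_bytes=1024 * MIB)  # assumed equivalent of '1G'
u1.spill(400 * MIB)   # query A spills on the CN
u1.spill(300 * MIB)   # query A spills on a DN
u1.spill(300 * MIB)   # query B, run concurrently by the same user
try:
    u1.spill(100 * MIB)  # 1100 MiB in total would exceed the quota
    rejected = False
except SpillSpaceExceeded:
    rejected = True
u1.release(700 * MIB)    # query A's temporary files are deleted
u1.spill(100 * MIB)      # now fits again
print(rejected, u1.used_bytes // MIB)  # True 400
```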

Note:

For the above settings to take effect, you need to set the GUC parameter enable_perm_space to on.

If multiple users run queries that spill large intermediate result sets, you need to set the quota for each user involved.

2. Example

Example 1: The intermediate result set is spilled to disk on both the CN and the DNs, and the total amount of spilled data exceeds 1G

postgres=# create user u1 password 'abcd@1234';
CREATE USER
postgres=# grant select on customer_demographics to u1;
GRANT
postgres=# alter user u1 spill space '1G';
ALTER USER
postgres=# alter session set session authorization u1 password 'abcd@1234';
SET
postgres=> select * from customer_demographics c1, customer_demographics c2 order by c1.cd_demo_sk;
ERROR:  spill space is out of user's spill space limit

Example 2: The SQL query is not pushed down, and the intermediate result set is spilled to disk only on the CN

postgres=# set enable_stream_operator=off;
SET
postgres=# alter session set session authorization u1 password 'abcd@1234';
SET
postgres=> explain select * from customer_demographics c1, customer_demographics c2 order by c1.cd_demo_sk;
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 id |                                operation                                 |    E-rows     | E-width |     E-costs
----+--------------------------------------------------------------------------+---------------+---------+------------------
  1 | ->  Sort                                                                 | 3689472640000 |     112 | 2329492473090.72
  2 |    ->  Nested Loop (3,4)                                                 | 3689472640000 |     112 | 36894726400.00
  3 |       ->  Data Node Scan on customer_demographics "_REMOTE_TABLE_QUERY_" |       1920800 |      56 | 0.00
  4 |       ->  Data Node Scan on customer_demographics "_REMOTE_TABLE_QUERY_" |       1920800 |      56 | 0.00
(6 rows)
postgres=> select * from customer_demographics c1, customer_demographics c2 order by c1.cd_demo_sk;
ERROR:  spill space is out of user's spill space limit

Conclusion

The first scheme limits the amount of temporary file data spilled by each thread, while the second limits the amount spilled by each user. Choose the scheme and parameter values that fit your service requirements, so that an excessive amount of temporary files does not affect normal services.
