In a PostgreSql database, tables with btree indexes are added. The data insertion speed is very slow on a basis of 180 million data. The insertion performance deteriorates rapidly as the height of the index tree becomes higher and higher.
This article uses PostgreSql table partitioning to optimize table partitioning.
Compiled by Visual C++ build 1914, 64-bit
Delete primary tables, child tables, and sequences
DROP TABLE IF EXISTS test_table CASCADE ;
DROP SEQUENCE IF EXISTS test_table_seq;
Copy the code
Create a primary table and sequence. There is no need to add indexes to the primary table. The primary table does not store actual data
CREATE SEQUENCE test_table_seq MINVALUE 1 MAXVALUE 999999999999;
CREATE TABLE test_table (
"id" int8 NOT NULL DEFAULT nextval('test_table_seq'::regclass),
"data_id" numeric,
"logtime" numeric,
"data_value" numeric
);
Copy the code
Logtime is a millisecond timestamp. The table is partitioned according to logTime, converted to month, and divided into different tables by month
Create the primary table trigger function
CREATE OR REPLACE FUNCTION test_table_trigger()
RETURNS TRIGGER AS $$
Declare
now_data VARCHAR;
test_table_name VARCHAR;
sql_text VARCHAR;
BEGIN
now_data = substring(cast(to_timestamp(NEW.logtime/1000) as varchar),1.10);
now_data = REPLACE(now_data, The '-'.'_');
test_table_name = 'test_table_' || SUBSTRING(now_data,7);
sql_text = 'INSERT INTO ' || test_table_name || ' VALUES ( $1.* )';
EXECUTE sql_text USING NEW;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
Copy the code
The primary table sets the trigger
CREATE TRIGGER test_table_trigger
BEFORE INSERT ON test_table
FOR EACH ROW EXECUTE PROCEDURE test_table_trigger();
Copy the code
Create sub-table create function, batch generate all sub-tables by year, add check to the sub-table to check the logTime range and inherit the primary table, then add index to the sub-table
Create or replace function creat_test_table(table_year integer) RETURNS BOOLEAN as
$BODY$
Declare
test_table_name VARCHAR;
test_table_index_name VARCHAR;
start_time_text VARCHAR;
end_time numeric;
start_time numeric;
time1 VARCHAR;
time2 VARCHAR;
sql_text VARCHAR;
row_result RECORD;
BEGIN
start_time_text = table_year || '- 01-01';
start_time = EXTRACT(epoch FROM CAST(start_time_text AS TIMESTAMP)) * 1000 - 28800000.;
time1 = table_year || '- 02-01';
time2 = (table_year+1) || '- 01-01';
sql_text = 'select EXTRACT(epoch FROM cast(generate_series as TIMESTAMP))*1000-28800000 as bb from generate_series(to_date('||quote_literal(time1)||', '||quote_literal('yyyy-mm-dd')||'),to_date('||quote_literal(time2)||', '||quote_literal('yyyy-mm-dd')||'), '||quote_literal('1 month')||' ::interval)';
FOR row_result IN EXECUTE sql_text LOOP
end_time = row_result.bb;
test_table_name = 'test_table_'||SUBSTRING(replace(to_char(to_timestamp(start_time/1000), 'yyyy-mm-dd'), The '-'.'_'),0.8);
raise notice 'creat table %',test_table_name;
IF (select count(*) from pg_class where relname = test_table_name) ! =0 THEN
CONTINUE;
end if;
test_table_index_name = test_table_name || '_idx';
sql_text = 'CREATE Table '||test_table_name||' (CHECK(logtime>= ' || start_time || ' and logtime< ' ||end_time || ')) INHERITS(test_table); ';
EXECUTE sql_text;
sql_text = 'CREATE UNIQUE INDEX '||test_table_index_name||' ON ' ||test_table_name||' USING btree ( "data_id" ASC , "logtime" ASC ); ';
EXECUTE sql_text;
start_time = end_time;
END LOOP;
RETURN TRUE;
END
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
Copy the code
Enable constraint exclusion and maintain partitions
set constraint_exclusion = on;
set enable_partition_pruning = on;
Copy the code
Run the subtable creation function to create all subtables for 2021
SELECT creat_test_table(2021);
Copy the code
Thirteen tables and two functions appear after execution
Execute insert statement to test 1630857600000 as 2021-09-06 00:00:00
INSERT INTO test_table(data_id, logtime, data_value) VALUES (1.1630857600000.1);
Copy the code
This record would normally be inserted into the test_TABLE_2021_09 table
Verify test_table_2021_09 table
SELECT * FROM test_table_2021_09;
Copy the code
The table partition can be inserted properly
Since we set the trigger, the where clause containing the logTime condition should be optimized by the optimizer at query time
Let’s start validating the query statement by writing a query statement that contains the logTime condition in the WHERE clause to see its query plan
EXPLAIN SELECT * FROM test_table where logtime = 1630857600000;
Copy the code
Knowing that the query has been optimized, he matches which subtables to retrieve data from, rather than scanning all the subtables
Compare a query plan that scans all subtables. The query statement does not contain the logTime condition and therefore cannot be optimized by the optimizer
EXPLAIN SELECT * FROM test_table;
Copy the code
A PG database table partition belongs to a physical table and cannot be logically separated. You only need to perform operations on the main table to manage all data
Matters needing attention:
- When a conditional statement contains logtime, do not write an expression such as logtime/1000>1630857600000. This may not be optimized by the optimizer, and all sub-tables will be scanned during the query. When the number of tables increases, the query efficiency will be seriously affected.
- Do not write complex expressions in check to create child tables, preferably in the simplest format (field name conditional value), which contains calculations, or other expressions that will make optimizer optimization ineffective and thus scan all child tables during query, affecting query efficiency.
- There is no need to add indexes to the primary table because there is no data in the primary table.
- Checks in the primary table are inherited from the quilt table.
- You can create a child table by adding logic to the trigger that checks for the existence of the table name that you want to insert into the child table. Or write stored procedures that generate child tables uniformly at a fixed time.