This is the 7th day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.
For the scenario where the details are not queried but aggregated by dimension. If only ordinary MergeTree is used, the overhead of both storage space and temporary aggregation during query is quite large.
ClickHouse provides a “pre-aggregation” engine called SummingMergeTree for this scenario.
Case presentation
Create a table
create table order_table5
(
id UInt32,
item_id String,
total_amount Decimal(16.2),
create_time Datetime
) engine = SummingMergeTree(total_amount)
partition by toYYYYMMDD(create_time) primary key (id)
order by (id,item_id );
Copy the code
Insert data
insert into order_table5
values (101.'s_001'.1000.00.'the 2020-06-01 12:00:00'),
(102.'s_002'.2000.00.'the 2020-06-01 11:00:00'),
(102.'s_004'.2500.00.'the 2020-06-01 12:00:00'),
(102.'s_002'.2000.00.'the 2020-06-01 13:00:00'),
(102.'s_002'.12000.00.'the 2020-06-01 13:00:00'),
(102.'s_002'.600.00.'the 2020-06-02 12:00:00');
Copy the code
Perform the first query
Manually merge
OPTIMIZE TABLE order_table5 FINAL;
Copy the code
The query again
conclusion
◼ uses the columns specified in SummingMergeTree () as the summary data columns
◼ you can fill in multiple mandatory numeric columns. If you do not fill in, all non-dimensional and numeric columns are used as summary data columns
◼ uses the column of order by as the dimension column
◼ The other columns retain the first row in insertion order
◼ Data that is not in a partition is not aggregated
Development Suggestions
For aggregate tables, unique keys and serial numbers can be removed, and all fields are dimensions, measures, or timestamps.
The problem
Q: Can you execute the following SQL directly to get the total value:
select total_amount from order_table5 where item_id=' 'and create_date='XXX'Copy the code
A: No, it may contain some temporary details that haven’t been aggregated yet
If you want to get summary values, you still need to aggregate them using sum, which is a bit more efficient, but ClickHouse itself is stored in columns, so the efficiency gains are limited and not particularly significant.
select sum(total_amount) from order_table5 where item_id=' 'and create_date='XXX'Copy the code