Introduction to ClickHouse table engines

  • What a table engine determines
    • How and where data is stored: where it is written to and where it is read from
    • Which queries are supported, and how
    • Concurrent data access
    • Use of indexes, if present
    • Whether multithreaded request execution is possible
    • Data replication

Table engine families

  • MergeTree
    • The most powerful family of table engines, suitable for high-load tasks
    • Data is inserted quickly, part by part, and rules are then applied to merge those parts in the background
    • Supports data replication, partitioning, secondary data-skipping indexes, and other features the other engine families lack
    • Variants
      • MergeTree
      • ReplacingMergeTree
      • SummingMergeTree
      • AggregatingMergeTree
      • CollapsingMergeTree
      • VersionedCollapsingMergeTree
      • GraphiteMergeTree
  • Log
    • Lightweight engines with minimal functionality. They are most effective when you need to quickly write many small tables (up to about 1 million rows) and read them as a whole later.
    • Variants
      • TinyLog
      • StripeLog
      • Log
  • Integration Engines
    • Engines that communicate with other data storage and processing systems.
    • Variants
      • Kafka
      • MySQL
      • ODBC
      • JDBC
      • HDFS
      • S3
  • Special Engines (ClickHouse-specific; hard to classify elsewhere)
    • Distributed
    • MaterializedView
    • Dictionary
    • Merge
    • File
    • Null
    • Set
    • Join
    • URL
    • View
    • Memory
    • Buffer

MergeTree series

The MergeTree family of table engines is at the heart of ClickHouse’s data storage capabilities. They provide most of the features for resilience and high-performance data retrieval: columnar storage, custom partitioning, sparse primary indexes, secondary data-skipping indexes, and so on.

The basic MergeTree table engine can be considered the default table engine for single-node ClickHouse instances because it is generic and practical for a wide range of use cases.

ReplicatedMergeTree is a good choice for production use because it adds high availability to all the features of the regular MergeTree engine. Another benefit is automatic deduplication during data ingestion, so the client can safely retry if a network problem occurs during insertion.

All other engines in the MergeTree family add extra functionality for specific use cases. Typically this is implemented as additional data manipulation during background merges.

The main disadvantage of MergeTree engines is that they are rather heavyweight, so the typical pattern is to not have too many of them. If you need many small tables, for example for temporary data, consider the Log engine family.

MergeTree engine

MergeTree acts as the parent class with the most basic functionality; for special needs, use one of its subclasses: XXXMergeTree.

  • Supported features

    • Sorting by primary key
    • Partitioning, if a partitioning key is specified
    • Data replication
    • Data sampling
    • Concurrent data access, indexes, partitions, and CRUD operations
  • CREATE TABLE statement

    CREATE TABLE [IF NOT EXISTS] [db.]table_name
    (
        name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
        name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
        ...
        INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
        INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
    ) ENGINE = MergeTree()
    ORDER BY expr
    [PARTITION BY expr]
    [PRIMARY KEY expr]
    [SAMPLE BY expr]
    [TTL expr [DELETE|TO DISK 'xxx'|TO VOLUME 'xxx'] [, ...]
        [WHERE conditions]
        [GROUP BY key_expr [SET v1 = aggr_func(v1) [, v2 = aggr_func(v2) ...]]]]
    [SETTINGS name = value, ...]
  • ENGINE – the engine name and its parameters
  • ORDER BY – the sorting key
  • PARTITION BY – the partitioning key
  • PRIMARY KEY – the primary key (if it differs from the sorting key)
  • SAMPLE BY – an expression used for sampling
  • TTL – a list of rules specifying how long rows are stored and defining how data parts are moved between disks and volumes

DELETE | TO DISK 'xxx' | TO VOLUME 'xxx' specifies the action to perform when the condition is met (the specified time has arrived): remove the expired rows, or move the data part (if all rows in the part satisfy the expression) to the specified disk (TO DISK 'xxx') or volume (TO VOLUME 'xxx').
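
A minimal sketch of table-level TTL; the table and column names are made up, and the 'cold' volume assumes a storage policy that defines it:

CREATE TABLE test.ttl_demo
(
    event_time DateTime,
    value      UInt32
) ENGINE = MergeTree()
ORDER BY event_time
TTL event_time + INTERVAL 7 DAY TO VOLUME 'cold',
    event_time + INTERVAL 30 DAY DELETE;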

  • SETTINGS – additional parameters that control MergeTree behavior

    • index_granularity – index granularity (the number of rows between index marks). Default: 8192.
    • index_granularity_bytes – maximum index granularity in bytes. Default: 10 MB.
    • enable_mixed_granularity_parts – whether to let index_granularity_bytes control the index granularity
    • use_minimalistic_part_header_in_zookeeper – whether to store minimal data-part headers in ZooKeeper
    • min_merge_bytes_to_use_direct_io – the minimum data volume of a merge operation required for using direct I/O access to disk
    • merge_with_ttl_timeout – the minimum interval between merges with TTL, in seconds. Default: 86400 (1 day)
    • write_final_mark – whether to write the final index mark at the end of a data part. Default: 1 (changing this is not recommended)
    • merge_max_block_size – the maximum number of rows in a block for merge operations. Default: 8192
    • storage_policy – the storage policy. See Using Multiple Block Devices for Data Storage.
    • min_bytes_for_wide_part, min_rows_for_wide_part – the minimum number of bytes/rows in a data part for it to be stored in Wide format. You can set neither, one, or both.
  • Example

CREATE TABLE test.tables
(
    `order_id` UInt32,
    `order_sn` String,
    `date_added` UInt64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(toDate(intDiv(date_added, 1000))) -- date_added is a millisecond timestamp
ORDER BY order_sn;

INSERT INTO test.tables (order_id, order_sn, date_added) VALUES (1, '11', 1608963927769);

-- Force a merge
OPTIMIZE TABLE test.tables FINAL;

SELECT partition, name, active FROM system.parts WHERE table = 'tables';
  • Contents of the corresponding on-disk partition directory
  • *.bin – compressed binary data of each column
  • *.mrk – offsets of the data blocks within the corresponding *.bin file
  • primary.idx – the primary index (a sparse index)
  • checksums.txt – ensures data correctness and integrity
  • columns.txt – all the columns of the current table
  • count.txt – the number of rows in the current partition
  • partition.dat – the partition information
  • minmax_<column>.idx – the minimum and maximum values of the partitioning column within the part (here, minmax_date_added.idx)

ReplacingMergeTree

It differs from MergeTree in that it removes duplicate rows with the same sorting key value. Deduplication is performed only during background merges; before a merge happens, duplicate rows are still visible.
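
A minimal sketch with made-up names. The optional engine argument names a version column; among duplicates with the same sorting key, the row with the largest version survives:

CREATE TABLE test.replacing_demo
(
    order_sn String,
    amount   UInt32,
    ver      UInt32
) ENGINE = ReplacingMergeTree(ver)
ORDER BY order_sn;

INSERT INTO test.replacing_demo VALUES ('sn-1', 10, 1), ('sn-1', 20, 2);

-- Until a background merge runs, both rows may be returned;
-- FINAL forces deduplication at query time (at a performance cost)
SELECT * FROM test.replacing_demo FINAL;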

CollapsingMergeTree

The engine requires a token column, Sign, to be specified in the CREATE TABLE statement. Rows are divided into two categories based on the value of Sign: rows with Sign = 1 are status rows, and rows with Sign = -1 are cancel rows. Each time a state needs updating, a new status row is written; to delete a state, a cancel row with the same sorting key is written. During merges, each matched status/cancel pair collapses and is removed, while rows without a matching pair are kept. With multithreaded writes, the -1 and 1 rows may arrive out of order, in which case the pair cannot be collapsed.
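
A sketch with illustrative names. The second INSERT cancels the old state and writes the new one; aggregating with Sign gives correct results even before the pair is collapsed by a merge:

CREATE TABLE test.collapsing_demo
(
    UserID UInt64,
    Views  UInt32,
    Sign   Int8
) ENGINE = CollapsingMergeTree(Sign)
ORDER BY UserID;

INSERT INTO test.collapsing_demo VALUES (1, 5, 1);
-- Cancel the old state row, then write the new state
INSERT INTO test.collapsing_demo VALUES (1, 5, -1), (1, 6, 1);

-- Multiplying by Sign makes uncollapsed pairs cancel out at query time
SELECT UserID, sum(Views * Sign) AS Views
FROM test.collapsing_demo
GROUP BY UserID
HAVING sum(Sign) > 0;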

VersionedCollapsingMergeTree

To solve the problem that CollapsingMergeTree cannot collapse rows correctly when they are written out of order, the VersionedCollapsingMergeTree engine adds a Version column to the CREATE TABLE statement, which records the correspondence between status rows and cancel rows even when they arrive out of order. Rows with the same primary key, the same Version, and opposite Sign are deleted during compaction.
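
A sketch with illustrative names; the engine takes both the Sign and Version columns, and a status/cancel pair collapses during a merge when the primary key and Version match, regardless of write order:

CREATE TABLE test.vcollapsing_demo
(
    UserID  UInt64,
    Views   UInt32,
    Sign    Int8,
    Version UInt8
) ENGINE = VersionedCollapsingMergeTree(Sign, Version)
ORDER BY UserID;

-- The cancel row arrives first here; the matching Version = 1 still
-- lets it pair with the later status row
INSERT INTO test.vcollapsing_demo VALUES (1, 5, -1, 1);
INSERT INTO test.vcollapsing_demo VALUES (1, 5, 1, 1), (1, 6, 1, 2);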

SummingMergeTree

When merging data parts of a SummingMergeTree table, ClickHouse merges all rows with the same primary key into one row and sums the values of the specified columns.
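
A minimal sketch with made-up names; the engine argument lists the column to sum:

CREATE TABLE test.summing_demo
(
    key   String,
    value UInt64
) ENGINE = SummingMergeTree(value)
ORDER BY key;

INSERT INTO test.summing_demo VALUES ('a', 1), ('a', 2);

-- Summation happens only at merge time, so aggregate at query time to be safe
SELECT key, sum(value) AS value FROM test.summing_demo GROUP BY key;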

AggregatingMergeTree

SummingMergeTree can only sum; AggregatingMergeTree supports all kinds of aggregations, but you must specify the aggregate function for each aggregated column:

CREATE TABLE agg_table
(   CounterID UInt8,
    StartDate Date,
    UserID AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree() 
PARTITION BY toYYYYMM(StartDate) 
ORDER BY (CounterID, StartDate);
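
Columns of type AggregateFunction are written with the -State combinator and read back with -Merge. A usage sketch against the table above (the inserted values are made up):

INSERT INTO agg_table
SELECT
    1 AS CounterID,
    toDate('2021-01-01') AS StartDate,
    uniqState(number) AS UserID
FROM numbers(100)
GROUP BY CounterID, StartDate;

SELECT CounterID, StartDate, uniqMerge(UserID) AS uniq_users
FROM agg_table
GROUP BY CounterID, StartDate;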

Log series

These engines were developed for scenarios where you need to write many tables with small data volumes (less than about a million rows).

The common features of several Log table engines are:

  • Data is appended to disk sequentially when written;
  • DELETE and UPDATE are not supported;
  • Indexes are not supported;
  • Atomic writes are not supported;
  • INSERT blocks SELECT operations.

TinyLog is the simplest engine in the series, providing the least functionality and the lowest performance. It does not support parallel reads or concurrent data access, and it stores each column in a separate file.

Log and StripeLog support concurrent reads: each reading thread processes a separate block of data. The Log engine uses a separate file for each column, while StripeLog stores all data in a single file. Of the two, the Log engine is the more efficient.

StripeLog

This engine is used in scenarios where many tables with small data volumes (less than a million rows) are written. The StripeLog engine does not support the ALTER UPDATE and ALTER DELETE operations, and SELECT returns rows in an unpredictable order.
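
A minimal sketch with made-up names; each INSERT is appended as its own block in the single data file:

CREATE TABLE stripe_log_demo
(
    timestamp DateTime,
    message   String
) ENGINE = StripeLog;

INSERT INTO stripe_log_demo VALUES (now(), 'first message');
INSERT INTO stripe_log_demo VALUES (now(), 'second message');

-- Blocks may come back in any order
SELECT * FROM stripe_log_demo;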

Log

Log differs from TinyLog in that a small file of "marks" accompanies the column files. These marks are written on every data block and contain offsets that indicate where to start reading the file in order to skip a given number of rows. A write operation blocks reads and other writes; indexes are not supported. Likewise, if writing to the table fails, the table is corrupted, and reading from it returns an error. The Log engine is suitable for temporary data, write-once tables, and testing or demonstration purposes.

TinyLog

The simplest table engine for storing data on disk. Each column is stored in a separate compressed file. When written, data is appended to the end of the file.

Integration series

These table engines are primarily used to import external data into ClickHouse, or to operate directly on external data sources from within ClickHouse.

  • Kafka: imports data from a Kafka topic directly into ClickHouse;
  • MySQL: uses MySQL as the storage engine, so MySQL tables can be queried directly from ClickHouse (see the sketch after this list);
  • JDBC/ODBC: reads data sources through the specified JDBC or ODBC connection string;
  • HDFS: directly reads data files of specific formats on HDFS.
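
A sketch of the MySQL table engine; the connection parameters and names below are made up. The ClickHouse table acts as a proxy for the remote MySQL table shop.orders:

CREATE TABLE mysql_orders
(
    order_id UInt32,
    order_sn String
) ENGINE = MySQL('mysql-host:3306', 'shop', 'orders', 'user', 'password');

-- Executed against the remote MySQL table
SELECT count() FROM mysql_orders;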

Special series

Distributed Table Engine

  • The Distributed engine does not store data itself, but enables distributed query processing across multiple servers. Reads are automatically parallelized.
  • During a read, the indexes of the tables on the remote servers, if any, are used.
  • Distributed engine parameters: the name of a cluster in the server configuration file, the name of the remote database, the name of the remote table, and optionally a sharding key, as sketched below. ENGINE = Distributed(cluster, db, table[, sharding_key])
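
A sketch, assuming a cluster named my_cluster is defined in the server configuration and a local table default.orders_local exists on every shard:

-- Reads fan out to every shard; rand() distributes writes across shards
CREATE TABLE orders_all AS default.orders_local
ENGINE = Distributed(my_cluster, default, orders_local, rand());

SELECT count() FROM orders_all;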

Merge

The Merge engine (not to be confused with the MergeTree engine) does not store data itself, but can be used to read from any number of other tables simultaneously. Reads are automatically parallelized; writes are not supported.

Merge(hits, '^WatchLog')

Data is read from tables in the hits database whose names match the regular expression '^WatchLog'.
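
A sketch with an illustrative column list; the columns must exist with compatible types in the underlying WatchLog tables:

CREATE TABLE watchlog_all
(
    date   Date,
    UserID UInt64
) ENGINE = Merge(hits, '^WatchLog');

-- Reads every matching hits.WatchLog* table in parallel
SELECT * FROM watchlog_all;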

File Table Engine

The File table engine stores data in a file in one of the supported file formats (TabSeparated, Native, etc.), and can be used, for example, to read data files generated by other systems.
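
A minimal sketch with made-up names. The engine stores its data as data.TabSeparated inside the table's data directory, so a file exported by another system can be dropped in and queried:

CREATE TABLE file_demo
(
    name  String,
    value UInt32
) ENGINE = File(TabSeparated);

INSERT INTO file_demo VALUES ('one', 1);
SELECT * FROM file_demo;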

Null Table Engine

Data written to a Null table is ignored, and reading from it returns an empty result.

Set Table Engine

A data set that always resides in RAM; duplicate values are ignored. It can be used on the right-hand side of the IN operator, as sketched below.
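
A sketch with made-up names (the events table and its user_id column are assumed to exist):

CREATE TABLE allowed_users (user_id UInt64) ENGINE = Set;
INSERT INTO allowed_users VALUES (1), (2), (2);  -- the duplicate 2 is ignored

SELECT * FROM events WHERE user_id IN allowed_users;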

URL Table Engine

Queries data stored on a remote HTTP/HTTPS server; works much like the File engine.

View Table Engine

Used to build views (see CREATE VIEW query for more information). It stores no data, only the specified SELECT query.

Memory Table Engine

The Memory engine stores data in RAM in uncompressed form. Data is stored exactly as it was received when read. Read and write operations do not block each other. Indexes are not supported, and queries are parallelized. Maximum speed (over 10 GB/sec) is reached on simple queries because there is no disk read and no need to decompress or deserialize data. When the server restarts, the data in the table disappears and the table becomes empty.

Buffer Table Engine

Buffers writes in RAM and periodically flushes them to another table. During a read operation, data is read from both the buffer and the other table, as sketched below.
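
A sketch following the documented parameter order Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes); the destination table test.dest is assumed to exist:

-- Each of the 16 buffer layers is flushed to test.dest when it has held data
-- for 10..100 seconds, or accumulated 10K..1M rows, or 10 MB..100 MB
CREATE TABLE test.dest_buffer AS test.dest
ENGINE = Buffer(test, dest, 16, 10, 100, 10000, 1000000, 10000000, 100000000);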

ClickHouse database engines

The Atomic database engine is used by default.

  • MySQL
  • MaterializeMySQL
  • Lazy
  • Atomic
  • PostgreSQL

MySQL

The MySQL engine is used to map tables from a remote MySQL server to ClickHouse and allows you to perform INSERT and SELECT queries against the tables to facilitate data exchange between ClickHouse and MySQL.

Unsupported operations:

  • RENAME
  • CREATE TABLE
  • ALTER
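
A sketch with made-up credentials; every table in the remote shop database becomes queryable from ClickHouse:

CREATE DATABASE mysql_db ENGINE = MySQL('localhost:3306', 'shop', 'user', 'password');

SHOW TABLES FROM mysql_db;
SELECT * FROM mysql_db.orders LIMIT 10;  -- orders is an assumed remote table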

MaterializeMySQL

Maps tables on a remote MySQL server into ClickHouse, making the ClickHouse service act as a MySQL replica that reads the binlog and executes DDL and DML requests. This provides real-time synchronization with a business database based on the MySQL binlog mechanism.

CREATE DATABASE mysql ENGINE = MaterializeMySQL('localhost:3306', 'db', 'user', '***');
SHOW TABLES FROM mysql;

Lazy

Keeps tables in RAM only for expiration_time_in_seconds after the last access. Can only be used with Log-family tables.

CREATE DATABASE testlazy ENGINE = Lazy(expiration_time_in_seconds);

Atomic

It supports non-blocking DROP TABLE and RENAME TABLE queries, as well as atomic EXCHANGE TABLES t1 AND t2 queries. The Atomic database engine is used by default.

CREATE DATABASE test ENGINE = Atomic;

PostgreSQL

Connects to a remote PostgreSQL server and provides read and write operations.

CREATE DATABASE test_database 
ENGINE = PostgreSQL('postgres1:5432', 'test_database', 'postgres', 'mysecretpassword', 1);
