Author: LemonNan
The original address: mp.weixin.qq.com/s/SUUHF9R_F…
Note: please credit the author and include the original address when reposting.
Introduction
ClickHouse is an analytical database that offers many ways to synchronize data with other components. This article uses Kafka as the data source and shows how to synchronize Kafka data into ClickHouse.
Flow chart
Without further ado, here is the data synchronization flow chart:
Creating the tables
Before synchronizing any data, we need to create the corresponding ClickHouse tables. According to the flow chart above, we need three tables:
1. Data table
2. Kafka engine table
3. Materialized view
Data table

```sql
CREATE DATABASE IF NOT EXISTS data_sync;

CREATE TABLE IF NOT EXISTS data_sync.test
(
    name String DEFAULT 'lemonNan' COMMENT 'name',
    age int DEFAULT 18 COMMENT 'age',
    gongzhonghao String DEFAULT 'lemonCode' COMMENT 'gongzhonghao',
    my_time DateTime64(3, 'UTC') COMMENT 'time'
)
ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(my_time)
ORDER BY my_time
```
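ReplacingMergeTree deduplicates rows that share the same ORDER BY key (here my_time) during background merges, keeping the most recently inserted row. A minimal Python sketch of that semantics (an illustration only, not ClickHouse code; the merge_replacing helper and the sample rows are made up for this example):

```python
# Illustrative sketch: ReplacingMergeTree keeps at most one row per sorting
# key after a background merge. With ORDER BY my_time, rows sharing the same
# my_time collapse to the last inserted one.

def merge_replacing(rows, key):
    """Simulate the deduplication a ReplacingMergeTree merge performs."""
    seen = {}
    for row in rows:               # later inserts overwrite earlier ones
        seen[key(row)] = row
    return list(seen.values())

rows = [
    {"name": "lemonNan", "age": 18, "my_time": "2022-03-06 18:00:00.001"},
    {"name": "lemonNan", "age": 20, "my_time": "2022-03-06 18:00:00.001"},
    {"name": "lemonNan", "age": 20, "my_time": "2022-03-06 18:00:00.002"},
]

merged = merge_replacing(rows, key=lambda r: r["my_time"])
print(len(merged))  # two distinct my_time values remain
```

Note that deduplication happens asynchronously at merge time, so duplicates may still be visible in queries until a merge runs (or FINAL is used).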
Kafka engine table

```sql
-- Create the Kafka engine table; broker: 172.16.16.4, topic: lemonCode
CREATE TABLE IF NOT EXISTS data_sync.test_queue
(
    name String,
    age int,
    gongzhonghao String,
    my_time DateTime64(3, 'UTC')
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = '172.16.16.4:9092',
    kafka_topic_list = 'lemonCode',
    kafka_group_name = 'lemonNan',
    kafka_format = 'JSONEachRow',
    kafka_row_delimiter = '\n',
    kafka_schema = '',
    kafka_num_consumers = 1
```
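The kafka_format = 'JSONEachRow' setting means each Kafka message must be a JSON object whose keys match the engine table's column names, one object per row. A quick Python sketch of building such payloads (the to_json_each_row helper is hypothetical, written just for this illustration):

```python
# JSONEachRow: newline-delimited JSON, one object per row, keys matching
# the Kafka engine table's columns (name, age, gongzhonghao, my_time).
import json

def to_json_each_row(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)

payload = to_json_each_row([
    {"name": "lemonNan", "age": 20, "gongzhonghao": "lemonCode",
     "my_time": "2022-03-06 18:00:00.001"},
    {"name": "lemonNan", "age": 20, "gongzhonghao": "lemonCode",
     "my_time": "2022-03-06 18:00:00.002"},
])
print(payload)
```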
Materialized view

```sql
CREATE MATERIALIZED VIEW IF NOT EXISTS test_mv TO test AS
SELECT name, age, gongzhonghao, my_time
FROM test_queue;
```
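Putting the three pieces together: the Kafka engine table behaves like a stream whose rows can be read once, and the materialized view moves each consumed row into the storage table. A toy in-memory model of this pipeline (names mirror the SQL above; the classes and functions here are illustrative, not ClickHouse APIs):

```python
# Toy model of the pipeline: Kafka engine table -> materialized view -> table.

class KafkaEngineTable:
    """Acts like the Kafka engine table: rows can be consumed only once."""
    def __init__(self):
        self._queue = []
    def push(self, row):           # a message arrives from Kafka
        self._queue.append(row)
    def consume(self):             # reading drains the stream
        rows, self._queue = self._queue, []
        return rows

test_queue = KafkaEngineTable()
test = []                          # stands in for the storage table

def materialized_view_tick():
    """Like the MV: SELECT ... FROM test_queue, INSERT INTO test."""
    test.extend(test_queue.consume())

test_queue.push({"name": "lemonNan", "age": 20})
materialized_view_tick()
print(len(test))  # the row landed in the storage table
```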
Data simulation
Next we simulate the flow in the diagram. If you already have Kafka installed, you can skip the installation step.
Installing Kafka
For demonstration purposes, Kafka is installed as a single node.
```shell
# Start Zookeeper
docker run -d --name zookeeper -p 2181:2181 -t wurstmeister/zookeeper

# Start Kafka; the IP in KAFKA_ADVERTISED_LISTENERS is the machine's IP
docker run -d --name kafka -p 9092:9092 \
  -e KAFKA_BROKER_ID=0 \
  -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
  --link zookeeper \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://172.16.16.4:9092 \
  -e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092 \
  -t wurstmeister/kafka
```
Sending data with the Kafka CLI

```shell
# Start the producer
kafka-console-producer.sh --bootstrap-server 172.16.16.4:9092 --topic lemonCode

# Send the following messages
{"name":"lemonNan","age":20,"gongzhonghao":"lemonCode","my_time":"2022-03-06 18:00:00.001"}
{"name":"lemonNan","age":20,"gongzhonghao":"lemonCode","my_time":"2022-03-06 18:00:00.001"}
{"name":"lemonNan","age":20,"gongzhonghao":"lemonCode","my_time":"2022-03-06 18:00:00.002"}
{"name":"lemonNan","age":20,"gongzhonghao":"lemonCode","my_time":"2022-03-06 23:59:59.002"}
```
Check the ClickHouse data table

```sql
select * from test;
```
At this point, the data has been synchronized from Kafka into ClickHouse. Quite convenient.
About data replicas
The replication engine here is ReplicatedMergeTree (note: the example table above used ReplacingMergeTree, which deduplicates rather than replicates). One reason to use a Replicated* engine is to keep multiple copies of the data and reduce the risk of data loss: with ReplicatedMergeTree, data is automatically synchronized to the other nodes in the same shard.
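As a sketch, a replicated variant of the storage table might look like the following. The ZooKeeper path and the {shard}/{replica} macros are assumptions that depend on your cluster configuration; adjust them to match your setup:

```sql
-- Sketch only: replicated variant of the storage table.
-- The ZooKeeper path and {shard}/{replica} macro values are assumptions.
CREATE TABLE IF NOT EXISTS data_sync.test
(
    name String DEFAULT 'lemonNan' COMMENT 'name',
    age int DEFAULT 18 COMMENT 'age',
    gongzhonghao String DEFAULT 'lemonCode' COMMENT 'gongzhonghao',
    my_time DateTime64(3, 'UTC') COMMENT 'time'
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/data_sync/test', '{replica}')
PARTITION BY toYYYYMM(my_time)
ORDER BY my_time
```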
In practice, there is another way to keep replicas: have each node consume the data with a different Kafka consumer group.
See the figures below for details:
Replica scheme 1
The replication mechanism of ReplicatedMergeTree synchronizes data to the other nodes in the same shard, consuming resources on the consuming node during synchronization.
Replica scheme 2
Messages are broadcast to multiple ClickHouse nodes through Kafka's own consumer-group mechanism, so the synchronization consumes no additional ClickHouse resources.
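The broadcast in scheme 2 relies on Kafka's consumer-group semantics: consumers in the same group split a topic's messages between them, while each distinct group receives every message. Giving each ClickHouse node its own kafka_group_name therefore delivers the full stream to every node. A simplified in-memory model (the deliver function and the round-robin assignment are illustrative, not Kafka's actual partition-assignment protocol):

```python
# Simplified model of Kafka consumer-group delivery:
# - every group sees all messages
# - within a group, messages are split between the consumers

def deliver(messages, groups):
    """Return {group: {consumer: [messages]}} under round-robin assignment."""
    out = {}
    for group, consumers in groups.items():
        out[group] = {c: [] for c in consumers}
        for i, msg in enumerate(messages):      # each group sees all messages
            consumer = consumers[i % len(consumers)]
            out[group][consumer].append(msg)    # split within the group
    return out

msgs = ["m1", "m2", "m3", "m4"]
# two ClickHouse nodes, each with its own consumer group
result = deliver(msgs, {"ch-node-1": ["c1"], "ch-node-2": ["c2"]})
```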
Notes
Things worth knowing about the setup process:
- The address 172.16.16.4 that appears in this article is the machine's internal IP.
- By convention, Kafka engine tables end with "queue" and materialized views end with "mv", which makes them easier to recognize.
Conclusion
This article described how to synchronize data from Kafka to ClickHouse and how to keep multiple replicas. ClickHouse also provides many other integrations, including Hive, MongoDB, S3, SQLite, Kafka, and more. See the link below.
Table engines for integrations: clickhouse.com/docs/zh/eng…
Finally
Scan the QR code below or search LemonCode to exchange and learn together!