What is a ClickHouse?

ClickHouse: Is a column database management system (DBMS) for online analytics (OLAP)

Let’s get some basic concepts straight

  • OLTP: is a traditional relational database. It mainly operates adding, deleting, modifying, and searching, and emphasizes transaction consistency, such as in banking systems and e-commerce systems
  • OLAP: warehouse database, mainly reads data, does complex data analysis, focuses on technical decision support, provides intuitive and simple results

Next, let’s use a diagram to understand the difference between a column database and a row database

In traditional row database systems (MySQL, Postgres, and MS SQL Server), data is stored in the following order:In a column database system (ClickHouse), data is stored in the following order:

Comparison between the two storage methods:

The above is a basic introduction to ClickHouse, more can be found in the official manual

Second, business issues

The large table with 50 million data volume and two auxiliary tables are stored in Mysql on the business end. The cost of a single linked table query is 3min+ and the execution efficiency is very low. After optimizing indexes, leveling tables, and logic, the results were low, so ClickHouse was used to solve the problem

Finally, through optimization, the query time is reduced to 1s, query efficiency is improved 200 times!

Hope that through this article, can help you quickly master this sharp tool, and in practice to avoid detours.

ClickHouse practice

1. Install Clickhouse on a Mac

I was installed via Docker and view the tutorial. You can also download CK compilation and installation, relatively troublesome.

2. Data migration: Mysql to ClickHouse

ClickHouse supports most of the syntax of Mysql and has a low migration cost. There are currently five migration options:

  • Create table engin mysql, mapping scheme data is still in mysql
  • Insert into select FROM
  • Create table as select from, and import the table at the same time
  • CSV Offline import
  • streamsets

Choose the third option for data migration:

CREATE TABLE [IF NOT EXISTS] [db.]table_name ENGINE = Mergetree AS SELECT * FROM mysql('host:port', 'db', 'database', 'user', 'password')

3. Performance test comparison
type The amount of data Table size Query speed
Mysql 50 million 10G 205s
ClickHouse 50 million 600MB Within 1 s
4. Data synchronization scheme

A temporary table Photo source:ctripCreate a Temp intermediate table to fully synchronize Mysql data to the Temp table in the ClickHouse, and then replace the tables in the original ClickHouse. This applies to scenarios where the amount of data is moderate and the number of increments and variables is frequent

synch

Open source synchronization software recommended: Synch works by obtaining SQL statements from Mysql binlog logs and consuming tasks through message queues

5. Why is ClickHouse fast?
  • Only the column data to be calculated is read instead of the whole row data, which reduces the I/O cost
  • Same column, same type, with a ten-fold compression boost, further reducing IO
  • Clickhouse does personalized search algorithms for different storage scenarios

Four, encountered pit

1. Data type difference between ClickHouse and mysql

Mysql > select * from ‘Mysql’ where ‘error’; The solution: LEFT JOIN B B ON toUInt32(H. ID) = toUInt32(ec.post_id)

2. The deletion or update is performed asynchronously to ensure final consistency

A search of the CK manual reveals that even Mergetree, which has the best data consistency support, only guarantees final consistency:If you have high requirements on data consistency, you are advised to perform full synchronization

Five, the summary

ClickHouse is a great solution to the Mysql query bottleneck. 90% of queries under 2 billion rows can be returned within 1s. As the volume of data increases, ClickHouse also supports clustering.

References:

ClickHouse official manual CK

ClickHouse in Ctrip hotel app CAI Yueyi

How to select Roin123 for ClickHouse engine