Content Outline:

  1. Background;
  2. ClickHouse introduction;
  3. ClickHouse architecture and performance;
  4. ClickHouse practices at Good Future;
  5. Construction and planning;
  6. References.

Background

In the early days of our log center, log search and analysis were implemented mainly on Elasticsearch. As more and more services were onboarded to the log center and the data volume grew, the requirements for log analysis and alert monitoring became increasingly complex, and Elasticsearch could no longer meet them well. We therefore needed to choose a database suited to the scenario. What we need:

  • The data volume will be large, so the database must be distributed;
  • Support for real-time writes and fast computation, so calculations finish within a short time;
  • Strong SQL capability, so real-time metrics can be expressed in SQL;
  • Limited manpower, so operation and maintenance must be simple;
  • Efficient compression, since servers are limited and we want to store more data on fewer servers.

Based on these requirements we chose ClickHouse. The following sections describe ClickHouse's features, system architecture, and usage.

ClickHouse introduction

1. ClickHouse features

Figure 2-1 ClickHouse features diagram

As you can see, ClickHouse's features are exactly what we need. The core features are detailed below:

1) Complete DBMS functions:

ClickHouse has complete database management functionality, so it qualifies as a DBMS (Database Management System) rather than just a database.

As a DBMS, it has some basic functions, such as:

  • DDL (Data Definition Language): databases, tables, and views can be created, modified, or dropped dynamically without restarting the service;
  • DML (Data Manipulation Language): data can be queried, inserted, modified, or deleted dynamically;
  • Permission control: operation permissions can be set on databases or tables at user granularity to ensure data security;
  • Data backup and recovery: backup, export, import, and recovery mechanisms are provided, meeting the requirements of production environments;
  • Distributed management: a cluster mode is provided to manage multiple database nodes automatically.

2) Column storage and data compression

Column storage and data compression are essential features of a high-performance database. The simplest and most effective ways to make queries faster are to reduce the amount of data scanned and the amount of data transferred, and column storage and data compression help with both. Because ClickHouse is a true column store, each column is kept in its own file, the data in a file has a consistent type, and it therefore compresses much more efficiently.
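
To make this concrete, here is a minimal sketch (the table and column names are made up, not our production schema): a compression codec can be chosen per column when a table is created, and the per-column compression effect can be checked in the system.columns table.

-- Hypothetical log table: each column lives in its own file(s), so a codec can be chosen per column.
CREATE TABLE app_log
(
    event_time DateTime CODEC(Delta, ZSTD),  -- timestamps compress well with delta encoding
    level      LowCardinality(String),       -- dictionary-encodes a small set of values
    message    String CODEC(ZSTD(3))
)
ENGINE = MergeTree
ORDER BY event_time;

-- Inspect the compression ratio per column.
SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS raw,
    formatReadableSize(data_compressed_bytes)   AS compressed
FROM system.columns
WHERE database = currentDatabase() AND table = 'app_log';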

3) Vectorization execution engine

Vectorized execution builds on column storage. The main idea is to read a batch of a column from disk at a time and organize it as an array; each call to next then processes the whole column array in a loop. This greatly reduces the number of next calls, improves CPU utilization accordingly, and keeps the data organized together.

CPU hardware features such as SIMD can then be exploited further, and loading the data into the CPU cache improves the cache hit ratio and efficiency. With the combined optimization of column storage and a vectorized execution engine, query execution speed takes a huge leap forward.

4) Relational model and SQL query

ClickHouse is a relational database. It supports roughly 90% of SQL as its query language, including GROUP BY, ORDER BY, and so on.
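
For example, a typical analytical statement of this kind might look like the following (the nginx_log table and its columns are hypothetical):

-- Aggregation with GROUP BY / ORDER BY, the bread-and-butter of ClickHouse queries.
SELECT
    domain,
    count()      AS requests,
    avg(cost_ms) AS avg_cost_ms
FROM nginx_log
WHERE event_date = today()
GROUP BY domain
ORDER BY requests DESC
LIMIT 10;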

5) Diversified table engines

Like MySQL, ClickHouse abstracts its storage layer and treats the storage engine as a separate layer of interfaces. ClickHouse implements a wide variety of table engines, such as MergeTree, Log, and Memory; each has its own characteristics, and you can choose the right one for your business scenario.
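
As a rough illustration (the tables below are hypothetical), the engine is simply chosen per table in the ENGINE clause:

CREATE TABLE lookup_cache (k String, v String) ENGINE = Memory;   -- kept in RAM, lost on restart
CREATE TABLE import_buffer (line String)       ENGINE = TinyLog;  -- simple append-only log engine
CREATE TABLE metrics
(
    ts    DateTime,
    name  String,
    value Float64
)
ENGINE = MergeTree        -- the workhorse family for large, sorted, compressed data
ORDER BY (name, ts);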

6) Multi-threading and distribution

ClickHouse has almost all the typical characteristics of a modern high-performance database and exhausts nearly every available means of improving performance; widely used techniques such as multi-threading and distributed processing are naturally among them.

7) Multi-master architecture

Distributed systems such as HDFS, Spark, HBase, and Elasticsearch all use a Master/Slave architecture, in which a master node acts as the leader coordinating the whole system. ClickHouse, on the other hand, uses a multi-master architecture in which the cluster nodes are peers, which sets it apart from those databases.

8) Online queries

ClickHouse uses an LSM-tree-like structure, which supports very high insert throughput. It is also optimized to respond quickly to complex queries without any data preprocessing, achieving the effect of a real-time data warehouse.

9) Data fragmentation and distributed query

ClickHouse has native distributed capabilities and supports data sharding, splitting data horizontally. This is an effective way to overcome storage and query bottlenecks with massive data. ClickHouse does not have the highly automated sharding of some other distributed systems; instead it provides the concepts of local tables and Distributed tables. A local table is equivalent to one shard of the data. A Distributed table stores no data itself; it is a proxy for accessing local tables, playing a role similar to database-sharding middleware. Through Distributed tables, access to multiple data shards can be proxied and distributed queries performed.

2. ClickHouse common application scenarios

  • Data storage and statistics in the telecom industry;
  • User behavior data recording and analysis at Sina Weibo;
  • User behavior analysis for ad networks, RTB, and e-commerce;
  • Log analysis;
  • Detection and remote-sensing information mining;
  • Business intelligence;
  • Data processing and value data analysis for online games and the Internet of Things;
  • The largest deployment comes from Yandex's web analytics service, Yandex.Metrica.

ClickHouse architecture and performance

ClickHouse's cluster architecture differs from that of other data clusters in that its clustering capability is at the table level, whereas familiar big data architectures such as the Hadoop family are service-level clusters. For example, in an HDFS cluster all files are split and replicated, while in a ClickHouse cluster each table decides its own sharding and replication, which also means ClickHouse can run on a single node. Readers with experience of other big data systems may find this design a little strange; it is introduced in detail below, from the single-machine architecture to the cluster architecture.


1. ClickHouse single-machine architecture design

There is little official material on ClickHouse's internal architecture. Based on existing experience and external sources, I have reconstructed the ClickHouse architecture according to my own understanding, as follows:



Figure 3-1 ClickHouse single-server architecture diagram

1) Parser and Interpreter

Parser and Interpreter are two very important groups of interfaces. The Parser recursively parses a SQL statement into an AST syntax tree, and different kinds of SQL invoke different Parser implementation classes. The Interpreter then interprets the AST and builds the query's execution pipeline. The Interpreter plays a role similar to a service layer, stringing the whole query process together and gathering the resources it needs according to the Interpreter type: it first parses the AST object, then executes the "business logic" (branch decisions, setting parameters, calling interfaces, and so on), and finally returns an IBlock object, on which the query execution pipeline is set up in the form of threads.

2) Table engine

The table engine is a distinctive feature of ClickHouse and, as mentioned above, ClickHouse has many table engines, each implemented by a different subclass. Table engines use the IStorage interface, which defines DDL operations (such as ALTER, RENAME, OPTIMIZE, and DROP), a read method, and a write method, responsible for data definition, querying, and writing respectively.

3) DataType

Serialization and deserialization of data are handled by DataType. The IDataType interface has different implementation classes for different data types. DataType serializes and deserializes data, but it does not interact directly with memory or disk; instead it forwards the data to Column and Field for processing.

4) Column and Field

Column and Field are ClickHouse's most basic mapping units for data. As a thoroughly column-oriented database, ClickHouse stores data by column, with one column in memory represented by a Column object. A Column object is divided into an interface and its implementations: the IColumn interface defines methods for relational operations on data, such as insertRangeFrom and insertFrom for inserting data, cut for pagination, and filter for filtering. The concrete implementations of these methods are provided by type-specific objects such as ColumnString, ColumnArray, and ColumnTuple.

In most cases ClickHouse operates on data a whole column at a time, but there are exceptions. If a single concrete value (that is, one row of a single column) needs to be manipulated, the Field object is used, which represents a single value. In contrast to the generic design of Column objects, Field uses an aggregated design: the Field object aggregates 13 data types, including Null, UInt64, String, and Array, together with their corresponding processing logic.

5) Block

Data manipulation inside ClickHouse is directed at Block objects and proceeds in the form of streams. Although Column and Field form the basic mapping units of data, they still lack some information needed for actual operations, such as the data type and the column name, so ClickHouse designs the Block object, which can be thought of as a subset of a data table. A Block object is essentially a triplet of data object, data type, and column name, namely Column, DataType, and the column name string. Column provides the ability to read data, and DataType knows how to serialize and deserialize it, so Block further abstracts and encapsulates these objects and simplifies their use; a whole series of data operations can be performed with Block objects alone. In its implementation, Block does not aggregate Column and DataType directly but references them indirectly through the ColumnWithTypeAndName object.

2. ClickHouse cluster architecture design

ClickHouse clusters are managed through configuration: clusters are defined under clickhouse_remote_servers, where the cluster name and the node information it needs are configured, and through which the sharding and replica mechanism is set up.

The simple configuration is as follows:

<yandex>
 <clickhouse_remote_servers>
 <cluster1>
 <shard>
 <internal_replication>true</internal_replication>
 <replica>
 <host>clickhouse-node1</host>
 <port>9000</port>
 </replica>
 <replica>
 <host>clickhouse-node2</host>
 <port>9001</port>
 </replica>
 </shard>
 <shard>
 <internal_replication>true</internal_replication>
 <replica>
 <host>clickhouse-node3</host>
 <port>9000</port>
 </replica>
 <replica>
 <host>clickhouse-node4</host>
 <port>9001</port>
 </replica>
 </shard>
 ...
 </cluster1>
 ...
 </clickhouse_remote_servers>
 ...
</yandex>

The combination Replicated*MergeTree + Distributed is the "local table + distributed table" engine mode, which supports multiple shards and multiple replicas. The Replicated*MergeTree and Distributed engines are described in detail below.

1) Replicated*MergeTree engine

Let's start with the MergeTree engine, which is the heart of ClickHouse's data storage. The MergeTree family consists of engines that extend MergeTree with additional functionality, including ReplacingMergeTree, SummingMergeTree, and others; details can be found in the MergeTree engine introduction on the official website. MergeTree engines without the Replicated prefix can be regarded as single-machine engines, i.e. they exist on a single node.

With Replicated*MergeTree, the MergeTree engine's data is coordinated through ZooKeeper to achieve replication. For example, a Replicated*MergeTree table can be created on every node of cluster1. The configuration file above shows that clickhouse-node1 and clickhouse-node2 belong to the same shard, and each replica tag inside a shard tag represents a replica node. When the table is created, the two replicas are pointed at the same ZooKeeper directory, so data written to node1 is replicated to node2 and data written to node2 is synchronized to node1, achieving the expected replication effect.
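
A minimal sketch of such a table follows, assuming the {shard} and {replica} macros are defined in each node's configuration; the table name events_local matches the example in Figure 3-2, while the columns are made up:

-- Run on every node of cluster1 (ON CLUSTER requires distributed DDL to be enabled; otherwise run it per node).
-- Replicas in the same shard point at the same ZooKeeper path and therefore replicate each other's data.
CREATE TABLE events_local ON CLUSTER cluster1
(
    event_time DateTime,
    user_id    UInt64,
    action     String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (event_time, user_id);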

At this point the local table on each node is complete, but the Distributed engine described below is still needed to aggregate data from multiple shards.

2) Distributed engine

Tables using the Distributed engine store no data themselves but allow distributed query processing across multiple servers, with reads parallelized automatically. During a read, the table indexes on the remote servers (the Replicated*MergeTree tables created above) are used.

With a Distributed table, data can be read from and written to the different shards according to the sharding rules in the configuration file. For example, when we read through a Distributed engine table, the data of every shard in the cluster is read and then aggregated. Note the deep-pagination issue here: for some SQL, a first pass can be executed on each node and the results aggregated once more on the query node, while other SQL cannot be pre-aggregated, so all the data has to be pulled to the query node and processed there; such cases need to be optimized case by case.
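
Continuing the sketch above, the Distributed table that proxies the events_local shards could be created roughly like this (the database name default and the rand() sharding key are assumptions, not our production setup):

-- Stores no data itself; routes reads/writes to the events_local table on every shard of cluster1.
CREATE TABLE events ON CLUSTER cluster1 AS events_local
ENGINE = Distributed(cluster1, default, events_local, rand());

-- Queries against the Distributed table fan out to all shards and merge the results.
SELECT action, count() FROM events GROUP BY action;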



Figure 3-2 Query flow chart of the local table and distributed table

Figure 3-2 shows a two-shard, two-replica architecture using the Replicated*MergeTree + Distributed engine mode. The red numbers represent nodes: nodes 1 and 2 are replicas of each other, and nodes 3 and 4 are replicas of each other.

In the figure, events is a Distributed engine table, also called a distributed table, and events_local is a Replicated*MergeTree engine table, also called a local table. In this figure the distributed table is created only on node 3; online environments typically create a distributed table on every node (this costs no extra resources, because distributed tables store no data).

When a query is executed, the distributed table on some node is accessed, node 3 in this figure. The distributed table then reads the data of the two shards separately; here it reads the local table data of node 3 and node 2, which together form the complete data set. The results are merged and the result set is returned.

3. ClickHouse performance

1) Insertion: single-machine insertion speed of 100–150 M/s;

2) Query: for aggregation (GROUP BY) queries, with indexes the query time was 0.101 seconds. As you can see, ClickHouse queries are extremely fast; most common databases on the market cannot match this performance.

3) Others: for concurrency, the default in the official configuration is 100. Since a big-data analytical database mainly targets OLAP scenarios, it does not handle high concurrency well; a few large analytical queries may directly saturate CPU and other resources, so in practice the achievable concurrency does not reach 100.
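
To see how many queries are actually running at a given moment and what they cost, the system.processes table can be queried, for example:

-- Queries running right now, and their resource cost.
SELECT
    user,
    query_id,
    elapsed,                                 -- seconds the query has been running
    formatReadableSize(memory_usage) AS mem,
    substring(query, 1, 80)          AS query_head
FROM system.processes
ORDER BY elapsed DESC;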

ClickHouse practices at Good Future



Figure 4-1 ClickHouse online architecture diagram

1. Business scenarios

At present, in addition to our own department, several business parties in other departments of Good Future have started using it.

1) Our department

Platforms in use: log center, Owl, Groundhog, Grafana, etc.

How it is used: we consume Kafka with Flink, Spark, Gohangout, and similar tools, write the data into ClickHouse, and then run aggregation queries on ClickHouse to display the data.

For example, Groundhog mainly aggregates gateway data, so you can see the number of calls and the request time of each domain name, URL, or server. Another example is live-streaming data: live report logs are consumed and then presented through Grafana.

2) Other departments

Besides our department, the other business parties include the data R&D department and the data center. The data R&D department mainly synchronizes hotspot data from Hive to ClickHouse via Spark/DataX; the data is then presented to analysts through platforms such as Tianshu and Tianxuan, which improves query speed.



Figure 4-2: ClickHouse usage diagram

2. Storage status



Figure 4-3 Data store on a single node

In the data above, the first column is the database name, the second the number of rows, the third the size before compression, and the fourth the size after compression.

You can see that a single node holds databases of several TB and over ten billion rows. There are currently 6 data nodes, so across the cluster these figures are multiplied by 6, which means some individual databases have reached on the order of 100 billion rows and about 100 TB of capacity.
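
The per-database figures above can be reproduced with a query against system.parts, roughly like this:

-- Rows and on-disk size per database, before and after compression (active parts only).
SELECT
    database,
    sum(rows)                                        AS total_rows,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes))   AS compressed
FROM system.parts
WHERE active
GROUP BY database
ORDER BY sum(data_compressed_bytes) DESC;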

Construction and Planning

1. Monitoring

ClickHouse does not currently provide a built-in monitoring interface, but the monitoring data we need is recorded in the system database, and Grafana dashboards that display it have already been published online. The current monitoring view is shown in Figure 5-1.



Figure 5-1 ClickHouse monitoring diagram
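
Besides the Grafana dashboards, the system tables can also be queried directly, for example for current metric values and the slowest recent queries (a sketch, assuming query_log is enabled in the server configuration):

-- Current values of runtime metrics (connections, running queries, merges, ...).
SELECT metric, value FROM system.metrics ORDER BY metric;

-- Ten slowest queries of the last hour.
SELECT
    query_duration_ms,
    read_rows,
    formatReadableSize(memory_usage) AS mem,
    substring(query, 1, 80)          AS query_head
FROM system.query_log
WHERE type = 'QueryFinish' AND event_time > now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 10;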

In addition, I also wrote scripts to check and restart each node on a regular basis. At the same time, the hardware information and alarms of each node are displayed on the Holy Tree platform.

2. Problems encountered

As an OLAP database, ClickHouse has its share of issues, such as version bugs, non-standard usage, and problems caused by co-locating it with other services. Listed here are some issues that require continuous optimization or coordination with the business side; other issues we encounter are continuously recorded on the wiki:

1) Heavy queries drive server load high and affect other business queries

Analysis: multiple complex queries hitting large amounts of data; each query consumes a lot of CPU and memory, which can drive the server load to saturation. In particular, if memory fills up, nodes may crash. Solution:

  • To avoid memory exhaustion, set max_memory_usage_for_all_queries to a value lower than the server's physical memory, and also limit each user's concurrency and the amount of data per query (see the sketch after this list);
  • Some services cannot limit the amount of data they query, so a caching mechanism can be added to avoid repeated large queries.
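
As a sketch of these limits: the per-query cap can be set in a session as shown below (the value is a placeholder that must fit the real server), while the server-wide max_memory_usage_for_all_queries is normally set in the users.xml settings profile rather than per session.

-- Cap a single query / session at roughly 20 GB (placeholder value).
SET max_memory_usage = 20000000000;

-- Check what memory-related limits are currently in effect.
SELECT name, value FROM system.settings WHERE name LIKE 'max_memory%';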

2) DDL statements get stuck

ClickHouse is best at appends and queries, but MergeTree tables do allow updates and deletes. These update and delete operations are performed with ALTER TABLE statements, which actually require DDL permission. Each such operation locks the table, and with a large amount of data an update causes the affected data to be rewritten in index order, which is a long process. ClickHouse's DDL statements appear to go through a queue underneath, so an unfinished DDL causes subsequent DDLs to hang.
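
For reference, these update/delete operations are ALTER TABLE mutations, roughly as below (table and columns are hypothetical); their progress can be watched in system.mutations:

-- Mutations are asynchronous and rewrite whole data parts, so they are expensive.
ALTER TABLE app_log UPDATE level = 'WARN' WHERE level = 'WARNING';
ALTER TABLE app_log DELETE WHERE event_time < now() - INTERVAL 90 DAY;

-- Unfinished mutations queue up; this shows what is still pending.
SELECT table, mutation_id, command, is_done
FROM system.mutations
WHERE is_done = 0;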

Solution: minimize the frequency of DDL statements, increase the interval between them, or avoid executing them altogether.

3) ZooKeeper connection loss

The ClickHouse cluster solution depends heavily on ZooKeeper; because the replica synchronization mechanism relies on ZK, every data insert interacts with ZK, and under high insert concurrency the connection to ZK may be temporarily lost.

Solution: ZooKeeper used to be co-located with ClickHouse; after splitting them onto separate machines the situation partially improved, but problems still occur occasionally and we will keep optimizing this area. We will improve the ZooKeeper servers' performance as much as possible, for example by applying for higher-spec machines, so that ClickHouse is no longer limited by ZooKeeper performance.

3. Future planning

To build a high-performance and stable database for big data scenarios, continuous learning and cooperation with the business side are required.

  • Understand the business deeply and design tables according to the business scenario;
  • Keep learning ClickHouse and optimize SQL according to ClickHouse's characteristics;
  • Cooperation from the business side is also needed: if large queries are frequent, mechanisms such as caching can be considered, or approximate calculation can be used in specific scenarios; special scenarios get special treatment, to arrive at appropriate SQL;
  • As data keeps growing, query pressure also increases; isolate clusters so that important services do not affect each other;
  • Continuously improve the monitoring and alerting mechanisms;
  • ClickHouse has many other powerful features that we will try out in the future.

References

[1]clickhouse.tech/docs

[2]blog.csdn.net/tzs_104…

[3]www.jianshu.com/p/ab8..