As a huawei cloud information giant, Reading Knowledge Cloud is good at presenting complex information in a diversified way. There is always a picture (cloud map), profound and simple blog (cloud class) or short video (cloud vision hall) that can make you quickly start huawei cloud. Click here for more highlights.

Abstract: Open source high-performance, open source online analytical column database management system in 2016. Its high data compression ratio, based on multi-core parallel computing, vectorization execution and SIMD, has excellent performance. ClickHouse is widely used in Internet advertising, App and Web traffic, telecommunications, finance, Internet of Things and many other fields. It is very suitable for business intelligent application scenarios and has a large number of applications and practices at home and abroad.

This article is shared by Huawei Cloud community “Using Scipy. Linalg to use linear system in Python”, author: Hi,EI.

ClickHouse is an open source column database for online analytics, independent of Hadoop’s big data architecture. Its core features are extreme compression and fast query performance. ClickHouse also supports SQL queries with good query performance, especially aggregate analysis queries based on large and wide tables, which are orders of magnitude faster than other analytical databases. ClickHouse is widely used in Internet advertising, App and Web traffic, telecommunications, finance, Internet of Things and many other fields. It is very suitable for business intelligent application scenarios and has a large number of applications and practices at home and abroad.

Key features of ClickHouse

  • Complete DBMS functionality

ClickHouse is a complete Database Management System with basic DBMS functionality, as shown below:

DDL (Data Definition Language) : Databases, tables, and views can be created, modified, or deleted dynamically without restarting the service.

DML (Data Manipulation Language) : Data can be dynamically queried, inserted, modified, or deleted.

Permission control: You can set operation permissions on databases or tables by user granularity to ensure data security.

Data backup and recovery: Provides a backup, export, import and restore mechanism to meet the requirements of the production environment.

Distributed management: Provides the cluster mode to automatically manage multiple database nodes.

  • Column storage and data compression

ClickHouse is a database that uses column storage. Data is organized by column. Data belonging to the same column is kept together and stored in separate files between columns.

When performing data query, column storage can reduce data scan scope and data transfer size, and improve data query efficiency.

  • Vectorization execution engine

ClickHouse implements vectorization execution using the CPU’s SIMD instructions. The full name of SIMD is SingleInstruction Multiple Data, which means to operate Multiple Data with a SingleInstruction and improve performance through Data parallelism (other methods include instruction level parallelism and thread level parallelism). Its principle is the parallel operation of data at the CPU register level, which is more efficient than similar OLAP products.

  • A copy of the mechanism

ClickHouse utilizes ZooKeeper and implements the replication mechanism through the Replicated Mergetree engine (Replicated family of engines). The replica mechanism is a multi-master architecture that allows you to send INSERT statements to any copy and asynchronously copy data to the other copies.

The functional advantages of duplicate mechanism are as follows:

ClickHouse replicas are designed to minimize network data transfer and can be used for synchronization across different data centers to build multi-data center, multi-live cluster architectures.

Replica mechanism is the basis of high availability, load balancing, migration/upgrade.

The system monitors replica data synchronization, identifies faulty nodes, and recovers the faulty nodes when they recover to ensure overall high availability of services.

  • Data sharding and distributed query

ClickHouse provides linear scaling through sharding and distributed table mechanisms.

Sharding mechanism: To solve the performance bottleneck of a single node, split the data from a table into multiple nodes by horizontal shards, with no duplication of data between different nodes, allowing linear scaling of ClickHouse by adding sharding.

Distributed table: The distributed table is used to query fragmented data. The distributed table engine does not store any data and is only a layer agent, which can automatically route to each shard node in the cluster to obtain data. That is, the distributed table needs to work with other data tables.

As shown in the figure below, when querying, we need to query the table_distributed table, which automatically routes query requests to each fragment node and aggregates the results.

With all that said, how do you use ClickHouse? Don’t worry, here’s a quick guide to using the ClickHouse client and database basics.

Preparation before Operation

  • The ClickHouse cluster has been created.

  • The ClickHouse client is installed.

Used by the ClickHouse client

1. Log in to the node where the client is installed as the client installation user.

2. Run the following command to switch to the client installation directory. The following directories are used as an example. The client path varies according to the actual installation path.

cd /opt/Bigdata/client

3. Run the following command to configure environment variables.

source bigdata_env

4. Run the client command of the ClickHouse component.

Clickhouse client–host Clickhouse instance IP –user login name –password password –port Clickhouse port number

The clickHouse Client command line parameters are described as follows:

ClickHouse database basic operations

  • Create database:

The basic grammar

CREATE DATABASE [IFNOT EXISTS] database_name

Use the sample

  • Create a table:

The basic grammar

Method 1: Create a table named table_name in the specified database_name database.

If the table builder does not contain “database_name”, the database selected when the client logs in is used as database_name by default.

CREATE TABLE[IF NOT EXISTS] [database_name.]table_name [ON CLUSTER CLUSTER name]

(

name1 [type1] [DEFAULT|materialized|ALIAS expr1],

name2 [type2] [DEFAULT|materialized|ALIAS expr2],

.

) ENGINE = engine

Method 2: Create a table with the same structure as table_name2, and you can assign different table engine declarations to it.

If there is no table engine declaration, the table created will use the same table engine as database_name2.table_name2.

CREATE TABLE[IF NOT EXISTS] [database_name.]table_name AS [database_name2.]_table_name_2 [ENGINE= engine]

Method three: Use the specified engine to create a table with the same structure as the result of the SELECT clause, and populate it with the result of the SELECT clause.

CREATE TABLE[IF NOT EXISTS] [database_name.]table_name ENGINE = engine ASSELECT

Use the sample

  • Insert table data:

The basic grammar

Method 1: Insert data in standard format.

INSERT INTO [database_name.]table [(c1,c2, c3)] VALUES (v11, v12, v13), (v21, v22, v23),…

Method 2: write using the result of SELECT.

INSERT INTO [database_name.]table [(c1,c2, c3)] SELECT

Use the sample

  • Query table data:

The basic grammar

SELECT [DISTINCT] expr_list

[FROM [database_name.]table |(subquery) | table_function] [FINAL]

[SAMPLE sample_coeff]

[ARRAY JOIN …]

[GLOBAL] [ANY|ALL|ASOF][INNER|LEFT|RIGHT|FULL|CROSS] [OUTER|SEMI|ANTI] JOIN (subquery)|table (ON <expr_list>)|(USING <column_list>)

[PREWHERE expr]

[WHERE expr]

[GROUP BY expr_list][WITH TOTALS]

[HAVING expr]

[ORDER BY expr_list][WITH FILL] [FROM expr] [TO expr] [STEPexpr]

[LIMIT [offset_value,]n BY columns]

[LIMIT [n,]m] [WITH TIES]

[UNION ALL …]

[INTO OUTFILEfilename]

[FORMAT format]

Use the sample

  • Alter table structure:

The basic grammar

ALTER TABLE [database_name].name [ONCLUSTER cluster] ADD|DROP|CLEAR|COMMENT|MODIFYCOLUMN

ALTER supports only *MergeTree, Merge, and Distributed engine tables.

Use the sample

  • Displays database and table information

The basic grammar

show databases

show tables

Use the sample

  • Query the table structure

The basic grammar

DESC|DESCRIBE TABLE [database_name.]table [INTO OUTFILEfilename] [FORMAT format]

Use the sample

  • Delete table

The basic grammar

DROP [TEMPORARY] TABLE [IFEXISTS] [database_name.]name [ON CLUSTER cluster]

Use the sample

Well, that’s all for this cloud lesson. Go and experience more of MapReduce(MRS)! Stab here

Click to follow, the first time to learn about Huawei cloud fresh technology ~