ClickHouse overview

ClickHouse is a columnar database management system (DBMS) for online analytical processing (OLAP). It was originally developed at Yandex for Yandex.Metrica, a product used primarily for web traffic analysis. The name comes from "Click Stream, Data WareHouse", shortened to ClickHouse.

ClickHouse is ideally suited for business intelligence, but it is also widely used for advertising and web/app traffic analysis, telecommunications, finance, e-commerce, information security, online gaming, the Internet of Things, and many other domains. ClickHouse has the following features:

  • Support for most standard SQL syntax

  • Column-oriented storage and data compression

  • Vectorized execution engine

  • A relational model (similar to traditional databases)

  • A rich set of table engines

  • Parallel processing

  • Online (real-time) queries

  • Data sharding

As a high-performance OLAP database, ClickHouse also has the following shortcomings:

  • Transactions are not supported.

  • Point lookups by primary key at row granularity are supported but not efficient, so ClickHouse should not be used as a key-value database.

  • Row-level deletes are supported but inefficient (a brief sketch follows this list).
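
For context, row-level deletes in ClickHouse are not ordinary DML; they are expressed as ALTER mutations that are applied asynchronously on MergeTree-family tables, which is why per-row deletes are listed as a weak point. A minimal sketch (the table name test is illustrative):

-- A row delete is a mutation: the affected data parts are rewritten asynchronously
-- in the background, which is far more expensive than the large analytical scans
-- ClickHouse is optimized for.
ALTER TABLE test DELETE WHERE id = 1;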

Single machine installation

Download the RPM package

This document uses an offline installation. You can download the RPM packages from the following link or directly from Baidu Cloud.

- RPM package address: https://packagecloud.io/Altinity/clickhouse; Baidu Cloud link: https://pan.baidu.com/s/1pFR66SzLvPYMfcpuPJww5A (extraction code: gh5a)

The installation includes the following packages:

  • clickhouse-client: contains the clickhouse-client application, an interactive ClickHouse console client.
  • clickhouse-common: contains the ClickHouse executable file.
  • clickhouse-server: contains the ClickHouse configuration files needed to run ClickHouse as a server.

There are four RPM packages in total:

clickhouse-client-19.17.4.11-1.el7.x86_64.rpm
clickhouse-common-static-19.17.4.11-1.el7.x86_64.rpm
clickhouse-server-19.17.4.11-1.el7.x86_64.rpm
clickhouse-server-common-19.17.4.11-1.el7.x86_64.rpm

Tip: if the installation reports a failed-dependencies error, a dependency package is missing.

You can manually install the libicu-50.2-4.el7_7.x86_64.rpm dependency package.

Disabling the Firewall

## Check the firewall status
systemctl status firewalld
## Stop the firewall temporarily; it starts again automatically after a reboot
systemctl stop firewalld
## Disable the firewall permanently; it does not start automatically after a reboot
systemctl disable firewalld

System requirements

ClickHouse can run on any Linux, FreeBSD, or macOS system with an x86_64, AArch64, or PowerPC64LE CPU architecture. The official pre-built binaries are compiled for x86_64 and rely on the SSE 4.2 instruction set, so unless otherwise noted, a CPU that supports SSE 4.2 is an additional system requirement. Use the following command to check whether the current CPU supports SSE 4.2:

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
SSE 4.2 supported

To run ClickHouse on processors that do not support SSE 4.2, or on AArch64 or PowerPC64LE architectures, you should build it from source with the appropriate configuration adjustments.

Install the RPM package

Upload the RPM packages to the /opt/software directory, then run the following command to install them:

[root@cdh06 software]# rpm -ivh *.rpm
error: Failed dependencies:
        libicudata.so.50()(64bit) is needed by clickhouse-common-static-19.17.4.11-1.el7.x86_64
        libicui18n.so.50()(64bit) is needed by clickhouse-common-static-19.17.4.11-1.el7.x86_64
        libicuuc.so.50()(64bit) is needed by clickhouse-common-static-19.17.4.11-1.el7.x86_64
        libicui18n.so.50()(64bit) is needed by clickhouse-server-19.17.4.11-1.el7.x86_64
        libicuuc.so.50()(64bit) is needed by clickhouse-server-19.17.4.11-1.el7.x86_64

## The installation fails because the libicu dependency is missing.
## Download and install libicu-50.2-4.el7_7.x86_64.rpm first, then re-run the rpm command above.

Viewing Installation Information

The directory structure

  • /etc/clickhouse-server: server configuration directory, containing the global configuration config.xml and the user configuration users.xml.

  • /etc/clickhouse-client: client configuration, including conf.d folder and config.xml file.

  • /var/lib/clickhouse: the default data storage directory (this path is usually changed in the configuration so that data is saved to a path where a large-capacity disk is mounted).

  • /var/log/clickhouse-server: the default log directory (this path is usually changed in the configuration so that logs are saved to a path where a large-capacity disk is mounted).

Configuration files

  • /etc/security/limits.d/clickhouse.conf: configures the maximum number of open file handles.
[root@cdh06 clickhouse-server]# cat /etc/security/limits.d/clickhouse.conf 
clickhouse      soft    nofile  262144
clickhouse      hard    nofile  262144

This limit can also be modified through the max_open_files setting in config.xml:

<!-- Set limit on number of open files (default: maximum). This setting makes sense on Mac OS X because getrlimit() fails to retrieve correct maximum value. -->
<!-- <max_open_files>262144</max_open_files> -->
  • /etc/cron.d/clickhouse-server: cron job configuration, used to restart the clickhouse-server process if it exits abnormally. The default configuration is as follows:
[root@cdh06 cron.d]# cat /etc/cron.d/clickhouse-server
#*/10 * * * * root (which service > /dev/null 2>&1 && (service clickhouse-server condstart ||:)) || /etc/init.d/clickhouse-server condstart > /dev/null 2>&1

Executable files

Finally, there is a set of executables in /usr/bin:

  • clickhouse: the executable file of the main program.

  • clickhouse-client: a soft link to the clickhouse executable, used to start the client.

  • clickhouse-server: a soft link to the clickhouse executable, used to start the server.

  • clickhouse-compressor: a built-in compression tool that can be used to compress and decompress data.

Start/stop the service

## Start service
[root@cdh06 ~]# service clickhouse-server start
Start clickhouse-server service: Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/
DONE
## Shut down service
[root@cdh06 ~]# service clickhouse-server stop

Client connection

[root@cdh06 ~]# clickhouse-client
ClickHouse client version 19.17.4.11.
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 19.17.4 revision 54428.

cdh06 :) show databases;

SHOW DATABASES

┌─name────┐
│ default │
│ system  │
└─────────┘

2 rows in set. Elapsed: 0.004 sec.

Basic operations

Creating a database

  • Syntax
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] [ENGINE = engine(...)]
  • Example
CREATE DATABASE IF NOT EXISTS tutorial

By default, ClickHouse uses its native database engine, Ordinary (any type of table engine can be used under such a database, and this default is appropriate in most cases). You can also use the Lazy and MySQL database engines; the MySQL engine, for example, lets you operate on tables in a MySQL database directly from ClickHouse. Assuming there is a database named clickhouse in MySQL, you can connect to it as follows.

--------------------------- Syntax ---------------------------
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
ENGINE = MySQL('host:port', ['database' | database], 'user', 'password')
--------------------------- Example --------------------------
CREATE DATABASE mysql_db ENGINE = MySQL('192.168.200.241:3306', 'clickhouse', 'root', '123qwe');
--------------------------- Usage ----------------------------
cdh06 :) use mysql_db;
cdh06 :) show tables;

SHOW TABLES

┌─name─┐
│ test │
└──────┘

1 rows in set. Elapsed: 0.005 sec.

cdh06 :) select * from test;

SELECT *
FROM test

┌─id─┬─name──┐
│  1 │ Tom   │
│  2 │ jack  │
│  3 │ lihua │
└────┴───────┘

3 rows in set. Elapsed: 0.047 sec.
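
For completeness, here is a minimal sketch of the Lazy engine mentioned above. The database name lazy_db and the 3600-second expiration are illustrative; a Lazy database keeps its tables loaded in memory only for the given number of seconds after the last access, and it supports only Log-family table engines.

-- Illustrative example: tables in lazy_db are unloaded 3600 seconds after their last access
CREATE DATABASE IF NOT EXISTS lazy_db ENGINE = Lazy(3600);
-- The engine of each database can be checked in the system.databases table
SELECT name, engine FROM system.databases;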

Create a table

  • Syntax
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [compression_codec] [TTL expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [compression_codec] [TTL expr2],
    ...
) ENGINE = engine
  • Example
-- Note: data type names are case-sensitive (first letter capitalized)
-- Create the table
create table test(
    id Int32,
    name String
) engine=Memory;

The above statement creates a table using the Memory engine. The table engine determines the characteristics of a table and how its data is stored and loaded. Memory is ClickHouse's simplest table engine: data is kept only in memory and is lost when the service restarts.
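
As a quick check of the table just created, the illustrative statements below insert two rows and read them back; after clickhouse-server is restarted the same SELECT returns no rows, because the Memory engine keeps data only in RAM.

-- Insert a couple of rows into the Memory table created above
INSERT INTO test VALUES (1, 'tom'), (2, 'jack');
-- Read them back; this data disappears once the server restarts
SELECT * FROM test;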

Cluster installation

Installation steps

The basic steps for installing ClickHouse on a single machine and using the ClickHouse client are described above. The next section describes how to install a cluster. ClickHouse cluster installation is very simple: first repeat the steps above to install ClickHouse on every machine, then configure the **/etc/clickhouse-server/config.xml** and **/etc/metrika.xml** files on each of them. It is worth noting that a ClickHouse cluster depends on **ZooKeeper**, so make sure you have a ZooKeeper cluster installed first. The ZooKeeper cluster installation steps are simple and will not be covered in this article. This article demonstrates a three-node ClickHouse cluster installation with the following steps:

  • First, repeat the single-machine installation steps above to install ClickHouse on the other two machines

  • Then, modify the **/etc/clickhouse-server/config.xml** file on each machine

    <!-- If IPv6 is disabled on the host, use the following configuration: -->
    <listen_host>0.0.0.0</listen_host>
    <!-- If IPv6 is not disabled, use the following configuration instead: -->
    <!-- <listen_host>::</listen_host> -->

    Tip 1:

    If IPv6 is disabled but <listen_host>::</listen_host> is used, the following error is reported:

    Application: DB::Exception: Listen [::]:8123 failed: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = DNS error: EAI: -9

    Tip 2:

    The default TCP port for ClickHouse is 9000. If there is a port conflict, you can change it in the **/etc/clickhouse-server/config.xml** file, for example <tcp_port>9001</tcp_port>.

  • Finally, create the metrika.xml file under /etc. The following configuration defines three shards with a single replica each; you can also configure multiple replicas per shard.

    <yandex>
    <!-- Corresponds to <remote_servers incl="clickhouse_remote_servers" /> in config.xml -->
    <clickhouse_remote_servers>
        <!-- Cluster name; can be modified -->
        <cluster_3shards_1replicas>
            <!-- Configure three shards, one machine per shard -->
            <shard>
                <replica>
                    <host>cdh04</host>
                    <port>9001</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>cdh05</host>
                    <port>9001</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>cdh06</host>
                    <port>9001</port>
                </replica>
            </shard>
        </cluster_3shards_1replicas>
    </clickhouse_remote_servers>
    <!-- Corresponds to <zookeeper incl="zookeeper-servers" optional="true" /> in config.xml -->
    <zookeeper-servers>
        <node>
            <host>cdh02</host>
            <port>2181</port>
        </node>
        <node>
            <host>cdh03</host>
            <port>2181</port>
        </node>
        <node>
            <host>cdh06</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <!-- The shard tag sets the shard number and replica sets the replica host name; modify these values on the corresponding host. -->
    <macros>
        <shard>01</shard>
        <replica>cdh04</replica>
    </macros>    
    </yandex>
  • Start clickhouse-server on each machine

    # service clickhouse-server start
  • (Optional) Modify the **/etc/clickhouse-client/config.xml** file

    By default, clickhouse-client connects to localhost on port 9000. Since we changed the server's default port, we also need to change the client's default connection port. Add the following to the file:

     <port>9001</port>

    Alternatively, specify **--port 9001** on the command line when connecting with the client; otherwise the following error is reported:

    Connecting to localhost:9000 as user default.
    Code: 210. DB::NetException: Connection refused (localhost:9000)

Basic operations

Validation of the cluster

After the above configuration is complete, start clickhouse-server and then clickhouse-client on each machine:

## Start the server
# service clickhouse-server start
## Start the client; the -m argument supports multi-line input
# clickhouse-client -m

You can query the system.clusters table to verify that the cluster configuration has been loaded:

cdh04 :) select cluster,shard_num,replica_num,host_name,port,user from system.clusters;

To view each node's shard information (the macro variables), execute the following command on each machine:

cdh04 :) select * from system.macros;

┌─macro───┬─substitution─┐
│ replica │ cdh04        │
│ shard   │ 01           │
└─────────┴──────────────┘

cdh05 :) select * from system.macros;

┌─macro───┬─substitution─┐
│ replica │ cdh05        │
│ shard   │ 02           │
└─────────┴──────────────┘

cdh06 :) select * from system.macros;

┌─macro───┬─substitution─┐
│ replica │ cdh06        │
│ shard   │ 03           │
└─────────┴──────────────┘

Distributed DDL operations

By default, CREATE, DROP, ALTER, and RENAME operations take effect only on the server where the command is executed. In a cluster environment, you can use the ON CLUSTER clause so that the statement takes effect across the entire cluster.

For example, create a distributed table:

CREATE TABLE IF NOT EXISTS user_cluster ON CLUSTER cluster_3shards_1replicas
(
    id Int32,
    name String
) ENGINE = Distributed(cluster_3shards_1replicas, default, user_local, id);

The Distributed table engine is declared as follows (ClickHouse's table engines will be explained in more detail in a future article):

Distributed(cluster_name, database_name, table_name[, sharding_key])

The meanings of each parameter are as follows:

  • cluster_name: the cluster name, corresponding to the custom name in the cluster configuration.
  • database_name: the database name.
  • table_name: the table name.
  • sharding_key: optional sharding key. When data is written, the distributed table distributes rows to the local tables on each node according to the sharding key; a hypothetical variant using a random key is sketched after this list.
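
As an illustration of the sharding key, a common alternative to a column expression is a random key, which spreads inserted rows roughly evenly across the shards. The sketch below is hypothetical and simply reuses the cluster and local table names from this article:

-- Illustrative variant of the distributed table above, sharding by rand() instead of id
CREATE TABLE IF NOT EXISTS user_cluster_rand ON CLUSTER cluster_3shards_1replicas
(
    id Int32,
    name String
) ENGINE = Distributed(cluster_3shards_1replicas, default, user_local, rand());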

Tip:

Distributed tables use a check-on-read mechanism, which means there is no mandatory order between creating the distributed table and creating the local tables.

It is also worth noting that the statement above uses ON CLUSTER distributed DDL, which means the Distributed table is created on every shard in the cluster, so read and write requests for all shards can be initiated from any node.

Once the distributed table has been created, listing the tables on each machine shows that the newly created table exists on every machine.

Next we need to create a local table, one for each machine:

CREATE TABLE IF NOT EXISTS user_local 
(
    id Int32,
    name String
)ENGINE = MergeTree()
ORDER BY id
PARTITION BY id
PRIMARY KEY id;

We first insert data into the user_local table on one machine, and then query the user_cluster table:

-- Insert data
cdh04 :) INSERT INTO user_local VALUES (1, 'tom'), (2, 'jack');
-- Query the user_cluster table; through it you can operate on all the user_local tables
cdh04 :) select * from user_cluster;

┌─id─┬─name─┐
│  2 │ jack │
└────┴──────┘
┌─id─┬─name─┐
│  1 │ tom  │
└────┴──────┘

Next, we insert some data through the user_cluster table and observe the user_local tables; we can see that the data has been distributed to the other nodes.

-- Insert data through user_cluster
cdh04 :) INSERT INTO user_cluster VALUES (3, 'lilei'), (4, 'lihua');

-- Query user_cluster
cdh04 :) select * from user_cluster;

┌─id─┬─name─┐
│  2 │ jack │
└────┴──────┘
┌─id─┬─name──┐
│  3 │ lilei │
└────┴───────┘
┌─id─┬─name─┐
│  1 │ tom  │
└────┴──────┘
┌─id─┬─name──┐
│  4 │ lihua │
└────┴───────┘

-- Check user_local on cdh04
cdh04 :) select * from user_local;

┌─id─┬─name─┐
│  2 │ jack │
└────┴──────┘
┌─id─┬─name──┐
│  3 │ lilei │
└────┴───────┘
┌─id─┬─name─┐
│  1 │ tom  │
└────┴──────┘

-- Check user_local on cdh05
cdh05 :) select * from user_local;

┌─id─┬─name──┐
│  4 │ lihua │
└────┴───────┘

Conclusion

This article first introduced the basic features and usage scenarios of ClickHouse, then walked through the offline installation steps for both the single-machine and cluster versions, and gave some simple usage examples. This is only a brief introduction to ClickHouse; we will explore the world of ClickHouse in more depth in future articles.