**Abstract:** The MRS service of Huawei Intelligent Data Lake is about to launch the ClickHouse high-performance engine cluster. Users will be able to deploy a cluster with one click in just a few minutes and quickly gain interactive query and analysis over petabytes of data with second-level response times, for the ultimate performance experience.

Preface

Introduction to High-performance Engines

ClickHouse is an open source analytical database that has become very popular in the last two years. It was open-sourced in 2016 by Yandex, a Russian company. It does not depend on the Hadoop big data stack, and its core features are a very high compression ratio and fast query performance, which can save users a lot of cost while creating more value. It offers a SQL-compatible interface, supports JDBC and ODBC drivers, and is written in C++. ClickHouse also has an ambitious goal: to be the fastest analytical database in the world. Official tests show ClickHouse is six times ahead of Vertica, 18 times ahead of GreenPlum, and hundreds of times faster than traditional big data engines like Hive and Spark. Performance comparisons with these and several other open source and commercial databases are available at: clickhouse.tech/benchmark/d…

Typical Application Scenarios

The name ClickHouse stands for Click Stream + Data WareHouse. It was originally built as a web traffic analysis tool, performing OLAP analysis in a data warehouse over streams of page click events. Today ClickHouse is widely used in Internet advertising, app and web traffic analysis, telecommunications, finance, the Internet of Things, and many other fields. It is well suited to business intelligence scenarios and has many applications in practice, both at home and abroad: clickhouse.tech/docs/en/int…

Main body

The MRS service of Huawei Intelligent Data Lake is about to launch the ClickHouse high-performance engine cluster. Users can deploy and set up a cluster with one click in just a few minutes and quickly gain interactive query and analysis over petabytes of data with second-level response times, for the ultimate performance experience.

Upgrading the "manual transmission" cluster mode

2.1.1 The Fuzzy Notion of a Cluster

Before we get started, we need to make a mental shift: a ClickHouse cluster is different from a cluster as commonly understood. A Hadoop cluster, for example, is a complete and independent cluster composed of two NameNodes and multiple DataNodes, and its services can be accessed by interacting with it directly. A cluster made up of multiple ClickHouse nodes has no central node and is closer to a static resource pool. To use the ClickHouse cluster model, the cluster information must be defined in advance in the configuration file of each node, and all participating nodes must agree on it. Only then can the services interact correctly. In other words, the cluster defined in the configuration file is what we usually understand as a "cluster".
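To make the "cluster lives in the configuration file" idea concrete, here is a minimal Python sketch that renders a `remote_servers` fragment of the kind ClickHouse reads from its server configuration. The cluster name, host names and the helper `build_remote_servers` are hypothetical and only illustrate the shape of the definition; this is not how MRS generates its configuration.

```python
import xml.etree.ElementTree as ET


def build_remote_servers(cluster_name, shards, port=9000):
    """Render a <remote_servers> fragment as an XML string.

    `shards` is a list of shards; each shard is a list of replica host names.
    All names passed in are illustrative assumptions.
    """
    root = ET.Element("remote_servers")
    cluster = ET.SubElement(root, cluster_name)
    for replicas in shards:
        shard = ET.SubElement(cluster, "shard")
        for host in replicas:
            replica = ET.SubElement(shard, "replica")
            ET.SubElement(replica, "host").text = host
            ET.SubElement(replica, "port").text = str(port)
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Hypothetical layout: two shards, two replicas each.
layout = [["ch-node-1", "ch-node-2"], ["ch-node-3", "ch-node-4"]]
fragment = build_remote_servers("example_cluster", layout)
print(fragment)
```

Every participating node would need to carry the same fragment; if one node's definition differs, the nodes no longer agree on what the cluster is.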

2.1.2 Actual Cluster

ClickHouse requires users to plan and define detailed settings themselves, such as shards, partitions and replica locations. The ClickHouse instance in MRS wraps these tasks into an "automatic transmission" and manages them in a unified way, which is both flexible and easy to use. In terms of deployment, a ClickHouse instance consists of three ZooKeeper nodes and multiple ClickHouse nodes, and uses the Dedicated Replica mode to ensure high reliability with two replicas of the data.
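As a rough illustration of the planning work that would otherwise be manual, the sketch below groups nodes into shards of two replicas and derives per-node `shard` and `replica` values of the kind ClickHouse macros hold. The function `plan_layout`, the node names and the pairing rule are assumptions made for illustration; the source does not describe how MRS assigns nodes.

```python
def plan_layout(nodes, replicas_per_shard=2):
    """Assign each node a shard number and a replica name.

    Consecutive nodes are grouped into one shard (an assumed rule).
    Returns {node: {"shard": "<n>", "replica": "<node>"}}.
    """
    if replicas_per_shard < 1:
        raise ValueError("replicas_per_shard must be at least 1")
    if len(nodes) % replicas_per_shard != 0:
        raise ValueError("node count must be a multiple of replicas_per_shard")
    layout = {}
    for index, node in enumerate(nodes):
        shard_number = index // replicas_per_shard + 1
        layout[node] = {"shard": str(shard_number), "replica": node}
    return layout


# Hypothetical four-node instance: two shards with two replicas each.
for node, macros in plan_layout(["ch-node-1", "ch-node-2", "ch-node-3", "ch-node-4"]).items():
    print(node, macros)
```

With two replicas per shard, each shard's data lives on two nodes, so losing one node in a pair leaves a complete copy on the other.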

Smooth elastic capacity expansion

As services grow rapidly and the cluster's storage capacity or CPU resources approach their limits, MRS provides smooth, elastic capacity expansion to keep up with customers' growth. When users add ClickHouse nodes to a cluster, MRS provides a one-click data balancing tool and leaves the initiative for balancing data with the users. They can choose how and when to balance data according to their service characteristics, which preserves service availability and makes capacity expansion smoother.

For example:

  • Remove heavily loaded nodes from the ELB so that the load (new data) skews towards the new nodes;
  • Rebalance data using the specialized tools provided by MRS;
  • Dual-write the data and switch over automatically once the old data has aged out.

Support for Kunpeng and diverse computing power

With the rapid development of the Kunpeng ecosystem, Huawei Cloud also provides diverse computing power, including x86, ARM and Centerm, and supports IoT, big data and AI technologies with the best performance, cost-effectiveness and energy efficiency. Thanks to the multi-core advantage of Huawei Cloud Kunpeng processors, the MRS ClickHouse cluster also supports ARM servers developed by Huawei, making full use of Kunpeng's multi-core, high-concurrency capability and providing chip-level, full-stack in-house optimization. Together with the EulerOS, JDK and data acceleration layer developed by Huawei, this fully releases the underlying hardware computing power and achieves high cost-effectiveness.

Flexible and easy-to-use configuration management

MRS provides users with a unified cluster management page, and the ClickHouse instance configuration is fully open to users. Users can flexibly modify cluster configuration parameters to suit their own requirements, including adding and deleting clusters, Macros, and storage information. As with other MRS service components, the ClickHouse configuration is divided into a cluster level and a node level. Because of the particular nature of the ClickHouse engine, take care that settings at one level do not overwrite those at the other. Note that some advanced configurations are recommended only for advanced users; changing them without sufficient experience may cause system exceptions.

High-availability (HA) deployment architecture

The MRS service provides an ELB-based HA deployment architecture that automatically distributes user access traffic across multiple back-end nodes, which expands the system's external service capacity and improves application fault tolerance. As shown in the following figure, when a client sends a request to the cluster, Elastic Load Balance (ELB) distributes the traffic. Using the ELB polling (round-robin) mechanism, writes go to Local tables on different nodes and reads go to Distributed tables on different nodes. This ensures high availability for the cluster's write load, read load and application access.
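The polling idea can be sketched in a few lines of Python. This is only an illustration of round-robin distribution with unhealthy nodes skipped, not the ELB implementation; the class `RoundRobinBalancer`, the function `route`, the node names and the `_local` / `_all` table suffixes are all hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out back-end nodes in turn, skipping nodes marked as down."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._ring = cycle(self._backends)

    def mark_down(self, backend):
        self._healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def next_backend(self):
        # Try each backend at most once per call.
        for _ in range(len(self._backends)):
            candidate = next(self._ring)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


def route(balancer, operation, table):
    """Pick a node and a target table: writes hit the local table,
    reads hit the distributed table (suffixes are assumed names)."""
    suffixes = {"write": "_local", "read": "_all"}
    if operation not in suffixes:
        raise ValueError("operation must be 'write' or 'read'")
    return balancer.next_backend(), table + suffixes[operation]


balancer = RoundRobinBalancer(["ch-node-1", "ch-node-2", "ch-node-3"])
print(route(balancer, "write", "events"))
print(route(balancer, "read", "events"))
```

Because consecutive requests land on different nodes, a single failed node affects only the requests that would have been sent to it, and marking it down removes it from rotation.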

Rich monitoring and O&M capabilities

MRS provides rich ClickHouse cluster monitoring and alarm capabilities to detect system exceptions in real time and ensure service stability. On the Monitoring page of the Cluster Manager, you can view real-time ClickHouse cluster indicators such as health, configuration, and role instance status. You can also monitor the internal performance of specific instances, including real-time read, write, and database connection information. In addition, MRS can connect to the Huawei Cloud Simple Message Notification (SMN) service to push alarm information to users by SMS or email. You can customize the monitoring and alarm thresholds to track the health of each indicator. When the monitored data reaches the alarm threshold, the system automatically generates an alarm and notifies users of the abnormal information in a timely manner. With these capabilities, MRS gives users easy operation and maintenance (O&M), real-time monitoring, real-time alarms, and flexible operation, saving them much worry and effort.
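The threshold behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `AlarmRule` type, the `evaluate` function and the metric names are hypothetical, and delivering the resulting messages (for example through SMN) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmRule:
    metric: str       # hypothetical metric name
    threshold: float  # alarm fires when the sample reaches this value


def evaluate(rules, samples):
    """Return one alarm message per rule whose metric reached its threshold.

    Metrics missing from `samples` are skipped rather than treated as alarms.
    """
    alarms = []
    for rule in rules:
        value = samples.get(rule.metric)
        if value is not None and value >= rule.threshold:
            alarms.append(f"{rule.metric}={value} reached threshold {rule.threshold}")
    return alarms


# Hypothetical user-defined thresholds and one round of collected samples.
rules = [AlarmRule("disk_usage_percent", 90.0), AlarmRule("connections", 500.0)]
samples = {"disk_usage_percent": 91.5, "connections": 120.0}
for message in evaluate(rules, samples):
    print(message)
```

The comparison uses "greater than or equal to" so that a sample exactly at the threshold raises an alarm, matching the wording "reaches the alarm threshold".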

Reliable security protection

MRS provides a complete set of security mechanisms, such as VPC network isolation, exclusive resource isolation, and host security, to ensure secure and reliable access to ClickHouse cluster data. Details are as follows:

  • VPC network isolation: In a public cloud deployment, MRS uses a VPC to provide an isolated network environment that secures cluster services and management. Users can combine VPC subnet division, route control, and security groups to build a highly secure and reliable isolated network environment.
  • Exclusive resource isolation: For enterprise, government, and financial customers, MRS provides a deployment solution with multi-level isolation of computing, storage resource pools, network, and control, creating a secure, reliable, and convenient "first-class cabin" on the cloud. There are three modes: dedicated computing resource + shared storage resource, shared computing resource + dedicated storage resource, and dedicated computing resource + dedicated storage resource.
  • Host security service: MRS supports integration with security services on the cloud. We have run compatibility tests with the host security service to ensure that function and performance are not affected. This strengthens the security of the service with capabilities such as vulnerability scanning, security protection, application firewall, bastion host, and web tamper protection.

Closing

Summary and outlook

The ClickHouse engine launched by MRS quickly fills out the capabilities of MRS in real-time big data analysis. Compared with a self-built cluster, MRS ClickHouse offers smooth capacity expansion, HA, Kunpeng support, flexible configuration, simple operation and maintenance, and security and reliability, and it will become the first choice for users building high-performance, large-scale data analysis warehouses on the cloud.

At the same time, ClickHouse is a new heavyweight database engine, and we are still learning and exploring it. MRS will continue to be optimized and improved from the kernel, servitization and ecosystem perspectives, including Kunpeng instruction set acceleration, security authentication, SQL diagnosis, BI tool integration, and advanced AI integration features.

Case studies

  • We ran the ClickHouse Star Schema Benchmark test suite on the Huawei Cloud MRS service, using three virtual machines (32 vCPUs and 128 GB each) and a 1 TB data set. Of the 13 queries, 6 returned within 1 second, 10 within 5 seconds, and all within 10 seconds, showing outstanding performance.
  • Low-cost case study for mass data analysis: altinity.com/blog/2020/1…
