Apache Kylin starter series directory

  • Apache Kylin Getting Started 1 – Basic Concepts
  • Apache Kylin Getting Started 2 – Principles and Architecture
  • Apache Kylin Getting Started 3 – Installation and Configuration Parameters in Detail
  • Apache Kylin Getting Started 4 – Building a Model
  • Apache Kylin Getting Started 5 – Building a Cube
  • Apache Kylin Getting Started 6 – Optimizing a Cube
  • Building a Kylin query-latency monitoring page with ELKB

Install Kylin

The first two articles covered Kylin's basic concepts and how it works. This article moves on to installation, deployment, and configuration.

Big Data Environment Requirements (V2.5.1)

  • Hadoop: 2.7+, 3.1+
  • Hive: 0.13 – 1.2.1+
  • HBase: 1.1+, 2.0
  • Spark (optional): 2.1.1+
  • Kafka (optional): 0.10.0+
  • JDK: 1.8+
  • OS: Linux only, CentOS 6.5+ or Ubuntu 16.04+
  • HDP (officially tested): 2.2 – 2.6 and 3.0
  • CDH (officially tested): 5.7 – 5.11 and 6.0

Big Data Environment Requirements (V2.4.x)

  • Hadoop: 2.7+
  • Hive: 0.13 – 1.2.1+
  • HBase: 1.1+
  • Spark (optional): 2.1.1+
  • Kafka (optional): 0.10.0+
  • JDK: 1.7+
  • OS: Linux only, CentOS 6.5+ or Ubuntu 16.04+
  • HDP (officially tested): 2.2 – 2.6
  • CDH (officially tested): 5.7 – 5.11

As the requirements above show, the latest version (V2.5.1) brings several changes: it supports Hadoop 3.1 and HBase 2.0, and now requires JDK 8. CDH users should note that V2.5 already supports CDH 6.0.

Hardware configuration

Minimum configuration (official website)
  • 4-core CPU
  • 16 GB memory
  • 100 GB disk
Recommended configuration (official KAP documentation)
  • Two 6-core (or 8-core) Intel Xeon processors, 2.3 GHz or above
  • 64 GB memory
  • At least one 1 TB SAS hard disk (3.5-inch), 7200 RPM, RAID 1

Installation package directory description

  • bin: Kylin scripts, including start/stop management, metadata management, environment-check, and sample-creation scripts.
  • conf: Kylin configuration files, including Hive, job, and Kylin runtime parameters.
  • lib: JAR directory for the Kylin JDBC driver and the HBase coprocessor.
  • meta_backups: Kylin metadata backup directory.
  • sample_cube: scripts and data that the official sample depends on.
  • sys_cube: scripts that the system Cube build depends on.
  • spark: the bundled Spark by default; in the screenshot it is a symbolic link pointing to an independently deployed Spark.
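As an example of the metadata-management scripts in bin, Kylin's metastore.sh can back up metadata into the meta_backups directory (a sketch; it assumes Kylin is installed and KYLIN_HOME is set, and the restore path shown is illustrative):

```shell
# Back up Kylin metadata (writes a timestamped dump under $KYLIN_HOME/meta_backups)
$KYLIN_HOME/bin/metastore.sh backup

# Restore from a previous backup (directory name is illustrative)
$KYLIN_HOME/bin/metastore.sh restore meta_backups/meta_2018_11_01_00_00_00
```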

Install and deploy Kylin

For details, see the official website. In brief, the installation steps are:

  1. Download the appropriate version from the official website;
  2. Unpack the package and set the environment variable KYLIN_HOME to point to the Kylin directory;
  3. Check the Kylin runtime environment: $KYLIN_HOME/bin/check-env.sh;
  4. Start Kylin: $KYLIN_HOME/bin/kylin.sh start;
  5. Open http://hostname:7070/kylin in a browser; the initial user name and password are ADMIN/KYLIN;
  6. Run $KYLIN_HOME/bin/kylin.sh stop to stop Kylin.
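The steps above can be sketched as a command transcript (the download URL and package name assume the V2.5.1 binary for HBase 1.x; pick the package matching your Hadoop stack):

```shell
# 1–2: download and unpack, then point KYLIN_HOME at the unpacked directory
wget https://archive.apache.org/dist/kylin/apache-kylin-2.5.1/apache-kylin-2.5.1-bin-hbase1x.tar.gz
tar -zxvf apache-kylin-2.5.1-bin-hbase1x.tar.gz
export KYLIN_HOME=$(pwd)/apache-kylin-2.5.1-bin-hbase1x

# 3–4: verify the environment (Hadoop/Hive/HBase clients visible), then start
$KYLIN_HOME/bin/check-env.sh
$KYLIN_HOME/bin/kylin.sh start    # then open http://hostname:7070/kylin (ADMIN/KYLIN)

# 6: stop Kylin
$KYLIN_HOME/bin/kylin.sh stop
```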

Parameter configuration

Configuration File Overview

Component – file name – description:

  • Kylin – kylin.properties: the global configuration file used by Kylin.
  • Kylin – kylin_hive_conf.xml: Hive task configuration items; when Hive generates the intermediate flat table in the first step of a Cube build, Hive parameters are adjusted according to this file.
  • Kylin – kylin_job_conf_inmem.xml: MR task configuration items; when the Cube build algorithm is Fast Cubing, MR parameters in the build job are adjusted according to this file.
  • Kylin – kylin_job_conf.xml: MR task configuration items; when kylin_job_conf_inmem.xml does not exist, or the Cube build algorithm is Layer Cubing, MR parameters in the build job are adjusted according to this file.
  • Hadoop – core-site.xml: Hadoop global configuration file defining system-level parameters, such as the HDFS URL and the Hadoop temporary directory.
  • Hadoop – hdfs-site.xml: HDFS parameters, such as the storage locations of the NameNode and DataNodes, the number of file replicas, and file read permissions.
  • Hadoop – yarn-site.xml: cluster resource-management parameters, such as the communication ports of the ResourceManager and NodeManagers and the web monitoring port.
  • Hadoop – mapred-site.xml: MR parameters, such as the default number of reduce tasks and the default upper and lower memory limits for jobs.
  • HBase – hbase-site.xml: HBase runtime parameters, such as the master host name and port and the root data storage location.
  • Hive – hive-site.xml: Hive runtime parameters, such as the Hive data storage directory and database address.
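For illustration, the kylin_job_conf*.xml files use the standard Hadoop configuration XML format; a hedged sketch of overriding one MR parameter for build jobs (the property is standard Hadoop MR, the value is only an example):

```xml
<!-- Example only: standard Hadoop configuration format; the value is illustrative -->
<configuration>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>3072</value>
    <description>Memory per map task in Cube build MR jobs</description>
  </property>
</configuration>
```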

Hadoop Parameter Configuration

  • yarn.nodemanager.resource.memory-mb: at least 8192 MB
  • yarn.scheduler.maximum-allocation-mb: at least 4096 MB
  • mapreduce.reduce.memory.mb: at least 700 MB
  • mapreduce.reduce.java.opts: at least 512 MB
  • yarn.nodemanager.resource.cpu-vcores: no less than 8
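As a sketch, the YARN items above translate into yarn-site.xml entries like the following (values are the stated minimums; real clusters usually need more, and the mapreduce.* items go into mapred-site.xml instead):

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
</configuration>
```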

kylin.properties core parameters

Configuration item (default value) – description:

  • kylin.metadata.url (kylin_metadata@hbase): Kylin metadata path.
  • kylin.env.hdfs-working-dir (/kylin): HDFS working directory used by the Kylin service.
  • kylin.server.mode (all): run mode; can be all, job, or query.
  • kylin.source.hive.database-for-flat-table (default): Hive database in which the Hive intermediate tables are stored.
  • kylin.storage.hbase.compression-codec (none): compression algorithm used by HTables.
  • kylin.storage.hbase.table-name-prefix (kylin_): prefix of HTable names.
  • kylin.storage.hbase.namespace (default): default HTable namespace.
  • kylin.storage.hbase.region-cut-gb (5): region split size.
  • kylin.storage.hbase.hfile-size-gb (2): HFile size.
  • kylin.storage.hbase.min-region-count (1): minimum number of regions.
  • kylin.storage.hbase.max-region-count (500): maximum number of regions.
  • kylin.query.force-limit (-1): forces a LIMIT clause on select * statements.
  • kylin.query.pushdown.update-enabled (false): whether query pushdown update is enabled.
  • kylin.query.pushdown.cache-enabled (false): whether the pushdown query cache is enabled.
  • kylin.cube.is-automerge-enabled (true): whether segments are merged automatically.
  • kylin.metadata.hbase-client-scanner-timeout-period (10000): timeout for HBase data scans.
  • kylin.metadata.hbase-rpc-timeout (5000): timeout for an HBase RPC operation.
  • kylin.metadata.hbase-client-retries-number (1): number of HBase retries.

Some notes on the above parameters:

  • kylin.query.force-limit: defaults to no limit; 1000 is the recommended value.
  • kylin.storage.hbase.hfile-size-gb: can be set to 1 to help speed up the MR job.
  • kylin.storage.hbase.min-region-count: can be set to the number of HBase nodes to force data to be distributed across N nodes.
  • kylin.storage.hbase.compression-codec: no compression by default; configuring a compression codec for production environments is recommended.
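A hedged kylin.properties sketch applying the recommendations above (values are illustrative; the node count and the availability of snappy depend on your HBase cluster):

```properties
kylin.query.force-limit=1000
kylin.storage.hbase.hfile-size-gb=1
# e.g. with 5 HBase RegionServers:
kylin.storage.hbase.min-region-count=5
kylin.storage.hbase.compression-codec=snappy
```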

Spark Configuration

All Spark configuration properties prefixed with kylin.engine.spark-conf. can be managed in $KYLIN_HOME/conf/kylin.properties; of course, these parameters can also be overridden in a Cube's advanced configuration. The following Spark dynamic resource allocation settings are recommended:

```properties
# Run in yarn-cluster mode; you can also point to a standalone Spark cluster: spark://ip:7077
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
# Enable dynamic resource allocation
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=2
kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1000
kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
kylin.engine.spark-conf.spark.shuffle.service.port=7337
# Memory settings
kylin.engine.spark-conf.spark.driver.memory=2g
# Increase executor memory for larger data or large dictionaries
kylin.engine.spark-conf.spark.executor.memory=4g
kylin.engine.spark-conf.spark.executor.cores=2
# Network timeout
kylin.engine.spark-conf.spark.network.timeout=600
# Partition size
kylin.engine.spark.rdd-partition-cut-mb=100
```

Cube Planner configuration

Cube Planner is a new feature added in V2.3. With it, you can see the number and combination of all cuboids after a Cube is built successfully. In addition, once configured, you can see how online queries match cuboids, which reveals hot, cold, and even unused cuboids; these observations guide further optimization of the Cube build. For how to use Cube Planner, refer to the official document: kylin.apache.org/cn/docs/tut…

References

  • Kylin official documentation
  • Kyligence_Enterprise_3_1-zh.pdf
  • Kylin 2.0 Spark Cubing optimization improvements
