Welcome to my GitHub

Github.com/zq2599/blog…

Content: a categorized index of all my original articles, with supporting source code, covering Java, Docker, Kubernetes, DevOps, etc.

This is the final installment of the CDH+Kylin trilogy. A quick review of what we have done so far:

  1. "CDH+Kylin Trilogy Part One: Preparation": prepare the machines, scripts, and installation packages;
  2. "CDH+Kylin Trilogy Part Two: Deployment and Setup": deploy CDH and Kylin, and complete the relevant settings on the management pages;

Now that Hadoop and Kylin are in place, let’s try Kylin’s official demo.

Yarn Parameter Settings

Set Yarn's memory parameters and restart Yarn so they take effect; otherwise, the jobs Kylin submits will be stuck waiting for resources and never execute.
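The article does not list the exact values, so as a rough sketch only, these are the kind of Yarn memory parameters involved (normally adjusted through the Cloudera Manager UI; the values below are illustrative assumptions, not recommendations, expressed in yarn-site.xml terms):

```xml
<!-- Illustrative values only; tune to your cluster's actual memory -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value> <!-- memory each NodeManager offers to containers -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value> <!-- largest container a single request may ask for -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value> <!-- smallest container that will be granted -->
</property>
```

If the maximum allocation is smaller than what a Kylin build step requests, the job stays pending, which matches the symptom described above.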

About the official Kylin demo

  1. The following is part of the official demo's script (create_sample_tables.sql), which creates Hive tables on top of HDFS data:
  2. From the script you can see that KYLIN_SALES is the fact table and the other tables are dimension tables; since the dimension tables KYLIN_ACCOUNT and KYLIN_COUNTRY are themselves joined to each other, the dimensional model follows a snowflake schema.

Import sample data

  1. Log in to the CDH server over SSH
  2. Switch to the hdfs account: su - hdfs
  3. Run the ${KYLIN_HOME}/bin/sample.sh command
  4. The console output is as follows:

Check the data

  1. To check the data, run beeline to enter session mode. (Beeline is recommended as a replacement for the Hive CLI.)

  2. In the Beeline session, enter the connection URL: !connect jdbc:hive2://localhost:10000, enter hdfs as the account when prompted, and press Enter directly for the password:
  3. Run the show tables command to list the current Hive tables, then run the SQL: select min(PART_DT), max(PART_DT) from kylin_sales; — the earliest date is 2012-01-01 and the latest is 2014-01-01, and the whole query takes 18.87 seconds:

Build the Cube

With the data ready, let's build the Kylin Cube:

  1. Log in to the Kylin web UI: http://192.168.50.134:7070/kylin
  2. Load the metadata as shown below:

  3. As shown in the red box below, the data is loaded successfully:
  4. On the Model page you can see the fact table and the dimension tables. Create a MapReduce job to calculate the cardinality of each column of the dimension table KYLIN_ACCOUNT:
  5. Go to the Yarn page (port 8088 on the CDH server); as shown in the following figure, a MapReduce job is being executed:
  6. The KYLIN_ACCOUNT table's cardinality has been calculated. ACCOUNT_ID actually has 10000 distinct values, but the reported cardinality is 10420: Kylin uses the HyperLogLog approximate algorithm to compute cardinality, so there is some error relative to the exact value (the cardinality of the other four fields matches the Hive query results):
  7. Start building the Cube:
  8. The date range is from 2012-01-01 to 2014-01-01. Note that the end date must be later than 2014-01-01:
  9. Progress can be followed on the Monitor page:
  10. Go to the Yarn page (port 8088 on the CDH server) to view the related jobs and resource usage.
  11. After the build completes, the Ready icon appears:
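The gap between the true count (10000) and the reported cardinality (10420) is inherent to HyperLogLog's probabilistic design. As a rough illustration of why the estimate is close but not exact, here is a minimal, simplified HLL sketch in Python; this is not Kylin's actual implementation, and the register count and hash choice are assumptions for the demo:

```python
import hashlib
import math

def hll_estimate(items, b=12):
    """Minimal HyperLogLog sketch: 2**b registers, SHA-1 as the hash."""
    m = 1 << b
    registers = [0] * m
    for item in items:
        h = int(hashlib.sha1(str(item).encode()).hexdigest(), 16) & ((1 << 64) - 1)
        idx = h & (m - 1)                     # low b bits pick a register
        w = h >> b                            # remaining 64-b bits
        rank = (64 - b) - w.bit_length() + 1  # leading-zero run length + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)          # bias correction for large m
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:         # small-range (linear counting) correction
        estimate = m * math.log(m / zeros)
    return estimate

est = hll_estimate(range(10000))
print(round(est))  # close to 10000, but generally not exact
```

With m = 4096 registers the standard error is about 1.04/sqrt(4096) ≈ 1.6%, so a deviation on the order of what Kylin reported above (10420 vs. 10000, about 4%) is the expected behavior, not a bug.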

The query

  1. The earlier min/max query that took 18.87 seconds on Hive takes only 0.14 seconds in Kylin:

  2. Next, aggregate and sort by date, and run the same query with Kylin and with Hive:

select part_dt, sum(price) as total_sold, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt;
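Per date, the query computes a price sum and a distinct-seller count; Kylin answers fast because exactly these aggregates are precomputed in the Cube. A tiny Python sketch of the same aggregation, run over a few made-up rows (the sample data is hypothetical, not from KYLIN_SALES):

```python
from collections import defaultdict

def aggregate(rows):
    """Group by part_dt: total price and distinct seller count, sorted by date."""
    totals = defaultdict(float)
    sellers = defaultdict(set)
    for part_dt, seller_id, price in rows:
        totals[part_dt] += price
        sellers[part_dt].add(seller_id)
    return [(dt, round(totals[dt], 2), len(sellers[dt])) for dt in sorted(totals)]

# Hypothetical sample rows: (part_dt, seller_id, price)
rows = [
    ("2012-01-01", 10001, 9.5),
    ("2012-01-01", 10002, 3.0),
    ("2012-01-01", 10001, 1.2),
    ("2012-01-02", 10003, 7.0),
]
print(aggregate(rows))  # [('2012-01-01', 13.7, 2), ('2012-01-02', 7.0, 1)]
```

Hive recomputes this scan-and-group work at query time, while Kylin only has to look up the precomputed result, which is why the timings below differ by orders of magnitude.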
  3. Kylin's query takes 0.13 seconds:

  4. Hive returns the same result, taking 40.196 seconds:
  5. Finally, a look at resource usage: during the Cube build, 18G of memory was used:

CDH+Kylin is now deployed and ready to use, and this concludes the CDH+Kylin trilogy. If you are learning Kylin, I hope this series gives you some pointers.

Welcome to follow my WeChat official account: Programmer Xin Chen

Search for "programmer Xin Chen" on WeChat. I am Xin Chen, looking forward to exploring the Java world with you…
