Chapter 1 Kylin

Apache Kylin is an OLAP engine developed at eBay by a team of Chinese engineers led by Han Qing (@lukehq). It was the first project in eBay's history to be open-sourced and contributed to the Apache Software Foundation, and it became an Apache top-level project in 2015. Its performance is striking: queries answered from pre-computed cubes can run many times faster than the same queries on Hive.

Kylin (http://kylin.io)

Designed to accelerate analytics in a Hadoop environment and to work smoothly with SQL-compatible tools, Kylin brings a SQL interface and multidimensional analysis (OLAP) to Hadoop, supporting extremely large datasets.

On-Line Analytical Processing

The architecture diagram shows how the Cube Build Engine converts relational data into key-value data offline; the yellow line traces the online query path. Queries can be submitted as SQL from SQL-based tools, or issued by third-party applications through Kylin's RESTful services.
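To make the offline build step concrete, here is a minimal Python sketch of full-cube pre-aggregation, assuming a single SUM measure and an in-memory list of rows. Every subset of the dimensions (a cuboid; there are 2^N of them for N dimensions) is aggregated once, and each result is stored under a key made of a cuboid ID bitmask plus the dimension values. The function name `build_cube`, the bitmask convention and the sample columns `site`, `year` and `gmv` are hypothetical; Kylin's real build runs as distributed jobs and encodes keys of this kind into HBase rowkeys.

```python
# Hypothetical sketch of full-cube pre-aggregation (SUM only), not Kylin's build code.
from collections import defaultdict


def build_cube(rows, dims, measure):
    """Pre-aggregate SUM(measure) for every cuboid (subset of dims).

    Returns a dict mapping (cuboid_id, dimension-value tuple) -> sum.
    Bit i of cuboid_id is set when dims[i] belongs to the cuboid, so
    N dimensions give 2**N cuboids, including the grand total (id 0).
    """
    cube = defaultdict(int)
    for cuboid_id in range(1 << len(dims)):
        kept = [d for i, d in enumerate(dims) if cuboid_id & (1 << i)]
        for row in rows:
            key = (cuboid_id, tuple(row[d] for d in kept))
            cube[key] += row[measure]
    return dict(cube)


rows = [
    {"site": "US", "year": 2015, "gmv": 10},
    {"site": "US", "year": 2016, "gmv": 20},
    {"site": "DE", "year": 2015, "gmv": 5},
]
cube = build_cube(rows, ["site", "year"], "gmv")
print(cube[(0, ())])             # 35  (grand total)
print(cube[(1, ("US",))])        # 30  (GROUP BY site)
print(cube[(3, ("US", 2015))])   # 10  (GROUP BY site, year)
```

Because every answer is computed ahead of time, an online query reduces to a key lookup; the price is storage that grows with the number of cuboids, which doubles with each added dimension.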

The RESTful service invokes the Query Engine, which checks whether a matching pre-computed dataset exists. If it does, the engine reads that data directly and returns the result with sub-second latency. If it does not, the engine routes the query to SQL-on-Hadoop, that is, to Hive or another engine running on the Hadoop cluster.
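The routing decision can be sketched the same way. The hypothetical `answer_query` below maps a query's GROUP BY columns to a cuboid bitmask; if that cuboid was pre-built it answers from the cube, otherwise it hands the query to a `fallback` callable that stands in for Hive. The cube layout (keys of cuboid ID plus dimension values) is an assumption of this sketch, and a real engine may also aggregate further from a larger cuboid instead of requiring an exact match.

```python
# Hypothetical sketch of cube-or-Hive query routing; not Kylin's actual query engine.


def answer_query(group_by, cube, dims, fallback):
    """Answer SUM(measure) GROUP BY group_by from the cube, or route it elsewhere.

    cube     -- dict mapping (cuboid_id, dimension-value tuple) -> aggregate;
                bit i of cuboid_id means dims[i] is part of that cuboid
    fallback -- callable used when the cube cannot answer (stands in for Hive)
    Result keys list dimension values in the order of `dims`, not `group_by`.
    """
    if not set(group_by) <= set(dims):
        return fallback(group_by)            # column not modeled in the cube
    cuboid_id = sum(1 << dims.index(d) for d in set(group_by))
    built = {cid for cid, _ in cube}
    if cuboid_id not in built:
        return fallback(group_by)            # cuboid was never pre-computed
    return {values: agg for (cid, values), agg in cube.items() if cid == cuboid_id}


# Example: only the grand total (0) and the "site" cuboid (1) were built.
dims = ["site", "year"]
cube = {(0, ()): 35, (1, ("US",)): 30, (1, ("DE",)): 5}


def hive(group_by):
    return "routed to Hive: GROUP BY " + ", ".join(group_by)


print(answer_query(["site"], cube, dims, hive))  # {('US',): 30, ('DE',): 5}
print(answer_query(["year"], cube, dims, hive))  # routed to Hive: GROUP BY year
```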

1-1 Kylin Background

Our users, analysts and business teams among them, want to keep using familiar tools such as Tableau and Excel, with minimal latency.

With this in mind, eBay set out the following basic product requirements:

  1. Sub-second query latency over tens of billions of rows.

  2. ANSI SQL support for users of SQL-compatible tools.

  3. A complete OLAP solution with advanced analytical functions.

  4. Support for high-cardinality dimensions and very large-scale business systems.

  5. High concurrency, serving thousands of users.

  6. A distributed, horizontally scalable architecture that handles analysis workloads from terabytes to petabytes.

1-2 What Is Kylin

  1. Extensible, ultra-fast OLAP engine

  2. ANSI SQL interface on Hadoop

  3. Interactive queries (sub-second interaction with Hadoop data)

  4. Multidimensional cube (MOLAP cube): define a data model and build cubes over datasets with billions of rows

  5. Seamless integration with BI tools (such as Tableau)

Other features:

  • Job management and monitoring

  • Compression and encoding

  • Incremental updates

  • Use of HBase coprocessors

  • HyperLogLog-based approximate distinct count

  • Web-friendly interface to manage, monitor, and work with cubes

  • Access control at the project and cube levels

  • LDAP support
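The HyperLogLog feature estimates COUNT(DISTINCT) with a small, fixed-size sketch instead of storing every distinct value. Below is a minimal textbook HyperLogLog in Python (the Flajolet et al. estimator with the small-range linear-counting correction), not Kylin's own implementation; the class name, the SHA-1-based 64-bit hash and the default precision `p=12` are assumptions of this sketch. The `merge` method, a register-wise maximum, shows why the technique suits pre-aggregation: sketches for different groups can be combined without revisiting the raw data.

```python
# Minimal textbook HyperLogLog sketch; an illustration, not Kylin's implementation.
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=12):
        self.p = p                     # number of index bits
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # 64-bit hash from the first 8 bytes of SHA-1 (an assumption of this sketch)
        h = int.from_bytes(hashlib.sha1(str(value).encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        """Union of two sketches: keep the larger value in each register."""
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant for m >= 128
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:      # small-range correction
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


users = HyperLogLog(p=12)
for i in range(10000):
    users.add(f"user-{i}")
print(users.count())  # close to 10000; typical relative error at p=12 is about 1.6%
```

Adding a value that is already present never changes the registers, so repeated values do not inflate the estimate.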

1-3 Kylin Ecosystem

  1. Kylin core

  2. Extensions

  3. Integration

  4. User interface

  5. Drivers

  • Kylin core: the basic framework of the Kylin OLAP engine, comprising the metadata engine, query engine, job engine and storage engine, plus a REST server that serves client requests

  • Integration: lifecycle-management integration with job schedulers, ETL, monitoring and other systems

  • Drivers: ODBC and JDBC drivers that support different tools and products, such as Tableau

1-4 Kylin Components

  1. Metadata Manager

  2. Job Engine

  3. Storage Engine

  4. REST Server

  5. ODBC Driver

  6. Query Engine

Metadata Manager

Kylin is a metadata-driven application. The Metadata Manager is the key component that manages all metadata stored in Kylin, most importantly the cube metadata. Every other component depends on it to function correctly.

Job Engine

The Job Engine handles all offline tasks, including shell scripts, Java API calls and MapReduce jobs. It manages and coordinates every task in Kylin, making sure each one runs and handling any failures along the way.
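A minimal Python sketch of that coordination, assuming a build job is an ordered chain of steps that must run one after another: each step gets a bounded number of retries, and the job stops at the first step that still fails, so later steps never run on incomplete input. The `Job` and `Step` classes, the step names and the `SUCCEED`/`ERROR` status strings are illustrative assumptions, not Kylin's actual job engine classes.

```python
# Hypothetical sketch of a chained offline job; not Kylin's job engine classes.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]   # e.g. a shell script, Java API call or MapReduce job
    max_retries: int = 0


@dataclass
class Job:
    name: str
    steps: List[Step]
    log: List[str] = field(default_factory=list)

    def run(self) -> str:
        """Run steps in order; retry failures, stop the chain if a step keeps failing."""
        for step in self.steps:
            for _ in range(step.max_retries + 1):
                try:
                    step.action()
                except Exception as exc:
                    self.log.append(f"{step.name}: ERROR ({exc})")
                else:
                    self.log.append(f"{step.name}: SUCCEED")
                    break
            else:
                return "ERROR"   # retries exhausted; later steps are skipped
        return "SUCCEED"


def noop() -> None:
    pass


attempts = {"count": 0}


def flaky_build() -> None:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("transient failure")


job = Job("build_cube", [
    Step("create_flat_table", noop),
    Step("build_cuboids", flaky_build, max_retries=1),
    Step("load_to_hbase", noop),
])
print(job.run())   # SUCCEED
print(job.log)
```

Stopping the chain on an unrecoverable step is the simplest safe policy for a pipeline where each step consumes the previous step's output.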

Storage Engine

The Storage Engine manages the underlying storage, specifically the cuboids, which are stored as key-value pairs. It uses HBase, the best-suited key-value store in today's Hadoop ecosystem. Kylin can also be extended to support other key-value systems, such as Redis.

REST Server

The REST Server is a set of entry points for developing applications on the Kylin platform. Through it, applications can submit queries, fetch results, trigger cube build jobs, retrieve metadata, get user permissions, and so on.
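As an illustration of the REST Server's query entry point, the Python sketch below builds (but does not send) an HTTP request using only the standard library. It assumes the `POST /kylin/api/query` endpoint with a JSON body and HTTP Basic authentication, as described in Kylin's REST API documentation; the host, port 7070, project `learn_kylin`, table `kylin_sales` and `ADMIN`/`KYLIN` credentials are example values to replace with your own, and the body fields should be checked against the documentation for your Kylin version.

```python
# Sketch of a Kylin REST query request; endpoint and fields assumed from Kylin's REST API docs.
import base64
import json
import urllib.request


def build_query_request(host, sql, project, user, password, limit=50000):
    """Return a POST request for Kylin's query endpoint without sending it."""
    body = json.dumps({
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,            # row cap chosen for this example
        "acceptPartial": False,
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"http://{host}/kylin/api/query",
        data=body,
        headers={
            "Content-Type": "application/json;charset=UTF-8",
            "Authorization": f"Basic {token}",   # HTTP Basic auth
        },
        method="POST",
    )


req = build_query_request("localhost:7070", "select count(*) from kylin_sales",
                          "learn_kylin", "ADMIN", "KYLIN")
print(req.get_method(), req.full_url)  # POST http://localhost:7070/kylin/api/query

# Against a running Kylin server, send it and decode the JSON response:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```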

ODBC Driver

To support third-party tools and applications such as Tableau, we built and open-sourced ODBC drivers, with the goal of making Kylin even easier to adopt.

Query Engine

Once a cube is ready, the Query Engine receives and parses user queries, then works with the other components of the system to return the results to the user.

In Kylin, we use Apache Calcite, an open-source dynamic data management framework, to parse SQL and to plug in our own code.

1-5 Kylin in Practice at Jingdong Cloud Sea

Jingdong Cloud Sea (CSDN): http://www.csdn.net/article/2015-11-27/2826343?utm_source=tuicool&utm_medium=referral

1-6 Introduction to the Kylin Cube Algorithm

http://www.linuxeden.com/html/news/20150910/162787.html

1-7 Reference Links

CSDN: http://www.csdn.net/article/2014-10-25/2822286