Relying solely on early statistical software to calculate indicators would be inefficient, and accuracy would drop considerably, which is why Kylin was created.
What are Kylin’s requirements for dimension tables?
1. Primary key values must be unique for data consistency; Kylin checks and reports an error if two rows have the same primary key value.
2. The smaller the dimension table, the better, because Kylin loads the dimension table into memory for queries. Tables that are too large are not suitable as dimension tables; the default threshold is 300 MB.
3. The change frequency should be low. Kylin tries to reuse the dimension table's snapshot in each build; if the dimension table changes frequently, reuse fails and new snapshots have to be created frequently.
4. Dimension tables should not be Hive views. Although Kylin 1.5.3 added support for dimension tables that are views, the view needs to be materialized each time, which adds time overhead.
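
The checks in the list above can be approximated before a table is handed to Kylin. The following is only a minimal sketch in plain Python, not Kylin's own validation logic: the function names, the in-memory list of rows, and the `size_bytes` and `is_view` arguments are all hypothetical and chosen for illustration. The 300 MB limit is the default threshold mentioned above.

```python
from collections import Counter

# Default size threshold quoted in the list above; everything else here is hypothetical.
MAX_SNAPSHOT_MB = 300


def find_duplicate_keys(rows, key_column):
    """Return the primary key values that appear in more than one row."""
    counts = Counter(row[key_column] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def check_dimension_table(rows, key_column, size_bytes, is_view=False,
                          max_mb=MAX_SNAPSHOT_MB):
    """Collect readable problems; an empty list means no issue was found."""
    problems = []
    duplicates = find_duplicate_keys(rows, key_column)
    if duplicates:
        problems.append(f"duplicate primary keys: {duplicates}")
    if size_bytes > max_mb * 1024 * 1024:
        problems.append(f"table exceeds {max_mb} MB")
    if is_view:
        problems.append("table is a view and must be materialized each time")
    return problems


if __name__ == "__main__":
    # Illustrative data only: two rows share the key 1.
    sample = [{"id": 1, "name": "US"}, {"id": 2, "name": "CN"}, {"id": 1, "name": "UK"}]
    print(check_dimension_table(sample, "id", size_bytes=10 * 1024 * 1024))
    # prints ['duplicate primary keys: [1]']
```

The change-frequency requirement is not covered here, since it depends on how the table evolves between builds rather than on a single copy of its contents.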