Background
Global dictionary
- Used for precise Count DISTINCT calculation
- Builds globally unique, contiguous IDs for the values of a column
- Converts values (encoded as strings) to ints so they can be stored in a bitmap
Cons: the global dictionary only ever grows as new distinct values arrive, so it gets bigger and bigger and each build gets slower and slower.
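The idea can be illustrated with a toy, in-memory model in Python. It assumes values are encoded as strings and uses a Python int as the bitmap. The class and function names are hypothetical, and Kylin's real global dictionary is persistent and far more elaborate. The sketch shows why the IDs must be global: bitmaps built at different times can only be merged with a bitwise OR if the same value always maps to the same bit.

```python
class GlobalDictionary:
    """Toy in-memory global dictionary (hypothetical): maps each distinct
    value to a contiguous integer ID that never changes once assigned."""

    def __init__(self):
        self._ids = {}

    def encode(self, value):
        # Values are encoded as strings (an assumption of this sketch).
        key = str(value)
        if key not in self._ids:
            # The next ID is the current size, so IDs stay contiguous: 0, 1, 2, ...
            self._ids[key] = len(self._ids)
        return self._ids[key]

    def __len__(self):
        return len(self._ids)


def bitmap_of(values, dictionary):
    """Set one bit per dictionary ID; a Python int serves as the bitmap."""
    bits = 0
    for value in values:
        bits |= 1 << dictionary.encode(value)
    return bits


def count_distinct(bitmap):
    """Exact distinct count = number of set bits."""
    return bin(bitmap).count("1")


gd = GlobalDictionary()
day1 = bitmap_of(["u1", "u2", "u1"], gd)   # u1 -> 0, u2 -> 1
day2 = bitmap_of(["u2", "u3"], gd)         # u2 reuses 1, u3 -> 2
print(count_distinct(day1), count_distinct(day2), count_distinct(day1 | day2))
# 2 2 3
```

Because `encode` only ever appends, the dictionary grows with every new distinct value and is never pruned, which is the drawback noted above.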
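Build
- At build time, we defined a Count Distinct measure with precise (exact) calculation on a BigInt field.
- We found that Kylin added a global dictionary for this column.
- Problem: an int column can be put into the bitmap directly, but a BigInt column cannot, so it has to go through the global dictionary.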
Investigation
Reading the source code shows that a column is put into the bitmap directly, without a global dictionary, only if its type is one of four: “tinyint”, “smallint”, “int”, “integer”.
Source code address: github.com/apache/kyli…
Source code address: github.com/apache/kyli…
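To make the rule concrete, here is a hypothetical Python helper that encodes the four-type check described above. It is not Kylin's code (Kylin itself is written in Java). The function name and the case-insensitive matching are assumptions of this sketch.

```python
# Integer types whose values can be used as bitmap positions directly
# (the four types named above; matched case-insensitively in this sketch).
DIRECT_BITMAP_TYPES = {"tinyint", "smallint", "int", "integer"}


def needs_global_dictionary(column_type: str) -> bool:
    """Hypothetical helper: True if a precise COUNT DISTINCT on a column of
    this Hive type would have to be encoded through a global dictionary."""
    return column_type.strip().lower() not in DIRECT_BITMAP_TYPES


print(needs_global_dictionary("bigint"))  # True
print(needs_global_dictionary("INT"))     # False
```

Under this rule, changing the column from BigInt to one of the four integer types is what lets the build skip the global dictionary.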
Solution
The column is currently stored in Hive as BIGINT. A signed 32-bit INT holds values up to 2,147,483,647 (about 2.1 billion), which is enough for this table's data, so we changed the column type to INT:
ALTER TABLE [table_name] CHANGE COLUMN [old_column_name] [new_column_name] [new_column_type]
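Hive's CHANGE COLUMN only updates the table metadata and does not rewrite existing data, so it is worth confirming beforehand that every value fits in an INT, for example by querying the column's MIN and MAX first. The range check itself is simple arithmetic, sketched below in Python. The helper name is hypothetical.

```python
# Range of a signed 32-bit integer, which is what Hive's INT type stores.
INT_MIN = -(2 ** 31)      # -2,147,483,648
INT_MAX = 2 ** 31 - 1     #  2,147,483,647


def fits_in_hive_int(values) -> bool:
    """Return True if every non-null value fits in a signed 32-bit INT."""
    return all(INT_MIN <= v <= INT_MAX for v in values if v is not None)


print(fits_in_hive_int([0, 2_147_483_647, None]))  # True
print(fits_in_hive_int([2_147_483_648]))           # False
```

In practice you would pass only the column's MIN and MAX rather than every row.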
Remaining issues
In a cluster deployment, run only one Kylin node in “all” (Job + Query) or “job” mode. When we deployed three nodes in “all” mode, the global dictionary failed to build.
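A pre-deployment sanity check for this rule can be sketched in Python. It assumes each node's role is set by the `kylin.server.mode` property (values `all`, `job`, `query`) in its `kylin.properties`, and that a missing setting means `all`. The helper functions and the simplified `key=value` parser are hypothetical and do not handle every `.properties` syntax.

```python
# Modes in which a node runs build jobs (and so builds global dictionaries).
JOB_CAPABLE_MODES = {"all", "job"}


def server_mode(properties_text: str) -> str:
    """Read kylin.server.mode from simple key=value text (assumed default 'all')."""
    mode = "all"
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines this sketch can't parse
        key, value = line.split("=", 1)
        if key.strip() == "kylin.server.mode":
            mode = value.strip().lower()
    return mode


def job_node_count(node_configs) -> int:
    """Count nodes whose configured mode lets them run build jobs."""
    return sum(1 for text in node_configs if server_mode(text) in JOB_CAPABLE_MODES)


def check_cluster(node_configs) -> None:
    """Raise if the cluster does not have exactly one job-capable node."""
    n = job_node_count(node_configs)
    if n != 1:
        raise ValueError(f"expected exactly 1 node in 'all' or 'job' mode, found {n}")


# One job node plus two query nodes passes; three "all" nodes would raise.
check_cluster(["kylin.server.mode=job",
               "kylin.server.mode=query",
               "kylin.server.mode=query"])
```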