This article is from the OPPO Internet technology team. To reprint it, please credit the source and author. Welcome to follow our official account: OPPO_tech

Hive was designed as an offline data-warehouse product, and although Hortonworks has invested heavily in initiatives such as "Make Apache Hive 100x Faster," the performance gains are still modest for interactive use.

As OPPO's data volume grew, Hive queries that ran for hours could no longer support interactive analysis, so we introduced Presto, an excellent representative of the SQL-on-Hadoop family, mainly for ad-hoc queries.

However, things are rarely that simple. Many analysts are used to HQL, and the cost of migrating is high, so, following the principle that the business comes first, we dove into the Hive-compatibility pit to make it easier for users to move their workloads to Presto. Below is some of our main compatibility work.

To make Presto compatible with Hive, the following issues generally need to be addressed:

  1. Hive UDFs;

  2. Hive views;

  3. Semantic differences: for example, 1/2 evaluates to 0 in Presto but 0.5 in Hive.
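The third difference can be illustrated in plain Java, where the same integer-vs-floating-point distinction applies (purely an illustration, not engine code):

```java
// Presto evaluates 1/2 with integer division; Hive promotes the operands
// to double. The same distinction exists in Java and is shown here only
// as an illustration of the two semantics.
class DivisionSemantics {
    static long prestoStyle(long a, long b) {
        return a / b;          // integer division: 1/2 -> 0
    }

    static double hiveStyle(long a, long b) {
        return (double) a / b; // floating-point division: 1/2 -> 0.5
    }
}
```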

Solving all of these problems takes time and effort, so we built a routing layer on top of Presto and Hive: query requests go to Presto first, and queries that Presto cannot handle are routed to Hive.
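The routing idea can be sketched as follows. This is a minimal, hypothetical sketch with illustrative names, not OPPO's actual routing code: prefer Presto, and fall back to Hive when Presto fails or rejects the query.

```java
import java.util.function.Function;

// Hypothetical sketch of the routing layer: queries go to Presto first and
// fall back to Hive when Presto cannot handle them. Class and method names
// are illustrative only.
class QueryRouter {
    private final Function<String, String> prestoExecutor;
    private final Function<String, String> hiveExecutor;

    QueryRouter(Function<String, String> prestoExecutor,
                Function<String, String> hiveExecutor) {
        this.prestoExecutor = prestoExecutor;
        this.hiveExecutor = hiveExecutor;
    }

    String route(String sql) {
        try {
            return prestoExecutor.apply(sql); // prefer the fast engine
        } catch (RuntimeException e) {
            return hiveExecutor.apply(sql);   // fall back to Hive
        }
    }
}
```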

Of course, there is a risk: because the two engines process some semantics differently, the same SQL may yield different results in Presto and Hive.

Semantic modifications

Part 1: Key not present in map

When Presto looks up a map key, a key that does not exist raises an error, while Hive returns NULL.
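The two behaviors can be modeled with a toy lookup (not Presto source code). Making the subscript behave like Hive means returning NULL instead of throwing, which is also what Presto's built-in element_at function does for missing keys:

```java
import java.util.Map;
import java.util.HashMap;

// Toy model of the two map-subscript behaviors (not Presto source code).
class MapKeySemantics {
    // Presto's map subscript: error on a missing key.
    static Object prestoSubscript(Map<String, Object> map, String key) {
        if (!map.containsKey(key)) {
            throw new IllegalArgumentException("Key not present in map: " + key);
        }
        return map.get(key);
    }

    // Hive's map subscript: NULL on a missing key.
    static Object hiveSubscript(Map<String, Object> map, String key) {
        return map.get(key);
    }
}
```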

Part 2: Cast as string

Because Presto has no type named string, an Unknown type error occurs when a query contains cast(... as string) or when a table definition contains string. So we replace string with varchar in AstBuilder.java.
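The substitution can be sketched like this. The real change lives where AstBuilder.java resolves type names; this standalone helper only conveys the idea:

```java
// Illustrative helper: when the parser sees the Hive type name "string",
// substitute Presto's "varchar" so that CAST(x AS string) and table
// definitions containing string no longer fail with an Unknown type error.
// The actual change is made where AstBuilder.java resolves type names.
class TypeNameRewriter {
    static String rewrite(String typeName) {
        return typeName.equalsIgnoreCase("string") ? "varchar" : typeName;
    }
}
```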

Part 3: Type implicit conversion

During migration, type-mismatch errors often occurred in Presto. The root cause is that Hive implicitly converts data types for compatibility. Our first idea was to overload the operators, adding an extra function signature for each mismatched type pair, which would support cases such as select 1 = '1'.

However, this approach would require overloading almost all operators (+, -, *, /, between, in, and so on), which is too tedious.
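The overloading idea can be sketched with plain Java overloads. Presto actually registers operators through its function framework rather than like this; the sketch only shows why the approach explodes in size, one extra overload per operator and type pair:

```java
// Toy illustration of the overloading approach: one extra overload per
// operator and type pair. Presto registers operators through its function
// framework; this plain-Java version only shows the combinatorial cost.
class OverloadedEqual {
    static boolean equal(long a, long b) {
        return a == b;
    }

    // Extra overload so that 1 = '1' works: parse the varchar side.
    static boolean equal(long a, String b) {
        return a == Long.parseLong(b);
    }
}
```

Every other operator (+, -, *, /, between, in, ...) would need the same set of extra signatures, which is why we abandoned this route.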

So instead, we implicitly insert a CAST: select 1 = '1' is rewritten as select 1 = cast('1' as int).
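A sketch of the coercion idea, using a hypothetical helper. In Presto the coercion is inserted on the analyzed expression tree, not by rewriting SQL text as done here:

```java
// Hypothetical sketch: when an integer is compared with a varchar literal,
// wrap the varchar side in a CAST, so that select 1 = '1' effectively
// becomes select 1 = cast('1' as int). Presto inserts the coercion on the
// analyzed expression tree; string rewriting is used here only to
// illustrate the transformation.
class ImplicitCastRewriter {
    static String coerceEquality(String left, String leftType,
                                 String right, String rightType) {
        if (leftType.equals("integer") && rightType.equals("varchar")) {
            return left + " = cast(" + right + " as int)";
        }
        if (leftType.equals("varchar") && rightType.equals("integer")) {
            return "cast(" + left + " as int) = " + right;
        }
        return left + " = " + right; // types already match
    }
}
```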

Hive View Compatibility

Presto cannot read views defined by Hive out of the box, but simple view definitions can be supported. The method is straightforward: when Presto reads the view definition (in StatementAnalyzer.java), we parse the view's original SQL text and convert it into Presto's own view structure.
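For a simple view, the conversion amounts to re-parsing the view's stored SQL and treating it as an inline subquery. A toy version of that idea (illustrative string substitution, not StatementAnalyzer code):

```java
// Toy sketch: expand a reference to a Hive view by substituting the view's
// original SQL text as an inline subquery. The real implementation parses
// the text with Presto's parser inside StatementAnalyzer.java and builds a
// proper view structure; this string substitution only conveys the idea.
class HiveViewExpander {
    static String inline(String querySql, String viewName, String viewDefinition) {
        return querySql.replace(viewName, "(" + viewDefinition + ")");
    }
}
```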

Conclusion

  1. Support for some Hive UDF functions developed by business teams;

  2. Map delimiter compatibility: take map data such as {fk_1:1, fk_2:2, fk_3:3}. Hive 1.x stores the delimiter between map entries under the property key "colelction.delim" (a long-standing typo in Hive itself), while Hive 3.x uses the corrected "collection.delim". PrestoSQL 315 depends on Hive 3.x libraries, which breaks map parsing for data written by Hive 1.x.
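One possible workaround, sketched as a hypothetical helper: when the corrected property key is absent, fall back to the misspelled legacy key that Hive 1.x wrote.

```java
import java.util.Map;
import java.util.HashMap;

// Hypothetical workaround sketch: Hive 1.x stored the map delimiter under
// the misspelled key "colelction.delim" (a historical typo in Hive, fixed
// in Hive 3.x as "collection.delim"). Look up the corrected key first and
// fall back to the legacy one so tables written by Hive 1.x still parse.
class SerdeProperties {
    static String collectionDelim(Map<String, String> props, String defaultDelim) {
        String delim = props.get("collection.delim");
        if (delim == null) {
            delim = props.get("colelction.delim"); // legacy misspelled key
        }
        return delim != null ? delim : defaultDelim;
    }
}
```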

Presto's compatibility work with Hive goes far beyond UDTFs and Hive's NULL handling, which highlights the importance of a unified SQL standard.


One last digression: the OPPO Internet operation and maintenance cloud storage team is hiring for a number of positions. We welcome candidates interested in MongoDB kernel source code, the WiredTiger storage engine, the RocksDB storage engine, multi-active database deployments across data centers, data link synchronization systems, middleware, databases, and so on to join us and work on OPPO's document database serving million-level concurrency.

Location: Shenzhen/Chengdu

Contact email: [email protected] (preferably mentioning that you saw this post on Juejin)