Introduction to openLooKeng Execution Plan optimization

Before talking about the pushdown framework, let’s take a quick look at the general flow of openLooKeng execution plan optimization.

As shown in the figure above, after the user SQL statement is received, the SQL is converted into an Abstract Syntax Tree (AST) tree. The AST tree is then transformed into a logical execution plan tree. Then, the most important step in the execution plan optimization process, use rules or Optimizers. Each PlanOptimizer can operate a subtree of execution plans, based on statistics or experience, Replace the current subtree with a better subexecution plan for optimization purposes. PlanOptimizers are usually optimization rules that have accumulated over time, such as predicates push-down, join reorder, etc. PlanOptimizers can store physical execution information in the ConnectorHandle. The new pushdown frame works on this layer.

After the optimal execution plan is obtained, the logical execution plan is transformed into a physical execution plan, which is then fragmented and divided into a sub-tree to be executed according to stage, and finally scheduled to be executed on worker.

PrestoSql push-down scheme

Initially, openLooKeng evolved from PrestoSql, adding a number of features, as well as a number of performance optimizations. Before I talk about openLooKeng’s new pushdown framework, let me give you an overview of its original pushdown scheme.

  • According to community discussion, PrestoSql’s original pushdown framework had some goals:

1) Use existing rules or Optimizer frameworks instead of using visitor-based PlanOptimizers for pushdown, and have the Connector provide transformation rules for pushdown;

2) There is not a native mechanism to support push-down for all operations.

  • Let’s take a look at prestoSql execution:

1) First, introduce a series of push-down rules. Each rule is responsible for pushing down corresponding operations to a TableScan operation. For example, PushFilterIntoConnector PushProjectionIntoConnector, PushAggregationIntoConnector, etc.

2) These rules interact with connectors via the specified metadata call. If the Connector supports this pushdown, the operation is pushed down to a TableScan operation and the related information is recorded in the connectorTableHadle.

  • The following uses PushFilterIntoConnector as an example.

In the example above, assume that there are two filter criteria in filter, like and f function, where connector can handle like expressions.

PushFilterIntoConnector calls metadata. pushFilter, which calls Connector’s pushFilter function. This function returns a new tableHandle, The new tableHandle records information about the like expression and returns a Remaining Filter, which the Connector cannot handle. Finally, PushFilterIntoConnector converts the original execution plan (the first box in the figure above) into a new execution plan (the last box in the figure above).

  • The rule-based pushdown scheme above has the following problems:

1) Push-down of Nodes such as JoinNode and WindowNode cannot be implemented, especially in the case of join. A Join node not only needs to visit the current Join node, but also needs to visit its left and right Nodes. Meanwhile, it also needs to save the context information of the join. Rule-based push-down schemes are difficult to deal with this situation;

2) The push-down logic is complicated, and the information below the push-down cannot be saved. In the case of Join, it is not known where the push-down information of Join is stored.

3) Due to the above reasons, the current push-down scheme cannot push SQL statements down to data source operations, so the data source itself cannot be fully utilized for those data sources with relatively fast execution speed, so a new push-down framework is introduced.

OpenLooKeng’s new pushdown framework

  • thought

The main idea behind openLooKeng’s new pushdown framework is to expose the execution plan subtree to the Connector and have the Connector provide PlanOptimizers (visitor patterns) to the execution optimization engine, allowing the Connector to introduce arbitrary optimizations.

To prevent PlanOptimizers of one connector from modifying execution plan subtrees of other connectors, openLooKeng imposes two restrictions on planNodes exposed to connectors:

1) Exposed PlanNodes must be moved to the Presto-SPI module;

2) Expose only the subexecution plan tree belonging to connector to the corresponding Connector. As shown in the figure below, the left subtree will only be exposed to Hive Connector and the right subtree will only be exposed to Mysql Connector. Then they’ll apply their own PlanOptimizers.

  • implementation

OpenLooKeng’s pushdown framework, shown above, works in two simple steps:

ConnectorPlanOptimizer, HiveFpuPushdownOptimizer, which needs to implement the optimize interface, The optimize function returns the optimized execution plan as an entry point for the subexecution plan;

2) ApplyConnectorOptimization optimizer, introduced in the optimization engine,

The Optimizer calls its connectorPlanOptimizer based on the connector where the child execution plan resides. As shown in the figure below, the Aggregation and Filter operations are both pushed down into the data source after being optimized by the HiveFpuPushdownOptimizer.

  • Modify the

The new framework feature PR links to

Gitee.com/openlookeng…

The main modifications are as follows:

  1. Move the PlanNodes to the presto-spi module

  2. Change Expression in PlanNode and Assignments to RowExpression (described in the next section)

  3. Add TranslateExpressions and ApplyConnectorOptimization Optimizer

  4. Modify existing Rules and Optimizers

  5. Add a ConnectorPlanOptimizer for the Connector

  • Expression-to-RowExpression

In a database or query system, AST and IR trees are isolated for better isolation, and they use different data structures. This is the separation of AST (Node) and IR (PlanNode) discussed in the Presto community. Specifically, the Expression of AST tree species is converted into RowExpression in PlanNode. Currently the AST and PlanNode are using Expression together and cannot be well isolated. Rker.

The current lifecycle of an execution plan is as follows:

building AST

building raw plan

plan optimization

plan sanity check

plan cost computation

building subplans

distributing subplan (over the wire)

compiling subplan locally

Currently, the Expression to RowExpression conversion takes place at step 8. We moved the transformation operation to step 3. The reason for not moving the transformation to the second step is that the coverage is too extensive and the changes are too large.

Example demonstrate

First of all, three catalogs are configured in the demo system. They all point to the same data source, but the push-down Settings are different. Mysql2 is not pushed, mysqL1 part is pushed down (join is not pushed), mysql is all pushed down.

Not pushing

The partial push down case, in this case, the filter is pushed down

All push down cases

How to contribute

Here’s a quick look at how developers can adapt the new push-down framework by adding a connectorPlanOptimizer to a new connector. The main steps are as follows:

First: Duplicate the following functions in XXXConnector

Second: implement XXXPlanOptimizerProvider

Third: Implement PlanOptimizer in XXXConnector

The optimize function in PlanOptimizer is a visitor to visit the execution plan tree

Fifth: Implement Visitor, which generates push-down statements while modifying the execution plan tree

Sixth: implement XXXQueryGenerator, implement a visitor in XXXQueryGenerator to record push information to XXXQueryGeneratorContext, if there are nodes can push, generate the corresponding SQL

The detailed implementation can refer to the baseJdbc implementation of openLooKeng, which implements the push down of the JDBC data source.

Above is openLooKeng development engineer Luo Dan share.

Live video review: www.bilibili.com/video/BV1of…


Official website: summer.iscas.ac.cn/

Students guide: summer. Iscas. Ac. Cn/help/studen…

Mission details: summer.iscas.ac.cn/#/org/orgde…


OpenLooKeng is an open source, high-performance data virtualization engine that provides a unified SQL interface and cross-data source/data center analysis capability to provide a minimalist data analysis experience for big data users.

OpenLooKeng community official website: openLooKeng. IO /zh-cn/

OpenLooKeng code store address: gitee.com/openlookeng