What is a push down

Such as the following SQL

select name from mysql.config.user_info where id = 123
Copy the code

As simple as it seems, simply hand the SQL directly to mysql and presto fetch the final result.

However, the biggest feature of Presto is that it crosses sources, and the SQL syntax of different sources may differ slightly (for example, in the supported functions). Therefore, Presto needs to implement its own SET of SQL syntax, and it is definitely not possible to deliver SQL directly to mysql.

Moreover, some complex SQL in a cross-source scenario, such as joins of tables in different data sources, will certainly need to be parsed and executed by different data sources in different parts.

Select * from mysql1; select * from mysql2; select * from mysql2;

select a.id, a.count
from
  (select id, count
  from mysql1.statis.table_a
  where count > 100) a
  left join
  (select id
  from mysql2.config.table_b
  where create_time >= '2021-01-01') b
on a.id = b.id
where b.id is null
Copy the code

There are two ideas for execution with this SQL. As shown in the figure below.

The one on the left is a normal thought with optimization in mind. And the right idea, for SQL analysis of the program implementation, is more standardized and unified.

In effect, Presto first generates an execution plan from SQL along the lines shown on the right. At this point, where count > 100 is still executed on the Presto service. Presto then optimizes the execution plan and pushes where count > 100 to mysql for execution. This is called “push down”.

After the push down, the actual implementation of Presto actually matches the ideas shown on the left. Select id, count, not just where count > 100. In this way, mysql only needs to return the filtered ID and count, not all fields.

Push down source code execution

SQL execution can be said simply like this

  1. Coordinator parses presto’s SQL into a series of internal data (such as syntax trees, execution plans)
  2. When the worker loads mysql data, it reassembs these internal data into mysql SQL and hands it to mysql for execution

Let’s take a quick look at push-down code execution using this simple example from the beginning (the presto version is 0.213, which is older and only selected to simplify the presentation).

select name from mysql.config.user_info where id = 123
Copy the code
  1. Generate and optimize execution plans
//presto-main class LogicalPlanner { public Plan plan(Analysis analysis, Stage stage) { PlanNode root = planStatement(analysis, analysis.getStatement()); // For (PlanOptimizer: PlanOptimizer) {root = optimizer.optimize(root,...) ; }}Copy the code

Before root optimization, TableScanNode nodes are as follows:

After the optimization, TableScanNode nodes included in root are as follows:

As you can see, optimization directly modifies the execution plan tree.

  1. Assemble the SQL on the worker
//presto-base-jdbc class QueryBuilder { public PreparedStatement buildSql(...) { StringBuilder sql = new StringBuilder() sql.append("SELECT ") ... List<String> accumulator = toConjuncts(columns, tupleDomain, accumulator) // clauses.isEmpty()) { sql.append(" WHERE ").append(Joiner.on(" AND ").join(clauses)) } PreparedStatement statement = client.getPreparedStatement(connection, sql.toString()) statement.setInt(...) }}Copy the code