OpenMLDB: Learn the details of the paramterized query statement

background

In database management systems (DBMS), a prepared statement or parameterized statement is a feature used to execute the same or similar database statements repeatedly with high efficiency. Typically used with SQL statements such as queries or updates, the prepared statement takes the form of a template into which certain constant values are substituted during each Execution. (en.wikipedia.org/wiki/Prepar…).

In a database system, parameterized statements, on the one hand, provide the ability of pre-compilation to efficiently execute statements and improve performance. On the other hand, it can prevent SQL injection attacks, which provides better security. These two points are the main reasons why traditional database systems support statements with parameters.

From a database system perspective, OpenMLDB supports Parameterized Query statements to further improve database query capabilities. From a business perspective, it enables OpenMLDB to support rule feature calculation in rule engine scenarios.

Scenario example: Rule engine feature calculation

SELECT 
SUM(trans_amount) as F_TRANS_AMOUNT_SUM, 
COUNT(user) as F_TRANS_COUNT,
MAX(trans_amount) as F_TRANS_AMOUNT_MAX,
MIN(trans_amount) as F_TRANS_AMOUNT_MIN,
FROM t1 where user = 'ABC123456789' and trans_time between 1590115420000 and 1592707420000;
Copy the code

In our example, we calculate the total number of transactions, the number of transactions, the maximum transaction amount, and the minimum transaction amount for user ABC123456789 between 2020-05-22 02:43:40 and 2020-06-20 07:43:40. These characteristics are passed to downstream components (the rules engine) for use.

In a real-world scenario, it would be impossible to write a single piece of SQL query code for each user. Therefore, a template for regular feature calculation is required, and the user’s time interval is dynamic.

The simplest way to do this is to write a program like the following, which concatenates the username and time interval as variables into an SQL statement.

String query = "SELECT "+
  "SUM(trans_amount) as F_TRANS_AMOUNT_SUM, "+
  "COUNT(user) as F_TRANS_COUNT,"+
  "MAX(trans_amount) as F_TRANS_AMOUNT_MAX,"+
  "MIN(trans_amount) as F_TRANS_AMOUNT_MIN,"+
  "FROM t1 where user = '"+ user +"' and trans_time between " 
  + System.currentTimestamp()-30*86400000+ " and " + System.currentTimestamp();

executor.execute(query);
Copy the code

This approach is straightforward, but query performance will be poor and there may be SQL injection risks. A more recommended approach is to use a Parameterized query.

PreparedStatement stmt = conn.prepareStatement("SELECT "+
  "SUM(trans_amount) as F_TRANS_AMOUNT_SUM, "+
  "COUNT(user) as F_TRANS_COUNT,"+
  "MAX(trans_amount) as F_TRANS_AMOUNT_MAX,"+
  "MIN(trans_amount) as F_TRANS_AMOUNT_MIN,"+
  "FROM t1 where user = ? and trans_time between ? and ? ");

stmt.setString(1, user);
stmt.setTimestamp(2, System.currentTimestamp()-30*86400000);
stmt.setTimestamp(3, System.currentTimestamp())
ResultSet rs = stmt.executeQuery();
rs.next();
Copy the code

Implementation details

In OpenMLDB, a new syntax function is supported, which usually requires parsing, plan generation and optimization, expression Codegen, query execution, and so on. If necessary, consider adding or refactoring interfaces on the client side. The Paramteried Query support basically covers the modification and development of the above modules, so understanding the implementation details can help you quickly understand OpenMLDB development, especially the OpenMLDB Engine development.

The following figure shows the process diagram of executing query with parameters.

The user is in the applicationJavaApplicationWhere S uses JDBC (PrepraredStatement) to perform a query with parameters.
The client (TAB Client) provides the interfaceExecuteSQLParameterizedTo handle queries with parameters and call the Tablet via RPCQueryService.
The Tablet server relies on the Engine module for query compilation and execution.
Query statement compilation needs to go through SQL syntax analysis, plan generation optimization, expression Codegen three major stages. After a successful compilation, the compilation results are stored in the SQL context of the current execution session (jizSeesion). If the current query statement has already been precompiled, there is no need to repeat the compilation. The corresponding compilation artifacts can be retrieved directly from the compilation cache and stored in the RunSession SqlContext.
Query execution requires a call to RunSeesionRunInterface. The execution resultrun outputIt will be stored in an attachment to the Response and sent back to the TabletClient. Finally deposited toResultSetBack to theJavaApplication

1. JDBC PreparedStatement

1.1 JDBC Prepared Statements Overview

Sometimes it is more convenient to use a PreparedStatement object for sending SQL statements to the database. This special type of statement is derived from the more general class, Statement, that you already know.

If you want to execute a Statement object many times, it usually reduces execution time to use a PreparedStatement object instead.[[2]](Using Prepared Statements)

JDBC provides SQL statements prepared for the user to execute parameters. You can use PrepareStatment to perform query, insert, and update operations with parameters. In this section, we’ll go into the details of how OpenMLDB’s PrepareStatement executes a query with an argument.

1.2 Usage of OpenMLDB PreapredStatement

public void parameterizedQueryDemo() { SdkOption option = new SdkOption(); option.setZkPath(TestConfig.ZK_PATH); option.setZkCluster(TestConfig.ZK_CLUSTER); option.setSessionTimeout(200000); try { SqlExecutor executor = new SqlClusterExecutor(option); String dbname = "demo_db"; boolean ok = executor.createDB(dbname); // create table ok = executor.executeDDL(dbname, "create table t1(user string, trans_amount double, trans_time bigint, index(key=user, ts=trans_time));" ); // insert normal (1000, 'hello') OK = executor. ExecuteInsert (dbname, "insert into T1 values('user1', 1.0, 'user1') 1592707420000);" ); Ok = executor. ExecuteInsert (dbname, "insert into T1 values('user1', 2.0, 1592707410000);" ); Ok = executor. ExecuteInsert (dbname, "insert into T1 values('user1', 3.0, 1592707400000); ); Ok = executor. ExecuteInsert (dbname, "insert into T1 values('user2', 4.0, 1592707420000); ); Ok = executor. ExecuteInsert (dbname, "insert into T1 values('user2', 5.0, 1592707410000);" ); Ok = executor. ExecuteInsert (dbname, "insert into T1 values('user2', 6.0, 1592707400000); ); PreparedStatement query_statement = executor.getPreparedStatement(dbname, "select SUM(trans_amout), COUNT(trans_amout), MAX(trans_amout) from t1 where user=? and trans_time between ? and ?" ); query_statement.setString(1, "user1"); query_statement.setLong(2, 1592707410000); query_statement.setLong(3, 1592707420000); com._4paradigm.openmldb.jdbc.SQLResultSet rs1 = (com._4paradigm.openmldb.jdbc.SQLResultSet) query_statement.executeQuery(); query_statement.setString(1, "user2"); query_statement.setLong(2, 1592707410000); query_statement.setLong(3, 1592707420000); com._4paradigm.openmldb.jdbc.SQLResultSet rs2 = (com._4paradigm.openmldb.jdbc.SQLResultSet) query_statement.executeQuery(); query_statement.setString(1, "user3"); query_statement.setLong(2, 1592707410000); query_statement.setLong(3, 1592707420000); com._4paradigm.openmldb.jdbc.SQLResultSet rs3 = (com._4paradigm.openmldb.jdbc.SQLResultSet) query_statement.executeQuery(); } catch (Exception e) { e.printStackTrace(); }}Copy the code

Step 1: Construct executor. Prepare database, table, table data (if required)
Step 2: Create a PreparedStatement instance using a query statement with parameters

PreparedStatement query_statement = executor.getPreparedStatement(dbname, "select SUM(trans_amout), COUNT(trans_amout), MAX(trans_amout) from t1 where user=? and trans_time between ? and ?" );Copy the code

Step 3: Set the parameter values for each position.

query_statement.setString(1, "user1");
query_statement.setLong(2, 1592707410000);
query_statement.setLong(3, 1592707420000);
Copy the code

Step 4: Perform the query. Obtain the query result. Note that the parameter data in a PrepareStatement is automatically cleared after a query is executed. You can directly set new parameter values for a new round of query

com._4paradigm.openmldb.jdbc.SQLResultSet rs2
  = (com._4paradigm.openmldb.jdbc.SQLResultSet) query_statement.executeQuery();
query_statement.setString(1, "user2");
query_statement.setLong(2, 1592707410000);
query_statement.setLong(23,  1592707420000);
com._4paradigm.openmldb.jdbc.SQLResultSet rs2
  = (com._4paradigm.openmldb.jdbc.SQLResultSet) query_statement.executeQuery();
Copy the code

1.3 Implementation details of PreparedStatement

public class PreparedStatement implements java.sql.PreparedStatement { //... // Argument line protected String currentSql; // Protected TreeMap<Integer, Object> currentDatas; / / parameter type protected TreeMap < Integer, com. _4paradigm. Openmldb. DataType > types; / / the last query parameter types protected TreeMap < Integer, com. _4paradigm. Openmldb. DataType > orgTypes; / /... }Copy the code

PrepareaStatement inherited the JDBC interface standard Java. SQL. PreparedStatement. It maintains some basic elements required for query compilation and execution: query statements (currentSql), parameter data (currentDatas) parameter types, and so on.

buildPrepareaStatementAfter that, we initializePreparedStatement, and set thecurrentSql
After setting the parameter values,currentDatas.typesWill be updated.
When executing a query,query_statement.executeQuery()

@Override public SQLResultSet executeQuery() throws SQLException { checkClosed(); dataBuild(); Status status = new Status(); com._4paradigm.openmldb.ResultSet resultSet = router.ExecuteSQLParameterized(db, currentSql, currentRow, status); / /... Return rs; }Copy the code

First, executiondataBuild: Encodes parameter data sets into currentRow by parameter type and position. It is worth noting that we can reuse the original currentRow instance if the parameter type does not change.

protected void dataBuild() throws SQLException { // types has been updated, create new row container for currentRow if (null == this.currentRow || orgTypes ! = types) { // ... Here to omit this. CurrentRow = SQLRequestRow. CreateSQLRequestRowFromColumnTypes (columnTypes); this.currentSchema = this.currentRow.GetSchema(); this.orgTypes = this.types; } / /... CurrentRow initializes the associated code for (int I = 0; i < this.currentSchema.GetColumnCnt(); i++) { DataType dataType = this.currentSchema.GetColumnType(i); Object data = this.currentDatas.get(i+1); if (data == null) { ok = this.currentRow.AppendNULL(); } else {/ / omit details encoding / / if (DataType. KTypeInt64. Equals (DataType)) {/ / ok = this. CurrentRow. AppendInt64 ((long) data); / /} / /... } if (! ok) { throw new SQLException("append data failed, idx is " + i); } } if (! this.currentRow.Build()) { throw new SQLException("build request row failed"); } clearParameters(); }}Copy the code

Next, the query interface with parameters provided by the client is invokedExecuteSQLParameterized.

2. TabletClient and Tablet

2.1 Tablet_client

The client provides an ExecuteSQLParameterized interface to support query with parameters.

/// Execute batch SQL with parameter row
std::shared_ptr<hybridse::sdk::ResultSet> 
ExecuteSQLParameterized(const std::string& db, const std::string& sql,
                        std::shared_ptr<SQLRequestRow> parameter,
                        ::hybridse::sdk::Status* status) override;
Copy the code

ExecuteSQLParameterized extracts the parameter type, parameter row size, and other information from the parameter row parameters, loads QueryRequest, and loads the parameter data row into the ROC attachment. The client invokes RPC, and the query is compiled and run on the server.

Loads the argument row size, fragment number, and argument type listQueryRequest

request.set_parameter_row_size(parameter_row.size());
request.set_parameter_row_slices(1);
for (auto& type : parameter_types) {
  request.add_parameter_types(type);
}
Copy the code

The parameter data row is stored in an RPC attachmentcntl->request_attachment()

auto& io_buf = cntl->request_attachment(); if (! codec::EncodeRpcRow(reinterpret_cast<const int8_t*>(parameter_row.data()), parameter_row.size(), &io_buf)) { LOG(WARNING) << "Encode parameter buffer failed"; return false; }Copy the code

Call the RPC

bool ok = client_.SendRequest(&::openmldb::api::TabletServer_Stub::Query, cntl, &request, response);
Copy the code

2.2 Server Tablet

The Query function on the server Tablet is responsible for getting the argument row information from the QueryRequest, Then call interface engine_->Get()[compile query statement and call interface session.run](https://link.zhihu.com/?target=http%3A//xn--%2560session-i30nta9680a88mb6mc I9ble5a3t6bkia4g4p.run /)() Executes the query statement.

::hybridse::vm::BatchRunSession session; if (request->is_debug()) { session.EnableDebug(); } session.SetParameterSchema(parameter_schema); { bool ok = engine_->Get(request->sql(), request->db(), session, status); / /... } ::hybridse::codec::Row parameter_row; auto& request_buf = static_cast<brpc::Controller*>(ctrl)->request_attachment(); if (request->parameter_row_size() > 0 && ! codec::DecodeRpcRow(request_buf, 0, request->parameter_row_size(), request->parameter_row_slices(), &parameter_row)) { response->set_code(::openmldb::base::kSQLRunError); response->set_msg("fail to decode parameter row"); return; } std::vector<::hybridse::codec::Row> output_rows; int32_t run_ret = session.run(parameter_row, output_rows);Copy the code

For more details, read the tablet client and its source implementation.

3. Compile: indicates the compilation of query statements

3.1 Compilation of query statements

Step 1: For query statements with parameters, additional parameter types need to be configured during compilation compared with ordinary queries.

session.SetParameterSchema(parameter_schema);
Copy the code

Step 2: After configuring the parameter list, callengine.Get(...)Interface to compile SQL statements

The compilation of query statements needs to go through three major stages: SQL parsing (3.2. Parser), plan generation (3.3 Planner), expression Codegen (3.4 Codegen). After a successful compilation, the compilation results are stored in the SQL context of the current execution session (RunSeesion). The following sections walk you through the process of compiling a query statement with parameters.

If the current query statement has already been precompiled, there is no need to repeat the compilation. The corresponding compilation artifacts can be retrieved directly from the compilation cache and stored in the RunSession SqlContext. We need to pay special attention to design changes to the compile cache. For parameterized queries, hitting the cache requires matching both the SQL and the parameter type list.

// check if paramter types matches with target compile result or not.
for (int i = 0; i < batch_sess->GetParameterSchema().size(); i++) {
  if (cache_ctx.parameter_types.Get(i).type() != batch_sess->GetParameterSchema().Get(i).type()) {
    status = Status(common::kEngineCacheError, "Inconsistent cache parameter type, expect " +
                    batch_sess->GetParameterSchema().Get(i).DebugString()," but get ", cache_ctx.parameter_types.Get(i).DebugString());
    return false;
  }
}
Copy the code

3.2. Parser: Parses the Parser

OpenMLDB syntax interpreter is developed based on ZetaSQL SQL interpreter: in addition to the original syntax capabilities of ZetaSQL, it also supports OpenMLDB-specific syntax features. For example, the special table type LastJoin and window type ROWS_RANGE are introduced for AI scenarios. The syntax parsing and new syntax features of OpenMLDB will be covered in future technical articles.

SQL Parameterized statements used? As a parameter to the placeholder, the placeholder is ZetaSQL interpreter resolved as ZetaSQL: : ASTParameterExpr. Since ZetaSQL already supports the parsing of Parameterized Query statements, we don’t need to make many additional changes to the syntax parsing module. We just need to turn on the original constraints to recognize such parameter expressions. An expression node of type ParameterExpr of OpenMLDB is converted and stored in the syntax tree.

/// Convert zetasql::ASTExpression into ExprNode base::Status ConvertExprNode(const zetasql::ASTExpression* ast_expression, node::NodeManager* node_manager, node::ExprNode** output) { //... base::Status status; switch (ast_expression->node_kind()) { //... case zetasql::AST_PARAMETER_EXPR: { const zetasql::ASTParameterExpr* parameter_expr = ast_expression->GetAsOrNull<zetasql::ASTParameterExpr>(); CHECK_TRUE(nullptr ! = parameter_expr, common::kSqlAstError, "not an ASTParameterExpr") // Only support anonymous parameter (e.g, ?) so far. CHECK_TRUE(nullptr == parameter_expr->name(), common::kSqlAstError, "Un-support Named Parameter Expression ", parameter_expr->name()->GetAsString()); *output = node_manager->MakeParameterExpr(parameter_expr->position()); return base::Status::OK(); } / /... }}Copy the code

For example, the following parameter query statement:

SELECT col0 FROM t1 where col1 <= ? ;Copy the code

After the syntax is parsed, the following query syntax tree is generated:

+-list[list]:
  +-0:
    +-node[kQuery]: kQuerySelect
      +-distinct_opt: false
      +-where_expr:
      |  +-expr[binary]
      |    +-<=[list]:
      |      +-0:
      |      |  +-expr[column ref]
      |      |    +-relation_name: <nil>
      |      |    +-column_name: col1
      |      +-1:
      |        +-expr[parameter]
      |          +-position: 1
      +-group_expr_list: null
      +-having_expr: null
      +-order_expr_list: null
      +-limit: null
      +-select_list[list]:
      |  +-0:
      |    +-node[kResTarget]
      |      +-val:
      |      |  +-expr[column ref]
      |      |    +-relation_name: <nil>
      |      |    +-column_name: col0
      |      +-name: <nil>
      +-tableref_list[list]:
      |  +-0:
      |    +-node[kTableRef]: kTable
      |      +-table: t1
      |      +-alias: <nil>
      +-window_list: []
Copy the code

Where col1 <=? Is resolved as:

+-where_expr:
      |  +-expr[binary]
      |    +-<=[list]:
      |      +-0:
      |      |  +-expr[column ref]
      |      |    +-relation_name: <nil>
      |      |    +-column_name: col1
      |      +-1:
      |        +-expr[parameter]
      |          +-position: 1
Copy the code

3.3 Planner: Plan generation

Logic plan

In the logical planning phase, queries with parameters are no different from normal parameters. Therefore, this article does not attempt to go into the details of the logical plan. The following parameter query statement:

SELECT col0 FROM t1 where col1 <= ? ;Copy the code

The logical plan is as follows:

: +-[kQueryPlan] +-[kProjectPlan] +-table: t1 +-project_list_vec[list]: +-[kProjectList] +-projects on table [list]: | +-[kProjectNode] | +-[0]col0: col0 +-[kFilterPlan] +-condition: col1 <= ? 1 +-[kTablePlan]Copy the code

Readers interested in the details of logical and physical plans should follow our column. A series of articles on the technical details of the engine will follow.

Physical planning

In the physical plan generation phase, to support queries with parameters, two things are done:

First, the list of parameter types is maintained in the physical plan context, expression analysis context, and CodeGen context.

In a parameterized query, the parameters used in the final execution are dynamically specified by the user, so the parameter types are also dynamically specified externally. To do this, we provide interfaces that allow users to configure a list of parameter types, if any, when compiling SQL. This list will eventually be stored in the physical plan context, expression analysis context, and CodeGen context.

Class PhysicalPlanContext {//... private: const codec::Schema* parameter_types_; } // ExprAnalysisContext {//... private: const codec::Schema* parameter_types_; } // Codegen context class CodeGenContext {// private: const codec::Schema* parameter_types_; }Copy the code

Secondly, the type inference of parameter expression is completed according to the parameter type list.

A Parameterized Query statement is almost like a syntax tree generated by a normal query statement. The only difference is that the syntax tree for parameterized Query has parameter expression nodes (ParamterExpr). Because the type of the parameter is neither independent of the schema for querying the upstream table, nor is it a constant. Therefore, we cannot directly type infer this parameter expression. This requires special treatment for ParamterExpr during the plan generation phase, especially during type inference of expressions. When inferring the output type of ParamterExpr, you need to find the corresponding type in the parameter type list based on the parameter location.

Status ParameterExpr::InferAttr(ExprAnalysisContext *ctx) { // ignore code including boundary check and nullptr check // . type::Type parameter_type = ctx->parameter_types()->Get(position()-1).type(); node::DataType dtype; CHECK_TRUE(vm::SchemaType2DataType(parameter_type, &dtype), kTypeError, "Fail to convert type: ", parameter_type); SetOutputType(ctx->node_manager()->MakeTypeNode(dtype)); return Status::OK(); }Copy the code

The physical plan produces the following result:

SIMPLE_PROJECT(sources=(col0)) FILTER_BY(condition=col1 <= ? 1, left_keys=, right_keys=, index_keys=) DATA_PROVIDER(table=auto_t0)Copy the code

FILTER_BY contains the expression condition=(col1 <=? 1)

3.4 Codegen: Code generation of expressions

The Codegen module is responsible for analyzing the expression list for each plan node and then performing a series of expression and function code generation processes. After codeGen, at least one CodeGen function will be generated for each plan node that needs to evaluate an expression. These functions are responsible for evaluating expressions.

Codegen function adds a parameter

OpenMLDB generates intermediate code (IR) for each node involved in expression evaluation through LLVM. This is done by generating @__internal_SQL_codegen_6 functions for each node expression list (these functions will be called during statement execution (4 Run: query statement execution) :

; ModuleID = 'sql' source_filename = "sql" define i32 @__internal_sql_codegen_6(i64 /*row key id*/, i8* /*row ptr*/, I8 rows * / * * / within PTR, i8 / * * * the output row PTR PTR * /) {__fn_entry__ : / / is omitted}Copy the code

The arguments to this function mainly contain Pointers to INT_8, which point to rows or rows PTR (aggregate computation-dependent data sets). The function body is responsible for evaluating each expression, encoding the results in a sequence of lines, and encoding the address to the last i8** output parameter.

When the list of expressions contains parameter expressions, we also need to obtain parameter data. Therefore, we need to add a pointer to the parameter row (parameter_row PTR) to the existing function structure.

Status RowFnLetIRBuilder::Build(...) {// omit STD ::vector< STD ::string> args; std::vector<::llvm::Type*> args_llvm_type; args_llvm_type.push_back(::llvm::Type::getInt64Ty(module->getContext())); args_llvm_type.push_back(::llvm::Type::getInt8PtrTy(module->getContext())); args_llvm_type.push_back(::llvm::Type::getInt8PtrTy(module->getContext())); args_llvm_type.push_back(::llvm::Type::getInt8PtrTy(module->getContext())); / / the parameters of a new Type int8ptr args_llvm_type push_back (: : LLVM: : Type: : getInt8PtrTy (module - > getContext () - > getPointerTo ()); / /... }Copy the code

Thus, when parameter expressions are supported, the codeGen function structure looks like this:

; ModuleID = 'sql' source_filename = "sql" define i32 @__internal_sql_codegen_6(i64 /*row key id*/, i8* /*row ptr*/, I8 */ *rows PTR */, i8* /*parameter PTR PTR */, i8** /*output row PTR PTR */) {__fn_entry__: //Copy the code

Parameter expression codegen

The argument row follows the OpenMLDB encoding format just like a normal data row. The 0th element of the argument row is the first parameter in the parameter query statement, the first element is the second parameter, and so on. Therefore, evaluating a parameter expression is essentially reading the corresponding position of the parameter from the parameter row.

// Get paramter item from parameter row // param parameter // param output // return Status ExprIRBuilder::BuildParameterExpr(const ::hybridse::node::ParameterExpr* parameter, NativeValue* output) { // ... VariableIRBuilder variable_ir_builder(ctx_->GetCurrentBlock(), ctx_->GetCurrentScope()->sv()); NativeValue parameter_row; Parameter_row CHECK_TRUE(variable_ir_Builder. LoadParameter(& parameter_ROW, status), kCodegenError, status.msg); / /... CHECK_TRUE(buf_Builder.buildgetfield (parameter->position()-1, slice_ptr, slice_size, output), kCodegenError, "Fail to get ", parameter->position(), "th parameter value") return base::Status::OK(); }Copy the code

Thus, the Filter node condition col1

; ModuleID = 'sql' source_filename = "sql" define i32 @__internal_sql_codegen_6(i64, i8*, i8*, i8*, i8**) { __fn_entry__: %is_null_addr1 = alloca i8, align 1 %is_null_addr1 = alloca i8, align 1 %is_null_addr1 = alloca i8, align 1 col5} %5 = call i8* @hybridse_storage_get_row_slice(i8* %1, i64 0) %6 = call i64 @hybridse_storage_get_row_slice_size(i8* %1, I64 0) // Get field row[1] col1 %7 = call i32@hybridse_storage_get_int32_field (i8* %5, i32 1, i32 7, I8 * nonNULL % is_NULl_addr) %8 = load i8, i8* %is_null_addr, align 1 paramter_row = {? 1} %9 = call i8* @hybridse_storage_get_row_slice(i8* %3, i64 0) %10 = call i64 @hybridse_storage_get_row_slice_size(i8* %3, I64 0) // Get field of paramter_row[0] call i32@hybridse_storage_get_int32_field (i8* %9, i32 0, i32 7, I8 * nonnull % is_NULl_addr1) %12 = load i8, i8* %is_null_addr1, align 1 = 0, %8 1 %14 = icmp sle i32 %7, %11 // ... Store i1 %14, i1* %20, align 1 ret i32 0}Copy the code

We won’t go into the details of CodeGen here. Technical articles on Codegen design and optimization will be updated in the future. If you’re interested, stay tuned to the OpenMLDB technology column.

4. Run: executes the query statement

After the query statement is compiled, the compilation product is stored in the current RunSession.
RunSession provideRunThe interface supports query statement execution. For a parameterized query, additional information about the argument row is passed in to execute the query compared to a normal query.

session.run(parameter_row, outputs)
Copy the code

Parameters of the lineparamter_rowWill be stored in aRun contextRunContextIn:

RunnerContext ctx(&sql_ctx.cluster_job, parameter_row, is_debug_);
Copy the code

In a parameterized query, the evaluation of an expression may depend on dynamically passed parameters. Therefore, when executing the plan, we need to take the argument rows from the run context and carry them into the expression function to evaluate. Take the TableProject node as an example,
For a normal query, implementing TableProject is to iterate over each row in the table and then do something for each rowRowProjectOperation. In a parameterized query scenario, because expression evaluation may depend on parameters as well as data rows. So, we need to get the argument row from the run row below, and thenproject_gen_.Gen(iter->GetValue(), parameter).

std::shared_ptr<DataHandler> TableProjectRunner::Run( RunnerContext& ctx, const std::vector<std::shared_ptr<DataHandler>>& inputs) { // ... // Get argument rows from the run context (if not, get an empty line pointer auto& parameter = ctx.getparameterRow (); iter->SeekToFirst(); int32_t cnt = 0; while (iter->Valid()) { if (limit_cnt_ > 0 && cnt++ >= limit_cnt_) { break; Output_table ->AddRow(project_gen_.Gen(iter->GetValue(), parameter)); output_table->AddRow(project_gen_. iter->Next(); } return output_table; } const Row ProjectGenerator::Gen(const Row& row, const Row& parameter) { return CoreAPI::RowProject(fn_, row, parameter, false); }Copy the code

CoreAPI::RowProjectFunction data rows and argument rows to evaluate the list of expressions. Its most important job is to call the FN function. The fn function is a function derived from the expression list Codegen at compile time of the query statement. In the sectionCode generation of expressionsAs described in 3.4 Codegen: Expression code generation, we added a parameter line pointer to the parameter list of Codegen functions.

// Computes the list of expressions based on the input data rows and parameter rows and outputs hybridse::codec::Row CoreAPI::RowProject(const RawPtrHandle fn, const hybridse::codec::Row row, const hybridse::codec::Row parameter, Auto udf = reinterpret_cast<int32_t (*)(const int64_t, const int8_t*, const int8_t* /*paramter row*/, const int8_t*, int8_t**)>( const_cast<int8_t*>(fn)); auto row_ptr = reinterpret_cast<const int8_t*>(&row); auto parameter_ptr = reinterpret_cast<const int8_t*>(&parameter); int8_t* buf = nullptr; uint32_t ret = udf(0, row_ptr, nullptr, parameter_ptr, &buf); / / is omitted part of the code return Row (base: : RefCountedSlice: : CreateManaged (buf, hybridse: : codec: : RowView: : GetSize (buf))); }Copy the code

Future jobs

The PreparedStatement is precompiled on the server tablet, where the compiled results are cached. In the next query, the compiled result can be reused as long as the SQL statement and parameter type match successfully. But this means that every time the client executes a query, it needs to transfer the SQL statement and parameter types to the server tablet. When the query statement is long, this overhead is easy to store. Therefore, there is still room for improvement in our design. Consider generating a unique pre-compiled query QID on the server that is passed back to the client and stored in the context of PrepareStatemetn. As long as the type of the query parameter does not change, the client can perform the query with QID and parameter. This reduces the transmission overhead of the query statement.

std::shared_ptr<hybridse::sdk::ResultSet> 
      ExecuteSQLParameterized(const std::string& db, const std::string& qid,
                              std::shared_ptr<SQLRequestRow> parameter,
                              ::hybridse::sdk::Status* status) override;
Copy the code

More developers are welcome to pay attention to and participate in OpenMLDB open source projects.