Last time we talked about how Validator converts an abstract syntax tree (AST) generated by Parser into an execution plan. This time, we’ll look at how an execution plan is generated.
An overview of the
The Planner is an Execution Plan generator that generates an unoptimized Execution Plan for an Executor to execute from a semantically valid query syntax tree verified by the Validator. The execution plan is then passed to the Optimizer to generate an optimized execution plan, which is ultimately handed to Executor. An execution plan consists of a series of planNodes.
Source directory structure
Heavy/heavy/heavy/heavy/heavy/heavy/heavy/heavy/heavy/Heavy/Heavy/Heavy/Heavy/Heavy/Heavy/Heavy PlannersRegister. H ├ ─ ─ SequentialPlanner. CPP ├ ─ ─ SequentialPlanner. H └ ─ ─ the testCopy the code
Among them, the SubPlan data structure and several interfaces to Planner are defined in Planner.
struct SubPlan {
// root and tail of a subplan.
PlanNode* root{nullptr};
PlanNode* tail{nullptr};
};
Copy the code
The PlannerRegister is responsible for registering available planners, and Nebula Graph currently registers SequentialPlanner, PathPlanner, LookupPlanner, GoPlanner, MatchPlanner.
The statement corresponding to SequentialPlanner is SequentialSentences, and SequentialSentence is a composite statement consisting of multiple sentences and spaced semicolons. Each statement could be a GO/LOOKUP/MATCH statement, so SequentialPlanner generates multiple plans by calling several other statements’ planners and connecting them end to end with Validator::appendPlan.
The match directory defines the connection strategy between Planner and SubPlan for Opencypher-related statements and clauses such as match, UNWIND, WITH, RETURN, WHERE, ORDER BY, SKIP, and LIMIT. SegmentsConnector connects subplans end to end into a complete plan using connection policies (AddInput, addDependency, innerJoinSegments, etc.) based on the relationship between them.
SRC/planner/match ├ ─ ─ AddDependencyStrategy. CPP ├ ─ ─ AddDependencyStrategy. H ├ ─ ─ AddInputStrategy. CPP ├ ─ ─ AddInputStrategy. H ├ ─ ─ CartesianProductStrategy. CPP ├ ─ ─ CartesianProductStrategy. H ├ ─ ─ CypherClausePlanner. H ├ ─ ─ EdgeIndexSeek. H ├ ─ ─ Expand. CPP ├ ─ ─ Expand. H ├ ─ ─ InnerJoinStrategy. CPP ├ ─ ─ InnerJoinStrategy. H ├ ─ ─ LabelIndexSeek. CPP ├ ─ ─ Labelindexseek.h ├─ LeftouterJoinStrategy.h ├─ MatchClausePlanner. CPP ├─ MatchClausePlanner. H ├─ MatchPlanner ├── MatchPlanner. H ├─ MatchPlanner. H ├─ MatchPlanner. H ├─ MatchPlanner ├─ Index Index.cpp ├─ Index Index.cpp ├─ Index Index.cpP ├─ Index Index.cpP ├─ Index index.cpP ├─ Index index.cpP ├─ Index index.cpP ├─ ReturnClausePlanner. H ├ ─ ─ SegmentsConnector. CPP ├ ─ ─ SegmentsConnector. H ├ ─ ─ SegmentsConnectStrategy. H ├ ─ ─ ├── UnionStrategy. H ├─ unwind planner. H ├─ unwind planner .cpP Heavy Exercises ── Heavy Exercises for WhereClausePlanner.CPP Heavy Exercises ── Heavy Exercises for WhereClausePlanner ├── ├─ class-scienceplannerCopy the code
The NGQL directory defines NGQL statement related planners (e.g., GO, LOOKUP, FIND PATH)
SRC /planner/ NGQL ├─ GoPlanner. CPP ├─ GoPlanner. H ├─ LookupPlanner PathPlanner.hCopy the code
The Plan directory defines seven categories and more than 100 plan nodes.
SRC/Planner /plan ├─ Admin. CPP ├─ Admin. H ├─ Algo Log.cpp Exercises ── Log.h Exercises ── ─ Heavy metal exercises Query.cpp ├─ query.h ├─ sc.hCopy the code
Description of some nodes:
- Admin is a database management node
- Algo is the node related to algorithms such as path and subgraph
- Logic is a logical control node, such as loop, binary selection, etc
- Maintain is schema related node
- Mutate is a DML-related node
- Query is the node associated with the Query computation
- Scan is an index that scans related nodes
Each PlanNode generates an Executor in the Executor phase, each Executor responsible for a specific function.
Eg. GetNeighbors:
static GetNeighbors* make(QueryContext* qctx,
PlanNode* input,
GraphSpaceID space,
Expression* src,
std::vector<EdgeType> edgeTypes,
Direction edgeDirection,
std::unique_ptr<std::vector<VertexProp>>&& vertexProps,
std::unique_ptr<std::vector<EdgeProp>>&& edgeProps,
std::unique_ptr<std::vector<StatProp>>&& statProps,
std::unique_ptr<std::vector<Expr>>&& exprs,
bool dedup = false.bool random = false,
std::vector<storage::cpp2::OrderBy> orderBy = {},
int64_t limit = - 1,
std::string filter = "")
Copy the code
GetNeighbors is a semantic encapsulation of kv for storage layer edges: it finds the end of an edge based on the starting point of an edge of a given type. GetNeighbors gets edge properties (edgeProps) during edge finding. Because the outgoing edge is stored on the same partition (data slice) as the starting point, we can also easily obtain the properties of the edge starting point (vertexProps).
Aggregate nodes:
static Aggregate* make(QueryContext* qctx, PlanNode* input, std::vector
&& groupKeys = {}, std::vector
&& groupItems = {})
*>
*>
Copy the code
The Aggregate node is an Aggregate compute node that groups groups based on groupKeys and calculates Aggregate values based on groupItems as intra-group values.
Loop nodes:
static Loop* make(QueryContext* qctx,
PlanNode* input,
PlanNode* body = nullptr,
Expression* condition = nullptr);
Copy the code
Loop is a loop node that executes the PlanNode fragment between body and the nearest start node until condition is false.
InnerJoin nodes:
static InnerJoin* make(QueryContext* qctx,
PlanNode* input,
std::pair<std::string, int64_t> leftVar,
std::pair<std::string, int64_t> rightVar,
std::vector<Expression*> hashKeys = {},
std::vector<Expression*> probeKeys = {})
Copy the code
The InnerJoin node inlines two tables (Table, DataSet). LeftVar and rightVar are used to reference the two tables, respectively.
The entry function
The Planner entry function is the Validator::toPlan
Status Validator::toPlan(a) {
auto* astCtx = getAstContext(a);if(astCtx ! =nullptr) {
astCtx->space = space_;
}
auto subPlanStatus = Planner::toPlan(astCtx);
NG_RETURN_IF_ERROR(subPlanStatus);
auto subPlan = std::move(subPlanStatus).value(a); root_ = subPlan.root; tail_ = subPlan.tail;VLOG(1) < <"root: " << root_->kind() < <" tail: " << tail_->kind(a);return Status::OK(a); }Copy the code
Specific steps
1. Call getAstContext ()
The first call to getAstContext() gets the AST context that has been overwritten by the Validator. These context-related data structures are defined in SRC /context.
├── ├.h └── ├.h └─ ├.h └── ├.hCopy the code
struct AstContext {
QueryContext* qctx; // Context for each query request
Sentence* sentence; // Ast of the query statement
SpaceInfo space; / / the current space
};
Copy the code
CypherAstContext defines the AST context for openCypher syntax, and QueryAstContext defines the AST context for nGQL syntax.
2. Call Planner: : toPlan (astCtx)
Then call Planner::toPlan(astCtx), find the Planner whose statement is registered in the PlannerMap based on the AST context, and generate the corresponding execution plan.
Each Plan consists of a series of PlanNodes, among which there are two major relationships: execution dependence and data dependence.
-
Execution dependencies: From the execution sequence, plan is a directed acyclic graph, and the dependencies between nodes are determined when the plan is generated. For each node in execution phase, actuators will generate a corresponding operator, and starting from the root node scheduling, found this node depend on other nodes, at this time first node recursive calls to rely on, have been found no dependent nodes (the Start node), and then began to perform, execute this node, continue to implement this node is dependent on the other nodes, All the way to the root node.
-
Data dependencies: Data dependencies of nodes are generally the same as execution dependencies, that is, outputs from the previous node that scheduled execution. Some nodes, such as InnerJoin, have multiple inputs, so its input might be the output of a node several nodes apart from it.
(Solid lines for execution dependencies, dashed lines for data dependencies)
For example
Using MatchPlanner as an example, let’s look at how an execution plan is generated:
Statement:
MATCH (v:player)-[:like*2..4]-(v2:player)\
WITH v, v2.age AS age ORDER BY age WHERE age > 18\
RETURN id(v), age
Copy the code
This statement is overwritten by the MatchValidator and outputs a tree consisting of a context.
= >
Each Clause and SubClause corresponds to a context:
enum class CypherClauseKind : uint8_t {
kMatch,
kUnwind,
kWith,
kWhere,
kReturn,
kOrderBy,
kPagination,
kYield,
};
struct CypherClauseContextBase : AstContext {
explicit CypherClauseContextBase(CypherClauseKind k) : kind(k) {}
virtual ~CypherClauseContextBase() = default;
const CypherClauseKind kind;
};
struct MatchClauseContext final : CypherClauseContextBase {
MatchClauseContext() : CypherClauseContextBase(CypherClauseKind::kMatch) {}
std::vector<NodeInfo> nodeInfos; // Vertex information involved in pattern
std::vector<EdgeInfo> edgeInfos; // Edge information involved in pattern
PathBuildExpression* pathBuild{nullptr}; // Construct an expression for path
std::unique_ptr<WhereClauseContext> where; // filter SubClause
std::unordered_map<std::string, AliasType>* aliasesUsed{nullptr}; // Enter the alias information
std::unordered_map<std::string, AliasType> aliasesGenerated; // Alias information generated}; .Copy the code
And then:
1. Find the statement planner
Find the corresponding statement planner, which is of type Match. Find the planner MatchPlanner for this statement in PlannersMap.
2. Generate the plan
Call MatchPlanner:: Transform method to generate plan:
StatusOr<SubPlan> MatchPlanner::transform(AstContext* astCtx) {
if (astCtx->sentence->kind() != Sentence::Kind::kMatch) {
return Status::Error("Only MATCH is accepted for match planner.");
}
auto* matchCtx = static_cast<MatchAstContext*>(astCtx);
std::vector<SubPlan> subplans;
for (auto& clauseCtx : matchCtx->clauses) {
switch (clauseCtx->kind) {
case CypherClauseKind::kMatch: {
auto subplan = std::make_unique<MatchClausePlanner>()->transform(clauseCtx.get());
NG_RETURN_IF_ERROR(subplan);
subplans.emplace_back(std::move(subplan).value());
break;
}
case CypherClauseKind::kUnwind: {
auto subplan = std::make_unique<UnwindClausePlanner>()->transform(clauseCtx.get());
NG_RETURN_IF_ERROR(subplan);
auto& unwind = subplan.value().root;
std::vector<std::string> inputCols;
if(! subplans.empty()) {
auto input = subplans.back().root;
auto cols = input->colNames(a);for (auto col : cols) {
inputCols.emplace_back(col);
}
}
inputCols.emplace_back(unwind->colNames().front());
unwind->setColNames(inputCols);
subplans.emplace_back(std::move(subplan).value());
break;
}
case CypherClauseKind::kWith: {
auto subplan = std::make_unique<WithClausePlanner>()->transform(clauseCtx.get());
NG_RETURN_IF_ERROR(subplan);
subplans.emplace_back(std::move(subplan).value());
break;
}
case CypherClauseKind::kReturn: {
auto subplan = std::make_unique<ReturnClausePlanner>()->transform(clauseCtx.get());
NG_RETURN_IF_ERROR(subplan);
subplans.emplace_back(std::move(subplan).value());
break;
}
default: { return Status::Error("Unsupported clause."); }}}auto finalPlan = connectSegments(astCtx, subplans, matchCtx->clauses);
NG_RETURN_IF_ERROR(finalPlan);
return std::move(finalPlan).value(a); }Copy the code
Match statements may consist of multiple match /UNWIND/WITH/RETURN clauses, so in the transform, the corresponding ClausePlanner is directly called to generate a SubPlan according to the type of Clause. Finally, SegmentsConnector connects them according to various connection policies.
In our example statement,
MatchClause: Match (v:player)-[:like*2..4]-(v2:player) MatchClause::transform
StatusOr<SubPlan> MatchClausePlanner::transform(CypherClauseContextBase* clauseCtx) {
if(clauseCtx->kind ! = CypherClauseKind::kMatch) {return Status::Error("Not a valid context for MatchClausePlanner.");
}
auto* matchClauseCtx = static_cast<MatchClauseContext*>(clauseCtx);
auto& nodeInfos = matchClauseCtx->nodeInfos;
auto& edgeInfos = matchClauseCtx->edgeInfos;
SubPlan matchClausePlan;
size_t startIndex = 0;
bool startFromEdge = false;
NG_RETURN_IF_ERROR(findStarts(matchClauseCtx, startFromEdge, startIndex, matchClausePlan));
NG_RETURN_IF_ERROR(
expand(nodeInfos, edgeInfos, matchClauseCtx, startFromEdge, startIndex, matchClausePlan));
NG_RETURN_IF_ERROR(projectColumnsBySymbols(matchClauseCtx, startIndex, matchClausePlan));
NG_RETURN_IF_ERROR(appendFilterPlan(matchClauseCtx, matchClausePlan));
return matchClausePlan;
}
Copy the code
The transform method is divided into the following steps:
- Find a starting point for expansion:
There are currently three strategies for finding a starting point, registered with startVidFinders by Planner:
// MATCH(n) WHERE id(n) = value RETURN n
startVidFinders.emplace_back(&VertexIdSeek::make);
// MATCH(n:Tag{prop:value}) RETURN n
// MATCH(n:Tag) WHERE n.prop = value RETURN n
startVidFinders.emplace_back(&PropIndexSeek::make);
// seek by tag or edge(index)
// MATCH(n: tag) RETURN n
// MATCH(s)-[:edge]->(e) RETURN e
startVidFinders.emplace_back(&LabelIndexSeek::make);
Copy the code
Of the three strategies, VertexIdSeek is the best, which can determine the specific starting point VID; PropIndexSeek follows, which is converted to an IndexScan with the filter attribute; LabelIndexSeek will be converted to an IndexScan.
The findStarts function iterates through the information of all Nodes in the match pattern for each starting point search policy until a node that can serve as the starting point is found and generates corresponding Plan Nodes for finding the starting point.
The point-finding strategy for the sample statement is LabelIndexScan, where the starting point is determined to be V. Finally, an IndexScan node is generated with the index on the Player tag.
- According to the starting point and match pattern, carry out multi-step expansion:
In the example, the match pattern of the sentence is: (v:player)-[:like*1..2]-(v2:player), starting with v, extending one or two steps along the edge like, and ending with a Player type tag.
Do the extension first:
Status Expand::doExpand(const NodeInfo& node, const EdgeInfo& edge, SubPlan* plan) {
NG_RETURN_IF_ERROR(expandSteps(node, edge, plan));
NG_RETURN_IF_ERROR(filterDatasetByPathLength(edge, plan->root, plan));
return Status::OK(a); }Copy the code
The Loop body is expandStep, which extends one step based on the given starting point, and GetNeighbors is generated for each step. The end of each expansion step is the starting point of the next expansion step, and the cycle continues until the maximum number of steps specified in the pattern is reached.
In the expansion of the m-th step, the end of the path with the length of M-1 obtained previously is taken as the starting point of the expansion, and a path with the step length of 1 consisting of the starting point of the edge and the edge itself is constructed according to the expansion results. We then perform an InnerJoin with the previous path of step m-1 to get a set of paths with step M.
Then call to filter the group of paths, remove the path with repeated edges (openCypher path extension does not allow repeated edges), and finally use the end output of the path as the starting point of the next expansion. The next expansion continues with the above steps until the maximum number of steps specified in the Max is reached.
After loop, the UnionAllVersionVar node will be generated to combine the path of 1 to M steps constructed by loop body each time. FilterDatasetByPathLength () function generates a Filter node Filter out the step length is less than the match the pattern in the specified minimum number of path.
(v)-like-()-e-(v)-? Also missing is attribute information for the end point of the last step. Therefore, we also need to generate a GetVertices node, and then perform an InnerJoin with the previous M-step PATH. The result is a set of paths matching the match pattern.
The principle of multi-step match expansion will be explained in more detail in the article Variable Length Pattern Match.
// Build Start node from first step
SubPlan loopBodyPlan;
PlanNode* startNode = StartNode::make(matchCtx_->qctx);
startNode->setOutputVar(firstStep->outputVar());
startNode->setColNames(firstStep->colNames());
loopBodyPlan.tail = startNode;
loopBodyPlan.root = startNode;
// Construct loop body
NG_RETURN_IF_ERROR(expandStep(edge,
startNode, // dep
startNode->outputVar(), // inputVar
nullptr,
&loopBodyPlan));
NG_RETURN_IF_ERROR(collectData(startNode, // left join node
loopBodyPlan.root, // right join node
&firstStep, // passThrough
&subplan));
// Union node
auto body = subplan.root;
// Loop condition
auto condition = buildExpandCondition(body->outputVar(), startIndex, maxHop);
// Create loop
auto* loop = Loop::make(matchCtx_->qctx, firstStep, body, condition);
// Unionize the results of each expansion which are stored in the firstStep node
auto uResNode = UnionAllVersionVar::make(matchCtx_->qctx, loop);
uResNode->setInputVar(firstStep->outputVar());
uResNode->setColNames({kPathStr});
subplan.root = uResNode;
plan->root = subplan.root;
Copy the code
- Print table, confirm table column name:
Use all named symbols in the match pattern as table column names to generate a table for use in subsequent clauses. This generates a Project node.
WithClause::transform creates SubPlan:
WITH v, v2.age AS age ORDER BY age WHERE age > 18
Copy the code
The WITH clause yields v and v2.age as a table, sorts age as a sort item, and filters the sorted table.
The YIELD section generates a Project node, the ORDER BY section generates a Sort node, and the WHERE section generates a Filter node.
The third clause is the Return clause, which generates a Project node.
RETURN id(v), age
Copy the code
The complete execution plan of the final integration statement is shown below:
The above is the introduction of this article.
Ac graph database technology? To join the Nebula Exchange group, please fill out your Nebula card and Nebula Assistant will bring you into the group