This is the fourth in a series of TiDB source readings. The last article gave a brief overview of the overall process, and basically whatever statement runs under this framework, DDL statements are no exception.
This article will use the Insert statement as an example to help you understand the previous article, and the next article will cover the process of executing the Select statement. These two are the most commonly used reading and writing sentences. The rest of the sentences are believed to be understood by readers. They can either study by themselves or wait for the following articles. For these two types of statements, only the core flow is explained at present, and more complex Join, Insert-Into- onduplicate-update and so on will be explained in a later article. In addition, this article will focus on the specific execution logic of each statement under the execution framework. Please understand the behavior of Insert statements before reading this article.
Table structure
Here we begin with a table structure on which the following SQL statements operate.
CREATE TABLE t {
id VARCHAR(31),
name VARCHAR(50),
age int.key id_idx (id)};Copy the code
The Insert statement
INSERT INTO t VALUES (“pingcap001”, “pingcap”, 3); Take this statement as an example to explain how Insert works.
Statement processing flow
Parser, Plan, and Executor. Next() is used to drive the actual execution of an SQL statement. For frameworks, each type of statement is similar; Each statement has its own processing logic for each core step.
Syntax parsing
Parser (Parser, Insert) Parser (Parser, Insert) Parser
// InsertStmt is a statement to insert new rows into an existing table.
/ / See https://dev.mysql.com/doc/refman/5.7/en/insert.html
type InsertStmt struct {
dmlNode
IsReplace bool
IgnoreErr bool
Table *TableRefsClause
Columns [](#)*ColumnName
Lists [](#)[](#)ExprNode
Setlist [](#)*Assignment
Priority mysql.PriorityEnum
OnDuplicate [](#)*Assignment
Select ResultSetNode
}
Copy the code
The statement in question is relatively simple, involving only the Table and Lists fields — which Table to insert what data into. Where Lists is a two-dimensional array, and each row in the array corresponds to a row of data. This statement contains only one row of data. Now that you have an AST, all the processing, validation, and permission checking is bypassed (the logic for each statement is the same), so let’s look at the logic for Insert statements.
Query plan
The next step is to convert the AST into a Plan structure, which is done in planBuilder.buildinsert (). There are two main parts to this simple statement:
-
Complete Schema information
Including Database/Table/Column information, this statement does not specify which columns to insert data into, so all columns are used.
-
Process data in Lists
This will go through all the values and convert ast.ExprNode to expression. expression, which is incorporated into our expression framework, which will be evaluated later. In most cases, Value is a Constant, that is, expression.constant.
If the Insert statement is complex, such as the data to be inserted from a Select or OnDuplicateUpdate, more processing is done, but the reader can execute buildInsert() instead of going into detail here.
Now that ast.insertSTmt has been converted to the plan.Insert structure, there is nothing to optimize for Insert statements. The plan.Insert structure only implements the plan interface, so in the following statement, Not going into the Optimize process:
if logic, ok := p.(LogicalPlan); ok {
return doOptimize(builder.optFlag, logic)
}
Copy the code
Other simpler statements will not go into doOptimize, such as Show statements. The next article will cover Select statements, which will involve the doOptimize function.
perform
Once you have the plan.insert structure, the query plan is complete. Finally, let’s look at how Insert is performed.
First, plan.insert is converted to the executor.insertexec structure, which is used for subsequent execution. The first step is to evaluate each row in which data is inserted. For details, see getRows. After getting the data, we enter the most important logic — insertexec.exec (). However, considering only the SQL described in our article, we can simplify the code to the following logic:
for _, row := range rows {
h, err := e.Table.AddRecord(e.ctx, row, false)
}
Copy the code
Let’s look at how the AddRecord function writes a row of data to the storage engine. To understand this code, you need to understand how TiDB maps SQL data to key-values. You can read some of our previous articles, such as this one. Assuming you already know the background, you’ll need to write the key-value of the Row and Index to the storage engine.
The code to construct Index data is in the addIndices() function, which calls the index.create () method:
Create Index Key;func (c *index) GenIndexKey(sc *stmtctx.StatementContext, indexedValues [](#)types.Datum.h int64.buf [](#)byte) (key [](#)byte.distinct bool.err error){... key = c.getIndexKeyBuf(buf,len(c.prefix)+len(indexedValues)*9+9)
key = append(key, [](#)byte(c.prefix)...)
key, err = codec.EncodeKey(sc, key, indexedValues...)
if! distinct && err ==nil {
key, err = codec.EncodeKey(sc, key, types.NewDatum(h))
}
Copy the code
Construct Index Value:func (c *index) Create(ctx context.Context, rm kv.RetrieverMutator, indexedValues [](#)types.Datum.h int64) (int64, error) {
if! distinct {// non-unique index doesn't need store value, write a '0' to reduce space
err = rm.Set(key, [](#)byte'0')
return 0, errors.Trace(err)
}
......
if skipCheck {
err = rm.Set(key, encodeHandle(h))
return 0, errors.Trace(err)
}
Copy the code
The code for constructing Row data is relatively simple, in the tables.AddRecord function:
Create Row Key: Key: = t. ordkey (recordID)Copy the code
To construct the Row Value: writeBufs.RowValBuf, err = tablecodec.EncodeRow(ctx.GetSessionVars().StmtCtx, row, colIDs, writeBufs.RowValBuf, writeBufs.AddRowValues)Copy the code
Once constructed, the key-value is written to the current transaction cache by calling something like this:
iferr = txn.Set(key, value); err ! =nil {
return 0, errors.Trace(err)
}
Copy the code
These key-values can be committed to the storage engine during the transaction.
summary
Insert statements are the simplest of the many DML statements, and this article does not cover the more complex cases of Insert statements, so they are relatively easy to understand. With all this code, let’s walk through the process again with a picture.
This article describes how to write data, but how does TiDB delete data? That is, Delete statement is what the execution process is like, please track the source code, investigate the process, interested readers can imitate this article to write a source code analysis document, contribute to us.
The next article will cover the process of executing Select statements, not only in the SQL layer, but also in how the Coprocessor module works.
Author: Shen Li