TiDB source read series of articles (four) Insert statement overview

This is the fourth in a series of TiDB source readings. The last article gave a brief overview of the overall process, and basically whatever statement runs under this framework, DDL statements are no exception.

This article will use the Insert statement as an example to help you understand the previous article, and the next article will cover the process of executing the Select statement. These two are the most commonly used reading and writing sentences. The rest of the sentences are believed to be understood by readers. They can either study by themselves or wait for the following articles. For these two types of statements, only the core flow is explained at present, and more complex Join, Insert-Into- onduplicate-update and so on will be explained in a later article. In addition, this article will focus on the specific execution logic of each statement under the execution framework. Please understand the behavior of Insert statements before reading this article.

Table structure

Here we begin with a table structure on which the following SQL statements operate.


CREATE TABLE t {
id      VARCHAR(31),
name VARCHAR(50),
age    int.key id_idx (id)};Copy the code

The Insert statement

INSERT INTO t VALUES (“pingcap001”, “pingcap”, 3); Take this statement as an example to explain how Insert works.

Statement processing flow

Parser, Plan, and Executor. Next() is used to drive the actual execution of an SQL statement. For frameworks, each type of statement is similar; Each statement has its own processing logic for each core step.

Syntax parsing

Parser (Parser, Insert) Parser (Parser, Insert) Parser

// InsertStmt is a statement to insert new rows into an existing table.
/ / See https://dev.mysql.com/doc/refman/5.7/en/insert.html
type InsertStmt struct {
    dmlNode

    IsReplace   bool
    IgnoreErr   bool
    Table       *TableRefsClause
    Columns     [](#)*ColumnName
    Lists       [](#)[](#)ExprNode
    Setlist     [](#)*Assignment
    Priority    mysql.PriorityEnum
    OnDuplicate [](#)*Assignment
    Select      ResultSetNode
}

Copy the code

The statement in question is relatively simple, involving only the Table and Lists fields — which Table to insert what data into. Where Lists is a two-dimensional array, and each row in the array corresponds to a row of data. This statement contains only one row of data. Now that you have an AST, all the processing, validation, and permission checking is bypassed (the logic for each statement is the same), so let’s look at the logic for Insert statements.

Query plan

The next step is to convert the AST into a Plan structure, which is done in planBuilder.buildinsert (). There are two main parts to this simple statement:

Complete Schema information

Including Database/Table/Column information, this statement does not specify which columns to insert data into, so all columns are used.
Process data in Lists

This will go through all the values and convert ast.ExprNode to expression. expression, which is incorporated into our expression framework, which will be evaluated later. In most cases, Value is a Constant, that is, expression.constant.

If the Insert statement is complex, such as the data to be inserted from a Select or OnDuplicateUpdate, more processing is done, but the reader can execute buildInsert() instead of going into detail here.

Now that ast.insertSTmt has been converted to the plan.Insert structure, there is nothing to optimize for Insert statements. The plan.Insert structure only implements the plan interface, so in the following statement, Not going into the Optimize process:

    if logic, ok := p.(LogicalPlan); ok {
        return doOptimize(builder.optFlag, logic)
    }
Copy the code

Other simpler statements will not go into doOptimize, such as Show statements. The next article will cover Select statements, which will involve the doOptimize function.

perform

Once you have the plan.insert structure, the query plan is complete. Finally, let’s look at how Insert is performed.

First, plan.insert is converted to the executor.insertexec structure, which is used for subsequent execution. The first step is to evaluate each row in which data is inserted. For details, see getRows. After getting the data, we enter the most important logic — insertexec.exec (). However, considering only the SQL described in our article, we can simplify the code to the following logic:

    for _, row := range rows {
            h, err := e.Table.AddRecord(e.ctx, row, false)
	}
Copy the code

Let’s look at how the AddRecord function writes a row of data to the storage engine. To understand this code, you need to understand how TiDB maps SQL data to key-values. You can read some of our previous articles, such as this one. Assuming you already know the background, you’ll need to write the key-value of the Row and Index to the storage engine.

The code to construct Index data is in the addIndices() function, which calls the index.create () method:

Create Index Key;func (c *index) GenIndexKey(sc *stmtctx.StatementContext, indexedValues [](#)types.Datum.h int64.buf [](#)byte) (key [](#)byte.distinct bool.err error){... key = c.getIndexKeyBuf(buf,len(c.prefix)+len(indexedValues)*9+9)
    key = append(key, [](#)byte(c.prefix)...)
    key, err = codec.EncodeKey(sc, key, indexedValues...)
    if! distinct && err ==nil {
        key, err = codec.EncodeKey(sc, key, types.NewDatum(h))
    }
Copy the code

Construct Index Value:func (c *index) Create(ctx context.Context, rm kv.RetrieverMutator, indexedValues [](#)types.Datum.h int64) (int64, error) {
    if! distinct {// non-unique index doesn't need store value, write a '0' to reduce space
        err = rm.Set(key, [](#)byte'0')
        return 0, errors.Trace(err)
    }
......
    if skipCheck {
        err = rm.Set(key, encodeHandle(h))
        return 0, errors.Trace(err)
    }
Copy the code

The code for constructing Row data is relatively simple, in the tables.AddRecord function:

Create Row Key: Key: = t. ordkey (recordID)Copy the code

To construct the Row Value: writeBufs.RowValBuf, err = tablecodec.EncodeRow(ctx.GetSessionVars().StmtCtx, row, colIDs, writeBufs.RowValBuf, writeBufs.AddRowValues)Copy the code

Once constructed, the key-value is written to the current transaction cache by calling something like this:

    iferr = txn.Set(key, value); err ! =nil {
        return 0, errors.Trace(err)
    }
Copy the code

These key-values can be committed to the storage engine during the transaction.

summary

Insert statements are the simplest of the many DML statements, and this article does not cover the more complex cases of Insert statements, so they are relatively easy to understand. With all this code, let’s walk through the process again with a picture.

This article describes how to write data, but how does TiDB delete data? That is, Delete statement is what the execution process is like, please track the source code, investigate the process, interested readers can imitate this article to write a source code analysis document, contribute to us.

The next article will cover the process of executing Select statements, not only in the SQL layer, but also in how the Coprocessor module works.

Author: Shen Li

TiDB source read series of articles (four) Insert statement overview

Table structure

The Insert statement

Statement processing flow

Syntax parsing

Query plan

perform

summary

Related Posts

Description of the unified monitoring function for multi-cloud VMS

Baby, interview, technology advanced, fund: 30, 2020 | Denver annual essay

High performance website construction – Front-end performance optimization (Attached is the detailed plan of Vue first screen loading time optimization)