
Introduction:

The previous post in this series, Text2SQL Learning Notes (3): SQLNet and TypeSQL, briefly introduced the baseline models proposed after the WikiSQL dataset was released. At that time, pre-trained language models such as BERT were not yet widely used across NLP tasks, so the authors mostly built their models with Bi-LSTM as the basic component.

This post introduces two approaches that tackle the WikiSQL challenge with the pre-trained language model BERT: SQLova and X-SQL. Among them, SQLova's work was published at NIPS 2019, which shows that more and more people are paying attention to this task.

A Comprehensive Exploration on WikiSQL with Table-Aware Word Contextualization

Key innovations

SQLova's work is innovative in two ways:

  1. It uses a pre-trained language model to reach human-level performance;
  2. It reports an estimate of human performance on the WikiSQL dataset.

The proposed framework

The whole framework consists of two parts:

  • A table-aware encoding layer, used to obtain table- and context-aware representations of the question words;
  • An NL2SQL layer, used to generate the SQL query from the encoded representations.

Input module

The figure above shows a question-SQL example from the WikiSQL dataset. A table's schema (the table name, column names, and so on in the figure) is structured information, whereas a pre-trained model like BERT requires all inputs to be converted into a sequence. SQLova performs the conversion as follows:

As you can see, the sequence begins with the query (the user's natural language question), followed by the names of the individual columns of the database table. Because WikiSQL uses single-table databases, there is no need to encode table names. During serialization, SQLova uses BERT's special tokens [CLS] and [SEP] as separators.
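As a concrete illustration, here is a minimal serialization sketch using the HuggingFace BertTokenizer; the helper name `serialize` and the exact placement of the special tokens are my own simplification of the layout described above, not SQLova's official code.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def serialize(question, headers):
    """Flatten a question plus the table headers into one BERT input sequence
    (a sketch of the serialization described above, not SQLova's exact code)."""
    tokens = ["[CLS]"] + tokenizer.tokenize(question) + ["[SEP]"]
    for header in headers:
        tokens += tokenizer.tokenize(header) + ["[SEP]"]
    return tokens, tokenizer.convert_tokens_to_ids(tokens)

tokens, ids = serialize(
    "Who is the player with number 42?",
    ["Player", "No.", "Nationality", "Position"],
)
print(tokens)
```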

As shown in the figure above, the table-aware encoding layer encodes the natural language question together with all the headers of the table, so that the question and the information in the table can interact with each other.

NL2SQL module

Similar to the earlier SQLNet, SQLova adopts the idea of slot filling when generating SQL statements, again using six sub-modules (select-column, select-aggregation, where-number, where-column, where-operator, and where-value) to generate the corresponding parts of the SQL statement.

The six sub-modules of the NL2SQL layer do not share parameters. Instead of using a pointer network to infer where-value, a module is trained to predict the start and end positions of the value span in the question. where-value depends not only on where-column but also on where-operator (for example, a text column cannot take the operators > or <). When combining the question and header representations, concatenation is used instead of summation.
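A minimal PyTorch sketch of such a start/end predictor is given below. The module name `WhereValueSpan`, the dimensions, and the way the column and operator vectors are fed in are simplifications assumed for illustration, not the paper's exact architecture; it mainly shows the start/end scoring and the concatenation of question, column, and operator information.

```python
import torch
import torch.nn as nn

class WhereValueSpan(nn.Module):
    """Hypothetical, simplified where-value head: scores every question token
    as a possible start or end of the value span, conditioned on the chosen
    where-column and where-operator (note the concatenation, not summation)."""
    def __init__(self, hidden=768, op_count=4, op_dim=32):
        super().__init__()
        self.op_emb = nn.Embedding(op_count, op_dim)
        in_dim = hidden + hidden + op_dim   # question token + column + operator
        self.start_scorer = nn.Linear(in_dim, 1)
        self.end_scorer = nn.Linear(in_dim, 1)

    def forward(self, question_enc, col_vec, op_id):
        # question_enc: (batch, q_len, hidden) contextualized question tokens
        # col_vec:      (batch, hidden)        representation of the where-column
        # op_id:        (batch,)               index of the where-operator
        q_len = question_enc.size(1)
        cond = torch.cat([col_vec, self.op_emb(op_id)], dim=-1)  # condition vector
        cond = cond.unsqueeze(1).expand(-1, q_len, -1)           # broadcast over tokens
        feats = torch.cat([question_enc, cond], dim=-1)          # concat, not sum
        return (self.start_scorer(feats).squeeze(-1),            # (batch, q_len) start logits
                self.end_scorer(feats).squeeze(-1))              # (batch, q_len) end logits
```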

Execution-guided decoding

SQLova uses the execution-guided decoding technique to reduce the number of unexecutable queries.

Execution-guided decoding checks, while producing and returning results, whether the generated SQL sequence is a grammatically correct SQL statement, so that the statement the model finally outputs can be executed without syntax errors. Concretely, the SQL queries in the candidate list are fed to the executor in order, and those that fail to execute or return an empty result are discarded. For details, see the paper Robust Text-to-SQL Generation with Execution-Guided Decoding.
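The idea can be sketched with SQLite as the executor: try each candidate query in order and keep the first one that runs and returns rows. The function name and the assumption that candidates arrive sorted by model score are hypothetical.

```python
import sqlite3

def execution_guided_pick(candidate_sqls, db_path):
    """Return the first candidate query that executes without error and
    yields a non-empty result (a sketch, not SQLova's exact implementation).
    candidate_sqls is assumed to be ordered by model score."""
    conn = sqlite3.connect(db_path)
    try:
        for sql in candidate_sqls:
            try:
                rows = conn.execute(sql).fetchall()
            except sqlite3.Error:
                continue          # invalid SQL: discard this candidate
            if rows:              # discard candidates with empty results as well
                return sql, rows
        return None, None
    finally:
        conn.close()
```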

The experimental results

Human performance is also presented in the paper. According to the results, SQLova has exceeded human performance.

Ablation Study

The paper also presents thorough ablation experiments to verify the effectiveness of each module. They show that introducing the pre-trained BERT model greatly improves the results; in other words, word contextualization contributes substantially to the logical-form accuracy.

X-SQL: reinforce schema representation with context

Key innovations

X-SQL also leverages a pre-trained language model. It uses the contextual output of a BERT-style pre-trained model to enhance the structured schema representation, and learns a new schema representation together with type information in the downstream task. In addition, MT-DNN (Multi-Task Deep Neural Networks for Natural Language Understanding; see fyubang.com/2019/05/23/…) is used as the pre-trained model instead of BERT.

The proposed framework

The overall architecture of the model is shown in the figure above.

The model consists of three parts:

  1. Sequence encoder;
  2. Context-enhanced schema encoder;
  3. Output layer.

Sequence encoder

The sequence encoder is similar to BERT, but the main changes are as follows:

  1. A special empty column [EMPTY] is appended to the schema of every table;
  2. Segment embeddings are replaced with type embeddings, covering four types: question, categorical column, numerical column, and the special empty column (see the sketch after this list);
  3. MT-DNN is used instead of BERT-Large for initialization.
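A minimal sketch of such a type embedding, assuming four type ids assigned during serialization as described in item 2 (the class and constant names are mine, not from the paper):

```python
import torch
import torch.nn as nn

# Hypothetical type ids, mirroring the description above:
# 0 = question token, 1 = categorical column, 2 = numerical column, 3 = [EMPTY] column
QUESTION, CAT_COL, NUM_COL, EMPTY_COL = 0, 1, 2, 3

class TypeEmbedding(nn.Module):
    """Replaces BERT's segment embedding with a type embedding (a sketch)."""
    def __init__(self, hidden=768, n_types=4):
        super().__init__()
        self.type_emb = nn.Embedding(n_types, hidden)

    def forward(self, token_embeddings, type_ids):
        # token_embeddings: (batch, seq_len, hidden)
        # type_ids:         (batch, seq_len) assigned during serialization as above
        return token_embeddings + self.type_emb(type_ids)
```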

Context-enhanced schema encoder

This module uses attention to obtain a representation h_Ci for each column from the encodings of that column's tokens: the output vectors of the tokens making up each column name are aggregated and mixed with the information carried by the [CTX] token.
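A possible PyTorch sketch of this attention pooling is shown below; it uses the [CTX] encoding as the query over a single column's token encodings and then mixes the pooled vector with [CTX]. The projections and the final mixing layer are assumptions for illustration, not X-SQL's exact equations.

```python
import math
import torch
import torch.nn as nn

class ContextEnhancedColumn(nn.Module):
    """Sketch: attention-pool the tokens of one column name, using the [CTX]
    vector as the query, then mix the pooled h_Ci with the [CTX] information."""
    def __init__(self, hidden=768):
        super().__init__()
        self.q_proj = nn.Linear(hidden, hidden)
        self.k_proj = nn.Linear(hidden, hidden)
        self.mix = nn.Linear(2 * hidden, hidden)
        self.norm = nn.LayerNorm(hidden)

    def forward(self, h_ctx, col_token_enc):
        # h_ctx:         (batch, hidden)            encoding of the [CTX] token
        # col_token_enc: (batch, n_tokens, hidden)  encodings of this column's tokens
        q = self.q_proj(h_ctx).unsqueeze(1)                           # (batch, 1, hidden)
        k = self.k_proj(col_token_enc)                                # (batch, n_tokens, hidden)
        scores = (q * k).sum(-1) / math.sqrt(k.size(-1))              # (batch, n_tokens)
        alpha = torch.softmax(scores, dim=-1).unsqueeze(-1)
        h_ci = (alpha * col_token_enc).sum(1)                         # attention-pooled h_Ci
        return self.norm(self.mix(torch.cat([h_ci, h_ctx], dim=-1)))  # mix with [CTX]
```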

Output layer

As before, the final output is decomposed into six sub-tasks, each with a simpler structure; every sub-task combines the schema representation h_Ci and the context representation h_CTX through LayerNorm.
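For illustration, a sketch of one such sub-task head (scoring columns for the SELECT clause) might look as follows; the way h_CTX is projected and added before LayerNorm is my own simplification rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

class SelectColumnHead(nn.Module):
    """Sketch of one of the six sub-task heads: scores every column for the
    SELECT clause from its enhanced schema representation and h_CTX."""
    def __init__(self, hidden=768):
        super().__init__()
        self.ctx_proj = nn.Linear(hidden, hidden)
        self.norm = nn.LayerNorm(hidden)
        self.scorer = nn.Linear(hidden, 1)

    def forward(self, r_cols, h_ctx):
        # r_cols: (batch, n_cols, hidden) enhanced column representations
        # h_ctx:  (batch, hidden)         [CTX] representation
        mixed = self.norm(r_cols + self.ctx_proj(h_ctx).unsqueeze(1))
        return self.scorer(mixed).squeeze(-1)   # (batch, n_cols) column logits
```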

Results

With these improvements, X-SQL pushed performance to over 90%, surpassing SQLova.

Conclusion

This post introduced two methods that use pre-trained language models (BERT and MT-DNN) to model the contextual relationship between the schema and the question, another sign of how rapidly today's NLP technology is developing.