Discrete Prompts
First, discrete Prompt Mining. This paper, published in TACL 2020, treats a pre-trained language model as a "knowledge base" and uses dependency trees and paraphrasing to mine better "templates". The experimental results are shown below:
As you can see, several of the mined relation phrases (the predicates connecting X and Y) perform significantly better than the manually designed templates.
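To make the idea concrete, here is a minimal sketch of middle-word template mining, assuming we already have a small corpus and a set of (subject, object) pairs for one relation. The function and data below are illustrative, not the paper's implementation:

```python
# Hypothetical sketch: mine "[X] ... [Y]" templates from the words between
# a relation's subject and object in corpus sentences.
from collections import Counter

def mine_templates(sentences, pairs):
    """pairs: (subject, object) tuples known to hold for one relation."""
    counter = Counter()
    for sent in sentences:
        for subj, obj in pairs:
            if subj in sent and obj in sent and sent.index(subj) < sent.index(obj):
                middle = sent[sent.index(subj) + len(subj): sent.index(obj)].strip()
                counter[f"[X] {middle} [Y]"] += 1
    return counter.most_common()            # most frequent candidates first

sentences = ["France shares a land border with Spain.",
             "Germany shares a land border with France."]
pairs = [("France", "Spain"), ("Germany", "France")]
print(mine_templates(sentences, pairs))
# [('[X] shares a land border with [Y]', 2)]
```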
There are many ways to do Prompt Paraphrasing, such as back-translation. Let's look at an example using DeepL:
This gives us a paraphrase of the prompt "X shares a border with Y": "X and Y share a boundary".
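As a rough illustration (not the article's DeepL workflow), the same round-trip translation idea can be sketched with publicly available MarianMT models; the model names below are just examples:

```python
# Hedged sketch of Prompt Paraphrasing via back-translation (English -> German -> English).
from transformers import pipeline

en_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
de_en = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

prompt = "[X] shares a border with [Y]."
german = en_de(prompt)[0]["translation_text"]       # forward translation
paraphrase = de_en(german)[0]["translation_text"]   # back translation
print(paraphrase)  # ideally something like "[X] and [Y] share a border."
```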
BARTScore simply provides a lookup table of synonymous substitutions for various phrases, which felt very familiar to me, since I had memorized similar lists for English exams.
Gradient-based Search was proposed in AutoPrompt, published at EMNLP 2020. Its main idea is illustrated in the figure below:
In the figure above, "a real joy" is the original input sentence $x_{\text{inp}}$, and the red trigger tokens $x_{\text{trig}}$ are "elicited" from $x_{\text{inp}}$. $x_{\text{trig}}$ and $x_{\text{inp}}$ are combined to construct the final input $x_{\text{prompt}}$, which is fed to a Masked LM to predict the sentiment label. The table below gives examples for many other NLP tasks.
The set of trigger tokens $x_{\text{trig}}$ is generated using HotFlip and adversarial training; for details, see the two papers "HotFlip: White-Box Adversarial Examples for Text Classification" and "Universal Adversarial Triggers for Attacking and Analyzing NLP".
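The core of the search can be sketched as follows. This is a simplified, first-order (HotFlip-style) candidate ranking for a single trigger position, assuming a BERT masked LM and a hand-picked label word; it is not the authors' implementation:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
emb_matrix = model.get_input_embeddings().weight              # (|V|, d)

# x_prompt: original input + trigger tokens + [MASK] for the label word
text = "a real joy atmosphere alot dialogue [MASK]"           # trigger slots start as arbitrary words
label_id = tokenizer.convert_tokens_to_ids("great")           # an assumed positive label word
enc = tokenizer(text, return_tensors="pt")

# Run the model through input embeddings so we can read their gradient
input_embeds = model.get_input_embeddings()(enc["input_ids"]).detach().requires_grad_(True)
out = model(inputs_embeds=input_embeds, attention_mask=enc["attention_mask"])
mask_pos = (enc["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
loss = -torch.log_softmax(out.logits[0, mask_pos], dim=-1)[label_id]
loss.backward()

# HotFlip first-order approximation: score every vocabulary word as a replacement
# for one trigger position (larger score = larger expected drop in the loss)
trig_pos = 4                                                  # index of "atmosphere" ([CLS] a real joy ...)
scores = -emb_matrix @ input_embeds.grad[0, trig_pos]
print(tokenizer.convert_ids_to_tokens(scores.topk(10).indices.tolist()))
```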
Prompt Generation is work from Danqi Chen's group that applies the Seq2Seq pre-trained model T5 to the template search process. T5 is pre-trained with a variety of unsupervised objectives; one of the most effective is to replace one or more consecutive spans with sentinel tokens such as <X> or <Y> and generate the corresponding outputs. For example:
Thank you <X> me to your party <Y> week
T5 will generate "for inviting" at <X> and "last" at <Y>. Clearly, T5 is well suited to generating templates, since we do not need to specify in advance how many tokens the template contains. Specifically, there are three possible generation patterns.
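This span-infilling behavior is easy to reproduce with a public T5 checkpoint; <X> and <Y> correspond to T5's sentinel tokens <extra_id_0> and <extra_id_1>. A minimal sketch:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# <X>/<Y> in the example above correspond to T5's sentinel tokens
text = "Thank you <extra_id_0> me to your party <extra_id_1> week."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
# Expected output along the lines of:
# "<pad> <extra_id_0> for inviting <extra_id_1> last <extra_id_2> ..."
```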
The specific template generation process is shown in the following figure:
First, the placeholders <X> and <Y> (following the three generation patterns above) are added before and after the label word, and the result is fed into the T5 model, which automatically generates sequences at the placeholder positions. Finally, the label words (great or terrible) are replaced with the [MASK] token to form multiple templates. In practice, beam search is used to generate many candidate templates; each candidate is then fine-tuned and evaluated on the dev set, and the best one is selected.
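The candidate-generation step can be sketched in the same spirit. This is an illustrative simplification of the paper's pipeline, with a made-up training sentence and label word:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

sentence = "A fun ride."        # a training example (assumed)
label_word = "great"            # label word for the positive class (assumed)

# Placeholders around the label word, one possible pattern: S <X> M(y) <Y>
t5_input = f"{sentence} <extra_id_0> {label_word} <extra_id_1>"
inputs = tokenizer(t5_input, return_tensors="pt")

# Beam search keeps several candidate fillers; each filled-in template later has its
# label word replaced by [MASK], is fine-tuned, and is scored on the dev set.
candidates = model.generate(**inputs, num_beams=10, num_return_sequences=5, max_length=20)
for cand in candidates:
    print(tokenizer.decode(cand, skip_special_tokens=False))
```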
I would also like to mention another interesting point of this paper: the final sentence fed to the model for prediction is concatenated with a "demonstration" from each class, as shown in the figure below.
This prompt design is a bit like doing a semantic similarity task: $X$ is the original input sentence, $Y$ is a known positive example, $Z$ is a negative example, and the input is constructed in the following form:
X is a [MASK] example? Y is a positive example; Z is a negative example
This is a bit like a ternary operator in a programming language, or equivalently, it asks the model to compare the semantic similarity of $X$ with $Y$ and with $Z$. A natural question is: how are $Y$ and $Z$ selected? There are actually two rules:
- For each original input sentence, one example is randomly sampled from each class and concatenated into the prompt
- For each original input sentence, within each class, similarity to the input is computed with Sentence-BERT, and one example is randomly sampled from the top 50% most similar samples (a sketch follows this list)
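A hedged sketch of the second rule, using the sentence-transformers library; the model name, example sentences, and prompt format are placeholders rather than the paper's exact setup:

```python
import random
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")      # any Sentence-BERT model works here

def sample_demonstration(input_sentence, class_examples):
    """Pick one demonstration from the top 50% of a class's examples by similarity."""
    emb_in = encoder.encode(input_sentence, convert_to_tensor=True)
    emb_pool = encoder.encode(class_examples, convert_to_tensor=True)
    sims = util.cos_sim(emb_in, emb_pool)[0]           # cosine similarity to each example
    ranked = sorted(zip(class_examples, sims.tolist()), key=lambda p: p[1], reverse=True)
    top_half = ranked[: max(1, len(ranked) // 2)]      # keep the most similar 50%
    return random.choice(top_half)[0]                  # then sample randomly within it

x = "A real joy."
pos = sample_demonstration(x, ["Great movie!", "Loved every minute.", "Fantastic cast."])
neg = sample_demonstration(x, ["Terrible.", "A waste of time.", "Awful plot."])
prompt = f"{x} It was [MASK]. {pos} It was great. {neg} It was terrible."
print(prompt)
```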