StructBERT
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
StructBERT is an improvement on BERT proposed by Alibaba. It has achieved strong results and currently ranks second on the GLUE leaderboard
First of all, let's look at the following two sentences, one in English and one in Chinese
i tinhk yuo undresatnd this sentneces.
研表究明，汉字的序顺并不定一能影阅响读，比如当你看完这句话后，才发这现里的字全是都乱的。(Roughly: research shows that the order of Chinese characters does not necessarily affect reading; for example, only after finishing this sentence do you realize that all of its characters are scrambled.)
In fact, the above two sentences are out of order
This is where StructBERT's improvement comes from: just as a small amount of word or character reordering does not stop a person from reading, it should not stop a model either. A good language model should know how to recover the correct order from such errors
StructBERT's model architecture is identical to BERT's. The improvement lies in two new pre-training objectives, the Word Structural Objective and the Sentence Structural Objective, added on top of the existing MLM and NSP tasks
Word Structural Objective
Subsequences are randomly sampled from the unmasked tokens (a hyperparameter $K$ controls the subsequence length), the word order inside each subsequence is shuffled, and the model is trained to reconstruct the original order. Here $\theta$ denotes the model parameters, and the objective is to maximize the likelihood of restoring the subsequence to its correct order
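For concreteness, the word-order objective maximizes the likelihood of predicting the original position of every token in the shuffled span. The formula below is my sketch of that formulation for a shuffled span $t_1, \dots, t_K$ (notation mine, not copied verbatim from the paper):

$$\arg\max_{\theta} \sum \log P\left(\mathrm{pos}_1 = t_1, \mathrm{pos}_2 = t_2, \dots, \mathrm{pos}_K = t_K \mid t_1, t_2, \dots, t_K; \theta\right)$$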
- With a larger $K$, the model must learn to reconstruct more heavily perturbed data, which makes the task harder
- With a smaller $K$, the model reconstructs less perturbed data, which makes the task easier
The paper sets $K=3$, which works better on single-sentence tasks
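To make the corruption step concrete, here is a minimal sketch of how a length-$K$ span could be shuffled during data preparation. The function name, sampling scheme, and token representation are my own illustration, not code from the paper or its official implementation:

```python
import random

def shuffle_span(tokens, k=3, seed=None):
    """Pick a random length-k span of unmasked tokens and shuffle it.

    Returns the corrupted token list and the original span, which serves as
    the reconstruction target. k=3 mirrors the trigram setting in the paper;
    everything else here is illustrative.
    """
    rng = random.Random(seed)
    if len(tokens) < k:
        return list(tokens), []            # nothing to shuffle
    start = rng.randrange(len(tokens) - k + 1)
    span = tokens[start:start + k]         # original order = training target
    shuffled = span[:]
    rng.shuffle(shuffled)
    corrupted = tokens[:start] + shuffled + tokens[start + k:]
    return corrupted, span

# Example: the model sees `corrupted` and must predict the original order `target`.
corrupted, target = shuffle_span("i think you understand this sentence".split(), k=3)
print(corrupted, target)
```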
Sentence Structural Objective
Given a sentence pair (S1, S2), the model judges whether S2 is the sentence that follows S1, the sentence that precedes S1, or a sentence from an unrelated document (a three-way classification)
When sampling, for a given sentence S, the next sentence of S is sampled with probability $\frac{1}{3}$, the previous sentence of S with probability $\frac{1}{3}$, and a sentence randomly drawn from another document with probability $\frac{1}{3}$
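The sampling procedure above can be sketched in a few lines. The data layout (a corpus as a list of documents, each a list of sentences) and the function name are assumptions for illustration; boundary cases at the start or end of a document are handled only crudely here:

```python
import random

def sample_sentence_pair(doc, corpus, i, rng=random):
    """Build one (S1, S2, label) training pair for the three-way sentence task.

    doc: list of sentences in the current document; corpus: list of documents.
    Labels: 0 = S2 follows S1, 1 = S2 precedes S1, 2 = S2 comes from a random
    other document. Each branch is chosen with probability 1/3.
    """
    s1 = doc[i]
    choice = rng.choice(("next", "prev", "random"))
    if choice == "next" and i + 1 < len(doc):
        return s1, doc[i + 1], 0           # next sentence
    if choice == "prev" and i > 0:
        return s1, doc[i - 1], 1           # previous sentence
    other_doc = rng.choice(corpus)
    return s1, rng.choice(other_doc), 2    # sentence from another document
```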
Ablation Studies
Ablation studies were performed on two proposed pre-training tasks to verify the effectiveness of each task
As shown in the figure above, these two tasks have a significant impact on the performance of most downstream tasks (except SNLI)
- The first three are single-sentence tasks, on which the Word Structural Objective has a large influence
- The last three are sentence-pair tasks, on which the Sentence Structural Objective has a large influence
Afterword
Unfortunately, I could not find a pre-trained StructBERT model on GitHub