This paper introduces a new language representation model, BERT (Bidirectional Encoder Representations from Transformers). Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by conditioning on both left and right context in all layers. BERT is the first fine-tuning-based representation model to achieve state-of-the-art performance on a large number of sentence-level and token-level tasks, outperforming many systems with task-specific architectures and setting new state-of-the-art results on 11 NLP tasks.
BERT-related Resources
Application methods and experimental results of BERT on Chinese and small datasets
Title | Description | Date |
---|---|---|
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Original paper | 20181011 |
Reddit discussion | Discussion with the paper’s authors | |
BERT-pytorch | PyTorch implementation of Google AI’s 2018 BERT | |
BERT Model and Fine-Tuning | Paper interpretation by Xi Xiangyu | |
The strongest NLP pre-training model! Google BERT sweeps 11 NLP task records | Paper analysis | |
【NLP】Google BERT | Reading notes | |
How to evaluate the BERT model? | Takeaways from reading the paper | |
A detailed interpretation of BERT, NLP’s breakthrough achievement | Reading notes by Octopus Maruko | |
An interpretation of BERT, Google’s strongest NLP model | AI Technology Review | |
Pre-training BERT: how it was done in TensorFlow before the official code was released | Paper reproduction | 20181030 |
Google finally open-sources the BERT code: 300 million parameters, fully explained | Interpretation by Synced (机器之心) | 20181101 |
Why does BERT work so well? | | 20181121 |
The ultimate BERT fine-tuning practice | A guide to fine-tuning BERT on Chinese datasets | 20181123 |
Open-source BERT implementations that bring significant improvements with very little data | | 20181127 |
Key points of the BERT paper
Model structure
The main building block, the Transformer, comes from "Attention Is All You Need"; its core operation, scaled dot-product attention, is sketched below.
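As a quick reference, the core operation inside each Transformer layer is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. The following is a minimal NumPy sketch of that formula only, not the authors' code; the function name and shapes are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, per "Attention Is All You Need".

    Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns attended values of shape (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarity scores
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of value vectors
```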
Model input
Pre-training method
Two pre-training tasks: a masked language model (a cloze-style task) and next-sentence prediction; the masking scheme is sketched below.
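For the masked LM, the paper selects 15% of the input tokens as prediction targets; of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged, and the model is trained to recover the original tokens. Below is a rough Python sketch of that corruption step; the function and variable names are illustrative, not the official implementation.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15):
    """BERT-style masking: pick ~15% of positions as prediction targets, then
    replace 80% of them with [MASK], 10% with a random token, and leave 10%
    unchanged. Returns (corrupted_tokens, target_positions_and_labels)."""
    corrupted = list(tokens)
    targets = []
    for i, tok in enumerate(tokens):
        if random.random() >= mask_prob:
            continue
        targets.append((i, tok))                 # remember the original token to predict
        r = random.random()
        if r < 0.8:
            corrupted[i] = MASK_TOKEN            # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = random.choice(vocab)  # 10%: replace with a random token
        # else: 10% keep the original token unchanged
    return corrupted, targets
```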
Experiments
Model analysis
Effect of Pre-training Tasks
Effect of Model Size
Effect of Number of Training Steps
Feature-based Approach with BERT
Conclusion
Recent empirical improvements due to transfer learning with language models have demonstrated that rich, unsupervised pre-training is an integral part of many language understanding systems. In particular, these results enable even low-resource tasks to benefit from very deep unidirectional architectures. Our major contribution is further generalizing these findings to deep bidirectional architectures, allowing the same pre-trained model to successfully tackle a broad set of NLP tasks. While the empirical results are strong, in some cases surpassing human performance, important future work is to investigate the linguistic phenomena that may or may not be captured by BERT.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (Submitted on 11 Oct 2018)
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7% (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%. Comments: 13 pages
Abstract: This paper introduces a new language representation model, BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations by jointly conditioning on the left and right context in all layers. As a result, pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for many tasks, such as question answering and language inference, without substantial modifications to task-specific architectures.
BERT is conceptually simple yet empirically powerful. It sets new state-of-the-art results on 11 NLP tasks, including pushing the GLUE benchmark to 80.4% (an absolute improvement of 7.6%), MultiNLI accuracy to 86.7% (an absolute improvement of 5.6%), and the SQuAD v1.1 question answering Test F1 to 93.2 (up 1.5 points), two points higher than human performance.
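To make "fine-tuned with just one additional output layer" concrete, sentence-level fine-tuning typically adds a single linear classifier on top of the final hidden state of the [CLS] token and trains the whole network end to end. Below is a minimal PyTorch sketch under that assumption; the `bert` encoder and all names here are placeholders, not the reference implementation.

```python
import torch
import torch.nn as nn

class BertForClassification(nn.Module):
    """Sentence classifier: a pre-trained BERT encoder plus one new output layer."""

    def __init__(self, bert, hidden_size=768, num_labels=2):
        super().__init__()
        self.bert = bert                          # pre-trained encoder (assumed given)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(hidden_size, num_labels)  # the only new parameters

    def forward(self, input_ids, token_type_ids, attention_mask, labels=None):
        hidden = self.bert(input_ids, token_type_ids, attention_mask)  # (batch, seq, hidden)
        cls = self.dropout(hidden[:, 0])          # final hidden state of the [CLS] token
        logits = self.classifier(cls)             # (batch, num_labels)
        if labels is not None:
            return nn.functional.cross_entropy(logits, labels)  # fine-tuning loss
        return logits
```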
Subjects: Computation and Language (cs.CL). Cite as: arXiv:1810.04805 [cs.CL] (arXiv:1810.04805v1 [cs.CL] for this version). Submission history: [v1] from Jacob Devlin, Thu, 11 Oct 2018 00:50:01 GMT (227 KB).
Reddit discussion
The official google-research BERT implementation
Google recently released BERT, a large-scale pre-trained language model based on the bidirectional Transformer that efficiently extracts textual information and can be applied to a wide range of NLP tasks. With this pre-trained model, the study broke the state-of-the-art records on 11 NLP tasks. If this kind of pre-training holds up in practice and NLP tasks can perform very well with very little data, BERT will become a veritable backbone network.
Introduction
BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.
Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: Arxiv.org/abs/1810.04… .
To give a few numbers, here are the results on the SQuAD v1.1 question answering task:
SQuAD v1.1 Leaderboard (Oct 8th 2018) | Test EM | Test F1 |
---|---|---|
1st Place Ensemble – BERT | 87.4 | 93.2 |
2nd Place Ensemble – nlnet | 86.0 | 91.7 |
1st Place Single Model – BERT | 85.1 | 91.8 |
2nd Place Single Model – nlnet | 83.5 | 90.1 |
And several natural language inference tasks:
System | MultiNLI | Question NLI | SWAG |
---|---|---|---|
BERT | 86.7 | 91.1 | 86.3 |
OpenAI GPT (Prev. SOTA) | 82.2 | 88.1 | 75.0 |
Plus many other tasks.
Moreover, these results were all obtained with almost no task-specific neural network architecture design.
If you already know what BERT is and you just want to get started, you can download the pre-trained models and run a state-of-the-art fine-tuning in only a few minutes.
Reimplementation: bert_language_understanding
Pre-training of Deep Bidirectional Transformers for Language Understanding
Reimplementation: BERT-keras
Keras implementation of BERT(Bidirectional Encoder Representations from Transformers)
Reimplementation: pytorch-pretrained-BERT
PyTorch version of Google AI’s BERT model with script to load Google’s pre-trained models.
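For reference, a typical feature-extraction call with that repository looks roughly like the sketch below. It is based on the repository's README at the time; treat the exact class names, call signatures, and return values as assumptions and check the repo before use.

```python
import torch
# Assumed API from the pytorch-pretrained-BERT README; verify against the repo.
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

text = "[CLS] BERT learns bidirectional representations . [SEP]"
tokens = tokenizer.tokenize(text)
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
segment_ids = torch.zeros_like(input_ids)   # single-sentence input -> all segment 0

with torch.no_grad():
    encoded_layers, pooled = model(input_ids, segment_ids)
# encoded_layers: per-layer hidden states; pooled: [CLS]-based sentence summary.
```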
BERT’s evaluation benchmark: GLUE
GLUE: A Multi-task Benchmark and Analysis Platform for Natural Language Understanding
Abstract
For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems.