Abstract: This article is a preliminary interpretation of the ACL 2021 NER paper on the Modularized Interaction Network for named entity recognition.
This article is shared from the Huawei Cloud community post "ACL2021 NER | Modularized Interaction Network for Named Entity Recognition", by JuTzungKuei.
Paper: Li Fei, Wang Zheng, Hui Siu Cheung, Liao Lejian, Song Dandan, Xu Jing, He Guoxiu, Jia Meihuizi. Modularized Interaction Network for Named Entity Recognition. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics, 2021: 200-209.
Link: aclanthology.org/2021.acl-lo…
Code:
1. Abstract
- Shortcomings of existing NER models
Sequence-labeling-based NER models: perform poorly on long entities and only use word-level information
Segment-based NER models: process segments rather than individual words, and cannot capture word-level dependencies within segments
- Boundary detection and type prediction can cooperate with each other; the two subtasks can share information and reinforce each other
- Proposed the Modularized Interaction Network (MIN) model
Uses segment-level information and word-level dependencies at the same time
Incorporates an interaction mechanism to support information sharing between boundary detection and type prediction
- Achieved SOTA on three benchmark datasets
2. Introduction
- NER: finds and classifies named entities such as person (PER), location (LOC), or organization (ORG); downstream tasks include relation extraction, entity linking, question generation, and coreference resolution
- Two kinds of existing methods
Sequence labeling: can capture word-level dependencies
Segment-based (a segment is a span of words): can handle long entities
- NER detects entity boundaries and named entity types
It can be divided into two subtasks: boundary detection and type prediction
The two subtasks are related and can share information
- For example: "XX is from New York University"
If you know that "University" is the entity boundary, it is easier to predict that the type is ORG
If you know that the entity has type ORG, it is easier to predict the "University" boundary
- The two common methods above do not share information between the subtasks (see the small example below)
Sequence labeling: only fuses boundary and type together as tags
Segment-based: first detects segments, then classifies their types
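As a small illustration (the tags and indices below are our own example, not taken from the paper), the same entity can be represented either with tags that fuse boundary and type, or as a separate span plus type:

```python
# Example sentence from above: "XX is from New York University"
tokens = ["XX", "is", "from", "New", "York", "University"]

# Sequence labeling: boundary (B/I/O) and type (ORG) are fused into one tag per word,
# so neither subtask sees the other's information explicitly
bio_tags = ["O", "O", "O", "B-ORG", "I-ORG", "I-ORG"]

# Segment-based: first detect the span, then classify its type
segments = [((3, 5), "ORG")]  # word indices 3..5 cover "New York University"

print(list(zip(tokens, bio_tags)))
print(segments)
```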
- This paper proposes the MIN model: an NER module, a boundary module, a type module, and an interaction mechanism (a high-level sketch follows this list)
A pointer network serves as the decoder of the boundary module and captures segment-level information for each word
The segment-level information and word-level information are fed into the sequence labeling model
NER is divided into two subtasks, boundary detection and type prediction, each with its own encoder
A mutually reinforcing interaction mechanism is proposed to fuse all the information into the NER module
The three modules share the word representation and are trained with multi-task learning
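The sketch below only mirrors the structure described above; the layer sizes, module names, and the omitted interaction and CRF parts are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MINSketch(nn.Module):
    """High-level sketch of the MIN structure: three encoders over a shared word
    representation, with their outputs fused for per-word tag scores."""
    def __init__(self, dim: int, num_tags: int):
        super().__init__()
        # separate encoders for the NER, boundary, and type modules (shared word input)
        self.ner_enc = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.bdy_enc = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.type_enc = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        # fuse H, H^Bdy, H^Type, H^Seg into tag scores (fed to a CRF in the paper)
        self.fuse = nn.Linear(4 * dim, num_tags)

    def forward(self, word_repr: torch.Tensor, seg_repr: torch.Tensor) -> torch.Tensor:
        h, _ = self.ner_enc(word_repr)        # H
        h_bdy, _ = self.bdy_enc(word_repr)    # H^Bdy
        h_type, _ = self.type_enc(word_repr)  # H^Type
        # interaction mechanism and CRF decoding are omitted in this sketch
        return self.fuse(torch.cat([h, h_bdy, h_type, seg_repr], dim=-1))

# usage with random tensors: batch of 2 sentences, 6 words, 64-dim word representation
x = torch.randn(2, 6, 64)
scores = MINSketch(dim=64, num_tags=9)(x, torch.randn(2, 6, 64))
print(scores.shape)  # torch.Size([2, 6, 9])
```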
- Major contributions:
A new model, MIN, that utilizes both segment-level information and word-level dependencies
NER is divided into two sub-tasks, boundary detection and type prediction, and the information of the two sub-tasks is shared through the interaction mechanism
Achieved SOTA on three benchmark datasets
3. Methods
- NER module: RNN-BiLSTM-CRF, following Neural Architectures for Named Entity Recognition
Word representation: word (BERT) + char (BiLSTM); a minimal sketch follows this block
BiLSTM encoding: bidirectional LSTM; the interaction mechanism is used instead of direct concatenation, with gating functions providing dynamic control
The final NER output: $H^{NER} = W^{\top}[H; H^{Bdy}; H^{Type}; H^{Seg}] + b$, where $H^{Bdy}$ indicates the boundary module output, $H^{Type}$ indicates the type module output, and $H^{Seg}$ indicates the segment information
CRF decoding: transition probabilities + emission probabilities
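A minimal sketch of the "word (BERT) + char (BiLSTM)" representation step, assuming pre-extracted BERT word vectors and a character vocabulary (the names and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class WordRepresentation(nn.Module):
    """Concatenates a contextual word vector with a character-level BiLSTM encoding."""
    def __init__(self, num_chars: int, char_dim: int = 30, char_hidden: int = 25):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim, padding_idx=0)
        self.char_lstm = nn.LSTM(char_dim, char_hidden, bidirectional=True, batch_first=True)

    def forward(self, bert_word_vecs: torch.Tensor, char_ids: torch.Tensor) -> torch.Tensor:
        # bert_word_vecs: (num_words, bert_dim); char_ids: (num_words, max_word_len)
        _, (h_n, _) = self.char_lstm(self.char_emb(char_ids))
        char_repr = torch.cat([h_n[0], h_n[1]], dim=-1)  # final forward + backward states
        return torch.cat([bert_word_vecs, char_repr], dim=-1)

# usage: 6 words, 768-dim BERT vectors, words padded to 10 characters
words = torch.randn(6, 768)
chars = torch.randint(1, 100, (6, 10))
print(WordRepresentation(num_chars=100)(words, chars).shape)  # torch.Size([6, 818])
```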
- Boundary module: bidirectional LSTM encoder producing $H^{Bdy}$, unidirectional LSTM decoder
Decoding: $s_j = h_{j-1}^{Bdy} + h_j^{Bdy} + h_{j+1}^{Bdy}$, $d_j = \mathrm{LSTM}(s_j, d_{j-1})$
Biaffine Attention mechanism:
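The summary lists the biaffine attention step without its formula. As a reference, a standard biaffine scoring form between an encoder state $h_i^{Bdy}$ and a decoder state $d_j$ is given below; this is the generic formulation, and the paper's exact parameterization may differ:

```latex
% Generic biaffine attention (assumed form, not copied from the paper):
% score each encoder state h_i^{Bdy} against decoder state d_j; the softmax over
% encoder positions i gives the pointer distribution for decoder step j
\mathrm{score}(h_i^{Bdy}, d_j) = (h_i^{Bdy})^{\top} U \, d_j + W \, [h_i^{Bdy}; d_j] + b,
\qquad
\alpha_{i,j} = \operatorname{softmax}_{i}\bigl(\mathrm{score}(h_i^{Bdy}, d_j)\bigr)
```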
- Type module: BiLSTM + CRF
- Interaction mechanism:
Self-attention yields the label-enhanced boundary representation $H^{B-E}$ and type representation $H^{T-E}$
Biaffine attention computes the scores $\alpha^{B-E}$
Boundary representation after interaction: $r_i^{B-E} = \sum_{j=1}^{n} \alpha_{i,j}^{B-E} h_j^{T-E}$
Updated boundary representation: $\overline{h}_i^{Bdy} = [h_i^{B-E}, r_i^{B-E}]$
Updated type representation: $\overline{h}_i^{Type} = [h_i^{T-E}, r_i^{T-E}]$
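A minimal sketch of one direction of the interaction (boundary attending to type), using a simplified bilinear form in place of the full biaffine attention; the tensor names and shapes are illustrative, not the authors' code:

```python
import torch

def boundary_type_interaction(h_b_e: torch.Tensor, h_t_e: torch.Tensor, U: torch.Tensor) -> torch.Tensor:
    """Compute alpha^{B-E} with a simplified bilinear score, form
    r_i^{B-E} = sum_j alpha_{i,j}^{B-E} * h_j^{T-E}, and concatenate it onto h^{B-E}.
    h_b_e, h_t_e: (n, d) label-enhanced boundary / type representations; U: (d, d)."""
    scores = h_b_e @ U @ h_t_e.T            # (n, n) pairwise scores
    alpha = torch.softmax(scores, dim=-1)   # alpha_{i,j}^{B-E}
    r = alpha @ h_t_e                       # r_i^{B-E}
    return torch.cat([h_b_e, r], dim=-1)    # updated boundary representation, (n, 2d)

# usage with random tensors; the type direction is computed symmetrically
n, d = 6, 8
h_bdy_updated = boundary_type_interaction(torch.randn(n, d), torch.randn(n, d), torch.randn(d, d))
print(h_bdy_updated.shape)  # torch.Size([6, 16])
```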
- Joint training: multi-task learning
Each task has its own loss function
Final loss function:
$\mathcal{L} = \mathcal{L}^{NER} + \mathcal{L}^{Type} + \mathcal{L}^{Bdy}$
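In code the joint objective is simply the sum of the three module losses; a tiny sketch, with placeholder scalars standing in for the modules' CRF losses:

```python
import torch

# placeholder per-module losses; in practice these are the CRF losses of the
# NER, type, and boundary modules computed on the same batch
loss_ner = torch.tensor(1.2, requires_grad=True)
loss_type = torch.tensor(0.9, requires_grad=True)
loss_bdy = torch.tensor(0.7, requires_grad=True)

# L = L^NER + L^Type + L^Bdy: a single backward pass trains the shared parameters jointly
loss = loss_ner + loss_type + loss_bdy
loss.backward()
print(loss.item())  # approximately 2.8
```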
4. Results
- Baselines (sequence labeling-based)
CNN-BiLSTM-CRF
RNN-BiLSTM-CRF
ELMo-BiLSTM-CRF
Flair (char-BiLSTM-CRF)
BERT-BiLSTM-CRF
HCRA (CNN-BiLSTM-CRF)
- Baselines (segment-based)
BiLSTM-Pointer
HSCRF
MRC+BERT
Biaffine+BERT