Abstract: This article is a preliminary interpretation of the ACL 2021 paper on a modularized interaction network for named entity recognition.

This article is shared from the Huawei Cloud community post "ACL2021 NER | Modularized Interaction Network for Named Entity Recognition", by JuTzungKuei.

Paper: Li Fei, Wang Zheng, Hui Siu Cheung, Liao Lejian, Song Dandan, Xu Jing, He Guoxiu, Jia Meihuizi. Modularized Interaction Network for Named Entity Recognition. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics, 2021, 200-209.

Link: aclanthology.org/2021.acl-lo…

Code:

1. Overview

  • Shortcomings of existing NER models

Sequence-labeling-based NER models perform poorly on long entities and focus only on word-level information

Segment-based NER models: operate on segments rather than individual words, and cannot capture word-level dependencies within a segment

  • Boundary detection and type prediction can cooperate with each other; the two subtasks can share information and reinforce each other

  • Proposes the Modularized Interaction Network (MIN) model

Uses segment-level information and word-level dependencies at the same time

An interaction mechanism is incorporated to support information sharing between boundary detection and type prediction

  • Achieved SOTA on three benchmark datasets

2. Introduction

  • NER: finds and classifies named entities, e.g., person (PER), location (LOC), or organization (ORG); downstream tasks include relation extraction, entity linking, question generation, and coreference resolution

  • Two kinds of methods

Sequence labeling can capture word-level dependencies

Segment-based (a segment is a span of words): can handle long entities

  • NER: detects entity boundaries and named entity types

It can be divided into two subtasks: boundary detection and type prediction

The two tasks are related and can share information

  • For example: XX is from New York University

If "New York University" is known to be an entity span (the boundary), its type is more likely to be predicted as ORG

If the entity is known to have type ORG, the boundary ending at "University" is more likely to be predicted

  • The two common methods above do not share information between subtasks

Sequence labeling: boundary and type are only combined as parts of a single tag

Segment-based: first detects segments, then classifies their types (see the sketch below)
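
To make the contrast concrete, here is a small illustration (not taken from the paper) of how the two formulations would encode the example sentence above: sequence labeling fuses boundary and type into one tag per word, while segment-based methods first enumerate spans and then type them.

```python
# Illustration only: encoding "XX is from New York University" in the two NER formulations.
sentence = ["XX", "is", "from", "New", "York", "University"]

# Sequence labeling: one tag per word; boundary (B/I/O) and type (ORG) are fused in the tag.
bio_tags = ["O", "O", "O", "B-ORG", "I-ORG", "I-ORG"]

# Segment-based: first detect candidate spans (boundary detection),
# then assign each detected span a type (type prediction).
spans = [(3, 5)]                  # inclusive word indices of "New York University"
span_types = {(3, 5): "ORG"}

for start, end in spans:
    print(" ".join(sentence[start:end + 1]), "->", span_types[(start, end)])
```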

  • This paper proposes the MIN model: an NER module, a boundary module, a type module, and an interaction mechanism

A pointer network, as the decoder of the boundary module, captures segment-level information for each word

The segment-level information and word-level information are input into the sequence labeling model

NER is divided into two subtasks, boundary detection and type prediction, each with its own encoder

A mutually reinforcing interaction mechanism is proposed to fuse all information into the NER module

The three modules share the word representations and are trained with multi-task learning

  • Major contributions:

A new model, MIN, utilizes both segment-level information and word-level dependencies

NER is divided into the two sub-tasks of boundary detection and type prediction, and the information of the two sub-tasks is shared through the interaction mechanism

SOTA results on three benchmark datasets

3. Method

  • NER module: RNN-BiLSTM-CRF, following "Neural Architectures for Named Entity Recognition"

Word representation: word embeddings (BERT) + character embeddings (BiLSTM)

BiLSTM encoding: a bidirectional LSTM; the interaction mechanism replaces direct concatenation, with gating functions providing dynamic control

The final NER output: H^{NER} = W^{\top}[H; H^{Bdy}; H^{Type}; H^{Seg}] + b, where H^{Bdy} is the boundary module output, H^{Type} is the type module output, and H^{Seg} is the segment information

CRF decoding: transition probabilities + emission probabilities (a minimal fusion/CRF sketch follows below)
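
A minimal PyTorch sketch of the fusion formula above and the CRF scores it feeds. This is illustrative only: the class name, dimensions, and the plain linear projection are assumptions, and the gated control mentioned earlier is omitted.

```python
import torch
import torch.nn as nn

class FusionForCRF(nn.Module):
    """Sketch of H^{NER} = W^T [H; H^{Bdy}; H^{Type}; H^{Seg}] + b feeding a CRF (illustrative)."""

    def __init__(self, d_h, d_bdy, d_type, d_seg, d_out, num_tags):
        super().__init__()
        # W and b of the fusion equation above
        self.proj = nn.Linear(d_h + d_bdy + d_type + d_seg, d_out)
        # Emission scores for the CRF layer
        self.emission = nn.Linear(d_out, num_tags)
        # CRF transition scores between consecutive tags
        self.transitions = nn.Parameter(torch.zeros(num_tags, num_tags))

    def forward(self, H, H_bdy, H_type, H_seg):
        # Concatenate word-level, boundary, type, and segment representations per token
        h_ner = self.proj(torch.cat([H, H_bdy, H_type, H_seg], dim=-1))
        # CRF decoding (e.g., Viterbi) would combine these emission scores with self.transitions
        return self.emission(h_ner)

fusion = FusionForCRF(d_h=256, d_bdy=128, d_type=128, d_seg=128, d_out=256, num_tags=9)
emissions = fusion(torch.randn(2, 10, 256), torch.randn(2, 10, 128),
                   torch.randn(2, 10, 128), torch.randn(2, 10, 128))  # shape [2, 10, 9]
```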

  • Boundary module: a bidirectional LSTM encoder produces H^{Bdy}; a unidirectional LSTM serves as the decoder

Decoding: s_j = h_{j-1}^{Bdy} + h_j^{Bdy} + h_{j+1}^{Bdy}, \quad d_j = LSTM(s_j, d_{j-1})

Biaffine attention mechanism: scores candidate boundary positions (a sketch of the standard form follows below)
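
As a hedged sketch (the paper's exact parameterization may differ), the standard biaffine attention form (Dozat and Manning, 2017) that pointer-network boundary decoders commonly use scores each encoder state h_i^{Bdy} against the decoder state d_j as:

u_i^j = d_j^{\top} U h_i^{Bdy} + W[d_j; h_i^{Bdy}] + b, \quad p(i \mid d_j) = \mathrm{softmax}_i(u_i^j)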

  • Type module: BiLSTM + CRF

  • Interaction mechanism (sketched in code below):

Self-attention yields the label-enhanced boundary representation H^{B-E} and type representation H^{T-E}

Biaffine attention computes the scores \alpha^{B-E}

Boundary representation after interaction: r_i^{B-E} = \sum_{j=1}^{n} \alpha_{i,j}^{B-E} h_j^{T-E}

Updated boundary representation: \overline{h}_i^{Bdy} = [h_i^{B-E}; r_i^{B-E}]

Updated type representation: \overline{h}_i^{Type} = [h_i^{T-E}; r_i^{T-E}]
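
A minimal PyTorch sketch of this interaction step. It is illustrative only: the biaffine score is simplified to a bilinear form, and all names and shapes are assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn

class Interaction(nn.Module):
    """Sketch of r_i^{B-E} = sum_j alpha_{i,j}^{B-E} h_j^{T-E} and the concatenation updates."""

    def __init__(self, d):
        super().__init__()
        self.U_b = nn.Parameter(torch.randn(d, d) * 0.01)  # boundary-side (bilinear) weights
        self.U_t = nn.Parameter(torch.randn(d, d) * 0.01)  # type-side weights

    def forward(self, h_be, h_te):
        # h_be, h_te: label-enhanced boundary / type representations, shape [batch, n, d]
        alpha_be = torch.softmax(h_be @ self.U_b @ h_te.transpose(1, 2), dim=-1)  # [batch, n, n]
        alpha_te = torch.softmax(h_te @ self.U_t @ h_be.transpose(1, 2), dim=-1)
        r_be = alpha_be @ h_te            # boundary side attends over type representations
        r_te = alpha_te @ h_be            # type side attends over boundary representations
        h_bdy = torch.cat([h_be, r_be], dim=-1)   # \overline{h}^{Bdy} = [h^{B-E}; r^{B-E}]
        h_type = torch.cat([h_te, r_te], dim=-1)  # \overline{h}^{Type} = [h^{T-E}; r^{T-E}]
        return h_bdy, h_type

inter = Interaction(d=128)
h_bdy, h_type = inter(torch.randn(2, 10, 128), torch.randn(2, 10, 128))  # each [2, 10, 256]
```

Each subtask thus sees the other's label-enhanced evidence before its own decoding step.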

  • Joint training: multi-task learning

Loss function for each task

Final loss function:

\mathcal{L} = \mathcal{L}^{NER} + \mathcal{L}^{Type} + \mathcal{L}^{Bdy}
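
In code, multi-task training simply sums the three per-task losses and backpropagates once through the shared word representations (a trivial sketch; the losses here are placeholders for the modules' actual CRF/boundary losses):

```python
import torch

# Placeholder per-task losses; in the real model these come from the NER, type,
# and boundary modules (e.g., CRF negative log-likelihoods).
loss_ner = torch.tensor(1.2, requires_grad=True)
loss_type = torch.tensor(0.7, requires_grad=True)
loss_bdy = torch.tensor(0.9, requires_grad=True)

loss = loss_ner + loss_type + loss_bdy  # \mathcal{L} = \mathcal{L}^{NER} + \mathcal{L}^{Type} + \mathcal{L}^{Bdy}
loss.backward()  # a single backward pass updates the shared parameters
```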

4. Results

  • Baselines (sequence-labeling-based)

CNN-BiLSTM-CRF

RNN-BiLSTM-CRF

ELMo-BiLSTM-CRF

Flair(char-BiLSTM-CRF)

BERT-BiLSTM-CRF

HCRA (CNN-BiLSTM-CRF)

  • Baselines (segment-based)

BiLSTM-Pointer

HSCRF

MRC+BERT

Biaffine+BERT
