Abstract: This article is a preliminary reading of the ACL 2021 NER paper that BERTifies the hidden Markov model for multi-source weakly supervised named entity recognition.

This article is shared from the Huawei Cloud community post "ACL2021 NER | A BERT-based hidden Markov model for multi-source weakly supervised named entity recognition", author: JuTzungKuei.

Paper: Li Yinghao, Shetty Pranav, Liu Lucas, Zhang Chao, Song Le. BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised Named Entity Recognition[A]. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)[C]. Online: Association for Computational Linguistics, 2021: 6178-6190.

Link: aclanthology.org/2021.acl-lo…

Code: github.com/Yinghao-Li/…

1. Overview

  • Research problem: learning NER from the noisy labels produced by multiple weak supervision sources

  • Noisy labels: incomplete, inaccurate, and mutually contradictory

  • A conditional hidden Markov model (CHMM) is proposed (see the sketch after this list)

It enhances the classic HMM with BERT's contextual representation capability

It learns token-wise transition and emission probabilities from BERT embeddings and infers the latent true labels

  • An alternate-training method (CHMM-ALT) further improves on CHMM

Labels inferred by CHMM are used to fine-tune a BERT-NER model

The output of BERT-NER then acts as an additional weak source for training CHMM

  • State-of-the-art results are achieved on four datasets
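
To make the CHMM idea above concrete, here is a minimal, hedged sketch in PyTorch. The class name, layer shapes, and the assumption that each weak source emits one label id per token are my own simplifications for illustration, not the authors' implementation; see their repository for the real code.

```python
import torch
import torch.nn as nn


class ConditionalHMM(nn.Module):
    """Minimal sketch of a CHMM-style label model (illustrative, not the authors' code).

    Token-wise transition and emission probabilities are predicted from BERT
    embeddings instead of being fixed global HMM parameters; the observation at
    each position is the set of labels assigned by the K weak sources.
    """

    def __init__(self, hidden_dim: int, n_labels: int, n_sources: int):
        super().__init__()
        self.n_labels = n_labels
        self.n_sources = n_sources
        # BERT embedding -> token-wise transition logits (L x L)
        self.trans_head = nn.Linear(hidden_dim, n_labels * n_labels)
        # BERT embedding -> token-wise emission logits (K x L x L):
        # P(label observed from source k | latent true label)
        self.emis_head = nn.Linear(hidden_dim, n_sources * n_labels * n_labels)

    def forward(self, bert_emb: torch.Tensor, weak_labels: torch.LongTensor) -> torch.Tensor:
        """bert_emb: (B, T, H); weak_labels: (B, T, K) label ids from the K sources.
        Returns the per-sequence log-likelihood of the observed weak labels."""
        B, T, _ = bert_emb.shape
        L, K = self.n_labels, self.n_sources
        trans = self.trans_head(bert_emb).view(B, T, L, L).log_softmax(-1)
        emis = self.emis_head(bert_emb).view(B, T, K, L, L).log_softmax(-1)

        # Log-probability of the *observed* weak labels under every latent state,
        # summed over sources (sources treated as conditionally independent).
        idx = weak_labels.unsqueeze(-1).unsqueeze(-1).expand(B, T, K, L, 1)
        emit_obs = emis.gather(-1, idx).squeeze(-1).sum(dim=2)  # (B, T, L)

        # Forward algorithm in log space (uniform initial-state prior omitted as a constant).
        alpha = emit_obs[:, 0]                                   # (B, L)
        for t in range(1, T):
            alpha = torch.logsumexp(alpha.unsqueeze(-1) + trans[:, t], dim=1) + emit_obs[:, t]
        return torch.logsumexp(alpha, dim=-1)                    # (B,)
```

Training such a model would maximize this log-likelihood over the corpus; the latent true labels can then be decoded (e.g., with Viterbi) to produce the denoised annotations used downstream.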

2. Introduction

  • NER is the basis of many downstream information extraction tasks: event extraction, relation extraction, and question answering

Supervised NER requires large amounts of annotated data

Many domains have external knowledge sources: knowledge bases, domain dictionaries, and labeling rules

These can be matched against the corpus to quickly generate large-scale noisy training data from multiple perspectives

  • Distantly supervised NER: only the knowledge base is used as weak supervision, so the complementary information of multi-source annotations is not exploited

  • Existing HMM-based methods are limited: they rely on one-hot word vectors or do not model the tokens' context

  • Contribution:

CHMM: aggregates multi-source weak labels

Alternate-training method CHMM-ALT: CHMM and BERT-NER are trained in turn over multiple loops, each using the other's output, to optimize multi-source weakly supervised NER performance

State-of-the-art results on four benchmark datasets

3. Method

  • CHMM-ALT trains two models: the multi-source label aggregator CHMM and the BERT-NER model, each of which in turn serves as input to the other (see the sketch after this list)

Phase I: the CHMM takes the K weak sources $x_{1:K}^{(1:T)}$ and generates denoised labels $y^{*(1:T)}$, which are used to fine-tune the BERT-NER model; its output $\widetilde{y}^{(1:T)}$ is added to the original weak-label set as an additional annotation source, giving $x_{1:K+1}^{(1:T)} = \{x_{1:K}^{(1:T)}, \widetilde{y}^{(1:T)}\}$

Phase II: CHMM and BERT-NER improve each other over several loops; in each loop, CHMM is trained first, then BERT-NER is fine-tuned, and its output updates CHMM's input

CHMM mainly improves precision, while BERT-NER mainly improves recall

  • Hidden Markov model: the standard HMM background is not elaborated here
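
Below is a rough, hedged sketch of the alternate-training driver described above. The callables `train_chmm` and `finetune_bert_ner` are hypothetical placeholders standing in for the two training steps, not functions from the authors' repository.

```python
from typing import Callable, List, Sequence


def chmm_alt(weak_labels: List[Sequence],   # K sources of per-token weak labels
             bert_embeddings,                # contextual token embeddings for the corpus
             train_chmm: Callable,           # placeholder: trains CHMM, returns denoised labels
             finetune_bert_ner: Callable,    # placeholder: fine-tunes BERT-NER, returns its predictions
             n_loops: int = 3):
    """Illustrative CHMM-ALT driver; the two callables are hypothetical placeholders."""
    # Phase I: aggregate the K original sources with CHMM, fine-tune BERT-NER on
    # the denoised labels, and append its output as an extra (K+1-th) weak source.
    denoised = train_chmm(weak_labels, bert_embeddings)
    bert_ner_preds = finetune_bert_ner(denoised)
    sources = list(weak_labels) + [bert_ner_preds]

    # Phase II: CHMM and BERT-NER improve each other over several loops,
    # each refreshing the other's input.
    for _ in range(n_loops):
        denoised = train_chmm(sources, bert_embeddings)
        bert_ner_preds = finetune_bert_ner(denoised)
        sources[-1] = bert_ner_preds         # update the (K+1)-th source
    return bert_ner_preds
```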

4. Results

