
The open source community of Hugging Face, a New York-based company that began as a chatbot service and now focuses on NLP technology, offers a wide selection of open source pretrained models. Its flagship Transformers library on GitHub has over 50K stars.

The official website is here: huggingface.co/

This article uses the QuestionAnsweringPipeline API provided by pipelines to extract information from text in question-and-answer form. The extraction quality is only average, but the code is short and extraction is fast. The results look like this:

Figure 1. Extraction results of the MRC_Roberta model

Figure 2. Extraction results of the QA_Roberta model

As you can see, extraction with either the QA or the MRC model is quite fast, roughly 300 ms (about 0.3 seconds) per answer, so a few lines of code are enough to build a simple information extraction or question answering module.

Pipelines

This article briefly introduces Hugging Face's pipeline feature. Pipelines are a simple and convenient way to use models for inference. The currently available pipelines are:

  • AudioClassificationPipeline

  • AutomaticSpeechRecognitionPipeline

  • ConversationalPipeline

  • FeatureExtractionPipeline

  • FillMaskPipeline

  • ImageClassificationPipeline

  • ImageSegmentationPipeline

  • ObjectDetectionPipeline

  • QuestionAnsweringPipeline

  • SummarizationPipeline

  • TableQuestionAnsweringPipeline

  • TextClassificationPipeline

  • TextGenerationPipeline

  • Text2TextGenerationPipeline

  • TokenClassificationPipeline

  • TranslationPipeline

  • ZeroShotClassificationPipeline

The official Pipeline documentation is here: official documentation

See the task Summary for examples.
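For a quick taste, here is a minimal sketch (the task name, model download, and sample output below are illustrative): a pipeline can be built from just a task name, and a default model is downloaded automatically on first use.

from transformers import pipeline

# Build a pipeline from a task name alone; a default model for the
# task is fetched automatically the first time it runs.
classifier = pipeline("sentiment-analysis")
print(classifier("I love using pipelines!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.9998}]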

The implementation process

Three steps:

1. Go to Hugging Face, download Chinese BERT-based QA or MRC models, and save them in a directory of your choice:

MRC_Roberta model: huggingface.co/luhua/chine…

QA_Roberta model: huggingface.co/uer/Roberta…
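If you prefer to download once and then load from disk (as the code in step 2 does), a minimal sketch looks like this. The hub ID below is an assumption reconstructed from the truncated link above, so copy the exact ID from the model page:

from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Assumed full hub ID for the truncated luhua link above; verify it on the model page.
model_id = "luhua/chinese_pretrain_mrc_roberta_wwm_ext_large"

model = AutoModelForQuestionAnswering.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save both into a local directory so later runs can load offline.
save_dir = "./PretrainModel/luhua_ChinesePretrainMRC_RobertaWwmExtLarge/"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)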

2. Load the tokenizer and model. To see which tokenizer and model to use, consult the documentation for AutoModelForQuestionAnswering and AutoTokenizer:

import logging
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

logging.getLogger().setLevel(logging.INFO)

# Load the locally saved MRC model and its matching tokenizer.
model = AutoModelForQuestionAnswering.from_pretrained("./PretrainModel/luhua_ChinesePretrainMRC_RobertaWwmExtLarge/")
tokenizer = AutoTokenizer.from_pretrained("./PretrainModel/luhua_ChinesePretrainMRC_RobertaWwmExtLarge/")

# Build a question-answering pipeline from the loaded model and tokenizer.
QA_model = pipeline('question-answering', model=model, tokenizer=tokenizer)

3. Create a pipeline, format the question and context, and get the result:

%%time
QA_model2 = pipeline('question-answering', model=model, tokenizer=tokenizer)
QA_input = {
    'question': "The bad side of diabetes.",
    'context': "Diabetes is a group of metabolic diseases characterized by hyperglycemia. Hyperglycemia is caused by defective insulin secretion, impaired biological action of insulin, or both. Long-term hyperglycemia leads to chronic damage and dysfunction of various tissues, especially the eyes, kidneys, heart, blood vessels and nerves."
}
QA_model2(QA_input)

# Output:
# Wall time: 305 ms
# {'score': 2.974448580062017e-05, 'start': 83, 'end': 92, 'answer': 'chronic damage and dysfunction'}

Can you return more than one answer?

  • You only need to configure the parameters, as shown in the example below

Example:
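A minimal sketch, reusing the QA_model and QA_input defined above (the scores you get will differ):

# Ask for the three best candidates instead of a single answer;
# with top_k > 1 the pipeline returns a list of dictionaries.
results = QA_model(QA_input, top_k=3)
for r in results:
    print(r['score'], r['answer'])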

Parameter explanations

pipeline("question-answering") dispatches to the QuestionAnsweringPipeline class, which ultimately returns a dictionary or a list of dictionaries. The parameters of interest are those of its __call__ method:

  • top_k: number of answers to return; default 1. If the context does not contain enough candidates, fewer than top_k answers are returned
  • doc_stride: the overlap between chunks when a long context is split; default 128
  • max_answer_len: maximum length of a predicted answer; default 15
  • max_seq_len: maximum total length (context + question) after tokenization; default 384. Longer inputs are split into chunks with doc_stride overlap
  • max_question_len: maximum length of the question after tokenization; default 64
  • handle_impossible_answer: whether "impossible" (no answer found) is accepted as an answer; default False
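Putting several of these together, here is a sketch of a call using the pipeline built above (all values are illustrative, not recommendations):

result = QA_model(
    QA_input,
    top_k=2,                        # return the two best candidates
    max_answer_len=30,              # allow longer answers than the default 15
    doc_stride=64,                  # overlap between context chunks
    max_seq_len=256,                # split question + context beyond 256 tokens
    handle_impossible_answer=True,  # an empty answer may be returned
)
print(result)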

These pipelines are a lot of fun, and there is plenty more to experiment with. I am an NLP newcomer and part-time daydreamer, so if anything here is wrong or incomplete, please feel free to criticize!