I. Rasa

Rasa is an open source machine learning framework for building contextual AI assistants and chatbots. Rasa has two main modules:

  • Rasa NLU: understands user messages through intent recognition and entity recognition, converting user input into structured data.
  • Rasa Core: a dialogue management platform that holds the conversation and decides what to do next.

Rasa X is a tool that helps you build, improve, and deploy AI assistants powered by the Rasa framework. Rasa X includes a user interface and REST APIs.

Build your contextual chatbots and AI assistants with Rasa

GitHub: RasaHQ/rasa

Install with pip

pip install rasa_nlu
pip install rasa_core[tensorflow]

This figure shows the basic steps by which an assistant built with Rasa responds to a message:

Terminology

  • intents: user intentions
  • pipeline:
  • story: the Core model learns from real conversational data in the form of training "stories". A story is a real conversation between a user and an assistant.
  • domain: defines the universe in which the assistant resides: what user input it should receive, what actions it should be able to predict, how it should respond, and what information it should store

II. Rasa NLU

Rasa NLU used to be a stand-alone library, but it is now part of the Rasa framework.

Rasa NLU is open source, can be deployed locally, and comes with a corpus annotation tool, rasa-nlu-trainer. In principle it can support any language; Chinese, because its text is not delimited by spaces, needs a dedicated tokenizer added to the pipeline.

Rasa NLU is used for intent recognition and entity extraction in chatbots. For example, the following sentence:

"I am looking for a Mexican restaurant in the center of town"

Return structured data:

{
  "intent": "search_restaurant",
  "entities": {
    "cuisine": "Mexican",
    "location": "center"
  }
}

Rasa_NLU_Chi, a fork of Rasa_NLU, adds Jieba as a Chinese tokenizer to provide Chinese support.

This section briefly describes the process of building a locally deployed domain-specific Chinese NLU system based on Rasa_NLU_Chi.

Goal

  • Input: test text in Chinese
  • Output: Structured data that identifies the corresponding intent and entity in the text

1. Pipeline

Rasa NLU supports several pipeline implementations, including spaCy, MITIE, MITIE + sklearn, and TensorFlow. spaCy is the officially recommended pipeline; MITIE has been deprecated.

The pipeline used in this example is MITIE + Jieba + sklearn. The Rasa NLU configuration file, config_jieba_mitie_sklearn.yml, is as follows:

language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"  # load the MITIE model
- name: "tokenizer_jieba"  # segment words with Jieba
- name: "ner_mitie"  # MITIE named entity recognition
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_mitie"  # feature extraction
- name: "intent_classifier_sklearn"  # sklearn intent classification model
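Conceptually, a pipeline is an ordered list of components, each of which reads and enriches a shared message. The sketch below only illustrates that idea: the functions `whitespace_tokenizer`, `keyword_intent_classifier`, and `run_pipeline` are hypothetical stand-ins and are not Rasa NLU's actual component API.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Component = Callable[[Message], None]


def whitespace_tokenizer(message: Message) -> None:
    # Hypothetical stand-in for a real tokenizer such as tokenizer_jieba.
    message["tokens"] = message["text"].split()


def keyword_intent_classifier(message: Message) -> None:
    # Hypothetical stand-in for a trained classifier such as intent_classifier_sklearn.
    tokens = message.get("tokens", [])
    message["intent"] = "greet" if "hello" in tokens else "unknown"


def run_pipeline(text: str, components: List[Component]) -> Message:
    # Each component runs in order and adds its output to the message.
    message: Message = {"text": text}
    for component in components:
        component(message)
    return message


result = run_pipeline("hello there", [whitespace_tokenizer, keyword_intent_classifier])
print(result["intent"])
```

The order matters: the classifier relies on the tokens that the tokenizer stored earlier, which is why the configuration file lists the tokenizer before the featurizer and classifier.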

2. Preparation: Training MITIE model files

Because the pipeline uses MITIE, a trained MITIE model is required (the Chinese text must be word-segmented first). The MITIE model is trained without supervision, much like the word embeddings in word2vec, and needs a large Chinese corpus. Training it is memory-hungry and very time-consuming, so this example directly uses a model file shared online that was trained on Chinese Wikipedia and Baidu Baike.

Link: pan.baidu.com/s/1kNENvlHL… Password: p4vx

3. Rasa NLU training corpus

With the MITIE word vector model in hand, the Rasa NLU model can be trained on an annotated corpus. Rasa provides a data annotation tool: rasa-nlu-trainer

So what does labeled data look like?

The annotated corpus is stored in a JSON file in the following format, with a text, an intent, and entities for each example. Within an entity, start and end are the character offsets at which the entity begins and ends in text.
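Because the offsets are easy to get wrong when annotating by hand, a quick consistency check can help. The sketch below assumes that `end` is exclusive, so that `text[start:end]` should equal `value`; the helper name `misaligned_entities` and the English sample are hypothetical.

```python
from typing import Dict, List


def misaligned_entities(example: Dict) -> List[Dict]:
    """Return the entities whose start/end offsets do not select their value."""
    text = example["text"]
    return [
        entity
        for entity in example.get("entities", [])
        if text[entity["start"]:entity["end"]] != entity["value"]
    ]


# Hypothetical English example; offsets are character positions, end exclusive.
example = {
    "text": "I want to eat hot pot.",
    "intent": "restaurant_search",
    "entities": [{"start": 14, "end": 21, "value": "hot pot", "entity": "food"}],
}
print(misaligned_entities(example))
```

An empty list means every annotated entity lines up with the text it is supposed to cover.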

data/examples/rasa/demo-rasa_zh.json is used as an example.

{
  "rasa_nlu_data": {
    "common_examples": [
      {"text": "Hello", "intent": "greet", "entities": []},
      {"text": "I'm looking for a place to eat.", "intent": "restaurant_search", "entities": []},
      {
        "text": "I want to eat hot pot.",
        "intent": "restaurant_search",
        "entities": [
          {"start": 14, "end": 21, "value": "hot pot", "entity": "food"}
        ]
      }
    ]
  }
}

4. Training model

We now have the annotated corpus needed for training and the MITIE word vector model file, so the Rasa NLU model can be trained.

Installation:

  • Install from source
    $ git clone https://github.com/crownpku/Rasa_NLU_Chi.git   # clone the source
    $ cd Rasa_NLU_Chi
    $ python setup.py install   # install dependencies

Model training command

python -m rasa_nlu.train -c sample_configs/config_jieba_mitie_sklearn.yml --data data/examples/rasa/demo-rasa_zh.json --path models --project nlu

Required parameters:

  • Training configuration file: -c
  • Training corpus: --data
  • Model save path: --path
  • Project name: --project

After training completes, the trained model files are saved under the path given by --path. If a project name was given during training (--project), the model is stored in the models/project_name/model_** directory. The structure of models/chat_nlu_test/model_20190821-160150 is as follows:

5. Test and verify

  • Start the service
    python -m rasa_nlu.server -c sample_configs/config_jieba_mitie_sklearn.yml --path models
  • Test the service (open a new terminal and use curl to fetch the result)
    curl -XPOST localhost:5000/parse -d '{"q": "the weather forecast for tomorrow", "project": "nlu", "model": "model_20190821-160150"}'
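The same request can be built from Python with the standard library. This is a minimal sketch that mirrors the curl command above; the helper name `build_parse_request` is hypothetical, and the sending step is left commented out because it only works while the server from the previous step is running.

```python
import json
import urllib.request


def build_parse_request(query: str, project: str, model: str,
                        url: str = "http://localhost:5000/parse") -> urllib.request.Request:
    # Same JSON body as the curl command: the text to parse goes in "q".
    payload = {"q": query, "project": project, "model": model}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_parse_request(
    "the weather forecast for tomorrow", "nlu", "model_20190821-160150"
)

# Sending requires the NLU server to be running locally:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read().decode("utf-8")))
```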

The results are as follows:

III. Rasa Core

Rasa Core is a conversation engine for building AI assistants and is part of the open source Rasa framework.

As the dialogue management module, Rasa Core is responsible for coordinating the chatbot's modules and maintaining the structure and state of the human-machine conversation. The key techniques in dialogue management include dialogue act recognition, dialogue state tracking, dialogue policy learning, action prediction, and dialogue reward. The Rasa Core message processing flow is as follows:

  • First, the user's input message is passed to the Interpreter (the NLU module), which identifies the intent in the message and extracts entity data;
  • Second, Rasa Core passes the intent and entities extracted by the Interpreter to the Tracker object, whose main purpose is to track conversation state;
  • Third, the Policy reads the current state of the Tracker object and selects the corresponding action to execute; the action is also recorded in the Tracker object;
  • Finally, the result returned by executing the action is output to the user, completing one round of human-machine interaction.
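The four steps above can be sketched as a toy loop. Everything here is a simplified illustration under assumed names (`Tracker`, `interpret`, `choose_action`, `handle_message`); it is not Rasa Core's real class or function layout, and the "policy" is a fixed lookup rather than a trained model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Tracker:
    # Conversation state: the latest intent, entities, and executed actions.
    latest_intent: str = ""
    entities: Dict[str, str] = field(default_factory=dict)
    actions: List[str] = field(default_factory=list)


def interpret(text: str) -> Dict:
    # Step 1: toy interpreter; a real one would call the trained NLU model.
    intent = "greet" if "hello" in text.lower() else "goodbye"
    return {"intent": intent, "entities": {}}


def choose_action(tracker: Tracker) -> str:
    # Step 3: toy policy mapping the tracked intent to an action name.
    return {"greet": "utter_greet", "goodbye": "utter_goodbye"}[tracker.latest_intent]


def handle_message(text: str, tracker: Tracker) -> str:
    parsed = interpret(text)
    # Step 2: record what the interpreter extracted.
    tracker.latest_intent = parsed["intent"]
    tracker.entities.update(parsed["entities"])
    action = choose_action(tracker)
    tracker.actions.append(action)  # the chosen action is tracked as well
    # Step 4: executing the action produces the reply.
    responses = {"utter_greet": "Hello!", "utter_goodbye": "Bye"}
    return responses[action]


tracker = Tracker()
print(handle_message("Hello there", tracker))
```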

Rasa Core works from two things: stories and a domain.

  • domain.yml: defines the domain the dialogue system covers, including the set of intents, the set of entities, and the set of responses
  • story.md: the training data set, i.e. raw conversations expressed in terms of the domain.

1. Stories

Stories can be understood as the scenario flow of a conversation: we need to tell the machine what our multi-turn scenarios look like. Story samples are the training samples for the Rasa Core dialogue system; they describe the storylines that may occur during a human-machine conversation. Training on the story samples together with the domain produces the dialogue model that the system needs.

Stories are stored in an md file. The story symbols are explained as follows:

symbol   description
##       story title
*        intent and filled slots
-        action

Example:

## simple_story_with_multiple_turns
* affirm OR thank_you
    - utter_default
* goodbye
    - utter_goodbye
> check_goodbye

## story_04649138
* greet
    - utter_ask_howcanhelp
* inform{"location": "london", "people": "two", "price": "moderate"}
    - utter_on_it
    - utter_ask_cuisine
* inform{"cuisine": "spanish"}
    - utter_ask_moreupdates
* inform{"cuisine": "british"}
    - utter_ask_moreupdates
* deny
    - utter_ack_dosearch
    - action_search_restaurants
    - action_suggest
* affirm
    - utter_ack_makereservation
* thankyou
    - utter_goodbye

A few points to note about the example above:

  • > check_*: a checkpoint, used to modularize and simplify training data, i.e. to reuse parts of stories.
  • OR: handles two or more possible intents at the same point in a story. It keeps stories shorter, but training costs the same as training that many separate stories, so heavy use is not recommended.
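To make the symbols concrete, here is a minimal sketch of a reader for this story format. It is a hypothetical helper (`parse_stories`), not Rasa Core's own story reader, and it deliberately ignores slot payloads such as `inform{...}`, treating the whole text after `*` as the intent.

```python
from typing import Dict, List


def parse_stories(markdown: str) -> List[Dict]:
    """Split story markdown into titled stories with ordered steps."""
    stories: List[Dict] = []
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        if line.startswith("##"):
            stories.append({"title": line[2:].strip(), "steps": []})
        elif not line or not stories:
            continue
        elif line.startswith("*"):
            # "a OR b" lists alternative intents for the same step.
            intents = [part.strip() for part in line[1:].split(" OR ")]
            stories[-1]["steps"].append(("intent", intents))
        elif line.startswith("-"):
            stories[-1]["steps"].append(("action", line[1:].strip()))
        elif line.startswith(">"):
            stories[-1]["steps"].append(("checkpoint", line[1:].strip()))
    return stories


SAMPLE = """
## simple_story_with_multiple_turns
* affirm OR thank_you
    - utter_default
* goodbye
    - utter_goodbye
> check_goodbye
"""

for story in parse_stories(SAMPLE):
    print(story["title"], story["steps"])
```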

Visualizing stories

Rasa Core provides the rasa_core.visualize module for visualizing the story flow.

Command:

python -m rasa_core.visualize -d domain.yml -s data/stories.md -o graph.html -c config.yml

Parameters:

  • -m: the module to run
  • -d: path to the domain.yml file
  • -s: path to the stories
  • -o: name of the output file
  • -c: the policy configuration file

This produces a graph.html file in the project root directory, which you can open in a browser.

For details, refer to the source code in rasa/core/visualize.py.

2. Domain

domain.yml defines everything the conversational bot needs to know; it is effectively the bot's brain, specifying intents, entities, slots, and actions. Of these, intents and entities must match those annotated in the Rasa NLU training samples, slots correspond to the annotated entities, and actions are what the bot does in response to user requests. In addition, the templates section of domain.yml defines template messages for utter_ actions so that the bot can reply automatically when those actions run.

item        description
intents     intents
entities    entity information
slots       slots: the information you want to track during a conversation
actions     actions the bot can take
templates   template sentences for replies

intents

Use the - symbol to list each intent

intents:
  - greet
  - goodbye
  - search_weather

entities

Entities: all the entities annotated in the training samples

entities:
  - city

slots

Slots are the bot's memory. They act as a key-value store that holds information provided by the user (for example, their home town) as well as information gathered about the outside world (for example, the result of a database query). For a weather query, for example, the bot must know the location and the date before it can run the query. So in domain.yml you need to define two slots in the slots section, city and datatime, plus matches to store the result of the last query. For example:

slots:
  city:
    type: text
    initial_value: "Beijing"
  datatime:
    type: text
    initial_value: "Tomorrow"
  matches:
    type: unfeaturized
    initial_value: "none"

Slot Types:

  • Text Slot: text
  • Boolean Slot: a Boolean value
  • Categorical Slot: accepts one of the enumerated values
    slots:
       risk_level:
          type: categorical
          values:
          - low
          - medium
          - high
  • Float Slot: a floating-point value
    slots:
       temperature:
          type: float
          min_value: -100.0
          max_value: 100.0

    Defaults: max_value = 1.0, min_value = 0.0. Once max_value and min_value are set, values greater than max_value are treated as max_value and values less than min_value are treated as min_value.

  • List Slot: list-type data; the length of the list does not affect the conversation
  • Unfeaturized Slot: Stores data that does not affect the session flow.
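The clamping behaviour of a float slot can be sketched in a few lines. This is only an illustration of the rule stated in the list above, with a hypothetical helper name (`clamp_float_slot`); it is not Rasa Core's own slot code.

```python
def clamp_float_slot(value: float, min_value: float = 0.0, max_value: float = 1.0) -> float:
    # Values outside [min_value, max_value] are pulled back to the nearest bound.
    return max(min_value, min(max_value, value))


print(clamp_float_slot(150.0, min_value=-100.0, max_value=100.0))
print(clamp_float_slot(0.25))
```

With the defaults, anything above 1.0 or below 0.0 is cut off, which is why a slot such as temperature needs explicit bounds.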

If the value itself matters, use a categorical or bool slot; float and list slots are also available. If you only want to store some data without it affecting the conversation flow, use an unfeaturized slot. type is the type of data stored in the slot, and initial_value is the slot's optional initial value.

slots:
  name:
    type: text
    initial_value: "human"
  matches:
    type: unfeaturized

actions

Once Rasa NLU has identified the intent of the user's input, the Rasa Core dialogue management module responds with an action. Rasa Core supports three kinds of actions:

  • default actions: a set of built-in actions that can be used without being defined
    • action_listen: waits for the next user input
    • action_restart: resets the conversation state
    • action_default_fallback: executed by default when Rasa Core's confidence is below the set threshold.
  • utter actions: actions whose names begin with utter_ and that only send a message to the user as a reply. They are simple to define: list a name beginning with utter_ under the actions: field in domain.yml. The actual reply content is defined in the templates section. An action without the utter_ prefix is treated as a custom action.
    actions:
      - utter_greet
      - utter_cheer_up
  • custom actions: custom actions let the developer run arbitrary code and report the result back to the user; they are the key to multi-turn conversations. They must first be declared under actions in domain.yml and then implemented in a dedicated web server, whose URL is specified in the endpoint.yml file. A small Python SDK, rasa_core_sdk, makes it easier to write custom actions. More on that later.
    actions:
      - action_search_weather
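The naming convention described above can be summarized in a small sketch. The function `action_kind` is a hypothetical helper written for illustration; it mirrors the convention in the text and is not how Rasa Core itself is implemented.

```python
DEFAULT_ACTIONS = {"action_listen", "action_restart", "action_default_fallback"}


def action_kind(name: str) -> str:
    # Mirrors the naming convention described above; not Rasa Core's own code.
    if name in DEFAULT_ACTIONS:
        return "default"
    if name.startswith("utter_"):
        return "utter"
    return "custom"


for name in ["action_listen", "utter_greet", "action_search_weather"]:
    print(name, "->", action_kind(name))
```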

templates

This section defines the reply content of each utter action; each utter action can define several reply messages. When the user triggers an intent, for example by saying "Hello!", and the utter_greet action fires, Rasa Core automatically picks one message from that action's templates and returns it to the user.

templates:
  utter_greet:
    - text: "Hello! May I help you?"
    - text: "Hello! Please state the specific business you want, for example, 'id number'."
    - text: "Hello!"

utter_default is the template used by Rasa Core's default action_default_fallback. When the confidence of the intent recognized by Rasa NLU is below the set threshold, a template from utter_default is sent by default.

Besides replying with a simple text message, Rasa Core also supports adding buttons and images after the text, and reading slot values (which render as None if the slot has not been filled). Here is an example:

  utter_introduce_self:
    - text: "Hello! I am your AI robot."
      image: "https://i.imgur.com/sayhello.jpg"
  utter_introduce_selfcando:
    - text: "I can check the weather for you."
      buttons:
        - title: "Good"
          payload: "ok"
        - title: "Can't"
          payload: "no"
  utter_ask_city:
    - text: "Which city's weather would you like for {datetime}?"
  utter_ask_datetime:
    - text: "Which day's weather would you like to check for {city}?"
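How a template is chosen and filled can be sketched as follows. This is a simplified illustration under assumed names (`render_template`); it shows random selection among the candidate replies and slot substitution, with an unfilled slot rendering as None, and it is not Rasa Core's actual template code.

```python
import random
import re
from typing import Dict, List, Optional


def render_template(templates: List[Dict[str, str]],
                    slots: Dict[str, Optional[str]]) -> str:
    # Pick one of the candidate replies at random, then fill in slot values.
    text = random.choice(templates)["text"]

    def fill(match) -> str:
        # An unfilled slot renders as None.
        return str(slots.get(match.group(1)))

    return re.sub(r"\{(\w+)\}", fill, text)


templates = [{"text": "What day would you like to check {city}?"}]
print(render_template(templates, {"city": "Beijing"}))
print(render_template(templates, {}))
```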

An example:

intents:
  - greet
  - goodbye
  - affirm
  - deny
  - search_weather
 
slots:
  city:
    type: text
  matches:
    type: unfeaturized

entities:
  - city

actions:
  - utter_greet
  - utter_cheer_up
  - utter_did_that_help
  - utter_happy
  - utter_goodbye

templates:
  utter_greet:
  - text: "Hey! How are you?"

  utter_cheer_up:
  - text: "Here is something to cheer you up:"
    image: "https://i.imgur.com/nGF1K8f.jpg"

  utter_did_that_help:
  - text: "Did that help you?"

  utter_happy:
  - text: "Great carry on!"

  utter_goodbye:
  - text: "Bye"
  
  utter_default:
  - text: "Little X is still learning, please try saying something else."
  - text: "Little X is still learning; you can try again after I've been upgraded ~"
  - text: "Sorry, Little X hasn't learned the feature you asked about yet ~"

3. Train your conversational model

Once domain.yml and stories.md are ready, you can train the model. The model's input is the conversation history, and the label is the next action to take. The model is essentially a multi-class classifier over num_actions classes.
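The idea of "history in, next action out" can be illustrated with a toy example. This is a deliberate simplification under assumed names (`training_pairs`, a story flattened into strings): it ignores featurization and built-in actions such as action_listen, and it is not how Rasa Core actually generates training data.

```python
from typing import List, Tuple

# A story flattened into alternating events; intents are marked with "*".
story = ["* greet", "utter_greet", "* search_weather", "utter_ask_city",
         "* goodbye", "utter_goodbye"]


def training_pairs(events: List[str]) -> List[Tuple[Tuple[str, ...], str]]:
    # For every bot action, the input is everything that happened before it
    # and the label is the action itself.
    pairs = []
    for index, event in enumerate(events):
        if not event.startswith("*"):
            pairs.append((tuple(events[:index]), event))
    return pairs


pairs = training_pairs(story)
actions = sorted({label for _, label in pairs})
print(len(pairs), "examples over", len(actions), "action classes")
```

Each pair is one classification example: the classifier sees the history and must pick the right label out of the known actions.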

The training command is as follows:

python -m rasa_core.train -d domain.yml -s stories.md -o models/chat1

Parameters:

  • -d or --domain: path to the domain.yml file
  • -s or --stories: path to the stories.md file. Stories can be kept in a single md file or split across several md files in one directory
  • -o or --out: output path of the dialogue model, where the trained model files are saved
  • -c or --config: the policy specification file

Examples of data required for training:

  • domain.yml file:
    intents:
      - greet
      - goodbye
      - search_weather
    entities:
      - city
    actions:
      - utter_greet
      - utter_goodbye
      - utter_ask_city
    templates:
      utter_greet:
      - text: "Hello?"
      - text: "Hello again."

      utter_goodbye:
      - text: "Goodbye"
      - text: "See you next time."

      utter_ask_city:
      - text: "Where would you like to inquire about the weather?"
  • stories.md file:
    ## search weather
    * greet
      - utter_greet
    * search_weather{"datatime": "tomorrow"}
      - utter_ask_city
    * goodbye
      - utter_goodbye

Training network structure:

Generate model file

Testing the dialogue model

After training, we have a dialogue model, so let’s test it out.

The test command is as follows:

python -m rasa_core.run -d models/chat1

Parameter -d: Specifies the model path

Note: at this point only the rasa_core dialogue model is being tested; the rasa_nlu model has not been added, so intent recognition is not yet possible and the bot can only reply based on known (typed-in) intents. When testing, therefore, manually enter an intent defined in domain.yml, prefixed with the / symbol. For example, to enter the greet intent: /greet
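To illustrate what typing an intent directly means, the sketch below splits such an input into an intent name and optional entities. The helper `parse_direct_intent` is hypothetical, and the form with a JSON payload after the intent name is an assumption modelled on the story syntax shown earlier (for example `inform{"cuisine": "spanish"}`), not something this walkthrough demonstrates.

```python
import json
from typing import Dict, Tuple


def parse_direct_intent(message: str) -> Tuple[str, Dict]:
    """Split '/intent{"slot": "value"}' into the intent name and its entities."""
    if not message.startswith("/"):
        raise ValueError("direct intents must start with '/'")
    body = message[1:]
    brace = body.find("{")
    if brace == -1:
        # Plain form such as "/greet": no entities attached.
        return body.strip(), {}
    return body[:brace].strip(), json.loads(body[brace:])


print(parse_direct_intent("/greet"))
print(parse_direct_intent('/search_weather{"city": "Beijing"}'))
```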

Test the chatbot

Through the previous steps, we have trained the Rasa NLU intent recognition model and the Rasa Core dialogue model. Next, test the two together.

  • Input: text to be tested
  • Output: the bot's response
  • Intermediate steps: intent recognition, entity recognition, and conversation flow

Run the following command:

python -m rasa_core.run -d models/chat1 -u models/nlu/model_20190820-105546

Parameters:

  • -d: the dialogue model directory (the model path produced by Rasa Core training)
  • -u: the model path produced by Rasa NLU training
  • --port: the port on which the Rasa Core web application runs
  • --credentials: credentials for the input channels
  • --endpoints: the URLs Rasa Core uses to reach other web servers, such as the NLU web server
  • -o: output path of the log file
  • --debug: print debugging information
