Overview

A dialogue system (DS for short) is a system that enables people and machines to communicate with each other in natural language. Beyond answering users’ questions accurately and concisely, a DS places more emphasis on interacting with people, understanding their intentions, perceiving the mood of the conversation, and producing diverse and personalized replies.

Classification

Interaction mode

By interaction mode, customer service robots can be divided into text customer service robots and voice customer service robots.

  • Text customer service robot

It mainly relies on natural language understanding (NLU), dialogue management (DM), natural language generation (NLG) and related technologies.

  • Voice customer service robot

In addition to NLU, DM, NLG and related technologies, it also uses automatic speech recognition (ASR) and text-to-speech (TTS).

Scenario and function types

Question-answering robot

A question-answering robot relies mainly on a large knowledge base and gives specific replies to the questions users raise. Its replies are highly accurate, but it is limited to single-round question-and-answer interaction and does not process context information. At present it is mostly used in customer service.

Question answering technology based on retrieval matching

Question answering based on retrieval matching uses semantic matching to find, in a given set of stored questions, the one most similar to the user’s question, and returns the answer associated with it. The method is simple and effective, and it usually performs well in business scenarios where the questions and answers are relatively fixed.
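
The matching step can be sketched as follows. This is a minimal illustration, not a production design: it assumes bag-of-words cosine similarity as a stand-in for real semantic matching, and the FAQ entries, the `answer` function and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ data; a real system would load this from a knowledge base.
FAQ = [
    ("How do I reset my password?",
     "Open Settings, choose Security, then select Reset password."),
    ("What are your opening hours?",
     "We are open from 9:00 to 18:00, Monday to Friday."),
    ("How can I track my order?",
     "Use the tracking link in your confirmation email."),
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def answer(query, faq=FAQ, threshold=0.3):
    """Return the answer of the most similar stored question,
    or None if no stored question is similar enough (assumed cutoff)."""
    query_vec = Counter(tokenize(query))
    best_score, best_answer = 0.0, None
    for question, reply in faq:
        score = cosine(query_vec, Counter(tokenize(question)))
        if score > best_score:
            best_score, best_answer = score, reply
    return best_answer if best_score >= threshold else None
```

Calling `answer("I forgot my password, how do I reset it?")` selects the password entry, while an unrelated query such as `answer("Tell me a joke")` returns `None`. In practice the word-count vectors would typically be replaced by sentence embeddings, but the find-the-nearest-question structure stays the same.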

Question answering technology based on reading comprehension

Question answering based on document understanding (DU) aims to answer questions about the content of a given document. Specifically, given one or more documents, the technology provides correct answers to document-related questions that users raise in conversation. It is implemented mainly through several steps: question understanding, paragraph ranking and machine reading comprehension.

Machine reading comprehension, which enables a machine to read natural language text and then answer questions about it, is the core of reading-comprehension-based question answering and the key to improving its effectiveness.

Task robot

A task robot meets the user’s specific task needs through multiple rounds of dialogue. It works out the user’s intention mainly through dialogue state tracking, asking for missing slot values and clarifying ambiguities, and then replies or calls an API to fulfill the task, such as booking tickets or ordering food.

Implementation technology

At present, task-based dialogue systems are implemented in two main ways: one is based on a pipeline, the other is end-to-end.

Pipeline

This is the classical structure of a dialogue system, also called the regular dialogue system.

The whole system has four core modules connected in series as a pipeline: NLU, DST, DPL and NLG. Each module can be designed independently, and the modules cooperate to complete the task-based dialogue.

  1. Natural language understanding (NLU): understands the semantics of the utterances produced during human-computer interaction.
  2. Dialogue state tracker (DST): manages the state of each round of the conversation, including the historical state record and the current state output.
  3. Dialogue policy (DPL): decides the next system response based on the current dialogue state.
  4. Natural language generation (NLG): translates the semantic output of the dialogue policy into natural language.
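
The four modules above can be sketched as four small functions chained in series. This is only an illustrative sketch under stated assumptions: the ticket-booking slot schema, the keyword lists and every function name (`nlu`, `dst`, `dpl`, `nlg`, `respond`) are hypothetical, and keyword lookup stands in for a trained NLU model.

```python
import re

# Hypothetical slot schema and vocabularies for a ticket-booking task.
REQUIRED_SLOTS = ("destination", "date")
CITIES = ("beijing", "shanghai", "hangzhou")
DAYS = ("today", "tomorrow")

def nlu(utterance):
    """NLU: extract slot values from one user utterance by keyword lookup."""
    slots = {}
    for token in re.findall(r"[a-z]+", utterance.lower()):
        if token in CITIES:
            slots["destination"] = token.capitalize()
        elif token in DAYS:
            slots["date"] = token
    return slots

def dst(state, slots):
    """DST: merge this turn's slots into the accumulated dialogue state."""
    new_state = dict(state)
    new_state.update(slots)
    return new_state

def dpl(state):
    """DPL: request the first missing slot, or book once all slots are filled."""
    for slot in REQUIRED_SLOTS:
        if slot not in state:
            return ("request", slot)
    return ("book", None)

def nlg(action, state):
    """NLG: turn the chosen system action into a natural-language reply."""
    act, slot = action
    if act == "request":
        prompts = {
            "destination": "Where would you like to go?",
            "date": "Which day would you like to travel?",
        }
        return prompts[slot]
    return f"Booking a ticket to {state['destination']} for {state['date']}."

def respond(state, utterance):
    """Run one dialogue turn through NLU, DST, DPL and NLG in series."""
    state = dst(state, nlu(utterance))
    return state, nlg(dpl(state), state)
```

Starting from an empty state, `respond({}, "I want to book a ticket to Shanghai")` fills the destination and asks for the travel day; passing the returned state back in with `"Tomorrow please"` fills the date and produces the booking confirmation. Because each stage only sees the previous stage's output, any one of them can be swapped for a learned model without touching the others.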

End-to-end

An end-to-end conversation system relies mainly on deep learning. It is a data-driven approach that trains a model on massive amounts of data to learn the overall mapping from the user’s natural language input to the system’s natural language output, without modeling the intermediate steps.

In current industrial practice, although the end-to-end method is highly flexible and extensible, it places high demands on the quality and quantity of data and suffers from problems such as uncontrollability and lack of interpretability. Therefore, most industrial dialogue systems are still built on the pipeline approach.

Business scenarios

The business scenarios of task-based human-machine dialogue systems can be divided into two types: information retrieval and service satisfaction.

Information retrieval class

An information retrieval dialogue system receives instructions that users issue in natural language. After dialogue understanding, dialogue management, instruction execution and language generation, it looks up the information the user needs and returns it in natural language. Intelligent customer service is an example.

Service satisfaction class

The process of a service satisfaction dialogue system is roughly the same as that of an information retrieval one. The difference is that a service satisfaction dialogue system usually does not need to give feedback to users in natural language; it only carries out the user’s instructions through dialogue understanding, dialogue management and command execution. Tmall Genie is an example.

Chat robot

The interaction between the robot and the user is relatively open: the user has no clear goal, and there is no standard answer for the robot’s response. The accuracy of the reply is not a requirement; the reply mainly aims to be interesting and personalized so as to meet users’ emotional needs.

The technical framework for chat-oriented conversation takes two main forms: chat systems based on retrieval technology and chat systems based on end-to-end technology.

Chat system based on retrieval technology

A chat system based on retrieval technology usually consists of a retrieval model, a matching model and a ranking model. For a given user utterance, the retrieval model first searches the dialogue corpus for similar contexts and returns multiple candidate question-answer pairs. The matching model then judges whether the answer in each candidate pair is a reasonable reply to the current user utterance. Finally, the ranking model reorders the remaining candidates according to the needs of the application scenario and gives the final reply. Such a system has advantages in the fluency and wit of its responses because it directly reuses human replies. However, it lacks freedom and cannot generate sentences that have never existed, so it is considered unable to achieve ideal conversational intelligence.
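
The three stages can be sketched as follows. This is a toy illustration under stated assumptions: Jaccard token overlap stands in for both the retrieval and matching models, the ranking rule (prefer replies that ask a follow-up question) is an invented example of scenario-specific ordering, and the corpus and all function names are hypothetical.

```python
import re

# Hypothetical corpus of (context, reply) pairs taken from human conversations.
CORPUS = [
    ("i love hiking on weekends", "Nice, which trail do you like best?"),
    ("i love cooking pasta", "Yum, what sauce do you usually make?"),
    ("the weather is terrible today", "Sounds like a good day to stay in."),
    ("i love hiking", "Me too!"),
]

def tokens(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Jaccard overlap between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def retrieve(query, corpus, top_k=3):
    """Retrieval model: keep the top_k pairs whose context best overlaps the query."""
    q = tokens(query)
    scored = [(jaccard(q, tokens(ctx)), ctx, reply) for ctx, reply in corpus]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]

def match(candidates, min_score=0.2):
    """Matching model: drop candidates too dissimilar to be a reasonable reply."""
    return [c for c in candidates if c[0] >= min_score]

def rank(candidates):
    """Ranking model: assumed scenario rule that prefers follow-up questions,
    breaking ties by similarity score."""
    return sorted(candidates,
                  key=lambda c: (c[2].endswith("?"), c[0]),
                  reverse=True)

def chat(query, corpus=CORPUS):
    """Run retrieval, matching and ranking; return the top reply or None."""
    ranked = rank(match(retrieve(query, corpus)))
    return ranked[0][2] if ranked else None
```

For the input `"I love hiking in the mountains"`, the closest context is "i love hiking", but the ranking rule promotes the follow-up question from "i love hiking on weekends", which shows how the final stage can override raw similarity. Every reply comes verbatim from the corpus, which is exactly the limitation described above.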

Chat system based on end-to-end modeling technology

A chat system based on end-to-end modeling treats conversation as a translation problem: the preceding part of the conversation is the source language, the following part is the target language, and a neural sequence generation model produces the response directly. For a given user utterance, the end-to-end model first uses an encoder network to encode the utterance into an intermediate semantic representation, and then uses a decoder network to decode that representation step by step into a dialogue response. The end-to-end model overcomes the retrieval model’s inability to generate new replies and can produce replies that have never appeared in the dialogue corpus. However, because it lacks mechanisms to control how the user utterance is handled and how the reply is generated, it is difficult to ensure that the generated reply is informative and consistent with the context.

The business and social value of intelligent conversation systems

Industry level (B-side) value increase

  • Financial industry: customer service scenarios such as answering inquiries, outbound calls and work order management.
  • Retail: providing personalized product suggestions for users and proactively recommending related products.
  • Manufacturing: embedded intelligent dialogue systems let users control electrical equipment through conversational interaction.
  • Government affairs: intelligent terminals can quickly and accurately guide the public through multiple rounds of dialogue to find the right procedure and to submit and review the required materials, greatly reducing waiting time.

Consumer (C-side) experience upgrade

  • Enrich digital life: shopping, cooking, food delivery, travel, housekeeping, games, watching movies, fitness and other activities can all be assisted by intelligent conversation products such as voice assistants.
  • Assist children’s education: in early education, companion robots guide children to read. In K12 education, tutor robots can help improve learning efficiency and make learning more fun.
  • Improve social welfare: intelligent voice wearable devices can monitor the physical state of the elderly at any time and answer phone calls by voice command so that help arrives in time. The “intelligent voice nursing bed” lets patients control the bed themselves through dialogue.
  • Serve special groups: for people with visual or speech impairments, the intelligent voice customer service and voice assistants now on the market make handling personal affairs more convenient.