About the author

Haiqing Chen, a senior technical expert in Alibaba’s Intelligent Service Department, has worked on and researched intelligent human-computer interaction at Alibaba for 8 years and led the team that built Alibaba’s intelligent interactive robot system. This article is based on Chen Haiqing’s talk at the “Ctrip Technology Salon: Human-Machine Semantic Interaction AI”.

* The video is provided by “IT Guru Talk” and runs about 43 minutes. Please watch it over WiFi. *

1. Introduction to the field of intelligent human-computer interaction

1.1 Industry classification and current application status

Today, as artificial intelligence continues to develop worldwide, Internet companies including Google, Facebook, Microsoft, Amazon and Apple have launched their own intelligent personal assistants and bot platforms one after another.

Through anthropomorphic interaction, intelligent human-computer interaction is playing an increasingly valuable role in intelligent customer service, task assistants, smart homes, intelligent hardware, interactive chat and other fields. As a result, companies treat intelligent chatbots as a future entry point (gateway) for users. As the market has developed further, chatbots can now be divided into customer service, entertainment, assistant, education, service and other types of products and services.

Figure 1 shows some of the chatbots.

Figure 1: A summary of some chatbots

1.2 Status of Ali Xiaomi in the field of e-commerce

In July 2015, Alibaba launched its own intelligent personal assistant, Ali Xiaomi, an intelligent human-computer interaction product centered on service, shopping guidance and task assistance in the field of e-commerce. By combining e-commerce with intelligent human-computer interaction, it changes the traditional service model and improves the user experience.

During last year’s Singles’ Day, Ali Xiaomi’s overall intelligent service volume reached 6.43 million, with an intelligent resolution rate of 95%. Intelligent service also accounted for 95% of the total service volume (total service volume = intelligent service volume + online manual service volume + telephone service volume), making it the absolute main force of service during Singles’ Day.

2. Technical practice of Alibaba in the field of e-commerce

2.1 Overview of Ali Xiaomi Technology

An intelligent human-computer interaction system is commonly known as a chatbot system or bot system. Figure 2 shows the flow of human-computer interaction:

Figure 2: The flow of human-computer interaction

The core is NLU (natural language understanding): the user’s input is first understood, then processed by the dialogue system, and finally the answer is produced through natural language generation. Understanding is difficult because natural language is ambiguous. For example, the word “apple” has at least two meanings: a fruit and a well-known technology company.

2.1.1 Layered technical architecture of intention and matching

In Ali Xiaomi’s e-commerce scenarios there are customer service, assistant and chat robots. Because these robots have different goals, they cannot all be handled by the same technical framework. We therefore first abstracted the architecture by layer and by scenario, and then designed each layer and scenario with the machine learning method that suits it. First, we divide the dialogue system into two layers:

1. Intention recognition layer: identifies the real intention behind the utterance, classifies the intention and extracts its attributes. The intention determines the subsequent domain recognition process. The intention layer is therefore a process of clarifying and reasoning about intentions by combining the contextual data model and the domain data model.

2. Question and answer matching layer: matches and identifies the question and generates the answer. In Ali Xiaomi’s dialogue system, we distinguish three typical question types by business scenario and use a different matching process and method for each:

  1. Question and answer: for example, “What if I forget my password?” → matching based on the knowledge graph plus a retrieval model

  2. Task-based: for example, “I’d like to book a flight from Hangzhou to Beijing for tomorrow” → matching based on intention decision plus slot filling, with deep reinforcement learning

  3. Conversational: for example, “I’m in a bad mood” → a combination of a retrieval model and deep learning

Figure 3 shows Ali Xiaomi’s layered technical architecture of intention and matching.

Figure 3: Layered technical architecture based on intention and matching
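As a rough illustration of the two-layer design, the sketch below first runs a toy intention step and then dispatches to one of three matchers. The keyword rules, the labels qa/task/chat and all function names are hypothetical stand-ins chosen for this example; they are not the production system.

```python
from typing import Callable, Dict, Tuple


def recognize_intention(utterance: str) -> str:
    """Toy stand-in for the intention recognition layer (keyword rules, assumed)."""
    text = utterance.lower()
    if "book" in text or "flight" in text:
        return "task"
    if "password" in text or text.endswith("?"):
        return "qa"
    return "chat"


# Toy stand-ins for the matching layer; each only names the method the article lists.
def match_qa(utterance: str) -> str:
    return "knowledge graph + retrieval model"


def match_task(utterance: str) -> str:
    return "intention decision + slot filling + deep reinforcement learning"


def match_chat(utterance: str) -> str:
    return "retrieval model + deep learning"


MATCHERS: Dict[str, Callable[[str], str]] = {
    "qa": match_qa,
    "task": match_task,
    "chat": match_chat,
}


def route(utterance: str) -> Tuple[str, str]:
    """Layer 1 picks the question type; layer 2 dispatches to its matcher."""
    kind = recognize_intention(utterance)
    return kind, MATCHERS[kind](utterance)
```

Keeping the matchers behind a single dispatch table is one simple way to let each scenario evolve independently, which is the motivation the article gives for layering.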

2.1.2 Introduction to intention recognition: a deep learning model that incorporates user behavior

Intention recognition is usually abstracted as a classification problem in machine learning. In Ali Xiaomi’s technical scheme, in addition to traditional text features, we take into account that the semantic intention expressed in a dialogue is often incomplete. We therefore also add the user’s real-time and offline behavior and features of the user themselves, build a model with deep learning, and predict the user’s intention, as shown in Figure 4:

Figure 4: Classification of deep learning intentions based on user behavior

In the deep learning classification model we have two options: a multi-class model and a binary classification model. The multi-class model has the advantage of fast performance, but the whole model must be retrained whenever the set of domains needs to be extended. The advantage of the binary model is that the existing models can be reused when a new domain scenario is added, which makes the platform extensible. Its disadvantage is also obvious: the overall performance is not as good as the multi-class model. The choice of model therefore depends on the specific scenario and the data volume.

The overall way Ali Xiaomi uses deep learning for intention classification is to embed the behavior factors and the text features separately, combine the resulting vectors, and then perform multi-class or binary classification. The text features can be vectorized with the traditional bag-of-words method or with a deep learning method, as shown in Figure 5:

Figure 5: Network structure of deep learning intention classification based on user behavior
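The idea of embedding text and behavior separately and then classifying on the combined vector can be sketched in a few lines. This is a minimal sketch under stated assumptions: the embedding tables, the weights and the two intention labels are invented toy values (in practice they would be learned), and the vectors are combined by concatenation, which is only one possible reading of the combination step.

```python
import math
from typing import Dict, List

# Hypothetical toy embedding tables; real ones would be learned from data.
TEXT_EMB: Dict[str, List[float]] = {
    "refund": [1.0, 0.0],
    "order": [0.5, 0.5],
    "ship": [0.0, 1.0],
}
BEHAVIOR_EMB: Dict[str, List[float]] = {
    "viewed_refund_page": [1.0, 0.0],
    "viewed_logistics_page": [0.0, 1.0],
}
INTENTIONS = ["refund", "logistics"]
# One weight row per intention over the concatenated [text ; behavior] vector.
WEIGHTS = [
    [2.0, -1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0, 2.0],
]


def mean_embedding(keys: List[str], table: Dict[str, List[float]], dim: int = 2) -> List[float]:
    """Average the embeddings of the known keys; zeros if none are known."""
    vecs = [table[k] for k in keys if k in table]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_intention(tokens: List[str], behaviors: List[str]) -> Dict[str, float]:
    """Embed text and behavior separately, concatenate, then classify."""
    features = mean_embedding(tokens, TEXT_EMB) + mean_embedding(behaviors, BEHAVIOR_EMB)
    logits = [sum(w * x for w, x in zip(row, features)) for row in WEIGHTS]
    return dict(zip(INTENTIONS, softmax(logits)))
```

In this toy setup the single word "order" is ambiguous on its own, and the behavior feature is what tips the prediction toward one intention or the other, which mirrors the motivation for adding user behavior when the text is incomplete.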

Overview of matching models: the three matching models used in the industry

At present, mainstream intelligent matching techniques fall into the following three categories:

1. Rule-based matching

2. Retrieval-model-based matching

3. Deep-learning-model-based matching

In Ali Xiaomi, we build the dialogue systems for the different scenarios (Q&A, task-based and conversational) on prototypes of all three: template (rule) matching, retrieval models and deep learning models.

2.2 Technical practice of Ali Xiaomi in three fields

2.2.1 Intelligent shopping guide: a shopping guide based on reinforcement learning

The intelligent shopping guide mainly supports multiple rounds of interaction with the user in order to keep understanding and clarifying the user’s intention. On this basis, the interaction process of the shopping guide is optimized with deep reinforcement learning. Figure 6 shows the technical architecture of the intelligent shopping guide.

Figure 6: Architecture diagram of intelligent shopping guide

There are two central problems here:

A) Understanding the user’s intention over multiple rounds of interaction.

B) Optimizing the ranking results and the interaction process according to the recognized intention.

The following mainly introduces intention understanding in the shopping guide and the optimization of the interaction strategy with deep reinforcement learning.

2.2.1.1 Intention understanding and intention management of intelligent shopping guide

Intention understanding in the intelligent shopping guide is mainly about identifying the goods that users want to buy and the corresponding attributes of those goods. Compared with traditional intention understanding, it brings several new challenges.

First, users tend to express themselves in short phrases. To identify the user’s intention it is therefore necessary to combine multiple rounds of conversation and to determine the boundaries of each intention.

Second, over multiple rounds of interaction the user keeps adding or modifying sub-intentions, so a set of currently recognized intentions has to be maintained.

Third, commodity intentions can be mutually exclusive, similar, or in a hypernym-hyponym (parent-child category) relation. Different relations call for different intention management.

Fourth, attribute intentions also have categories and mutual exclusion among them.

To handle phrase-style expression, we maintain an intention heap through category management and attribute management, which better handles short phrases, intention boundaries, and the logic of switching and modifying specific intentions. At the same time, to cope with the large commodity database, we combine the knowledge graph with a semantic index to make commodity identification very efficient. Category management and attribute management are introduced in turn below.

Category Management based on knowledge graph and semantic index

Category management in the intelligent shopping guide is divided into category identification and category relation calculation. Figure 7 shows the category management architecture.

Figure 7: Category Management architecture diagram

Category identification:

Two schemes are used: a recognition scheme based on the knowledge graph, and a discriminative model based on a semantic index and DSSM.

A) Recognition scheme based on the commodity knowledge graph:

Category identification of goods relies on the rich structure of the knowledge graph. This is the basis of our commodity identification.

B) Commodity identification scheme based on semantic index and DSSM:

The knowledge graph scheme has high accuracy, but it cannot cover all cases. We therefore propose a commodity identification scheme based on a semantic index and DSSM.

Figure 8: Commodity identification scheme based on semantic indexing and DSSM

Construction of semantic index:

Semantic indexes are generally built with ontology-based or LSI-based approaches. We build ours from a combination of search click data and word vectors, in the following steps:

Step 1: Use search click behavior to extract, for each segmented word, its candidate categories.

Step 2: Using word vectors, calculate the similarity between each segmented word and its candidate categories, and re-rank the index accordingly.
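The two steps can be sketched as follows. This is an illustrative sketch only: the click log format, the tiny hand-written word vectors and the function names are assumptions made for the example, and cosine similarity is used as the similarity measure because the article does not name one.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_semantic_index(
    click_log: Iterable[Tuple[List[str], str]],
    vectors: Dict[str, List[float]],
) -> Dict[str, List[str]]:
    """click_log holds (segmented query words, clicked category) pairs."""
    # Step 1: collect candidate categories for every word from click behavior.
    candidates: Dict[str, Set[str]] = defaultdict(set)
    for words, category in click_log:
        for word in words:
            candidates[word].add(category)

    # Step 2: re-rank each word's candidates by word-vector similarity.
    index: Dict[str, List[str]] = {}
    for word, cats in candidates.items():
        scored = [
            (cosine(vectors[word], vectors[c]), c)
            for c in cats
            if word in vectors and c in vectors
        ]
        index[word] = [c for _, c in sorted(scored, key=lambda p: (-p[0], p[1]))]
    return index


# Hypothetical toy data.
TOY_VECTORS = {"sneakers": [1.0, 0.1], "shoes": [1.0, 0.0], "socks": [0.6, 0.8]}
TOY_LOG = [(["sneakers"], "shoes"), (["sneakers"], "socks")]
TOY_INDEX = build_semantic_index(TOY_LOG, TOY_VECTORS)
```

With the toy vectors above, "shoes" ranks ahead of "socks" for the word "sneakers" because its vector points in almost the same direction.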

Commodity identification based on DSSM:

DSSM is a supervised deep semantic matching network for matching a query against a document, which copes better with the lexical gap and captures the intrinsic semantics of sentences. Based on DSSM, we built a similarity model between the query and the candidate categories. The model’s accuracy is about 92% on the test set.

Figure 9: Network structure diagram of DSSM model

Sample construction: positive training samples are built from search queries and the categories clicked for them in the search log. For negative samples, the query and the clicked categories are used as seeds to retrieve similar categories, and the retrieved categories that are not among the positives are taken as negatives. The ratio of positive to negative samples is 1:1.
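A minimal sketch of this sample construction is given below. The lookup table of similar categories stands in for the retrieval step and is an assumption, as are the function name and the rule of taking the first eligible negative (a real pipeline would more likely sample at random). One negative is emitted per positive to keep the 1:1 ratio.

```python
from typing import Dict, List, Set, Tuple

Sample = Tuple[str, str, int]  # (query, category, label)


def build_samples(
    click_pairs: List[Tuple[str, str]],
    similar_categories: Dict[str, List[str]],
) -> List[Sample]:
    """Build positive and negative (query, category) samples from a click log."""
    # All categories ever clicked for a query count as positives for it.
    positives: Dict[str, Set[str]] = {}
    for query, category in click_pairs:
        positives.setdefault(query, set()).add(category)

    samples: List[Sample] = []
    for query, category in click_pairs:
        # Similar categories that were never clicked for this query.
        negatives = [
            c for c in similar_categories.get(category, [])
            if c not in positives[query]
        ]
        if not negatives:
            continue  # skip the pair so the 1:1 ratio is preserved
        samples.append((query, category, 1))
        samples.append((query, negatives[0], 0))
    return samples


# Hypothetical toy data.
TOY_CLICKS = [("running shoes", "sneakers"), ("running shoes", "trainers")]
TOY_SIMILAR = {
    "sneakers": ["trainers", "sandals"],
    "trainers": ["sneakers", "boots"],
}
TOY_SAMPLES = build_samples(TOY_CLICKS, TOY_SIMILAR)
```

Filtering out categories that were clicked for the same query avoids labeling a true match as a negative, which is the point of the "not in the positive sample" condition.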

Category relation calculation:

Category relation calculation is mainly used in the intention management of the intelligent shopping guide. Two kinds of relation are mainly considered: the hypernym-hyponym (parent-child) relation and the similarity relation. For example, if the user’s first intention is to buy clothes and the next is to buy a water cup, the attributes collected for clothes should not be inherited by the water cup. If the user says pants instead, pants is a hyponym (subcategory) of clothes, so the attributes already collected for clothes should be inherited.
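The clothes, pants and water cup example can be expressed as a small inheritance rule over a category hierarchy. The sketch below is illustrative only: the parent table, the intention dictionary layout and the function names are assumptions, and only the parent-child case described in the example is handled.

```python
from typing import Dict, Optional

# Hypothetical category hierarchy: child -> parent (None for top-level categories).
PARENT: Dict[str, Optional[str]] = {
    "pants": "clothes",
    "shirt": "clothes",
    "clothes": None,
    "water cup": None,
}


def is_subcategory(child: str, ancestor: str) -> bool:
    """True if `ancestor` appears on the path from `child` up to the root."""
    node = PARENT.get(child)
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT.get(node)
    return False


def update_intention(current: dict, new_category: str) -> dict:
    """Keep the collected attributes only when the new category refines the old one."""
    same_or_narrower = (
        new_category == current["category"]
        or is_subcategory(new_category, current["category"])
    )
    attributes = dict(current["attributes"]) if same_or_narrower else {}
    return {"category": new_category, "attributes": attributes}
```

For instance, starting from {"category": "clothes", "attributes": {"color": "red"}}, switching to "pants" keeps the color, while switching to "water cup" starts from an empty attribute set.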

There are two schemes for calculating the hypernym-hyponym relation:

A) Relational operations on the knowledge graph.

B) Extraction from users’ search queries.

There are two schemes for calculating similarity:

A) Sharing the same hypernym. For example, Xiaomi and Huawei are similar because their common hypernym is mobile phone.

B) Semantic similarity of category words based on fastText.

Attribute management based on knowledge graph and similarity calculation

The following is the architecture diagram of attribute management:

Figure 10: Attribute management architecture diagram

On the whole, attribute management consists of two core modules, attribute identification and attribute relation calculation, which is similar to category management, so I won’t go into details here.

2.2.1.2 Exploration and attempt of deep reinforcement learning

Reinforcement learning is the process by which an agent learns a mapping from environment states to actions. The goal is to maximize the cumulative value of the reward signal (reinforcement signal), which the environment provides to indicate whether an action was good or bad. By continually exploring the external environment, the agent obtains an optimal decision-making policy, which makes the approach suitable for sequential decision-making problems. Figure 11 illustrates the interaction between model and environment in reinforcement learning.

Figure 11: Interaction between the environment (Env) and the model

Deep reinforcement learning is reinforcement learning combined with deep learning. It mainly uses the powerful nonlinear expressive ability of deep learning to represent the state faced by the agent and the decision logic on that state.

Currently we use DRL mainly to optimize our interaction strategy. In our setup, the user is the environment (Env) in reinforcement learning and the machine is the model. The action is the form of interaction chosen for the current round, including whether search results are presented.

Design of states:

The state mainly combines the user’s intentions over multiple rounds, the user group the user belongs to, and the product information of each round of interaction, as the state currently perceived by the machine.

State = (intent1, query1, price1, is_click, query_item_sim, …, power, user_inter, age)

Here intent1 is the user’s current intention and query1 is the user’s original query. price1 is the average price of the items currently presented to the user, is_click indicates whether a click occurred in this round of interaction, and query_item_sim is the similarity between the query and the items. power is the user’s purchasing power, user_inter is the user’s interest, and age is the user’s age.
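One way to picture the state is as a fixed record with one field per named component. In the sketch below the field names follow the tuple above, but the Python types and the example values are assumptions, and the components elided by "…" in the tuple are deliberately left out rather than guessed.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class State:
    intent1: str           # current intention of the user
    query1: str            # original query text
    price1: float          # average price of the items currently shown
    is_click: bool         # whether a click occurred in this round
    query_item_sim: float  # similarity between the query and the items
    # Components elided by "…" in the article are intentionally not guessed here.
    power: float           # purchasing power of the user (type assumed)
    user_inter: str        # interest of the user (type assumed)
    age: int               # age of the user


# Hypothetical example values.
example_state = State("buy_dress", "red dress", 199.0, True, 0.8, 0.6, "fashion", 28)
state_tuple = astuple(example_state)
```

Before being fed to a network, the text fields would still have to be encoded numerically, for example with embeddings; that step is outside this sketch.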

Reward design:

The ultimate measures are the number of transactions, the number of clicks and the number of conversation rounds. The reward is therefore designed in three parts:

A) A user click gets a reward of 1.

B) A transaction gets a reward of 1 + math.log(price + 1.0).

C) Everything else gets a reward of 0.1.
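The three reward rules translate directly into a small function. The event labels "click" and "transaction" and the function name are hypothetical; the numeric values come from the rules above.

```python
import math


def reward(event: str, price: float = 0.0) -> float:
    """Reward for one round of interaction, following rules A) to C)."""
    if event == "transaction":
        # The log keeps expensive items from dominating the reward.
        return 1.0 + math.log(price + 1.0)
    if event == "click":
        return 1.0
    return 0.1
```

Because of the logarithm, the transaction reward grows slowly with price: a transaction at price 0 earns 1.0, the same as a click, and a tenfold price increase adds only about 2.3 to the reward.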

Selection of DRL scheme:

For the concrete scheme we mainly use DQN, policy gradient and A3C.

2.2.2 Intelligent service: technical practice based on knowledge graph construction and retrieval model

Intelligent service is characterized by well-defined domain knowledge concepts, strong relations between pieces of knowledge, and high accuracy requirements.

Given these characteristics of the Q&A scenario, we chose knowledge graph construction plus a retrieval model to design the core matching model.

We abstract knowledge graph construction from two angles: mining at the entity dimension and mining at the phrase dimension. Starting from the large amount of data accumulated on the Taobao platform and from the Internet, we mine, label and clean the data using topic models, and then use predefined relations between entities to finally form the knowledge graph. The basic mining flow is as follows:

Figure 12: Entity and phrase mining process for the knowledge graph

An example of a knowledge graph built by mining is shown in Figure 13:

Figure 13: An example of a concrete knowledge graph

Matching based on the knowledge graph has the following advantages:

(1) In the design of dialogue structure and flow, it supports contextual session recognition and reasoning between entities.

(2) Accuracy is usually relatively high for general question answering (reasoning-type scenarios need special design and are somewhat more complicated).

It also has obvious disadvantages:

(1) In the initial stage of model construction the data may be sparse and coverage incomplete, so some questions cannot be matched.

(2) Incremental maintenance of the knowledge graph costs more than maintaining traditional QA pairs.

Therefore, in the Q&A design of Ali Xiaomi we still integrate traditional dialogue matching based on a retrieval model.

Its online process is divided into:

(1) Question preprocessing: word segmentation, anaphora resolution, error correction and other basic text processing.

(2) Retrieval recall: retrieve the candidates that may match from the indexed knowledge data.

(3) Calculation: combine the query with the context model and the candidate data, and score the candidates using text distance measures (cosine similarity, edit distance) and a classification model.

(4) Product flow: design the final product flow according to score thresholds on the returned candidate set.

Offline processes are divided into:

(1) Indexing of knowledge data;

(2) Offline text model construction: for example, term-weight calculation;

The overall process of the retrieval model is shown in Figure 14:

Figure 14: Flow chart of the retrieval model
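To make the online and offline steps concrete, here is a minimal sketch of a retrieval matcher. Everything specific in it is an assumption for illustration: the three QA pairs, the whitespace tokenizer (standing in for word segmentation), the IDF formula used as the term weight, the term-overlap recall and the threshold of 0.5. Only cosine similarity is used for scoring; edit distance, the context model and the classification model are left out.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical knowledge data: (question, answer) pairs.
KNOWLEDGE = [
    ("how do i reset my password", "Use the 'forgot password' link on the login page."),
    ("where is my order", "Check the logistics page in your order details."),
    ("how do i request a refund", "Open the order and choose 'request refund'."),
]


def tokenize(text: str) -> List[str]:
    """Toy preprocessing: lowercase, drop '?', split on whitespace."""
    return text.lower().replace("?", "").split()


# Offline: term weights (a smoothed IDF) computed over the indexed questions.
def build_idf(questions: List[str]) -> Dict[str, float]:
    n = len(questions)
    df = Counter(t for q in questions for t in set(tokenize(q)))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


IDF = build_idf([q for q, _ in KNOWLEDGE])
UNSEEN_IDF = math.log(len(KNOWLEDGE) + 1) + 1.0  # weight for terms not in the index


def vectorize(text: str) -> Dict[str, float]:
    tf = Counter(tokenize(text))
    return {t: f * IDF.get(t, UNSEEN_IDF) for t, f in tf.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def answer(query: str, threshold: float = 0.5) -> Optional[str]:
    """Online: recall by term overlap, score by cosine, apply the threshold."""
    qv = vectorize(query)
    candidates = [(q, a) for q, a in KNOWLEDGE if set(tokenize(q)) & set(qv)]
    scored = [(cosine(qv, vectorize(q)), a) for q, a in candidates]
    if not scored:
        return None
    score, best = max(scored)
    return best if score >= threshold else None
```

Returning None below the threshold leaves room for the product flow to decide what happens next, for example asking a clarifying question or handing over to a human agent.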

2.2.3 Intelligent chat: a chat application based on the combination of retrieval model and deep learning model

Intelligent chat is characterized by being non-goal-oriented, having unclear semantic intention, usually requiring semantic relevance and gradual progression across turns, and having relatively low accuracy requirements.

Open-domain chatbots are currently a challenge in both academia and industry. There are usually two ways to design the dialogue at this stage:

One is the generative approach based on deep learning, which is very popular in academia. An encoder-decoder model built on LSTMs generates the reply sequence to sequence, as shown in Figure 15:

Figure 15: Seq2Seq network structure diagram

Generation model:

Advantages: answers are generated through deep semantic modeling and are not limited by the size of the corpus

Disadvantages: the model is not interpretable, and it is difficult to guarantee consistent and reasonable answers

The other way is to match the question against stored question-answer pairs with a traditional retrieval model.

Retrieval model:

Advantages: the answers come from a preset corpus and are controllable, the matching model is relatively simple, and interpretability is strong

Disadvantages: it lacks semantic understanding to some extent and is limited by the fixed corpus

Therefore, in the chat engine of Ali Xiaomi we combine the advantages of the two models to form the core of the engine. First, candidates are retrieved with the traditional retrieval model, and the candidate set is then reranked with the Seq2Seq model. If a reranked candidate scores above the specified threshold, it is output; otherwise the answer is generated by the Seq2Seq model. The overall flow is shown in Figure 16:

Figure 16: Chat module of Ali Xiaomi
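The control flow of this hybrid engine (retrieve, rerank, compare against a threshold, fall back to generation) can be sketched as below. The Seq2Seq model is not implemented here: word overlap (Jaccard) stands in for its rerank score and a canned string stands in for its generated reply. The corpus, the threshold of 0.5 and all function names are likewise assumptions for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical chat corpus: stored question -> stored answer.
CORPUS = {
    "i am in a bad mood": "I'm sorry to hear that. Want to talk about it?",
    "good morning": "Good morning! How can I help?",
}


def words(text: str) -> Set[str]:
    return set(text.lower().split())


def retrieve(query: str) -> List[Tuple[str, str]]:
    """Retrieval model stand-in: pairs sharing at least one word with the query."""
    return [(q, a) for q, a in CORPUS.items() if words(q) & words(query)]


def toy_rerank_score(query: str, candidate: Tuple[str, str]) -> float:
    """Stand-in for the Seq2Seq rerank score: Jaccard overlap with the stored question."""
    question, _ = candidate
    union = words(query) | words(question)
    return len(words(query) & words(question)) / len(union)


def toy_generate(query: str) -> str:
    """Stand-in for Seq2Seq generation."""
    return "(generated) Tell me more."


def chat_reply(query: str, threshold: float = 0.5) -> str:
    scored = [(toy_rerank_score(query, c), c[1]) for c in retrieve(query)]
    if scored:
        best_score, best_answer = max(scored)
        if best_score >= threshold:
            return best_answer  # a retrieved answer is confident enough
    return toy_generate(query)  # otherwise fall back to generation
```

The design choice this illustrates is that controllable, pre-written answers are preferred whenever a candidate is good enough, and generation only fills the gaps left by the fixed corpus.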

3. Development and prospect of future technology

At present, artificial intelligence is still at the stage of weak AI, and there is still a lot of room for improvement, especially in moving from perception to cognition. Intelligent human-computer interaction in goal-oriented domains can already be closely combined with real industrial scenarios and produce great value. As artificial intelligence technology continues to develop, intelligent human-computer interaction will keep improving. Looking ahead, we expect the following:

1. The continuous accumulation of data and the ongoing construction and refinement of domain knowledge graphs will keep driving improvements in intelligent human-computer interaction.

2. Task-oriented robots for vertical segments will be a growth point in the future, while open-domain interactive robots still need continued exploration and improvement.

3. As distributed computing capability keeps improving, deep learning will continue to advance in NLP (natural language processing) after sweeping the fields of image and speech, and academic research on dialogue and QA will remain active.

In the future, as academia and industry continue to work together and build on each other, we hope that the scenes in artificial intelligence movies will become reality as soon as possible and that everyone can have their own intelligent “little honey”.

 

