Question answering is a classic problem in natural language processing. A question answering system answers questions that people ask in natural language, and it has a wide range of applications. Typical scenarios include intelligent voice interaction, online customer service, knowledge acquisition, and conversational chat. Such systems are commonly classified as generative or retrieval-based, single-turn or multi-turn, and open-domain or domain-specific. This article deals with retrieval-based, domain-specific question answering systems, which are usually called intelligent customer service robots.
In the past, building a customer service robot usually required converting domain knowledge into a series of rules and knowledge graphs. That process relies heavily on human effort (“artificial” intelligence in the literal sense), and every change of scenario or user group means a lot of repetitive work.
With the application of deep learning to natural language processing (NLP), a machine can find the answer that matches a question directly and automatically from documents. A deep language model converts questions and documents into semantic vectors, which are then compared to find the best-matching answer. This article uses Google's open source Bert model and the open source vector search engine Milvus to quickly build a dialogue robot based on semantic understanding.
| overall architecture
This article implements a question answering system through semantic similarity matching. The general construction process is as follows:
- Collect a large number of Chinese questions with answers in a particular domain (referred to in this article as the standard question set).
- Use the Bert model to convert these questions into feature vectors and store them in Milvus. Milvus assigns a vector ID to each feature vector.
- Store these IDs and their corresponding answers in PostgreSQL.
When a user asks a question:
- The Bert model converts the question into a feature vector.
- Milvus runs a similarity search over the stored feature vectors and returns the ID of the standard question most similar to the user's question.
- The answer for that ID is fetched from PostgreSQL.
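The import and query flows above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not the article's implementation: `toy_embed` is a hypothetical character-count encoder standing in for Bert, and the two dictionaries in the hypothetical `ToyQASystem` class stand in for Milvus and PostgreSQL.

```python
import math
from typing import Dict, List, Optional, Tuple


def toy_embed(text: str, dim: int = 16) -> List[float]:
    """Hypothetical stand-in for the Bert encoder: a normalized character-count vector."""
    vec = [0.0] * dim
    for ch in text.lower():
        if ch.isalnum():
            vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class ToyQASystem:
    """In-memory sketch of the pipeline (assumption: dicts replace both stores)."""

    def __init__(self) -> None:
        self.vectors: Dict[int, List[float]] = {}  # plays the role of Milvus
        self.answers: Dict[int, str] = {}          # plays the role of PostgreSQL
        self._next_id = 0

    def import_pair(self, question: str, answer: str) -> int:
        """Import flow: embed the question, store the vector, store ID -> answer."""
        vector_id = self._next_id
        self._next_id += 1
        self.vectors[vector_id] = toy_embed(question)
        self.answers[vector_id] = answer
        return vector_id

    def query(self, question: str) -> Tuple[Optional[str], float]:
        """Query flow: embed, find the most similar stored vector, look up its answer."""
        q = toy_embed(question)
        best_id, best_score = None, -1.0
        for vector_id, vec in self.vectors.items():
            # Vectors are normalized, so the inner product is the cosine similarity.
            score = sum(a * b for a, b in zip(q, vec))
            if score > best_score:
                best_id, best_score = vector_id, score
        if best_id is None:
            return None, 0.0
        return self.answers[best_id], best_score
```

A real deployment would replace the linear scan with a Milvus search and the dictionary lookup with a PostgreSQL query; the shape of the two flows stays the same.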
The system architecture diagram is as follows (the blue line is the import process, the yellow line is the query process):
Next, I will show you how to build an online question answering system.
| build steps
You need to install Milvus and PostgreSQL first. For installation details, see their official websites.
1. Data preparation
The experimental data in this article comes from: github.com/SophonPlus/…
We extracted 330,000 question-answer pairs from the financial data set of the FAQ collection in that project. With this data, we can quickly build an intelligent customer service robot for a bank (called XX Bank here).
2. Generate feature vectors
This system uses a pre-trained Bert model. Before starting the service, you need to download the model: storage.googleapis.com/bert_models…
The model converts the question library into feature vectors for the subsequent similarity search. More information about the Bert service can be found at github.com/hanxiao/ber…
3. Import into Milvus and PostgreSQL
The feature vectors generated above are normalized and imported into Milvus for storage. The IDs returned by Milvus are then imported into PostgreSQL together with the answers to the corresponding questions. Table structure in PostgreSQL:
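As a minimal sketch of this step, the code below normalizes a vector to unit length and stores ID-to-answer rows in a two-column table. Several things here are assumptions for illustration: the two-column layout (vector ID, answer), the table name `qa_answers`, and all helper names are hypothetical, and Python's built-in `sqlite3` is used as a stand-in for PostgreSQL so the example runs without a database server.

```python
import math
import sqlite3
from typing import Iterable, List, Optional, Tuple


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def create_answer_table(conn: sqlite3.Connection) -> None:
    """Create the assumed two-column table: vector ID -> answer text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qa_answers ("
        "vector_id INTEGER PRIMARY KEY, answer TEXT NOT NULL)"
    )


def store_answers(conn: sqlite3.Connection, rows: Iterable[Tuple[int, str]]) -> None:
    """Insert (vector_id, answer) pairs, where vector_id is the ID the vector store returned."""
    conn.executemany(
        "INSERT INTO qa_answers (vector_id, answer) VALUES (?, ?)", rows
    )
    conn.commit()


def fetch_answer(conn: sqlite3.Connection, vector_id: int) -> Optional[str]:
    """Look up the answer for a vector ID; None if the ID is unknown."""
    row = conn.execute(
        "SELECT answer FROM qa_answers WHERE vector_id = ?", (vector_id,)
    ).fetchone()
    return row[0] if row else None
```

Keeping the vector ID as the primary key means the query path needs only a single indexed lookup once the nearest vector has been found.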
4. Get answers
The user enters a question, Bert converts it into a feature vector, and Milvus finds the most similar question in its library. This article uses cosine similarity to measure how similar two sentences are. Because all vectors are normalized, the cosine similarity equals their inner product, and the closer it is to 1, the more similar the two sentences are. In practice, we can set a threshold such as 0.9. When the similarity of the best match is below the threshold, the system returns a message saying that no related question is included in the system.
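The threshold logic can be sketched as follows. The 0.9 value comes from the text above; the function names, the fallback message, and the plain linear scan (used here in place of a Milvus search) are assumptions for illustration.

```python
import math
from typing import Optional, Sequence, Tuple

SIMILARITY_THRESHOLD = 0.9  # threshold suggested in the article
FALLBACK_MESSAGE = "Sorry, no related question was found in the system."  # hypothetical wording


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_answer(
    query_vec: Sequence[float],
    candidates: Sequence[Tuple[Sequence[float], str]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> str:
    """Return the answer of the most similar candidate, or the fallback below the threshold."""
    best_score = -1.0
    best_answer: Optional[str] = None
    for vec, answer in candidates:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_answer = score, answer
    if best_answer is None or best_score < threshold:
        return FALLBACK_MESSAGE
    return best_answer
```

A higher threshold makes the robot more conservative (more fallback replies, fewer wrong answers), so the value is worth tuning against real user questions.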
| system demonstration
The initial system interface is as follows:
Enter your question in the dialog box and you will receive the corresponding answer, as shown in the figure:
| summary
As you can see, setting up the Q&A system described above is quite simple. With the Bert model, you don’t need to classify and label the corpus in advance. At the same time, thanks to the high performance and scalability of the open source vector search engine Milvus, the system can support a corpus with hundreds of millions of entries. The Milvus vector search engine is an incubation project of the LF AI Foundation (Linux Foundation AI). You are welcome to join the Milvus community and help speed up the large-scale deployment of AI technology.
Detailed steps (with code): github.com/milvus-io/b…
* the original source: mp.weixin.qq.com/s/nHsg8Iu8B…
| welcome to Milvus community
github.com/milvus-io/m… | source code
milvus.io | official website
milvusio.slack.com | Slack community
zhihu.com/org/zilliz-… | Zhihu
zilliz.blog.csdn.net | CSDN blog