The answer to life, the universe and everything is 42. – deepThought
At this year's F8 developer conference, Facebook talked up its vision for the future of chatbots. With these chatbots, users can complete many tasks within a conversation, such as shopping online, checking flights, and organizing meetings. Instead of downloading a bunch of apps, you open a simple text dialog box and say: "Oh my god, my third wish is three more wishes."
You might be thinking: "Hi Siri."
Perhaps this is another reshuffle of the user entry point, which would explain why the big tech companies are rushing in.
Origin
I've always been interested in natural language processing (NLP), and for the better part of a year I have also been interested in machine learning and deep learning. Chatbots combine both.
My earliest interest in chatbots probably dates back to college. At that time I followed Little Yellow Chicken, a bot that swept Renren for a while, but later found that it just called a closed-source cloud service, so I turned to playing with AIML instead.
Recently I have been going to Starbucks after work to follow online courses (currently Udacity's Deep Learning course) and write my blog, and today is no different. I expect to spend more time on deep learning in the future; I am especially interested in RNNs.
Chatbot & Open Source framework
There are plenty of cloud services out there; everyone from Facebook to Microsoft has its own framework. Open source projects, by contrast, are less glamorous, perhaps because the field is still young and people are holding out for big ideas.
I looked around GitHub and ChatterBot looked cool: an active project with clean documentation and clean code.
Because the project is small, the source code is easy to read, which makes it a good scaffold for building your own smart chatbot.
ChatterBot
ChatterBot is a machine learning-based chatbot engine written in Python that can learn from existing conversations. The project is designed to be language-independent, so it can be trained in any language.
How it works
An untrained ChatterBot has none of the knowledge it needs to talk to users. Each time a user types a sentence, the bot stores it together with the sentence it was a reply to. As the bot receives more input, both the number of questions it can answer and the accuracy of its answers increase. How does the program respond to user input? First, it finds the known sentence closest to the user's input (how to measure similarity is something to think about). Then it picks the most likely response to that sentence. How does it decide which response is most likely? By frequency: how often each response was given to the matched sentence across everyone who has talked to the bot.
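The two steps above (closest match, then most frequent response) can be sketched in a few lines. This is only an illustration under my own assumptions, not ChatterBot's actual code: the names known, similarity, respond and learn are hypothetical, and the standard library's difflib stands in for whatever similarity measure the real project uses.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical conversation log: each known statement maps to the replies
# people have given to it, with a count of how often each reply occurred.
known = {
    "Hello": Counter({"Hi there": 3, "Hello": 1}),
    "How are you?": Counter({"I am fine": 2, "Not bad": 1}),
}


def similarity(a, b):
    """Return a 0..1 similarity score (difflib is a stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def respond(user_input):
    # Step 1: find the known statement closest to the input.
    closest = max(known, key=lambda statement: similarity(user_input, statement))
    # Step 2: return the reply that occurred most often for that statement.
    reply, _count = known[closest].most_common(1)[0]
    return reply


def learn(statement, reply):
    # Record one more occurrence of `reply` as a response to `statement`.
    known.setdefault(statement, Counter())[reply] += 1
```

With this data, respond("hello!") returns "Hi there" because that reply was seen most often for the closest statement. Calling learn repeatedly with a different reply eventually changes the answer, which is the sense in which the bot improves as it receives more input.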
Installation and use
Installation
pip install chatterbot
Basic usage
from chatterbot import ChatBot
from chatterbot.training.trainers import ChatterBotCorpusTrainer

chatbot = ChatBot("myBot")
chatbot.set_trainer(ChatterBotCorpusTrainer)

# Train it using the English corpus
chatbot.train("chatterbot.corpus.english")

# Start a conversation
chatbot.get_response("Hello, how are you today?")
Using Chinese corpus
I contributed the Chinese corpus, and the author has merged my submission into master. It has not yet been packaged and published to PyPI, so if you want to train with the default Chinese corpus you need to do this:
git clone https://github.com/gunthercox/ChatterBot
pip3 install ./ChatterBot
python3  # must be used, otherwise there will be a unicode problem; no time yet to make this Python 2 compatible
Using a Chinese corpus to train robots
from chatterbot import ChatBot
from chatterbot.training.trainers import ChatterBotCorpusTrainer

deepThought = ChatBot("deepThought")
deepThought.set_trainer(ChatterBotCorpusTrainer)

# Train it using the Chinese corpus
deepThought.train("chatterbot.corpus.chinese")  # corpus
Start playing
print(deepThought.get_response("Nice to meet you"))
print(deepThought.get_response("Hi, how are you?"))
print(deepThought.get_response("Complex is better than obscure"))
print(deepThought.get_response("Resist the temptation to guess."))
print(deepThought.get_response("What is the ultimate answer to life, the universe, and everything in it?"))
FAQ (Unofficial)
The default configuration
By default, ChatterBot uses JsonDatabaseAdapter as its storage adapter, ClosestMatchAdapter as its logic adapter, and VariableInputTypeAdapter as its input adapter.
Read-only mode
chatbot = ChatBot("wwjtest", read_only=True)  # Otherwise the bot learns from every input
Create your own training classes
Refer to the trainers in chatterbot/training.
Create your own Adapters
Refer to the ClosestMatchAdapter and VariableInputTypeAdapter used by default
For example, we could write input/output adapters for connecting to WeChat (I prefer WeRoBot).
One example of an I/O adapter is chatterbot-voice, which lets us talk to the bot by voice. It is very simple.
Examples
The project's examples already cover many kinds of bots.
How are trained models distributed
The trained data lives in /database.db (see jsonDatabase.py). Despite the extension, this is not an SQLite database but jsondb, which wraps a JSON file (see jsondb/db.py).
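Because the store is just JSON text, it can be inspected with any editor and moved around by copying the file. The round trip below is a rough sketch of that idea only; the record layout (in_response_to, occurrence) and the temporary path are my own assumptions for illustration, not the project's confirmed schema.

```python
import json
import os
import tempfile

# Hypothetical layout: statement text -> replies it answers, with counts.
data = {
    "Hello": {"in_response_to": [{"text": "Hi", "occurrence": 2}]},
}

# A throwaway location; the real file name comes from the storage adapter.
path = os.path.join(tempfile.mkdtemp(), "database.db")

# Despite the .db extension, what gets written is plain JSON text.
with open(path, "w", encoding="utf-8") as handle:
    json.dump(data, handle, ensure_ascii=False, indent=2)

# Reading it back yields the same structure, so copying the file is
# enough to carry the trained data to another machine.
with open(path, encoding="utf-8") as handle:
    restored = json.load(handle)
```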
Algorithm related
By default, the ClosestMatchAdapter is used as the logic adapter to find the sentence that is closest to the user’s input
The core code is:
from fuzzywuzzy import process

# Take the single best match among all known statements
closest_match, confidence = process.extract(
    input_statement.text,
    text_of_all_statements,
    limit=1
)[0]
This uses fuzzywuzzy; see fuzzywuzzy#process for details.
fuzzywuzzy is used to calculate the similarity between sentences, and the string similarity algorithm it adopts is Levenshtein distance (edit distance).
Levenshtein Distance
Edit distance, also known as Levenshtein distance, is the minimum number of edit operations required to transform one string into another. The greater the distance, the more the two strings differ. Permitted edit operations are replacing one character with another, inserting a character, and deleting a character. (Quoted from Wikipedia)
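To make the definition concrete, here is a small dynamic-programming sketch of the distance. It is a textbook version written for this post, not the code fuzzywuzzy runs internally, and the function name levenshtein is my own choice.

```python
def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

It works on characters, so Chinese strings are handled the same way as English ones: levenshtein("你好", "你好!") is 1, a single insertion.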
From this description we can see that the algorithm applies to any text, so when we call process.extract the accuracy of the similarity measure is not affected by using Chinese. We can also see its shortcoming: it cannot capture semantic similarity and is completely unable to handle even synonyms. This is an obvious weakness, so it is worth implementing a new logic adapter that measures text similarity differently.
from fuzzywuzzy import fuzz
fuzz.ratio(u"Hello", u"Hello!")
fuzz.ratio(u"Hello", u"Hello")  # 100
Other algorithms
time_adapter.py uses naive Bayes (from textblob.classifiers import NaiveBayesClassifier); this is currently the only place where textblob is referenced. The training data looks like:
[("What time is it", 1), XXX, XXX…]
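To show how labeled pairs like these can drive a classifier, here is a tiny bag-of-words naive Bayes in plain Python. It is a stand-in written for illustration, not TextBlob's NaiveBayesClassifier, and the training sentences beyond "what time is it" are made up by me.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled examples: 1 = asks for the time, 0 = anything else.
training = [
    ("what time is it", 1),
    ("do you know the time", 1),
    ("what is the time now", 1),
    ("i like cake", 0),
    ("tell me a joke", 0),
    ("how are you today", 0),
]

word_counts = defaultdict(Counter)  # label -> word frequencies
label_counts = Counter()            # label -> number of examples
for text, label in training:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocabulary = {word for counts in word_counts.values() for word in counts}


def classify(text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        # Start from the prior: how common this label is in the training set.
        score = math.log(label_counts[label] / len(training))
        for word in text.lower().split():
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)
```

Here classify("what time is it now") returns 1 and classify("tell me a joke") returns 0. An adapter could use such a label to decide whether the question is about the time.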
Use of NLTK
Currently, mainly NLTK's word_tokenize, wordnet and stopwords are used.
TODO
- Make this project more suitable for training Chinese corpus
- Write a Logic Adapter using other text similarity algorithms
- Add Chinese stop words, etc. (instead of NLTK stop words)
- Use SnowNLP and jieba to replace the existing dependencies (NLTK and TextBlob)
- Fork the project and use its architecture to write a version better suited to Chinese
Chat corpus
Chat corpora involve privacy, so there are almost no publicly available Chinese chat corpora on the Internet. A few imaginative alternatives:
- Conversations between Siri and Xiaoice (the WeChat API makes this dialogue programmable)
- Plato’s Dialogues
- The Analects of Confucius
Pitfalls
ChatterBot itself supports both Python 2 and Python 3, but currently only Python 3 works if you want to use Chinese.
Resolving the Chinese text problem under Python 2:
statement_list = self.context.storage.get_response_statements()
The resulting statement_list is a list of incorrectly encoded sentences (codec problem)
For a solution, see my blog: Notes on coding
Conclusion
The project currently provides an elegant bot skeleton and a plug-in design that makes it easy to add powerful features, which is my favorite part of the project. As a chatbot, its functionality is relatively simple and straightforward.