Using TensorFlow to implement context chat-bots

In everyday conversation, context is everything. We will use TensorFlow to build a chatbot framework and add a context-handling mechanism to make the bot more intelligent.

“The Whole World in your Hand.” – Betty Newman – Maguire (www.bettynewmanmaguire.ie/)

Have you ever wondered why so many chatbots lack conversational context?

Given the importance of context in all dialog scenarios, how do you add this feature?

Next, we’ll create a chatbot framework and build a conversational model for an island moped rental shop. This small-business chatbot needs to handle simple questions about opening hours, rental options, and so on. We also want the bot to handle contextual responses, such as a follow-up question about same-day rentals. Getting this right saves the customer a lot of time.

We will build the chatbot in three steps:

We will use TensorFlow to write a dialogue intent model.
Next, we’ll build a chatbot framework that handles conversations.
Finally, we’ll show you how to incorporate context information into our reactive processor.

In the model, we will use the TFLearn framework, which is a high-level API for TensorFlow, and we will use IPython as a development tool.

1. We will use TensorFlow to write the dialogue intent model.

For the full Notebook documentation, click here.

For a chatbot framework, we need to define a conversational intent structure. The simplest and most convenient way is to use a JSON file, as shown below:

chat-bot intents

Each conversational intent contains:

Tag (a unique name)
Patterns (sentences that our neural network text classifier learns to classify)
Responses (sentences, one of which is used as the reply)

We’ll also add some basic context elements later.
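To make the structure concrete, here is a minimal sketch of what such intents might look like once loaded into Python. The tags, patterns, and responses are illustrative assumptions assembled from examples quoted in this article; they are not the contents of the actual intents file.

```python
import json

# Hypothetical intents data, shaped like the structure described above.
sample_intents = {
    "intents": [
        {
            "tag": "opentoday",
            "patterns": ["Are you open today?", "When do you open today?"],
            "responses": ["Our hours are 9am-9pm every day"],
        },
        {
            "tag": "thanks",
            "patterns": ["Thanks", "Thank you"],
            "responses": ["Happy to help!"],
        },
    ]
}

# Round-trip through JSON text, as json.load() would do for intents.json.
loaded = json.loads(json.dumps(sample_intents))

for intent in loaded["intents"]:
    print(intent["tag"], len(intent["patterns"]), "patterns")
```

Keeping the intents in a data file rather than in code means patterns and responses can be edited without touching the framework.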

First, let’s import some of the packages we need:

# things we need for NLP
import nltk
from nltk.stem.lancaster import LancasterStemmer
stemmer = LancasterStemmer()

# things we need for Tensorflow
import numpy as np
import tflearn
import tensorflow as tf
import random

If you’re not familiar with TensorFlow, check out this tutorial or this tutorial.

# import our chat-bot intents file
import json
with open('intents.json') as json_data:
    intents = json.load(json_data)

The JSON file used in the code can be downloaded here. With it loaded, we can start organizing our documents, words, and classification classes.

words = []
classes = []
documents = []
ignore_words = ['?']
# loop through each sentence in our intents patterns
for intent in intents['intents']:
    for pattern in intent['patterns']:
        # tokenize each word in the sentence
        w = nltk.word_tokenize(pattern)
        # add to our words list
        words.extend(w)
        # add to documents in our corpus
        documents.append((w, intent['tag']))
        # add to our classes list
        if intent['tag'] not in classes:
            classes.append(intent['tag'])

# stem and lower each word and remove duplicates
words = [stemmer.stem(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))

# remove duplicates
classes = sorted(list(set(classes)))

print (len(documents), "documents")
print (len(classes), "classes", classes)
print (len(words), "unique stemmed words", words)

We created a list of documents (one per sentence). Each sentence is a list of stemmed words, and each document is associated with an intent (a class).

27 documents
9 classes ['goodbye', 'greeting', 'hours', 'mopeds', 'opentoday', 'payments', 'rental', 'thanks', 'today']
44 unique stemmed words ["'d", 'a', 'ar', 'bye', 'can', 'card', 'cash', 'credit', 'day', 'do', 'doe', 'good', 'goodby', 'hav', 'hello', 'help', 'hi', 'hour', 'how', 'i', 'is', 'kind', 'lat', 'lik', 'mastercard', 'mop', 'of', 'on', 'op', 'rent', 'see', 'tak', 'thank', 'that', 'ther', 'thi', 'to', 'today', 'we', 'what', 'when', 'which', 'work', 'you']

For example, the stem “tak” will match “take”, “taking”, “takers”, and so on. In a real application we could clean the word list further and remove useless entries, but this is enough for now.

Unfortunately, this data structure cannot be used in TensorFlow, and we need to transform this data further: from words to tensors of numbers.

# create our training data
training = []
output = []
# create an empty array for our output
output_empty = [0] * len(classes)

# training set, bag of words for each sentence
for doc in documents:
    # initialize our bag of words
    bag = []
    # list of tokenized words for the pattern
    pattern_words = doc[0]
    # stem each word
    pattern_words = [stemmer.stem(word.lower()) for word in pattern_words]
    # create our bag of words array
    for w in words:
        bag.append(1 if w in pattern_words else 0)

    # output is a '0' for each tag and '1' for current tag
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1

    training.append([bag, output_row])

# shuffle our features and turn into np.array
# (dtype=object because bag and output_row have different lengths)
random.shuffle(training)
training = np.array(training, dtype=object)

# create train and test lists
train_x = list(training[:,0])
train_y = list(training[:,1])

Note that our data has been shuffled. TensorFlow will take some of it and use it as test data to gauge the accuracy of the fitted model.

If we look at a single x element and a single y element, we see two bag-of-words style arrays: one for the intent pattern, the other for the intent class.

train_x example: [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1] 
train_y example: [0, 0, 1, 0, 0, 0, 0, 0, 0]
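The encoding can be reproduced on a much smaller scale. The following is a minimal sketch, assuming a made-up five-word vocabulary and two classes; the helper names `bag_of_words` and `one_hot` are hypothetical and do not appear in the original code.

```python
def bag_of_words(sentence_stems, vocabulary):
    """Return a 0/1 list marking which vocabulary entries occur in the sentence."""
    present = set(sentence_stems)
    return [1 if stem in present else 0 for stem in vocabulary]


def one_hot(label, classes):
    """Return a list with a single 1 at the position of the given class."""
    row = [0] * len(classes)
    row[classes.index(label)] = 1
    return row


# Hypothetical, already-stemmed vocabulary and classes (a tiny subset).
vocabulary = ["ar", "hello", "op", "today", "you"]
classes = ["greeting", "opentoday"]

x = bag_of_words(["ar", "you", "op", "today"], vocabulary)
y = one_hot("opentoday", classes)
print(x)  # [1, 0, 1, 1, 1]
print(y)  # [0, 1]
```

Word order and repetition are discarded: only the presence or absence of each vocabulary entry reaches the classifier.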

Next, let’s build our model.

# reset underlying graph data
tf.reset_default_graph()
# Build neural network
net = tflearn.input_data(shape=[None, len(train_x[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, len(train_y[0]), activation='softmax')
net = tflearn.regression(net)

# Define model and setup tensorboard
model = tflearn.DNN(net, tensorboard_dir='tflearn_logs')
# Start training (apply gradient descent algorithm)
model.fit(train_x, train_y, n_epoch=1000, batch_size=8, show_metric=True)
model.save('model.tflearn')

This is the same 2-layer neural network architecture that was used in this article.
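For readers who want to see what that network shape computes, here is a rough sketch of a forward pass in plain Python, independent of TFLearn. The weights are random and untrained, the input is a made-up bag of words, and the sizes (44 words, 9 classes) are taken from the printed output earlier. The two hidden layers are kept linear because the code above passes no activation argument for them; treat that as an assumption of this sketch.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(x * w for x, w in zip(inputs, unit_weights)) + b
            for unit_weights, b in zip(weights, biases)]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_in, n_out, rng):
    """Untrained weights: one list of n_in weights per output unit."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
n_words, n_classes = 44, 9
w1, b1 = random_layer(n_words, 8, rng)
w2, b2 = random_layer(8, 8, rng)
w3, b3 = random_layer(8, n_classes, rng)

bag = [0] * n_words
bag[5] = bag[31] = 1  # a made-up bag of words

hidden = dense(dense(bag, w1, b1), w2, b2)
probabilities = softmax(dense(hidden, w3, b3))
print(len(probabilities), round(sum(probabilities), 6))
```

Training adjusts the weights so that the largest probability lands on the correct intent; the softmax output is what the classifier later compares against its threshold.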

interactive build of a model in tflearn

That completes this part. Now we need to save our model and documents so that the next part of the code can use them.

# save all of our data structures
import pickle
with open("training_data", "wb") as f:
    pickle.dump({'words':words, 'classes':classes, 'train_x':train_x, 'train_y':train_y}, f)

2. Build our chatbot framework

The complete code for this part is here.

We will build a simple state machine to process the responses and use our intent model as our classifier, as mentioned in the previous section. If you want to see how chatbots work, you can click here.

We need to import the same packages as in the previous section, and then un-pickle our model and documents. Keep in mind that the chatbot framework is built separately from the model: there is no need to rebuild the model unless the intent patterns change. With hundreds of intents and thousands of patterns, the model can take several minutes to build.

# restore all of our data structures
import pickle
data = pickle.load( open( "training_data", "rb" ) )
words = data['words']
classes = data['classes']
train_x = data['train_x']
train_y = data['train_y']

# import our chat-bot intents file
import json
with open('intents.json') as json_data:
    intents = json.load(json_data)

Next, we load the model we trained earlier with TensorFlow (through the TFLearn framework). Note that you must first define the TensorFlow model structure again, exactly as we did in Part 1.

# load our saved model
# (`model` must already be defined with the same network structure as in Part 1)
model.load('./model.tflearn')

Before we can start processing intents, we need a way to produce a bag of words from user input. This is the same technique we used earlier to create our training documents.

def clean_up_sentence(sentence):
    # tokenize the pattern
    sentence_words = nltk.word_tokenize(sentence)
    # stem each word
    sentence_words = [stemmer.stem(word.lower()) for word in sentence_words]
    return sentence_words

# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence
def bow(sentence, words, show_details=False):
    # tokenize the pattern
    sentence_words = clean_up_sentence(sentence)
    # bag of words
    bag = [0]*len(words)  
    for s in sentence_words:
        for i,w in enumerate(words):
            if w == s: 
                bag[i] = 1
                if show_details:
                    print ("found in bag: %s" % w)

    return(np.array(bag))

p = bow("is your shop open today?", words)
print (p)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0]

Now we can start building our response handler.

ERROR_THRESHOLD = 0.25
def classify(sentence):
    # generate probabilities from the model
    results = model.predict([bow(sentence, words)])[0]
    # filter out predictions below a threshold
    results = [[i,r] for i,r in enumerate(results) if r>ERROR_THRESHOLD]
    # sort by strength of probability
    results.sort(key=lambda x: x[1], reverse=True)
    return_list = []
    for r in results:
        return_list.append((classes[r[0]], r[1]))
    # return tuple of intent and probability
    return return_list

def response(sentence, userID='123', show_details=False):
    results = classify(sentence)
    # if we have a classification then find the matching intent tag
    if results:
        # loop as long as there are matches to process
        while results:
            for i in intents['intents']:
                # find a tag matching the first result
                if i['tag'] == results[0][0]:
                    # a random response from the intent
                    return print(random.choice(i['responses']))

            results.pop(0)

Each sentence passed to response() is classified. Our classifier uses model.predict(), which is very fast. The probabilities returned by the model are lined up with our intent definitions to produce a list of potential responses.

If one or more classifications are above the threshold, we check whether a tag matches an intent and then process it. We treat the classification list as a stack and pop entries off it, looking for a suitable match, until we find one or the stack is empty.
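The threshold-and-stack logic can be tried without a trained model. Below is a minimal sketch, assuming made-up class names and probabilities; `rank_intents` and `first_answerable` are hypothetical helper names that separate the two steps for illustration.

```python
ERROR_THRESHOLD = 0.25


def rank_intents(probabilities, classes, threshold=ERROR_THRESHOLD):
    """Keep classes whose probability exceeds the threshold, best first."""
    kept = [(classes[i], p) for i, p in enumerate(probabilities) if p > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


def first_answerable(ranked, answerable_tags):
    """Treat the ranked list as a stack: pop until a usable tag turns up."""
    stack = list(ranked)
    while stack:
        tag, _ = stack.pop(0)
        if tag in answerable_tags:
            return tag
    return None


# Made-up classes and model output, for illustration only.
classes = ["greeting", "opentoday", "today"]
ranked = rank_intents([0.10, 0.30, 0.60], classes)
print(ranked)  # [('today', 0.6), ('opentoday', 0.3)]
print(first_answerable(ranked, {"opentoday", "greeting"}))  # opentoday
```

Falling back to the next entry on the stack is what later allows a lower-probability intent to answer when the top one is blocked.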

As an example, the model will return the most likely tag and its probability.

classify('is your shop open today?')
[('opentoday', 0.9264171123504639)]

Notice that “is your shop open today?” is not one of the patterns for this intent: "patterns": ["Are you open today?", "When do you open today?", "What are your hours today?"]. However, the terms “open” and “today” proved decisive for the model in choosing the intent.

We can now generate a chatbot response from user input:

response('is your shop open today?')
Our hours are 9am-9pm every day

Some more examples:

response('do you take cash?')
We accept VISA, Mastercard and AMEX

response('what kind of mopeds do you rent?')
We rent Yamaha, Piaggio and Vespa mopeds

response('Goodbye, see you later')
Bye! Come back again soon.

Now let’s work some basic context into our moped rental chatbot conversation.

3. Add context

We want to handle a question about renting a moped and then ask whether the rental is for today. That clarifying question is a simple contextual response. If the user answers “today” and the current context is the rental time frame, it is best to have them call the shop’s 1-800 number right away, so that no time is wasted.

To achieve this, we add the notion of “state” to the framework. This requires a data structure to hold the state, plus code that reads and updates it while intents are processed.

Because we need our state machine to be very easy to maintain, restore, copy and so on, it is important that we keep the data in a data structure such as a dictionary.

Here is our response process with basic context handling:

# create a data structure to hold user context
context = {}

ERROR_THRESHOLD = 0.25
def classify(sentence):
    # generate probabilities from the model
    results = model.predict([bow(sentence, words)])[0]
    # filter out predictions below a threshold
    results = [[i,r] for i,r in enumerate(results) if r>ERROR_THRESHOLD]
    # sort by strength of probability
    results.sort(key=lambda x: x[1], reverse=True)
    return_list = []
    for r in results:
        return_list.append((classes[r[0]], r[1]))
    # return tuple of intent and probability
    return return_list

def response(sentence, userID='123', show_details=False):
    results = classify(sentence)
    # if we have a classification then find the matching intent tag
    if results:
        # loop as long as there are matches to process
        while results:
            for i in intents['intents']:
                # find a tag matching the first result
                if i['tag'] == results[0][0]:
                    # set context for this intent if necessary
                    if 'context_set' in i:
                        if show_details: print ('context:', i['context_set'])
                        context[userID] = i['context_set']

                    # check if this intent is contextual and applies to this user's conversation
                    if not 'context_filter' in i or \
                        (userID in context and 'context_filter' in i and i['context_filter'] == context[userID]):
                        if show_details: print ('tag:', i['tag'])
                        # a random response from the intent
                        return print(random.choice(i['responses']))

            results.pop(0)

Our context state is a dictionary that holds the state for each user. We use a unique identifier for each user (for example, a cell phone number). This allows our framework and state machine to maintain state for multiple users simultaneously.

# create a data structure to hold user context
context = {}

Context handling is added to the intent processing flow as follows:

                if i['tag'] == results[0][0]:
                    # set context for this intent if necessary
                    if 'context_set' in i:
                        if show_details: print ('context:', i['context_set'])
                        context[userID] = i['context_set']

                    # check if this intent is contextual and applies to this user's conversation
                    if not 'context_filter' in i or \
                        (userID in context and 'context_filter' in i and i['context_filter'] == context[userID]):
                        if show_details: print ('tag:', i['tag'])
                        # a random response from the intent
                        return print(random.choice(i['responses']))

If an intent wants to set context information, we can do this:

{"tag": "rental",
 "patterns": ["Can we rent a moped?", "I'd like to rent a moped", ...],
 "responses": ["Are you looking to rent today or later this week?"],
 "context_set": "rentalday"
}

If another intent wants to be associated with the context, it can do this:

{"tag": "today",
 "patterns": ["Today"],
 "responses": ["For rentals today please call 1-800-MYMOPED", ...],
 "context_filter": "rentalday"
}

In this way, if a user types “today” out of the blue, with no context, our “today” intent will not be processed. If the user types “today” in answer to our clarifying question (asked by the “rental” intent), the intent is processed.
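The gating behaviour can be checked in isolation. The following is a simplified sketch that skips classification and feeds intent tags in directly; `respond_to_tag` is a hypothetical helper, the intents are trimmed-down stand-ins, and the first response is always returned instead of a random one.

```python
context = {}  # user ID -> current context string

# Hypothetical, trimmed-down intents; only the context fields matter here.
intents = [
    {"tag": "greeting", "responses": ["Good to see you again"], "context_set": ""},
    {"tag": "rental",
     "responses": ["Are you looking to rent today or later this week?"],
     "context_set": "rentalday"},
    {"tag": "today",
     "responses": ["For rentals today please call 1-800-MYMOPED"],
     "context_filter": "rentalday"},
]


def respond_to_tag(tag, user_id):
    """Answer an already-classified tag, honouring context_set and context_filter."""
    for intent in intents:
        if intent["tag"] != tag:
            continue
        # an intent may set the context for this user
        if "context_set" in intent:
            context[user_id] = intent["context_set"]
        # a filtered intent only answers when the user's context matches
        required = intent.get("context_filter")
        if required is None or context.get(user_id) == required:
            return intent["responses"][0]
    return None


print(respond_to_tag("today", "123"))   # None: no rental context yet
print(respond_to_tag("rental", "123"))  # asks the clarifying question
print(respond_to_tag("today", "123"))   # answered now that the context matches
print(respond_to_tag("greeting", "123"), context)
```

Because the context is keyed by user ID, a second user typing “today” at the same moment is unaffected by the first user's conversation.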

response('we want to rent a moped')
Are you looking to rent today or later this week?

response('today')
Same-day rentals please call 1-800-MYMOPED

Our context information has also changed:

context
{'123': 'rentalday'}

We defined our “greeting” intent to clear the context, just as small talk usually signals the start of a new conversation. We also added a “show_details” parameter to help us see what is happening inside.

response("Hi there!", show_details=True)
context: ''
tag: greeting
Good to see you again

Let’s try typing the word “today” again, and something interesting happens.

response('today')
We're open every day from 9am-9pm

classify('today')
[('today', 0.5322513580322266), ('opentoday', 0.2611265480518341)]

First, our response to the context-free “today” was different. Our classification produced two suitable intents, and “opentoday” was selected because the “today” intent, although it had the higher probability, was bound to a context that no longer applied after the greeting cleared it. Context matters!

response("thanks, your great")
Happy to help!

Now consider what contextualization means for how the chatbot keeps track of each conversation.

State handling

That’s right: your chatbot can no longer be a stateless service. Unless you want to rebuild the state and reload your model and documents on every call to the chatbot framework, you will need to make it stateful.

This is not that difficult. You can run a stateful chatbot framework in its own process and call it using RPC (remote procedure call) or RMI (remote method invocation). I recommend Pyro for this.

User interfaces (clients) are usually stateless, such as HTTP or SMS.

Your chatbot client makes a Pyro function call, and your stateful service handles it. Isn’t that cool?
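As a rough illustration of the idea, the sketch below keeps state inside a long-running service object and exposes it over RPC. Note that it uses Python's standard-library xmlrpc module as a stand-in, not Pyro, so that it has no third-party dependencies; the class `ChatService`, its placeholder reply logic, and the port number are all assumptions made for this example.

```python
from xmlrpc.server import SimpleXMLRPCServer


class ChatService:
    """Holds per-user context in memory between calls."""

    def __init__(self):
        self.context = {}

    def respond(self, user_id, sentence):
        # Placeholder logic: a real service would call response() here.
        previous = self.context.get(user_id, "")
        self.context[user_id] = sentence
        return "previous message: %r" % previous


def make_server(host="127.0.0.1", port=8000):
    """Create (but do not start) an XML-RPC server exposing ChatService."""
    server = SimpleXMLRPCServer((host, port), allow_none=True, logRequests=False)
    server.register_instance(ChatService())
    return server


# To run the service:  make_server().serve_forever()
# From a client:       xmlrpc.client.ServerProxy("http://127.0.0.1:8000").respond("123", "hi")
```

The model and context live in the service process and are loaded once, while each stateless client request is just a short remote call.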

Here’s a step-by-step guide to building a Twilio SMS bot client. Here’s a step-by-step guide to building a Facebook bot.

Do not store state in local variables

All state information must be kept in data structures such as dictionaries, which can easily be persisted, reloaded, or copied atomically.

Each user’s conversation and context are stored under the user ID, which must be unique.

A common need is to copy a user’s conversation state in order to analyze what happened in a given scenario. If that state lives in temporary local variables, this becomes very difficult.
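A minimal sketch of why dictionary-based state is convenient, assuming a hypothetical per-user layout with a `context` string and a `history` list (neither field name comes from the framework above):

```python
import copy
import json

# Hypothetical per-user state, keyed by a unique user ID.
state = {
    "123": {"context": "rentalday", "history": ["we want to rent a moped"]},
    "456": {"context": "", "history": ["Hi there!"]},
}

snapshot = copy.deepcopy(state["123"])   # copy one user's state for analysis
state["123"]["history"].append("today")  # the live state keeps changing

saved = json.dumps(state)                # persist everything as text
restored = json.loads(saved)             # reload it later

print(snapshot["history"])
print(restored["123"]["history"])
```

The snapshot is unaffected by later messages, and the whole structure survives a save and reload unchanged, which would not be possible if the same information were scattered across local variables.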

So now you have seen how to build a chatbot framework, how to make the bot remember contextual information, and how to classify text. The chatbot frameworks of the future will handle context seamlessly.

By combining intent definitions with contextual response handling, we can create a wide variety of conversational flows.

Go and try it!

Source: Medium
