TensorFlow is a deep learning framework built around a graph structure; the graph interacts with the computing kernel through a session.

Basic usage of TensorFlow's mathematical operations.

import tensorflow as tf

sess = tf.Session()
a = tf.placeholder("float")
b = tf.placeholder("float")
c = tf.constant(6.0)
d = tf.mul(a, b)
y = tf.mul(d, c)
print sess.run(y, feed_dict={a: 3, b: 3})

A = [[1.1, 2.3], [3.4, 4.1]]
Y = tf.matrix_inverse(A)
print sess.run(Y)
sess.close()

The main numeric operations:

tf.add
tf.sub
tf.mul
tf.div
tf.mod
tf.abs
tf.neg
tf.sign
tf.inv
tf.square
tf.round
tf.sqrt
tf.pow
tf.exp
tf.log
tf.maximum
tf.minimum
tf.cos
tf.sin

The main matrix operations:

tf.diag               # generate a diagonal matrix
tf.transpose
tf.matmul
tf.matrix_determinant # compute the determinant
tf.matrix_inverse     # compute the inverse of a matrix
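
As a minimal sketch (using the same TF 0.x-era API as the rest of this article; the example matrix A is arbitrary), these matrix operations can be exercised like this:

import tensorflow as tf

A = tf.constant([[1.1, 2.3], [3.4, 4.1]])
with tf.Session() as sess:
    print sess.run(tf.transpose(A))           # transpose
    print sess.run(tf.matmul(A, A))           # matrix multiplication
    print sess.run(tf.matrix_determinant(A))  # determinant
    print sess.run(tf.matrix_inverse(A))      # inverse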

TensorBoard usage. TensorFlow code first builds a graph and then executes it, so debugging the intermediate steps is inconvenient; TensorBoard is the tool provided for this. During training you are prompted that event files are written to a directory (/tmp/tflearn_logs/11U8M4/). Run the command below, then open http://192.168.1.101:6006 to see the TensorBoard interface.

tensorboard --logdir=/tmp/tflearn_logs/11U8M4/

Graph and Session.

import tensorflow as tf

with tf.Graph().as_default() as g:
    with g.name_scope("myscope") as scope:
        # with this scope, the names of the ops below get a prefix like myscope/Placeholder
        sess = tf.Session(target='', graph=g, config=None)  # target is the tf execution engine to connect to
        print "graph version:", g.version  # 0
        a = tf.placeholder("float")
        print a.op  # prints the whole operation, same structure as g.get_operations returns below
        print "graph version:", g.version  # 1
        b = tf.placeholder("float")
        print "graph version:", g.version  # 2
        c = tf.placeholder("float")
        print "graph version:", g.version  # 3
        y1 = tf.mul(a, b)  # could also be written a * b
        print "graph version:", g.version  # 4
        y2 = tf.mul(y1, c)  # could also be written y1 * c
        print "graph version:", g.version  # 5
        operations = g.get_operations()
        for (i, op) in enumerate(operations):
            print "============ operation", i + 1, "==========="
            print op  # a structure containing name, op, attr, input, etc.; different for each op
        assert y1.graph is g
        assert sess.graph is g
        print "================ graph object address ================"
        print sess.graph
        print "================ graph define ================"
        print sess.graph_def
        print "================ sess str ================"
        print sess.sess_str
        print sess.run(y1, feed_dict={a: 3, b: 3})  # 9.0  feed_dict maps graph elements to values
        print sess.run(fetches=[b, y1], feed_dict={a: 3, b: 3}, options=None, run_metadata=None)  # the return value has the same shape as fetches
        print sess.run({'ret_name': y1}, feed_dict={a: 3, b: 3})  # {'ret_name': 9.0}
        assert tf.get_default_session() is not sess
        with sess.as_default():  # make sess the default session, so tf.get_default_session() is sess
            assert tf.get_default_session() is sess
        h = sess.partial_run_setup([y1, y2], [a, b, c])  # run in stages; the arguments list the fetches and feeds
        res = sess.partial_run(h, y1, feed_dict={a: 3, b: 4})  # 12.0  first stage
        res = sess.partial_run(h, y2, feed_dict={c: res})  # 144.0  second stage, using the first stage's result
        print "partial_run res:", res
        sess.close()

The TensorFlow Session is the medium between the Graph and the executor. Session.run() serializes the graph, fetches, and feed_dict into byte arrays and calls tf_session.TF_Run (see /usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py). tf_session.TF_Run in turn calls TF_Run in the _pywrap_tensorflow.so dynamic link library (the interface is in /usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py). That dynamic link library is TensorFlow's cross-language interface for Python. _pywrap_tensorflow.so and pywrap_tensorflow.py are generated automatically by SWIG: TensorFlow's core is written in C++, and the bindings for the various scripting languages are generated by SWIG.
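
To see the "serialize the graph into bytes" part concretely, here is a minimal sketch; it only exercises the protobuf serialization that a GraphDef exposes, not the internal TF_Run call path itself:

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    a = tf.placeholder("float")
    b = tf.placeholder("float")
    y = tf.mul(a, b)

graph_def = g.as_graph_def()                  # the protobuf description of the graph
graph_bytes = graph_def.SerializeToString()   # the kind of byte serialization handed down to the C++ runtime
print "serialized graph size:", len(graph_bytes), "bytes"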

Linear regression in 10 key lines of code. Solving a linear regression problem with gradient descent is the simplest introduction to TensorFlow; the key code is only about 10 lines.

# -*- coding: utf-8 -*-
import numpy as np
import tensorflow as tf

# randomly generate 1000 points around the line y = 0.1x + 0.3
num_points = 1000
vectors_set = []
for i in xrange(num_points):
    x1 = np.random.normal(0.0, 0.55)
    y1 = x1 * 0.1 + 0.3 + np.random.normal(0.0, 0.03)
    vectors_set.append([x1, y1])

# build the samples
x_data = [v[0] for v in vectors_set]
y_data = [v[1] for v in vectors_set]

# a 1-dimensional W, initialized with random values in [-1, 1]
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0), name='W')
# a 1-dimensional b, initialized to 0
b = tf.Variable(tf.zeros([1]), name='b')
# compute the estimated value y
y = W * x_data + b

# use the mean squared error between the estimate y and the actual y_data as the loss
loss = tf.reduce_mean(tf.square(y - y_data), name='loss')
# use gradient descent to optimize the parameters
optimizer = tf.train.GradientDescentOptimizer(0.5)
# training is just minimizing this error
train = optimizer.minimize(loss, name='train')

sess = tf.Session()
#print sess.graph_def

init = tf.initialize_all_variables()
sess.run(init)

# the initial W and b
print "W =", sess.run(W), "b =", sess.run(b), "loss =", sess.run(loss)
# run 20 training steps
for step in xrange(20):
    sess.run(train)
    # print the trained W and b
    print "W =", sess.run(W), "b =", sess.run(b), "loss =", sess.run(loss)
# write the summary file for tensorboard
writer = tf.train.SummaryWriter("./tmp", sess.graph)

A diagram shows how the linear regression works. To have TensorBoard read the data, run:

tensorboard --logdir=./tmp/

Open http://localhost:6006 and go to the GRAPHS tab to expand the list of key nodes. The graph is the graph structure generated by the code above; it describes the whole process of solving the linear regression problem by gradient descent, and each node corresponds to a step in the code.

Analyzing the linear regression graph in detail, start with W and b. The code applies three operations to W: Assign, read, and train. Assign assigns the value produced by random_uniform:

W = tf.Variable(tf.random_uniform([1], -1.0, 1.0), name='W')

tf.random_uniform also has its own subgraph in the graph. The read operation corresponds to:

y = W * x_data + b

train corresponds to the gradient descent training operation.

Similarly, there are three operations on b: Assign, read, and train; its Assign is initialized from zeros. Gradient descent computes update_W and update_b and uses them to update the values of W and b. update_W and update_b are each computed from three inputs: learning_rate, the current value of W or b, and gradients. The critical piece is the gradient computation, which starts from the loss:

loss = tf.reduce_mean(tf.square(y - y_data), name='loss')

The gradients node takes y - y_data as input; the x here is not x_data but a temporary constant 2, so the output is 2(y - y_data), which is simply the derivative of (y - y_data) squared. Taking 2(y - y_data) as input and running it through several more operations produces the increment update_b for parameter b; update_W is likewise computed from add_grad (based on y - y_data), W, and y. For the detailed computation see http://stackoverflow.com/questions/39580427/how-does-tensorflow-calculate-the-gradients-for-the-tf-train-gradientdescentopti. A single simple step is expanded by TensorFlow into a graph of many nodes; the individual nodes are not analyzed further here, since they merely express the same operations in graph form.
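
To see this expansion yourself, a minimal sketch (run while the sess from the linear regression code above is still open) lists every operation in the graph:

# assumes `sess` from the linear regression example above is still open
ops = sess.graph.get_operations()
print len(ops), "operations generated by ~10 lines of code"
for op in ops:
    print op.type, op.name  # e.g. Assign, random_uniform, gradients/..., GradientDescent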

TensorFlow's built-in seq2seq model is based on one-hot word embeddings: each word is represented by a single number, which expresses little about the relationships between words. word2vec embeds words as multi-dimensional vectors and does capture those relationships. Implementing the seq2seq idea with multi-dimensional word vectors is therefore expected to give higher accuracy.

Principle of the seq2seq model. Reference: Sequence to Sequence Learning with Neural Networks. Core idea: ABC is the input sentence, WXYZ is the output sentence, EOS marks the end of a sentence, and the LSTM is the training unit. The LSTM's characteristic is long short-term memory: it can decide the next word based on many previous input words. For LSTM background see http://deeplearning.net/tutorial/lstm.html. In this model the encoder and decoder share the same LSTM layer and its parameters; there are also implementations that keep them separate, such as https://github.com/farizrahman4u/seq2seq. Green is the encoder, yellow is the decoder, and the orange arrow carries the LSTM layer's state information (memory); only the encoder's state information is passed to the decoder. Each time step of the decoder takes the previous time step's output as its input. By feeding "how are you" at successive time steps, the model can output "W I am fine" word by word, where W is a special marker: the last output of the encoder and the trigger signal for the decoder. During training, the decoder's input at each time step is forced directly to "W I am fine", supplied from the training sample input X, while Y remains the predicted output "W I am fine". A model trained this way is the encoder-decoder model. When using the trained model for prediction, the previous time step's output is taken as the decoder input of the next time step, so "W I am fine" can still be produced.
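
A minimal sketch of the difference between training (teacher forcing) and prediction (feeding back the previous output), written as plain Python over lists of word vectors; decoder_step, go_vec, and answer_vecs here are hypothetical stand-ins for one LSTM time step and the GO/answer vectors, not the actual model code:

def decode_train(encoder_state, decoder_step, go_vec, answer_vecs):
    # training: the decoder input at each step is forced to the true answer (teacher forcing)
    state, outputs = encoder_state, []
    inputs = [go_vec] + answer_vecs[:-1]
    for x in inputs:
        y, state = decoder_step(x, state)
        outputs.append(y)
    return outputs

def decode_predict(encoder_state, decoder_step, go_vec, max_len):
    # prediction: each step's output becomes the next step's input
    state, outputs, x = encoder_state, [], go_vec
    for _ in range(max_len):
        y, state = decoder_step(x, state)
        outputs.append(y)
        x = y
    return outputs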

Corpus preparation. At least 3 million lines of chat corpus are used for training the word vectors and the seq2seq model; the richer the corpus, the better the quality of the trained word vectors. Word segmentation:

python word_segment.py ./corpus.raw ./corpus.segment

Turn the segmented file into question-answer pairs separated by "|":

cat ./corpus.segment | awk '{if(last!="")print last"|"$0;last=$0}' | sed 's/| /|/g' > ./corpus.segment.pair
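
For readers who prefer Python, here is a sketch equivalent to the awk pipeline above, assuming corpus.segment contains one segmented sentence per line; it pairs each line with the line that follows it:

# build question|answer pairs from consecutive lines, like the awk one-liner above
with open('./corpus.segment') as fin, open('./corpus.segment.pair', 'w') as fout:
    last = ''
    for line in fin:
        line = line.rstrip('\n')
        if last != '':
            fout.write(last + '|' + line + '\n')
        last = line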

Train the word vectors with Google's word2vec:

word2vec -train ./corpus.segment -output vectors.bin -cbow 1 -size 200 -window 8 -negative 25 -hs 0 -sample 1e-5 -threads 20 -binary 1 -iter 15

corpus.raw is the raw corpus data, and vectors.bin is the generated binary word-vector file. A loader for the generated binary word vectors is at https://github.com/warmheartli/ChatBotCourse/blob/master/word_vectors_loader.py.
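
A sketch of loading the word2vec binary format into a dict. It follows the standard word2vec -binary 1 layout (a text header with vocabulary size and dimension, then each word followed by its float32 vector); the function name, the word_vector_dict variable, and the utf-8 decoding are assumptions mirroring the loader linked above:

import struct

def load_vectors(path):
    word_vector_dict = {}
    with open(path, 'rb') as f:
        vocab_size, dim = map(int, f.readline().split())  # header: "<vocab_size> <dim>\n"
        for _ in range(vocab_size):
            # read the word up to the space separator
            word = b''
            ch = f.read(1)
            while ch != b' ':
                word += ch
                ch = f.read(1)
            vec = struct.unpack('%df' % dim, f.read(4 * dim))  # dim float32 values
            word_vector_dict[word.strip().decode('utf-8')] = list(vec)
    return word_vector_dict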

Create the model, implemented with TensorFlow plus the TFLearn library.

# First, declare variable space for the input samples, as below. self.max_seq_len is the maximum
# number of words in a segmented sentence and self.word_vec_dim is the dimension of the word vectors.
# shape says the input is an unspecified number of samples, each containing at most max_seq_len*2 words,
# with each word represented as a word_vec_dim-dimensional float vector.
input_data = tflearn.input_data(shape=[None, self.max_seq_len*2, self.word_vec_dim], dtype=tf.float32, name="XY")
# Slice out the first max_seq_len words, i.e. the question part, as the encoder input
encoder_inputs = tf.slice(input_data, [0, 0, 0], [-1, self.max_seq_len, self.word_vec_dim], name="enc_in")
# Then take the following max_seq_len-1 words, i.e. the answer part, as the decoder input.
# Only max_seq_len-1 are taken because a GO flag is prepended to tell the decoder to start decoding;
# together with go_inputs below this forms the complete decoder input
decoder_inputs_tmp = tf.slice(input_data, [0, self.max_seq_len, 0], [-1, self.max_seq_len-1, self.word_vec_dim], name="dec_in_tmp")
go_inputs = tf.ones_like(decoder_inputs_tmp)
go_inputs = tf.slice(go_inputs, [0, 0, 0], [-1, 1, self.word_vec_dim])
decoder_inputs = tf.concat(1, [go_inputs, decoder_inputs_tmp], name="dec_in")
# Encoding: the returned encoder_output_tensor is expanded into a shape like (?, 1, 200) that
# tflearn.regression can recognize; the returned states are passed to the decoder
(encoder_output_tensor, states) = tflearn.lstm(encoder_inputs, self.word_vec_dim, return_state=True, scope='encoder_lstm')
encoder_output_sequence = tf.pack([encoder_output_tensor], axis=1)
# Take the first word of decoder_inputs, i.e. GO
first_dec_input = tf.slice(decoder_inputs, [0, 0, 0], [-1, 1, self.word_vec_dim])
# Feed it into the decoder; the decoder's initial state is the states produced by the encoder.
# Note: scope='decoder_lstm' is set so the same decoder can be reused below
decoder_output_tensor = tflearn.lstm(first_dec_input, self.word_vec_dim, initial_state=states, return_seq=False, reuse=False, scope='decoder_lstm')
# Store the decoder's first output in decoder_output_sequence_list so everything can be output together at the end
decoder_output_sequence_single = tf.pack([decoder_output_tensor], axis=1)
decoder_output_sequence_list = [decoder_output_tensor]
# Loop max_seq_len-1 times, taking each word vector of decoder_inputs as the next decoder input
# and appending the result to decoder_output_sequence_list. reuse=True, scope='decoder_lstm'
# means the same lstm layer used for the first decoding step above is reused
for i in range(self.max_seq_len-1):
    next_dec_input = tf.slice(decoder_inputs, [0, i+1, 0], [-1, 1, self.word_vec_dim])
    decoder_output_tensor = tflearn.lstm(next_dec_input, self.word_vec_dim, return_seq=False, reuse=True, scope='decoder_lstm')
    decoder_output_sequence_single = tf.pack([decoder_output_tensor], axis=1)
    decoder_output_sequence_list.append(decoder_output_tensor)
# Concatenate the encoder's first output and all decoder outputs as the input of tflearn.regression
decoder_output_sequence = tf.pack(decoder_output_sequence_list, axis=1)
real_output_sequence = tf.concat(1, [encoder_output_sequence, decoder_output_sequence])
net = tflearn.regression(real_output_sequence, optimizer='sgd', learning_rate=0.1, loss='mean_square')
model = tflearn.DNN(net)

After the model is created, summarize the idea:

1) The training inputs X and Y are the encoder/decoder input and the predicted output respectively;
2) X is split into two halves: the first half is the encoder input, the second half is the decoder input;
3) During training the true sample values are used as the decoder input; in actual prediction there is no WXYZ part, and the output of the previous time step is used as the input of the next time step

Train the model. Instantiate the model and feed it data for training:

model = self.model()
model.fit(trainXY, trainY, n_epoch=1000, snapshot_epoch=False, batch_size=1)
model.save('./model/model')

trainXY and trainY are constructed by loading the corpus.

Load the word vectors into word_vector_dict, then read the corpus file, look up each word in word_vector_dict, and append its vector to question_seq and answer_seq:

def init_seq(input_file):
    """Read the segmented text file and load all word sequences
    """
    file_object = open(input_file, 'r')
    vocab_dict = {}
    while True:
        question_seq = []
        answer_seq = []
        line = file_object.readline()
        if line:
            line_pair = line.split('|')
            line_question = line_pair[0]
            line_answer = line_pair[1]
            for word in line_question.decode('utf-8').split(' '):
                if word_vector_dict.has_key(word):
                    question_seq.append(word_vector_dict[word])
            for word in line_answer.decode('utf-8').split(' '):
                if word_vector_dict.has_key(word):
                    answer_seq.append(word_vector_dict[word])
        else:
            break
        question_seqs.append(question_seq)
        answer_seqs.append(answer_seq)
    file_object.close()

From question_seqs and answer_seqs, construct trainXY and trainY:

def generate_trainig_data(self):
    xy_data = []
    y_data = []
    for i in range(len(question_seqs)):
        question_seq = question_seqs[i]
        answer_seq = answer_seqs[i]
        if len(question_seq) < self.max_seq_len and len(answer_seq) < self.max_seq_len:
            sequence_xy = [np.zeros(self.word_vec_dim)] * (self.max_seq_len-len(question_seq)) + list(reversed(question_seq))
            sequence_y = answer_seq + [np.zeros(self.word_vec_dim)] * (self.max_seq_len-len(answer_seq))
            sequence_xy = sequence_xy + sequence_y
            sequence_y = [np.ones(self.word_vec_dim)] + sequence_y
            xy_data.append(sequence_xy)
            y_data.append(sequence_y)
    return np.array(xy_data), np.array(y_data)
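
A quick sanity check of the shapes this produces (assuming, as an example, max_seq_len = 8 and word_vec_dim = 200): each trainXY row is the zero-padded, reversed question followed by the padded answer (8 + 8 = 16 steps), and each trainY row is a GO row of ones plus the padded answer (1 + 8 = 9 steps), matching the one encoder output plus eight decoder outputs of the model above.

trainXY, trainY = self.generate_trainig_data()
print trainXY.shape  # expected (num_samples, 16, 200) with max_seq_len=8, word_vec_dim=200
print trainY.shape   # expected (num_samples, 9, 200)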

Construct the training data, create the model, and train:

python my_seq2seq_v2.py train

This finally generates the ./model/model model file.

Prediction. With the trained model, input a sentence to predict the answer:

predict = model.predict(testXY)

testXY has no Y part; the output of the previous time step is used as the next input:

for i in range(self.max_seq_len-1):
    # next_dec_input = tf.slice(decoder_inputs, [0, i+1, 0], [-1, 1, self.word_vec_dim])
    # the line above (used in training) is replaced with the line below for prediction
    next_dec_input = decoder_output_sequence_single
    decoder_output_tensor = tflearn.lstm(next_dec_input, self.word_vec_dim, return_seq=False, reuse=True, scope='decoder_lstm')
    decoder_output_sequence_single = tf.pack([decoder_output_tensor], axis=1)
    decoder_output_sequence_list.append(decoder_output_tensor)

The predicted output is a multi-dimensional floating-point vector, so the corresponding word is found by cosine-similarity matching against the known word vectors. The cosine similarity matching method:

def vector2word(vector):
    max_cos = -10000
    match_word = ''
    for word in word_vector_dict:
        v = word_vector_dict[word]
        cosine = vector_cosine(vector, v)
        if cosine > max_cos:
            max_cos = cosine
            match_word = word
    return (match_word, max_cos)

The vector_cosine implementation:

def vector_cosine(v1, v2):
    if len(v1) != len(v2):
        sys.exit(1)
    sqrtlen1 = vector_sqrtlen(v1)
    sqrtlen2 = vector_sqrtlen(v2)
    value = 0
    for item1, item2 in zip(v1, v2):
        value += item1 * item2
    return value / (sqrtlen1*sqrtlen2)

def vector_sqrtlen(vector):
    len = 0
    for item in vector:
        len += item * item
    len = math.sqrt(len)
    return len
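
The same computation can be written more compactly with numpy; a sketch equivalent to vector_cosine and vector_sqrtlen above, assuming the vectors are non-zero:

import numpy as np

def vector_cosine_np(v1, v2):
    v1, v2 = np.asarray(v1), np.asarray(v2)
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))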

Prediction:

python my_seq2seq_v2.py test test.data

In the output, the first column is the cosine similarity between the predicted vector and its nearest word vector, and the third column is the Euclidean length of the predicted vector. Because max_seq_len is fixed at 8, the output sequence ends with some extra words, so set a truncation threshold based on cosine similarity or another indicator. Full code: https://github.com/warmheartli/ChatBotCourse/blob/master/chatbotv2/my_seq2seq_v2.py.
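
A minimal sketch of such truncation, using predict from the prediction snippet above and vector2word from earlier; the 0.5 threshold is an arbitrary example, not a value from the original code:

answer_words = []
for predicted_vector in predict[0]:
    word, cosine = vector2word(predicted_vector)  # nearest word and its cosine similarity
    if cosine < 0.5:  # stop once the match quality drops below the threshold
        break
    answer_words.append(word)
print ' '.join(answer_words)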

References: Natural Language Processing with Python; NLTK Essentials (building machine learning applications with NLTK and Python libraries); http://www.shareditor.com/blogshow?blogId=119; http://www.shareditor.com/blogshow?blogId=120; http://www.shareditor.com/blogshow?blogId=121

Recommendations for machine learning job opportunities in Shanghai are welcome; my WeChat ID is qingxingfengzi.