The master opens the door: 6 steps to get your first AI program running!


This article was published by Cloud Computing Foundation in the Cloud+ Community.

To download the source code, click through to the original article.

I am learning machine learning for the first time and am writing this article as a marker, hoping to clear up some doubts for those who are about to jump in. It covers some machine learning basics, from setting up the environment to running the introductory MNIST demos.

Content outline:

Environment setup
Understand how TensorFlow works
MNIST (handwritten digit recognition) Softmax linear regression
MNIST deep convolutional neural network (CNN)
Tool class
CPU & GPU & multi-GPU
Learning materials

1 Environment Setup (Windows)

Install Anaconda, a virtual environment manager, to make Python package management and environment isolation easier.

Anaconda3 4.2, www.anaconda.com/downloads, ships with Python 3.5.

Create an isolated TensorFlow environment. Open Anaconda Prompt and run the following commands:

conda create -n tensorflow python=3.5  # create a virtual environment named tensorflow with Python 3.5
activate tensorflow  # activate the environment
deactivate  # exit the current virtual environment; you don't need to run this now

CPU version

pip install tensorflow  # install via the package manager
pip install whl-file  # or install the TensorFlow CPU package from a downloaded WHL file: http://mirrors.oa.com/tensorflow/windows/cpu/tensorflow-1.2.1-cp35-cp35m-win_amd64.whl (cp35 refers to Python 3.5)

GPU version. My laptop has an NVIDIA graphics card, so CUDA can be installed. A GPU is much faster than a CPU, but the laptop's video memory is small, so only small models can run on it. I suggest running large models locally on the CPU and training them on the Tesla platform.

Note: Choose matching versions of CUDA and cuDNN rather than simply the latest ones, which TensorFlow may not support.

TensorFlow now supports CUDA 9 & cuDNN 7. I installed CUDA 8 & cuDNN 6.

CUDA 8.0: developer.nvidia.com/cuda-80-ga2…

cuDNN 6: developer.nvidia.com/cudnn. Unpack the cuDNN archive and put the files in the corresponding directories of the CUDA installation, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0: bin goes into bin, include goes into include. Then add the bin directory to the PATH environment variable.

pip install tensorflow-gpu  # install via the package manager
pip install whl-file  # http://mirrors.oa.com/tensorflow/windows/gpu/tensorflow_gpu-1.2.1-cp35-cp35m-win_amd64.whl

Install some Python toolkits. A plain pip install is all that is needed.

(tensorflow) D:\> pip install opencv-python  # OpenCV, inside the tensorflow virtual environment
(tensorflow) D:\> pip install scipy  # image reading and writing, scipy.misc.imread
(tensorflow) D:\> pip install Pillow  # PIL/Pillow; pitfall: in the 1.x versions, compressing PNG images degrades the transparent channel, so upgrade

2 Understand How TensorFlow Works

See the code below, and pay attention to the comments.

import tensorflow as tf

hello_world = tf.constant('Hello World!', dtype=tf.string)  # constant tensor
print(hello_world)  # hello_world is still a tensor here; getting its value requires running an operation
# out: Tensor("Const:0", shape=(), dtype=string)

hello = tf.placeholder(dtype=tf.string, shape=[None])  # placeholder tensor, assigned at sess.run
world = tf.placeholder(dtype=tf.string, shape=[None])
hello_world2 = hello + world  # add tensors
print(hello_world2)
# out: Tensor("add:0", shape=(?,), dtype=string)

# math
x = tf.Variable([1.0, 2.0])  # variable tensor
y = tf.constant([3.0, 3.0])
mul = tf.multiply(x, y)  # element-wise product

# logic
rgb = tf.constant([[[255], [0], [126]]], dtype=tf.float32)
logical = tf.logical_or(tf.greater(rgb, 250.), tf.less(rgb, 5.))  # rgb > 250 or rgb < 5 is marked True, everything else False
where = tf.where(logical, tf.fill(tf.shape(rgb), 1.), tf.fill(tf.shape(rgb), 5.))  # True positions get 1, False positions get 5

# Launch the default graph.
# sess = tf.Session()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # variable initialization
    result = sess.run(hello_world)  # fetch: get the tensor's value
    print(result, result.decode(), hello_world.eval())  # `t.eval()` is a shortcut for calling `tf.get_default_session().run(t)`
    # out: b'Hello World!' Hello World! b'Hello World!'  # the leading 'b' marks bytes; decode() converts to str
    print(sess.run(hello, feed_dict={hello: ['Hello']}))
    # out: [b'Hello']
    print(sess.run(hello_world2, feed_dict={hello: ['Hello'], world: [' World!']}))  # feed: assign the placeholders
    # out: [b'Hello World!']
    print(sess.run(mul))
    # out: [ 3. 6.]
    print(sess.run(logical))
    # out: [[[ True] [ True] [False]]]  # positions with rgb > 250 or < 5 are True, the rest False
    print(sess.run(where))
    # out: [[[ 1.] [ 1.] [ 5.]]]  # True positions are 1, False positions are 5
# sess.close()  # a session not created in a with block must be closed explicitly

To get a tensor's value, use Session.run(tensor) or tensor.eval(). A Tensor is a symbolic handle to one of the outputs of an Operation. It does not hold the values of that operation's output, but instead provides a means of computing those values in a TensorFlow Session.
TensorFlow workflow: define the computation logic and build the Graph => run it through a Session to obtain the result data. See the links for basic usage.

3 MNIST (handwritten digit recognition) Softmax linear regression

Analysis

MNIST is an entry-level computer vision dataset that contains a variety of handwritten digit images:

It also contains a label for each image that tells us which digit it is. For example, the four images above are labeled 5, 0, 4, 1.

The dataset images are 28×28, single-channel grayscale. They are stored as follows:

MNIST handwritten digit recognition takes such a 28×28 image of a handwritten digit as input and predicts the digit it contains.

Softmax linear regression assumes that every pixel in the image contributes evidence for the image showing a particular digit i. The evidence is computed for all digits (0-9), giving a confidence for each digit, and the most likely digit is taken as the predicted value.

Evidence can be calculated as follows:

$$\text{evidence}_i = \sum_j W_{i,j} x_j + b_i$$

where $W_{i,j}$ is the weight, $b_i$ is the bias of digit class i, and j is the pixel index of the given image x (0 to 783, since 28×28 = 784) used for the pixel summation. That is, each pixel value of the image is multiplied by its weight, the products are summed, and a bias b is added to obtain the evidence value.

The purpose of introducing softmax is to normalize the evidence so that the values sum to 1. This converts them into probabilities y:

$$y = \text{softmax}(\text{evidence}), \qquad \text{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$
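To make the evidence and softmax steps concrete, here is a minimal pure-Python sketch (no TensorFlow). The 3-pixel "image", the weights, and the function names are made up for illustration; only the arithmetic follows the description above.

```python
import math

def evidence(x, W, b):
    """evidence_i = sum_j W[i][j] * x[j] + b[i] for each class i."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(W_i, x)) + b_i
            for W_i, b_i in zip(W, b)]

def softmax(values):
    """Normalize arbitrary scores into probabilities that sum to 1."""
    m = max(values)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy input: 3 pixels, 2 classes (MNIST would use 784 pixels, 10 classes).
x = [0.0, 0.5, 1.0]
W = [[1.0, 0.0, -1.0],   # weights for class 0
     [0.0, 1.0, 2.0]]    # weights for class 1
b = [0.1, -0.1]

ev = evidence(x, W, b)
y = softmax(ev)
predicted = y.index(max(y))  # the most likely class is the prediction
print(ev, y, predicted)
```

The class with the largest evidence also has the largest probability, so softmax changes the scale of the scores but not the predicted digit.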

Implementation

Data

Each X sample has size 28×28 = 784

A Y sample looks like this

Reading the data

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)  # 55000 training samples, one-hot labels; each image X is a 1-D array of size 784
batch_xs, batch_ys = mnist.train.next_batch(batch_size)  # read one batch

Build the Graph

Inference: from the input X to the predicted output Y

x = tf.placeholder(tf.float32, [None, 784], name="input")  # None means the batch size is determined at run time
y_ = tf.placeholder(tf.float32, [None, 10], name="labels")  # one-hot labels; this placeholder is not shown in the original excerpt and is assumed here
with tf.variable_scope("inference"):  # inference
    W = tf.Variable(tf.zeros([784, 10]))  # initial value 0, size 784x10
    b = tf.Variable(tf.zeros([10]))  # initial value 0, size 10
    y = tf.matmul(x, W) + b  # matrix multiplication

Loss. Classification generally uses cross entropy as the loss function; here softmax cross entropy is used. Cross entropy measures the difference between two probability distributions. With one-hot label z and logits x, the cross entropy is:

$$H(z, x) = -\sum_i z_i \log\left(\text{softmax}(x)_i\right)$$

with tf.variable_scope("loss"):
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y), name="loss")
    # Softmax cross entropy per sample: -sum_i z_i * log(softmax(x)_i)  # x: logits, z: label
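As a rough check of what the loss measures, the following pure-Python sketch computes softmax cross entropy for a single sample. The label and logits are hypothetical; the helper names are mine and are not part of TensorFlow.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(labels, logits):
    """-sum_i labels[i] * log(softmax(logits)[i]) for one sample."""
    probs = softmax(logits)
    # Terms with a zero label contribute nothing, so they are skipped.
    return -sum(z * math.log(p) for z, p in zip(labels, probs) if z > 0)

# Hypothetical one-hot label (class 2 of 3) and two candidate predictions.
label = [0.0, 0.0, 1.0]
good_logits = [0.1, 0.2, 3.0]   # confident in the correct class
bad_logits = [3.0, 0.2, 0.1]    # confident in a wrong class

good_loss = softmax_cross_entropy(label, good_logits)
bad_loss = softmax_cross_entropy(label, bad_logits)
print(good_loss, bad_loss)  # the correct prediction gives the smaller loss
```

With all-zero logits the prediction is uniform over the classes, and the loss equals log(number of classes), which is a useful sanity value for an untrained network.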

There are many ways to calculate loss, including L1 loss, L2 loss, sigmoid cross entropy, combined losses, custom losses…

Accuracy: the fraction of predictions that match the true values. The y output by the matrix multiplication is an array. The tf.argmax function finds the index of the largest element. If the index of the largest predicted value equals the index of the largest true value, the prediction is correct.

with tf.variable_scope("accuracy"):
    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)), tf.float32), name="accuracy")
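The same argmax comparison can be sketched in plain Python. The batch below is invented for illustration, and `argmax` and `accuracy` are local helpers rather than TensorFlow functions.

```python
def argmax(values):
    """Index of the largest element (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, labels):
    """Fraction of samples whose predicted class matches the one-hot label."""
    correct = [1.0 if argmax(p) == argmax(t) else 0.0
               for p, t in zip(predictions, labels)]
    return sum(correct) / len(correct)

# Hypothetical batch of 4 samples with 3 classes each.
y = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
y_ = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]

acc = accuracy(y, y_)
print(acc)  # 0.75: the third sample is predicted as class 2 but labeled class 1
```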

Training. The goal of training is to drive the loss toward its minimum, so that the predicted value approaches the true value. TensorFlow implements this through optimizers. In y = Wx+b, W and b are given initial values (random or 0) at the start of training. As the optimizer runs, the loss approaches its minimum and W and b steadily approach their ideal values. W and b together hold 784×10+10 = 7850 parameters.

train_step = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(loss)

The minimize function updates the parameters to make the loss smaller. It consists of two steps: computing the gradients and updating the parameters.

optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
grad_var = optimizer.compute_gradients(loss)  # returns (gradient, variable) pairs
train_step = optimizer.apply_gradients(grad_var)  # update the parameters against the gradient direction to make the loss smaller
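The two steps can be illustrated without TensorFlow on a tiny linear model. This is only a sketch under assumed data (points on y = 2x + 1) with hand-derived gradients; the function names mirror the two steps but are not the TensorFlow API.

```python
def compute_gradients(w, b, xs, ys):
    """Gradients of the mean squared error of y = w*x + b with respect to w and b."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b

def apply_gradients(w, b, grads, learning_rate):
    """Move each parameter against its gradient."""
    grad_w, grad_b = grads
    return w - learning_rate * grad_w, b - learning_rate * grad_b

def mse(w, b, xs, ys):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0  # zero initial values, like W and b in the inference code
initial_loss = mse(w, b, xs, ys)
for step in range(2000):
    grads = compute_gradients(w, b, xs, ys)
    w, b = apply_gradients(w, b, grads, learning_rate=0.05)
final_loss = mse(w, b, xs, ys)
print(w, b, initial_loss, final_loss)
```

After enough steps w and b settle near 2 and 1. A learning rate that is too large would make the updates overshoot and diverge instead.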

GradientDescentOptimizer is TensorFlow's implementation of SGD (stochastic gradient descent). Its disadvantage is that each update depends on the current batch, so it fluctuates considerably.

Other optimizers (see links): MomentumOptimizer, AdadeltaOptimizer, AdamOptimizer, RMSPropOptimizer…

Session: TensorFlow needs a Session to run the graph. There are two ways to create one. InteractiveSession installs itself as the default session, and with a default session tensor.eval() can be called.

sess = tf.Session()
sess = tf.InteractiveSession()

You can also set the default session by:

with sess.as_default(): xx.eval()
with tf.Session() as sess: xx.eval()

Configure GPU-related session parameters:

sess_config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)  # allow falling back to another device when the requested GPU is missing or unavailable
sess_config.gpu_options.allow_growth = True  # request video memory on demand; without this, all memory is claimed and other training programs cannot run
# sess_config.gpu_options.per_process_gpu_memory_fraction = 0.8
sess = tf.InteractiveSession(config=sess_config)

Training a network is an iterative (forward + backward) process. The forward pass computes the output of each layer from front to back, and the backward pass computes the parameter gradients from back to front in order to optimize the parameters and reduce the loss. The process is shown as follows:

Note: Print the network's loss and accuracy at regular intervals to check progress. Save the network parameters at regular intervals so that training can be resumed after an unexpected interruption.

Saving and restoring model parameters:

Checkpoint: the default saving mode.

pb: used on mobile.

npz: dictionary saving mode, {name: value}, NumPy's format. Parameters are saved by name and restored by name. You control saving and restoring yourself, so parameters can be saved and restored selectively. See save_npz_dict & load_and_assign_npz_dict in the attached code [tools.py].

saver = tf.train.Saver(max_to_keep=3, write_version=2)
save_path = saver.save(sess, FLAGS.out_model_dir + '/model.ckpt')  # checkpoint
output_graph_def = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, output_node_names=['output'])  # specify the output node name, which must be defined in the network
with tf.gfile.FastGFile(FLAGS.out_model_dir + '/mobile-model.pb', mode='wb') as f:
    f.write(output_graph_def.SerializeToString())  # pb format
tools.save_npz_dict(save_list=tf.global_variables(), name=FLAGS.out_model_dir + '/model.npz', sess=sess)  # npz format

Restoring:

# checkpoint
saver = tf.train.Saver(max_to_keep=3, write_version=2)
model_file = tf.train.latest_checkpoint(FLAGS.log_dir)
if model_file:
    saver.restore(sess, model_file)
# npz
tools.load_and_assign_npz_dict(name=FLAGS.log_dir + '/model.npz', sess=sess)

Print information about the network parameters to check that they are correct:

def print_all_variables(train_only=False):
    if train_only:
        t_vars = tf.trainable_variables()
        print(" [*] printing trainable variables")
    else:
        t_vars = tf.global_variables()
        print(" [*] printing global variables")
    for idx, v in enumerate(t_vars):
        print(" var {:3}: {:15} {}".format(idx, str(v.get_shape()), v.name))

Visualization. TensorFlow provides the TensorBoard visualization tool. Run the command below to start a web service, then open http://localhost:6006 in a browser to view it.

tensorboard --logdir=your-log-path  # do not use Chinese characters in the path
# The log path must also be specified during training so that the relevant information is written
# See the summary- and writer-related code in the attachment [sample.py].

Graph visualization:

Visualization of the training process:

Batch size = 128; training set and validation set. The loss is converging and the accuracy is improving. The training-set curve reflects the loss and accuracy of the current batch only; since the batch size is relatively small, it jitters a lot. The validation set is evaluated on all of its images, so its curve is smooth.

4 MNIST Deep Convolutional Neural Network (CNN)

In the softmax linear regression network, the output Y is a linear combination of the input X, i.e. y = Wx+b. Many problems cannot be solved by a linear relationship. In deep learning, a multi-layer convolutional neural network combined with nonlinear activation functions can model more complex nonlinear relationships, and it often performs better than a single linear relationship. We first look at the MNIST prediction model built with a CNN (Convolutional Neural Network), and then introduce each network layer one by one.

MNIST CNN inference diagram. Several network layers sit between the input and the output: reshape, conv (convolution), pool (pooling), fc (full connection), and dropout. The original image data X is fed in at the bottom and processed by each layer in turn, yielding the predicted probability Y for each digit class. The inference result is passed to the loss for iterative training, as shown in the figure.

You can see that AdamOptimizer is used.

Reshape changes the logical structure of the data, e.g. [1, 784] => [1, 28, 28, 1]. Since each handwritten image in the input data is stored as one-dimensional data, it is reshaped to [batch_size, height, width, channels].

    with tf.name_scope('reshape'):  # scope
        inputs = tf.reshape(inputs, [-1, 28, 28, 1])
        # [batch_size, height, width, channels]; batch_size=-1 means it is inferred from the size of inputs
        # batch_size = inputs_size / (28x28x1)
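What the -1 does can be seen in a small pure-Python sketch that reshapes a flat list into [batch, height, width, channels]. The function `reshape_nhwc` and the fake pixel values are assumptions for illustration, not part of TensorFlow.

```python
def reshape_nhwc(flat, height, width, channels):
    """Reshape a flat list into [batch, height, width, channels]; batch is inferred."""
    per_image = height * width * channels
    if len(flat) % per_image != 0:
        raise ValueError("input size is not a multiple of height*width*channels")
    batch = len(flat) // per_image  # this is what -1 stands for in tf.reshape
    it = iter(flat)
    return [[[[next(it) for _ in range(channels)]
              for _ in range(width)]
             for _ in range(height)]
            for _ in range(batch)]

# Two hypothetical flattened 28x28 grayscale images (the values are just indices).
flat = list(range(2 * 784))
images = reshape_nhwc(flat, 28, 28, 1)
shape = (len(images), len(images[0]), len(images[0][0]), len(images[0][0][0]))
print(shape)  # (2, 28, 28, 1)
```

Only the logical structure changes: the elements keep their row-major order, so pixel 28 of the flat array becomes the first pixel of the second row.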

conv2d convolution: the convolution kernel (yellow) is multiplied element-wise with the image patch (green), and the products are summed to give the output element (red). Each channel of the image has its own convolution kernel, and the kernel's parameters are shared within the channel. The products of all input channels with their kernels are summed to give one channel of the output. If the output has several channels, this is repeated once per output channel, each time with different kernels. So there are input_channel_count * output_channel_count convolution kernels in total. What the convolution layer learns during training is these kernels.

def conv2d(x, W): #W: filter [kernel[0], kernel[1], in_channels, out_channels]
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

tf.nn.conv2d:

data_format: the logical layout of the input and output data. NHWC: batch, height, width, channel. NCHW: batch, channel, height, width. NHWC is the commonly used format. Some input data stores each channel separately, which suits NCHW better.

input: the input tensor; with data_format=NHWC its shape is [batch, in_height, in_width, in_channels].

filter: the convolution kernel, with shape [filter_height, filter_width, in_channels, out_channels]. There are in_channels*out_channels kernels of size (filter_height, filter_width). The more input and output channels, the more computation.

strides: [1, stride_h, stride_w, 1]. Usually stride_h and stride_w are equal; they give the step the kernel moves each time in the vertical and horizontal directions. The stride in the GIF is 1.

padding: how to handle the edges when the data does not align with the kernel. VALID: discard the excess. SAME: pad with zeros at both ends so that the excess is also computed.

output: the shape is [batch, out_height, out_width, out_channels].

output[b, i, j, k] =
    sum_{di, dj, q} input[b, strides[1] * i + di, strides[2] * j + dj, q] *
                    filter[di, dj, q, k]
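The output formula can be tried on a tiny example. The sketch below is a naive pure-Python version for one image with VALID padding only, plus a helper for the output size under both padding modes; the names `conv_output_size` and `conv2d_valid` and the toy data are my own assumptions, and this is far slower than the real op.

```python
import math

def conv_output_size(in_size, kernel_size, stride, padding):
    """Output height or width of a convolution for 'SAME' or 'VALID' padding."""
    if padding == 'SAME':
        return math.ceil(in_size / stride)
    if padding == 'VALID':
        return math.ceil((in_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

def conv2d_valid(image, kernel, stride=1):
    """Naive VALID convolution for one image.

    image:  [in_height][in_width][in_channels]
    kernel: [k_height][k_width][in_channels][out_channels]
    """
    kh, kw = len(kernel), len(kernel[0])
    in_ch, out_ch = len(kernel[0][0]), len(kernel[0][0][0])
    out_h = conv_output_size(len(image), kh, stride, 'VALID')
    out_w = conv_output_size(len(image[0]), kw, stride, 'VALID')
    return [[[sum(image[stride * i + di][stride * j + dj][q] * kernel[di][dj][q][k]
                  for di in range(kh) for dj in range(kw) for q in range(in_ch))
              for k in range(out_ch)]
             for j in range(out_w)]
            for i in range(out_h)]

# Hypothetical 3x3 single-channel image and a 2x2 kernel that sums its window.
image = [[[1], [2], [3]],
         [[4], [5], [6]],
         [[7], [8], [9]]]
kernel = [[[[1]], [[1]]],
          [[[1]], [[1]]]]

result = conv2d_valid(image, kernel)
print(result)                              # [[[12], [16]], [[24], [28]]]
print(conv_output_size(28, 3, 1, 'SAME'))  # 28: SAME with stride 1 keeps the size
```

With SAME padding the output size depends only on the input size and the stride, which is why a stride-1 SAME convolution keeps a 28×28 image at 28×28.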

Activation function, used together with convolution. An activation function does not really "activate" anything. In a neural network its role is to add nonlinearity, so that the network can solve more complex problems.

tf.nn.relu is the activation function used here. It applies a nonlinearity to the convolution output:

$$\text{relu}(x) = \max(0, x)$$

Others include sigmoid:

$$\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

tanh:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
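For a quick feel of how the three activation functions behave, here is a small pure-Python sketch using the standard definitions. The sample inputs are arbitrary.

```python
import math

def relu(x):
    """Pass positive values through, clamp negative values to 0."""
    return max(0.0, x)

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squash any real number into the range (-1, 1)."""
    return math.tanh(x)

for v in (-2.0, 0.0, 2.0):
    print(v, relu(v), sigmoid(v), tanh(v))
```

relu is unbounded above and exactly zero for negative inputs, while sigmoid and tanh saturate at both ends.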

Pool: pooling, including max pooling and average pooling. The computation is similar to convolution but has no kernel weights: the maximum or average of the region covered by the window is computed. Each input channel maps to its own output channel, with no summation across channels, so the output has the same number of channels as the input. The output height and width depend on the strides.

if is_max_pool:
    x = tf.nn.max_pool(x, [1,kernel[0],kernel[1],1], strides=[1,stride[0],stride[1],1], padding=padding, name='pool')
else:
    x = tf.nn.avg_pool(x, [1,kernel[0],kernel[1],1], strides=[1,stride[0],stride[1],1], padding=padding, name='pool')
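A minimal sketch of both pooling modes on a single-channel 4×4 image, in pure Python. It assumes a square window, VALID padding, and a made-up input; `pool2d` is a local helper, not the TensorFlow function.

```python
def pool2d(image, kernel, stride, is_max_pool=True):
    """VALID pooling over one single-channel image given as a list of rows."""
    out_h = (len(image) - kernel) // stride + 1
    out_w = (len(image[0]) - kernel) // stride + 1
    reduce_fn = max if is_max_pool else (lambda window: sum(window) / len(window))
    return [[reduce_fn([image[stride * i + di][stride * j + dj]
                        for di in range(kernel) for dj in range(kernel)])
             for j in range(out_w)]
            for i in range(out_h)]

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]

max_pooled = pool2d(image, kernel=2, stride=2, is_max_pool=True)
avg_pooled = pool2d(image, kernel=2, stride=2, is_max_pool=False)
print(max_pooled)  # [[6, 8], [14, 16]]
print(avg_pooled)  # [[3.5, 5.5], [11.5, 13.5]]
```

A 2×2 window with stride 2 halves the height and width, which matches the 28 -> 14 -> 7 reduction in the MNIST network.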

Dropout randomly drops data during training so that the network must still produce accurate results from what remains. This makes the network more adaptable and reduces overfitting.

x = tf.nn.dropout(x, keep_prob)  # keep_prob is the fraction kept; keep_prob=1.0 means no dropout
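The effect of keep_prob can be sketched in pure Python. The sketch assumes the usual behavior of scaling kept elements by 1/keep_prob so the expected sum stays the same; the helper name, the seed, and the input are illustrative only.

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each element with probability keep_prob and scale kept ones by 1/keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
x = [1.0] * 10000

dropped = dropout(x, keep_prob=0.5, rng=rng)
kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(kept_fraction, mean_after)  # roughly 0.5 and roughly 1.0

unchanged = dropout(x, keep_prob=1.0, rng=rng)  # keep_prob=1.0 means no dropout
```

Dropout is normally applied only during training; at evaluation time keep_prob is set to 1.0 so that nothing is dropped.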

Batch normalization (BN). It is not listed in the inference diagram and is not used in the demo, but it is a layer commonly used in networks. BN usually acts before the nonlinear mapping, i.e. it normalizes the conv results. The usual order is convolution -> BN -> activation function.

Benefits of BN: faster training, faster loss convergence, better network adaptability, and, to a certain extent, relief of the vanishing and exploding gradient problems in back propagation. See the link for details.
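As a rough sketch of the normalization step, the pure-Python code below normalizes one channel across a batch and then applies relu, following the convolution -> BN -> activation order. It covers the training-time computation only and leaves out the moving averages used at inference; gamma, beta, eps, and the sample values are assumptions.

```python
import math

def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalars to zero mean and unit variance, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]

# Hypothetical conv outputs for one channel across a batch of 4.
conv_out = [10.0, 12.0, 14.0, 16.0]

normalized = batch_normalize(conv_out)
activated = [max(0.0, v) for v in normalized]  # convolution -> BN -> activation
print(normalized, activated)
```

Whatever the scale of the conv outputs, the normalized values are centered on 0 with a spread close to 1, which keeps the inputs of the activation function in a stable range.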

The core operation of a fully connected (FC) layer is matrix multiplication.

Softmax linear regression is itself an FC layer. In a CNN, full connection usually appears in the last few layers, where it forms a weighted sum of the features produced by the earlier layers. TensorFlow provides the corresponding function tf.layers.dense.

The following figure shows the shapes of the parameters to be trained in the model and the shape of each layer's output data (with batch_size=1). The relevant code is in the attachment [tool.py]. Its purpose is to make it easy to check whether the network you built matches expectations. Data shape flow: [1x784] -reshape-> [1x28x28x1] (batch_size, height, width, channels) -conv-> [1x28x28x32] -pool-> [1x14x14x32] -conv-> [1x14x14x64] -pool-> [1x7x7x64] -fc-> [1×1024] -fc-> [1×10] (probability of each digit class)

For the training results and detailed code, see the attachment [cnn.py].

A web-based visual handwriting recognition demo: scs.ryerson.ca/~aharley/vi…
Classic networks of the CNN family include LeNet, AlexNet, VGG-Net, GoogLeNet, ResNet, U-Net, and FPN. They too are built from the basic network layers described above, stacked like building blocks.

VGG, pictured below, is a well-known feature extraction and classification network. It consists of multiple convolution and pooling layers and ends with FC layers that fuse the features to perform classification. Many networks reuse the earlier convolution and pooling layers for feature extraction and then build their own tasks on top.

5 Tool class

[tool.py] is a utility class I wrapped myself on top of TensorFlow; it is attached. The benefit is that later programming is more convenient and the code is better structured. There are also ready-made open source libraries, such as TensorLayer, Keras and TFLearn. The purpose of wrapping it myself is to understand the TensorFlow API better, and a self-made wrapper is more controllable, for example in controlling whether parameters are trained and what logs are printed.

The inference code of the MNIST CNN network is shown below:

6 CPU & GPU & multi GPU

CPU: TensorFlow exposes all CPUs as /cpu:0 and uses all of them by default. The number used can be set in code.

sess_config = tf.ConfigProto(device_count={"CPU": 14}, allow_soft_placement=True, log_device_placement=False)
sess_config.intra_op_parallelism_threads = 56
sess_config.inter_op_parallelism_threads = 56
sess = tf.InteractiveSession(config=sess_config)

GPU: TensorFlow uses /gpu:0 by default. You can specify a device to decide which GPU the code runs on, as follows:

with tf.device('/device:GPU:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# The following configuration keeps the process from occupying the whole GPU; it takes only as much memory as it uses
sess_config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
sess_config.gpu_options.allow_growth = True
sess_config.gpu_options.per_process_gpu_memory_fraction = 0.8
sess = tf.InteractiveSession(config=sess_config)

With multiple GPUs, you can control which GPUs your program uses by running the following command in the terminal, which sets the GPUs visible to CUDA.

export CUDA_VISIBLE_DEVICES=2,3
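If setting the variable in the shell is inconvenient, the same environment variable can be set from Python. This is a small sketch that assumes it runs before TensorFlow (or any other CUDA library) initializes the GPUs; it only touches the process environment.

```python
import os

# Must be set before TensorFlow (or any CUDA library) initializes the GPUs.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(visible)  # ['2', '3']
```

Inside the process the visible GPUs are renumbered from 0, so physical GPUs 2 and 3 appear as /gpu:0 and /gpu:1.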

In TensorFlow, multi-GPU programming is rather awkward, and the code is more complex to write than Caffe's.

In TensorFlow you need to write your own code to control how the loss and gradients from multiple GPUs are merged. Here is an official example. [tmp-main-gpus- not available. Py] can be used for reference, but it is not available here.
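One common way to merge gradients is to average, per variable, the gradients computed on each GPU. The pure-Python sketch below shows only that averaging step with made-up scalar gradients; it is not the attached file or the official example, and `average_gradients` is a name chosen for illustration.

```python
def average_gradients(tower_grads):
    """Average per-variable gradients computed on several GPUs.

    tower_grads: one list per GPU, each holding (gradient, variable_name) pairs
    in the same variable order.
    """
    averaged = []
    for pairs in zip(*tower_grads):  # the pairs for one variable across all GPUs
        grads = [g for g, _ in pairs]
        name = pairs[0][1]
        averaged.append((sum(grads) / len(grads), name))
    return averaged

# Hypothetical scalar gradients from two GPUs for variables "W" and "b".
tower_grads = [[(0.2, "W"), (1.0, "b")],
               [(0.4, "W"), (3.0, "b")]]

merged = average_gradients(tower_grads)
print(merged)
```

In a real multi-GPU setup each GPU would process its own slice of the batch, and the averaged (gradient, variable) pairs would then be applied once to the shared parameters.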

7 Learning Materials

I have collected some machine learning materials to share. I have only read a very small part of them and am still learning.

Google's recently released Machine Learning Crash Course developers.google.cn/machine-lea…
Machine Learning by Andrew Ng open.163.com/special/ope…
I Love Machine Learning blog www.52ml.net/
XiaoLei's machine learning notes zhuanlan.zhihu.com/xiaoleimlno…
Lilicao blog www.cnblogs.com/lillylin/
Cats and Dogs fight Zhihu column zhuanlan.zhihu.com/alpha-smart…
Computer vision and machine learning source collection zhuanlan.zhihu.com/p/26691794
Object detection collection handong1587.github.io/deep_learni…
Machine Learning Notes feisky.xyz/machine-lea…


This article has been authorized by the author for publication in the Tencent Cloud+ Community. For more, see the original post.
