The wave of artificial intelligence is sweeping the world, and terms such as artificial intelligence, machine learning and deep learning are constantly in our ears. The concept of “artificial intelligence” was first proposed as early as 1956: the idea of using computers to build complex machines with the same essential properties as human intelligence. After decades of development, artificial intelligence took off after 2012, thanks to growing data volumes, improved computing power and the emergence of new machine learning algorithms (deep learning). Current research, however, focuses on weak artificial intelligence, which enables machines to perceive the world (to see and hear) and to understand and reason to a certain degree, and major breakthroughs are expected in this area. Most of the artificial intelligence shown in movies is strong artificial intelligence, that is, machines that can adapt and solve problems they have never encountered before, which is still difficult to realize in the real world.

If there is hope for a breakthrough in artificial intelligence, how will it be achieved, and where does the “intelligence” come from? The answer lies largely in a way of implementing artificial intelligence called machine learning.

1. Machine learning concepts

Machine learning is an approach to artificial intelligence.

At its most basic, machine learning uses algorithms to parse data, learn from it, and then make decisions and predictions about events in the real world. Unlike traditional software that is hard-coded to solve a specific task, machine learning is “trained” on large amounts of data, using algorithms to learn from the data how to perform the task. Machine learning grew out of the early field of artificial intelligence. Traditional algorithms include decision trees, clustering, Bayesian classification, support vector machines, EM, AdaBoost and so on. In terms of learning method, machine learning algorithms can be divided into supervised learning (e.g. classification problems), unsupervised learning (e.g. clustering problems), semi-supervised learning, ensemble learning, deep learning and reinforcement learning.
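To make “learning from data” and the supervised/unsupervised split concrete, here is a minimal NumPy sketch on hypothetical toy data (two groups of 2-D points). The supervised part is given the labels and learns one centroid per class, then classifies by nearest centroid; the unsupervised part gets no labels and runs a few iterations of k-means to find the groups on its own. The data, the helper names and the choice of initial k-means centers are assumptions for illustration only, not a production implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two groups of 2-D points around (0, 0) and (3, 3)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(3.0, 0.5, size=(50, 2))])
labels = np.array([0] * 50 + [1] * 50)


def nearest(centers, X):
    """Index of the closest center for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Supervised learning (classification): labels are given,
# so the model learns one centroid per class from the data.
centroids = np.array([X[labels == c].mean(axis=0) for c in (0, 1)])
accuracy = np.mean(nearest(centroids, X) == labels)


# Unsupervised learning (clustering): no labels; k-means alternates between
# assigning points to the nearest center and moving each center to its mean.
def kmeans(X, init_idx, iters=10):
    centers = X[list(init_idx)].copy()
    for _ in range(iters):
        assign = nearest(centers, X)
        centers = np.array([X[assign == j].mean(axis=0)
                            for j in range(len(centers))])
    return centers, assign


# Simple deterministic initialization for this toy: first and last point
centers, clusters = kmeans(X, init_idx=(0, len(X) - 1))
print("supervised accuracy:", accuracy)
print("cluster sizes:", np.bincount(clusters))
```

In both cases nothing about the two groups is hard-coded: the centroids and cluster centers are computed from the data.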

Traditional machine learning algorithms applied in fields such as fingerprint recognition, Haar-feature-based face detection and HoG-feature-based object detection have basically reached commercial requirements, or commercial quality in specific scenarios, but every step forward was extremely difficult until deep learning algorithms emerged.

2. Concept of deep learning

Deep learning is a technique for implementing machine learning.

It is not an independent learning method; it also uses supervised and unsupervised learning to train deep neural networks. However, because the field has developed rapidly in recent years, some distinctive methods (such as residual networks) have been proposed, so more and more people regard it as a learning method in its own right.

Deep learning originally referred to a learning process that uses deep neural networks to learn feature representations. The deep neural network itself is not a new concept; it can be roughly understood as a neural network structure with multiple hidden layers. To make deep neural networks train better, the way neurons are connected and the activation functions have been adjusted accordingly. In fact, many of these ideas were put forward years ago, but because of the lack of training data and limited computing power, the results were unsatisfactory. Although deep learning is currently the hottest machine learning method, it is not the end of machine learning. It has at least the following problems:

1. Deep learning models need large amounts of training data to perform well, but real-world problems often come with only small samples. In such cases deep learning methods cannot get started, while traditional machine learning methods can still cope.

2. In some fields, simple traditional machine learning methods already solve the problem well, and complex deep learning methods are unnecessary.
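A “deep” neural network in the sense described above is simply several hidden layers stacked between the input and the output. The NumPy sketch below shows only the forward pass of a fully connected network with three hidden layers. The layer sizes are arbitrary, and the ReLU activation with He-style random initialization are assumed here as common modern choices; the article does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def init_mlp(layer_sizes):
    """Random (weights, bias) pairs for a fully connected network, e.g. [4, 16, 16, 16, 3]."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]


def forward(params, x):
    """Each hidden layer is an affine map followed by ReLU; the output layer is linear."""
    *hidden, (w_out, b_out) = params
    h = x
    for w, b in hidden:
        h = relu(h @ w + b)
    return h @ w_out + b_out


params = init_mlp([4, 16, 16, 16, 3])   # input 4, three hidden layers of 16, output 3
out = forward(params, rng.normal(size=(5, 4)))
print(len(params) - 1, "hidden layers, output shape", out.shape)
```

Adding layers is just a longer `layer_sizes` list; what makes such networks hard in practice is training them, which is where the adjusted connection schemes and activation functions mentioned above come in.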

The idea of deep learning is inspired by the human brain, but it is by no means a simulation of the human brain.

Correspondingly, machine learning frameworks and deep learning frameworks also differ. Essentially, a machine learning framework covers a variety of learning methods for classification, regression, clustering, anomaly detection and data preparation, and may also include neural network methods. A deep learning or deep neural network (DNN) framework covers a variety of neural network topologies with many hidden layers, including multi-step processes for pattern recognition. The more layers a network has, the more complex the features it can extract for clustering and classification. Caffe, CNTK, Deeplearning4j, Keras, MXNet and TensorFlow are common deep learning frameworks. scikit-learn and Spark MLlib are machine learning frameworks. Theano straddles both categories.

The rest of this article focuses on Caffe, TensorFlow and Keras. If you only need traditional machine learning algorithms, scikit-learn and Spark MLlib are more suitable.

3. Comparison of deep learning frameworks

Working with a neural network generally involves two stages: training and testing. Training is the process of extracting the model parameters from the training data and the neural network model (AlexNet, RNN, etc.) on a CPU or GPU, using a training framework such as Caffe. Testing runs the test data through the trained model (network model + model parameters) and checks the results. Caffe, Keras and TensorFlow unify and abstract the data involved in training into usable frameworks.
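The two stages can be illustrated without any framework. This NumPy sketch assumes synthetic data from the line y = 3x + 2 plus noise and a one-variable linear model instead of a neural network: the training stage extracts the parameters w and b from the training split by gradient descent, and the testing stage runs the held-out split through the trained model and reports its mean squared error. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = 3x + 2 plus a little noise
x = rng.uniform(-1, 1, size=200)
y = 3 * x + 2 + rng.normal(0, 0.1, size=200)
x_train, y_train = x[:150], y[:150]   # training data
x_test, y_test = x[150:], y[150:]     # held-out test data


def train(x, y, lr=0.1, steps=500):
    """Training stage: learn the parameters (w, b) from the training data."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = w * x + b - y
        # gradients of the mean squared error with respect to w and b
        w -= lr * 2 * np.mean(err * x)
        b -= lr * 2 * np.mean(err)
    return w, b


def evaluate(params, x, y):
    """Testing stage: run the trained model on unseen data and report the MSE."""
    w, b = params
    return float(np.mean((w * x + b - y) ** 2))


params = train(x_train, y_train)
print("learned w, b:", params)
print("test MSE:", evaluate(params, x_test, y_test))
```

The frameworks discussed below automate exactly this split at a much larger scale: they compute the gradients, store the learned parameters, and reload them for testing.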

(1) Caffe

1. Concept

Caffe is a clear, efficient and widely used open-source deep learning framework. Until TensorFlow came along, Caffe was the most popular deep learning project on GitHub. Its main advantages are: it is easy to use, since the network structure is defined in configuration files and no code is needed to design the network; training is fast; and its components are modular and can easily be extended to new models and learning tasks. However, Caffe was originally designed only for images, without considering text, speech or time-series data. As a result, Caffe supports convolutional neural networks well but offers insufficient support for sequence models such as RNNs and LSTMs. The Caffe project (its models folder and the Model Zoo) provides many commonly used network models, such as LeNet, AlexNet, ZFNet, VGGNet, GoogLeNet and ResNet.

2. Module structure of Caffe

From low level to high level, Caffe abstracts the data in the network as Blob, each network layer as Layer, the whole network as Net, and the method for solving the network model as Solver.

1. Blob represents the data in the network, including the training data, the parameters of each layer and the data passed between layers. Blob data can be stored on both the CPU and the GPU and kept synchronized between them.

2. Layer is the abstraction of the various layers in a neural network, such as convolution layers, pooling (downsampling) layers, fully connected layers and activation function layers. Each Layer implements forward propagation and backpropagation, and passes data through Blobs.

3. Net represents the entire network: it is composed of layers connected one after another, and it is the network model that has been constructed.

4. Solver defines how the Net model is solved: it records the training process of the network, saves the parameters of the network model, and can interrupt and resume training. A custom Solver can implement a different solution method.
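To show how the four abstractions fit together, here is a toy Python sketch whose class names mirror Caffe's. It is not Caffe or pycaffe code: real Caffe implements these in C++ and configures the Net and Solver through .prototxt files, and the InnerProduct/ReLU layers, the squared-error loss and the plain SGD update below are assumptions chosen to keep the example small. Each Blob carries `data` and a gradient `diff` (as Caffe's Blob does), each layer has forward and backward passes, the Net chains the layers, and the Solver runs the training loop and can snapshot and restore the parameters.

```python
import numpy as np


class Blob:
    """Holds an array (data) and its gradient (diff)."""
    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)
        self.diff = np.zeros_like(self.data)


class InnerProduct:
    """Fully connected layer: top = bottom @ W + b."""
    def __init__(self, n_in, n_out, rng):
        self.W = Blob(rng.normal(0, 0.1, size=(n_in, n_out)))
        self.b = Blob(np.zeros(n_out))
        self.params = [self.W, self.b]

    def forward(self, bottom):
        self.bottom = bottom
        return Blob(bottom.data @ self.W.data + self.b.data)

    def backward(self, top):
        self.W.diff = self.bottom.data.T @ top.diff
        self.b.diff = top.diff.sum(axis=0)
        self.bottom.diff = top.diff @ self.W.data.T
        return self.bottom


class ReLU:
    """Activation layer without parameters."""
    params = []

    def forward(self, bottom):
        self.bottom = bottom
        return Blob(np.maximum(bottom.data, 0.0))

    def backward(self, top):
        self.bottom.diff = top.diff * (self.bottom.data > 0)
        return self.bottom


class Net:
    """Layers connected one after another."""
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        blob = Blob(x)
        for layer in self.layers:
            blob = layer.forward(blob)
        return blob

    def backward(self, top):
        for layer in reversed(self.layers):
            top = layer.backward(top)

    def params(self):
        return [p for layer in self.layers for p in layer.params]


class SGDSolver:
    """Runs training steps and can snapshot/restore the parameters."""
    def __init__(self, net, lr):
        self.net, self.lr = net, lr

    def step(self, x, y):
        top = self.net.forward(x)
        loss = np.mean((top.data - y) ** 2)
        top.diff = 2 * (top.data - y) / top.data.size  # d(loss)/d(top)
        self.net.backward(top)
        for p in self.net.params():
            p.data -= self.lr * p.diff
        return loss

    def snapshot(self):
        return [p.data.copy() for p in self.net.params()]

    def restore(self, saved):
        for p, data in zip(self.net.params(), saved):
            p.data[...] = data


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(64, 1))
net = Net([InnerProduct(1, 16, rng), ReLU(), InnerProduct(16, 1, rng)])
solver = SGDSolver(net, lr=0.1)
losses = [solver.step(x, x ** 2) for _ in range(500)]
print("loss: %.4f -> %.4f" % (losses[0], losses[-1]))
```

Swapping in a different `SGDSolver` (for example one with momentum) changes how the Net is solved without touching the layers, which is the separation the Solver abstraction is meant to provide.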

3. Installation method

Caffe depends on the following libraries: CUDA, Snappy, LevelDB, gflags, glog, Szip, LMDB, OpenCV, HDF5, BLAS, Boost, Protocol Buffers, etc.

Caffe’s official website: http://caffe.berkeleyvision.org/

Caffe on GitHub: https://github.com/BVLC/caffe. Caffe installation tutorial:



Caffe can be built in CPU-only or GPU mode. The GPU version requires an NVIDIA graphics card and CUDA.

4. Use Caffe to build a neural network

[Flow chart of building a neural network with Caffe]

Step 2 of this flow is the core operation and also the biggest headache when using Caffe; Keras abstracts it at a higher level, allowing users to quickly write the model they want to implement.

(2) TensorFlow

1. Concept

TensorFlow is an open-source software library for numerical computation using data flow graphs. The nodes in the graph represent mathematical operations, while the edges represent the multidimensional data arrays (tensors) passed between nodes. Its flexible architecture lets you deploy computation to one or more CPUs or GPUs in a server or mobile device with a single API. The related TensorFlow concepts are explained below:

1) Symbolic computation. Symbolic computation first defines the variables and then builds a “computation graph” that specifies the computational relationships between them. This computation graph is also called a data flow graph. The process is shown in Figure 2-1 below, where data flows along the black arrowed lines.

[Figure 2-1 Example of a data flow graph]

A data flow graph describes a mathematical computation as a directed graph of “nodes” and “lines” (edges). ① A “node” usually represents an applied mathematical operation, but it can also represent the start point where data is fed in, the end point where results are pushed out, or the end point for reading or writing a persistent variable. ② A “line” represents the input/output relationship between nodes. ③ The multidimensional data arrays that flow along the lines are called “tensors”.
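The “define the graph first, then feed data and run it” idea can be sketched in a few lines of plain Python. The `Node`, `placeholder` and `run` names below are hypothetical and are not TensorFlow's API: nodes hold operations, their `inputs` are the edges, placeholder nodes are where data is fed in, and NumPy arrays play the role of the tensors flowing along the edges.

```python
import numpy as np


class Node:
    """A node holds an operation; its inputs are the incoming edges."""
    def __init__(self, op, *inputs, name=None):
        self.op, self.inputs, self.name = op, inputs, name


def placeholder(name):
    """A node with no operation: the start point where data is fed in."""
    return Node(None, name=name)


def run(node, feed):
    """Evaluate a node by pulling tensors along the edges from its inputs."""
    if node.op is None:
        return np.asarray(feed[node.name])
    args = [run(n, feed) for n in node.inputs]
    return node.op(*args)


# Build the graph first: nothing is computed at this point
x = placeholder("x")
w = placeholder("w")
b = placeholder("b")
y = Node(np.add, Node(np.matmul, x, w), b)   # y = x @ w + b

# Then run it by feeding tensors into the placeholder nodes
out = run(y, {"x": [[1.0, 2.0]], "w": [[3.0], [4.0]], "b": [0.5]})
print(out)
```

Here `out` is `[[11.5]]` (1×3 + 2×4 + 0.5). Separating construction from execution is what lets a framework analyze the whole graph before running it, for example to place parts of it on different devices.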

A tensor can be thought of as the natural extension of vectors and matrices to data with any number of dimensions; the order of a tensor is also called its number of dimensions. A tensor of order 0 (a scalar) is a single number. A tensor of order 1 (a vector) is an ordered set of numbers. A tensor of order 2 (a matrix) is an ordered array of vectors. A tensor of order 3 (a cube) is a set of matrices stacked on top of each other. And so on.
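In NumPy terms, the order of a tensor is its number of axes, reported by `ndim`. A quick illustration with arbitrary values:

```python
import numpy as np

scalar = np.array(5.0)                        # order 0: a single number
vector = np.array([1.0, 2.0, 3.0])            # order 1: an ordered set of numbers
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # order 2: an ordered array of vectors
cube = np.stack([matrix, matrix * 10])        # order 3: matrices stacked together

for t in (scalar, vector, matrix, cube):
    print(t.ndim, t.shape)
```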

At present there are two main ways to lay out image tensors: ① th mode, or channels_first, used by Theano and Caffe; ② tf mode, or channels_last, used by TensorFlow.

An example illustrates the difference: for 100 three-channel RGB color images of size 16×32 (height 16, width 32), the th representation is (100, 3, 16, 32) and the tf representation is (100, 16, 32, 3). The only difference is the position of the channel count 3.
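Converting between the two layouts is just a reordering of axes. A NumPy sketch using the batch of 100 RGB images of size 16×32 from the example, filled with dummy values:

```python
import numpy as np

# 100 RGB images, height 16, width 32, in th / channels_first layout
images_th = np.arange(100 * 3 * 16 * 32, dtype=np.float32).reshape(100, 3, 16, 32)

# tf / channels_last: move the channel axis (axis 1) to the end
images_tf = np.transpose(images_th, (0, 2, 3, 1))
print(images_th.shape, "->", images_tf.shape)

# and back to channels_first
images_back = np.moveaxis(images_tf, -1, 1)
```

Only the indexing changes: pixel (row 5, column 7) of channel 2 in image 0 is `images_th[0, 2, 5, 7]` in one layout and `images_tf[0, 5, 7, 2]` in the other.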

2. Module structure of TensorFlow

The tensorflow/core directory contains the core module code of TensorFlow, as shown in Figure 2-2:

[Figure 2-2 TensorFlow code module structure]

3. Installation method

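1) Download and install Anaconda from the official website: https://www.anaconda.com/download/.

2) Create a Python 3.6 environment, CMD: `conda create -n py3.6 python=3.6 anaconda`; then activate it, CMD: `activate py3.6`.

3) Create and activate an environment for TensorFlow, CMD: `conda create -n tensorflow python=3.6`; CMD: `activate tensorflow`.

4) Install TensorFlow. CPU version, CMD: `pip install --ignore-installed --upgrade tensorflow`; GPU version, CMD: `pip install --ignore-installed --upgrade tensorflow-gpu`.

5) Exit the virtual environment, CMD: `deactivate`.

These are the Windows commands of older Anaconda releases; newer versions of conda use `conda activate <env>` and `conda deactivate` instead.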
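After installation, one way to confirm which packages are importable in the active environment is a short Python check like the one below. It is a sketch: the helper name `installed_version` is hypothetical, and it relies on the common (but not universal) convention that a package exposes a `__version__` attribute.

```python
import importlib


def installed_version(module_name):
    """Import module_name and return its __version__, or None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


for name in ("numpy", "tensorflow", "keras"):
    print(name, installed_version(name) or "not installed")
```

Run it inside the activated environment; if it reports “not installed” for a package you just installed, the most likely cause is that a different environment is active.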

4. Use TensorFlow to build a neural network

Building a neural network with TensorFlow mainly involves the following 7 steps: 1) define a function that adds a neural layer; 2) prepare the training data; 3) define the nodes (placeholders) that receive the data; 4) define the neural layers: the hidden layer and the prediction layer; 5) define the loss expression; 6) choose an optimizer to minimize the loss; 7) initialize all variables and call sess.run on the optimizer repeatedly to learn.

5. Sample code

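The following code follows these steps to build a small neural network with TensorFlow that fits a noisy quadratic curve (y = x^2 - 0.5 plus noise). It uses the TensorFlow 1.x graph API (placeholders and sessions); under TensorFlow 2.x the same API is available through tf.compat.v1: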

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x graph API, also available in TF 2.x

tf.disable_v2_behavior()  # use placeholders and sessions under TF 2.x


# 1. Define a function that adds one layer and returns its output
def add_layer(inputs, in_size, out_size, activation_function=None):
    Weights = tf.Variable(tf.random_normal([in_size, out_size]))
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wx_plus_b = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs


# 2. Prepare training data: y = x^2 - 0.5 plus Gaussian noise
x_data = np.linspace(-1, 1, 300)[:, np.newaxis]
noise = np.random.normal(0, 0.05, x_data.shape)
y_data = np.square(x_data) - 0.5 + noise

# 3. Define placeholders that receive the inputs to the network
xs = tf.placeholder(tf.float32, [None, 1])
ys = tf.placeholder(tf.float32, [None, 1])

# 4. Define the neural layers: a hidden layer (input xs, 10 neurons) ...
l1 = add_layer(xs, 1, 10, activation_function=tf.nn.relu)
# ... and the output (prediction) layer
prediction = add_layer(l1, 10, 1, activation_function=None)

# 5. Define the loss: error between the prediction and the real data
loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction), axis=1))

# 6. Choose an optimizer that minimizes the loss (learning rate 0.1)
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# 7. Initialize all variables; nothing defined above runs until sess.run
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# Iterate 1000 times, printing the loss every 50 steps to see the improvement
for i in range(1000):
    sess.run(train_step, feed_dict={xs: x_data, ys: y_data})
    if i % 50 == 0:
        print(sess.run(loss, feed_dict={xs: x_data, ys: y_data}))
```
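To see what `GradientDescentOptimizer(0.1).minimize(loss)` and `sess.run(train_step, ...)` do on each iteration, here is a NumPy sketch of the same 1-10-1 network (ReLU hidden layer, linear output, learning rate 0.1, 1000 iterations) with the gradients written out by hand. It is an illustration under those assumptions, not TensorFlow's implementation; TensorFlow derives the same gradients automatically from the graph.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same kind of data as above: y = x^2 - 0.5 plus Gaussian noise
x = np.linspace(-1, 1, 300)[:, np.newaxis]
y = np.square(x) - 0.5 + rng.normal(0, 0.05, x.shape)

# 1-10-1 network: normally distributed weights, biases initialized to 0.1
W1, b1 = rng.normal(size=(1, 10)), np.full((1, 10), 0.1)
W2, b2 = rng.normal(size=(10, 1)), np.full((1, 1), 0.1)
lr = 0.1
losses = []

for i in range(1000):
    # forward pass
    z1 = x @ W1 + b1
    h1 = np.maximum(z1, 0.0)          # ReLU hidden layer
    pred = h1 @ W2 + b2               # linear prediction layer
    losses.append(np.mean(np.sum(np.square(y - pred), axis=1)))

    # backward pass: gradients of the loss via the chain rule
    d_pred = 2 * (pred - y) / len(x)
    dW2 = h1.T @ d_pred
    db2 = d_pred.sum(axis=0, keepdims=True)
    d_z1 = (d_pred @ W2.T) * (z1 > 0)
    dW1 = x.T @ d_z1
    db1 = d_z1.sum(axis=0, keepdims=True)

    # gradient descent update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

    if i % 50 == 0:
        print(i, losses[-1])
```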

(3) Keras

1. Concept

Keras is written in pure Python and runs on top of TensorFlow, Theano or CNTK as its backend, acting as a high-level interface to them; it is known for letting you build a neural network in about 10 lines of code. Its advantages include simple operation, ease of use, rich documentation and easy environment setup, and it makes neural network construction code much easier to write. Fully connected networks, convolutional neural networks, RNNs and LSTMs are currently encapsulated.

Keras has two types of models: Sequential and functional (Model). The functional model is more widely used, and the Sequential model is a special case of it.

1) Sequential model: a single input and a single output, one path from top to bottom, with connections only between adjacent layers and no cross-layer connections. This model compiles quickly and is relatively simple to use.

2) Functional model (Model): multiple inputs and multiple outputs, with arbitrary connections between layers. This model compiles more slowly.
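The difference in topology can be sketched without Keras itself. In the NumPy sketch below, plain functions stand in for layers (the `dense` helper is hypothetical and is not a Keras layer): the sequential part is a single chain, while the functional part takes two inputs, reuses one shared layer, merges the results and produces two outputs. In Keras, the first would be written with `keras.Sequential` and the second with `keras.Input` and `keras.Model`.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(n_in, n_out):
    """Hypothetical stand-in for a layer: returns a function x -> tanh(x @ W)."""
    W = rng.normal(0, 0.1, size=(n_in, n_out))
    return lambda x: np.tanh(x @ W)


# Sequential style: one input, one output, each layer feeds only the next one
seq_layers = [dense(784, 500), dense(500, 10)]


def sequential(x):
    for layer in seq_layers:
        x = layer(x)
    return x


# Functional style: two inputs, a shared layer, a merge and two outputs
shared = dense(784, 64)
head_a, head_b = dense(128, 10), dense(128, 1)


def functional(img1, img2):
    merged = np.concatenate([shared(img1), shared(img2)], axis=1)
    return head_a(merged), head_b(merged)


x1, x2 = rng.normal(size=(4, 784)), rng.normal(size=(4, 784))
print(sequential(x1).shape)
print([o.shape for o in functional(x1, x2)])
```

The shared layer is applied to both inputs with the same weights, which is the kind of weight sharing and multi-output structure the Sequential model cannot express.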

2. Module structure of Keras

Keras consists mainly of five modules. The relationships between the modules and the function of each module are shown in Figure 3-1:

[Figure 3-1 Keras module structure diagram]

3. Installation method

Keras is installed in the following steps: 1) Install Anaconda (Python). Anaconda is a Python distribution for scientific computing that supports Linux, Mac and Windows and provides package management and environment management, which makes it easy to keep multiple Python versions side by side, switch between them, and install third-party packages. Download address: https://www.anaconda.com/what-is-anaconda/. 2) Install the numpy, keras, pandas and tensorflow libraries using pip or conda.

4. Use Keras to build a neural network

Building a neural network with Keras takes 5 steps: model selection, network layer construction, compilation, training and prediction. The Keras modules used in each step are shown in Figure 3-2:

[Figure 3-2 Steps of building a neural network with Keras]

5. Sample code

The following code uses Keras to build a neural network that recognizes handwritten digits (the MNIST dataset). It is written against the Keras API bundled with TensorFlow 2.x (tf.keras):

```python
import numpy
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Activation, Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD

# Step 1: choose the model
model = Sequential()

# Step 2: build the network layers
model.add(keras.Input(shape=(784,)))  # input: 28*28 = 784 pixels
model.add(Dense(500))                 # hidden layer with 500 nodes
model.add(Activation("tanh"))         # activation function is tanh
model.add(Dropout(0.5))               # 50% dropout
model.add(Dense(500))                 # second hidden layer with 500 nodes
model.add(Activation("tanh"))
model.add(Dropout(0.5))
model.add(Dense(10))                  # 10 output classes, so the dimension is 10
model.add(Activation("softmax"))      # softmax activation on the last layer

# Step 3: compile
# SGD with momentum 0.9 and Nesterov momentum; the old decay=1e-6 argument is
# expressed as an inverse-time schedule (0.01 is Keras' default SGD learning rate)
lr_schedule = keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=0.01, decay_steps=1, decay_rate=1e-6)
sgd = SGD(learning_rate=lr_schedule, momentum=0.9, nesterov=True)
model.compile(loss="categorical_crossentropy", optimizer=sgd)  # cross-entropy loss

# Step 4: train. Some fit parameters:
#   batch_size: number of samples in each group the training set is split into
#   epochs: number of passes over the training data
#   shuffle: whether to shuffle the data before training
#   validation_split: fraction of the training data held out for validation
#   verbose: 0 = no output, 1 = progress bar, 2 = one line per epoch
(X_train, y_train), (X_test, y_test) = mnist.load_data()  # downloads on first use
# MNIST images have shape (num, 28, 28); flatten each one to 784 values
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1] * X_train.shape[2])
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1] * X_test.shape[2])
# scale pixel values from 0..255 to 0..1
X_train = X_train.astype("float32") / 255
X_test = X_test.astype("float32") / 255
# one-hot encode the labels (digit d -> vector with a 1 at position d)
Y_train = (numpy.arange(10) == y_train[:, None]).astype(int)
Y_test = (numpy.arange(10) == y_test[:, None]).astype(int)

model.fit(X_train, Y_train, batch_size=200, epochs=50, shuffle=True,
          verbose=0, validation_split=0.3)

# Step 5: evaluate and predict
scores = model.evaluate(X_test, Y_test, batch_size=200, verbose=0)
print("The test loss is %f" % scores)
result = model.predict(X_test, batch_size=200, verbose=0)
result_max = numpy.argmax(result, axis=1)   # predicted digit
test_max = numpy.argmax(Y_test, axis=1)     # true digit
result_bool = numpy.equal(result_max, test_max)
true_num = numpy.sum(result_bool)
print("The accuracy of the model is %f" % (true_num / len(result_bool)))
```
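Two NumPy idioms in the code above are easy to miss: one-hot encoding the labels by broadcasting `numpy.arange(10)` against a column of labels, and computing accuracy by comparing `argmax` indices. A small sketch with hypothetical labels and model outputs:

```python
import numpy

y = numpy.array([3, 0, 9, 3])                     # hypothetical integer labels
Y = (numpy.arange(10) == y[:, None]).astype(int)  # (4, 1) vs (10,) broadcasts to (4, 10)

# hypothetical model outputs: one row of 10 class scores per sample
result = numpy.zeros((4, 10))
result[numpy.arange(4), [3, 0, 1, 3]] = 1.0       # the third sample is misclassified

result_max = numpy.argmax(result, axis=1)         # predicted class per sample
test_max = numpy.argmax(Y, axis=1)                # argmax of a one-hot row gives back y
accuracy = numpy.sum(numpy.equal(result_max, test_max)) / len(test_max)
print(Y[0])
print(accuracy)
```

Three of the four predictions match, so the accuracy is 0.75.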

(4) Comparison of the advantages and disadvantages of the frameworks

| Dimension | Caffe | TensorFlow | Keras |
| --- | --- | --- | --- |
| Ease of getting started | 1) No code is needed: defining the network structure in a .prototxt file is enough to train a model. 2) Installation is complicated, and designing networks in .prototxt files is relatively restrictive, less convenient and flexible than designing them in Python. 3) Hyperparameters cannot be adjusted programmatically through the configuration files, so operations such as cross-validation and grid search are not easy to support. | 1) Easy to install, with rich learning resources; basic models can be built quickly by following the samples. 2) There is a real learning threshold: both the programming paradigm and the mathematics and statistics make it hard for people without a machine learning or data science background. 3) Because of its flexibility, it is a relatively low-level framework, and using it takes a lot of code that reinvents the wheel. | 1) Simple to install, and designed for the fastest possible prototyping, shortening the path from idea to result, which makes it well suited to cutting-edge research. 2) The API is easy to use: users only need to assemble high-level modules to design a neural network, which reduces the cost of programming and of reading other people's code. |
| Maintenance | The GitHub project is maintained by the Berkeley Vision and Learning Center (BVLC). | Regarded as the most popular and widely recognized open-source deep learning framework; well structured, with high-quality production-grade code, developed and maintained by Google with strong backing. | Also developed and supported by Google, and the API is packaged in TensorFlow as tf.keras; Microsoft maintains its CNTK backend; Amazon AWS is developing MXNet support. Other supporting companies include NVIDIA, Uber and Apple (via Core ML). |
| Supported languages | C++/CUDA | C++, Python (also Go, Java, Lua, JavaScript or R) | Python |
| Supported algorithms | 1) Good support for convolutional neural networks (CNNs), with many well-trained classic models (AlexNet, VGG, Inception) and state-of-the-art models (ResNet, etc.) collected in the Model Zoo. 2) Support for sequence models such as RNNs and LSTMs is incomplete. | 1) Supports CNNs and RNNs, as well as deep reinforcement learning and other compute-intensive scientific calculations (such as solving partial differential equations). 2) In TensorFlow 1.x, computation graphs must be built as static graphs, which makes many computations hard to implement, especially beam search, which is often used in sequence prediction. | 1) Supports CNNs and recurrent networks, cascaded models and models with arbitrary graph structures; moving from CPU computation to GPU acceleration requires no code changes. 2) There is no reinforcement learning toolbox, and modifying the implementation is troublesome. The encapsulation is very high-level, so training details cannot be changed and penalty details are hard to change. |
| Deployment | Runs stably with high code quality, suitable for production environments with strict stability requirements; it was the first mainstream industrial-grade deep learning framework. Caffe is written in C++, can be compiled for various hardware environments and ports well. It supports Linux, Mac and Windows, and can be compiled and deployed on mobile devices such as Android and iOS. | 1) Good performance: it can run multiple large-scale deep learning models at the same time and supports model lifecycle management, algorithm experiments and efficient use of GPU resources, so trained models can be put into production more quickly and easily. 2) Flexible portability: the same code can be deployed to PCs, servers or mobile devices with any number of CPUs or GPUs, almost without modification. | 1) Easy to deploy: using TensorFlow, CNTK or Theano as the backend simplifies programming and saves time when trying new network structures. 2) The more complex the model, the greater the benefit, especially for models that rely heavily on weight sharing, multi-model combination or multi-task learning. |
| Performance | Only single-machine training (one or more GPUs in one machine) is supported; distributed training is not. | Supports distributed computing, so TPU (Tensor Processing Unit) or GPU clusters can compute in parallel and train one model together. However, communication between devices is not well optimized, and distributed performance is not yet optimal. | Multiple GPUs cannot be used directly, and large-scale data is processed more slowly than with frameworks that support multiple GPUs and distributed training. With the TensorFlow backend, Keras runs considerably slower than native TensorFlow. |

