The sample code is hosted at: www.github.com/dashnowords…
Blog (cnblogs) address: original blog directory of "Da Shi Lives in the Big Front End"
[TOC]
TensorFlow is an open source machine learning framework from Google that provides solutions extending to browsers, mobile devices, IoT devices, and large production environments. TensorFlow.js is its extension for the JavaScript language: front-end developers can run deep learning directly in the browser environment, and readers who have ever tried to configure a deep-learning environment will know what that means. The browser has a natural advantage in building interactive applications, and on-device machine learning can not only share part of the computing burden of the cloud but also offer better privacy. Meanwhile, Node.js lets developers keep using JavaScript on the server side, which is very friendly for front-end developers. Besides providing a unified style of terminology and APIs, the different extensions of TensorFlow can reuse models through transfer learning (the Python source code of many well-known deep learning models can be found online), or customize deep neural networks starting from pre-trained models. To get you up to speed, the TensorFlow website offers JavaScript-version tutorials and out-of-the-box pre-trained models to help you understand deep learning. Readers interested in deep learning are recommended to read the book Neural Networks and Deep Learning by the quantum physicist Michael Nielsen, which clearly explains the basic process and principles of deep learning.
1. TensorFlow.js
The Tensor is the basic data structure of TensorFlow. It is the extension of vectors and matrices to higher dimensions; from a programming point of view, its core data is essentially a multidimensional array. You may remember the Vector2 class defined in the particle-animation chapter for vector calculation; it can be seen as a simplified form of a two-dimensional Tensor. The Tensor type makes it easy to construct tensors of various dimensions; it supports structural operations such as slicing, reshaping, merging and splitting, and also defines linear-algebra operators, so the transmission of information in a neural network happens through the flow of tensors. At Google I/O 2018, engineers from the TensorFlow.js team introduced the framework's layered architecture. Besides the bottom layers, which absorb the differences between programming languages and platforms, the framework is designed to better support developers doing different kinds of work, so TensorFlow.js provides two different APIs at the application layer: the high-level API is called the Keras API (Keras is an open source artificial neural network library written in Python) or the Layers API, and is used by software and application developers to quickly build, train, evaluate, and apply deep learning models; the low-level API, also known as the Core API, is mostly used by researchers to customize neural networks in low-level detail and is more difficult to use.
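As a minimal illustration of these tensor operations (the specific numbers are made up for this example, but every call used here belongs to the public tf namespace):

```js
// Construct a 2×3 tensor from a nested array
const a = tf.tensor2d([[1, 2, 3], [4, 5, 6]]);

a.reshape([3, 2]).print();        // reshaping: same data, new shape
a.slice([0, 1], [2, 2]).print();  // slicing: a 2×2 sub-region starting at column 1
tf.concat([a, a], 0).print();     // merging: stack two tensors along the first axis

// Linear-algebra operator: matrix multiplication (2×3 · 3×2 → 2×2)
a.matMul(a.transpose()).print();
```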
The work of TensorFlow.js still revolves around neural networks, and the basic workflow includes the following typical steps, namely Define, Compile, Fit, Evaluate and Predict:
Below we'll take a look at the data-fitting example provided on the TensorFlow.js website.
The Define stage is the first step in using TensorFlow.js. In this stage, you need to initialize the neural network model: with the built-in layers you can realize tensor reshaping, convolutional neural networks, recurrent neural networks and other complex models, and when the built-in layers cannot meet the requirements, you can customize your own layers. The high-level API of TensorFlow helps developers build the neural network structure with declarative code, as the following sample shows:
```js
/* Create the model */
function createModel() {
  const model = tf.sequential();
  model.add(tf.layers.dense({inputShape: [1], units: 1, useBias: true}));
  model.add(tf.layers.dense({units: 1, useBias: true}));
  return model;
}
```
The Compile stage presets some parameters for the training process. You can review the working process of the BP neural network introduced in the previous chapter before reading the following sample code:
```js
model.compile({
  optimizer: tf.train.adam(),
  loss: tf.losses.meanSquaredError,
  metrics: ['mse']
});
```
loss defines the loss function, a quantitative measure of the deviation between the actual output of the neural network and the expected output. The most commonly used loss function is the mean squared error (tf.losses.meanSquaredError); other loss functions can be found in the TensorFlow API documentation. The optimizer is the algorithm the neural network uses to adjust its weights after error backpropagation has finished. The purpose of weight adjustment is to drive the loss function toward a minimum value, so the idea of "gradient descent" is usually used for the approximation: the gradient direction is the direction in which the function changes most rapidly at a given point, but the actual situation is often not so simple. Suppose the following figure is the loss-function curve of a neural network:
It can be seen that the shape of the loss function, the position of the initial parameters and the step size of the optimization process may all affect the training process and its result, so an optimization algorithm needs to be specified in the optimizer configuration item to achieve better training results. The metrics configuration item specifies the metrics used to monitor the model; in most cases the loss function itself can be used directly as the metric.
The Fit stage performs the model training ("fit" itself means fitting), and the training loop is started by calling the model's fit method. The official sample code is as follows (the parameters received by the fit method are the input tensors, the output tensors and the configuration options, respectively):
```js
const batchSize = 32;
const epochs = 50;

await model.fit(inputs, labels, {
  batchSize,
  epochs,
  shuffle: true,
  callbacks: tfvis.show.fitCallbacks(
    { name: 'Training Performance' },
    ['loss', 'mse'],
    { height: 200, callbacks: ['onEpochEnd'] }
  )
});
```
The related parameters are described as follows (for other parameters, refer to the official documentation):
- batchSize: the number of samples used in each training iteration, typically between 32 and 512
- epochs: the total number of passes over the entire training set
- shuffle: whether the training samples are shuffled at the start of each epoch
- callbacks: callback functions invoked during training
The training of a neural network proceeds in loops. Assuming the total number of training samples is 320, the training process described in the sample code above works as follows: first the network is trained with the samples at indices 0~31 and the optimizer updates the weights, then the samples at indices 32~63 are used to update the weights again, and so on until all the data in the training set has been used once. This whole pass is called an epoch; the order of the training samples is then shuffled and the procedure is repeated for 50 epochs in total. The callbacks parameter is wired directly to the tfjs-vis library (used as tfvis in the code), a dedicated visualization tool module provided alongside TensorFlow.
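As a quick sanity check of the numbers above (an illustration, not part of the original sample):

```js
// With 320 samples, batchSize = 32 and epochs = 50 as configured above:
const updatesPerEpoch = 320 / 32;            // 10 weight updates per epoch
const totalUpdates = updatesPerEpoch * 50;   // 500 weight updates over the whole run
```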
The Evaluate stage evaluates the training result of the model: calling the evaluate method of the model instance with test data returns the values of the loss function and the metrics. You may have noticed that when customizing the training process, TensorFlow focuses on how the sample data is used and does not take "metric below a given threshold" as a condition for terminating training (whereas, for example, brain.js allows an errorThresh parameter to be set). When building and designing complex neural networks, developers usually need to experiment repeatedly, and the metric is not guaranteed to ever drop below a given threshold; using that as the termination condition could easily trap the training process in an endless loop, so training for a fixed number of epochs while observing the process with visualization tools is more reasonable.
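A minimal sketch of this stage (testXs and testYs are assumed to be prepared test tensors, not defined in the original sample):

```js
// Evaluate on held-out test data; with metrics configured in compile(),
// evaluate returns [loss, ...metrics] as scalar tensors
const [testLoss, testMse] = model.evaluate(testXs, testYs, { batchSize: 32 });
testLoss.print();
testMse.print();
```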
The Predict stage is where the neural network model is used for prediction, and it is also the part with the highest participation from front-end engineers. After all, the output of the model is only data; how to use these predictions to build more interesting or intelligent applications is probably the problem front-end engineers should pay more attention to. As can be seen from the workflow above, the capabilities provided by TensorFlow.js are centered on the neural network model and are hard to consume directly at the application layer. Developers usually rely on the pre-trained models provided in the official model repository, or use third-party frameworks built on top of TensorFlow.js, for example the face recognition framework face-api.js (which enables fast face tracking and identity recognition in the browser and Node.js), the machine learning framework ml5.js (whose APIs directly implement more concrete tasks such as image classification, pose estimation, person matting, style transfer and object recognition), and Handtrack.js for hand tracking. If TensorFlow feels too obscure to you, try building some interesting applications with these higher-level frameworks first.
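A minimal sketch of this stage in the spirit of the official data-fitting example (the 100 sample points in [0, 1] are illustrative):

```js
// Feed new inputs through the trained model and read the predictions back
tf.tidy(() => {
  const xs = tf.linspace(0, 1, 100);                 // 100 sample points in [0, 1]
  const preds = model.predict(xs.reshape([100, 1])); // shape must match inputShape: [1]
  console.log(preds.dataSync());                     // typed array of predicted values
});
```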
2. Using TensorFlow.js to build a convolutional neural network
Convolutional neural network
A Convolutional Neural Network (CNN) is a deep learning model widely used in the field of computer vision, and it performs extremely well when processing images and other data with grid-like features. When processing information, a convolutional neural network keeps the spatial structure of the pixels, extracts features through mathematical computation in multiple layers, and then converts the signal into feature vectors that are handed to a traditional neural network structure. Because the feature vectors obtained from feature extraction are much smaller than the original image, the number of parameters that need to be trained is also reduced. The basic working principle of a convolutional neural network is shown below (the number of layers in the figure is not fixed):
To understand the workflow of convolutional networks, it is necessary to understand the meaning of the terms convolution and pooling.
The convolution layer performs convolution computation on the input information. It uses a small grid-shaped window (also known as a convolution kernel or filter) to traverse the input image. Each cell of the filter window usually has its own weight; starting at the upper-left of the input image, the weights are multiplied with the values of the area covered by the window and accumulated to obtain one new result, then the filter window slides right by a fixed distance (usually 1 pixel) and the process repeats. When the right side of the filter window reaches the right edge of the input image, the window moves down by the same distance and the left-to-right pass is repeated, producing new rows and columns of data until all regions have been traversed. Each different filter applied to the input image adds one more output to the convolutional layer; real deep networks may use many filters, which is why schematic diagrams of convolutional neural networks often show several stacked images in the convolutional layer. It is not difficult to work out that for an input image of size M×M, processing with an N×N filter yields a new image with side length M-N+1. For example, a grayscale image of size 8×8 convolved with a 3×3 filter produces a 6×6 output, as shown below:
Different filters recognize different tiny features in the image. For example, for the filter in the figure above, a solid-color 3×3 area yields a convolution result of 0; suppose there is now a white region with a black border below it: the results from the filter's top row will be very small, while the middle row and the row below produce results close to zero, so the accumulated convolution result maps to a small negative number. This is equivalent to the filter recording the typical feature of a 3×3 region in 1 pixel, thus achieving the purpose of feature extraction. Obviously, if the filter above is rotated by 90°, it can be used to recognize vertical boundaries in the image. Since the convolution computation condenses the features of a region into a point, the output of the convolution layer is also called a feature map. In the code repository of this chapter, the author implemented a simple convolution calculation program based on Canvas; you can modify the filter parameters in the source code and observe the processed image, which is just like adding various interesting filters to the picture:
The above figure shows the effects of horizontal edge detection, vertical edge detection and oblique edge detection respectively.
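The core loop of such a convolution program might look like the following plain-JavaScript sketch (a from-scratch illustration assuming a grayscale image stored as a 2D array; it is not the actual code from the chapter's repository):

```js
// Convolve an H×W grayscale matrix with an N×N filter; output is (H-N+1)×(W-N+1)
function convolve(image, filter) {
  const n = filter.length;
  const outH = image.length - n + 1;
  const outW = image[0].length - n + 1;
  const out = [];
  for (let y = 0; y < outH; y++) {
    out.push([]);
    for (let x = 0; x < outW; x++) {
      let sum = 0;
      for (let i = 0; i < n; i++) {
        for (let j = 0; j < n; j++) {
          sum += image[y + i][x + j] * filter[i][j]; // weight × covered pixel, accumulated
        }
      }
      out[y].push(sum);
    }
  }
  return out;
}

// A horizontal edge-detection filter: a solid-color region sums to 0
const horizontalEdge = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]];
```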
Next consider the pooling layer (also known as the blending, merging or downsampling layer), which is usually used immediately after a convolution layer. The values of adjacent pixels in an image are usually close, which causes a lot of redundancy in the convolution layer's output: for example, the pixels around a horizontal edge detected in the convolution layer are likely to detect the same horizontal edge, yet they all represent the same feature of the original image. The purpose of the pooling layer is to condense the information output by the convolution layer; each unit it outputs can be thought of as summarizing the features of a region in the previous layer. The commonly used max pooling layer selects the maximum value within a region as the mapping of that entire region in the pooling layer (this is not the only pooling computation method). Assuming that the 6×6 convolution output from the previous example is followed by a max pooling layer that uses a 2×2 window for region mapping, a 3×3 image is finally obtained, as shown in the following figure:
It can be seen that, without considering the influence of depth, the 8×8 input image in the example has become 3×3 after being processed by the convolution layer and the pooling layer, so the number of input features for the subsequent fully connected neural network is greatly reduced. The code repository of this chapter also provides a visual example of how an image changes after "convolution layer + max pooling layer" processing. The intuitive effect is essentially image scaling, and you can see that the scaled image still retains the typical features present before pooling:
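Continuing the plain-JavaScript sketch above, max pooling reduces each window to its largest value (again an illustration, assuming a 2×2 window with stride 2 as in the example):

```js
// Downsample a matrix by taking the max of each size×size window (stride = size)
function maxPool(input, size = 2) {
  const out = [];
  for (let y = 0; y + size <= input.length; y += size) {
    const row = [];
    for (let x = 0; x + size <= input[0].length; x += size) {
      let max = -Infinity;
      for (let i = 0; i < size; i++) {
        for (let j = 0; j < size; j++) {
          max = Math.max(max, input[y + i][x + j]);
        }
      }
      row.push(max); // one value summarizes the whole window
    }
    out.push(row);
  }
  return out;
}

// e.g. maxPool(convolve(image, horizontalEdge)) turns a 6×6 feature map into 3×3
```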
When analyzing complex images, the "convolution + pooling" pattern may be chained many times in the network so that features can be extracted from the image step by step. In real development, in order to solve a specific computer vision problem, developers may need to consult relevant academic papers and build the corresponding deep learning network themselves; such networks are usually described with very concise notation. In the next section, we take the classic LeNet-5 model as an example to learn the relevant knowledge.
Building the LeNet-5 model
LeNet-5 is an efficient convolutional neural network model that appears in almost every tutorial on MNIST handwritten digit recognition. It was proposed in the paper Gradient-Based Learning Applied to Document Recognition, and the structural diagram given in the paper is as follows:
It can be seen that there are 7 layers in the model, and their meanings and related explanations are shown in the following table:
| No. | Category | Tag | Details |
| --- | --- | --- | --- |
| / | Input layer | INPUT 32×32 | A 32×32-pixel input image |
| C1 | Convolution layer | C1: feature maps 6@28×28 | Convolution layer with 6 output feature maps, each 28×28 (convolution kernel size 5×5) |
| S2 | Pooling layer | S2: f.maps 6@14×14 | Pooling layer that downsamples the previous layer's output into 6 feature maps, each 14×14 (downsampling window size 2×2) |
| C3 | Convolution layer | C3: f.maps 16@10×10 | Convolution layer with 16 output feature maps, each 10×10 (convolution kernel size 5×5) |
| S4 | Pooling layer | S4: f.maps 16@5×5 | Pooling layer that downsamples the previous layer's output into 16 feature maps, each 5×5 (downsampling window size 2×2) |
| C5 | Convolution layer | C5: layer 120 | Convolution layer with 120 output feature maps, each 1×1 (convolution kernel size 5×5) |
| F6 | Fully connected layer | F6: layer 84 | Fully connected layer with 84 neurons |
| / | Output layer | OUTPUT 10 | Output layer with 10 nodes, representing the 10 digits 0 to 9 |
When completing similar image classification tasks, the convolutional neural network you build does not need to match the LeNet-5 model exactly; it only needs to be fine-tuned or extended according to actual needs. For example, the official TensorFlow.js tutorial "Using CNN to recognize handwritten digits" uses 8 convolution kernels in the C1 layer and removes the F6 fully connected layer entirely, yet still achieves a good recognition rate. The Layers API provided by TensorFlow.js makes it easy to generate custom convolution and pooling layers. The sample code is as follows:
```js
model = tf.sequential();

// Add layer C1 of LeNet-5
model.add(tf.layers.conv2d({
  inputShape: [32, 32, 1],              // shape of the input tensor
  kernelSize: 5,                        // convolution kernel size
  filters: 6,                           // number of convolution kernels
  strides: 1,                           // stride of the convolution kernel
  activation: 'relu',                   // activation function
  kernelInitializer: 'varianceScaling'  // kernel weight initialization method
}));

// Generate layer S2 of LeNet-5
model.add(tf.layers.maxPooling2d({
  poolSize: [2, 2],  // sliding window size
  strides: [2, 2]    // sliding window stride
}));
```
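Based on the table above, the remaining layers (C3, S4, C5, F6 and the output layer) can be generated the same way; the following is one possible continuation for illustration (the activation choices are assumptions, not code from the official tutorial):

```js
// C3: 16 feature maps, 5×5 kernels
model.add(tf.layers.conv2d({ kernelSize: 5, filters: 16, activation: 'relu' }));
// S4: 2×2 max pooling
model.add(tf.layers.maxPooling2d({ poolSize: [2, 2], strides: [2, 2] }));
// C5: 120 feature maps, 5×5 kernels (each output map is 1×1)
model.add(tf.layers.conv2d({ kernelSize: 5, filters: 120, activation: 'relu' }));
// F6: fully connected layer with 84 neurons
model.add(tf.layers.flatten());
model.add(tf.layers.dense({ units: 84, activation: 'tanh' }));
// OUTPUT: 10 classes for the digits 0–9
model.add(tf.layers.dense({ units: 10, activation: 'softmax' }));
```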
The sample code provided in the official tutorial visualizes the training process with the tfjs-vis library, so you can clearly see the structure of the neural network, the changes of the metrics during training, and a summary of the prediction results on the test data:
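For example, rendering the model structure into the tfjs-vis visor panel takes a single call (a minimal illustration, assuming the tfvis global from the tfjs-vis script is available):

```js
// Show a summary table of the model's layers in the tfjs-vis visor panel
tfvis.show.modelSummary({ name: 'Model Architecture' }, model);
```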
3. Speech command recognition based on transfer learning
Complex deep learning models usually have millions of parameters. Even if they are able to build the neural network, small and medium-sized developers rarely have enough data and machine resources to train it from scratch. This requires developers to reuse models that have already been trained on related tasks in their new models; such model reuse lowers the natural barrier of building and training deep learning models and lets more application-layer developers take part.
Transfer learning means using a model trained on dataset A to solve tasks related to another dataset B, which usually requires some adjustments to the model and retraining with dataset B. Fortunately, thanks to the training results based on dataset A, the number of new samples and the training time required to retrain the model are significantly reduced. The basic way to adjust a pre-trained model is to replace its output layer with the form required by the new task while keeping the feature-extraction part of the network: for tasks of the same type, the retained part can still perform feature extraction and analyze similar classification signals. However, if the characteristics of dataset A and dataset B differ too much, the new model may still fail to achieve the desired effect, and more customization and transformation of the pre-trained model becomes necessary (such as adjusting the number or parameters of the convolution and pooling layers in a convolutional neural network); the related theories and methods are not expanded in this chapter. The pre-trained models officially provided by TensorFlow.js implement image classification, object detection, pose estimation, face tracking, malicious-text detection, sentence encoding, speech command recognition and other very rich functions. In this section, the "speech command recognition" function is taken as an example to understand the technologies related to transfer learning.
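In TensorFlow.js, the "keep the feature extractor, replace the output layer" idea can be expressed with the Layers API roughly as follows (the model URL, layer name and class count are placeholders for illustration):

```js
// Load a pre-trained model (placeholder URL)
const base = await tf.loadLayersModel('https://example.com/pretrained/model.json');

// Cut the network at the last feature-extraction layer (placeholder layer name)
const featureLayer = base.getLayer('feature_layer');
const extractor = tf.model({ inputs: base.inputs, outputs: featureLayer.output });
extractor.layers.forEach(layer => layer.trainable = false); // freeze the retained part

// Attach a new output layer sized for the new task's classes
const NUM_NEW_CLASSES = 4; // placeholder: e.g. four new voice commands
const output = tf.layers.dense({ units: NUM_NEW_CLASSES, activation: 'softmax' })
  .apply(extractor.outputs[0]);
const transferModel = tf.model({ inputs: extractor.inputs, outputs: output });
```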
The official speech recognition model of TensorFlow.js is speech-commands, which classifies one-second audio snippets at a time. It was trained with nearly 50,000 voice samples; used directly, it can recognize the English digits (zero ~ nine), directions (up, down, left, right) and some simple commands (such as yes, no, etc.). On the basis of this pre-trained model, a small number of new samples is enough to turn it into a Chinese command recognizer; isn't that convenient? When an audio signal is processed, it is first converted into a frequency-domain signal through the fast Fourier transform, and the extracted features are then fed into the deep learning network for analysis. For scenarios that use simple commands, it is enough to classify a handful of voice commands; no computational linguistics or real semantic analysis is needed, so an English command recognizer can easily be transformed into a Chinese command recognition tool. The essence of the speech command function is the classification of short speech: for example, if audio clips of "left" are labeled "right" during training, the trained neural network will classify the sound "left" as "right" whenever it hears it. The basic steps of transfer learning with the pre-trained speech-commands model are as follows:
The official extension library encapsulates the concrete implementation, and the application-layer API it exposes to developers is very easy to use. The code repository of this chapter provides a complete example: you can collect your own voice samples to generate Chinese commands, retrain the model through transfer learning, and then try to use it to control the character in a "Pac-Man" game:
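A minimal sketch of the transfer-learning flow with the speech-commands package might look like the following (the command words and training options are illustrative assumptions):

```js
import * as speechCommands from '@tensorflow-models/speech-commands';

// Load the pre-trained recognizer (browser FFT variant)
const base = speechCommands.create('BROWSER_FFT');
await base.ensureModelLoaded();

// Create a transfer recognizer and collect labeled samples from the microphone
const transfer = base.createTransfer('chinese-commands');
await transfer.collectExample('shang');              // e.g. the Chinese command for "up"
await transfer.collectExample('_background_noise_'); // background-noise samples

// Retrain on the collected samples, then start listening
await transfer.train({ epochs: 25 });
await transfer.listen(result => {
  const labels = transfer.wordLabels();
  const scores = result.scores;                      // probability per label
  console.log(labels[scores.indexOf(Math.max(...scores))]);
}, { probabilityThreshold: 0.75 });
```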
Recommended courses
- Hung-yi Lee's Machine Learning course (speech.ee.ntu.edu.tw/~tlkagk/ind…)
- Andrew Ng's Machine Learning online course (www.coursera.org/learn/machi…)
- MIT 6.S191 Introduction to Deep Learning (introtodeeplearning.com/)
- Stanford CS231n: Convolutional Neural Networks for Visual Recognition (http://cs231n.stanford.edu/)