To begin, a list: CNN has been applied, to varying degrees, in the following fields:
- Image processing (the most important application field): image recognition and object recognition, image labeling, image theme generation, image content generation, object labeling, etc.
- Video processing: video classification, video labeling, video prediction, etc.
- Natural language processing (NLP): conversation generation, text generation, machine translation, etc.
- Others: robot control, games, parameter control, and so on.
2. Network structure of CNN
2.1 Traditional neural network
The figure above shows the structure of a traditional neural network. It is fully connected, which makes parameter training difficult, and solving with backpropagation (BP) can suffer from exploding and vanishing gradients. In addition, local minima are common in the non-convex cost functions of deep structures (involving multiple layers of nonlinear processing units), and they are a major source of training difficulty. These factors limit the applicability of the traditional neural network, so it is not widely used.
2.2 Convolutional Neural Networks (CNN)
The network structure of CNN is shown in the figure above. A CNN can effectively reduce the complexity of the traditional fully connected neural network. Common CNN structures include LeNet-5, AlexNet, ZFNet, VGGNet, GoogLeNet, ResNet, and so on. Among them, ResNet, the ILSVRC2015 champion, is more than 20 times deeper than AlexNet and 8 times deeper than VGGNet. Judging from these structures, one direction of CNN development is increasing the number of layers: the added nonlinearity yields a better approximation of the objective function and better feature representations. However, it also increases the overall complexity of the network, making it harder to optimize and prone to overfitting.
Similarities and differences between the network structure of CNN and that of traditional neural network are as follows:
(1) A CNN mainly has a data input layer, convolution layer, ReLU activation layer, pooling layer, fully connected layer, and (optionally) a Batch Normalization layer. A traditional neural network consists of a data input layer, one or more hidden layers, and a data output layer. By comparison, the CNN retains the hierarchical structure of the traditional neural network.
(2) Each layer of a CNN has a different function, whereas each layer of a traditional neural network performs a linear regression on the features of the previous layer followed by a nonlinear transformation.
(3) A CNN uses ReLU as the activation (excitation) function, while the traditional neural network uses the sigmoid function.
(4) The pooling layer of a CNN reduces the data dimensionality and extracts the high-frequency information of the data.
(5) CNNs are mainly used in application scenarios such as image classification and object recognition.
A CNN keeps the hierarchical network structure, and different levels use different operations and functions:
- Data input layer: Input Layer
- Convolution layer: CONV Layer
- ReLU activation layer: ReLU Layer
- Pooling layer: Pooling Layer
- Fully connected layer: FC Layer
- Note: Batch Normalization Layer (optional)
2.2.1 Input Layer
As in ordinary neural networks/machine learning, the input data needs to be preprocessed, for the following reasons:
- The input data units differ, which may slow convergence and lengthen training.
- Inputs with a large data range may play a larger role in pattern classification, while inputs with a small range may play a smaller one.
- Since the activation functions in a neural network have a limited range, the target data for network training must be mapped into the range of the activation function.
- The S-shaped activation function is flat in its saturation regions, where the outputs approach the bounds of (0, 1) and discriminate too little; for example, for the sigmoid f(x), f(100) and f(5) differ by only about 0.0067.
Common data preprocessing methods are as follows:
(1) Mean removal: subtract the mean of each feature of the given data (centering the data set at 0).
(2) Normalization: on top of mean removal, divide by the standard deviation of each feature (normalizing the amplitude of each dimension of the data set to the same range).
(3) PCA dimensionality reduction: project the high-dimensional data set onto low-dimensional axes so that the projected data has maximum variance (the correlation between features is removed; used to obtain the low-frequency information).
(4) Whitening: on top of PCA, normalize the amplitude on each feature axis of the transformed data; used to obtain the high-frequency information.
(5) Reference: ufldl.stanford.edu/wiki/index….
import numpy as np

x = x - np.mean(x, axis=0)                          # mean removal
x = (x - np.mean(x, axis=0)) / np.std(x, axis=0)    # normalization

x -= np.mean(x, axis=0)               # zero-center the data
cov = np.dot(x.T, x) / x.shape[0]     # compute the covariance matrix
u, s, v = np.linalg.svd(cov)          # SVD decomposition
xrot = np.dot(x, u)                   # decorrelate the data
x = np.dot(x, u[:, :2])               # PCA: keep the first two principal components
x = xrot / np.sqrt(s + 1e-5)          # whitening
Note: although we have introduced PCA decorrelation and whitening, in practice PCA and whitening are generally not used in convolutional neural networks; mean removal and normalization are the usual choices.
Suggestion: preprocess the data features with mean removal and normalization.
2.2.2 CONV Layer
This layer is the most important layer of convolutional neural network, and it is also the origin of the name “convolutional neural network”.
When the brain recognizes an image, different cortical layers process different aspects of the data, such as color, shape, and light and dark; the processing results of the different cortices are then merged and mapped to obtain the final result. The first part is essentially a local observation, and the second part merges the local results into a whole.
In addition, given a picture, the human eye tends to focus first on the important points (parts) and then on the whole picture. Local perception divides the whole image into several small windows that may partially overlap, and recognizes local features of the image with a sliding-window method. Put differently, each neuron connects only to some of the neurons in the next layer and senses only part of the image rather than the whole image.
Modeling image recognition on these processes in the human brain, we can assume that spatially close pixels in an image are strongly correlated, while distant pixels are only weakly correlated. Each neuron therefore does not need to perceive the whole image; it only needs to perceive a local region, and the local information is synthesized at a higher level into global information. This is local perception.
Local association: Each neuron is treated as a filter
The window (receptive field) slides, and the filter computes on the local data
Related concepts: depth, stride, and zero-padding
CONV process reference: cs231n.github.io/assets/conv…
- Data input: let's say an RGB image
In neural networks, the input is a vector, but in convolutional neural networks, the input is a multi-channel image (e.g., 3 channels in this example)
- Local perception: in the calculation, the image is divided into regions that are computed/considered separately. Why is local perception feasible? Because the closer two pixels are, the stronger their correlation, and vice versa. We therefore perform local perception first and then synthesize the local information at a higher level (the FC layer) to obtain global information.
- Parameter sharing mechanism: the same neuron uses a fixed convolution kernel to convolve the whole image. Equivalently, one neuron focuses on only one feature, and different neurons focus on different features (each neuron can be regarded as a filter).
- Overlap of sliding windows: adjacent windows partially overlap while sliding, mainly to keep the edges between windows smooth after processing and to reduce edge roughness between windows. Because the connection weights of each neuron are fixed, a neuron can be regarded as a template; since each neuron focuses on one feature, the number of weights to compute is greatly reduced.
- Convolution calculation
The convolution is computed as follows: take the element-wise product of the fixed convolution kernel matrix and the window matrix of each neuron (multiplying corresponding positions), sum the products, and add the bias term b; the result is the value of the feature that this neuron focuses on in the current image window.
Figure 2.4 illustrates the process of the convolution calculation.
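To make this window-by-window arithmetic concrete, here is a minimal numpy sketch (the function name conv2d_single and its parameters are illustrative, not part of the original text): it slides a kernel over a single-channel input and, at each position, multiplies element-wise, sums, and adds the bias b.

import numpy as np

def conv2d_single(x, w, b, stride=1):
    # x: (H, W) single-channel input; w: (kh, kw) convolution kernel; b: scalar bias
    kh, kw = w.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(window * w) + b  # element-wise product, sum, plus bias
    return out

# Usage: a 5x5 input with a 3x3 kernel gives a 3x3 feature map (stride 1, no padding)
feature_map = conv2d_single(np.random.randn(5, 5), np.random.randn(3, 3), b=0.1)

Because the same w and b are reused at every window position, this is exactly the parameter sharing mechanism described above.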
2.2.3 ReLU Activation Layer
This is the activation layer; in a CNN, the ReLU function is generally used as the activation function. Its main role is to apply a nonlinear mapping to the output of the convolution layer.
- Several common activation functions
Common activation functions: sigmoid, tanh, ReLU, ReLU variants, and Maxout (a numpy sketch of these functions follows the suggestions below).
Sigmoid function (S function)
Tanh function (double-S function)
Rectified linear unit, ReLU -> max{0, x} ==> unbounded; "dead neurons" appear easily
Leaky ReLU -> if x > 0, output x; if x < 0, output alpha*x, where 0 < alpha < 1 ==> an improvement on ReLU
ELU -> if x > 0, output x; if x < 0, output alpha*(e^x - 1), where 0 < alpha < 1 ==> also an improvement on ReLU
Maxout -> adds an activation layer
- Some suggestions for activation functions
Generally, do not use the sigmoid function as the activation function of a CNN; if it is used, use it only at the FC layer.
Prefer ReLU as the activation function, because iteration is fast; however, the effect may not always be good.
If ReLU fails, try Leaky ReLU or Maxout, which resolves most cases.
In a few cases, tanh works well.
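The sketch below implements the activation functions listed above in numpy (a minimal illustration; the alpha defaults and the two-piece Maxout are illustrative choices, with 0 < alpha < 1 as the text suggests):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # S function, output in (0, 1)

def tanh(x):
    return np.tanh(x)                      # double-S function, output in (-1, 1)

def relu(x):
    return np.maximum(0, x)                # max{0, x}, unbounded above

def leaky_relu(x, alpha=0.1):
    return np.where(x > 0, x, alpha * x)   # small slope alpha for x < 0

def elu(x, alpha=0.1):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))  # smooth for x < 0

def maxout(x, w1, b1, w2, b2):
    # Maxout adds an activation layer: the max over several linear transforms
    return np.maximum(x @ w1 + b1, x @ w2 + b2)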
2.2.4 Pooling Layer
The pooling layer sits between successive convolution layers. Its main function is to reduce the number of parameters and the amount of computation in the network by gradually shrinking the spatial size of the representation. The pooling layer operates independently on each feature map. Pooling layers compress the amount of data and the number of parameters, reducing overfitting. In short, if the input is an image, the primary purpose of the pooling layer is to compress the image.
The data compression strategies in the pooling layer mainly include:
Max Pooling -> select the maximum value in each small window as the required feature pixel (discarding the non-important feature pixels)
Average Pooling -> select the average value in each small window as the required feature pixel
The selection of important feature points in the pooling layer can reduce the dimension and prevent over-fitting to a certain extent.
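A minimal numpy sketch of both strategies (the function name pool2d and its parameter defaults are illustrative, not from the original text):

import numpy as np

def pool2d(x, size=2, stride=2, mode='max'):
    # Slide a size x size window over x and keep the max (or mean) of each window
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max() if mode == 'max' else window.mean()
    return out

# A 4x4 feature map pooled with a 2x2 window and stride 2 becomes 2x2
pooled = pool2d(np.random.randn(4, 4), size=2, stride=2, mode='max')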
2.2.5 FC Layer
The fully connected layer is similar in structure to the traditional neural network: neurons in the FC layer are connected to all activation outputs of the previous layer; in other words, all neurons between the two layers have weighted connections. Normally, in a CNN, FC layers appear only at the end of the network.
Through the fully connected structure, the previously extracted output features are recombined to represent the complete image.
The general CNN structure is as follows:
INPUT
[[CONV -> RELU] * N -> POOL?] * M
[FC -> RELU] * K
FC
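As a rough illustration of this pattern, the sketch below chains the illustrative functions defined earlier (conv2d_single, relu, pool2d); the layer counts and shapes are arbitrary, and each pooling group uses a single conv here (N = 1):

def cnn_forward(x, conv_params, fc_params):
    # [[CONV -> RELU] * N -> POOL?] * M
    for w, b in conv_params:
        x = relu(conv2d_single(x, w, b))
        x = pool2d(x)                      # the POOL step is optional in the pattern
    x = x.reshape(-1)                      # flatten the feature maps for the FC layers
    # [FC -> RELU] * K
    for w, b in fc_params[:-1]:
        x = relu(x @ w + b)
    # final FC layer produces the output scores
    w, b = fc_params[-1]
    return x @ w + b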
2.2.6 Batch Normalization Layer (mostly used after the convolution layer; it makes the outputs follow a Gaussian distribution)
The Batch Normalization layer (BN layer) expects the results to follow a Gaussian distribution, so it modifies the outputs of the neurons; it is usually placed after the convolution layer and before the pooling layer.
Paper: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Paper link: arxiv.org/pdf/1502.03…
If the output is an N×D result, compute the mean and variance over each of the D dimensions.
Normalize with the mean and variance.
Forced normalization can run into problems, e.g. a variance of 0.
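A minimal numpy sketch of this forward pass (gamma and beta are the learnable scale and shift from the paper; eps is a small constant guarding against the zero-variance problem just mentioned):

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D) batch; mean and variance are computed per dimension d
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize; eps avoids dividing by 0
    return gamma * x_hat + beta            # learnable scale and shift

# Usage: a batch of N=32 samples with D=64 features
out = batch_norm_forward(np.random.randn(32, 64), np.ones(64), np.zeros(64))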
Advantages of Batch Normalization:
Gradient propagation (computation) is smoother, and neurons are less prone to saturation (helping to prevent vanishing gradients (gradient dispersion) and exploding gradients)
The learning rate can be set higher
Less dependence on initial values
Disadvantages of Batch Normalization:
If the network is deep and BN layers are added, model training may become slower.
Note: use the BN layer with caution!
3. Advantages and disadvantages of CNN
Advantages:
① It uses local perception and a parameter sharing mechanism (shared convolution kernels), so it handles large data sets well and processes high-dimensional data without trouble.
② It can extract the deep information of an image, and the model expresses features well.
③ No manual feature selection is needed: once the convolution kernels W and bias terms b are trained, the feature values are obtained.
Disadvantages:
① Tuning takes a long time and requires a large number of samples, so training the model on a GPU is recommended.
② The physical meaning is unclear and the results of each layer cannot be interpreted, which is a common shortcoming of neural networks.
II. Source code
function main
clc
close all
% Create the face detector object
faceDetector = vision.CascadeObjectDetector;
% Face detection
FaceRecognition(faceDetector);
end

%% Select a picture
function I = SelectPicture()
[FileName, PathName] = uigetfile('*.jpg', 'Select a picture');
if isequal(FileName, 0)
    disp('No image selected, please re-select!')
    I = [];
else
    I = imread(fullfile(PathName, FileName));
end
end

%% Detect faces and draw boxes around them
function [I_faces, bbox] = GetFaces(faceDetector, I)
% Detect faces
bbox = step(faceDetector, I);
% Create a shape inserter object to draw the detection results
if size(I, 3) == 1
    % Grayscale image: insert a white or black box
    if mean(I(:)) > 128
        % Bright image: use a black box
        shapeInserter = vision.ShapeInserter();
    else
        % Dark image: use a white box
        shapeInserter = vision.ShapeInserter('BorderColor', 'White');
    end
else
    % Color image: insert a red box
    shapeInserter = vision.ShapeInserter('BorderColor', 'Custom', 'CustomBorderColor', [255 0 0]);
end
% Draw the borders to circle the detected faces
I_faces = step(shapeInserter, I, int32(bbox));
end

%% Image face detection
function FaceRecognition(faceDetector)
% Mouse click response function
    function BtnDownFcn(h, evt)
        FaceRecognition(faceDetector);
    end
% Select a file
I = SelectPicture();
if isempty(I)
    return
end
% Face detection
[I_faces, bbox] = GetFaces(faceDetector, I);
fig1 = figure;
pos1 = get(fig1, 'Position');
set(fig1, 'Position', [10 pos1(2:4)]);
set(fig1, 'WindowButtonDownFcn', @BtnDownFcn);
% Show the detection result
figure(fig1)
imshow(I_faces)
title('Click on this image to select another image to identify')
for i = 1:size(bbox, 1)
    text(bbox(i, 1), bbox(i, 2), mat2str(i), 'color', 'r')
end
intbbox = int32(bbox);
for i = 1:size(intbbox, 1)
    xs = intbbox(i, 1);
    xe = xs + intbbox(i, 3);
    ys = intbbox(i, 2);
    ye = ys + intbbox(i, 4);
    % Create a new figure every 16 faces
    if rem(i, 16) == 1
        fig2 = figure; %#ok
    end
end
end
function varargout = FaceSystem(varargin)
% FACESYSTEM MATLAB code for FaceSystem.fig
% FACESYSTEM, by itself, creates a new FACESYSTEM or raises the existing
% singleton*.
%
% H = FACESYSTEM returns the handle to a new FACESYSTEM or the handle to
% the existing singleton*.
%
% FACESYSTEM('CALLBACK',hObject,eventData,handles,...) calls the local
% function named CALLBACK in FACESYSTEM.M with the given input arguments.
%
% FACESYSTEM('Property','Value',...) creates a new FACESYSTEM or raises the
% existing singleton*. Starting from the left, property value pairs are
% applied to the GUI before FaceSystem_OpeningFcn gets called. An
% unrecognized property name or invalid value makes property application
% stop. All inputs are passed to FaceSystem_OpeningFcn via varargin.
%
% *See GUI Options on GUIDE's Tools menu. Choose "GUI allows only one
% instance to run (singleton)".
%
% See also: GUIDE, GUIDATA, GUIHANDLES
% Edit the above text to modify the response to help FaceSystem
% Last Modified by GUIDE v2.5 20-Apr-2018 19:18:59
% Begin initialization code - DO NOT EDIT
gui_Singleton = 1;
gui_State = struct('gui_Name', mfilename, ...
'gui_Singleton', gui_Singleton, ...
'gui_OpeningFcn', @FaceSystem_OpeningFcn, ...
'gui_OutputFcn', @FaceSystem_OutputFcn, ...
'gui_LayoutFcn', [], ...
'gui_Callback', []);
if nargin && ischar(varargin{1})
gui_State.gui_Callback = str2func(varargin{1});
end
if nargout
[varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT
% --- Executes just before FaceSystem is made visible.
function FaceSystem_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject handle to figure
% eventdata reserved - to be defined in a future version of MATLAB
% handles structure with handles and user data (see GUIDATA)
% varargin command line arguments to FaceSystem (see VARARGIN)
% Choose default command line output for FaceSystem
handles.output = hObject;
% Update handles structure
guidata(hObject, handles);
% UIWAIT makes FaceSystem wait for user response (see UIRESUME)
% uiwait(handles.figure1);
% --- Outputs from this function are returned to the command line.
function varargout = FaceSystem_OutputFcn(hObject, eventdata, handles)
% varargout cell array for returning output args (see VARARGOUT);
% hObject handle to figure
% eventdata reserved - to be defined in a future version of MATLAB
% handles structure with handles and user data (see GUIDATA)
% Get default command line output from handles structure
varargout{1} = handles.output;
% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
% hObject handle to pushbutton1 (see GCBO)
% eventdata reserved - to be defined in a future version of MATLAB
% handles structure with handles and user data (see GUIDATA)
global str;
global a0;
[filename,pathname]=...
uigetfile({'*.jpg';'*.bmp';'*.gif'},'choose');
str=[pathname filename]
if str ~= 0
    a0 = imread(str);
    h = waitbar(0, 'Please wait, reading...');
%*********
axes(handles.axes1);
axis off
imshow(a0);
title('Original image')
waitbar(1,h,'finish');
pause(0.05);
delete(h);
end
% --- Executes on button press in pushbutton2.
function pushbutton2_Callback(hObject, eventdata, handles)
global a0;
global dets
global i_face;
global im
i_face=0;
faceDetector = vision.CascadeObjectDetector;
[im, dets] = GetFaces(faceDetector, a0);
DisplayDetections(im, dets);
% --- Executes on button press in pushbutton3.
function pushbutton3_Callback(hObject, eventdata, handles)
% hObject handle to pushbutton3 (see GCBO)
% eventdata reserved - to be defined in a future version of MATLAB
% handles structure with handles and user data (see GUIDATA)
global dets
global i_face;
global a0;
global A_face;
[M,N]=size(dets);
if (i_face > 0) && (i_face <= M)
    i_face = i_face - 1;
    i = i_face;
    A_face = a0(dets(i,2):(dets(i,2)+dets(i,4)), dets(i,1):(dets(i,1)+dets(i,3)));
    axes(handles.axes2);
    axis off
imshow(A_face);
title('Faces to be recognized');
end
% --- Executes on button press in pushbutton4.
function pushbutton4_Callback(hObject, eventdata, handles)
% hObject handle to pushbutton4 (see GCBO)
% eventdata reserved - to be defined in a future version of MATLAB
% handles structure with handles and user data (see GUIDATA)
global dets
global a0;
global i_face;
global A_face;
[M,N]=size(dets);
if (i_face >= 0) && (i_face < M)
    i_face = i_face + 1;
    i = i_face;
    A_face = a0(dets(i,2):(dets(i,2)+dets(i,4)), dets(i,1):(dets(i,1)+dets(i,3)));
    axes(handles.axes2);
    axis off
imshow(A_face);
title('Faces to be recognized');
end
III. Operation results
IV. Note
MATLAB version: 2014a