I. Introduction to LBP+SVM

The human face is one of the most noticeable external features of a person. In interpersonal communication, people can infer another person's mental state without physical contact, and in many situations facial expressions can replace language in conveying inner feelings. People not only express emotions through facial expressions, but also read the psychology and attitude of the person they are communicating with. According to Mehrabian, nonverbal cues such as body language account for the largest share of interpersonal communication. Real-time facial expression recognition in video is therefore a research topic with great application value.

The complexity and subtlety of human facial expressions increase computational cost, and algorithms with a heavy computational load cannot be applied to real-time video processing. Choosing which algorithm and which features to extract in order to improve timeliness has therefore become a research difficulty, and scholars at home and abroad have studied this problem. For example, expression recognition based on the Gabor wavelet transform and LBP (Local Binary Pattern) features has a simple structure and is easy to implement, but its recognition rate needs to be improved. It has also been proposed that a Gabor motion energy filter combined with temporal information and an SVM can classify facial expressions, with better results than a plain Gabor filter. Since facial expressions are concentrated in regions such as the eyes, eyebrows and mouth, various cascaded expression recognition methods have been proposed: detailed features from different regions are combined into a feature vector according to certain rules, and the data are input into a classifier for recognition after dimensionality reduction. For example, in the literature, Gabor-transformed data are classified with a radial basis function network and joint coding. In another study, LBP was used to compute local texture features of facial expressions, and information entropy was used to determine the cascade weight of each region. Such multi-region cascaded methods describe facial expression features thoroughly, but the high dimensionality of the representation affects real-time recognition.

In recent years, cascaded shape regression models have made breakthroughs in facial feature point localization. Such a regression model learns, from training data, a mapping from face appearance to face shape. The method does not require complex modeling of face shape and appearance, is efficient, easy to implement and effective. In addition, deep-learning-based facial feature point localization algorithms also achieve good results; combining deep learning with the shape regression framework can further improve localization accuracy and is one of the main approaches to feature point localization.

Although automatic facial expression recognition has developed rapidly under the impetus of various applications, a robust automatic facial expression recognition system has not yet been established. Judging from the research of domestic and foreign scholars, the algorithms above all have their own limitations, and the efficiency of recognition and classification is seriously affected by the large amount of information to be processed. How to improve algorithm efficiency has therefore become an important research topic for classification and recognition algorithms.

In this paper, the LBP operator is used to detect faces, and a multi-level cascaded regression tree model is trained to locate 68 facial key points. Geometric features of facial expressions are extracted from these points, expressions are discriminated from the features with an SVM, and 7 facial expressions are classified.

1 Feature extraction algorithm

Facial expression recognition typically involves a machine looking at a photograph of a person's face to determine how they are feeling at the moment. The most straightforward way to understand expressions structurally, that is, the different combinations of eyebrows, nose, mouth and face, is to classify these geometric positions to identify expressions. This paper proposes a method of expression classification using LBP (Local Binary Pattern) features combined with an SVM (Support Vector Machine). The basic process is shown in Figure 1.

Figure 1 Recognition process

The basic idea of facial expression recognition is as follows: 1) find the face in the picture; this is the most basic requirement, since otherwise the features of the facial organs cannot be extracted; 2) after finding the face, analyze its geometric and texture features to represent different expressions; 3) select a classification method to distinguish these features.

Facial expression features are generally extracted in one of two ways: overall template matching, or matching based on geometric and texture features. In global template matching, templates are usually built from pixels or vectors. In geometric feature matching, the main feature points are detected in the image, and feature vectors are obtained from the distances between feature points and the relative sizes of the main facial parts. The feature-based approach is more computationally intensive than the template-based approach, but it is insensitive to face position, scale, head orientation and size.

1.1 Face detection

Local Binary Pattern (LBP) features do not change significantly under rotation or illumination changes, and the LBP feature is easy to compute, so it is effective for video facial expression recognition and offers good real-time performance. "Local" refers to the texture characteristics at a pixel of the image, usually the relationship between that pixel and its surrounding pixels; "binary pattern" refers to binarizing the neighbourhood using the center pixel as the threshold.

In the simplest LBP mode, the image is processed in 3×3 pixel neighbourhoods. The values of the 8 surrounding pixels are compared with the center pixel: a neighbour is set to 1 if its value is not less than the center pixel, and to 0 otherwise. In this way the 8 pixels in the 3×3 window are compared with the center pixel to generate an 8-bit binary number, e.g. 00010011 (converting the binary number to decimal gives the LBP code, of which there are 256 in total). This is the LBP value of the center pixel of the window, and it reflects the texture information of the region. The calculation process is shown in Figure 2.

Figure 2 LBP operator calculation process

The LBP operator value changes little under illumination changes, because under different illumination the relationship between the surrounding pixels and the central pixel stays the same; that is, light that varies uniformly to a certain degree does not affect the LBP value. Therefore, to some extent, the algorithm overcomes the influence of illumination. At the same time, the simplicity of the LBP calculation helps real-time image analysis. As shown in Figure 3, LBP features change little under different illumination.

Figure 3 LBP features under different illumination

If the LBP features of a whole image including the background are computed and compared with the LBP features of a face, the face cannot be located, because an LBP feature only describes the texture at a given location. Comparison-based face detection uses the similarity of the LBP features of a face image and a target image to decide whether the image is a face, but this only answers whether a given image is a face, not whether an image contains a face; finding a face within a whole image is the key problem. Since a face can be recognized, a region can be cropped from the input image and its LBP features extracted and judged to decide whether that position contains a face; by cycling through the whole image in this way, the position of the face can be located.
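
To make the operator described above concrete, the following is a minimal MATLAB sketch (not part of the original source) that computes the basic 3×3 LBP code for each interior pixel of a grayscale image; the function name lbp_basic and the clockwise neighbour ordering are assumptions made for illustration.

function L = lbp_basic(I)
% LBP_BASIC  Compute the basic 3x3 LBP code for each interior pixel.
% I : grayscale image (uint8 or double).  L : image of LBP codes (0..255).
I = double(I);
[rows, cols] = size(I);
L = zeros(rows, cols);
% offsets of the 8 neighbours, ordered clockwise from the top-left corner
offs = [-1 -1; -1 0; -1 1; 0 1; 1 1; 1 0; 1 -1; 0 -1];
for r = 2:rows-1
    for c = 2:cols-1
        center = I(r, c);
        code = 0;
        for k = 1:8
            neighbour = I(r + offs(k,1), c + offs(k,2));
            % the bit is 1 when the neighbour is not less than the center pixel
            code = code*2 + double(neighbour >= center);
        end
        L(r, c) = code;
    end
end
end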

In this paper, a window of fixed size slides over the image, scanning the whole image at multiple scales. Figure 4 shows the face detection process, in which a classifier decides whether each window contains a face. First the detection window is aligned with the position to be examined, then the window is moved to the next position, until the whole image has been traversed; finally the image is rescaled for multi-scale detection. Figure 5 shows the process of face localization with a multi-level cascade of classifiers: the features of the image region covered by the window are computed and matched against the target features to decide what that part of the image contains.

Figure 4 Face detection process

Face recognition is realized by computing the similarity of the LBP feature vectors of two images. The similarity formula is

d(H1, H2) = Σ_i (H1(i) - H2(i))^2 / (H1(i) + H2(i))    (1)

This is the chi-square statistic, which measures the discrepancy between the two distributions. In the formula, H1 and H2 are the LBP histograms of the two images. The smaller d is, the smaller the difference and the more similar the two images are; d equals 0 when the two histograms are identical.
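
The histogram comparison of Equation (1) can be written directly in MATLAB. The sketch below is an illustration rather than code from the original source; the function name chi_square_dist and the eps guard against empty bins are assumptions.

function d = chi_square_dist(H1, H2)
% CHI_SQUARE_DIST  Chi-square distance between two histograms.
% H1, H2 : vectors of the same length (e.g. 256-bin LBP histograms).
H1 = H1(:); H2 = H2(:);
denom = H1 + H2;
denom(denom == 0) = eps;           % avoid division by zero for empty bins
d = sum((H1 - H2).^2 ./ denom);    % smaller d means more similar images
end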

Facial landmark detection automatically locates the key facial feature points in an input face image, such as the eyes, nose tip, mouth corners, eyebrows and the contour points of the facial parts. From the relationships between these feature points, the current emotional state can be judged.

Because of the negative effects of illumination, pose, occlusion and other problems, facial key point detection is not easy. In this paper, the LBP feature is used to locate and track the face, and 68 key points are then detected on it. Geometric features are used for expression detection, and the Ensemble of Regression Trees (ERT) algorithm is used for cascaded regression, i.e. a regression tree method based on gradient boosting. This method is fast and effective, and it can still detect landmarks on partially visible faces.

After obtaining an image, the algorithm first generates an initial shape, i.e. a rough estimate of the feature point positions, and then uses gradient boosting to reduce the sum of squared deviations (the loss function) between the initial shape and the annotated positions (the ground truth feature point positions). The least squares method is used to minimize the deviation, and the cascaded regression factors of each level are obtained. Equation (2) gives the sum of squared deviations, where Yi is the predicted value and Ti is the annotated value:

E = Σ_i (Yi - Ti)^2    (2)

The gradient boosting algorithm chooses the direction of gradient descent at each iteration to ensure the best result. The loss function measures how well the model fits: if the model does not fit, the loss function and the error rate are high; if the algorithm keeps reducing the loss function, the model keeps improving, and the best way to reduce it is to descend along the gradient of the loss function. This is the core idea of gradient boosting.
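
To make the residual-fitting idea concrete, here is a minimal MATLAB sketch of gradient boosting with squared loss. It is an illustration only, assuming the Statistics and Machine Learning Toolbox function fitrtree is available; the synthetic data, shrinkage rate and tree depth are arbitrary choices, not values from the paper.

% synthetic regression problem: X is an n-by-2 feature matrix, Y the targets
rng(0);
X = rand(200, 2);
Y = sin(4*X(:,1)) + 0.5*X(:,2) + 0.1*randn(200, 1);

nu = 0.1;                         % shrinkage (learning rate)
F  = mean(Y) * ones(size(Y));     % initial model: the mean of the targets
trees = cell(1, 50);
for m = 1:50
    r = Y - F;                                        % residuals = negative gradient of the squared loss
    trees{m} = fitrtree(X, r, 'MaxNumSplits', 4);     % fit a shallow regression tree to the residuals
    F = F + nu * predict(trees{m}, X);                % step along the gradient descent direction
end
loss = sum((Y - F).^2);           % sum of squared deviations, as in Equation (2)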

The idea of locating facial feature points can be understood as learning a regression function F that takes an image I as input and outputs θ, the positions of the feature points: θ = F(I). In general, the cascaded regression model learns multiple regression functions {f_1, f_2, ..., f_n}:

θ = F(I) = f_n(f_{n-1}( ... f_1(θ_0, I) ... , I), I)    (3)

θ_i = f_i(θ_{i-1}, I),  i = 1, ..., n    (4)

"Cascade" means that the input of the current function f_i depends on the output of the previous function f_{i-1}, and the learning goal of each f_i is to approach the true feature point positions θ, where θ_0 is the initial shape. Each stage regresses the difference between the current shape θ_{i-1} and the annotated position θ_i: Δθ_i = θ_i - θ_{i-1}. The core update formula is

Ŝ^(t+1) = Ŝ^(t) + r_t(I, Ŝ^(t))    (5)

where t is the index of the cascade stage, Ŝ^(t) denotes the shape estimated at stage t, and r_t denotes the stage-t regressor. The input learned by each regressor is the difference between the current shape and the annotated shape (i.e. the shape updated by the previous regressor); the features used here can be gray values or other features. Each regressor is composed of many trees, and the parameters of each tree are trained from the coordinate differences between the current shape and the true shape together with randomly selected pixel pairs.
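
The cascade update of Equation (5) can be sketched in MATLAB as follows. This is an illustration only: the mean shape, the feature extractor and the stage regressors are hypothetical stand-ins for what a trained ERT model would actually provide.

numStages  = 10;
meanShape  = rand(68, 2);                         % stand-in for the learned mean shape
regressors = cell(1, numStages);
for t = 1:numStages
    regressors{t} = @(phi) zeros(68, 2);          % stand-in for the trained stage-t regressor r_t
end
extract_features = @(I, S) S(:);                  % stand-in for pixel-pair features indexed relative to S

I = zeros(128, 128);                              % input image (placeholder)
S = meanShape;                                    % initial shape, theta_0
for t = 1:numStages
    phi    = extract_features(I, S);              % features of (I, current shape)
    deltaS = regressors{t}(phi);                  % shape increment predicted by r_t
    S      = S + deltaS;                          % Equation (5): S^(t+1) = S^(t) + r_t(I, S^(t))
end
landmarks = S;                                    % final 68 key points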

Figure 6 shows the regression process of the algorithm. While learning each tree, ERT stores the shape update ΔS directly in the leaf nodes. The initial position is the mean shape; after the image has passed through all learned trees, the mean shape plus the ΔS values of all leaf nodes it reaches gives the final key points of the face. Figure 7 shows the key points detected by the algorithm.

Figure 6 Regression process of the algorithm
Figure 7 Key point markers
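
Once the 68 key points are available, geometric features can be assembled from them. The sketch below shows one possible way to do this; it is an assumption for illustration, not the paper's exact feature definition. Pairwise distances between selected landmarks are normalized by the inter-ocular distance so that the features are insensitive to face scale.

landmarks = 100*rand(68, 2);                 % placeholder: in practice, the 68 points from the detector
% Indices below follow the common 68-point annotation (eyes, eyebrows, mouth);
% the pairs chosen here are illustrative, not the paper's definition.
leftEye  = mean(landmarks(37:42, :), 1);     % centre of the left eye
rightEye = mean(landmarks(43:48, :), 1);     % centre of the right eye
iod = norm(leftEye - rightEye);              % inter-ocular distance used for normalization

pairs = [49 55;    % mouth width  (left corner to right corner)
         52 58;    % mouth height (upper lip to lower lip)
         22 23;    % inner eyebrow distance
         20 38;    % left eyebrow to left eye
         25 44];   % right eyebrow to right eye
features = zeros(1, size(pairs, 1));
for k = 1:size(pairs, 1)
    d = norm(landmarks(pairs(k,1), :) - landmarks(pairs(k,2), :));
    features(k) = d / iod;                   % scale-invariant geometric feature
end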

Facial expression recognition first requires detecting the face, and then distinguishing the emotion expressed by the target through analysis of the facial features. This requires establishing a discrimination basis, i.e. a classifier. A number of samples (images of various expressions) are collected to build an expression library, each expression state is labeled, and the classifier is trained on them; this is classifier training. During recognition, an image is input and the classifier identifies and judges its category.

Support Vector Machine (SVM) is a binary classification model. The SVM model is a linear classifier defined in feature space with maximum margin, and the learning strategy of SVM is margin maximization [13]. The maximum classification margin can be understood simply with two classes of two-dimensional data: if the data are drawn on a two-dimensional plane, the two classes can be separated by a line. In theory there are infinitely many such lines, but there is always one that maximizes the distance between the line and the points closest to it (the nearest positive and negative samples); this is the decision boundary obtained by the SVM algorithm.
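
The maximum-margin idea can be seen with a tiny two-class example. The sketch below is illustrative only; it uses MATLAB's built-in fitcsvm (from the Statistics and Machine Learning Toolbox), which is not the toolbox used in the source code further down, and the synthetic data are made up.

rng(1);
X = [randn(20,2) + 2; randn(20,2) - 2];   % two separable point clouds in the plane
y = [ones(20,1); -ones(20,1)];            % class labels +1 / -1

model = fitcsvm(X, y, 'KernelFunction', 'linear');
sv = model.SupportVectors;                % the points closest to the separating line
w  = model.Beta;                          % normal vector of the separating line
b  = model.Bias;                          % w'*x + b = 0 is the decision boundary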

Generally speaking, an ordinary SVM is a line that perfectly separates two classes of data, as shown in Figure 8; it lies in the middle of the two classes, at the same distance from each. The points closest to the dividing line are the support vectors. When higher-dimensional points are classified, the dividing boundary of the SVM is a plane or a hyperplane. The core of a support vector machine is the support vectors: the classification hyperplane is computed from specific points in the sample set that act as support vectors, and samples are then classified according to this hyperplane. SVM is a supervised machine learning classification algorithm, so for a given training set the class of each sample must be known; that is, each sample needs an exact category label for SVM training. SVM places no restriction on the features or dimensionality of the samples. In this paper, the geometric features of the 68 facial feature points are extracted and recorded as training data, and an SVM classifier is trained for classification and recognition.

Figure 8 Maximum classification margin of the support vector machine
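
As a sketch of the training and recognition step described above (an illustration only: the source code later in this post uses the SVM-KM toolbox instead, and the variables featMat and labels are hypothetical placeholders), MATLAB's built-in multi-class SVM wrapper could be used as follows; fitcecoc is available only in newer MATLAB releases than the one listed at the end of this post.

% placeholder training data: one row of geometric features per image, labels 1..7
featMat = rand(210, 5);                  % e.g. normalized landmark distances, as in the earlier sketch
labels  = repmat((1:7)', 30, 1);         % seven expression classes

template = templateSVM('KernelFunction', 'gaussian', 'BoxConstraint', 100);  % C = 100, as in the code below
model = fitcecoc(featMat, labels, 'Learners', template);   % one-vs-one multi-class SVM

newFeatures = rand(1, 5);                % geometric features of a new face
predictedExpression = predict(model, newFeatures);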

II. Some source code

function varargout = main_gui(varargin)
% MAIN_GUI MATLAB code for main_gui.fig
%      MAIN_GUI, by itself, creates a new MAIN_GUI or raises the existing
%      singleton*.
%
%      H = MAIN_GUI returns the handle to a new MAIN_GUI or the handle to
%      the existing singleton*.
%
%      MAIN_GUI('CALLBACK',hObject,eventData,handles,...) calls the local
%      function named CALLBACK in MAIN_GUI.M with the given input arguments.
%
%      MAIN_GUI('Property','Value',...) creates a new MAIN_GUI or raises the
%      existing singleton*.  Starting from the left, property value pairs are
%      applied to the GUI before main_gui_OpeningFcn gets called.  An
%      unrecognized property name or invalid value makes property application
%      stop.  All inputs are passed to main_gui_OpeningFcn via varargin.
%
%      *See GUI Options on GUIDE's Tools menu.  Choose "GUI allows only one
%      instance to run (singleton)".
%
% See also: GUIDE, GUIDATA, GUIHANDLES

% Edit the above text to modify the response to help main_gui

% Last Modified by GUIDE v2.5 29-Dec-2018 17:29:22

% Begin initialization code - DO NOT EDIT
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @main_gui_OpeningFcn, ...
                   'gui_OutputFcn',  @main_gui_OutputFcn, ...
                   'gui_LayoutFcn',  [], ...
                   'gui_Callback',   []);

if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end

if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT


% --- Executes just before main_gui is made visible.
function main_gui_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to main_gui (see VARARGIN)

% Choose default command line output for main_gui
handles.output = hObject;

% Update handles structure
guidata(hObject, handles);

% UIWAIT makes main_gui wait for user response (see UIRESUME)
% uiwait(handles.figure1);


% --- Outputs from this function are returned to the command line.
function varargout = main_gui_OutputFcn(hObject, eventdata, handles) 
% varargout  cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Get default command line output from handles structure
varargout{1} = handles.output;


% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global str img cc
[filename,pathname] = uigetfile({'*.jpg';'*.bmp'},'Select picture');
str = [pathname,filename];
img = imread(str);
cc=imread(str);
subplot(1,3,1), imshow(cc);
set(handles.text5,'string',str);


% --- Executes on button press in pushbutton3.
function pushbutton3_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton3 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
close(gcf);




% --- Executes on button press in pushbutton4.
function pushbutton4_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton4 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global cc img t
load('save.mat');
 mapping = getmapping(8,'u2');   % LBP mapping
 W = [2,1,1,1,1,1,2;
      2,4,4,1,4,4,2;
      1,1,1,0,1,1,1;
      0,1,1,0,1,1,0;
      0,1,1,1,1,1,0;
      0,1,1,2,1,1,0;
      0,1,1,1,1,1,0];            % block weight matrix passed to DSLBP
d=[];
 image_size = size(cc);
   dimension = numel(image_size);
   if dimension == 3
      cc=rgb2gray(cc);
   end
   
       X = double(cc);
       X = 255*imadjust(X/255, [0.3;1], [0;1]);
  X = imresize(X, [64 64], 'bilinear');  % resize to 64*64 using bilinear interpolation
  H2 = DSLBP(X, mapping, W);             % LBP histogram of the image
  Gray = X;
  Gray = (Gray - mean(Gray(:)))/std(Gray(:))*20 + 128;
  lpqhist = lpq(Gray,3,1,1,'nh');        % calculate the LPQ histogram of the photo
  d = [d; a];                            % a: feature vector built from the histograms (its definition is elided in the original)
  P_test = d;
  P_test = mapminmax(P_test,0,1);
  %%%%%%%% Above is the feature extraction part %%%%%
  % From here on the expression is recognized with a support vector machine
  addpath SVM-km                         % add the support vector machine toolbox
  C = 100;

kerneloption = 1.3;                      % set the kernel parameter
kernel = 'gaussian';                     % use a Gaussian kernel for the SVM
[ypred2, maxi] = svmmultival(P_test, xsup, w, b, nbsv, kernel, kerneloption);
for i = 1:length(ypred2)
    % labels for ypred2(i) == 1, 2 and 7 are elided in the original snippet
    if ypred2(i)==3,     t='Fear';
    elseif ypred2(i)==4, t='Happiness';
    elseif ypred2(i)==5, t='Sad';
    elseif ypred2(i)==6, t='Surprise';
    end
    detector = vision.CascadeObjectDetector;
    bboxes=step(detector,img);
    FrontalFaceCART = insertObjectAnnotation(img,'rectangle',bboxes,t,'color','cyan','TextBoxOpacity',0.8,'FontSize',13);

   end
    set(handles.text10,'string',t);

% --- Executes during object creation, after setting all properties.
function text5_CreateFcn(hObject, eventdata, handles)
% hObject    handle to text5 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called


% --- Executes on button press in pushbutton6.
function pushbutton6_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton6 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

load('save.mat');
disp('End of training');
axes(handles.axes2);
vid = videoinput('winvideo',1,'YUY2_640x480');
set(vid,'ReturnedColorSpace','rgb');
vidRes = get(vid, 'VideoResolution');
nBands = get(vid, 'NumberOfBands');
hImage = image( zeros(vidRes(2), vidRes(1), nBands) );
preview(vid, hImage);
disp('Camera on');
faceDetector1 = vision.CascadeObjectDetector;
while(1)
frame = getsnapshot(vid);
box = step(faceDetector1, frame); % Detect faces
if isempty(box)==0
    ff = imcrop(frame,[box(1),box(2),box(3),box(4)]);   % crop the detected face region
    % figure; imshow(cc);
    ff = rgb2gray(ff);
    % figure; imshow(cc);
    ff = histeq(ff);                                     % histogram equalization
    % imwrite(cc,'.\test\1.jpg');
    yy=svm_test(xsup,w,b,nbsv,ff);
    h = rectangle('position',[box(1),box(2),box(3),box(4)],'LineWidth',2,'edgecolor','b');
    for i = 1:length(yy)
        % loop body elided in the original snippet (presumably it maps yy(i) to an
        % expression label and draws it as the text handle t1 that is cleared below)
        end
    end
      pause(0.05);
      set(t1,'string',[]); delete(h);
        if strcmpi(get(gcf,'CurrentCharacter'),'c')
         delete(vid);
         disp('Program exit');
         break;
        end

      
else
    t1 = text(10,10,sprintf('No face detected'),'FontAngle','italic','FontSize',15,'Color','b','FontWeight','Bold');
     pause(0.05);
     set(t1,'string',[]);
     if strcmpi(get(gcf,'CurrentCharacter'),'1')
         delete(vid);
         disp('Program exit');
         break;
     end
end
end


III. Operation results

IV. MATLAB version and references

1. MATLAB version: 2014a

2. References
[1] Cai Limei. MATLAB Image Processing: Theory, Algorithms and Case Analysis [M]. Tsinghua University Press, 2020.
[2] Yang Dan, Zhao Haibin, Long Zhe. Detailed Examples of MATLAB Image Processing [M]. Tsinghua University Press, 2013.
[3] Zhou Pin. MATLAB Image Processing and Graphical User Interface Design [M]. Tsinghua University Press, 2013.
[4] Liu Chenglong.
[5] Yao Liza, Xu Guoming, Fang Bo, He Shixiong, Zhou Huan.
[6] Yao Liza, Zhang Junwei, Fang Bo, Zhang Shaolei, Zhou Huan, Zhao Feng. A video expression recognition method combining LBP and SVM [J]. Journal of Shandong University of Technology (Natural Science Edition), 2020, 34(04).
Design and implementation of a facial expression recognition system based on LBP and SVM [J]. Journal of Guizhou Normal University (Natural Science Edition), 2020, 38(01).