I. Introduction

This paper designs and implements a text-dependent voiceprint recognition system in MATLAB that can determine a speaker’s identity.

1 System Principle

A. Voiceprint recognition

In the past two years, with the development of artificial intelligence, many mobile phone apps have introduced a voiceprint lock function, which relies on voiceprint recognition technology. Voiceprint recognition, also known as speaker recognition, differs from speech recognition: speech recognition determines what was said, whereas speaker recognition determines who said it.



B. Mel frequency cepstrum coefficient (MFCC)

The Mel frequency cepstrum coefficient (MFCC) is one of the most commonly used features in speech signal processing.

Experiments show that the human ear behaves like a filter bank that attends only to certain frequency components of the spectrum. Perceived pitch is not linear in ordinary frequency (Hz), but it is approximately linear on the Mel frequency scale.

MFCC takes these auditory characteristics into account: the linear spectrum is first mapped onto the perceptually motivated, nonlinear Mel spectrum and then converted to the cepstrum. The relation between ordinary frequency f (in Hz) and Mel frequency is commonly written as:

Mel(f) = 2595 * log10(1 + f / 700)
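As a minimal sketch, assuming the widely used 2595*log10(1 + f/700) form of the mapping (the source does not show which variant it uses), the conversion and the resulting filter spacing could look like this. The names hz2mel, mel2hz and the 0 to 4000 Hz range are illustrative assumptions, not part of the original system.

```matlab
% Hz <-> Mel conversion (assumed variant: 2595*log10(1 + f/700)).
hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(m) 700 * (10 .^ (m / 2595) - 1);

f = [100 500 1000 2000 4000];   % frequencies in Hz
m = hz2mel(f);                  % corresponding Mel values
f_back = mel2hz(m);             % round trip back to Hz

% Centre frequencies of 24 filters spaced evenly on the Mel scale
% between 0 Hz and 4000 Hz (assumed range, e.g. for 8 kHz sampling).
nfilt = 24;
mel_edges = linspace(hz2mel(0), hz2mel(4000), nfilt + 2);
centres_hz = mel2hz(mel_edges(2:end-1));
```

Because the filters are evenly spaced in Mel, their centre frequencies in Hz are packed densely at low frequencies and spread out at high frequencies, which mirrors the ear's finer resolution at low frequencies.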



C. Vector quantization

The system uses vector quantization to compress the extracted MFCC features.

Vector quantization (VQ) is a lossy data compression method based on block coding. Quantization is also a key step in multimedia compression formats such as JPEG and MPEG-4. The basic idea of VQ is to group several scalar values into a vector and quantize that vector as a whole in the vector space, which compresses the data without losing much information.
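To make the idea concrete, here is a minimal sketch of codebook training with a plain k-means (Lloyd) iteration. This is one common way to build a VQ codebook and is not necessarily the method used in this system. The data are synthetic stand-ins for MFCC frames, and the codebook size K = 3 is an assumption chosen for illustration.

```matlab
% Synthetic data: 300 vectors of dimension 12 in three well-separated groups
% (a deterministic stand-in for MFCC frames, not data from the article).
n = 100; d = 12;
t = (1:n)';
jitter = 0.1 * sin(t * (1:d));          % n-by-d small deterministic variation
X = [jitter; 5 + jitter; -5 + jitter];  % 300-by-12

K = 3;                                   % assumed codebook size
C = X(round(linspace(1, size(X, 1), K)), :);  % initial codewords from the data

for iter = 1:20
    % Squared Euclidean distance from every vector to every codeword.
    D = zeros(size(X, 1), K);
    for k = 1:K
        diffk = bsxfun(@minus, X, C(k, :));
        D(:, k) = sum(diffk .^ 2, 2);
    end
    [~, label] = min(D, [], 2);          % nearest codeword for each vector

    % Move each codeword to the mean of the vectors assigned to it.
    Cold = C;
    for k = 1:K
        if any(label == k)
            C(k, :) = mean(X(label == k, :), 1);
        end
    end
    if max(abs(C(:) - Cold(:))) < 1e-9
        break;                           % codebook has stopped changing
    end
end

distortion = mean(min(D, [], 2));        % average quantization error
```

After training, the 300 vectors are represented by only K codewords, and distortion measures how much information the compression loses.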

3 System Structure

The overall structure of the system is as follows:

— Training process

First, the speech signal is preprocessed; then the MFCC feature parameters are extracted and compressed by vector quantization to obtain the speaker’s pronunciation codebook. The same speaker repeats the same content several times, and the training process is repeated for each utterance to build a codebook library.

— Identification process

During recognition, the speech signal is preprocessed in the same way, its MFCC features are extracted, and the Euclidean distance between these features and each codebook in the training library is computed. When the distance is below a chosen threshold, the speaker and the spoken content are taken to match those of the corresponding training codebook, and the pairing succeeds.
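The matching step can be sketched as follows, assuming the score for each codebook is the average distance from every test frame to its nearest codeword. The codebooks, the test features, the labels speakerA and speakerB, and the threshold value are all hypothetical; a real threshold would have to be tuned on recorded data.

```matlab
% Hypothetical codebooks for two enrolled speakers (K-by-12 each).
d = 12; K = 4;
base = repmat((0:K-1)', 1, d);           % row k is filled with the value k-1
codebooks = {base, base + 10};
names = {'speakerA', 'speakerB'};        % hypothetical labels

% Synthetic test utterance: 50 frames lying close to the second codebook.
t = (1:50)';
test_feat = base(mod(t, K) + 1, :) + 10 + 0.05 * sin(t * (1:d));

threshold = 1.0;                         % assumed value for illustration

score = zeros(1, numel(codebooks));
for s = 1:numel(codebooks)
    C = codebooks{s};
    dmin = inf(size(test_feat, 1), 1);
    for k = 1:size(C, 1)
        diffk = bsxfun(@minus, test_feat, C(k, :));
        dmin = min(dmin, sqrt(sum(diffk .^ 2, 2)));  % nearest codeword so far
    end
    score(s) = mean(dmin);               % average distortion for this codebook
end

[best, who] = min(score);
if best < threshold
    result = names{who};                 % accepted as the closest speaker
else
    result = 'rejected';                 % no codebook is close enough
end
```

Taking the minimum score picks the closest enrolled speaker, and the threshold lets the system reject utterances that match none of them.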

II. Source code

function varargout = test4(varargin)
% TEST4 MATLAB code for test4.fig
%      TEST4, by itself, creates a new TEST4 or raises the existing
%      singleton*.
%
%      H = TEST4 returns the handle to a new TEST4 or the handle to
%      the existing singleton*.
%
%      TEST4('CALLBACK',hObject,eventData,handles,...) calls the local
%      function named CALLBACK in TEST4.M with the given input arguments.
%
%      TEST4('Property','Value',...) creates a new TEST4 or raises the
%      existing singleton*.  Starting from the left, property value pairs are
%      applied to the GUI before test4_OpeningFcn gets called.  An
%      unrecognized property name or invalid value makes property application
%      stop.  All inputs are passed to test4_OpeningFcn via varargin.
%
%      *See GUI Options on GUIDE's Tools menu.  Choose "GUI allows only one
%      instance to run (singleton)".
%
% See also: GUIDE, GUIDATA, GUIHANDLES
 
% Edit the above text to modify the response to help test4
 
% Last Modified by GUIDE v2.5 17-Mar-2019 09:58:00
 
% Begin initialization code - DO NOT EDIT
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @test4_OpeningFcn, ...
                   'gui_OutputFcn',  @test4_OutputFcn, ...
                   'gui_LayoutFcn',  [] , ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end
 
if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT
 
 
% --- Executes just before test4 is made visible.
function test4_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to test4 (see VARARGIN)
 
% Choose default command line output for test4
handles.output = hObject;
 
% Update handles structure
guidata(hObject, handles);
 
% UIWAIT makes test4 wait for user response (see UIRESUME)
% uiwait(handles.figure1);
 
 
% --- Outputs from this function are returned to the command line.
function varargout = test4_OutputFcn(hObject, eventdata, handles) 
% varargout  cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
 
% Get default command line output from handles structure
varargout{1} = handles.output;
 
 
% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global thk1 thk2 thk3 
global tlc1 tlc2 tlc3
global tlyy1 tlyy2 tlyy3 
global tqs1 tqs2 tqs3
global tyqc1 tyqc2 tyqc3
global startpos len
 
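% Keep frames startpos..startpos+len (400 frames) from every recording.
startpos=601;
len=399;
% File names are kept as-is; '训练样本' means "training sample".
[s,fs]=audioread('训练样本hk1.wav');
thk1= MFCC2par(s,fs);
thk1=thk1(startpos:startpos+len,1:12); % first 12 columns: cepstral coefficients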
 
[s,fs]=audioread('训练样本hk2.wav');
thk2= MFCC2par(s,fs);
thk2=thk2(startpos:startpos+len,1:12);
 
[s,fs]=audioread('训练样本hk3.wav');
thk3= MFCC2par(s,fs);
thk3=thk3(startpos:startpos+len,1:12);
 
[s,fs]=audioread('训练样本lc1.wav');
tlc1= MFCC2par(s,fs);
tlc1=tlc1(startpos:startpos+len,1:12);
 
[s,fs]=audioread('训练样本lc2.wav');
tlc2= MFCC2par(s,fs);
tlc2=tlc2(startpos:startpos+len,1:12);
 
[s,fs]=audioread('训练样本lc3.wav');
tlc3= MFCC2par(s,fs);
tlc3=tlc3(startpos:startpos+len,1:12);
 
[s,fs]=audioread('训练样本lyy1.wav');
tlyy1= MFCC2par(s,fs);
tlyy1=tlyy1(startpos:startpos+len,1:12);
 
[s,fs]=audioread('训练样本lyy2.wav');
tlyy2= MFCC2par(s,fs);
tlyy2=tlyy2(startpos:startpos+len,1:12);
 
[s,fs]=audioread('训练样本lyy3.wav');
tlyy3= MFCC2par(s,fs);
tlyy3=tlyy3(startpos:startpos+len,1:12);
 
[s,fs]=audioread('训练样本qs1.wav');
tqs1= MFCC2par(s,fs);
tqs1=tqs1(startpos:startpos+len,1:12);
 
[s,fs]=audioread('训练样本qs2.wav');
tqs2= MFCC2par(s,fs);
tqs2=tqs2(startpos:startpos+len,1:12);
 
[s,fs]=audioread('训练样本qs3.wav');
tqs3= MFCC2par(s,fs);
tqs3=tqs3(startpos:startpos+len,1:12);
 
[s,fs]=audioread('训练样本yqc1.wav');
tyqc1= MFCC2par(s,fs);
tyqc1=tyqc1(startpos:startpos+len,1:12);
 
[s,fs]=audioread('训练样本yqc2.wav');
tyqc2= MFCC2par(s,fs);
tyqc2=tyqc2(startpos:startpos+len,1:12);
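function getmfcc= MFCC2par( x,fs)
 %=========================================================
 % No denoising or endpoint detection
 % Input:  audio data x, sampling rate fs
 % Output: feature matrix of size (N,M), where N is the number of frames
 %         and M is the feature dimension
 % Features: M=24 (12 cepstral coefficients, 12 first-order differences)
 %=========================================================

%[x fs]=wavread(sound);
% Keep only the first channel of a stereo signal
[~,etmp]=size(x);
if (etmp==2)
    x=x(:,1);
end

% Normalized mel filter bank coefficients
% 24 mel filters, FFT length 256, sampling rate fs (melbankm is from VOICEBOX)
bank=melbankm(24,256,fs,0,0.5,'m');
bank=full(bank);
bank=bank/max(bank(:)); % size [24*129]

% DCT coefficients
for k=1:12
    n=0:23;
    dctcoef(k,:)=cos((2*n+1)*k*pi/(2*24));
end

% Normalized cepstral liftering window
w=1+6*sin(pi*[1:12]./12);
w=w/max(w);

% Pre-emphasis filter
xx=double(x);
xx=filter([1 -0.9375],1,xx); % pre-emphasis: y(n)=x(n)-0.9375*x(n-1)
xx=enframe(xx,256,80); % split into 256-sample frames with a frame shift of 80

% Compute the MFCC parameters of each frame
for i=1:size(xx,1)
    y=xx(i,:); % take one frame
    s=y'.*hamming(256);
    t=abs(fft(s)); % FFT magnitude spectrum
    t=t.^2; % energy spectrum
    % Mel-filter the FFT output, take the logarithm, then compute the cepstrum
    c1=dctcoef*log(bank*t(1:129)); % t(1:129): the 129 non-negative-frequency bins of the 256-point FFT
    c2=c1.*w'; % liftered cepstrum
    % MFCC parameters
    m(i,:)=c2';
end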
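The function header mentions a 12-dimensional first-order difference, but that step does not appear in the listing. The following is a minimal sketch of one common way to compute such differences, using a regression over a window of two frames on each side. The window length, the normalization by 10, and the synthetic matrix m are assumptions and may differ from the author's code.

```matlab
% m: N-by-12 matrix of cepstral coefficients (synthetic here, for illustration).
N = 10;
m = (1:N)' * ones(1, 12);     % every coefficient rises by 1 per frame

% First-order difference over a +/-2 frame window (one common choice).
dtm = zeros(size(m));
for i = 3:N-2
    dtm(i, :) = (-2*m(i-2, :) - m(i-1, :) + m(i+1, :) + 2*m(i+2, :)) / 10;
end

% Concatenate static and difference features, then drop the two frames
% at each end, where the difference cannot be computed.
feat = [m dtm];
feat = feat(3:N-2, :);
```

With this normalization a coefficient that rises by 1 per frame gives a difference of exactly 1, so the second half of each row in feat estimates the slope of the first half.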

III. Operation results