I. Introduction

1. PCA
Principal Component Analysis (PCA) is a common data analysis method. Through a linear transformation, PCA re-expresses the original data in a set of linearly independent dimensions. It can be used to extract the main feature components of data and is often used for dimensionality reduction of high-dimensional data.
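For orientation only, here is a minimal MATLAB sketch of what this looks like in practice, assuming the Statistics Toolbox function pca and a purely hypothetical data matrix X (the variable names and toy data are illustrative, not from the original article):

    % Minimal sketch of dimensionality reduction with PCA (hypothetical data).
    X = randn(100, 5);                 % 100 observations of 5 features, toy data for illustration
    [coeff, score, latent] = pca(X);   % principal directions, projected coordinates, variances
    k = 2;                             % number of components to keep
    Xreduced = score(:, 1:k);          % the 100 observations reduced to k dimensions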

In data mining and machine learning, data is represented by vectors. For example, the traffic and transactions of a Taobao store throughout 2012 can be regarded as a set of records, where each day's data is one record in the following format: (date, page views, number of visitors, number of orders, number of transactions, transaction amount). Note that the "date" field only identifies the record rather than measuring anything, and data mining is mostly concerned with measured values, so if we ignore the date field, each record can be expressed as a five-dimensional vector. It is customary to use column vectors to represent a record, and this convention will be followed later in this article.

The complexity of many machine learning algorithms is closely related to, or even grows exponentially with, the dimension of the data. Five dimensions may not matter here, but in actual machine learning it is not uncommon to deal with thousands or even hundreds of thousands of dimensions. In that case the resource consumption of machine learning is unacceptable, so the data must be reduced in dimension. Dimensionality reduction means a loss of information, but since the features of real data are often correlated with each other, this loss can be kept small.

For example, in the Taobao store data above, we know from experience that "page views" and "number of visitors" are strongly correlated, as are "number of orders" and "number of transactions". When the number of page views is high (or low) on a particular day, we can largely assume that the number of visitors is also high (or low) on that day. Therefore, if we delete either "page views" or "number of visitors", we do not lose much information, and the dimension of the data is reduced; this is the so-called dimensionality reduction operation. When dimensionality reduction is analyzed and discussed mathematically, the formal method is PCA, a dimensionality reduction technique with a rigorous mathematical basis that has been widely adopted.
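As a purely illustrative example (the numbers below are made up, not taken from any real store), one such record as a column vector, and a few records stacked into a matrix with one column per day, might look like this in MATLAB:

    % One hypothetical daily record as a column vector (made-up numbers for illustration):
    % [page views; number of visitors; number of orders; number of transactions; transaction amount]
    record = [500; 240; 25; 13; 2312.15];
    % A year of such records would form a 5-by-365 matrix, one column per day; here a 5-by-3 toy example:
    X = [record, [320; 150; 18; 10; 1507.40], [410; 190; 21; 11; 1830.00]];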

The inner product of two vectors of the same size is defined as

$$(a_1, a_2, \ldots, a_n)^{\mathsf T} \cdot (b_1, b_2, \ldots, b_n)^{\mathsf T} = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n .$$

In analytic geometry, a vector is often represented by the coordinates of the end point of a directed line segment. Say some vector has coordinates (3, 2): the 3 really means that the projection of the vector on the x axis is 3, and the 2 means that its projection on the y axis is 2. That is, we implicitly introduce two vectors of length 1 in the positive x and y directions. So the vector (3, 2) actually projects 3 onto the x axis and 2 onto the y axis. Note that a projection here is a signed value and can be negative. The vector (x, y) actually represents the linear combination

$$x\,(1, 0)^{\mathsf T} + y\,(0, 1)^{\mathsf T} .$$

From this representation it follows that every two-dimensional vector can be written as such a linear combination. Here (1, 0) and (0, 1) are called a basis of the two-dimensional space. The default basis (1, 0) and (0, 1) is chosen for convenience because its vectors are the unit vectors in the positive x and y directions, which makes point coordinates and vectors on the plane correspond one to one. In fact, however, any two linearly independent two-dimensional vectors can serve as a basis; intuitively, two vectors in the plane are linearly independent when they do not lie on one straight line. The basis does not have to be orthogonal either: the only requirement for a basis is linear independence, so a non-orthogonal basis is also valid. But because orthogonal bases have nice properties, the bases used in practice are generally orthogonal. The basis transformation in the above example can be represented by matrix multiplication, as shown below.
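For instance, changing the vector (3, 2) to a new orthonormal basis made of (1/√2, 1/√2) and (−1/√2, 1/√2) can be written as a matrix multiplication in which the new basis vectors form the rows of the matrix (the specific basis here is chosen for illustration):

$$
\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
\begin{pmatrix} 3 \\ 2 \end{pmatrix}
=
\begin{pmatrix} 5/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix} .
$$

The resulting column holds the coordinates of (3, 2) in the new basis.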

More generally, if we have M vectors of dimension N and want to transform them into a new space spanned by R basis vectors of dimension N, we first form the R basis vectors into the rows of a matrix A and form the original vectors into the columns of a matrix B; the product AB is then the transformed result, where the m-th column of AB is the transformed result of the m-th column of B. As matrix multiplication:

$$
\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_R \end{pmatrix}
\begin{pmatrix} a_1 & a_2 & \cdots & a_M \end{pmatrix}
=
\begin{pmatrix}
p_1 a_1 & p_1 a_2 & \cdots & p_1 a_M \\
p_2 a_1 & p_2 a_2 & \cdots & p_2 a_M \\
\vdots  & \vdots  & \ddots & \vdots  \\
p_R a_1 & p_R a_2 & \cdots & p_R a_M
\end{pmatrix},
$$

where each $p_i$ is a row vector representing the i-th basis vector and each $a_j$ is a column vector representing the j-th original record.

1.3 Covariance Matrix and Optimization Objective
When dimensionality reduction is carried out, the key problem is how to determine whether the selected basis is optimal, that is, how to select the basis that best preserves the characteristics of the original data. Suppose we have five two-dimensional records, arranged as the columns of a matrix. We take the average of each row and subtract that average from every entry of the row, so that each row has zero mean. If the centered records are plotted as points in the plane, the question becomes: how do we represent these records with one-dimensional values while still retaining as much of the original information as possible? This amounts to choosing a direction in the two-dimensional plane, projecting all data points onto the line in that direction, and representing each original record by the value of its projection; in other words, it is the problem of reducing two dimensions to one. So how should this direction (or basis) be chosen so that as much original information as possible is retained? An intuitive criterion is that the projected values should be as spread out as possible. A sketch of this centering-and-projection step follows.
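A minimal MATLAB sketch of the centering-and-projection step, using a small made-up 2-by-5 data matrix (one record per column) and a single candidate direction:

    % Toy 2-by-5 data matrix: five two-dimensional records, one per column (made-up values).
    X  = [1 1 2 4 2;
          1 3 3 4 4];
    Xc = bsxfun(@minus, X, mean(X, 2));   % subtract each row's mean, so every row has zero mean
    u  = [1; 1] / sqrt(2);                % a candidate unit direction (basis) to project onto
    y  = u' * Xc;                         % 1-by-5 projections of the records onto that direction
    v  = mean(y .^ 2)                     % variance of the projected values (data already centered)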

1.3.1 Variance
The problem above is the hope that the projected values are as spread out as possible along the chosen direction, and the degree of dispersion can be expressed mathematically by the variance. For a field a that has already been centered to zero mean,

$$\operatorname{Var}(a) = \frac{1}{m}\sum_{i=1}^{m} a_i^2 .$$

Thus the problem is formalized as: find a one-dimensional basis such that, when all data are transformed into coordinates on this basis, the variance of the resulting values is maximized.
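Written out with symbols introduced here for illustration (u for the unit basis vector, $x_i$ for the i-th centered record, m for the number of records), the problem reads:

$$\max_{\lVert u \rVert = 1}\; \frac{1}{m}\sum_{i=1}^{m}\left(u^{\mathsf T} x_i\right)^{2} .$$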

1.3.2 Covariance
When reducing to more than one dimension, we also want the different projected fields to be uncorrelated with one another, so that they do not carry duplicated information. Mathematically, this correlation can be expressed by the covariance of two features; for two fields a and b that have been centered to zero mean,

$$\operatorname{Cov}(a, b) = \frac{1}{m}\sum_{i=1}^{m} a_i b_i .$$

When the covariance is 0, the two fields are uncorrelated. For the covariance to be zero, the second basis vector can only be chosen in directions orthogonal to the first; therefore the directions that are finally chosen must be orthogonal to each other.

At this point we can state the optimization goal of the dimensionality-reduction problem: to reduce a set of N-dimensional vectors to K dimensions (0 < K < N), choose K unit-length (norm-1) orthogonal basis vectors such that, after the original data is transformed onto this basis, the covariance between every pair of fields is 0 and the variance of each field is as large as possible (under the orthogonality constraint, take the directions with the K largest variances).
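Restated in symbols (the notation is introduced here, since the original figures are not reproduced): let X be the centered data matrix with one record per column (m records), and let P be the K-by-N matrix whose rows are the chosen unit basis vectors. For the transformed data Y = PX, the requirement is that the covariance matrix of Y be diagonal with the largest attainable diagonal entries:

$$\frac{1}{m} Y Y^{\mathsf T} \;=\; P\!\left(\frac{1}{m} X X^{\mathsf T}\right)\! P^{\mathsf T} \;=\; \operatorname{diag}(\lambda_1, \ldots, \lambda_K), \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_K .$$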

1.3.3 Covariance Matrix
Assume there are only two fields, a and b, and form them into a matrix X by rows, where X is the centered matrix, that is, the matrix obtained after subtracting each field's average from that field. Then (1/m) X Xᵀ is exactly the covariance matrix of the two fields: its diagonal entries are the variances of a and b, and its off-diagonal entries are their covariance.

1.3.4 Diagonalization of the covariance matrix
According to the optimization goal above, we need to find a matrix P, whose rows are unit orthogonal basis vectors, such that the covariance matrix of the transformed data, P C Pᵀ (where C = (1/m) X Xᵀ is the covariance matrix of the original centered data), is diagonal with its diagonal entries as large as possible. Since C is a real symmetric matrix, such a P always exists, and its rows are the unit eigenvectors of C sorted by eigenvalue in descending order.

1.4 Algorithm and Examples
1.4.1 PCA algorithm
(A step-by-step MATLAB sketch of the algorithm is given after the discussion below.)
1.4.2 Example

1.5 Discussion
Based on the above explanation of the mathematical principles of PCA, we can discuss some of its capabilities and limitations. In essence, PCA takes the directions with the largest variance as the main features and "de-correlates" the data in the orthogonal directions, that is, it makes the different orthogonal directions uncorrelated with each other.
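Returning to the algorithm of Section 1.4.1, the following is a minimal MATLAB sketch of the standard eigendecomposition-based procedure (the toy data and variable names are illustrative; this restates the textbook algorithm rather than reproducing the article's own example):

    % PCA by eigendecomposition of the covariance matrix (minimal sketch).
    X  = [1 1 2 4 2;                       % toy data: N = 2 fields, m = 5 records, one record per column
          1 3 3 4 4];
    m  = size(X, 2);                       % number of records
    Xc = bsxfun(@minus, X, mean(X, 2));    % 1) center: subtract each row's (field's) mean
    C  = (1 / m) * (Xc * Xc');             % 2) N-by-N covariance matrix of the centered data
    [V, D] = eig(C);                       % 3) eigenvectors (columns of V) and eigenvalues (diagonal of D)
    [~, order] = sort(diag(D), 'descend'); % 4) sort directions by decreasing variance
    K  = 1;                                % number of dimensions to keep
    P  = V(:, order(1:K))';                % 5) K-by-N basis matrix, rows are the top eigenvectors
    Y  = P * Xc;                           % 6) K-by-m data reduced to K dimensions

The built-in functions pca and pcacov used in the source code below perform essentially this computation (pcacov works directly on a covariance matrix).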

Therefore, PCA also has some limitations. For example, it removes linear correlation well, but it has no way to handle higher-order (nonlinear) correlation. For data with such higher-order correlation, Kernel PCA can be considered, which converts nonlinear correlation into linear correlation through a kernel function. In addition, PCA assumes that the main features of the data are distributed along orthogonal directions; if there are several directions with large variance that are not orthogonal to each other, the effect of PCA will be greatly reduced.

Finally, it should be noted that PCA is a parameter-free technique. In other words, given the same data (leaving data cleaning aside), the result is always the same: there is no subjective parameter to set. This makes PCA convenient as a general-purpose method, but it also means it cannot be tuned or personalized with prior knowledge.

II. Source code

% Pen recognition with PCA (eigen-images)
global im;
imgdata = [];                                % training image matrix
for i = 1:2
    for j = 1:4
        a = imread(strcat('ORL\pen', num2str(i), '_', num2str(j), '.bmp'));
        b = a(1:176*132);                    % b is a 1*N row vector, N = 176*132 = 23232
        b = double(b);
        imgdata = [imgdata; b];              % imgdata is an M*N matrix, one image per row (here M = 8)
    end;
end;
imgdata = imgdata';                          % now N*M, one image per column
imgmean = mean(imgdata, 2);                  % average image, an N-dimensional column vector
for i = 1:8
    minus(:, i) = imgdata(:, i) - imgmean;   % minus is an N*M matrix: each training image minus the average image
end;
covx = minus' * minus;                       % M*M order covariance matrix
[COEFF, latent, explained] = pcacov(covx');  % PCA, computed from the transposed covariance matrix to reduce computation
% select the eigenvalues accounting for 95% of the energy
i = 1;
proportion = 0;
while (proportion < 95)
    proportion = proportion + explained(i);
    i = i + 1;
end;
p = i - 1;
% obtain the characteristic-pen coordinate system from training
i = 1;
while (i <= p && latent(i) > 0)
    base(:, i) = latent(i)^(-1/2) * minus * COEFF(:, i); % base is the N*p projection matrix; dividing by latent(i)^(1/2) normalizes the eigen-image
    i = i + 1;
end;
% project the training samples onto the coordinate system to obtain a p*M reference matrix
reference = base' * minus;
% test procedure -- read the image to be recognized from the test folder
im = imread('to be tested pen.bmp');         % adjust the file name to your test image
a = im;
b = a(1:176*132);                            % reshape to the same 1*N layout as the training images
% b = a(1:38400);
b = double(b);
b = b';
object = base' * (b - imgmean);
% plot the test image
subplot(2, 3, 1);
imshow(a);
title(['Test pen']);
% minimum-distance method: find the training image closest to the image to be recognized
distance = 100000;
for k = 1:8
    temp = norm(object - reference(:, k));
    if (distance > temp), which = k; distance = temp; end;
end;
% locate the nearest image: num1 is the class index, num2 is the image index within the class
num1 = ceil(which / 4);                      % 4 training images per class
num2 = mod(which, 4);
if (num2 == 0)
    num2 = 4;
end;


III. Operation results

IV. Remarks

MATLAB version: 2014a