
Preface

PCA (principal component analysis) is a common technique for dimensionality reduction and noise reduction. It maximizes the variance of the data by changing the coordinate axes (in practice, by mapping the data onto new axes), which makes the data easier to separate.

Objective and principle analysis

The goal is to obtain the principal-component eigenvectors. The idea of PCA is to map the original data onto an axis with a larger variance (some books and videos describe this as rotating the coordinate axes), which is done by multiplying the data by a unit vector in the direction of the mapping.

Preprocessing

To simplify the computation, the data should be preprocessed first. The preprocessing here is mean-centering, that is, subtracting the mean from every sample:

$$x_i = x_i - \bar{x}$$
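As a minimal sketch of this step (assuming the samples are the rows of a NumPy array `X`; the function name `demean` is just illustrative):

```python
import numpy as np

def demean(X):
    """Mean-center the data: subtract each feature's mean from every sample."""
    return X - np.mean(X, axis=0)
```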

Formula derivation and interpretation

The objective

We want to find the direction in which the mapped data has the largest variance, that is,

$$\max \frac{1}{n} \sum_{i=1}^{n} \left( x^i_p - \bar{x} \right)^2$$

where $x^i_p$ is the projection of sample $x^i$ onto the new axis. Because we have already mean-centered the data, this simplifies to

$$\max \frac{1}{n} \sum_{i=1}^{n} \left( x^i_p \right)^2$$

Derivation of mapping relation

Suppose $x^i$ is mapped onto a new coordinate axis in the direction of a vector $w$:

$$x^i \cdot w = \lVert x^i \rVert \cdot \lVert w \rVert \cos\theta$$

If we require $w$ to be a unit vector, this simplifies to

$$x^i \cdot w = \lVert x^i \rVert \cos\theta = x^i_p$$

so the dot product with $w$ is exactly the projection $x^i_p$.
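To make this identity concrete, here is a small illustrative check (the sample vector and direction are made up for the example):

```python
import numpy as np

x = np.array([3.0, 4.0])      # an illustrative sample x^i, with |x| = 5
w = np.array([0.6, 0.8])      # a unit vector, |w| = 1

# x . w = |x| * |w| * cos(theta); since |w| = 1 this equals |x| * cos(theta),
# i.e. the length x_p of x's projection onto the axis defined by w.
x_p = x.dot(w)
print(x_p)                                    # 5.0: x lies along w here
print(np.linalg.norm(x) * np.linalg.norm(w))  # also 5.0, since cos(theta) = 1
```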

Substituting the mapping into the objective

$$\max \frac{1}{n} \sum_{i=1}^{n} \left( x^i \cdot w \right)^2$$

Gradient ascent

Building the objective function

Now let's build the objective function:

$$f(w) = \frac{1}{n} \sum_{i=1}^{n} \left( x^i \cdot w \right)^2 = \frac{1}{n} \sum_{i=1}^{n} \left( x^i_1 w_1 + x^i_2 w_2 + x^i_3 w_3 + \cdots + x^i_d w_d \right)^2$$

where $d$ is the number of features and $w = (w_1, \dots, w_d)$.
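As a sketch, assuming `X` is the mean-centered sample matrix with one sample per row, the objective could be written as:

```python
import numpy as np

def f(w, X):
    """Objective: (1/n) * sum over samples of (x^i . w)^2."""
    return np.sum(X.dot(w) ** 2) / len(X)
```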

Derivative

Now let us take the gradient of $f(w)$:

$$\nabla f(w) = \frac{2}{n} X^T (Xw)$$

where $X$ is the matrix whose rows are the samples $x^i$. Our problem now becomes gradient ascent with this formula: we optimize $w$ by gradient ascent, and the optimized $w$ is the principal component, i.e. the direction that maps the data onto the axis of largest variance.
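Putting it together, here is a minimal gradient-ascent sketch (the learning rate `eta`, iteration cap, and stopping tolerance are illustrative choices, and `f` is the objective from the previous sketch; `w` is re-normalized after each step because the derivation assumes $\lVert w \rVert = 1$):

```python
import numpy as np

def df(w, X):
    """Gradient of the objective: (2/n) * X^T (X w)."""
    return X.T.dot(X.dot(w)) * 2.0 / len(X)

def direction(w):
    """Rescale w to a unit vector, as the derivation assumes |w| = 1."""
    return w / np.linalg.norm(w)

def first_component(X, eta=0.01, n_iters=10000, epsilon=1e-8):
    """Find the first principal component of the mean-centered matrix X."""
    w = direction(np.random.random(X.shape[1]))  # random nonzero start
    for _ in range(n_iters):
        last_w = w
        w = direction(w + eta * df(w, X))        # ascend, then re-normalize
        if abs(f(w, X) - f(last_w, X)) < epsilon:
            break                                # variance gain has converged
    return w
```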