
Principle analysis

In our previous blogs on dimensionality reduction, we discussed the principles and derivations of covariance and singular value decomposition, so we will not repeat them here. For a refresher, see Easy-to-Understand Machine Learning – The Curse of Dimensionality (expressing the mathematical concepts and practice of machine learning dimensionality reduction in a simple way) and Easy-to-Understand Machine Learning – Mathematical Derivation and Explanation of Gradient Ascent Principal Component Analysis.

Data selection

For the data, we use the iris data set and the Swiss roll data set to implement the code and demonstrate the effect.

Why these two data sets? Iris is one of the data sets most commonly used by machine learning beginners, so most readers are familiar with it, and its data is four-dimensional, which satisfies the need for dimensionality reduction. However, since we cannot directly visualize the distribution of four-dimensional data in a plot, we also introduce the Swiss roll data set.

Iris data observation

If the iris data cannot be visualized directly, why plot it at all? First, it is good practice to take a look at a data set before working with it; second, we can take a subset of the features and look at how the data is distributed.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning)

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False  # display the minus sign normally
iris = datasets.load_iris()  # load the iris data
X = iris.data[:, 2:]  # iris has sepal length/width and petal length/width; keep only the petal features
y = iris.target  # iris class labels

plt.scatter(X[y == 0, 0], X[y == 0, 1])
plt.scatter(X[y == 1, 0], X[y == 1, 1])
plt.scatter(X[y == 2, 0], X[y == 2, 1])
plt.xlabel('Petal length')
plt.ylabel('Petal width')
plt.grid()
plt.show()

Swiss roll data observation

from sklearn.datasets import make_swiss_roll
import numpy as np
import matplotlib.pyplot as plt

X, t = make_swiss_roll(n_samples=2000, noise=0.1)  # 3-D Swiss roll; t marks position along the roll
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # Axes3D(fig) no longer attaches to the figure on recent matplotlib
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=t, cmap=plt.cm.Spectral, edgecolors='black')
plt.show()

From the image we can see that the Swiss roll cannot be reduced in dimension simply by discarding one coordinate axis; instead, the data should first be rotated onto new axes, and only then should an axis be discarded.
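
To make "rotate first, then discard an axis" concrete, here is a minimal sketch with made-up 2-D toy data (the line slope, sample count, and noise level are arbitrary choices for illustration): for points hugging a slanted line, discarding a raw coordinate throws away a lot of variance, while discarding the second coordinate after rotating onto the line's direction throws away almost none.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, 200)
data = np.c_[x, 0.5 * x + rng.normal(0, 0.1, 200)]  # points hugging the line y = 0.5x

theta = np.arctan(0.5)  # angle of that line (known here because we built the data)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated = data.dot(R)   # first axis now runs along the line, second across it

# Dropping the raw y axis loses a lot of variance;
# dropping the rotated second axis loses almost none
print(data[:, 1].var(), rotated[:, 1].var())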

Code implementation

Singular value decomposition method

# Eigendecomposition of X^T X: its eigenvectors are the right singular
# vectors of X, and its eigenvalues are the squared singular values
x1 = X.T.dot(X)
eig, featureVector = np.linalg.eig(x1)
idx = np.argsort(eig)[::-1]          # np.linalg.eig does not sort, so order by eigenvalue, descending
featureVector = featureVector[:, idx]
X1new = X.dot(featureVector[:, :2])  # project onto the two leading directions
plt.scatter(X1new[:, 0], X1new[:, 1], c=t)
plt.show()
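
As a quick sanity check (a sketch added here, not part of the original code): np.linalg.svd returns the same directions directly. The rows of Vt are the right singular vectors of X, i.e. the eigenvectors of X.T.dot(X), so this projection should match the one above up to a sign flip per axis.

U, S, Vt = np.linalg.svd(X, full_matrices=False)
X1svd = X.dot(Vt[:2].T)  # should reproduce X1new up to per-axis sign flips
plt.scatter(X1svd[:, 0], X1svd[:, 1], c=t)
plt.show()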

Effect of the iris data set after dimensionality reduction

Effect of the Swiss roll data set after dimensionality reduction

Covariance method

import matplotlib.pyplot as plt
import numpy as np

X1cov = np.cov(X.T)                  # covariance matrix of the features
eig, featureVector = np.linalg.eig(X1cov)
idx = np.argsort(eig)[::-1]          # order eigenvectors by eigenvalue, descending
featureVector = featureVector[:, idx]
X1new = X.dot(featureVector[:, :2])  # project onto the two leading eigenvectors
plt.scatter(X1new[:, 0], X1new[:, 1], c=t)
plt.show()
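
For reference, here is a short sketch of what np.cov(X.T) computes: the unbiased sample covariance of the features, i.e. each column is centered and the outer products are averaged over n − 1 samples.

Xc = X - X.mean(axis=0)                      # center each feature
manual_cov = Xc.T.dot(Xc) / (len(X) - 1)     # unbiased sample covariance
print(np.allclose(manual_cov, np.cov(X.T)))  # True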

Effect of the iris data set after dimensionality reduction

Effect of the Swiss roll data set after dimensionality reduction

PCA method in scikit-learn

from sklearn.decomposition import PCA

pca = PCA(n_components=2)  # keep two principal components
pca.fit(X)                 # learn the components from the data
X1new = pca.transform(X)   # project the data onto them
plt.scatter(X1new[:, 0], X1new[:, 1], c=t)
plt.show()
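
A small usage note on scikit-learn's PCA: explained_variance_ratio_ reports the share of variance each retained component explains, and n_components also accepts a fraction, in which case PCA keeps however many components are needed to reach that share.

print(pca.explained_variance_ratio_)  # share of variance per retained component

pca95 = PCA(n_components=0.95)        # keep enough components for 95% of the variance
X95 = pca95.fit_transform(X)
print(pca95.n_components_)            # number of components actually kept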

Effect of the iris data set after dimensionality reduction

Effect of the Swiss roll data set after dimensionality reduction