
When it comes to dimensionality reduction, the first things that come to mind are PCA, encoders/decoders, and word vectors. Today, however, rather than discussing specific dimensionality-reduction methods, we will step back and look at dimensionality reduction from a higher vantage point.

When we work on classification problems we usually rely on the notion of distance, so why reduce the dimension at all? Our data are usually high-dimensional, and in high-dimensional space the data become sparse.

Even increasing the feature space by a single dimension causes the amount of data needed to maintain the same sample density to grow exponentially, on the order of $2^n$.
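As a toy illustration of this exponential growth (a minimal sketch, not tied to any particular dataset): if each of $n$ feature axes is split into just two bins, you already need on the order of $2^n$ samples just to place one point in every cell.

```python
# Toy illustration: splitting each of n feature axes into 2 bins gives
# 2**n cells, so the number of samples needed to keep the same density
# per cell grows exponentially with the dimension n.
for n in (1, 2, 5, 10, 20, 30):
    print(f"n={n:2d}  cells={2 ** n}")
```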

There are many dimensionality-reduction methods in machine learning, but we can roughly divide them into two broad categories:

Matrix Factorization

  • Principal Component Analysis
  • Sparse PCA
  • Linear Autoencoder
  • Latent Dirichlet Allocation
  • Non-negative Matrix Factorization
  • Generalised Low Rank Models
  • Word2Vec
  • GloVe
  • Probabilistic PCA

Matrix factorization refers to a broad class of techniques, from topic modeling to Word2Vec to plain old PCA and various other probabilistic methods. It covers a large number of algorithms, all built on the very simple framework of factorizing a single matrix.

The goal of matrix factorization is to approximate a matrix as the product of two smaller matrices. In terms of dimensionality reduction, the blue matrix is our source data: each row is a sample and each column is a feature. We want to express it as a representation matrix multiplied by a set of archetypes (prototypes).

In the figure, the rows of the blue data matrix are samples and the columns are features; this matrix is decomposed into representations multiplied by archetypes.

You can think of the representations as low-dimensional encodings of the samples, and the archetypes as the basic building blocks used to reconstruct the raw data.

Each row of the data matrix is a single sample, and each sample has a corresponding row in the representation matrix, which is multiplied by the entire archetype matrix. A representation can be mapped back to the original high-dimensional space by multiplying it by the archetypes, which (approximately) recovers the sample.

In other words, each sample is decomposed into a linear combination of archetypes, and the coefficients of that combination are its low-dimensional representation. Expressing the data as combinations of these archetypes is what matrix factorization does.
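As a minimal sketch of this decomposition (assuming NumPy and an illustrative random data matrix; the rank k is an arbitrary choice), a truncated SVD produces exactly such a representation-times-archetypes factorization:

```python
import numpy as np

# Approximate the data matrix X (samples x features) as the product of a
# representation matrix (samples x k) and an archetype matrix (k x features).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))   # 100 samples, 50 features (illustrative data)

k = 5  # target low dimensionality

# Truncated SVD: X ~ U_k S_k V_k^T is the best rank-k approximation.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
representation = U[:, :k] * S[:k]   # (100, k): one low-dimensional code per sample
archetypes = Vt[:k, :]              # (k, 50): prototype rows in feature space

X_reconstructed = representation @ archetypes
print("reconstruction error:", np.linalg.norm(X - X_reconstructed))
```

Truncated SVD gives the best rank-k approximation in the least-squares sense; the other methods in the list above differ mainly in the constraints and noise models they place on the two factors.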

Neighbour Graphs

  • Laplacian Eigenmaps
  • Spectral Embedding
  • Hessian Eigenmaps
  • Local Tangent Space Alignment
  • JSE
  • Isomap
  • t-SNE
  • Locally Linear Embedding
  • LargeVis

These methods build a graph from the data and then embed that graph in a low-dimensional space; all the details are in how the graph is built and how it is laid out.

The focus is on core intuition and core ideas, not the nuts and bolts of how these algorithms work.
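Still, for concreteness, here is a minimal sketch of the neighbour-graph recipe, assuming scikit-learn is available: SpectralEmbedding builds a k-nearest-neighbour graph over the data and lays it out with a Laplacian-eigenmap-style spectral embedding.

```python
# Build a k-nearest-neighbour graph over the data, then embed that graph
# in 2 dimensions with a spectral (Laplacian eigenmap) layout.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)  # 3-D "swiss roll" data

embedding = SpectralEmbedding(n_components=2, n_neighbors=10)
X_2d = embedding.fit_transform(X)
print(X_2d.shape)  # (1000, 2)
```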

The geometric angle

The figure shows that the area of a circle inscribed in a square with side length 1 is $\pi \cdot 0.5^2$. Likewise, the volume of a sphere inscribed in a unit cube is $\frac{4}{3}\pi \cdot 0.5^3$.

In a $D$-dimensional space, the volume of the hypersphere inscribed in a hypercube with side length 1 is $k \pi \cdot 0.5^D$, where $k$ is a constant that depends on the dimension $D$. When $D$ is large enough, the volume of this inscribed hypersphere approaches 0.
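A quick numerical check of this claim, as a minimal sketch using the standard formula for the volume of a $D$-dimensional ball of radius $r$, $V_D(r) = \frac{\pi^{D/2}}{\Gamma(D/2 + 1)} r^D$:

```python
import math

def inscribed_ball_volume(D, r=0.5):
    # Volume of the D-ball of radius r (r=0.5 fits inside the unit hypercube).
    return math.pi ** (D / 2) / math.gamma(D / 2 + 1) * r ** D

for D in (2, 3, 5, 10, 20, 50):
    print(f"D={D:2d}  volume={inscribed_ball_volume(D):.3e}")
# D=2 -> ~7.854e-01, D=3 -> ~5.236e-01, D=10 -> ~2.490e-03, D=50 -> ~1.5e-28
```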

In high-dimensional space, the intuition we build in low-dimensional space sometimes breaks down. Samples in a high-dimensional sphere are concentrated near its surface, so the data end up not only sparse but also unevenly distributed.