Selected from arXiv

By Petros Drineas and Michael W. Mahoney

Compiled by Heart of the Machine

Contributors: Li Zannan, Liu Xiaokun, Jiang Siyuan

Matrix computations play an important role in computer science, and they are mathematical knowledge that every developer needs to master. Recently, Petros Drineas of Purdue University and Michael Mahoney of UC Berkeley released an overview paper, Lectures on Randomized Numerical Linear Algebra, which can serve as a reference on randomized linear algebra. This article briefly introduces some of its contents (mainly Chapter 2: Linear Algebra).

Paper link: https://arxiv.org/pdf/1712.08880.pdf

Introduction

Matrices occupy a unique place in computer science, statistics, and applied mathematics. An m × n matrix can describe information about m objects (each described by n features), or the behavior of a discretized differential operator on a finite element mesh; an n × n positive-definite matrix can encode the correlations between all pairs of n objects, or the edge connectivity between all pairs of n nodes in a network, and so on. Influenced by developments in science and computer technology, we have witnessed exciting progress in the theory and practice of matrix algorithms in recent years. One of the most notable is the use of randomization (typically assuming that the input data are noisy due to their generation mechanism) as an algorithmic or computational resource for developing improved algorithms for fundamental matrix problems such as matrix multiplication, least-squares (LS) approximation, and low-rank matrix approximation.

Randomized Numerical Linear Algebra (RandNLA) is an interdisciplinary research area that uses randomization as a computational resource to develop improved algorithms for large-scale linear algebra problems. From a foundational perspective, RandNLA grew out of theoretical computer science (TCS) and has deep ties to mathematics (convex analysis, probability theory, metric embedding theory), as well as to applied mathematics (scientific computing, signal processing, numerical linear algebra). At the application level, RandNLA is an important new tool for machine learning, statistics, and data analysis. Many well-designed implementations have surpassed highly optimized software libraries on a wide range of problems, such as least-squares regression, while also offering considerable scalability, parallelism, and distributed-computation capabilities. In addition, RandNLA provides a good algorithmic and statistical foundation for modern large-scale data analysis.

This chapter serves as a self-contained introduction to three fundamental RandNLA algorithms: randomized matrix multiplication, randomized least-squares solvers, and a randomized algorithm for computing a low-rank approximation of a matrix. As such, this chapter has strong connections to many areas of applied mathematics, and in particular to many other chapters of the volume. Most notably, these include the work of G. Martinsson, who uses these methods to develop improved low-rank matrix approximation solvers [2]; the work of R. Vershynin, who develops probabilistic tools for analyzing RandNLA algorithms [3]; the work of J. Duchi, who uses stochastic methods in a complementary way to solve more general optimization problems [4]; and the work of M. Maggioni, who uses these methods as building blocks for more complex multiscale methods [5].

Section 2 of this paper reviews the basics of linear algebra; Section 3 reviews the basics of discrete probability; Section 4 introduces randomized algorithms for matrix multiplication; Section 5 introduces randomized algorithms for the least-squares regression problem; and Section 6 introduces randomized algorithms for low-rank approximation. Finally, we point interested readers to two other introductory resources on RandNLA [6, 7].

2 Linear Algebra

In this section, we give a brief overview of basic linear algebra properties and of the mathematical notation we will use in this chapter. We assume the reader has basic familiarity with linear algebra (e.g., inner and outer products of vectors, basic matrix operations such as addition, scalar multiplication, transposition, upper/lower triangular matrices, matrix-vector products, matrix multiplication, the trace of a matrix, etc.).

2.1 Basics

We focus entirely on matrices and vectors in real linear spaces. We use the notation x ∈ R^n to denote an n-dimensional vector; note that vectors are written in bold lowercase letters. All vectors are assumed to be column vectors unless otherwise specified. The all-zeros vector is denoted by 0 and the all-ones vector by 1; their dimensions are implied by context or indicated explicitly by a subscript.

We use bold capital letters to denote matrices, e.g., A ∈ R^m×n denotes an m × n matrix; A_i* denotes the i-th row of A as a row vector, and A_*i denotes the i-th column of A as a column vector. The identity matrix is denoted by I_n, where n is the number of rows and columns. Finally, we use e_i to denote the i-th column of I_n, i.e., the i-th canonical basis vector.

Inverse matrix: A matrix A ∈ R^n×n is called nonsingular or invertible if there exists a matrix A^−1 ∈ R^n×n satisfying: A A^−1 = A^−1 A = I_n.

A is invertible if and only if all of its columns (or rows) are linearly independent; in other words, there is no non-zero vector x ∈ R^n such that Ax = 0. Standard properties of invertible matrices are: (A^−1)^⊤ = (A^⊤)^−1 = A^−⊤ (the transpose of the inverse equals the inverse of the transpose) and (AB)^−1 = B^−1 A^−1 (the inverse of a product equals the product of the inverses in reverse order).
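As a quick numerical illustration (added here, not from the paper), the following NumPy sketch checks these two identities on random matrices; a random square Gaussian matrix is invertible with probability 1, and np.allclose compares up to floating-point tolerance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))   # random square matrix: invertible with probability 1
B = rng.standard_normal((n, n))

A_inv = np.linalg.inv(A)

# (A^-1)^T == (A^T)^-1
print(np.allclose(A_inv.T, np.linalg.inv(A.T)))                      # True

# (AB)^-1 == B^-1 A^-1
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ A_inv))   # True
```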

Orthogonal matrix: A matrix A ∈ R^n×n is called orthogonal if A^⊤ = A^−1. Equivalently, for all i and j in {1, ..., n}, the columns of an orthogonal matrix satisfy: A_*i^⊤ A_*j = 1 if i = j, and A_*i^⊤ A_*j = 0 if i ≠ j.

The same holds for the rows of A. In other words, all columns (and all rows) of A are pairwise orthonormal vectors.

QR decomposition: Any matrix A ∈ R^n×n can be decomposed into the product of an orthogonal matrix and an upper triangular matrix: A=QR

where Q ∈ R^n×n is an orthogonal matrix and R ∈ R^n×n is an upper triangular matrix. The QR decomposition is useful for solving systems of linear equations; its computational complexity is O(n^3), and it is numerically stable. To solve the linear system Ax = b using the QR decomposition, we first multiply both sides by Q^⊤, i.e., Q^⊤QRx = Rx = Q^⊤b, and then solve Rx = Q^⊤b using back substitution.
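As an illustrative sketch (not from the paper), the following Python code solves a linear system by exactly this route, using NumPy's QR factorization and SciPy's triangular solver for the back-substitution step.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

Q, R = np.linalg.qr(A)        # A = QR with Q orthogonal and R upper triangular
y = Q.T @ b                   # multiply both sides by Q^T, leaving Rx = Q^T b
x = solve_triangular(R, y)    # back substitution on the upper triangular system

print(np.allclose(A @ x, b))  # True: x solves Ax = b
```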

2.2 Norms

Norms are used to measure the size of a matrix or, correspondingly, the length of a vector. A norm is a function that maps R^m×n (or R^n) to R. Formally:

Definition 1. Any function ||·||: R^m×n → R satisfying the following properties is called a norm:

  • Non-negativity: ||A|| ≥ 0; and ||A|| = 0 if and only if A = 0;

  • Triangle inequality: ||A + B|| ≤ ||A|| + ||B||;

  • Scalar multiplication: ||αA|| = |α| ||A||, for all α ∈ R.

The following two properties can easily be proved:

  • ||A|| = ||−A||;

  • | ||A|| − ||B|| | ≤ ||A − B||.

The second property is known as the reverse triangle inequality.

2.3 Vector norms

Given an n-dimensional vector x and an integer p ≥ 1, we define the vector p-norm as: ||x||_p = ( Σ_{i=1}^{n} |x_i|^p )^{1/p}.

The most common vector p-norms are:

  • The 1-norm: ||x||_1 = Σ_{i=1}^{n} |x_i|;

  • The Euclidean (2-) norm: ||x||_2 = ( Σ_{i=1}^{n} x_i^2 )^{1/2} = (x^⊤x)^{1/2};

  • The infinity (max) norm: ||x||_∞ = max_{1≤i≤n} |x_i|.

Given n-dimensional vectors x and y, we can bound their inner product using p-norms. The Cauchy-Schwarz inequality states that: |x^⊤y| ≤ ||x||_2 ||y||_2.

In words, it bounds the inner product of two vectors by the product of their Euclidean norms. Hölder's inequality states that: |x^⊤y| ≤ ||x||_1 ||y||_∞.

Several inequalities relating the vector p-norms to one another can be proved easily; a few standard ones are checked numerically in the sketch below.
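The following NumPy sketch (an added illustration, not from the paper) verifies the Cauchy-Schwarz and Hölder inequalities on random vectors, together with the standard relations among the 1-, 2-, and infinity norms, which I assume are the kind of inequalities referred to above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
x = rng.standard_normal(n)
y = rng.standard_normal(n)

norm1, norm2 = np.linalg.norm(x, 1), np.linalg.norm(x, 2)
norm_inf = np.linalg.norm(x, np.inf)

# Cauchy-Schwarz: |x^T y| <= ||x||_2 ||y||_2
print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))              # True

# Hoelder: |x^T y| <= ||x||_1 ||y||_inf
print(abs(x @ y) <= np.linalg.norm(x, 1) * np.linalg.norm(y, np.inf))   # True

# standard relations among the vector p-norms
print(norm2 <= norm1 <= np.sqrt(n) * norm2)                             # True
print(norm_inf <= norm2 <= np.sqrt(n) * norm_inf)                       # True
```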

2.4 Induced matrix norms

Given an m × n matrix A and an integer p ≥ 1, we define the matrix p-norm as: ||A||_p = max_{x ≠ 0} ||Ax||_p / ||x||_p = max_{||x||_p = 1} ||Ax||_p.

The most commonly used matrix p-norms are:

  • The 1-norm, the maximum absolute column sum: ||A||_1 = max_{1≤j≤n} Σ_{i=1}^{m} |A_{ij}|;

  • The infinity norm, the maximum absolute row sum: ||A||_∞ = max_{1≤i≤m} Σ_{j=1}^{n} |A_{ij}|;

  • The 2-norm: ||A||_2 = max_{||x||_2 = 1} ||Ax||_2.

This family of norms is called “induced” because each is realized by some non-zero vector x that depends on both A and p. Thus, there exists a unit norm vector x (unit norm in the p-norm) such that ||A||_p = ||Ax||_p. The induced matrix p-norms satisfy the following submultiplicativity laws: ||Ax||_p ≤ ||A||_p ||x||_p and ||AB||_p ≤ ||A||_p ||B||_p.

In addition, matrix p-norms are invariant under permutations, namely ||PAQ||_p = ||A||_p, where P and Q are permutation matrices of appropriate dimensions. Similarly, if B is a submatrix appearing in a partition of A, then its norm is bounded by the norm of the full matrix, i.e., ||B||_p ≤ ||A||_p. Several relations among the p-norms of an m × n matrix A can also be proved relatively easily.

In addition, ||A^⊤||_1 = ||A||_∞ and ||A^⊤||_∞ = ||A||_1; that is, transposition swaps the 1-norm and the infinity norm, but it does not affect the 2-norm, i.e., ||A^⊤||_2 = ||A||_2. Similarly, the matrix 2-norm is not affected by pre- (or post-) multiplication by a matrix whose columns (or rows) are orthonormal vectors: ||UAV^⊤||_2 = ||A||_2, where U and V are orthogonal matrices of appropriate dimensions (U^⊤U = I and V^⊤V = I).
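As a hedged illustration (not from the paper), the NumPy sketch below computes the three induced norms directly from their column-sum, row-sum, and singular-value characterizations, and checks the transposition and orthogonal-invariance properties just described.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 4
A = rng.standard_normal((m, n))

one_norm = np.abs(A).sum(axis=0).max()             # max absolute column sum
inf_norm = np.abs(A).sum(axis=1).max()             # max absolute row sum
two_norm = np.linalg.svd(A, compute_uv=False)[0]   # largest singular value

print(np.isclose(one_norm, np.linalg.norm(A, 1)))        # True
print(np.isclose(inf_norm, np.linalg.norm(A, np.inf)))   # True
print(np.isclose(two_norm, np.linalg.norm(A, 2)))        # True

# transposition swaps the 1-norm and infinity norm, and leaves the 2-norm unchanged
print(np.isclose(np.linalg.norm(A.T, 1), np.linalg.norm(A, np.inf)))    # True
print(np.isclose(np.linalg.norm(A.T, 2), np.linalg.norm(A, 2)))         # True

# ||U A V^T||_2 = ||A||_2 for orthogonal U and V
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(np.isclose(np.linalg.norm(U @ A @ V.T, 2), np.linalg.norm(A, 2))) # True
```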

2.6 Singular value decomposition

Square matrices can sometimes be decomposed in terms of eigenvalues and eigenvectors, but non-square matrices do not admit an eigenvalue decomposition. The singular value decomposition (SVD) is in many ways the most important decomposition of a matrix, because while not every matrix has an eigenvalue decomposition, every matrix has an SVD.

Definition 6. Given a matrix A ∈ R^m×n, we define its full SVD as: A = UΣV^⊤,

where U ∈ R^m×m and V ∈ R^n×n are orthogonal matrices containing the left and right singular vectors of A, respectively, and Σ ∈ R^m×n is a diagonal matrix whose main diagonal contains the singular values of A in non-increasing order.

We often use u_i (or v_j), i = 1, ..., m (or j = 1, ..., n), to denote the columns of U (or V). Similarly, we use σ_i, i = 1, ..., min{m, n}, to denote the singular values: σ_1 ≥ σ_2 ≥ ... ≥ σ_{min{m,n}} ≥ 0.

The singular values of A are non-negative, and their number equals min{m, n}. The number of non-zero singular values of A equals the rank of A. Due to orthogonal invariance, we get: σ_i(PAQ) = σ_i(A) for all i,

where P and Q are orthogonal matrices of appropriate dimensions (P^⊤P = I and Q^⊤Q = I). In other words, the singular values of PAQ are the same as the singular values of A.
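The sketch below (an added illustration, not from the paper) computes a full SVD with NumPy, confirms that the number of non-zero singular values equals the rank, and checks the orthogonal invariance of the singular values.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 6, 4
A = rng.standard_normal((m, n))

# full SVD: U is m x m, Vt is n x n, s holds the min(m, n) singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)                                # (6, 6) (4,) (4, 4)
print(np.count_nonzero(s > 1e-12) == np.linalg.matrix_rank(A))   # True

# orthogonal invariance: PAQ has the same singular values as A
P, _ = np.linalg.qr(rng.standard_normal((m, m)))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(np.allclose(np.linalg.svd(P @ A @ Q, compute_uv=False), s))  # True
```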

The following inequalities involving the singular values of matrices A and B are quite important. First, if A and B are both in R^m×n, then for all i = 1, ..., min{m, n}: σ_i(A + B) ≤ σ_i(A) + σ_1(B).

Second, if A ∈ R^p×m and B ∈ R^m×n, then for all i = 1, ..., min{m, n}: σ_i(AB) ≤ σ_i(A) σ_1(B).

Here, σ_1(A) = ||A||_2. We are often interested in keeping only the non-zero singular values and the corresponding left and right singular vectors of a matrix A. Given a matrix A ∈ R^m×n with rank(A) = ρ, we can define its thin SVD.

Definition 9. Given a matrix A ∈ R^m×n of rank ρ ≤ min{m, n}, we define its thin SVD as: A = UΣV^⊤,

where U ∈ R^m×ρ and V ∈ R^n×ρ are matrices with pairwise orthonormal columns (U^⊤U = I and V^⊤V = I) containing the left and right singular vectors corresponding to the non-zero singular values, and Σ ∈ R^ρ×ρ is a diagonal matrix containing the non-zero singular values of A in non-increasing order.

If A is nonsingular, we can use the SVD to compute its inverse: A^−1 = VΣ^−1U^⊤.

(If A is nonsingular, then it is square and has full rank, in which case the thin SVD and the full SVD coincide.) The SVD is of central importance because the best rank-k approximation of any matrix can be computed via the SVD.
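As an illustrative sketch (not from the paper), the code below forms the thin SVD of a rank-deficient rectangular matrix with NumPy and, for a nonsingular square matrix, recovers the inverse as VΣ^−1U^⊤.

```python
import numpy as np

rng = np.random.default_rng(5)

# thin SVD: keep only the rho = rank(A) non-zero singular values and vectors
m, n = 6, 4
A = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))   # a rank-2 matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)
rho = np.count_nonzero(s > 1e-10 * s[0])
U, s, Vt = U[:, :rho], s[:rho], Vt[:rho, :]
print(rho, np.allclose(U @ np.diag(s) @ Vt, A))        # 2 True

# for a nonsingular square matrix, A^-1 = V Sigma^-1 U^T
B = rng.standard_normal((5, 5))
Ub, sb, Vbt = np.linalg.svd(B)
print(np.allclose(Vbt.T @ np.diag(1.0 / sb) @ Ub.T, np.linalg.inv(B)))   # True
```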

Theorem 10. Let A = UΣV^⊤ ∈ R^m×n be the thin SVD of A, and let k < rank(A) = ρ. Define A_k = Σ_{i=1}^{k} σ_i u_i v_i^⊤.

Then A_k is the best rank-k approximation of A in both the 2-norm and the Frobenius norm, i.e., ||A − A_k||_2 = min_{B: rank(B) ≤ k} ||A − B||_2 and ||A − A_k||_F = min_{B: rank(B) ≤ k} ||A − B||_F.

In other words, the theorem states that if we seek a rank-k approximation of a matrix A that minimizes the 2-norm or the Frobenius norm of the “error” matrix (i.e., the difference between A and its approximation), then it suffices to keep the top k singular values of A and the corresponding left and right singular vectors.

We will often use the following notation: let U_k ∈ R^m×k (or V_k ∈ R^n×k) denote the matrix of the top k left (or right) singular vectors of A, and let Σ_k ∈ R^k×k denote the diagonal matrix containing the top k singular values of A. Similarly, let U_{k,⊥} ∈ R^m×(ρ−k) (or V_{k,⊥} ∈ R^n×(ρ−k)) denote the matrix of the bottom ρ−k non-zero left (or right) singular vectors of A, and let Σ_{k,⊥} ∈ R^(ρ−k)×(ρ−k) denote the diagonal matrix containing the bottom ρ−k non-zero singular values of A. Then, A_k = U_k Σ_k V_k^⊤ and A − A_k = U_{k,⊥} Σ_{k,⊥} V_{k,⊥}^⊤.
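The following NumPy sketch (added here for illustration) builds A_k from the top k singular triplets and checks the error characterizations implied by Theorem 10; the expressions of the 2-norm and Frobenius-norm errors in terms of the trailing singular values are standard facts assumed here rather than quoted from the text.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, k = 8, 6, 2
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # A_k = U_k Sigma_k V_k^T

# 2-norm error of the best rank-k approximation is the (k+1)-st singular value
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))                             # True

# Frobenius-norm error is the root of the sum of squares of the trailing singular values
print(np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:] ** 2))))  # True

# A - A_k equals the "bottom" part U_{k,perp} Sigma_{k,perp} V_{k,perp}^T
print(np.allclose(A - A_k, U[:, k:] @ np.diag(s[k:]) @ Vt[k:, :]))              # True
```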

2.9 The Moore-Penrose pseudoinverse

The inverse of a non-square matrix is not defined. The Moore-Penrose pseudoinverse, a well-known generalization of the matrix inverse, addresses this issue. Formally, given an m × n matrix A, a matrix A† is the Moore-Penrose pseudoinverse of A if it satisfies the following properties: AA†A = A; A†AA† = A†; (AA†)^⊤ = AA†; and (A†A)^⊤ = A†A.

Given an m × n matrix A with rank ρ, its thin SVD can be expressed as: A = UΣV^⊤,

and the thin SVD of its Moore-Penrose pseudoinverse A† can be expressed as: A† = VΣ^−1U^⊤.

If A is an n × n full-rank (i.e., nonsingular) matrix, then A† equals the inverse of A. If A is an m × n matrix with full column rank, then A†A equals the identity matrix of order n, and AA† is the projection matrix onto the column space of A. If A has full row rank, then AA† is the identity matrix of order m, and A†A is the projection matrix onto the row space of A.
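As a minimal sketch (added here, not from the paper), the code below verifies the four defining properties with np.linalg.pinv and, for a matrix with full column rank, checks that A†A = I_n and that AA† acts as a projection onto the column space of A.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 7, 3                       # m > n: full column rank with probability 1
A = rng.standard_normal((m, n))
A_pinv = np.linalg.pinv(A)

# the four defining properties of the Moore-Penrose pseudoinverse
print(np.allclose(A @ A_pinv @ A, A))                  # True
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))        # True
print(np.allclose((A @ A_pinv).T, A @ A_pinv))         # True
print(np.allclose((A_pinv @ A).T, A_pinv @ A))         # True

# full column rank: A^+ A = I_n, and A A^+ projects onto the column space of A
print(np.allclose(A_pinv @ A, np.eye(n)))              # True
P = A @ A_pinv
print(np.allclose(P @ P, P))                           # True: idempotent (a projection)
print(np.allclose(P @ A, A))                           # True: fixes the columns of A
```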

The pseudoinverse of the product of two matrices has the following important property: for an m × p matrix Y_1 and a p × n matrix Y_2 satisfying rank(Y_1) = rank(Y_2), [9, Theorem 2.2.3] states that: (Y_1 Y_2)† = Y_2† Y_1†.

(We emphasize the importance of the rank-equality condition: while the inverse of a product of two invertible matrices always equals the product of their inverses in reverse order, the analogous identity does not hold in general for the Moore-Penrose pseudoinverse [9].) Furthermore, the fundamental spaces of the Moore-Penrose pseudoinverse are related to those of the original matrix. Given a matrix A and its Moore-Penrose pseudoinverse A†, the column space of A† can be defined as: range(A†) = range(A^⊤).

The column space of A† is orthogonal to the null space of A, and the null space of A† can be defined as: null(A†) = null(A^⊤), which is orthogonal to the column space of A.
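As a final illustrative sketch (not from the paper), the code below checks the product rule in a case where the rank condition holds (Y_1 with full column rank and Y_2 with full row rank, so rank(Y_1) = rank(Y_2) = p), and verifies the subspace relations for a rank-deficient matrix using SciPy's null_space.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(8)

# product rule (Y1 Y2)^+ = Y2^+ Y1^+ when rank(Y1) = rank(Y2):
# here Y1 (m x p) has full column rank and Y2 (p x n) has full row rank,
# so both ranks equal p and the identity applies
m, p, n = 7, 3, 5
Y1 = rng.standard_normal((m, p))
Y2 = rng.standard_normal((p, n))
print(np.allclose(np.linalg.pinv(Y1 @ Y2),
                  np.linalg.pinv(Y2) @ np.linalg.pinv(Y1)))      # True

# fundamental subspaces for a rank-deficient A: the columns of A^+ lie in the row
# space of A (hence are orthogonal to null(A)), and A^+ annihilates null(A^T)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 6))    # rank-2, 5 x 6
A_pinv = np.linalg.pinv(A)
print(np.allclose(null_space(A).T @ A_pinv, 0))                  # True
print(np.allclose(A_pinv @ null_space(A.T), 0))                  # True
```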

This article was compiled by Heart of the Machine. Please contact this official account for authorization before reprinting.
