I began to review the mathematical foundations of AI algorithms, mainly from three aspects:

  1. Linear algebra
  2. Probability theory
  3. Differential and integral calculus

The references are as follows:

  • Deep Learning
  • Github.com/scutan90/De…
  • Github.com/sladesha/Re…

This is the first article, covering the linear algebra part; it is mainly study notes on the more basic material.

Linear algebra

1.1 Vectors and matrices

1.1.1 The relation between scalar, vector, matrix and tensor

Scalar

A scalar represents a single number, unlike most other objects studied in linear algebra (which are usually arrays of multiple numbers). Scalars are written in italics and are usually given lowercase variable names. For example, when defining a real scalar, we say "let s ∈ R denote the slope of a line".

Vector

A vector represents an ordered set of numbers; the index in that order identifies each individual number. Vectors are usually given lowercase variable names in bold, such as x. The elements of a vector are written in italics with subscripts: the first element of vector x is x_1, the second is x_2, and so on. We also specify the type of element (real, imaginary, etc.) stored in the vector.

A vector is shown below. A vector can be thought of as a point in space, that is, each element can represent coordinates on a different coordinate axis.


x = \left[ \begin{matrix} x_1 \\ x_2 \\ x_3 \\ \cdots \\ x_n \end{matrix} \right]

Matrix

A matrix is a collection of objects with the same features and dimensionality, represented as a two-dimensional data table. Each object is represented as a row of the matrix, each feature as a column, and each feature has a numerical value. A matrix is usually given a bold uppercase variable name, such as A.

An example representation of a matrix is as follows:


A = \left[ \begin{matrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{matrix} \right]

Transposition is one of the important operations on a matrix. The transpose is the mirror image of the matrix across the main diagonal, where the main diagonal runs from the upper left corner to the lower right corner. It is defined as follows:


(A^T)_{i,j} = A_{j,i}

An example operation is as follows:


A = \left[ \begin{matrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \\ A_{3,1} & A_{3,2} \end{matrix} \right] \implies A^T = \left[ \begin{matrix} A_{1,1} & A_{2,1} & A_{3,1} \\ A_{1,2} & A_{2,2} & A_{3,2} \end{matrix} \right]

We went from a 3 × 2 matrix to a 2 × 3 matrix.
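As a quick numerical illustration (my own sketch, assuming NumPy is available; it is not part of the referenced notes), transposition simply swaps the row and column indices:

```python
import numpy as np

# A 3x2 matrix: 3 objects (rows), 2 features (columns)
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])

A_T = A.T  # transpose: mirror across the main diagonal

print(A.shape)               # (3, 2)
print(A_T.shape)             # (2, 3)
print(A_T[0, 2] == A[2, 0])  # (A^T)_{i,j} == A_{j,i} -> True
```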

Tensor

In some cases, we discuss arrays with more than two axes. In general, when the elements of an array are distributed in a regular grid with a variable number of axes, we call the array a tensor. We use A to denote the tensor "A"; the element of A at coordinates (i, j, k) is written A_{i,j,k}.

The relationship between the four

(From Deep Learning 500 Questions, Chapter 1: Mathematical Foundations)

Scalars are tensors of order 0, and vectors are tensors of order 1. For example, a scalar tells you the length of a stick but not which way it points. A vector tells you both the length of the stick and whether it points forward or backward. A tensor tells you not only the length and the forward/backward direction, but also how much the stick is deflected up/down and left/right.

1.1.2 Difference between tensors and matrices

  • Algebraically, a matrix is a generalization of a vector. A vector can be regarded as a one-dimensional "table" (its components arranged in order along a row), and a matrix as a two-dimensional "table" (its components arranged in rows and columns), so a tensor of order n is an n-dimensional "table". Tensors are strictly defined in terms of linear mappings.
  • Geometrically speaking, a matrix is a real geometric quantity, that is, something that does not vary with the coordinate transformation of the reference frame. Vectors also have this property.
  • Tensors can be expressed as a 3 by 3 matrix.
  • Scalar numbers and three-dimensional arrays representing vectors can also be treated as 1×1, 1×3 matrices, respectively.

1.1.3 Matrix and vector multiplication result

If matrices A and B are multiplied, then using the Einstein summation convention the matrix C can be expressed as follows: AB = C ==> a_{ik} b_{kj} = c_{ij}

Here a_{ik}, b_{kj}, and c_{ij} denote the elements of matrices A, B, and C respectively. The index k appears twice, so it is a dummy variable indicating a sum over that index.

Here’s an example:


A = \left[ \begin{matrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{matrix} \right] \quad B = \left[ \begin{matrix} B_{1,1} & B_{1,2} \\ B_{2,1} & B_{2,2} \end{matrix} \right] \\ A \times B = C = \left[ \begin{matrix} A_{1,1} \times B_{1,1} + A_{1,2} \times B_{2,1} & A_{1,1} \times B_{1,2} + A_{1,2} \times B_{2,2} \\ A_{2,1} \times B_{1,1} + A_{2,2} \times B_{2,1} & A_{2,1} \times B_{1,2} + A_{2,2} \times B_{2,2} \end{matrix} \right] = \left[ \begin{matrix} C_{1,1} & C_{1,2} \\ C_{2,1} & C_{2,2} \end{matrix} \right]

So matrix multiplication requires that the number of columns of matrix A equal the number of rows of matrix B: if the dimension of A is m×n, then the dimension of B must be n×p, and the dimension of C is m×p.

There is another kind of matrix multiplication in which the matrices are multiplied element by element. This is called the element-wise product, or the Hadamard product, and is denoted A ⊙ B.

Matrix-vector multiplication can be regarded as a special case of matrix multiplication in which the matrix B is an n×1 matrix.
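Below is a small NumPy sketch (my own illustration, with made-up matrices) contrasting the standard matrix product, the Hadamard product, and the matrix-vector special case:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Standard matrix product: C_{ij} = sum_k A_{ik} B_{kj}
C = A @ B          # [[19, 22], [43, 50]]

# Hadamard (element-wise) product: requires identical shapes
H = A * B          # [[ 5, 12], [21, 32]]

# Matrix-vector product: the n x 1 special case
x = np.array([1, -1])
y = A @ x          # [-1, -1]

print(C, H, y, sep="\n")
```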

Matrix products satisfy these laws:

  1. Distributive law: A(B+C) = AB + AC
  2. Associative law: A(BC) = (AB)C

But matrix multiplication does not obey the commutative law: AB does not have to equal BA.

The product of matrices satisfies: (AB)^T = B^T A^T

The dot product of two vectors x and y of the same dimension can be written as the matrix product x^T y. In other words, the step of computing C_{i,j} in the matrix product C = AB can be thought of as the dot product between the i-th row of A and the j-th column of B; after all, each row or column of a matrix is a vector.

The dot product of a vector satisfies the commutative law:


x^Ty = y^Tx

The proof is based mainly on:

  1. The dot product of two vectors is a scalar
  2. The transpose of a scalar is itself

So there are:


x^Ty = (x^Ty)^T = y^Tx
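A quick numerical check of the two identities above, (AB)^T = B^T A^T and x^T y = y^T x, using randomly generated matrices (my own sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
x = rng.standard_normal(5)
y = rng.standard_normal(5)

# (AB)^T == B^T A^T
print(np.allclose((A @ B).T, B.T @ A.T))   # True

# Dot product is commutative: x^T y == y^T x
print(np.isclose(x @ y, y @ x))            # True
```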

1.1.4 Identity matrix and inverse matrix

The identity matrix, denoted I, is defined as follows: any vector multiplied by the identity matrix is unchanged, i.e.:


\forall x \in R^n, \quad I_n x = x

The structure of the identity matrix is very simple: the main diagonal entries are 1 and all other entries are 0, as shown below for I_3:


\left[ \begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix} \right]

The inverse matrix is denoted A^{-1} and satisfies the following condition:


A^{-1}A=I_n

1.1.5 Linear equations and linear correlation

Now we have a system of linear equations as follows:


Ax = b

Here A ∈ R^{m×n} is a known matrix, b ∈ R^m is a known vector, and x ∈ R^n is the unknown vector to be solved for.

We can expand the formula above using matrix multiplication (treating x as an n×1 matrix):


A_{1,:}x = b_1 \implies A_{1,1}x_1 + A_{1,2}x_2 + \cdots + A_{1,n}x_n = b_1 \\ A_{2,:}x = b_2 \implies A_{2,1}x_1 + A_{2,2}x_2 + \cdots + A_{2,n}x_n = b_2 \\ \cdots \\ A_{m,:}x = b_m \implies A_{m,1}x_1 + A_{m,2}x_2 + \cdots + A_{m,n}x_n = b_m

After we define the inverse matrix, we can solve it like this:


Ax=b\\ A^{-1}Ax = A^{-1}b\\ I_nx = A^{-1}b \\ x = A^{-1}b

So the key is to determine whether the inverse matrix exists and, if so, to find it.
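As a sketch (assuming NumPy; the 2×2 system here is my own example), here is Ax = b solved via the inverse and via np.linalg.solve, which is preferred in practice because it avoids forming the inverse explicitly:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# x = A^{-1} b, written explicitly with the inverse
x_via_inverse = np.linalg.inv(A) @ b

# In practice np.linalg.solve is preferred (no explicit inverse is formed)
x = np.linalg.solve(A, b)

print(x)                        # [2. 3.]
print(np.allclose(A @ x, b))    # True
print(np.allclose(x, x_via_inverse))  # True
```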

When the inverse matrix A^{-1} exists, there must be exactly one solution for each vector b.

However, for a system of equations, some values of the vector b may give no solution, or infinitely many solutions; it is impossible to have more than one but finitely many solutions. For example, if x and y are both solutions of the system, then:


z = \alpha x + (1-\alpha)y

where α is any real number, is also a solution of the system. There are infinitely many such combinations, so it is impossible to have a finite number of solutions greater than one.
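A small sketch of this fact (my own example of a deliberately singular system, assuming NumPy): every combination z = αx + (1-α)y of two solutions is again a solution.

```python
import numpy as np

# A singular system with infinitely many solutions:
# the second row is twice the first, so rank(A) = 1.
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])
b = np.array([2.0, 4.0])

x = np.array([2.0, 0.0])   # one solution
y = np.array([0.0, 2.0])   # another solution

for alpha in (0.25, 0.5, 2.0, -3.0):
    z = alpha * x + (1 - alpha) * y
    print(np.allclose(A @ z, b))   # True for every alpha
```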

The key to determining whether Ax = b has a solution is to determine whether b lies in the span of the column vectors of A; this particular span is called the column space of A, or the range of A.

A linear combination of a set of vectors is the sum of each vector multiplied by a corresponding scalar coefficient, i.e. ∑_i c_i v^{(i)}.

The spanning subspace of a set of vectors is the set of points that can be reached by a linear combination of the original vectors.

So for Ax = b to have a solution for every value of b, the column space of A must constitute the entire R^m space; if some point of R^m is not in the column space of A, then the corresponding b makes the equation unsolvable. For this to hold, we must have the inequality n ≥ m.

But this inequality is only a necessary condition, not a sufficient one, for the equation to have a solution for every b, because some column vectors may be redundant. For example, in a 2 × 2 matrix whose two column vectors are identical, the column space of the matrix is the same as the column space of one of its columns, and it does not cover the entire R^2 space.

This redundancy is also known as linear dependence; a set of vectors is said to be linearly independent if no vector in the set can be represented as a linear combination of the others.

So, for the column space of a matrix to cover the entire R^m, the matrix must contain at least one set of m linearly independent column vectors; this condition is necessary and sufficient for Ax = b to have a solution for every b.

In addition, for the matrix to be invertible, we must also guarantee that Ax = b has at most one solution for each value of b; for that, the matrix must have at most m column vectors, otherwise the equation has more than one solution.

So the matrix must be square, i.e. m = n, and all of its column vectors must be linearly independent. A square matrix whose columns are linearly dependent is said to be singular.

If A is not square, or is a singular square matrix, the equation may still have a solution, but it cannot be solved by the inverse-matrix method.

1.1.6 Norm induction of vectors and matrices

Norm of vectors

Normally, the size of a vector is measured by its norm, which is formally defined as follows:


L_p=\Vert\vec{x}\Vert_p=\sqrt[p]{\sum_{i=1}^{N}|{x_i}|^p}

Here p ≥ 1.

The norm is a function that maps vectors to non-negative numbers, and intuitively, the norm of vector x measures the distance from the origin to x.

A norm is any function that satisfies the following properties:


f(x) = 0 \Rightarrow x = 0 \\ f(x + y) \le f(x) + f(y) \quad \text{(triangle inequality)} \\ \forall \alpha \in R, \ f(\alpha x) = |\alpha| f(x)

Define a vector a⃗ = [-5, 6, 8, -10], and write a generic vector as x⃗ = (x_1, x_2, …, x_N). Its different norms are computed as follows:

  • The 1-norm of a vector: the sum of the absolute values of its elements. For the vector a⃗ above, the 1-norm is |-5| + |6| + |8| + |-10| = 29. (A numerical check of all the norms in this list appears after the list.)

\Vert\vec{x}\Vert_1=\sum_{i=1}^N\vert{x_i}\vert
  • The 2-norm of a vector (Euclidean norm): the square root of the sum of the squares of its elements. The 2-norm of the vector a⃗ above is √((-5)^2 + 6^2 + 8^2 + (-10)^2) = 15.

\Vert\vec{x}\Vert_2=\sqrt{\sum_{i=1}^N{\vert{x_i}\vert}^2}
  • The negative-infinity norm of a vector: the smallest of the absolute values of all its elements. For the vector a⃗ above, the result is: 5.

\Vert\vec{x}\Vert_{-\infty}=\min{|{x_i}|}
  • The (positive) infinity norm of a vector: the largest of the absolute values of all its elements. For the vector a⃗ above, the result is: 10.

\Vert\vec{x}\Vert_{+\infty}=\max{|{x_i}|}
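The following sketch (my own addition, assuming NumPy) checks the four vector norms above for a⃗ = [-5, 6, 8, -10], reproducing the results 29, 15, 5 and 10:

```python
import numpy as np

a = np.array([-5, 6, 8, -10])

print(np.linalg.norm(a, 1))        # 29.0  (1-norm)
print(np.linalg.norm(a, 2))        # 15.0  (2-norm / Euclidean norm)
print(np.abs(a).min())             # 5     (negative-infinity norm)
print(np.abs(a).max())             # 10    (infinity norm)

# np.linalg.norm also accepts ord=-inf and ord=+inf directly:
print(np.linalg.norm(a, -np.inf), np.linalg.norm(a, np.inf))  # 5.0 10.0
```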

The norm of a matrix

Define a matrix:


A = \left[ \begin{matrix} -1 & 2 & -3 \\ 4 & -6 & 6 \\ \end{matrix} \right]

An arbitrary matrix is written A_{m×n}, with elements a_{ij}.

The norm of a matrix is defined as:


\Vert{A}\Vert_p :=\sup_{x\neq 0}\frac{\Vert{Ax}\Vert_p}{\Vert{x}\Vert_p}

When vectors take different norms, corresponding matrix norms are obtained.

  • The 1-norm of a matrix (column norm): sum the absolute values of each column of the matrix, then take the largest of these column sums. For the matrix A above, the column sums are [5, 8, 9], and taking the largest gives the final result: 9. (A numerical check of all the norms in this list appears after the list.)

\Vert A\Vert_1=\max_{1\le j\le n}\sum_{i=1}^m|{a_{ij}}|
  • The 2-norm of a matrix: the square root of the largest eigenvalue of A^T A. For the matrix A above, the final result is: 10.0623.

\Vert A\Vert_2=\sqrt{\lambda_{max}(A^T A)}

Here λ_max(A^T A) is the largest eigenvalue of A^T A in absolute value.

  • The infinity norm of a matrix (row norm): sum the absolute values of each row of the matrix, then take the largest of these row sums. For the matrix A above, the row sums are [6; 16], and taking the largest gives the final result: 16.

\Vert A\Vert_{\infty}=\max_{1\le i \le m}\sum_{j=1}^n |{a_{ij}}|
  • The nuclear norm of a matrix: the sum of its singular values (from the SVD). This norm can be used for low-rank representation, because minimizing the nuclear norm is a standard convex surrogate for minimizing the rank of the matrix (low rank). For the matrix A above, the final result is: 10.9287.

  • The L0 norm of a matrix: the number of non-zero elements in the matrix, usually used to represent sparsity. The smaller the L0 norm, the more zero elements there are, and the sparser the matrix. For the matrix A above, the final result is: 6.

  • The L1 norm of a matrix: the sum of the absolute values of all elements of the matrix. It is the optimal convex approximation of the L0 norm, so it can also represent sparsity. For the matrix A above, the final result is: 22.

  • The F-norm (Frobenius norm) of a matrix: the most commonly used matrix norm, the square root of the sum of the squares of all elements; it is often called the L2 norm of the matrix. Its advantage is that it is a convex function and can be differentiated, which makes it easy to work with. For the matrix A above, the final result is: 10.0995.


\Vert A\Vert_F=\sqrt{(\sum_{i=1}^m\sum_{j=1}^n{| a_{ij}|}^2)}
  • The L21 norm of a matrix: treat each column as a unit, compute the F-norm of each column (which can also be viewed as the vector 2-norm), then take the L1 norm of those results (the vector 1-norm). It is easy to see that this norm sits between the L1 and L2 norms. For the matrix A above, the final result is: 17.1559.
  • The p-norm of a matrix:

\Vert A\Vert_p=\sqrt[p]{(\sum_{i=1}^m\sum_{j=1}^n{| a_{ij}|}^p)}
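The following sketch (my own addition, assuming NumPy) reproduces the norm values quoted in the list above for the matrix A = [[-1, 2, -3], [4, -6, 6]]:

```python
import numpy as np

A = np.array([[-1.,  2., -3.],
              [ 4., -6.,  6.]])

print(np.linalg.norm(A, 1))             # 9.0      1-norm (max column abs sum)
print(np.linalg.norm(A, 2))             # ~10.0623 2-norm (largest singular value)
print(np.linalg.norm(A, np.inf))        # 16.0     infinity norm (max row abs sum)
print(np.linalg.norm(A, 'nuc'))         # ~10.9287 nuclear norm (sum of singular values)
print(np.count_nonzero(A))              # 6        "L0 norm" (number of non-zero entries)
print(np.abs(A).sum())                  # 22.0     element-wise L1 norm
print(np.linalg.norm(A, 'fro'))         # ~10.0995 Frobenius norm
print(np.linalg.norm(A, axis=0).sum())  # ~17.1559 L21 norm (sum of column 2-norms)
```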

The dot product of two vectors can be expressed in terms of the norm:


x^Ty =\Vert x \Vert_2 \Vert y \Vert_2 \cos\theta

Here θ is the angle between x and y.

1.1.7 Some special matrices and vectors

Diagonal matrix: has non-zero elements only on the main diagonal and zeros everywhere else. The identity matrix introduced earlier is one of the diagonal matrices.

Symmetric matrix: a matrix whose transpose is equal to itself, i.e. A = A^T.

Unit vector: a vector with unit norm, i.e. ∥x∥_2 = 1.

Orthogonal vectors: if x^T y = 0, then the vectors x and y are orthogonal to each other. If the vectors are not only mutually orthogonal but also have norm 1, they are called orthonormal.

Orthogonal matrix: a square matrix whose row vectors are mutually orthonormal and whose column vectors are mutually orthonormal, i.e.


A^TA=AA^T=I

That is, we have:


A^{-1}=A^T

So one advantage of orthogonal matrices is that their inverse is very cheap to compute.
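A small sketch (my own example, assuming NumPy) using a 2-D rotation matrix, which is orthogonal, to check these properties:

```python
import numpy as np

theta = np.pi / 6
# A 2-D rotation matrix is orthogonal: its rows and columns are orthonormal.
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))      # True
print(np.allclose(Q @ Q.T, np.eye(2)))      # True
print(np.allclose(np.linalg.inv(Q), Q.T))   # True: the inverse is just the transpose
```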

1.1.8 How to determine whether a matrix is positive definite

To determine whether a matrix is positive definite, the following criteria are commonly used:

  • All leading principal minors are greater than 0;
  • There is an invertible matrix C such that C^T C equals the matrix;
  • The positive index of inertia is n;
  • It is congruent to the identity matrix E;
  • The main diagonal elements in the standard form are all positive;
  • The eigenvalues are all positive;
  • It’s the metric matrix of some basis.

A matrix whose eigenvalues are all non-negative is called positive semi-definite, while a matrix whose eigenvalues are all negative is called negative definite.
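A minimal sketch of the eigenvalue criterion (my own helper function, assuming NumPy); the Cholesky factorization at the end is another common practical test:

```python
import numpy as np

def is_positive_definite(M: np.ndarray) -> bool:
    """Check positive definiteness of a symmetric matrix via its eigenvalues."""
    eigvals = np.linalg.eigvalsh(M)   # eigvalsh assumes M is symmetric (Hermitian)
    return bool(np.all(eigvals > 0))

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])   # eigenvalues 1 and 3 -> positive definite
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])    # eigenvalues -1 and 3 -> indefinite

print(is_positive_definite(A))   # True
print(is_positive_definite(B))   # False

# Alternatively, a Cholesky factorization succeeds only for positive definite matrices
np.linalg.cholesky(A)            # works
# np.linalg.cholesky(B)          # would raise LinAlgError
```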

Uses of positive definiteness

  • Positive definiteness of the Hessian matrix in gradient descent
    • If the Hessian is positive definite, the second-order derivative of the function along every direction is greater than 0 and the function's rate of change is increasing, which can be used to determine whether a local optimum exists
  • The basic assumption in kernel function construction for SVMs (a valid kernel's Gram matrix must be positive semi-definite)

1.2 Eigenvalues and eigenvectors

1.2.1 Eigenvalue decomposition and eigenvector

Eigenvalues and eigenvectors are given by eigendecomposition, which is one of the most widely used matrix decompositions.

The eigenvalues tell you how important the feature is, and the eigenvectors tell you what the feature is.

If a vector v⃗ is an eigenvector of a square matrix A, then it satisfies the following form:


A\nu = \lambda \nu

where λ is the eigenvalue corresponding to the eigenvector v⃗.

Eigenvalue decomposition is the decomposition of a matrix into the following form:


A=Q\Sigma Q^{-1}

Here Q is the orthogonal matrix whose columns are the eigenvectors of A, and Σ is a diagonal matrix whose diagonal entries are the eigenvalues, arranged from largest to smallest. The eigenvectors corresponding to these eigenvalues describe the directions in which the matrix changes (from the most significant change to the least). In other words, the information of matrix A can be represented by its eigenvalues and eigenvectors.

Not every matrix can be decomposed into eigenvalues and eigenvectors, but every real symmetric matrix can be decomposed into real eigenvectors and real eigenvalues.
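A short sketch (my own example, assuming NumPy) of the eigendecomposition of a real symmetric matrix and its reconstruction:

```python
import numpy as np

# A real symmetric matrix always has a real eigendecomposition A = Q Sigma Q^{-1}
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals, Q = np.linalg.eigh(A)       # eigh: eigendecomposition for symmetric matrices
Sigma = np.diag(eigvals)

# Reconstruct A from its eigenvalues and eigenvectors
A_rebuilt = Q @ Sigma @ np.linalg.inv(Q)   # for symmetric A, inv(Q) equals Q.T
print(np.allclose(A, A_rebuilt))           # True

# Each eigenpair satisfies A v = lambda v
for lam, v in zip(eigvals, Q.T):
    print(np.allclose(A @ v, lam * v))     # True
```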

1.2.2 Singular value decomposition

In addition to eigendecomposition, there is another matrix decomposition called singular value decomposition (SVD), which decomposes a matrix into singular values and singular vectors. Singular value decomposition provides the same kind of information as eigendecomposition, but it is more widely applicable: every real matrix has a singular value decomposition, while not every matrix has an eigendecomposition, because only square matrices can have one.

In the eigendecomposition, we write A as follows:


A = V \, diag(\lambda) V^{-1}

where V is the matrix of eigenvectors, λ is the vector of eigenvalues, and diag(λ) denotes the diagonal matrix with the eigenvalues on its diagonal.

The form of singular value decomposition is as follows:


A = U D V^T

If A is m×n, then U is m×m, D is m×n, and V is n×n. Moreover, the matrices U and V are orthogonal, and D is diagonal but not necessarily square.

The elements on the diagonal of D are the singular values of A, the columns of U are the left singular vectors, and the columns of V are the right singular vectors.

The singular value decomposition can be explained in terms of eigendecompositions related to A. The left singular vectors of A are the eigenvectors of AA^T, and the right singular vectors are the eigenvectors of A^T A. The non-zero singular values of A are the square roots of the eigenvalues of AA^T, which are also the square roots of the eigenvalues of A^T A.

(Math Basics from Deep Learning 500 Questions)

So how do the singular values and eigenvalues correspond? If we multiply the transpose of the matrix A by A and compute the eigenvalues of A^T A, we get the following form:


(A^TA)v_i = \lambda_i v_i

Here v_i is the right singular vector mentioned above. In addition:


\sigma_i = \sqrt{\lambda_i}, \quad u_i=\frac{1}{\sigma_i}Av_i

Here σ_i is the singular value and u_i is the left singular vector.

Like the eigenvalues, the singular values σ in the matrix Σ are arranged from largest to smallest, and they decrease very rapidly. In many cases, the sum of the largest 10% or even 1% of the singular values accounts for more than 99% of the total. In other words, we can use the largest r singular values (with r much smaller than m and n) to approximate the matrix, i.e. a partial (truncated) singular value decomposition:


A_{m\times n}\approx U_{m \times r}\Sigma_{r\times r}V_{r \times n}^T

The product of the three matrices on the right is close to A; the closer r is to n, the closer the product is to A.
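A sketch of a truncated SVD (my own example with a synthetic, approximately rank-3 matrix, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
# Build a matrix that is approximately low-rank (rank 3) plus a little noise
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40)) \
    + 0.01 * rng.standard_normal((50, 40))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = 3  # keep only the r largest singular values
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# The truncated product is already very close to A
rel_err = np.linalg.norm(A - A_r) / np.linalg.norm(A)
print(s[:5])      # singular values decay rapidly after the first r
print(rel_err)    # small relative error
```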


Welcome to follow my official account, AI Algorithm Notes, where I share algorithm learning notes, paper reading notes, and tutorials for related GitHub projects every week.