Linear dependence and span.
If the inverse matrix A^{-1} exists, then Ax = b has exactly one solution for every vector b. For some systems, however, a given value of b may yield no solution or infinitely many solutions. If x and y are both solutions, then z = αx + (1-α)y is also a solution for any real α, so having more than one solution means having infinitely many.
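As a minimal sketch (with hypothetical numbers, using NumPy for illustration), the check below takes a 1x2 system with infinitely many solutions and verifies that every affine combination of two solutions is again a solution:

```python
import numpy as np

# Hypothetical 1x2 system with infinitely many solutions.
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

x = np.array([2.0, 0.0])   # one solution: A @ x = b
y = np.array([0.0, 2.0])   # another solution: A @ y = b

for alpha in (-1.0, 0.0, 0.3, 1.0, 2.5):
    z = alpha * x + (1 - alpha) * y
    # every affine combination of two solutions is again a solution
    assert np.allclose(A @ z, b)
```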
Think of the columns of A as different directions we can travel in from the origin; the question is then how many ways there are to reach b. Each element of x says how far to move in the corresponding direction: x_i is the distance along the i-th column, so Ax = Σ_i x_i A_{:,i}. An expression of this kind is a linear combination: each vector is multiplied by a scalar coefficient and the results are summed, Σ_i c_i v^(i). The span of a set of vectors is the set of all points reachable by linear combinations of those vectors. Determining whether Ax = b has a solution therefore amounts to testing whether b lies in the span of A's column vectors, known as the column space or range of A. For Ax = b to have a solution for every b ∈ ℝ^m, the column space of A must be all of ℝ^m; any point of ℝ^m missing from the column space is a value of b for which the equation has no solution. This requires A to have at least m columns, i.e. n ≥ m; otherwise the dimension of the column space is less than m.
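A minimal sketch of the membership test, assuming small hand-picked matrices: b is in the column space of A exactly when appending b as an extra column does not increase the rank.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])          # columns span a 2-D plane inside R^3

def in_column_space(A, b):
    augmented = np.column_stack([A, b])
    return np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(A)

print(in_column_space(A, np.array([2.0, 3.0, 5.0])))   # True:  2*A[:,0] + 3*A[:,1]
print(in_column_space(A, np.array([1.0, 1.0, 0.0])))   # False: outside the plane
```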
Having n ≥ m is only necessary, not sufficient, because some columns may be redundant; this redundancy is called linear dependence. A set of vectors is linearly independent if no vector in the set can be written as a linear combination of the others. If some vector is a linear combination of other vectors in the set, adding it to the set does not enlarge the span. For the column space of a matrix to cover all of ℝ^m, the matrix must contain at least one set of m linearly independent columns; this condition is both necessary and sufficient for Ax = b to have a solution for every b. Note that the requirement is for a set of exactly m linearly independent columns, not at least m: no set of m-dimensional vectors can contain more than m mutually linearly independent vectors, although a matrix with more than m columns may contain more than one such set of size m.
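A minimal sketch of an independence check (illustrative vectors, NumPy assumed): a set of column vectors is linearly independent exactly when the rank of the matrix they form equals the number of columns.

```python
import numpy as np

def linearly_independent(columns):
    M = np.column_stack(columns)
    return np.linalg.matrix_rank(M) == M.shape[1]

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = v1 + 2 * v2                      # redundant: a linear combination of v1, v2

print(linearly_independent([v1, v2]))        # True
print(linearly_independent([v1, v2, v3]))    # False: v3 adds nothing to the span
```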
For the matrix to be invertible, we additionally need Ax = b to have at most one solution for each value of b, which requires the matrix to have at most m columns; otherwise each solution can be parameterized in more than one way. Together, these conditions mean the matrix must be square (m = n) with all columns linearly independent. A square matrix whose columns are linearly dependent is called singular. If A is not square, or is square but singular, the equation may still have a solution, but it cannot be found by matrix inversion. So far the inverse has been multiplied on the left (A^{-1}A = I); an inverse multiplied on the right satisfies AA^{-1} = I. For square matrices, the left inverse and right inverse are equal.
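A minimal sketch with hypothetical numbers: a square matrix with linearly independent columns can be inverted, so Ax = b has the unique solution x = A^{-1}b, while a singular matrix cannot be inverted at all.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

x = np.linalg.solve(A, b)             # preferred over forming inv(A) explicitly
assert np.allclose(A @ x, b)
assert np.allclose(x, np.linalg.inv(A) @ b)

# A square matrix with linearly dependent columns is singular and has no inverse.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])            # second column = 2 * first column
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as e:
    print("singular:", e)
```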
Norms.
Norms measure the size of a vector. The L^p norm is ||x||_p = (Σ_i |x_i|^p)^(1/p), for p ∈ ℝ, p ≥ 1. A norm (including the L^p norm) is a function mapping vectors to non-negative values; the norm of x measures the distance from the origin to the point x. A norm satisfies: f(x) = 0 ⇒ x = 0; f(x + y) ≤ f(x) + f(y) (the triangle inequality); and f(αx) = |α| f(x) for all α ∈ ℝ.
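A minimal sketch of the L^p formula (illustrative values), checked against NumPy's built-in norm:

```python
import numpy as np

def lp_norm(x, p):
    # (sum_i |x_i|^p)^(1/p)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 0.0])
for p in (1, 2, 3):
    assert np.isclose(lp_norm(x, p), np.linalg.norm(x, ord=p))

print(lp_norm(x, 1))   # 7.0
print(lp_norm(x, 2))   # 5.0
```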
With p = 2, the L^2 norm is called the Euclidean norm. It is the Euclidean distance from the origin to the point identified with x, and is often written simply ||x||, with the subscript 2 omitted. The squared L^2 norm is also used to measure vector size and can be computed as the dot product x^T x. The squared L^2 norm is more convenient mathematically and computationally than the L^2 norm itself: the derivative of the squared L^2 norm with respect to each element of x depends only on that element, whereas the derivatives of the L^2 norm involve the entire vector. A drawback is that the squared L^2 norm grows slowly near the origin.
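A minimal sketch (hypothetical vector) of the two facts above: the squared L^2 norm equals x^T x, its partial derivative with respect to x_i is 2x_i (only that element), while the derivative of the unsquared L^2 norm is x_i / ||x||_2 (involving the whole vector through the norm):

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])

assert np.isclose(x @ x, np.linalg.norm(x) ** 2)   # ||x||_2^2 = x^T x

grad_squared = 2 * x                     # d/dx ||x||_2^2, elementwise
grad_norm = x / np.linalg.norm(x)        # d/dx ||x||_2, depends on whole vector
print(grad_squared, grad_norm)
```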
The L^1 norm grows at the same rate at every position and keeps a simple mathematical form: ||x||_1 = Σ_i |x_i|. In machine learning problems the difference between exactly zero and small non-zero elements often matters; when an element of x moves from 0 to ε, the L^1 norm increases by ε. Directly counting the non-zero elements is not a norm, since scaling a vector by α does not change the number of non-zero elements, so the L^1 norm is often used as a substitute for the number of non-zero entries.
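A minimal sketch of that contrast (illustrative values): the non-zero count ignores scaling, while the L^1 norm scales by |α| and grows by ε when an entry moves from 0 to ε.

```python
import numpy as np

x = np.array([0.0, 2.0, -1.0])
print(np.count_nonzero(x), np.count_nonzero(10 * x))      # 2 2: count ignores scale
print(np.linalg.norm(x, 1), np.linalg.norm(10 * x, 1))    # 3.0 30.0

eps = 0.25
y = x.copy()
y[0] = eps
print(np.linalg.norm(y, 1) - np.linalg.norm(x, 1))        # 0.25, i.e. epsilon
```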
The L^∞ norm, or max norm, is the absolute value of the element with the largest magnitude: ||x||_∞ = max_i |x_i|.
The Frobenius norm measures the size of a matrix: ||A||_F = sqrt(Σ_{i,j} A_{i,j}^2).
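A minimal sketch (hypothetical matrix): the Frobenius norm is the square root of the sum of the squared entries, which matches NumPy's 'fro' norm.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
manual = np.sqrt(np.sum(A ** 2))
assert np.isclose(manual, np.linalg.norm(A, 'fro'))
print(manual)   # sqrt(30) ≈ 5.477
```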
The dot product of two vectors can be written in terms of norms: x^T y = ||x||_2 ||y||_2 cos θ, where θ is the angle between x and y.
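A minimal sketch (hypothetical vectors) recovering the angle from that identity:

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.degrees(np.arccos(cos_theta))
print(theta)   # 45.0 degrees
```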
Special kinds of matrices and vectors.
A diagonal matrix has non-zero elements only on the main diagonal and zeros everywhere else: D is diagonal if and only if D_{i,j} = 0 for all i ≠ j. The identity matrix is a diagonal matrix whose diagonal elements are all 1.
diag(v) denotes the square diagonal matrix whose diagonal elements are given by the vector v. Multiplication by a diagonal matrix is efficient to compute: in diag(v)x each element x_i is scaled by v_i, so diag(v)x = v ⊙ x. Inverting a square diagonal matrix is also efficient: the inverse exists if and only if every diagonal element is non-zero, and then diag(v)^{-1} = diag([1/v_1, …, 1/v_n]^T). Many general-purpose machine learning algorithms are derived for arbitrary matrices; restricting the matrices involved to be diagonal yields simpler algorithms with lower computational cost.
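A minimal sketch (illustrative values): multiplying by diag(v) is an elementwise scaling, and the inverse of a square diagonal matrix just inverts each diagonal entry.

```python
import numpy as np

v = np.array([2.0, 3.0, 4.0])
x = np.array([1.0, 1.0, 1.0])

assert np.allclose(np.diag(v) @ x, v * x)                       # diag(v) x = v ⊙ x
assert np.allclose(np.linalg.inv(np.diag(v)), np.diag(1.0 / v)) # diag(v)^{-1}
```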
Not all diagonal matrices are square; rectangular matrices can also be diagonal. A non-square diagonal matrix has no inverse, but multiplication by it is still cheap to compute. For a rectangular diagonal matrix D, the product Dx still scales each element of x: if D is taller than it is wide, zeros are appended to the scaled result; if D is wider than it is tall, the last elements of x are discarded.
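A minimal sketch of the rectangular case (hypothetical shapes): a tall diagonal matrix scales and appends zeros, a wide one scales and drops trailing elements.

```python
import numpy as np

v = np.array([2.0, 3.0])
x2 = np.array([1.0, 1.0])
x3 = np.array([1.0, 1.0, 1.0])

D_tall = np.zeros((3, 2)); np.fill_diagonal(D_tall, v)   # 3x2 diagonal matrix
D_wide = np.zeros((2, 3)); np.fill_diagonal(D_wide, v)   # 2x3 diagonal matrix

print(D_tall @ x2)   # [2. 3. 0.]  -> scaled, zero appended
print(D_wide @ x3)   # [2. 3.]     -> scaled, last element discarded
```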
A symmetric matrix is a matrix equal to its own transpose: A = A^T. Symmetric matrices often arise when the elements are generated by a function of two arguments that does not depend on the order of the arguments. For example, if A is a matrix of distance measurements with A_{i,j} the distance from point i to point j, then A_{i,j} = A_{j,i} because the distance function is symmetric.
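A minimal sketch of that example (hypothetical points): a pairwise distance matrix is symmetric because the distance function ignores the order of its two arguments.

```python
import numpy as np

points = np.array([[0.0, 0.0],
                   [3.0, 4.0],
                   [1.0, 1.0]])

n = len(points)
A = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        A[i, j] = np.linalg.norm(points[i] - points[j])   # distance from i to j

assert np.allclose(A, A.T)   # A_{i,j} = A_{j,i}
```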
A unit vector is a vector with unit norm: ||x||_2 = 1.
If x^T y = 0, the vectors x and y are orthogonal to each other. If both vectors have non-zero norm, the angle between them is 90 degrees. In ℝ^n, at most n vectors with non-zero norm can be mutually orthogonal. Vectors that are not only orthogonal but also have unit norm are called orthonormal.
An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal: A^T A = A A^T = I, so A^{-1} = A^T. Computing the inverse of an orthogonal matrix is therefore cheap. Note that the rows of an orthogonal matrix are not merely orthogonal but orthonormal; there is no special term for a matrix whose rows or columns are orthogonal to each other but not orthonormal.
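A minimal sketch (using the Q factor of a QR decomposition of a random matrix, an assumption for illustration): Q is orthogonal, so its transpose is its inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # Q is orthogonal

assert np.allclose(Q.T @ Q, np.eye(4))
assert np.allclose(Q @ Q.T, np.eye(4))
assert np.allclose(np.linalg.inv(Q), Q.T)          # inverting is as cheap as transposing
```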
Reference: Deep Learning
Recommendations for machine learning opportunities in Shanghai are welcome; my WeChat account is Qingxingfengzi.
I also have a WeChat group; you are welcome to join and learn deep learning together.