Univariate linear regression

Algorithm principle

Model: y = wx + b. The task is to find w and b, either by minimizing the mean squared error (the least squares method) or by maximum likelihood estimation.

Least squares method

Measure the distance from each point to the line along the direction of the y-axis (the error between the point (xi, yi) and the line); squaring these errors and averaging them over the samples gives the mean squared error used in linear regression.

The w and b that minimize the MSE define the fitted line; choosing the parameters this way is called the least squares method.
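As a sketch, with training samples $(x_i, y_i),\ i = 1, \dots, m$, the objective being minimized is

$$
E(w, b) = \sum_{i=1}^{m} \bigl(y_i - (w x_i + b)\bigr)^2,
\qquad
(w^*, b^*) = \arg\min_{w,\,b} E(w, b)
$$

Dividing by $m$ gives the mean squared error; the constant factor does not change the minimizer.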

Measuring the perpendicular distance from each point to the line instead gives orthogonal regression.

Maximum likelihood estimation

Find the distribution (the parameter values) under which the observed sample is most likely to occur.

Use

Find the parameter values of the probability distribution

Expression

Log-likelihood function

Since the likelihood is a product over the samples, taking the logarithm turns the product into a sum, which is much easier to differentiate and maximize.
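For independent samples $x_1, \dots, x_m$ drawn from a distribution with parameter $\theta$:

$$
L(\theta) = \prod_{i=1}^{m} p(x_i; \theta),
\qquad
\ln L(\theta) = \sum_{i=1}^{m} \ln p(x_i; \theta),
\qquad
\hat{\theta} = \arg\max_{\theta} \ln L(\theta)
$$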

Maximum likelihood estimation applied to the error term

Write the model as y = wx + b + ε. The error ε can be assumed to follow a normal distribution with mean 0 (errors generally fluctuate around 0, and by the central limit theorem the sum of many small disturbances is approximately normal). Starting from the probability density function of ε and substituting ε = y − wx − b, y is normally distributed with mean μ = wx + b.

w and b are then obtained by the maximum likelihood procedure described above.
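A sketch of why this reproduces least squares: with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$ we have $y_i \sim \mathcal{N}(w x_i + b, \sigma^2)$, so

$$
\ln L(w, b)
= \sum_{i=1}^{m} \ln \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\left( -\frac{(y_i - w x_i - b)^2}{2\sigma^2} \right)
= -\frac{m}{2}\ln(2\pi\sigma^2)
  - \frac{1}{2\sigma^2} \sum_{i=1}^{m} (y_i - w x_i - b)^2
$$

Maximizing this over $w$ and $b$ is therefore equivalent to minimizing $\sum_i (y_i - w x_i - b)^2$, i.e. the least squares objective.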

Solving for w and b

1. Show that the objective E(w, b) is a convex function

  1. Compute the four second-order partial derivatives and assemble them into the Hessian matrix
  2. Check whether the Hessian matrix is positive semidefinite

2. Find the minimum

Optimization of a convex function: since any stationary point of a convex function is a global minimum, set the first-order partial derivatives with respect to w and b to zero and solve (a derivation sketch follows below).
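A derivation sketch for $E(w, b) = \sum_{i=1}^{m} (y_i - w x_i - b)^2$: the four second-order partial derivatives form the Hessian

$$
\nabla^2 E(w, b) =
\begin{pmatrix}
\dfrac{\partial^2 E}{\partial w^2} & \dfrac{\partial^2 E}{\partial w\,\partial b}\\[4pt]
\dfrac{\partial^2 E}{\partial b\,\partial w} & \dfrac{\partial^2 E}{\partial b^2}
\end{pmatrix}
= 2\begin{pmatrix}
\sum_i x_i^2 & \sum_i x_i\\
\sum_i x_i & m
\end{pmatrix}
$$

which is positive semidefinite (the diagonal entries are nonnegative and $m\sum_i x_i^2 \ge \bigl(\sum_i x_i\bigr)^2$ by the Cauchy–Schwarz inequality), so $E$ is convex. Setting $\partial E/\partial w = \partial E/\partial b = 0$ then gives the closed-form solution

$$
w = \frac{\sum_{i=1}^{m} y_i (x_i - \bar{x})}{\sum_{i=1}^{m} x_i^2 - \frac{1}{m}\left(\sum_{i=1}^{m} x_i\right)^2},
\qquad
b = \frac{1}{m}\sum_{i=1}^{m} (y_i - w x_i),
\qquad
\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i
$$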

Supplementary knowledge

  1. Convex function – think of a disc ⚪: a set is convex when the line segment joining any two of its points stays inside it; likewise, a function is convex when the chord joining any two points on its graph lies above the graph (a bowl shape), as opposed to a concave function

The bowl-shaped function that some calculus textbooks describe as "concave up" is the one called convex in machine learning, because optimization is about finding the minimum of such a function.

  2. Gradient – the first derivative of a multivariate function: the vector formed by all of its first-order partial derivatives

  3. Hessian matrix – the matrix of second-order partial derivatives of a multivariate function

Converting the sums into vector and matrix operations (vectorization) lets NumPy compute the solution much faster than explicit loops.
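As a minimal sketch of this vectorization (the function name and the sample data are made up for illustration), the closed-form univariate solution can be computed with NumPy as follows:

```python
import numpy as np

def fit_univariate(x, y):
    """Closed-form least-squares fit of y = w*x + b, vectorized with NumPy."""
    m = len(x)
    x_mean = x.mean()
    # w = sum_i y_i (x_i - x_mean) / (sum_i x_i^2 - (sum_i x_i)^2 / m)
    w = np.sum(y * (x - x_mean)) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
    # b = mean of (y_i - w * x_i)
    b = np.mean(y - w * x)
    return w, b

# hypothetical example data, roughly following y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
print(fit_univariate(x, y))  # w close to 2, b close to 0
```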

Multiple linear regression

With multiple features, x changes from a scalar to a vector; differentiating with respect to a vector gives the gradient, i.e. the partial derivative with respect to each component.
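As a sketch under the same least-squares framing (the function name and data are hypothetical), the multivariate solution follows from the normal equation $\hat{w} = (X^\top X)^{-1} X^\top y$ after absorbing the bias into an extra all-ones column; a pseudo-inverse is used here instead of an explicit inverse for numerical stability:

```python
import numpy as np

def fit_multivariate(X, y):
    """Least-squares fit for multiple linear regression via the normal equation.

    X: (m, d) feature matrix, y: (m,) targets.
    A column of ones is appended so the bias b is absorbed into the weight vector.
    """
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    # w_hat = (X^T X)^{-1} X^T y, computed via the Moore-Penrose pseudo-inverse
    return np.linalg.pinv(X_aug) @ y

# hypothetical example data with two features
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])
y = np.array([5.0, 4.0, 7.5, 11.0])
w_hat = fit_multivariate(X, y)  # last entry is the bias b
print(w_hat)
```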

Log-odds (logistic) regression – a classification algorithm

Composing the linear model with a mapping function (the logistic / sigmoid function) lets its output be read as a class probability, which achieves classification.
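A minimal sketch of that mapping, assuming the standard logistic (sigmoid) function; the weights and data below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function mapping a linear score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Probability of the positive class under a logistic regression model."""
    return sigmoid(X @ w + b)

# hypothetical weights for a 2-feature model
w = np.array([0.8, -0.5])
b = 0.1
X = np.array([[1.0, 2.0], [3.0, 0.5]])
print(predict_proba(X, w, b))  # values in (0, 1); threshold at 0.5 to classify
```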

Two-class linear discriminant analysis (LDA)

Algorithm principle

After projecting the training samples onto a line:

  1. The centers (means) of samples from different classes should be as far apart as possible
  2. The variance of samples within the same class should be as small as possible (see the code sketch below)
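These two goals lead to the classical closed-form projection direction $w \propto S_w^{-1}(\mu_0 - \mu_1)$, where $S_w$ is the within-class scatter matrix. A minimal sketch with hypothetical sample data:

```python
import numpy as np

def lda_direction(X0, X1):
    """Two-class LDA projection direction: w proportional to S_w^{-1} (mu0 - mu1).

    X0, X1: samples of the two classes, shapes (n0, d) and (n1, d).
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter matrix: sum of the scatter of each class around its mean
    S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(S_w, mu0 - mu1)
    return w / np.linalg.norm(w)

# hypothetical samples from two classes
X0 = np.array([[1.0, 2.0], [1.5, 1.8], [0.8, 2.2]])
X1 = np.array([[4.0, 4.5], [4.2, 4.8], [3.8, 4.1]])
w = lda_direction(X0, X1)  # projecting samples onto w separates the two classes
print(w)
```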