Unary linear regression
Algorithm principle
y = wx + b – procedure for finding w and b – find the w and b that minimize the mean squared error [least squares method] / find their maximum likelihood estimates
Least squares method
Measure the distance from each point to the line along the y-axis [the error between the point and the line at each xi] – sum the squared errors over all xi – mean squared error (MSE)
Finding the w and b of the line that minimizes the MSE is called the least squares method (see the sketch below)
Measuring instead the perpendicular distance from each point to the line – orthogonal regression
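For the ordinary (vertical-error) case above, a minimal sketch of the closed-form least squares fit; the data arrays here are hypothetical:

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form least squares for y = w*x + b (vertical errors)."""
    x_mean, y_mean = x.mean(), y.mean()
    # w = sum((xi - x_mean)(yi - y_mean)) / sum((xi - x_mean)^2)
    w = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - w * x_mean
    return w, b

# hypothetical data: roughly y = 2x + 1 plus noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
w, b = fit_simple_ols(x, y)
mse = np.mean((y - (w * x + b)) ** 2)
print(w, b, mse)
```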
Maximum likelihood estimation
Find the distribution under which the observed sample has the highest probability of occurring
Use
Find the parameter values of the probability distribution
Expression
Log-likelihood function
Since the likelihood is a product over samples, taking the logarithm turns the product into a sum and simplifies it
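A sketch of the general form, assuming i.i.d. samples x_1, ..., x_m with density p(x; θ):

```latex
L(\theta) = \prod_{i=1}^{m} p(x_i;\theta), \qquad
\ell(\theta) = \ln L(\theta) = \sum_{i=1}^{m} \ln p(x_i;\theta), \qquad
\hat{\theta} = \arg\max_{\theta}\ \ell(\theta)
```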
Maximum likelihood estimation applied to the errors
y = wx + b + ε, where ε can be assumed to follow a normal distribution with mean 0 (errors generally fluctuate around 0 and, by the central limit theorem, are approximately normal) ∴ we get the probability density function of ε; substituting ε = y − wx − b shows that y is normally distributed with mean μ = wx + b
Then w and b are found by the maximum likelihood method described above
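A sketch of that derivation, assuming ε ~ N(0, σ²): maximizing the log-likelihood reduces to minimizing the sum of squared errors, so it gives the same w and b as least squares.

```latex
p(\varepsilon) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\varepsilon^{2}}{2\sigma^{2}}\right)
\;\Rightarrow\;
p(y_i \mid x_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y_i - w x_i - b)^{2}}{2\sigma^{2}}\right)

\ell(w,b) = \sum_{i=1}^{m} \ln p(y_i \mid x_i)
          = m \ln \frac{1}{\sqrt{2\pi}\,\sigma}
            - \frac{1}{2\sigma^{2}} \sum_{i=1}^{m} (y_i - w x_i - b)^{2}

\arg\max_{w,b}\ \ell(w,b) \;=\; \arg\min_{w,b} \sum_{i=1}^{m} (y_i - w x_i - b)^{2}
```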
Solving for w and b
1. Convex function
- Find the Hessian matrix
- Check whether the matrix is positive (semi)definite
- Compute the four second-order partial derivatives with respect to w and b that fill the 2×2 Hessian (see the sketch after this list)
2. Find the extreme value (minimum)
Convex function optimization – the point where the first-order partial derivatives are zero is the global optimum
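A sketch of step 1 for the MSE objective E(w, b) = Σᵢ (yᵢ − w xᵢ − b)²: the four second-order partial derivatives form a Hessian whose quadratic form is a sum of squares, so it is positive semidefinite and E is convex.

```latex
\nabla^{2} E =
\begin{pmatrix}
\frac{\partial^{2} E}{\partial w^{2}} & \frac{\partial^{2} E}{\partial w \partial b} \\
\frac{\partial^{2} E}{\partial b \partial w} & \frac{\partial^{2} E}{\partial b^{2}}
\end{pmatrix}
= 2
\begin{pmatrix}
\sum_{i} x_i^{2} & \sum_{i} x_i \\
\sum_{i} x_i & m
\end{pmatrix},
\qquad
\begin{pmatrix} u & v \end{pmatrix} \nabla^{2}E \begin{pmatrix} u \\ v \end{pmatrix}
= 2\sum_{i}(u x_i + v)^{2} \ge 0
```

Step 2 then sets the first-order partial derivatives with respect to w and b to zero and solves; by convexity this stationary point is the global minimum.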
Supplementary knowledge
- Convex set/function – analogous to a circle (disk): the line segment joining any two points inside the circle stays inside the circle; that is convexity, as opposed to a concave shape
In machine learning, a bowl-shaped function (opening upward) is called convex, because learning is the process of finding the optimal (minimum) solution
- Gradient – the first derivative of a multivariate function – the vector of its first-order partial derivatives
- Hessian matrix – the matrix of second-order partial derivatives of a multivariate function
Convert the sums to vector (matrix) form – NumPy solves it faster (see the sketch below)
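A minimal sketch of the vectorization point, comparing a Python loop with a NumPy dot product for one of the sums that appears in the least squares formulas (the arrays are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000_000)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# loop form: accumulate the sum term by term (slow in pure Python)
s_loop = 0.0
for xi, yi in zip(x, y):
    s_loop += xi * yi

# vectorized form: the same sum as a single dot product
s_vec = x @ y

print(np.isclose(s_loop, s_vec))
```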
Multiple linear regression
Multiple features – x changes from a scalar to a vector – the derivative with respect to a vector is the gradient – the partial derivative with respect to each component
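A minimal sketch of the closed-form solution in matrix form, assuming X has one row per sample; a hypothetical appended column of ones absorbs the intercept b:

```python
import numpy as np

def fit_multiple_lr(X, y):
    """Least squares for y = X w + b via the normal equations."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # append a column of ones for b
    # solve (X^T X) beta = X^T y rather than forming an explicit inverse
    beta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
    return beta[:-1], beta[-1]  # weight vector w and intercept b

# hypothetical data with two features
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])
y = np.array([5.0, 5.5, 8.5, 12.0])
w, b = fit_multiple_lr(X, y)
print(w, b)
```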
Log-odds (logistic) regression – a classification algorithm
Applying a mapping function to the output of the linear model achieves classification
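A minimal sketch of that idea: the sigmoid maps the linear model's output into (0, 1), which can then be thresholded into a class label; the parameters w, b and the sample below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b, threshold=0.5):
    """Linear model -> sigmoid -> class label."""
    p = sigmoid(np.dot(w, x) + b)   # estimated probability of the positive class
    return int(p >= threshold), p

# hypothetical learned parameters and a sample
w = np.array([0.8, -0.4])
b = -0.1
label, prob = predict(np.array([1.5, 0.5]), w, b)
print(label, prob)
```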
Binary linear discriminant analysis
Algorithm principle
After the projection of training samples:
- Centers of samples from different classes should be as far apart as possible
- The variance of samples within the same class should be as small as possible (see the formulation below)
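A sketch of the standard objective that captures those two goals, with class means μ0, μ1, class scatter (covariance) matrices Σ0, Σ1, and projection direction w:

```latex
\max_{w}\ J(w) =
\frac{\left\| w^{\top}\mu_0 - w^{\top}\mu_1 \right\|_2^{2}}
     {w^{\top}\Sigma_0 w + w^{\top}\Sigma_1 w}
= \frac{w^{\top} S_b\, w}{w^{\top} S_w\, w},
\qquad
S_b = (\mu_0-\mu_1)(\mu_0-\mu_1)^{\top},\quad
S_w = \Sigma_0 + \Sigma_1
```

The numerator grows as the projected class centers move apart; the denominator shrinks as the within-class variance shrinks, so maximizing J(w) enforces both bullets at once.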