Maximum likelihood estimation (MLE), in plain terms, uses the information in known sample results to work backward to the model parameter values that are most likely (with maximum probability) to have produced those results.
In other words, maximum likelihood estimation provides a way to estimate model parameters from observed data, i.e., “model determined, parameters unknown”.
Maybe some of you will say that this is a little abstract. Think of it this way: when the model follows some distribution, I find its parameter values by maximum likelihood estimation. For example, the normal distribution has the following formula:
f(x) = 1 / (sqrt(2π) σ) · exp(-(x - μ)^2 / (2σ^2))
If I get the parameters μ and σ of my model by maximum likelihood estimation,
then I know the mean and variance of this model, and with them everything else about the distribution.
Maximum likelihood estimation relies on an important assumption about the sampling: all samples are independent and identically distributed.
Let me use two examples to help understand maximum likelihood estimation.
But first, let’s look at the likelihood function. The explanation comes from the following blog:
Detailed explanation of maximum likelihood estimation (MLE), maximum a posteriori estimation (MAP), and the understanding of Bayes formula – CSDN blog.
For the function P(x|θ), there are two inputs: x represents a specific piece of data, and θ represents the parameters of the model.
If θ is known and fixed and x is the variable, this function is called the probability function. It describes how probable each different sample point x is.
If x is known and fixed and θ is the variable, this function is called the likelihood function. It describes how probable this sample point x is under different model parameters θ.
It’s a bit like “one dish eaten two ways”. In fact, we have met this kind of thing before. For example, take x^y. If x is known and fixed (for example, x=2), this is 2^y, which is an exponential function. If y is known and fixed (for example, y=2), this is x^2, which is a quadratic function. The same mathematical form, viewed with a different variable, can have different names.
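To make the “two views of one formula” idea concrete, here is a minimal sketch. It assumes a binomial model (x white balls in n = 10 draws, white fraction θ), which is my own choice for illustration and not something the quoted blog specifies. The function name binomial_prob is hypothetical.

```python
from math import comb

def binomial_prob(x: int, theta: float, n: int = 10) -> float:
    """P(x | theta): probability of x white balls in n draws with white fraction theta."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

# Probability function: theta is fixed and x varies.
# Over all possible x the values sum to 1.
theta_fixed = 0.7
prob_over_x = [binomial_prob(x, theta_fixed) for x in range(11)]

# Likelihood function: x is fixed and theta varies.
# These values are not a distribution over theta and need not sum to 1.
x_fixed = 7
thetas = [i / 10 for i in range(11)]
likelihood_over_theta = [binomial_prob(x_fixed, t) for t in thetas]

# The theta on this coarse grid that makes the fixed observation most probable.
best_theta = thetas[likelihood_over_theta.index(max(likelihood_over_theta))]
print(sum(prob_over_x), best_theta)
```

The same function is called in both loops. Only the argument that is held fixed changes.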
Does that make sense? If not, don’t worry, there are specific examples below.
Now let’s get to MLE itself.
Example 1
This example comes from someone else’s blog.
Suppose you have a jar with black and white balls. The number of balls is unknown, and so is the ratio of the two colors. We want to know the proportion of white balls to black balls in the jar, but we can’t count all the balls in it. What we can do is take one ball out of the jar at random, record its color, and put it back. This process can be repeated, and we can use the recorded colors to estimate the proportion of black and white balls in the jar. If 70 of the first 100 draws were white balls, what is the most likely proportion of white balls in the jar?
Many of you know the answer right away: 70%. What is the rationale behind it?
Let’s say that the proportion of white balls in the jar is p, so the proportion of black balls is 1 - p. Because we put each ball back into the jar and shake it well after recording its color, the colors of the balls drawn are independent and identically distributed.
Here we call the color of the ball drawn in one draw a sample. In the problem, the probability of the event that one hundred samples give seventy white balls and thirty black balls is P(sample results | Model).
If the result of the first sampling is denoted x1, the result of the second x2, and so on, then the sample is (x1, x2, …, x100). In this way, we get the following expression:
P(sample results | Model)
= P(x1, x2, …, x100 | Model)
= P(x1 | Model) P(x2 | Model) … P(x100 | Model)
= p^70(1-p)^30.
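The factorization above can be checked numerically. This is a small sketch, with white encoded as 1 and black as 0 (my own encoding) and a hypothetical helper named sample_likelihood. It multiplies the per-draw probabilities one by one and compares the result with the closed form.

```python
# 1 = white ball, 0 = black ball; 70 white and 30 black, as in the example.
samples = [1] * 70 + [0] * 30

def sample_likelihood(p: float, samples: list[int]) -> float:
    """Multiply the per-draw probabilities, which is valid because the draws are i.i.d."""
    result = 1.0
    for x in samples:
        result *= p if x == 1 else (1 - p)
    return result

p = 0.6  # an arbitrary candidate value, chosen only for the check
product_form = sample_likelihood(p, samples)
closed_form = p**70 * (1 - p) ** 30
print(product_form, closed_form)
```

The two numbers agree up to floating-point rounding, whatever order the draws came in. Only the counts of white and black balls matter.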
OK, so we have the expression for the probability of seeing the sample result. The model parameter we are looking for is the p in this expression.
So how do we figure out what p is?
Different values of p directly give different values of P(sample results | Model).
In fact, p can take infinitely many values, and each one gives a different distribution. For example, p could be 50% (half white, half black):
then p^70(1-p)^30 is about 7.9 times 10^-31.
The distribution could also be p = 70% (70% white, 30% black):
then p^70(1-p)^30 is about 2.95 times 10^-27.
So the question is: since there are infinitely many distributions to choose from, what principle does maximum likelihood estimation follow to choose one?
Answer: choose the one that makes the sample result most probable, i.e., maximize the value of p^70(1-p)^30. We can treat this as a function of p and simply take the derivative.
Since this result has already happened, why not pick the model under which it is the most likely outcome? This is the core idea of maximum likelihood estimation.
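Before doing any calculus, a brute-force search already shows where the maximum sits. The sketch below scans a grid of candidate values for p and works with the logarithm of p^70(1-p)^30, which has the same maximizer but avoids extremely small numbers. The grid step of 0.01 and the function name log_likelihood are my own choices.

```python
import math

def log_likelihood(p: float, whites: int = 70, blacks: int = 30) -> float:
    """Log of p^whites * (1-p)^blacks; same maximizer as the likelihood itself."""
    return whites * math.log(p) + blacks * math.log(1 - p)

# Candidates 0.01, 0.02, ..., 0.99 (0 and 1 are excluded because log(0) is undefined).
candidates = [i / 100 for i in range(1, 100)]
best_p = max(candidates, key=log_likelihood)

likelihood_at_half = math.exp(log_likelihood(0.5))     # about 7.9e-31
likelihood_at_best = math.exp(log_likelihood(best_p))  # about 2.95e-27
print(best_p, likelihood_at_half, likelihood_at_best)
```

On this grid the search picks p = 0.7, the observed fraction of white balls.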
We want to maximize the probability of the observed sample, which turns into a mathematical problem: make p^70(1-p)^30 as large as possible.
That is easy here, because there is only one unknown, p. Setting the derivative with respect to p to 0 gives p = 70%, which is exactly the 70% we guessed. There is real math behind the intuition.
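For readers who want the derivative step written out, here is one way to do it. Taking the logarithm first is my own shortcut. It does not change where the maximum is, because the logarithm is strictly increasing.

```latex
\begin{aligned}
L(p) &= p^{70}(1-p)^{30}, \qquad 0 < p < 1 \\
\ln L(p) &= 70 \ln p + 30 \ln(1-p) \\
\frac{d}{dp} \ln L(p) &= \frac{70}{p} - \frac{30}{1-p} = 0 \\
70(1-p) &= 30p \quad\Longrightarrow\quad p = \frac{70}{100} = 0.7 \\
\frac{d^2}{dp^2} \ln L(p) &= -\frac{70}{p^2} - \frac{30}{(1-p)^2} < 0
\end{aligned}
```

The second derivative is negative everywhere on (0, 1), so p = 0.7 is a maximum and not a minimum.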
Example 2
Suppose we want to calculate the average annual income of the whole country. First, assume that income follows a normal distribution, but that the mean and variance of the distribution are unknown. We don’t have the manpower and resources to record the income of everyone in the country, which has over a billion people. Is there nothing to be done then?
Not at all! For example, we can take the incomes of the population of one city or one township as our observed sample. Then the parameters of the normal distribution assumed above are obtained by maximum likelihood estimation.
With the parameters in hand, we know the expectation and variance of the normal distribution. That is, from a small sample we work back to a series of important statistics about national annual income!
So the core use of maximum likelihood estimation is this: when the population is too large to measure in full and compute the distribution’s parameters directly, you can take a small sample and use maximum likelihood estimation to get the parameter values of the assumed distribution.
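Here is a toy sketch of this example. The data are simulated, not real income figures: the “true” mean and standard deviation below are made-up numbers used only to generate a sample. For a normal distribution the maximum likelihood estimates have a closed form: the sample mean, and the sample variance with divisor n (not n - 1).

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population parameters, used only to generate toy data.
TRUE_MEAN, TRUE_STD = 50_000.0, 12_000.0
sample = [random.gauss(TRUE_MEAN, TRUE_STD) for _ in range(10_000)]

# Closed-form maximum likelihood estimates for a normal distribution.
mu_hat = statistics.fmean(sample)                  # sample mean
var_hat = statistics.pvariance(sample, mu=mu_hat)  # variance with divisor n
sigma_hat = var_hat ** 0.5

print(round(mu_hat), round(sigma_hat))
```

The estimates land close to the values used to generate the data. This only works if the sample is representative of the whole population, which a single city or township may not be.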
Hope this helps you understand ~
From Maximum likelihood to EM algorithm shallow solution – zouxy09 column – blog channel – CSDN.NET
Maximum likelihood estimation learning – GrowoldWith_your blog – Blog channel – CSDN.NET