
Playing tennis by the weather is a classic example of making a decision based on weather conditions. For a person, this decision is not hard to make, but teaching a machine to learn the decision process is not so easy. First of all, whether to play tennis is a random event: we may go or we may not. Looking back at past days, though, the decision depended not only on subjective willingness but also on objective conditions such as the weather, and in good weather the probability of playing tennis is relatively high.

Prior probability

Prior probability refers to the probability obtained based on previous experience and analysis, which reflects the expectation of a certain state before actual observation.

  • For example, the probability of snow in winter in the south is much less than that in the north
  • For example, when an older person has a headache, hypertension may be suspected first

Coming back to playing tennis by the weather: in a northern winter there is a lot of rain and snow and the outdoor wind is strong, so we generally believe the probability of playing tennis is small. This is the prior probability. Across all categories, the prior probabilities sum to one:


$$\sum_i P(y_i) = 1$$
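As a quick illustration, the prior can be estimated from historical counts. A minimal sketch in Python (the play/not-play history below is made-up, not real data):

```python
from collections import Counter

# Hypothetical record of past days: 1 = played tennis, 0 = did not play
history = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

counts = Counter(history)
priors = {label: n / len(history) for label, n in counts.items()}

print(priors)                 # {1: 0.6, 0: 0.4}
print(sum(priors.values()))   # 1.0 -- the priors sum to one
```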

However, the prior has a limitation that caps both its accuracy and its flexibility: it never changes with the individual observation. For example, it would predict almost no snow simply because we are in the south, no matter what today actually looks like. And if the prior is uniform across categories, a decision rule based on it alone performs poorly.

The feature space

The prior carries relatively little information, so we want more information with which to update the prediction. We therefore introduce features: by observing the features of the data, we gain information that helps us update the prior.

Posterior probability

The posterior probability is the probability of a particular category given an observed feature vector $x$, written $p(y|x)$.

Bayes’ theorem


$$p(y|x) = \frac{p(x|y)\,p(y)}{p(x)}$$
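To make the formula concrete, here is a minimal sketch that plugs hypothetical numbers into Bayes' theorem; the evidence $p(x)$ is obtained from the law of total probability, $p(x) = \sum_i p(x|y_i)\,p(y_i)$:

```python
# Hypothetical numbers: x = "sunny", with classes "play" and "not_play"
p_y = {"play": 0.6, "not_play": 0.4}            # priors p(y)
p_x_given_y = {"play": 0.8, "not_play": 0.3}    # likelihoods p(x|y)

# Evidence p(x) via the law of total probability
p_x = sum(p_x_given_y[y] * p_y[y] for y in p_y)

# Posterior p(y|x) for each class, by Bayes' theorem
posterior = {y: p_x_given_y[y] * p_y[y] / p_x for y in p_y}
print(posterior)   # {'play': 0.8, 'not_play': 0.2}
```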

Maximum posterior probability

If we decide according to the maximum posterior probability, the category with the largest posterior probability is the prediction:


$$\hat{y} = \arg\max_i P(y_i|x)$$

$$\text{if } P(y_0|x) < P(y_1|x),\quad \hat{y} = y_1$$
$$\text{if } P(y_0|x) > P(y_1|x),\quad \hat{y} = y_0$$

Here $y_1$ stands for playing tennis and $y_0$ for not playing. If the computed posteriors satisfy $P(y_0|x) < P(y_1|x)$, then we predict playing tennis.
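A minimal sketch of this decision rule, reusing the hypothetical posterior from the Bayes example above:

```python
def map_decision(posterior):
    """Return the class with the largest posterior probability."""
    return max(posterior, key=posterior.get)

posterior = {"play": 0.8, "not_play": 0.2}   # hypothetical values
print(map_decision(posterior))               # 'play'
```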

The risk assessment

In fact, for every input $x$ we assign a probability to each possible category. For example, we can compute the two posteriors, where $y_1$ again means playing tennis. If we decide $y_1$, the probability of error is $P(err|x) = P(y_0|x)$; if we decide $y_0$, the probability of error is $P(y_1|x)$.


$$P(err|x) = \min\left(P(y_0|x),\ P(y_1|x)\right)$$
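Continuing the sketch: under the maximum-posterior rule, the probability of error is the posterior mass of the class we did not choose, which for two classes is simply the smaller of the two posteriors:

```python
def error_probability(posterior):
    """P(err|x): the total posterior probability of the classes we did not
    choose. For two classes this equals the minimum of the two posteriors."""
    return 1.0 - max(posterior.values())

posterior = {"play": 0.8, "not_play": 0.2}   # hypothetical values
print(error_probability(posterior))          # 0.2
```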

For example, the total probability formula is often used in "from cause to result" problems to compute the probability of the result, while Bayes' theorem works in the opposite direction, reasoning "from result back to cause". In Bayesian statistical inference, the prior probability distribution of an uncertain quantity is a probability distribution that expresses our degree of belief about that quantity before some evidence is taken into account. The unknown quantity can be a parameter of the model or a latent variable.