An uncertain world

We live in a world of uncertainty: apart from a handful of certainties, most events are uncertain, and we need probability to describe them. Probability theory has by now penetrated virtually every discipline, and it is fair to call it a cornerstone of the human knowledge system. Probability theory is itself a science, and other scientific theories in turn rely on it for support.

About probability

If someone told you that the sun will rise in the east tomorrow, you would treat it as certain, since it will be so for the foreseeable future; we would say the event has a 100% probability. But if an insurance salesman recommends a policy to you, your probabilistic mind may soon be tempted to use theory to work out which product is the better deal. From simple dice games to complex macro-level weather forecasts and stock-market predictions, down to micro-level quantum mechanics, all of these need probability to describe them.

A game of probability

The earliest origins of probability theory can be traced back to the ancient Egyptians, who, like modern humans, played games with dice. Unlike modern people, they played dice during a severe famine so that they could forget their hunger, while modern people play dice because they are too well fed and idle.

By the 17th century, European aristocrats were gambling on games of chance, and some of them began to wonder which outcomes were more likely. The 1654 correspondence between Fermat and Pascal on the "problem of points" (how to split the stakes of an interrupted game) is generally regarded as the birth of probability theory; together with Huygens, they are known as its early founders.

It is fair to say that probability theory originated in gambling and games, and only later broke away from them to develop on its own.

Later development of probability theory

The subsequent development of probability theory had little to do with gambling; it was driven mainly by advances in science and technology, along with the many probabilistic phenomena that arise in social life. Today many disciplines are closely tied to probability theory, such as physics, economics, computer science, informatics, communication engineering, biology, meteorology, and the natural and social sciences more broadly.

Probability theory in AI

Artificial intelligence draws on many areas of mathematics, including linear algebra, calculus, probability theory, mathematical statistics, and optimization theory. Current AI can be seen as the product of integrating mathematics with computer science: it builds models through mathematical methods and drives them with data and computing power, so that machines gain the ability to understand the objective world. The mainstream of AI today is machine learning and deep learning, both of which involve probability theory.

Probability calculation in deep learning

In the output layer of a deep neural network, a likelihood function represents the probability of each class, and softmax is generally used. Softmax normalizes the scores so that each unit in the output layer represents the probability of one class and the values sum to 1; the class with the maximum probability is taken as the prediction. The network consists of an input layer and hidden layers, with softmax applied at the end to yield the probability of each class. With three classes, for example, softmax produces probability values for Y=0, Y=1, and Y=2.
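As a minimal, framework-agnostic sketch (the raw scores here are made up for illustration), the following Python snippet shows how softmax turns three class scores into the probabilities for Y=0, Y=1, and Y=2:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores for the three classes Y=0, Y=1, Y=2.
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)             # e.g. [0.659 0.242 0.099] -- one probability per class
print(probs.sum())       # 1.0 -- softmax normalizes the scores
print(np.argmax(probs))  # 0 -- the maximum-probability class is the prediction
```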

Stochastic mechanisms of deep learning

Besides the softmax probability calculation in the output layer mentioned above, deep learning involves randomness in weight initialization and in the dropout mechanism. Parameter initialization is very important when training a deep neural network: a poor choice can leave the model unable to train or make training take much longer. For example, initializing all weights to 0 makes gradient descent fail, because every neuron in a layer receives identical updates; weights are therefore usually initialized at random from a normal distribution. In addition, the dropout mechanism is used during training to prevent overfitting: it randomly disables neurons.
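A minimal NumPy sketch of both ideas; the layer sizes, scale, and dropout rate are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weight initialization: draw from a normal distribution instead of zeros,
# so neurons start out different and gradient descent can break symmetry.
n_in, n_out = 784, 128  # hypothetical layer sizes
W = rng.normal(loc=0.0, scale=0.01, size=(n_in, n_out))

# Dropout during training: randomly disable neurons with probability p.
def dropout(activations, p=0.5):
    # Keep each unit with probability 1-p; scale survivors so the
    # expected activation stays the same ("inverted dropout").
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

h = rng.normal(size=(32, n_out))  # a fake batch of hidden activations
h_train = dropout(h, p=0.5)       # roughly half the units are zeroed each pass
```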

Exploratory analysis and preprocessing

Exploratory data analysis and data preprocessing involve many probabilistic and statistical methods, starting from the simplest: frequency, mode, mean, median, deviation, variance, covariance, correlation coefficient, and so on. Preprocessing may also normalize the data or reshape the distribution of the samples.
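For instance, several of these statistics and a simple normalization can be computed directly with NumPy; the data here is made up for illustration:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 6.0, 8.0])

print(np.mean(x))         # mean
print(np.median(x))       # median
print(np.var(x))          # variance
print(np.cov(x, y))       # covariance matrix of x and y
print(np.corrcoef(x, y))  # correlation coefficient matrix

# Z-score normalization: rescale to zero mean and unit variance.
x_norm = (x - np.mean(x)) / np.std(x)
```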

Bayesian probabilistic graphical models

Bayesian probabilistic graphical modeling is another very active branch of AI; although it receives less attention than deep learning, it is an important direction. Simply put, a probabilistic graphical model combines probability theory and graph theory: observation nodes represent observed data, hidden nodes represent knowledge, and edges represent the relationships between knowledge and data. A probability distribution derived from the graph structure is then used to solve the problem. Common probabilistic graphical models include naive Bayes (the simplest), the maximum entropy model, the hidden Markov model, and conditional random fields.
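As a taste of the simplest of these models, here is a minimal naive Bayes sketch for binary features; the toy data and the Laplace smoothing constant are assumptions for illustration, not from the text:

```python
import numpy as np

# Toy training data: 6 samples, 2 binary features, 2 classes.
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0], [1, 1], [0, 1]])
y = np.array([0, 0, 1, 1, 0, 1])

classes = np.unique(y)
# Class priors P(c) and per-feature likelihoods P(x_j = 1 | c),
# with Laplace smoothing (+1) to avoid zero probabilities.
priors = np.array([np.mean(y == c) for c in classes])
likelihoods = np.array([(X[y == c].sum(axis=0) + 1) / (np.sum(y == c) + 2)
                        for c in classes])

def predict(x):
    # Naive Bayes assumes features are conditionally independent given the class:
    # P(c | x) is proportional to P(c) * prod_j P(x_j | c).
    probs = priors * np.prod(np.where(x == 1, likelihoods, 1 - likelihoods), axis=1)
    return classes[np.argmax(probs)]

print(predict(np.array([1, 0])))  # most likely class for a new sample
```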

Common distributions

Understanding the common distributions is very helpful when building and optimizing an AI model. Here are the most common ones; a sampling sketch follows their descriptions.

The Bernoulli distribution is the simplest: it has only two possible outcomes, like flipping a coin. The probability of heads is p, and the probability of tails is 1 − p.

The binomial distribution generalizes this: the Bernoulli distribution is the special case of the binomial in which each experiment is a single coin flip. When each experiment consists of several consecutive flips, the probability of heads on each flip is still p and of tails 1 − p. For example, if each experiment is four flips of a coin, the binomial distribution gives the probabilities of seeing 0, 1, 2, 3, or 4 heads.

The multinomial distribution is a further generalization of the binomial: each binomial trial has only two possible outcomes, whereas a multinomial trial can have several. For example, a die has 6 faces, the probability of each face is 1/6, and all the probabilities add up to 1; rolling the die several times per experiment and counting how often each face appears follows a multinomial distribution.

The normal distribution, also known as the Gaussian distribution, is a very common continuous distribution, fully defined by two simple quantities: the mean and the variance. Its curve is low at both ends, high in the middle, and symmetric about the mean, which is why it is often called bell-shaped.
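Here is a minimal NumPy sketch that samples from each of these distributions; the sample sizes and parameters are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)

# Bernoulli: one coin flip with P(heads) = p.
p = 0.5
bernoulli = rng.binomial(n=1, p=p)

# Binomial: number of heads in 4 flips per experiment, 10 experiments.
binomial = rng.binomial(n=4, p=p, size=10)

# Multinomial: counts of each face over 12 rolls of a fair die.
multinomial = rng.multinomial(n=12, pvals=[1 / 6] * 6)

# Normal (Gaussian): defined by its mean and variance (scale = sqrt(variance)).
normal = rng.normal(loc=0.0, scale=1.0, size=10)

print(bernoulli, binomial, multinomial, normal, sep="\n")
```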

Author profile: Seaboat, specializing in artificial intelligence, computer science, mathematical principles, and basic algorithms. Books: Anatomy of Tomcat Kernel Design, Illustrated Data Structure and Algorithm, Popular Science of Artificial Intelligence Principles.

Support the author and buy the author’s book!

Focused on artificial intelligence, reading, and reflection; covering mathematics, computer science, distributed systems, machine learning, deep learning, natural language processing, algorithms and data structures, Java internals, the Tomcat kernel, and more.