Information Entropy
Entropy comes up in many science fiction movies. I first ran into the concept while learning about decision trees, but at the time it stayed half-hidden behind a veil; I only partly understood it. Over the past year or so I have watched more than a dozen videos on entropy, from sources at home and abroad, to get a better grasp of what it really is. So today I dare to talk about entropy again.
Entropy is a very popular concept right now. You have probably seen it described, more or less, as a measure of how chaotic a system is, but that is the thermodynamic definition of entropy.
The entropy we talk about today belongs to information theory, not physics; to be precise, it is information entropy.
What is information entropy
As you all know, entropy is both a thermodynamic concept and an information-theoretic concept. Before talking about entropy itself, let's talk about what the amount of information is, because the amount of information is a special case of information entropy, and we will work from the simple case to the general one.
The amount of information
The amount of information is a measure of information, just as the second is a unit for measuring time. When we consider a discrete random variable X, how much information do we receive when we observe a specific value of that variable?
For example, if we hear that the sun rises in the east, that message has little value for us, because it describes a certain event, so the amount of information it delivers is zero.
If, on the other hand, we receive an offer from a big tech company with no warning beforehand, that message carries a great deal of information for us. Let's make this more concrete.
For example, consider a coin-toss game with two possible outcomes, heads and tails. We can use the letter H for heads and T for tails, and tell people who are not present about the outcome by sending them an H or a T; equally well, we could send a 0 for tails and a 1 for heads.
If you know a bit about computers, 1 bit is enough to represent 2 possibilities, so we can describe the outcome of a coin toss with 1 bit and pass it on to someone else. The information helps us eliminate the uncertainty about which of the two outcomes occurred.
Let's play a game. Set up a barrier between two people, call them A and B, that can only transmit the electrical signals 0 and 1. A picks a random letter from A, B, C, D, encodes it as a combination of zeros and ones, and sends the message through the barrier to B. Two bits are needed to pass this information, for example 00 for A, 01 for B, 10 for C, and 11 for D. That is two bits of information.
In all of the equally likely cases above, the base-2 logarithm of the number of possible outcomes is exactly the number of binary signals we need to transmit the result of the event.
The more equally likely events a system has, the more bits are needed to describe which one occurred. All of the equally likely cases assumed so far happen to be powers of two. So how should we represent the outcome of a random event with 10 equally likely possibilities? We could use 4 bits, which cover 16 possibilities, but 6 of them would be wasted; more precisely, $\log_2 10 \approx 3.33$ bits are enough to convey the information. From these examples we can derive the formula for the amount of information of a system with $N$ equally likely outcomes:

$$I = \log_2 N$$
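As a quick sanity check of the numbers above, here is a minimal Python sketch (not part of the original derivation; the outcome counts are just the ones used in this article):

```python
import math

# Bits needed to distinguish N equally likely outcomes: log2(N)
for name, n_outcomes in [("coin toss (H/T)", 2),
                         ("letter from A/B/C/D", 4),
                         ("event with 10 equally likely outcomes", 10)]:
    bits = math.log2(n_outcomes)
    print(f"{name}: log2({n_outcomes}) = {bits:.2f} bits")
```

Running it prints 1.00, 2.00, and 3.32 bits, matching the coin, the four-letter game, and the 10-outcome event.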
The amount of information describes the fact that the more equally likely events a system has, the more information it takes to transmit which one of them occurred. The more equally likely events there are, the more uncertainty there is about which event will happen.
Here the number of microstates $\Omega$ of a physical system corresponds to the number of equally likely events of our system, so this quantity measuring uncertainty can be called the amount of information. But so far we have only studied systems made up of equally likely events, which can be regarded as ideal systems; in practice, most systems are made up of events with different probabilities.
Information entropy
We can actually convert any probabilistic event into an event in a system of equally likely events. An event with probability p can always be viewed as one choice out of N equally likely outcomes, like picking a particular ball out of N balls. For example, a lottery with a winning probability of 1 in 20 million is the same as drawing the winning ball out of 20 million balls. So dividing 1 by the probability value p gives the number of equally likely outcomes of the equivalent system, and with that in mind we can extend the formula for the amount of information into a general formula for information entropy.
In terms of probability, the amount of information of an event with probability $p(x)$ can therefore be expressed as

$$I(x) = \log_2 \frac{1}{p(x)} = -\log_2 p(x)$$
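A small sketch of that expression in Python; the lottery odds of roughly 1 in 20 million are an assumption taken from the example above, and the other values are just illustrations:

```python
import math

def self_information(p: float) -> float:
    """Amount of information (in bits) of an event with probability p:
    I(x) = log2(1/p) = -log2(p)."""
    return math.log2(1 / p)

print(self_information(0.5))             # fair coin flip: 1.0 bit
print(self_information(1 / 20_000_000))  # lottery win, assumed ~1 in 20 million: ~24.25 bits
print(self_information(1.0))             # a certain event (the sun rises): 0.0 bits
```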
So let's say we are flipping a coin where the probability of heads is p and the probability of tails is q. Converting each probability into an equally likely system gives systems of 1/p and 1/q outcomes respectively, and the amount of information of each is easy to calculate: $\log_2 \frac{1}{p}$ and $\log_2 \frac{1}{q}$.
Then the sum of these two amounts of information, each multiplied by its probability, is the expected amount of information of the current system, which is our formula for information entropy:

$$H = p \log_2 \frac{1}{p} + q \log_2 \frac{1}{q}$$
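Here is a small sketch of that two-outcome calculation; the probabilities passed in are example values of my own choosing:

```python
import math

def coin_entropy(p: float) -> float:
    """Entropy (in bits) of a coin with P(heads) = p and P(tails) = 1 - p:
    H = p*log2(1/p) + q*log2(1/q)."""
    total = 0.0
    for prob in (p, 1 - p):
        if prob > 0:                      # treat 0 * log2(1/0) as 0
            total += prob * math.log2(1 / prob)
    return total

print(coin_entropy(0.5))   # fair coin: 1.0 bit (maximum uncertainty)
print(coin_entropy(0.9))   # biased coin: ~0.47 bits (less uncertainty)
print(coin_entropy(1.0))   # certain outcome: 0.0 bits (no uncertainty)
```

Note how the fair coin has the largest entropy: the more evenly the probability is spread, the more uncertain the outcome.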
Information entropy formula
$$H(X) = -\sum_i p(x_i) \log_2 p(x_i)$$

where the capital X is a random variable and $p(x_i)$ is the probability that X takes the value $x_i$.
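The general formula translates directly into a few lines of Python; the distributions below are made-up examples, not data from the article:

```python
import math

def entropy(probabilities) -> float:
    """H(X) = sum_i p(x_i) * log2(1/p(x_i)) = -sum_i p(x_i) * log2(p(x_i)), in bits."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # uniform over 4 outcomes: 2.0 bits
print(entropy([0.5, 0.25, 0.125, 0.125]))  # skewed distribution: 1.75 bits
print(entropy([1.0]))                      # a certain outcome: 0.0 bits
```

For a uniform distribution over N outcomes this reduces to $\log_2 N$, which is exactly the amount-of-information formula from earlier, so the equal-probability case really is a special case of entropy.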
Meaning of information entropy
Just as the speed of light is the speed limit of the physical world, information entropy marks the limit of communication, or in other words the limit on how efficiently any data can be compressed. All attempts to break through this limit are futile. Because information can take many possible values, data can be encoded and compressed for transmission and storage, but on average never below the bound set by its entropy.
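To make the compression-limit idea concrete, here is an illustrative sketch; the toy message and the 8-bits-per-character baseline are my own assumptions, and the entropy of the empirical character distribution is used as a rough lower bound for lossless symbol-by-symbol coding:

```python
import math
from collections import Counter

def empirical_entropy(text: str) -> float:
    """Entropy, in bits per character, of the character frequencies in `text`."""
    counts = Counter(text)
    n = len(text)
    return sum(c / n * math.log2(n / c) for c in counts.values())

message = "aaaaaaaabbbbccdd"               # toy message with a skewed distribution
h = empirical_entropy(message)
print(f"entropy: {h:.2f} bits per character")                 # ~1.75
print(f"lossless coding needs at least about {h * len(message):.0f} bits, "
      f"versus {8 * len(message)} bits as plain 8-bit characters")
```

For this toy string the entropy bound is about 28 bits, far below the 128 bits of a naive 8-bit encoding, which is why compression works at all and also why it can only go so far.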
Conclusion
Today we learned what the amount of information is and what information entropy is. The amount of information measures the information brought by the occurrence of one specific event, while entropy is the expectation of the amount of information before the result is known: it considers all possible values of the random variable, that is, the expected amount of information over all possible events.