Republication notice

Several of Xiao Xi's upcoming articles will build on the material here, but the discussion of Naive Bayes in the earlier post was not deep enough, and it is not worth writing a separate new Naive Bayes article, so this post republishes the earlier article "Naive Bayes". Compared with the previous edition, this version significantly expands the fundamentals, adds some deeper discussion and conclusions, and has been reformatted.

The Naive Bayes classifier is arguably the most classical statistics-based machine learning model. Before worrying about what "Bayes" means, notice that the word "naive" already seems to say something about this classifier.

The English name of this classifier is "Naive Bayes": "naive" plus "Bayes". So what does "naive" mean here? It really does mean simplifying things, like a child who does not think about complicated matters.

Naive

Here, "naive" refers to the feature-independence assumption. More precisely, the independence assumption usually means the "conditional independence assumption", but when dealing with sequence problems (such as text classification or speech recognition) a "positional independence assumption" is often used as well. What does each of these mean?

Conditional independence assumption {

Suppose we want to identify a person's gender using two features: height and weight. The category y here is male/female, and the feature vector is X = [x1 = height, x2 = weight].

We know that height and weight are clearly correlated; for example, a person who is 1.8 meters tall is unlikely to weigh less than 50 kg. But in the eyes of a naive Bayes classifier, height and weight are unrelated. Let x1 = height 180 cm and x2 = weight 50 kg; then, under the naive assumption:

$$P(x_1 = 180\,\text{cm},\; x_2 = 50\,\text{kg} \mid y) = P(x_1 = 180\,\text{cm} \mid y) \cdot P(x_2 = 50\,\text{kg} \mid y)$$

This says that, within a given class y, the probability that a person is 180 cm tall and weighs 50 kg equals the probability of being 180 cm times the probability of being 50 kg. The probability of someone being 180 cm can be fairly high (say, for a boy), and the probability of someone weighing 50 kg can also be fairly high (say, for a girl), yet the probability of one person being both 180 cm tall and 50 kg is very small. Under the conditional independence assumption, however, x1 and x2 are treated as independent, so $P(x_1 \mid y)$ and $P(x_2 \mid y)$ are simply multiplied, which yields a probability much higher than the real one.

In short, the naive Bayes model assumes that the dimensions of the feature vector are mutually independent (uncorrelated) given the class. This is the conditional independence assumption.

}
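To see why this matters, here is a minimal sketch with synthetic, made-up height/weight data (not from the article): because the two features are correlated, multiplying the two marginal probabilities badly overestimates the joint probability of being both tall and light.

```python
import numpy as np

# Synthetic, correlated height/weight data (made-up numbers for illustration only).
rng = np.random.default_rng(0)
n = 100_000
height = rng.normal(170, 8, n)                        # cm
weight = 0.9 * (height - 170) + rng.normal(65, 7, n)  # kg, positively correlated with height

tall = height > 178    # roughly "about 180 cm tall"
light = weight < 52    # roughly "about 50 kg"

p_joint = (tall & light).mean()          # empirical joint probability
p_product = tall.mean() * light.mean()   # what the independence assumption would give

print(f"P(tall, light)     ~ {p_joint:.5f}")    # tiny: tall people are rarely this light
print(f"P(tall) * P(light) ~ {p_product:.5f}")  # much larger than the true joint
```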

Positional independence assumption {

The positional independence assumption is not usually mentioned, but it must be introduced if the naive Bayes model is used for classification of sequential data.

Positional independence means that the position information of each feature in the sequence is completely ignored. For example, in text mining, "I | like | dogs" has three features in its vector: "I", "like", and "dogs". If we read these three features in order, we get the fact that I like dogs. But if we read them in the order "dogs", "like", "I", we get a completely different meaning. Clearly, the order (i.e. position) of the features matters a great deal for semantic classification tasks. Naive Bayes, however, assumes the positions are independent; that is, the position information of the sequence is completely discarded. So from the naive Bayes point of view, classifying "I | like | dogs" and "dogs | like | I" is the same task.

}
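As a tiny illustration (a sketch of mine, not from the article): the bag-of-words view, which is what positional independence amounts to for text, literally cannot tell the two sentences apart.

```python
from collections import Counter

# Under the position-independence (bag-of-words) view, only word counts matter.
sentence_a = ["I", "like", "dogs"]
sentence_b = ["dogs", "like", "I"]

print(Counter(sentence_a) == Counter(sentence_b))  # True: same features, same "document" to naive Bayes
```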

 

OK, now we know what "naive" means; next comes the core, "Bayes".

Bayes

Obviously, the statistical concept most relevant to "Bayes" is Bayes' theorem, also known as Bayes' formula. Whether or not it looks familiar yet, here it is in its general form:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

 

Think of event A in the formula as the sample features taking some value, denoted X, and event B as the classification target (the category) taking some value, denoted y. Then you will find it very, very simple, as follows:

$$P(y \mid X) = \frac{P(X \mid y)\,P(y)}{P(X)}$$

So, what this formula means is:

The left side of the formula: given that the sample features take the value X, the probability that the target category is y, i.e. $P(y \mid X)$ (the technical term is the posterior probability), is equal to

The right side of the formula: the probability that the target category is y when nothing else is known, i.e. $P(y)$ (the technical term is the prior probability of category y), multiplied by the probability that the features take the value X given that the target category is y, i.e. $P(X \mid y)$ (the technical term is the likelihood), divided by the probability that the features take the value X when nothing else is known, i.e. $P(X)$ (technically the prior probability of feature X, also called the evidence).
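Putting labels on the three pieces (just a restatement of the decomposition above in one picture):

$$\underbrace{P(y \mid X)}_{\text{posterior}} \;=\; \frac{\overbrace{P(X \mid y)}^{\text{likelihood}} \;\cdot\; \overbrace{P(y)}^{\text{prior of } y}}{\underbrace{P(X)}_{\text{evidence (prior of } X)}}$$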

 

Eh? Did careful readers notice something? I bet some of you are getting excited by now! Let's look at an example here to lead into a deeper discussion.

 


 

 

Here is the actual example (∇).

 

Suppose Xiao Xi catches a batch of fish containing only snakehead and salmon. Xiao Xi cannot tell the two kinds of fish apart, but she has equipment that measures the brightness level of each fish's belly (say, the whitest is level 10 and the darkest is level 1). A kind-hearted fan then gives her a batch of already-labeled snakehead and salmon. So, with the help of what we already know, how can a naive Bayes classifier label the categories of the fish Xiao Xi caught, sorting them into salmon and snakehead?

Pick the fish

Eh? Didn't we say the brightness level of a fish's belly can be measured? The belly brightness level is a feature, and the brightness measured on each fish is that feature's value, X. Snakehead and salmon are our classification targets, categories c0 and c1. Having an epiphany yet?

Right! Remember $P(y \mid X)$, the left-hand side of Bayes' theorem? Suppose a fish has a brightness level of 2; then we just need to compute $P(y = c_0 \mid x = 2)$ and $P(y = c_1 \mid x = 2)$ and compare their sizes! Whichever value is larger, i.e. whichever probability is higher, is the category we output! The technical term is choosing the maximum a posteriori probability.
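Written as a formula (my notation, not from the original article), the decision rule is simply:

$$\hat{y} \;=\; \arg\max_{c \in \{c_0,\, c_1\}} P(y = c \mid x = 2)$$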

 

So how do we compute that? Obviously with the three blobs on the right-hand side. Here is the formula again for easy reference:

$$P(y \mid X) = \frac{P(X \mid y)\,P(y)}{P(X)}$$

 

 

First, of the three blobs on the right, the one on the bottom, $P(X)$, is the probability that the features take a certain value. But we are predicting the class of one particular fish, and we already know the value of that fish's feature; it is fixed. So whether we are computing $P(y = c_0 \mid X)$ or $P(y = c_1 \mid X)$, $P(X)$ is the same value and does not help us compare the two probabilities. So let's simply ignore it.

Next, another of the three blobs, $P(y)$, represents the prior probability of a class. How do we compute it? Remember the batch of fish the fan gave Xiao Xi? We can simply use that batch to approximate $P(y)$!

According to the law of large numbers in probability theory, when there are enough samples, the observed sample proportions approximate the true probabilities. Recall that 10,000 fair coin tosses will produce close to 5,000 heads, i.e. a heads probability of about 0.5.

Therefore, if the fan gave Xiao Xi 10,000 fish, of which 3,000 are snakehead and 7,000 are salmon, then obviously $P(y = \text{snakehead}) \approx 3000/10000 = 0.3$, and by the same token $P(y = \text{salmon}) \approx 0.7$. See, $P(y)$ is done.

 

The last of the three blobs, $P(X \mid y)$, how do we get that? It's also easy: using the fan's 10,000 fish, Xiao Xi measures the brightness level of all 10,000 fish with her equipment. Then, within each class of fish, we simply count the proportion of fish taking each value of X out of the total number of fish in that class.

For example, among the 3,000 snakehead, suppose 1,000 have brightness level 8; then $P(x = 8 \mid y = \text{snakehead}) \approx 1000/3000 = 1/3$. In the same way we can get the value of $P(X \mid y)$ for every brightness level and every class.
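A minimal sketch of this counting step. The 10,000 / 3,000 / 7,000 / 1,000 figures are the ones from the example above; how the remaining fish are spread over brightness levels is made up here purely so the code runs.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(samples):
    """Estimate P(y) and P(x | y) by simple counting over (feature, label) pairs."""
    class_counts = Counter(label for _, label in samples)
    total = len(samples)
    priors = {c: n / total for c, n in class_counts.items()}            # P(y)

    feature_counts = defaultdict(Counter)
    for x, label in samples:
        feature_counts[label][x] += 1
    likelihoods = {c: {x: n / class_counts[c] for x, n in counts.items()}
                   for c, counts in feature_counts.items()}             # P(x | y)
    return priors, likelihoods

# Toy training set: 3,000 snakehead (1,000 of them at brightness 8) and 7,000 salmon.
toy = ([(8, "snakehead")] * 1000 + [(5, "snakehead")] * 2000
       + [(2, "salmon")] * 5000 + [(3, "salmon")] * 2000)
priors, likelihoods = fit_naive_bayes(toy)
print(priors["snakehead"])          # 0.3
print(likelihoods["snakehead"][8])  # 0.333...
```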

 

Now that we can compute the right-hand side, we can compare the two left-hand sides as well. So in the situation shown below (the fan gave Xiao Xi 100 or so fish to train the classifier):

 

 

The naive Bayes classifier Xiao Xi builds will classify every fish whose brightness level is below a certain threshold as salmon (below that level it is always more likely to be salmon than snakehead), and as snakehead otherwise.

Wait, there's a problem. We know the key is to compare $P(y = c_0 \mid X)$ with $P(y = c_1 \mid X)$. But when naive Bayes computed these two values, did it really compute these two values?

What did it actually compute, then?

Remember, when we computed the left-hand side, we ignored the $P(X)$ on the right-hand side! Let's bring the formula back:

$$P(y \mid X) = \frac{P(X \mid y)\,P(y)}{P(X)}$$

That is, when the naive Bayes classifier computes the "posterior probability" of each category, it does not actually compute the posterior probability! Because we only compute $P(X \mid y)\,P(y)$, what we get is actually $P(X \mid y)\,P(y)$!

And what is $P(X \mid y)\,P(y)$? Readers with some background in probability theory will recognize it: it is the joint probability of y and X, that is, it equals $P(X, y)$, the probability of X and y occurring together.

In other words, although the core of the naive Bayes classifier is Bayes' formula, when it scores how likely each class is for a sample, what it actually computes is not the posterior probability of each class, but the joint probability of each class y and the sample's features X!

What is this result good for? It will come in handy later, and it will be very important.
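Continuing the counting sketch above (hypothetical counts again), classification just compares the score $P(y)\,P(x \mid y)$ for each class; dividing both scores by the same $P(x)$ would not change which one wins.

```python
def classify(x, priors, likelihoods):
    """Pick the class with the largest score P(y) * P(x | y).

    Note that this score is the joint probability P(x, y), not the posterior
    P(y | x): dividing every score by the same evidence P(x) would not change
    which class wins, so naive Bayes simply skips that step.
    """
    scores = {c: priors[c] * likelihoods[c].get(x, 0.0) for c in priors}
    return max(scores, key=scores.get), scores

label, scores = classify(8, priors, likelihoods)
print(label, scores)  # 'snakehead', since 0.3 * 1/3 = 0.1 beats the salmon score
```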

Wait, one more question: so far we haven't used the conditional independence assumption described at the beginning of this article. What is that assumption for?

Multidimensional features

Of course, this assumption essentially means ignoring the correlation between the dimensions of X, so it comes in handy when X has multidimensional features.

For example, Xiao Xi bought a ruler to measure the length of the fish.

 

The feature vector then becomes X = [x1 (brightness), x2 (body length)]. The only change is that when we compute the $P(X \mid y)$ on the right-hand side, the conditional independence assumption lets us expand it as $P(X \mid y) = P(x_1 \mid y) \cdot P(x_2 \mid y)$, and that's it. See, being a little naive can save a lot of trouble.
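A minimal sketch of how the factorization changes the scoring with two features. The per-feature likelihood tables and the "long"/"short" discretization of body length are made-up illustrations, not from the article; they would be estimated per class by the same counting as before, one table per feature.

```python
# Hypothetical per-feature likelihood tables P(x_i | y) (made-up numbers).
priors = {"snakehead": 0.3, "salmon": 0.7}
p_brightness = {"snakehead": {8: 1/3, 5: 2/3}, "salmon": {2: 5/7, 3: 2/7, 8: 0.0}}
p_length     = {"snakehead": {"long": 0.6, "short": 0.4},
                "salmon":    {"long": 0.2, "short": 0.8}}

def score(y, brightness, length):
    # Conditional independence: P(x1, x2 | y) is replaced by P(x1 | y) * P(x2 | y).
    return (priors[y]
            * p_brightness[y].get(brightness, 0.0)
            * p_length[y].get(length, 0.0))

fish = (8, "long")
print(max(priors, key=lambda y: score(y, *fish)))  # the class with the higher joint score
```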