Principle

Assuming that the features are independent of each other, for a given piece of data to be classified we compute the conditional probability of each category given that data, and then take the category with the largest conditional probability as the final classification of the data.

All we have to do is work out the conditional probability of every category and find which category the largest conditional probability corresponds to; that is naive Bayes' prediction.
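As a concrete illustration, here is a minimal sketch of that decision rule in plain Python; the toy feature matrix, labels, and the predict helper are all invented for the example, and no smoothing is applied.

```python
import numpy as np

# Toy training data: each row is a sample with two discrete features,
# and y holds the class label of each row.
X = np.array([[0, 1], [0, 1], [1, 0], [1, 1]])
y = np.array(["A", "A", "B", "B"])

def predict(x):
    """Return the class with the largest P(class) * P(x | class),
    assuming the features are conditionally independent."""
    best_class, best_score = None, -1.0
    for c in np.unique(y):
        rows = X[y == c]
        prior = len(rows) / len(X)  # P(class)
        # Product of P(feature_i = x_i | class) over all features.
        likelihood = np.prod([(rows[:, i] == x[i]).mean() for i in range(X.shape[1])])
        score = prior * likelihood  # proportional to the posterior P(class | x)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(predict(np.array([0, 1])))  # -> "A"
```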

advantages

<1> It originates from classical mathematical theory and its classification performance is stable.
<2> It performs well on small-scale data and can handle multi-class tasks. It is suitable for incremental training, in particular training batch by batch when the data set is too large to fit in memory.
<3> It is not sensitive to missing data, and the algorithm is relatively simple; it is often used for text classification.

disadvantages

<1> The naive Bayes model assumes that features are independent of each other, which is often hard to satisfy in practice. When the number of features is large, or the correlation between features is strong, classification performance suffers.
<2> Because the classification is decided from a posterior probability computed from the prior and the data, the classification decision has a certain error rate.
<3> It is sensitive to the way the input data is represented. For example, suppose our data has two features, and the first feature takes the values a, a, b, b. If the second feature is expressed in the form d, e, f, g, the correlation between the two features is not that strong, which is fine. But if we tidy up the representation and find that d and e are actually very close and belong to one category "de", and likewise f and g belong to "fg", then the second feature becomes de, de, fg, fg, which is strongly correlated with the first feature rather than independent of it. In other words, the model is very sensitive to how features are expressed, because the representation affects the independence assumption.

There are three algorithms for the naive Bayes model

<1> GaussianNB:

Applicable conditions: if most of the sample features are continuous values, GaussianNB is a good choice.
Syntax: class sklearn.naive_bayes.GaussianNB(priors=None)
priors: the prior probabilities of the classes. If no value is given, the model computes the priors from the sample data (using maximum likelihood).
Attributes:
class_prior_: the probability of each class
class_count_: the number of samples in each class
theta_: the mean of each feature in each class
sigma_: the variance of each feature in each class
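A minimal usage sketch of GaussianNB, assuming scikit-learn is installed; the iris dataset and the train/test split are only an illustrative choice:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris features are continuous measurements, so GaussianNB is a reasonable fit.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB()            # priors=None: priors are estimated from the training data
clf.fit(X_train, y_train)

print(clf.class_prior_)       # estimated probability of each class
print(clf.class_count_)       # number of training samples in each class
print(clf.theta_)             # per-class mean of each feature
# The per-class variance is in clf.sigma_ (renamed var_ in newer scikit-learn releases).
print(clf.score(X_test, y_test))
```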

<2> MultinomialNB:

If most of the sample features are multinomial discrete values (such as counts), MultinomialNB is appropriate.
Syntax: class sklearn.naive_bayes.MultinomialNB(alpha=1.0, fit_prior=True, class_prior=None)
alpha: the prior smoothing factor; defaults to 1.0, and alpha=1 corresponds to Laplace smoothing.
fit_prior: whether to learn the prior probability of each class; defaults to True.
class_prior: the prior probabilities of the classes.
Attributes:
class_log_prior_: the smoothed log prior probability of each class
intercept_: the linear-model counterpart of naive Bayes; its value is the same as class_log_prior_
feature_log_prob_: the log probability (conditional probability) of each feature given a class. The conditional probability of a feature = (number of occurrences of that feature in the class + alpha) / (total number of occurrences of all features in the class + number of features * alpha)
coef_: the linear-model counterpart of naive Bayes; its value is the same as feature_log_prob_
class_count_: the number of samples in each class during fitting
feature_count_: the number of occurrences of each feature in each class during fitting
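A sketch of MultinomialNB on invented count features, checking feature_log_prob_ against the smoothing formula above:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count features (e.g. word counts), invented for the example.
X = np.array([[2, 1, 0],
              [3, 0, 1],
              [0, 2, 4],
              [1, 1, 5]])
y = np.array([0, 0, 1, 1])

clf = MultinomialNB(alpha=1.0)   # alpha=1.0 -> Laplace smoothing
clf.fit(X, y)

# Recompute the smoothed conditional probabilities for class 0 by hand:
# (count of each feature in class 0 + alpha) / (total count in class 0 + n_features * alpha)
counts = X[y == 0].sum(axis=0)
manual = (counts + 1.0) / (counts.sum() + X.shape[1] * 1.0)
print(np.allclose(np.log(manual), clf.feature_log_prob_[0]))  # True

print(clf.class_count_)    # samples per class seen during fit
print(clf.feature_count_)  # per-class feature counts seen during fit
```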

<3> BernoulliNB:

If the sample features are binary discrete values, or very sparse multivariate discrete values, BernoulliNB should be used.
Syntax: class sklearn.naive_bayes.BernoulliNB(alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None)
alpha: the smoothing factor, with the same meaning as alpha in MultinomialNB.
binarize: the threshold used to binarize the sample features; defaults to 0. If None is given, the model assumes all features are already binarized; if a specific value is given, the model maps anything greater than that value to one class and everything else to the other.
fit_prior: whether to learn the prior probability of each class; defaults to True.
class_prior: the prior probabilities of the classes.
Attributes:
class_log_prior_: the smoothed log prior probability of each class
feature_log_prob_: the empirical log probability of each feature given a class
class_count_: the number of samples in each class during fitting
feature_count_: the number of occurrences of each feature in each class during fitting
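A sketch of BernoulliNB showing the binarize threshold; the non-binary features and labels are invented for the example:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Invented non-binary features; binarize=2.0 maps values > 2.0 to 1 and the rest to 0.
X = np.array([[0.0, 3.5, 1.0],
              [0.5, 4.0, 0.0],
              [3.0, 0.0, 2.5],
              [4.0, 1.0, 3.0]])
y = np.array([0, 0, 1, 1])

clf = BernoulliNB(alpha=1.0, binarize=2.0)
clf.fit(X, y)

print(clf.class_log_prior_)            # smoothed log prior of each class
print(clf.feature_count_)              # per-class counts of the binarized features
print(clf.predict([[0.0, 5.0, 0.0]]))  # binarizes to [0, 1, 0], which resembles class 0
```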