1. Naive Bayes model
Naive Bayes is a classification method based on Bayes' theorem and the assumption that the features are conditionally independent given the class. At prediction time, the class $y$ with the maximum posterior probability given the input $x$ is returned as the prediction.
NB model:
- Input:
A prior probability distribution: $P(Y=c_k),\quad k=1,2,\cdots,K$, and a conditional probability distribution: $P(X=x \mid Y=c_k)=P(X^{(1)}=x^{(1)},\cdots,X^{(n)}=x^{(n)} \mid Y=c_k),\quad k=1,2,\cdots,K$, where the input data $x$ has dimension $n$.
- Output: Posterior probability of test data
According to posterior = likelihood × prior / normalizer:
$$P(Y=c_k \mid X=x)=\frac{P(X=x \mid Y=c_k)\,P(Y=c_k)}{\sum_{k} P(X=x \mid Y=c_k)\,P(Y=c_k)}$$
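As a concrete illustration of the formula above, here is a toy two-class computation; all probability values are made up for the example:

```python
# Toy two-class posterior computation via Bayes' rule:
# posterior = likelihood * prior / sum_k(likelihood_k * prior_k).
# All numbers are made-up illustrative values.
priors = {"c1": 0.6, "c2": 0.4}        # P(Y = c_k)
likelihoods = {"c1": 0.2, "c2": 0.5}   # P(X = x | Y = c_k) for one fixed x

# Normalizer (the denominator, i.e. the evidence P(X = x)).
evidence = sum(likelihoods[c] * priors[c] for c in priors)

# Posterior for each class; by construction these sum to 1.
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}
# posteriors == {"c1": 0.375, "c2": 0.625}
```

Even though the prior favors $c_1$, the larger likelihood of $c_2$ gives it the higher posterior.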
The NB classifier is:
$$y=f(x)=\arg\max_{c_k} \frac{P(Y=c_k)\prod_{j} P(X^{(j)}=x^{(j)} \mid Y=c_k)}{\sum_{k} P(Y=c_k)\prod_{j} P(X^{(j)}=x^{(j)} \mid Y=c_k)}$$
Here the denominator is a normalizing factor that is the same for every class, so it can be ignored when taking the argmax. Naive Bayes variants include Gaussian naive Bayes, multinomial naive Bayes, Bernoulli naive Bayes, and so on.
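The classifier above, with the shared denominator dropped, can be sketched for discrete features as follows (the function and variable names are my own, not from a library):

```python
def nb_predict(x, priors, cond_probs):
    """Return argmax over c_k of P(Y=c_k) * prod_j P(X^(j)=x^(j) | Y=c_k).

    priors: {class: P(Y=c_k)}
    cond_probs: {class: list over features j of {value: P(X^(j)=value | Y=c_k)}}
    The normalizing denominator is omitted: it is identical for every
    class and so does not change the argmax.
    """
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(x):
            # A feature value never seen for this class gets probability 0
            # (smoothing, covered later, would avoid this).
            score *= cond_probs[c][j].get(value, 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

For example, with `priors = {"spam": 0.5, "ham": 0.5}` and a single word feature whose table gives `P("offer" | spam) = 0.8` versus `P("offer" | ham) = 0.1`, `nb_predict(["offer"], ...)` returns `"spam"`.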
2. Parameter estimation of naive Bayes
Naive Bayes needs to estimate the prior probability $P(Y=c_k)$ and the conditional probability $P(X^{(j)}=x^{(j)} \mid Y=c_k)$. Only the discrete-attribute case is considered below.
2.1 Maximum Likelihood Estimation (MLE)
Maximum likelihood estimation is used to estimate the prior probability:
$$P(Y=c_k)=\frac{\sum_{i=1}^{N} I(y_i=c_k)}{N},\quad k=1,2,\cdots,K$$
and the conditional probability: