Full code: github.com/xjwhhh/Lear… Follows and stars are welcome.
The naive Bayes method is a classification method based on Bayes' theorem and the assumption of conditional independence between features.
For a given training data set, the joint probability distribution of input and output is first learned under the feature conditional independence assumption. Then, based on this model, for a given input x, the output y with the largest posterior probability is computed using Bayes' theorem.
The naive Bayes method is simple to implement and efficient in both learning and prediction, which makes it a commonly used method.
The naive Bayes algorithm is as follows.
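In brief (a sketch of the decision rule; notation follows "Statistical Learning Methods"): for an input $x$ with feature values $x^{(1)}, \dots, x^{(n)}$, the classifier outputs

$$
y = \arg\max_{c_k} P(Y = c_k) \prod_{j=1}^{n} P\!\left(X^{(j)} = x^{(j)} \mid Y = c_k\right),
$$

where the prior $P(Y = c_k)$ and the conditionals $P(X^{(j)} = x^{(j)} \mid Y = c_k)$ are estimated from the training data. The denominator $P(X = x)$ of Bayes' theorem is identical for every class, so it can be dropped from the argmax; the code below relies on this.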
A detailed explanation and proof can be found in "Statistical Learning Methods" or other blog posts and is not repeated here.
The Python implementation below uses the MNIST data set. To avoid probability values of 0, Bayesian estimation is used.
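Concretely, with Bayesian estimation (here $\lambda = 1$, i.e. Laplace smoothing), the conditional probability of a binarized pixel value $a \in \{0, 1\}$ is estimated as

$$
P_\lambda\!\left(X^{(j)} = a \mid Y = c_k\right) = \frac{\sum_{i=1}^{N} I\!\left(x_i^{(j)} = a,\; y_i = c_k\right) + \lambda}{N_k + 2\lambda},
$$

where $N_k$ is the number of training samples with label $c_k$ and $2$ is the number of values a binarized pixel can take. This is the `+ 1` / `+ 2 * 1` seen in `train()` below.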
```python
import cv2
import time
import logging

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


def log(func):
    # decorator that logs the runtime of the wrapped function
    def wrapper(*args, **kwargs):
        start_time = time.time()
        logging.debug('start %s()' % func.__name__)
        ret = func(*args, **kwargs)
        end_time = time.time()
        logging.debug('end %s(), cost %s seconds' % (func.__name__, end_time - start_time))
        return ret
    return wrapper


def binaryzation(img):
    # binarize the image so every pixel is 0 or 1; this is why the last
    # dimension of conditional_probability has length 2
    cv_img = img.astype(np.uint8)
    cv2.threshold(cv_img, 50, 1, cv2.THRESH_BINARY_INV, cv_img)
    return cv_img


@log
def train(train_set, train_labels):
    class_num = len(set(train_labels))
    feature_num = len(train_set[0])

    prior_probability = np.zeros(class_num)  # class counts, used as the (unnormalized) prior
    conditional_probability = np.zeros((class_num, feature_num, 2))

    for i in range(len(train_labels)):
        img = binaryzation(train_set[i])  # image binarization
        label = train_labels[i]

        prior_probability[label] += 1
        for j in range(feature_num):
            conditional_probability[label][j][img[j]] += 1

    # The denominator of Bayes' theorem is the same for every class, so the
    # prior counts need not be divided by it. Laplace smoothing (+1 in the
    # numerator, +2 in the denominator) keeps every probability nonzero.
    prior_probability += 1
    for label in set(train_labels):
        label_count = len(train_labels[train_labels == label])
        for j in range(feature_num):
            conditional_probability[label][j][0] += 1
            conditional_probability[label][j][0] /= (label_count + 2 * 1)
            conditional_probability[label][j][1] += 1
            conditional_probability[label][j][1] /= (label_count + 2 * 1)

    return prior_probability, conditional_probability


@log
def predict(test_features, prior_probability, conditional_probability):
    result = []
    for test in test_features:
        img = binaryzation(test)

        max_label = 0
        max_probability = float('-inf')
        for i in range(len(prior_probability)):
            # sum log probabilities instead of multiplying raw probabilities,
            # so the product over 784 pixels cannot underflow to 0
            probability = np.log(prior_probability[i])
            for j in range(len(img)):
                probability += np.log(conditional_probability[i][j][img[j]])
            if max_probability < probability:
                max_probability = probability
                max_label = i
        result.append(max_label)
    return np.array(result)


if __name__ == '__main__':
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)

    raw_data = pd.read_csv('../data/train.csv', header=0)
    data = raw_data.values

    imgs = data[0:2000, 1:]
    labels = data[0:2000, 0]

    # 1/3 of the data as the test set
    train_features, test_features, train_labels, test_labels = train_test_split(
        imgs, labels, test_size=0.33, random_state=1)

    prior_probability, conditional_probability = train(train_features, train_labels)
    test_predict = predict(test_features, prior_probability, conditional_probability)
    score = accuracy_score(test_labels, test_predict)
    print("The accuracy score is ", score)
```
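For comparison, scikit-learn ships an equivalent model. Below is a minimal sketch using `BernoulliNB`, which fits a naive Bayes classifier over binarized features with the same additive smoothing; the file path and the 2000-sample slice simply mirror the script above. Note that `binarize=50` maps pixels above 50 to 1, the opposite polarity of `binaryzation()`, which does not affect a Bernoulli model since both pixel outcomes are modeled.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import BernoulliNB

# same data layout as above: first column is the label, the rest are pixels
data = pd.read_csv('../data/train.csv', header=0).values
imgs, labels = data[0:2000, 1:], data[0:2000, 0]
train_x, test_x, train_y, test_y = train_test_split(
    imgs, labels, test_size=0.33, random_state=1)

# alpha=1.0 is the same Laplace smoothing as the manual +1 / +2 estimate
clf = BernoulliNB(alpha=1.0, binarize=50)
clf.fit(train_x, train_y)
print("The accuracy score is ", accuracy_score(test_y, clf.predict(test_x)))
```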