Logistic regression
Introduction
First, an intuitive understanding: logistic regression is a classification model. Second, it is a linear model whose parameter space is the same as that of linear regression; all of the learned information is contained in w and b.
In other words, logistic regression adds a mapping function f on top of linear regression. This mapping function is usually the sigmoid function, whose form is:
$$f(x) = \frac{1}{1 + e^{-x}}$$
In a word: logistic regression assumes that the data follow a Bernoulli distribution, and solves for the parameters by maximizing the likelihood function with gradient descent, in order to perform binary classification. Specifically:
Loss function
Calculate the likelihood function for all samples:
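Writing $f(x_i w + b)$ for the predicted probability that sample $i$ belongs to class 1, the likelihood over all $n$ samples is:
$$L(w, b) = \prod_{i=1}^{n} f(x_i w + b)^{y_i} \, [1 - f(x_i w + b)]^{1 - y_i}$$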
Take the logarithm to obtain the log-likelihood function:
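$$\log L(w, b) = \sum_{i=1}^{n} \left\{ y_i \log[f(x_i w + b)] + (1 - y_i) \log[1 - f(x_i w + b)] \right\}$$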
We want to maximize the log-likelihood, so the loss function (cost function) can be defined as the negative log-likelihood (the log-likelihood multiplied by minus one):
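$$J(w, b) = -\sum_{i=1}^{n} \left\{ y_i \log[f(x_i w + b)] + (1 - y_i) \log[1 - f(x_i w + b)] \right\}$$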
First compute $f'(x)$, the derivative of the sigmoid, which is needed in the derivation below:
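$$f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = f(x)\,[1 - f(x)]$$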
The partial derivative with respect to $w_j$:
$$\begin{aligned}
\frac{\partial J(w,b)}{\partial w_j} &= \frac{\partial}{\partial w_j} \left\{ -\sum_{i=1}^n y_i \log [f(x_i w + b)] + (1-y_i) \log [1-f(x_i w + b)] \right\} \\
&= -\sum_{i=1}^n \left\{ y_i \frac{1}{f(x_i w + b)} - (1-y_i) \frac{1}{1-f(x_i w + b)} \right\} \frac{\partial f(x_i w + b)}{\partial w_j} \\
&= -\sum_{i=1}^n \left\{ y_i \frac{1}{f(x_i w + b)} - (1-y_i) \frac{1}{1-f(x_i w + b)} \right\} f(x_i w + b)\,[1-f(x_i w + b)] \frac{\partial (x_i w + b)}{\partial w_j} \\
&= -\sum_{i=1}^n \left\{ y_i [1-f(x_i w + b)] - (1-y_i) f(x_i w + b) \right\} \frac{\partial (x_i w + b)}{\partial w_j} \\
&= -\sum_{i=1}^n \left\{ y_i [1-f(x_i w + b)] - (1-y_i) f(x_i w + b) \right\} x_{ij} \\
&= -\sum_{i=1}^n \left\{ y_i - f(x_i w + b) \right\} x_{ij} \\
&= \sum_{i=1}^n \left\{ f(x_i w + b) - y_i \right\} x_{ij} \\
&= \sum_{i=1}^n \left\{ \hat y_i - y_i \right\} x_{ij}
\end{aligned}$$
Similarly, the partial derivative with respect to $b$:
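Following the same steps, with $\partial (x_i w + b) / \partial b = 1$ in place of $x_{ij}$:
$$\frac{\partial J(w,b)}{\partial b} = \sum_{i=1}^{n} \left\{ f(x_i w + b) - y_i \right\}$$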
$w$ and $b$ can then be solved by (stochastic) gradient descent:
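With a learning rate $\eta$ (symbol assumed here), the update at each step is:
$$w_j \leftarrow w_j - \eta \sum_{i=1}^{n} \left\{ f(x_i w + b) - y_i \right\} x_{ij}, \qquad b \leftarrow b - \eta \sum_{i=1}^{n} \left\{ f(x_i w + b) - y_i \right\}$$
For stochastic gradient descent, the sum runs over a single sample (or a mini-batch) instead of all $n$ samples.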
These formulas are also the key to implementing logistic regression from scratch.
Example
Pure Python implementation
For now, only the first two classes of the iris dataset are used, so the task is binary classification.
# For now, only binary classification is implemented
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
def get_train_test():
    iris = load_iris()
    index = list(iris.target).index(2)  # keep only class 0 and class 1
    X = iris.data[:index]
    y = iris.target[:index]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    return X_train, y_train, X_test, y_test
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
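# The snippet below calls a hand-written LogisticRegression class whose definition
# is not shown above. The following is a minimal sketch (assumed, not necessarily the
# original author's code) that matches how it is used here: fit() runs gradient descent
# using the derivatives derived earlier, and predict() returns probabilities that are
# later thresholded at 0.5.
class LogisticRegression:
    def __init__(self, lr=0.01, epoch=1000):
        self.lr = lr        # learning rate
        self.epoch = epoch  # number of gradient-descent passes
        self.w = None
        self.b = 0.0

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        self.w = np.zeros(X.shape[1])
        for _ in range(self.epoch):
            y_hat = sigmoid(X @ self.w + self.b)      # predicted probabilities
            error = y_hat - y                         # f(x_i w + b) - y_i
            self.w -= self.lr * X.T @ error / len(y)  # dJ/dw_j, averaged over samples
            self.b -= self.lr * error.mean()          # dJ/db, averaged over samples

    def predict(self, X):
        return sigmoid(np.asarray(X, dtype=float) @ self.w + self.b)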
lr = LogisticRegression()
X_train,y_train,X_test,y_test = get_train_test()
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
lr.fit(X_train,y_train)
predictions = lr.predict(X_test)
print(y_test == (predictions > 0.5))  # threshold the predicted probabilities at 0.5 and compare with the true labels
sklearn implementation
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
Y = iris.target
# Divide data into training sets and test sets
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(X,Y,test_size=0.2,random_state=0)
# Import the model and call LogisticRegression()
from sklearn.linear_model import LogisticRegression
# lr = LogisticRegression(penalty='l2',solver='newton-cg',multi_class='multinomial')
lr = LogisticRegression()
lr.fit(x_train,y_train)
# Evaluate the model
print('Logistic regression training set accuracy: %.3f'% lr.score(x_train,y_train))
print('Logistic regression test set accuracy: %.3f'% lr.score(x_test,y_test))
from sklearn import metrics
pred = lr.predict(x_test)
accuracy = metrics.accuracy_score(y_test,pred)
print('Accuracy of logistic regression model: %.3f' % accuracy)
Maximum entropy model
First, an intuitive understanding: the maximum entropy model is, as the name suggests, a model that maximizes entropy. The idea is that, beyond the constraints that are already given, the model should remain as uncertain, or as random, as possible. Formalizing this randomness mathematically yields the maximum entropy model.
The core formula behind the code:
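A standard form of the model (notation assumed here: $f_i(x, y)$ is a binary indicator feature for a (feature value, label) pair, corresponding to the keys of xy_couple in the code below):
$$P_w(y \mid x) = \frac{1}{Z_w(x)} \exp\!\left(\sum_{i=1}^{n} w_i f_i(x, y)\right), \qquad Z_w(x) = \sum_{y} \exp\!\left(\sum_{i=1}^{n} w_i f_i(x, y)\right)$$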
The parameters $w_i$ can be obtained by iterating:
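A sketch of the update actually used in the code below, a GIS-style step with the usual constant replaced by a small learning rate $\eta$ (symbol assumed here):
$$w_i \leftarrow w_i + \eta \log \frac{E_{\tilde P}(f_i)}{E_P(f_i)}$$
where $E_{\tilde P}(f_i)$ is the empirical expectation of feature $f_i$ (computed by get_hat_Ep) and $E_P(f_i)$ is its expectation under the current model (computed by get_Ep).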
Python implementation
class MaxEntropy(object):
    def __init__(self, lr=0.01, epoch=1000):
        self.lr = lr            # learning rate
        self.N = None           # number of training samples
        self.n = None           # number of (x, y) feature pairs
        self.hat_Ep = None      # empirical expectations of the features
        # self.sampleXY = []
        self.labels = None
        self.xy_couple = {}
        self.xy_id = {}
        self.id_xy = {}
        self.epoch = epoch
    def _rebuild_X(self, X):
        # Turn each raw feature value into a "column-name_value" token,
        # e.g. ['youth', 'no', ...] -> ['age_youth', 'working_no', ...]
        X_result = []
        for x in X:
            X_result.append([y_s + '_' + x_s for x_s, y_s in zip(x, self.X_columns)])
        return X_result
    def build_data(self, X, y, X_columns):
        self.X_columns = X_columns
        self.y = y
        self.X = self._rebuild_X(X)
        self.N = len(X)
        self.labels = set(y)
        for x_i, y_i in zip(self.X, y):
            for f in x_i:
                self.xy_couple[(f, y_i)] = self.xy_couple.get((f, y_i), 0) + 1
        self.n = len(self.xy_couple.items())
    def fit(self, X, y, X_columns):
        self.build_data(X, y, X_columns)
        self.w = [0] * self.n
        for _ in range(self.epoch):
            for i in range(self.n):
                # GIS-style update; use the learning rate lr (or 1/self.n) as the step size
                self.w[i] += self.lr * np.log(self.get_hat_Ep(i) / self.get_Ep(i))
    def predict(self, X):
        X = self._rebuild_X(X)
        result = [{} for _ in range(len(X))]
        for i, x_i in enumerate(X):
            for y in self.labels:
                result[i][y] = self.get_Pyx(x_i, y)
        return result
    def get_hat_Ep(self, index):
        # Empirical expectation of each feature: count of the (x, y) pair divided by N
        self.hat_Ep = [0] * self.n
        for i, xy in enumerate(self.xy_couple):
            self.hat_Ep[i] = self.xy_couple[xy] / self.N
            self.xy_id[xy] = i
            self.id_xy[i] = xy
        return self.hat_Ep[index]
    def get_Zx(self, x_i):
        # Normalization factor Z_w(x): sum over all labels of exp(sum of active weights)
        Zx = 0
        for y in self.labels:
            count = 0
            for f in x_i:
                if (f, y) in self.xy_couple:
                    count += self.w[self.xy_id[(f, y)]]
            Zx += np.exp(count)
        return Zx
    def get_Pyx(self, x_i, y):
        # Model probability P_w(y | x) = exp(sum of active weights) / Z_w(x)
        count = 0
        for f in x_i:
            if (f, y) in self.xy_couple:
                count += self.w[self.xy_id[(f, y)]]
        return np.exp(count) / self.get_Zx(x_i)
    def get_Ep(self, index):
        # Model expectation of the feature at `index`, approximated with the empirical P(x)
        f, y = self.id_xy[index]
        ans = 0
        for x_i in self.X:
            if f not in x_i:
                continue
            ans += self.get_Pyx(x_i, y) / self.N
        return ans
data_set = [['youth', 'no', 'no', '1', 'refuse'],
            ['youth', 'no', 'no', '2', 'refuse'],
            ['youth', 'yes', 'no', '2', 'agree'],
            ['youth', 'yes', 'yes', '1', 'agree'],
            ['youth', 'no', 'no', '1', 'refuse'],
            ['mid', 'no', 'no', '1', 'refuse'],
            ['mid', 'no', 'no', '2', 'refuse'],
            ['mid', 'yes', 'yes', '2', 'agree'],
            ['mid', 'no', 'yes', '3', 'agree'],
            ['mid', 'no', 'yes', '3', 'agree'],
            ['elder', 'no', 'yes', '3', 'agree'],
            ['elder', 'no', 'yes', '2', 'agree'],
            ['elder', 'yes', 'no', '2', 'agree'],
            ['elder', 'yes', 'no', '3', 'agree'],
            ['elder', 'no', 'no', '1', 'refuse'],
            ]
columns = ['age', 'working', 'house', 'credit_situation', 'labels']
X_columns = columns[:-1]
X = [i[:-1] for i in data_set]
Y = [i[-1] for i in data_set]
train_X = X[:12]
test_X = X[12:]
train_Y = Y[:12]
test_Y = Y[12:]
mae = MaxEntropy()
mae.fit(train_X,train_Y,X_columns)
print(mae.predict(test_X))