Welcome to follow my WeChat public account: Sumsmile / a veteran focused on image processing and mobile development

1. Algorithm requirements

Each student has two exam scores, which are used to predict the admission outcome; that is, we fit a model on these two features.

  • Question 1

Linear fitting to predict student admission.

  • Question 2

Nonlinear fitting to predict student admission, using regularized logistic regression.

Working through this problem shows that the choice of fitting method depends first of all on the characteristics of the input data. The steps of applying machine learning to an engineering problem can be summarized as follows:

  1. Plot the input data to observe its characteristics (this only works up to three dimensions; higher-dimensional data cannot be visualized directly)
  2. Design the hypothesis function h(θ), i.e. the estimation algorithm
  3. Derive the cost (loss) function J(θ); in logistic regression the hypothesis uses the sigmoid function
  4. Derive the gradient, i.e. the partial derivatives, used for gradient descent
  5. Train on the data to find the parameter vector θ
  6. Using the learned h(θ), plot the fit to check for over-fitting or under-fitting, and adjust the parameters based on what you observe

2. Linear logistic regression

Prepare the data

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report
# import tensorflow as tf

plt.style.use('fivethirtyeight')

data = pd.read_csv('ex2data1.txt', names=['exam1', 'exam2', 'admitted'])
data.head()
data.describe()

# draw the sample distribution
sns.set(context="notebook", style="darkgrid",
        palette=sns.color_palette("RdBu", 2))       # "RdBu" is a palette style
sns.palplot(sns.color_palette("RdBu", n_colors=5))  # draw the color bar
sns.lmplot(x='exam1', y='exam2', hue='admitted', data=data,
           height=6, fit_reg=False, scatter_kws={"s": 50})
plt.show()
```
(Figures: the first rows of the sample data, the RdBu color bar, and the scatter plot of the sample distribution.)

Define the necessary functions to get X and y

```python
def get_X(df):  # read the features
    """
    use concat to add an intercept column to avoid side effects
    not efficient for big datasets though
    """
    ones = pd.DataFrame({'ones': np.ones(len(df))})  # column of 1s for the intercept term
    data = pd.concat([ones, df], axis=1)             # merge by column
    return data.iloc[:, :-1].values                  # every column except the last (the label)


def get_y(df):
    '''assume the last column is the target'''
    return np.array(df.iloc[:, -1])


def normalize_feature(df):
    """Applies function along input axis(default 0) of DataFrame."""
    return df.apply(lambda column: (column - column.mean()) / column.std())


X = get_X(data)
print(X.shape)
y = get_y(data)
print(y.shape)
# (100, 3) (100,)
```

Logistic regression is based on the sigmoid function

$g$ denotes the commonly used logistic (sigmoid) function: $g(z)=\frac{1}{1+e^{-z}}$. Combining it with the linear model, the hypothesis of the logistic regression model is: $h_\theta(x)=\frac{1}{1+e^{-\theta^T x}}$

Sigmoid implementation:

```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(np.arange(-10, 10, step=0.01),
        sigmoid(np.arange(-10, 10, step=0.01)))
ax.set_ylim((-0.1, 1.1))
ax.set_xlabel('z', fontsize=18)
ax.set_ylabel('g(z)', fontsize=18)
ax.set_title('sigmoid function', fontsize=18)
plt.show()
```

Define the cost function


  • $\max(\ell(\theta)) = \min(-\ell(\theta))$
  • choose $-\ell(\theta)$ as the cost function
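
For reference (the original does not write it out), $\ell(\theta)$ is the log-likelihood of the training set under the model:

$$\ell(\theta)=\sum\limits_{i=1}^{m}{\left[ {{y}^{(i)}}\log \left( {{h}_{\theta }}\left( {{x}^{(i)}} \right) \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-{{h}_{\theta }}\left( {{x}^{(i)}} \right) \right) \right]}$$

Minimizing $-\ell(\theta)$, scaled by $\frac{1}{m}$, gives the cost $J(\theta)$ below.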

$$J\left( \theta \right)=-\frac{1}{m}\sum\limits_{i=1}^{m}{\left[ {{y}^{(i)}}\log \left( {{h}_{\theta }}\left( {{x}^{(i)}} \right) \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-{{h}_{\theta }}\left( {{x}^{(i)}} \right) \right) \right]}=\frac{1}{m}\sum\limits_{i=1}^{m}{\left[ -{{y}^{(i)}}\log \left( {{h}_{\theta }}\left( {{x}^{(i)}} \right) \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1-{{h}_{\theta }}\left( {{x}^{(i)}} \right) \right) \right]}$$

Initialize theta and compute the initial cost

```python
theta = np.zeros(3)  # X is (m, n), so theta is (n,)

def cost(theta, X, y):
    '''cost fn is -l(theta) for you to minimize'''
    # X @ theta is equivalent to X.dot(theta)
    return np.mean(-y * np.log(sigmoid(X @ theta)) - (1 - y) * np.log(1 - sigmoid(X @ theta)))

cost(theta, X, y)  # the initial loss: 0.6931471805599453
```

Define the gradient (for batch gradient descent)

  • This is batch gradient descent.
  • Vectorized form: $\frac{1}{m} X^T\left(\mathrm{sigmoid}(X\theta)-y\right)$


$$\frac{\partial J\left( \theta \right)}{\partial {{\theta }_{j}}}=\frac{1}{m}\sum\limits_{i=1}^{m}{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)x_{j}^{(i)}}$$
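
A brief note on where this comes from (my addition): using $g'(z)=g(z)\left( 1-g(z) \right)$, differentiating a single term of $J(\theta)$ with respect to $\theta_j$ gives

$$\frac{\partial }{\partial {{\theta }_{j}}}\left[ -y\log \left( {{h}_{\theta }}(x) \right)-\left( 1-y \right)\log \left( 1-{{h}_{\theta }}(x) \right) \right]=\left( {{h}_{\theta }}(x)-y \right){{x}_{j}}$$

and averaging over the $m$ training examples yields the partial derivative above.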

```python
def gradient(theta, X, y):
    '''just 1 batch gradient'''
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta) - y)

gradient(theta, X, y)  # the gradient at the first step
# array([ -0.1       , -12.00921659, -11.26284221])
```
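
As an optional sanity check (my addition, not part of the original exercise), the analytic gradient can be compared against a centered-difference numerical approximation:

```python
# Numerical gradient check (my addition): approximate each partial derivative
# with a centered difference and compare against the analytic gradient.
def numeric_gradient(theta, X, y, eps=1e-4):
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

print(np.allclose(numeric_gradient(theta, X, y), gradient(theta, X, y), atol=1e-4))
# expected: True
```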

Fit the parameters

  • Use scipy.optimize.minimize to find the parameters.

```python
import scipy.optimize as opt

res = opt.minimize(fun=cost, x0=theta, args=(X, y), method='Newton-CG', jac=gradient)
# fun    - the loss function
# x0     - the initial parameter values
# args   - the training data
# method - the optimization method (Newton-CG here)
# jac    - the gradient function
print(res)
#      fun: 0.20349770426553998
#      jac: array([-2.85342794e-06, -3.50853296e-05, -1.62061639e-04])
#  message: 'Optimization terminated successfully.'
#     nfev: 71
#     nhev: 0
#      nit: 27
#     njev: 178
#   status: 0
#  success: True
#        x: array([-25.16557602,   0.20626565,   0.20150593])
```
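
As an optional cross-check (my addition, not part of the original exercise), scikit-learn's LogisticRegression can fit the same data. Its C parameter is the inverse of the regularization strength, so a very large C approximates the unregularized fit above and the resulting coefficients should be roughly comparable to res.x.

```python
# Hypothetical cross-check with scikit-learn (not part of the original exercise).
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(C=1e6, max_iter=1000)   # large C ~ almost no regularization
clf.fit(data[['exam1', 'exam2']], y)             # sklearn adds its own intercept term
print(clf.intercept_, clf.coef_)                 # should be in the same ballpark as res.x
```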

Prediction and validation on the training set

In a real project you should not use the training set for prediction and validation; cross-validation and how to split the data deserve separate attention. Since this is an exercise, we keep things simple for learning purposes (a minimal validation-split sketch follows the prediction code below).

```python
def predict(x, theta):
    prob = sigmoid(x @ theta)
    return (prob >= 0.5).astype(int)  # classify as 1 when the probability is at least 0.5

final_theta = res.x
y_pred = predict(X, final_theta)
print(classification_report(y, y_pred))
#               precision    recall  f1-score   support
#            0       0.87      0.85      0.86        40
#            1       0.90      0.92      0.91        60
#     accuracy                           0.89       100
#    macro avg       0.89      0.88      0.88       100
# weighted avg       0.89      0.89      0.89       100
```
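
The validation-split sketch mentioned above (my addition, assuming sklearn's train_test_split; not part of the original exercise): train on 70% of the data and report metrics on the held-out 30%, reusing cost, gradient and predict from above.

```python
# Minimal validation-split sketch (my addition, not part of the original exercise).
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
res_split = opt.minimize(fun=cost, x0=np.zeros(X.shape[1]), args=(X_train, y_train),
                         method='Newton-CG', jac=gradient)
print(classification_report(y_val, predict(X_val, res_split.x)))
```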

Finding the decision boundary

stats.stackexchange.com/questions/9…

$X \times \theta = 0$ (this is the decision boundary line)

That is, we solve $\theta_0 + \theta_1 x + \theta_2 y = 0$, i.e. $y = -\frac{\theta_0}{\theta_2} - \frac{\theta_1}{\theta_2}x$, which is what the coef computation below does.

```python
print(res.x)  # this is the final theta
# [-25.16557602   0.20626565   0.20150593]

coef = -(res.x / res.x[2])  # coefficients of the boundary line y = coef[0] + coef[1] * x
print(coef)
# [ 124.8875223    -1.02362075   -1.        ]

x = np.arange(130, step=0.1)
y = coef[0] + coef[1] * x
```

```python
sns.set(context="notebook", style="ticks", font_scale=1.5)
sns.lmplot(x='exam1', y='exam2', hue='admitted', data=data,
           height=6, fit_reg=False, scatter_kws={"s": 25})
plt.plot(x, y, 'grey')
plt.xlim(0, 130)
plt.ylim(0, 130)
plt.title('Decision Boundary')
plt.show()
```

3. Nonlinear logistic regression

The basic logic is the same as before, so the common parts are not repeated.

Loading sample data

```python
df = pd.read_csv('ex2data2.txt', names=['test1', 'test2', 'accepted'])
df.head()

sns.set(context="notebook", style="ticks", font_scale=1.5)
sns.lmplot(x='test1', y='test2', hue='accepted', data=df,
           height=6, fit_reg=False, scatter_kws={"s": 50})
plt.title('Regularized Logistic Regression')
plt.show()
```

Observe that the data in problem two are not linearly separable, so a more complex polynomial fit is needed.

Feature mapping

Polynomial expansion

The logic of the implementation:

```
for i in 0..power:
    for p in 0..i:
        output x^(i-p) * y^p
```


Expand the original two columns of data into n columns
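
A quick count (my addition): mapping two features up to total degree $p$ produces $\frac{(p+1)(p+2)}{2}$ columns; for $p=6$ that is $\frac{7\times 8}{2}=28$, which matches the shape $(118, 28)$ printed below.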

```python
def feature_mapping(x, y, power, as_ndarray=False):
    """return mapped features as ndarray or dataframe"""
    # equivalent loop version (inclusive of `power`):
    # data = {}
    # for i in np.arange(power + 1):
    #     for p in np.arange(i + 1):
    #         data["f{}{}".format(i - p, p)] = np.power(x, i - p) * np.power(y, p)

    # dict comprehension that builds the map of expanded features
    data = {"f{}{}".format(i - p, p): np.power(x, i - p) * np.power(y, p)
            for i in np.arange(power + 1)
            for p in np.arange(i + 1)}

    if as_ndarray:
        return pd.DataFrame(data).values
    else:
        return pd.DataFrame(data)


x1 = np.array(df.test1)
x2 = np.array(df.test2)

data = feature_mapping(x1, x2, power=6)
print(data.shape)
data.head()
```

Expanded data set:

Regularized cost function


$$J\left( \theta \right)=\frac{1}{m}\sum\limits_{i=1}^{m}{\left[ -{{y}^{(i)}}\log \left( {{h}_{\theta }}\left( {{x}^{(i)}} \right) \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1-{{h}_{\theta }}\left( {{x}^{(i)}} \right) \right) \right]}+\frac{\lambda }{2m}\sum\limits_{j=1}^{n}{\theta _{j}^{2}}$$

```python
theta = np.zeros(data.shape[1])
X = feature_mapping(x1, x2, power=6, as_ndarray=True)
print(X.shape)
y = get_y(df)
print(y.shape)
# (118, 28) (118,)

def regularized_cost(theta, X, y, l=1):
    '''you don't penalize theta_0'''
    theta_j1_to_n = theta[1:]  # theta_0 is not penalized
    regularized_term = (l / (2 * len(X))) * np.power(theta_j1_to_n, 2).sum()
    return cost(theta, X, y) + regularized_term

regularized_cost(theta, X, y, l=1)
# 0.6931471805599454
# this is the same as the unregularized cost because we initialize theta as zeros,
# so the regularization term contributes nothing
```
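
A quick check (my addition, not part of the original exercise): with l = 0 the regularization term vanishes, so the regularized cost reduces to the plain cost for any theta, not only the zero vector used above.

```python
# Quick check (my addition): with l = 0 the regularized cost equals the plain cost
# for an arbitrary (non-zero) theta as well.
rand_theta = np.random.rand(X.shape[1])
print(np.isclose(regularized_cost(rand_theta, X, y, l=0), cost(rand_theta, X, y)))
# expected: True
```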

Regularized gradient


$$\frac{\partial J\left( \theta \right)}{\partial {{\theta }_{j}}}=\left( \frac{1}{m}\sum\limits_{i=1}^{m}{\left( {{h}_{\theta }}\left( {{x}^{\left( i \right)}} \right)-{{y}^{\left( i \right)}} \right)x_{j}^{(i)}} \right)+\frac{\lambda }{m}{{\theta }_{j}}\text{  for }j\ge 1$$

```python
def regularized_gradient(theta, X, y, l=1):
    '''still, leave theta_0 alone'''
    theta_j1_to_n = theta[1:]
    regularized_theta = (l / len(X)) * theta_j1_to_n
    # prepend a 0 so that no regularization is applied to theta_0
    regularized_term = np.concatenate([np.array([0]), regularized_theta])
    return gradient(theta, X, y) + regularized_term

regularized_gradient(theta, X, y)
```
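
The same kind of check as before (my addition, not part of the original exercise): with l = 0 the regularized gradient should match the plain gradient.

```python
# Quick check (my addition): regularization vanishes when l = 0.
print(np.allclose(regularized_gradient(theta, X, y, l=0), gradient(theta, X, y)))
# expected: True
```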

Fit the parameters

```python
import scipy.optimize as opt

print('init cost = {}'.format(regularized_cost(theta, X, y)))

res = opt.minimize(fun=regularized_cost, x0=theta, args=(X, y),
                   method='Newton-CG', jac=regularized_gradient)
res
# init cost = 0.6931471805599454
#      fun: 0.5290027297128722
#      jac: array([ 4.64317436e-08,  1.04331373e-08, -3.61802419e-08, -3.44397841e-08,
#                   2.46408233e-08,  1.12246256e-08, -5.13021764e-09, -1.11582358e-08,
#                   1.23402171e-09,  2.62428040e-08, -3.94916642e-08,  5.21264511e-09,
#                  -8.98645551e-09,  1.04022216e-08,  3.24793498e-08, -5.20025250e-09,
#                  -5.87238749e-09,  7.46797548e-10, -4.52162093e-09, -1.47871366e-09,
#                   1.80405423e-08, -1.62958045e-08,  7.26901438e-10, -5.53694209e-09,
#                   3.57089897e-09, -3.35135784e-09,  5.51302048e-09, ...])
#  success: True
#        x: array([ 1.27274054,  0.62527229,  1.18108684, -2.01996217, -0.91742229,
#                  -1.43166588,  0.1240061 , -0.36553467, -0.35724013, -0.1751284 ,
#                  -1.45815894, -0.050989  , -0.61555564, -0.27470555, -1.192815  ,
#                  -0.24218818, -0.20600633, -0.04473079, -0.27778484, -0.29537856,
#                  -0.45635706, -1.04320283,  0.02777156, -0.29243164,  0.01556705,
#                  -0.3273799 ,  0.14388646,  0.92465161])
```

Predict

```python
final_theta = res.x
y_pred = predict(X, final_theta)
print(classification_report(y, y_pred))
#               precision    recall  f1-score   support
#            0       0.90      0.75      0.82        60
#            1       0.78      0.91      0.84        58
#     accuracy                           0.83       118
#    macro avg       0.84      0.83      0.83       118
# weighted avg       0.84      0.83      0.83       118
```

Draw the decision boundary with different $\lambda$ values (the regularization weight, a constant).

We find all points $X$ that satisfy $X\times \theta =0$.

  • Instead of solving the polynomial equation, just create a coordinate grid of (x, y) that is dense enough, find all the points where $X\times \theta$ is close enough to 0, and plot them.
```python
def draw_boundary(power, l):
    """
    power: polynomial power for mapped feature
    l: lambda constant
    """
    density = 1000
    threshhold = 2 * 10**-3

    final_theta = feature_mapped_logistic_regression(power, l)
    x, y = find_decision_boundary(density, power, final_theta, threshhold)

    df = pd.read_csv('ex2data2.txt', names=['test1', 'test2', 'accepted'])
    sns.lmplot(x='test1', y='test2', hue='accepted', data=df,
               height=6, fit_reg=False, scatter_kws={"s": 100})

    plt.scatter(x, y, c='red', s=10)
    plt.title('Decision boundary')
    plt.show()


def feature_mapped_logistic_regression(power, l):
    """for drawing purpose only.. not a well generalized logistic regression
    power: int, raise x1, x2 to polynomial power
    l: int, lambda constant for the regularization term
    """
    df = pd.read_csv('ex2data2.txt', names=['test1', 'test2', 'accepted'])
    x1 = np.array(df.test1)
    x2 = np.array(df.test2)
    y = get_y(df)

    X = feature_mapping(x1, x2, power, as_ndarray=True)
    theta = np.zeros(X.shape[1])

    res = opt.minimize(fun=regularized_cost, x0=theta, args=(X, y, l),
                       method='TNC', jac=regularized_gradient)
    final_theta = res.x

    return final_theta


def find_decision_boundary(density, power, theta, threshhold):
    t1 = np.linspace(-1, 1.5, density)
    t2 = np.linspace(-1, 1.5, density)

    cordinates = [(x, y) for x in t1 for y in t2]
    x_cord, y_cord = zip(*cordinates)  # zip(*...) unpacks the pairs into two tuples
    mapped_cord = feature_mapping(x_cord, y_cord, power)  # this is a dataframe

    inner_product = mapped_cord.values @ theta
    decision = mapped_cord[np.abs(inner_product) < threshhold]

    return decision.f10, decision.f01  # the columns holding the original x and y values
```

Trying several more $\lambda$ values shows that a reasonable $\lambda$ lies roughly between 1 and 10.

Plot the boundary for $\lambda = 1, 0, 100, 10$:

```python
draw_boundary(power=6, l=1)    # lambda = 1
draw_boundary(power=6, l=0)    # lambda = 0
draw_boundary(power=6, l=100)  # lambda = 100
draw_boundary(power=6, l=10)   # lambda = 10
```


  • $\lambda = 1$: fits reasonably well
  • $\lambda = 0$: over-fitting
  • $\lambda = 100$: under-fitting
  • $\lambda = 10$: a reasonable fit


Reference:

  • github.com/fengdu78/Co…
  • matplotlib.pyplot style beautification
  • Matplotlib
  • Seaborn 0.9 Chinese documentation
  • Penalty & C parameter analysis for LogisticRegression
  • Coursera-ML-AndrewNg homework

Welcome to follow my WeChat public account: Sumsmile / a veteran focused on image processing and mobile development