Welcome to follow the public account: Sumsmile / a mobile development veteran focused on image processing

The code comes from Dr. Huang’s GitHub, with detailed comments added as I worked through it. Thanks to Dr. Huang for his efforts in spreading machine learning: Github.com/fengdu78/Co…

I. Problem requirements

Single-variable and multivariable linear regression: given a set of data, use linear regression to fit a function, then predict the profit of opening a restaurant and predict house prices. It took me two days to pick up Python and work through it all.

The content of this article:

  • Univariate – Restaurant profit forecast
  • Multivariable – house price forecast
  • Using an existing library (scikit-learn) instead of hand-written gradient descent
  • Solving for the fitting parameters directly with the normal equation

II. Single-variable fitting: code implementation and detailed walkthrough (complete code in the original author’s GitHub)

1. Data reading and pre-processing

# Import the dependency libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

path = 'ex1data1.txt'
data = pd.read_csv(path, header=None, names=['Population', 'Profit'])
# Show the first few rows
data.head()
# Describe the basic statistics of the sample
data.describe()

The data (a DataFrame instance):

Look at the distribution of the data graphically

# data.plot wraps matplotlib; kind='scatter' draws a scatter plot
data.plot(kind='scatter', x='Population', y='Profit', figsize=(12, 8))
plt.show()

2. Define the loss function

Now let’s use gradient descent to implement linear regression to minimize the cost function. The equations implemented in the following code examples are detailed in ex1.pdf in the Exercises folder.

First, we create a cost function characterized by the parameters $\theta$:

$J\left(\theta\right)=\frac{1}{2m}\sum\limits_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$

where the hypothesis is

$h_{\theta}\left(x\right)=\theta^{T}x=\theta_{0}x_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+\dots+\theta_{n}x_{n}$

Code implementation:

def computeCost(X, y, theta):
    # X * theta.T is the vector of predictions h_theta(x) for every sample
    inner = np.power(((X * theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))
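As a quick sanity check, here is a minimal example with made-up numbers (not part of the exercise data), just to confirm the formula by hand:

# Two toy samples; with theta = [1, 2] the predictions are 3 and 5,
# the errors are 1 and 2, so the cost is (1 + 4) / (2 * 2) = 1.25
X_toy = np.matrix([[1.0, 1.0],
                   [1.0, 2.0]])
y_toy = np.matrix([[2.0], [3.0]])
theta_toy = np.matrix([[1.0, 2.0]])
computeCost(X_toy, y_toy, theta_toy)  # 1.25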

Next, insert a column into the training set so that we can compute the cost and the gradients with a vectorized solution. Because the hypothesis has a $\theta_0$ term, adding a column of 1s as $x_0$ provides the baseline starting value, commonly called the intercept.

# Insert a column of 1s at position 0, named 'Ones'
data.insert(0, 'Ones', 1)
data.describe()

Variable initialization

# data.shape is the size of the data: (rows, columns)
cols = data.shape[1]
# iloc selects by index, loc selects by name; ":" means all rows
# X = all rows, every column except the last
X = data.iloc[:, 0:cols-1]
# y = all rows, the last column
y = data.iloc[:, cols-1:cols]

Check that X (the training set) and y (the target variable) look correct.

X.head()
y.head()

Now, a critical step: the cost function expects numpy matrices, so we need to convert X and y before we can use them. We also need to initialize theta.

X = np.matrix(X.values)
y = np.matrix(y.values)
# theta is a (1, 2) matrix, i.e. 1 row and 2 columns: matrix([[0, 0]])
theta = np.matrix(np.array([0, 0]))
# Check the dimensions of X, y and theta
X.shape, y.shape, theta.shape  # ((97, 2), (97, 1), (1, 2))
# Compute the cost function; with theta all zeros every prediction is 0,
# so this is just (1/2m) * sum(y^2) = 32.072733877455676
computeCost(X, y, theta)

Now define the gradient descent function. The batch gradient descent update rule is:

$\theta_{j}:=\theta_{j}-\alpha\frac{\partial}{\partial\theta_{j}}J\left(\theta\right)$

Expanding the partial derivative gives the update that is actually used in the code (see below).
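For reference, the expanded per-parameter update is the standard batch form; the code below implements exactly this sum:

$\theta_{j}:=\theta_{j}-\frac{\alpha}{m}\sum\limits_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)x_{j}^{(i)}$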

Note that in Ng’s course he writes the j = 0 and j = 1 cases separately to make them easier to follow. Here, because a column of 1s was added to X, $x_0$ is always 1 and the cases do not need to be distinguished:

$h_{\theta}\left(x\right)=\theta^{T}x=\theta_{0}x_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+\dots+\theta_{n}x_{n}$

Now implement gradient descent in code. The formula can be a little dizzying the first time you see it; compare it with the expanded partial derivative above. Note that both $\theta_0$ and $\theta_1$ are updated on every iteration.

# X: observed independent variables
# y: observed target variable
# theta: parameters theta0 and theta1
# alpha: learning rate
# iters: number of iterations, e.g. 1000
def gradientDescent(X, y, theta, alpha, iters):
    # temp holds the new parameter values so that all theta_j are updated together
    temp = np.matrix(np.zeros(theta.shape))
    # ravel() (like flatten()) turns theta into a flat row; parameters = number of theta values
    parameters = int(theta.ravel().shape[1])
    # cost[i] records the loss after each iteration
    cost = np.zeros(iters)

    for i in range(iters):
        error = (X * theta.T) - y
        # j runs over 0 -> parameters-1, i.e. two passes here;
        # theta0 and theta1 are updated in exactly the same way
        for j in range(parameters):
            term = np.multiply(error, X[:, j])
            temp[0, j] = theta[0, j] - ((alpha / len(X)) * np.sum(term))

        theta = temp
        cost[i] = computeCost(X, y, theta)

    # Note that in Python a function can return multiple values
    return theta, cost
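As an aside (not in the original notebook), the inner loop over j can be collapsed into a single matrix expression, since error.T * X already sums the per-sample terms for every theta_j at once. A minimal sketch of an equivalent, fully vectorized version:

def gradientDescentVectorized(X, y, theta, alpha, iters):
    # Same algorithm as above, written without the inner loop over the parameters
    cost = np.zeros(iters)
    for i in range(iters):
        error = (X * theta.T) - y                           # shape (m, 1)
        theta = theta - (alpha / len(X)) * (error.T * X)    # shape (1, n)
        cost[i] = computeCost(X, y, theta)
    return theta, cost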

3. Run gradient descent and solve for the fitting parameters

# Learning rate 0.01; alpha could also be updated dynamically (fast first, then slower)
alpha = 0.01
# Iterate 1000 times
iters = 1000
# g: the fitted theta, cost: the loss at each iteration
g, cost = gradientDescent(X, y, theta, alpha, iters)
# Evaluate the loss function with the fitted parameters
computeCost(X, y, g)  # 4.515955503078914

Plot the linear model together with the data to see how well it fits.

x = np.linspace(data.Population.min(), data.Population.max(), 100)
f = g[0, 0] + (g[0, 1] * x)

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(x, f, 'r', label='Prediction')
# scatter API: "s" = point size, "c" = color, "marker" = point style, e.g.
# ax.scatter(data.Population, data.Profit, label='Training Data', s=60, c='b', marker='+')
ax.scatter(data.Population, data.Profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
plt.show()

Observe how the loss function declines: it drops quickly at the beginning and then converges slowly, which is fairly ideal.
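To see this, you can plot the recorded cost against the iteration number (a small sketch, mirroring the plot used for the multivariable case below):

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(np.arange(iters), cost, 'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
plt.show()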

Multivariable linear regression. The logic is similar to the single-variable case and will not be repeated (feature normalization is added).

Exercise 1 also includes a house price data set with two variables (size of house, number of bedrooms) and a target (price of house). We analyze the data set using techniques we have already applied.

path = 'ex1data2.txt'
data2 = pd.read_csv(path, header=None, names=['Size', 'Bedrooms', 'Price'])
data2.head()
data2.describe()

# For this task we add another preprocessing step: feature normalization.
# Normalization is an important part of data preprocessing in machine learning and
# directly affects training efficiency and results, especially when the features
# differ in magnitude.
data2 = (data2 - data2.mean()) / data2.std()
data2.head()

# Now we repeat the preprocessing steps of Part 1 and run linear regression on the new data set.
# add ones column
data2.insert(0, 'Ones', 1)

# set X2 (training data) and y2 (target variable)
cols = data2.shape[1]
X2 = data2.iloc[:, 0:cols-1]
y2 = data2.iloc[:, cols-1:cols]

# convert to matrices and initialize theta
X2 = np.matrix(X2.values)
y2 = np.matrix(y2.values)
theta2 = np.matrix(np.array([0, 0, 0]))

# perform linear regression on the data set
g2, cost2 = gradientDescent(X2, y2, theta2, alpha, iters)

# get the cost (error) of the model: about 0.1307033696077189.
# On its own this number is hard to interpret -- is it good or bad?
computeCost(X2, y2, g2)

# Quickly check the training progress by plotting the cost per iteration
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(np.arange(iters), cost2, 'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
fig.set_facecolor('white')
plt.show()

The raw data:

After normalization:

Gradient descent curve:
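As a side note (not in the original notebook), here is a sketch of how the fitted parameters g2 could be used to predict the price of a hypothetical house, say 1650 square feet with 3 bedrooms (values chosen purely for illustration). Because the model was trained on normalized features, the inputs must be normalized with the same statistics, and the predicted price converted back:

# Re-read the raw file to recover the statistics used for normalization
raw = pd.read_csv('ex1data2.txt', header=None, names=['Size', 'Bedrooms', 'Price'])
mu, sigma = raw.mean(), raw.std()

# Hypothetical house: 1650 sq ft, 3 bedrooms
x_new = np.matrix([[1.0,
                    (1650 - mu['Size']) / sigma['Size'],
                    (3 - mu['Bedrooms']) / sigma['Bedrooms']]])

price_norm = (x_new * g2.T)[0, 0]                   # prediction in normalized units
price = price_norm * sigma['Price'] + mu['Price']   # back to the original price scale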

We can also use scikit-learn’s linear regression function instead of implementing the algorithm from scratch. Let’s apply scikit-learn’s linear regression to the data from Part 1 and see how it performs.

In a few lines of code:

from sklearn import linear_model

model = linear_model.LinearRegression()
model.fit(X, y)

# x: the population values (second column of X); f: the model's predictions
x = np.array(X[:, 1].A1)
f = model.predict(X).flatten()

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(x, f, 'r', label='Prediction')
ax.scatter(data.Population, data.Profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
fig.set_facecolor('white')
plt.show()
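To compare with the parameters found by gradient descent, you can also inspect the fitted coefficients (a small addition, not in the original code):

# scikit-learn reports the intercept separately; since X still contains the 'Ones'
# column added earlier, the coefficient for that column comes out near zero
print(model.intercept_, model.coef_)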

IV. Solving for the $\theta$ parameters by directly solving an equation, usually called the normal equation

The idea is to fit with the least-squares method from linear algebra; if you don’t remember least squares, it is worth a quick review.
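For reference, the closed-form least-squares solution that the code below implements is:

$\theta=\left(X^{T}X\right)^{-1}X^{T}y$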

def normalEqn(X, y):
    # X.T@X is equivalent to X.T.dot(X)
    theta = np.linalg.inv(X.T@X)@X.T@y
    return theta

final_theta2 = normalEqn(X, y)
final_theta2
# matrix([[-3.89578088], [1.19303364]])
# There is a slight difference from the theta found by batch gradient descent,
# which was matrix([[-3.24140214, 1.1272942]]); the gap in the first parameter
# (the intercept) is fairly large
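As an aside (not in the original code), the same least-squares problem can be solved without forming the matrix inverse explicitly, which is numerically more stable; np.linalg.lstsq gives essentially the same result:

# Solve the least-squares problem directly; theta_ls should closely match normalEqn(X, y)
theta_ls, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
theta_ls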

Welcome to follow the public account: Sumsmile / a mobile development veteran focused on image processing