Linear Regression

The linear regression algorithm is one of the most important algorithms in machine learning and statistical analysis.

A while ago, when working on an auxiliary program for the WeChat Jump Game, you had to fill in an empirical ratio between pressing time and jumping distance according to the screen size, and keep adjusting it manually. Later, this algorithm can be used to fit the relationship between pressing time and jumping distance. Here is the Pull Request.

Given a sample described by d attributes x = (x1; x2; …; xd), a linear model attempts to learn a predictive function based on a linear combination of the attributes, namely f(x) = w1x1 + w2x2 + … + wdxd + b. Once w and b are known, the model is determined.
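As a quick sketch (the weights and sample values here are made up for illustration, not from the article), the prediction function can be written directly in NumPy:

import numpy as np

def predict(x, w, b):
    # f(x) = w1*x1 + w2*x2 + ... + wd*xd + b
    return np.dot(w, x) + b

w = np.array([2.0, -1.0, 0.5])  # example weights (hypothetical)
b = 3.0                         # example bias (hypothetical)
x = np.array([1.0, 4.0, 2.0])   # one sample with d = 3 attributes
print(predict(x, w, b))         # 2*1 - 1*4 + 0.5*2 + 3 = 2.0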

Principle

In high school mathematics, we learned an algorithm for determining the undetermined coefficients when there is only one attribute x: the least squares method. Given a series of discrete points, the least squares method determines a regression line f(x) = kx + b. This problem with only one input variable/feature is also called univariate linear regression.
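For the univariate case the least squares solution has a closed form; the following is a small sketch (the sample points are made up for illustration):

import numpy as np

# Made-up sample points roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

# Closed-form least squares: k = cov(x, y) / var(x), b = mean(y) - k * mean(x)
k = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - k * x.mean()
print(k, b)  # should come out close to 2 and 1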

Cost Function

Different values of k produce different errors between the predicted values and the actual values. The variance (squared error) is a commonly used loss function, also known as a cost function. Our goal is to find the model parameters that minimize this variance.
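As a sketch of what this loss looks like in code (continuing with the made-up points above), the variance for a candidate k and b is:

import numpy as np

def mse_loss(k, b, x, y):
    # Mean squared error between predictions f(x) = kx + b and actual y
    return np.mean((k * x + b - y) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])
print(mse_loss(2.0, 1.0, x, y))  # small loss near the true parameters
print(mse_loss(5.0, 0.0, x, y))  # much larger loss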

Graphically, the loss function of univariate linear regression usually resembles a parabola with a single minimum.

The loss function of linear regression with two variables/features graphically resembles a bowl, with the bottom of the bowl being the minimum.

With more features, the high-dimensional loss surface is difficult to visualize, and the loss function may have different extreme values in different regions, so the minimum is generally difficult to compute directly.

Gradient Descent

We usually use the gradient descent algorithm to find this minimum: pick a random combination of parameters, compute the loss function, find the next combination of parameters that reduces the loss the most, update all parameters synchronously, and repeat until a local minimum is reached. Different initial parameter combinations may lead to different local minima.

Gradient descent algorithm formula (every parameter θj is updated simultaneously):

θj := θj − α · ∂J(θ)/∂θj

Here α is the learning rate, which determines how large a step is taken along the direction in which the loss function decreases fastest. If the value is too small, convergence is slow; if it is too large, the update may overshoot the minimum, causing the algorithm to fail to converge or to fail to find a reasonable combination of the undetermined parameters θ.

The term to the right of α is a derivative, which requires basic knowledge of derivatives and partial derivatives. Simply put, the slope of the tangent line at the current θ determines the correct direction, and the learning rate determines how far to move.
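To make the update rule concrete, here is a minimal univariate gradient descent sketch in plain NumPy (the points and hyperparameters are made up; this is not the TensorFlow code used below):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

k, b = 0.0, 0.0  # initial parameter combination
alpha = 0.02     # learning rate
for _ in range(2000):
    error = k * x + b - y
    grad_k = np.mean(2 * error * x)  # dJ/dk for J = mean((kx + b - y)^2)
    grad_b = np.mean(2 * error)      # dJ/db
    # Update both parameters synchronously
    k, b = k - alpha * grad_k, b - alpha * grad_b

print(k, b)  # converges near k = 2, b = 1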

Let’s now implement and experiment with the idea of machine learning using TensorFlow.

Prepare the data

First, NumPy is used to generate some simulated points (xi, yi) with intentional random offsets. The generated random points serve as the dataset, which is divided into a training set and a test set in a ratio of 8:2. The training set is then read by the code and the undetermined parameters k and b are updated so that they get closer and closer to the true values, making f(xi) ≈ yi, i.e. minimizing the variance.

The variance corresponds to the Euclidean distance, and the least squares method tries to find a line that minimizes the sum of the Euclidean distances from all the samples to the line.

import tensorflow as tf
import numpy as np
from tensorflow.python.framework import ops
import matplotlib.pyplot as plt

ops.reset_default_graph()
sess = tf.Session()

data_amount = 101  # number of data points
batch_size = 25    # batch size

# Generate points around the line y = Kx + 3 (K = 5)
x_vals = np.linspace(20, 200, data_amount)
y_vals = np.multiply(x_vals, 5)
y_vals = np.add(y_vals, 3)
# Intentionally offset the y values with random noise
y_offset_vals = np.random.normal(0, 15, data_amount)
y_vals = np.add(y_vals, y_offset_vals)
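The 8:2 training/test split mentioned above is not shown in the snippet; here is a minimal sketch of how it could be done (the index variable names are hypothetical):

# Hypothetical 8:2 split of the generated points into training and test sets
train_indices = np.random.choice(data_amount, round(data_amount * 0.8), replace=False)
test_indices = np.array(list(set(range(data_amount)) - set(train_indices)))
x_train, y_train = x_vals[train_indices], y_vals[train_indices]
x_test, y_test = x_vals[test_indices], y_vals[test_indices]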

Model training

x_data = tf.placeholder(shape=[None, 1], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)

# Construct the model: learn K in y = Kx + 3
K = tf.Variable(tf.random_normal(mean=0, shape=[1, 1]))
calcY = tf.add(tf.matmul(x_data, K), 3)

# The loss is the variance between predicted and target values
loss = tf.reduce_mean(tf.square(y_target - calcY))

init = tf.global_variables_initializer()
sess.run(init)

my_opt = tf.train.GradientDescentOptimizer(0.0000005)
train_step = my_opt.minimize(loss)  # minimize the loss value

loss_vec = []  # save the loss value of each iteration
for i in range(1000):  # iteration count assumed; the original snippet does not show it
    rand_index = np.random.choice(data_amount, size=batch_size)
    x = np.transpose([x_vals[rand_index]])
    y = np.transpose([y_vals[rand_index]])
    sess.run(train_step, feed_dict={x_data: x, y_target: y})
    tmp_loss = sess.run(loss, feed_dict={x_data: x, y_target: y})
    loss_vec.append(tmp_loss)
    # Print K and the current loss every 25 steps
    if (i + 1) % 25 == 0:
        print('Step #' + str(i + 1) + ' K = ' + str(sess.run(K)))
        print('Loss = ' + str(sess.run(loss, feed_dict={x_data: x, y_target: y})))

KValue = sess.run(K)[0][0]  # extract the learned scalar K for plotting
sess.close()

The learning framework uses gradient descent to find an optimal solution that minimizes the variance.

The learning rate is a very important parameter. If it is too small, the algorithm takes a long time to converge; if it is too large, the result may fail to converge, or the computation may produce NaN and yield no usable result at all.
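A quick way to see this effect, sketched here with plain NumPy rather than the TensorFlow code above (the rates are illustrative):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1

def run_gd(alpha, steps=200):
    k, b = 0.0, 0.0
    for _ in range(steps):
        error = k * x + b - y
        k -= alpha * np.mean(2 * error * x)
        b -= alpha * np.mean(2 * error)
    return k, b

print(run_gd(0.0001))  # too small: still far from k = 2, b = 1
print(run_gd(0.02))    # reasonable: close to k = 2, b = 1
print(run_gd(0.5))     # too large: diverges, k and b blow up toward inf/nan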

Display the results

NumPy and Matplotlib were used in this experiment. I will practice with these two libraries more later to reinforce what I have learned.

best_fit = []
for i in x_vals:
    best_fit.append(KValue * i + 3)

plt.plot(x_vals, y_vals, 'o', label='Data')
plt.plot(x_vals, best_fit, 'r-', label='Best fit line')
plt.legend(loc='upper left')
plt.show()

# Plot the loss value of each iteration
plt.plot(loss_vec, 'k-')
plt.title('Batch Loss')
plt.xlabel('Generation')
plt.ylabel('Loss')
plt.show()

Running the code produces two graphs: one shows the fitted line at the end of training, and the other shows how the loss value converges over the course of training. The results are not unique; they vary from run to run.

As can be seen, as training progresses the overall prediction loss gets smaller and smaller. Changing the learning rate or the batch size significantly changes the convergence rate of the training loss, and may even prevent convergence. In general, the larger the batch size, the better the effect.

Other regression algorithms

In addition to the linear regression algorithm, there are several other regression algorithms; I will study them later and supplement this section.

Deming regression algorithm

The least squares linear regression algorithm minimizes the vertical distance to the regression line, considering only the y value, while the Deming regression algorithm minimizes the perpendicular (orthogonal) distance to the regression line, considering both the x value and the y value.

Concretely, the algorithm only requires modifying the loss function, and the results are basically the same.
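A sketch of such a modified loss in the same TensorFlow style as the snippet above (variable names follow the earlier code; this is an assumption about the implementation, not code from the article). The perpendicular distance from a point to the line y = Kx + 3 is |y − (Kx + 3)| / sqrt(K² + 1):

# Deming regression loss: perpendicular distance to the line (sketch, assumed implementation)
deming_numerator = tf.abs(tf.subtract(y_target, tf.add(tf.matmul(x_data, K), 3)))
deming_denominator = tf.sqrt(tf.add(tf.square(K), 1))
deming_loss = tf.reduce_mean(tf.truediv(deming_numerator, deming_denominator))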

Lasso regression and ridge regression algorithm

Lasso regression adds an L1 regularization term to the loss function, while ridge regression adds an L2 regularization term.
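Continuing with the earlier snippet's names, a sketch of the two regularized losses (the regularization weight 0.1 is an arbitrary illustrative value):

# Ridge: squared error plus an L2 penalty on the slope K (illustrative weight 0.1)
ridge_loss = tf.add(tf.reduce_mean(tf.square(y_target - calcY)),
                    tf.multiply(0.1, tf.reduce_sum(tf.square(K))))
# Lasso: squared error plus an L1 penalty on the slope K
lasso_loss = tf.add(tf.reduce_mean(tf.square(y_target - calcY)),
                    tf.multiply(0.1, tf.reduce_sum(tf.abs(K))))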

Elastic network regression algorithm

An algorithm combining lasso regression and ridge regression, adding both L1 and L2 regularization terms to the loss function.
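In the same sketch style, the elastic net loss is simply the squared error plus both penalties (weights again illustrative):

# Elastic net: squared error plus both L1 and L2 penalties (sketch)
elastic_loss = tf.add(tf.reduce_mean(tf.square(y_target - calcY)),
                      tf.add(tf.multiply(0.1, tf.reduce_sum(tf.abs(K))),
                             tf.multiply(0.1, tf.reduce_sum(tf.square(K)))))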

Logistic regression algorithm

Logistic regression converts linear regression into a binary classifier: the sigmoid function maps the output of linear regression into the range (0, 1), which is then used to judge whether the target belongs to a certain category.
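A minimal NumPy sketch of the idea (the values and the 0.5 threshold are illustrative):

import numpy as np

def sigmoid(z):
    # Squashes the linear output into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, 0.0, 3.0])  # outputs of a linear model w*x + b
p = sigmoid(z)                  # probabilities: about [0.047, 0.5, 0.953]
print(p > 0.5)                  # classify: [False, False, True]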

References

qianhk.com/2018/02/Client…

studentdeng.github.io/blog/2014/0…

This article first appeared on Qian Kaikai’s blog