Linear regression is a statistical technique used to measure the relationship between variables. What makes it interesting is that its implementation is not complicated, yet it can be applied to many situations. For these reasons, I am happy to start my study of TensorFlow with linear regression.
Remember that linear regression models the relationship between a dependent variable y, independent variables x_i, and a random term b, whether in the case of a single variable (simple regression) or multiple variables (multiple regression).
In this section, we create a simple example to illustrate how TensorFlow works, assuming that our data follow a simple linear regression model y = W * x + b. To do this, we first generate a series of points in two-dimensional space using simple Python code; TensorFlow is then used to find the line that best fits these points.
The first thing to do is import the NumPy library and use it to generate some points, as follows:
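Since the book's listing is not reproduced here, the following is a minimal sketch of such generation code; the number of points (1,000) and the noise scales (0.55 for x, 0.03 for the vertical deviation) are illustrative assumptions.

import numpy as np

num_points = 1000
vectors_set = []
for i in range(num_points):
    # Sample an x value and compute y = 0.1 * x + 0.3 plus Gaussian noise
    x1 = np.random.normal(0.0, 0.55)
    y1 = x1 * 0.1 + 0.3 + np.random.normal(0.0, 0.03)
    vectors_set.append([x1, y1])

x_data = [v[0] for v in vectors_set]
y_data = [v[1] for v in vectors_set]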
From this code, we can see that we have generated points that follow y = 0.1 * x + 0.3, but which do not lie exactly on the line because of the added normally distributed noise. This gives us a more interesting example.
In this example, the points are displayed as shown below.
The reader can generate this plot by installing Matplotlib via pip and importing a few of its functions.
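A minimal sketch of that plotting code:

import matplotlib.pyplot as plt

# Draw the generated points as red dots
plt.plot(x_data, y_data, 'ro')
plt.show()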
These points are going to be the data set that we use to train the model.
Cost function and gradient descent algorithm
The next step is to train our learning algorithm so that it can calculate the output values y_data from the input data x_data. We know in advance that this is a linear regression model, so we use two parameters, W and b, to describe the model.
Our goal is to use TensorFlow code to find the best parameters W and b, so that the input data x_data generate the output data y_data; in this case, the line y_data = W * x_data + b. The reader knows that W is close to 0.1 and b is close to 0.3, but TensorFlow does not know that; it needs to calculate the values itself.
The standard way to solve such problems is to iterate over each value in the data set and modify the parameters W and b to get a more accurate result each time. To ensure that the results improve over time, we define a cost function (also known as an “error function”) to measure how good (or bad) a given line is.
This function takes the parameter pair (W, b) as arguments and returns an error value that represents how well the line fits the data. In our example, we use the mean squared error as the cost function: the average of the squared distances between the value estimated in each iteration of the algorithm and the true value.
More details on cost functions and their alternatives will be covered later, but in this case, the mean squared error helps us move step by step in the right direction.
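Concretely, for N data points the cost function we will minimize is loss = (1/N) * Σ (y_i - y_data_i)², where y_i = W * x_data_i + b is the value the current line predicts for the i-th input.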
Now it’s time to start implementing all the details of the above analysis programmatically in TensorFlow. Let’s start by creating three variables:
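A sketch of these three definitions, assuming the TensorFlow 1.x (graph and session) API used throughout this chapter; initializing W uniformly in [-1, 1] is an illustrative choice:

import tensorflow as tf

W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))  # the slope, randomly initialized
b = tf.Variable(tf.zeros([1]))                      # the intercept, initialized to 0
y = W * x_data + b                                  # the model's predictions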
Calling the Variable method defines a variable, which is stored in TensorFlow’s internal graph data structure. We will look at the method’s parameters in more detail later, but for now it is more important to continue implementing the model.
Using the defined variables, we can write the cost function based on the distance between each actual point and the point predicted by the function y = W * x + b. We square those distances and average them. In TensorFlow, this cost function can be expressed as:
loss = tf.reduce_mean(tf.square(y - y_data))
As you can see from the code, this expression computes the mean of the squared distances between y_data and the values y calculated from the input x_data.
At this point, the reader probably already knows that the line that best fits these points is the one with the smallest error. Therefore, if we minimize the error function, we will get the best model for the data.
Instead of going into the details of optimization functions, we use the well-known gradient descent algorithm to minimize the error function. Conceptually, gradient descent takes a function over a set of parameters, starts from an initial set of parameter values, and iterates step by step toward the values that minimize the function, moving in the direction of the negative gradient of the function. Squaring the distance keeps the error value positive and makes the error function differentiable, which is needed to compute the gradient.
The gradient descent algorithm starts with the initial values of the parameter set (W and b in our example) and then gradually modifies these values over the iterations. When the algorithm finishes, the parameter values are those that (approximately) minimize the cost function.
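In symbols, each iteration updates the parameters in the negative gradient direction, scaled by a learning rate η:

W ← W - η * ∂loss/∂W
b ← b - η * ∂loss/∂b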
To use this algorithm in TensorFlow, you only need to execute the following two lines of code:
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
By now, TensorFlow has enough information to create the relevant data in its internal data structure, including an optimizer that implements the gradient descent algorithm for the defined cost function, which can be invoked later during training. We will discuss the function’s argument, the learning rate (0.5 in our example), later.
Run the algorithm
As we learned earlier, the TensorFlow calls in the code so far only add information to the internal graph; TensorFlow has not yet run the algorithm. As in the example in the previous section, we need to create a session and call its run method with train as the argument. In addition, since we have defined concrete variables, we must initialize them beforehand with the following commands:
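A sketch of the session setup, assuming the TensorFlow 1.x API (older releases spelled the initializer tf.initialize_all_variables()):

init = tf.global_variables_initializer()

sess = tf.Session()
sess.run(init)  # variables must be initialized before they are read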
Now we can start the iterative process, in which the algorithm finds the values of W and b that best fit the points, given the model we defined. The training process runs until it reaches a specified accuracy on the data set. In this particular example, we assume that only 8 iterations are sufficient, as follows:
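A minimal version of that loop might look like this (the print statement discussed below goes inside it):

for step in range(8):
    sess.run(train)  # one gradient descent step
    print(step, sess.run(W), sess.run(b))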
The result of running this code is that the values of W and b are close to the ones we already know. On my machine, it prints the following:
If we use the following code to graphically display the result:
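A sketch of such plotting code, reusing x_data, y_data, and the trained session from above:

plt.plot(x_data, y_data, 'ro')                        # the training points
plt.plot(x_data, sess.run(W) * x_data + sess.run(b))  # the fitted line
plt.xlabel('x')
plt.ylabel('y')
plt.show()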
In the figure, we can see the straight line obtained after 8 iterations, with parameters W = 0.0854 and b = 0.299.
We only performed 8 iterations for simplicity; if we iterated a few more times, the parameter values would get even closer to the true values. To print the values of W and b, run the following command:
print(step, sess.run(W), sess.run(b))
On my computer, the result appears as follows:
We can see that the algorithm starts from the initial values W = -0.0484 and b = 0.2972 and then gradually adjusts the parameter values to minimize the cost function.
The gradual reduction of the cost function can also be observed with the following code:
print(step, sess.run(loss))
On my machine, the result is:
Readers are advised to plot the graph after each iteration, so that they can observe how the algorithm adjusts the parameter values each time. In this example, the snapshots of the 8 iterations look as follows:
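One way to produce such snapshots is to move the plotting into the training loop, as in this sketch:

for step in range(8):
    sess.run(train)
    # Plot the data and the current fit after every iteration
    plt.plot(x_data, y_data, 'ro')
    plt.plot(x_data, sess.run(W) * x_data + sess.run(b))
    plt.title('step = %d' % step)
    plt.show()  # close each figure window to advance to the next iteration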
From the figure, readers can see that the algorithm fits the data better and better with each iteration. How, then, does the gradient descent algorithm gradually approach the parameter values that minimize the cost function?
Because our error function has two parameters (W and b), we can think of it as a surface over a two-dimensional plane. Each point in the plane represents a line, and the height of the function at that point is the error value for that line. Some lines in the plane have smaller error values than others. When TensorFlow performs the gradient descent search, it starts at a point on the plane (in this example, W = -0.04841119 and b = 0.29720169) and moves in the direction of decreasing error.
To run the gradient descent algorithm on the error function, TensorFlow computes its gradient. The gradient is like a compass, pointing us toward the minimum. To compute it, TensorFlow differentiates the error function; in our case, the algorithm needs to compute the partial derivatives with respect to W and b to indicate the direction of progress in each iteration.
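For our mean squared error, these partial derivatives have a simple closed form: ∂loss/∂W = (2/N) * Σ x_data_i * (y_i - y_data_i) and ∂loss/∂b = (2/N) * Σ (y_i - y_data_i), where y_i = W * x_data_i + b.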
The learning rate mentioned earlier controls the step size TensorFlow takes in each iteration. If this parameter is too large, we may overshoot the minimum. Conversely, if it is too small, many iterations are needed to reach the minimum. Using an appropriate learning rate is therefore important. There are various techniques for selecting learning rates, but they are beyond the scope of this book. One way to check that the gradient descent algorithm is working well is to make sure that the error decreases in each iteration.
To help you test and run the code in this chapter, you can download the regression.py file from GitHub. Here is the complete code:
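Since the listing itself is not reproduced here, below is a sketch of the complete program assembled from the fragments in this section (TensorFlow 1.x API assumed; the exact GitHub URL is not given in the text):

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Generate 1,000 points scattered around the line y = 0.1 * x + 0.3
num_points = 1000
vectors_set = []
for i in range(num_points):
    x1 = np.random.normal(0.0, 0.55)
    y1 = x1 * 0.1 + 0.3 + np.random.normal(0.0, 0.03)
    vectors_set.append([x1, y1])

x_data = [v[0] for v in vectors_set]
y_data = [v[1] for v in vectors_set]

# Model: y = W * x + b
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b

# Mean squared error cost function and gradient descent optimizer
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# 8 iterations of gradient descent, printing parameters and loss each time
for step in range(8):
    sess.run(train)
    print(step, sess.run(W), sess.run(b))
    print(step, sess.run(loss))

# Display the fitted line against the data
plt.plot(x_data, y_data, 'ro')
plt.plot(x_data, sess.run(W) * x_data + sess.run(b))
plt.xlabel('x')
plt.ylabel('y')
plt.show()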
In this chapter, we used a basic linear regression example to learn how to use two fundamental components of the TensorFlow library: the cost function and the gradient descent algorithm. In the next section, we will look at the details of the underlying data structures in TensorFlow.