This article is a set of study notes on TensorFlow 2.0 (hereafter TF2.0), based on the textbook Hands-on Deep Learning (TF2.0 version).
TensorFlow can be used to implement linear regression because linear regression can be regarded as a fully connected network with only one layer and one neuron:
The figure above shows the neural network representation of linear regression.
Implementing linear regression
To implement linear regression, we need to:
- Define the linear regression model
- Define the loss function
- Define the iterative optimization algorithm
These are also key points in machine learning theory, which we can review in this article.
Define the linear regression model
To implement an algorithm, we first need to express it in vector form, that is, describe the model using vectors and matrices. The advantage is that batched vector computation is much faster than looping over the samples one at a time. The vector form of linear regression is:
$$\hat{y} = Xw + b$$
Here $X$ is an $n \times d$ matrix, where $n$ is the number of samples and $d$ is the dimension of the features; $w$ is the parameter of the model, a $d \times 1$ vector; $b$ is the bias, a scalar; and $\hat{y}$ is the predicted value for the $n$ samples, also an $n \times 1$ vector.
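As a rough illustration of why the vector form matters, the NumPy sketch below computes the same predictions twice: once with a per-sample Python loop and once with a single matrix product. The array sizes, the parameter values, and the names `predict_loop` and `predict_vectorized` are assumptions made for this example and are not part of the original code.

```python
import numpy as np

def predict_loop(X, w, b):
    # One prediction per sample, computed in an explicit Python loop
    n = X.shape[0]
    y_hat = np.zeros((n, 1))
    for i in range(n):
        y_hat[i, 0] = np.dot(X[i], w[:, 0]) + b
    return y_hat

def predict_vectorized(X, w, b):
    # All n predictions at once: (n x d) @ (d x 1) + scalar -> (n x 1)
    return X @ w + b

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))    # n = 1000 samples, d = 2 features
w = np.array([[2.0], [-3.4]])     # d x 1 parameter vector (example values)
b = 4.2                           # scalar bias (example value)

y_loop = predict_loop(X, w, b)
y_vec = predict_vectorized(X, w, b)
print(y_vec.shape)                  # (1000, 1)
print(np.allclose(y_loop, y_vec))   # True
```

Both functions return the same values; the vectorized one simply hands the whole batch to optimized linear algebra routines instead of iterating in Python.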
The model is implemented with TF2.0 as follows:
import tensorflow as tf
import numpy as np
import random

def linear_reg(X, w, b):
    # matmul is matrix multiplication
    return tf.matmul(X, w) + b
Define the loss function
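The loss function of a regression model is usually MSE (Mean Squared Error), written here with the conventional extra factor of 1/2:
$$L = \frac{1}{2n} (y - \hat{y})^2$$
In this formula, $y$ is the observed value of the samples, $y$ and $\hat{y}$ are both $n \times 1$ vectors, and dividing by $n$ averages the loss over the $n$ samples so that the number of samples does not influence the loss. Since the loss is a scalar, the formula above needs to be adjusted as follows:
$$L = \frac{1}{2n} (y - \hat{y})^\top (y - \hat{y})$$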
The loss is implemented in TF2.0 as follows:
def squared_loss(y, y_hat, n):
    y_observed = tf.reshape(y, y_hat.shape)
    return tf.matmul(tf.transpose(y_observed - y_hat),
                     y_observed - y_hat) / 2 / n
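To see that the matrix product in `squared_loss` really is the sum of squared residuals divided by 2n, the NumPy sketch below evaluates both forms on three made-up samples. The function names and the sample values are assumptions for illustration only.

```python
import numpy as np

def squared_loss_sum(y, y_hat):
    # Elementwise form: sum of squared residuals divided by 2n
    n = y.shape[0]
    return np.sum((y - y_hat) ** 2) / (2 * n)

def squared_loss_matrix(y, y_hat):
    # Matrix form: r^T r / (2n), where r is the n x 1 residual vector
    n = y.shape[0]
    r = y - y_hat
    return (r.T @ r).item() / (2 * n)

y = np.array([[1.0], [2.0], [3.0]])       # observed values, n x 1
y_hat = np.array([[1.5], [2.0], [2.0]])   # predictions, n x 1

print(squared_loss_sum(y, y_hat))
print(squared_loss_matrix(y, y_hat))

# Shape pitfall: an (n,) array minus an (n, 1) array broadcasts to n x n
print((y.ravel() - y_hat).shape)          # (3, 3)
```

The last line hints at why the original function reshapes `y` to the shape of `y_hat` before subtracting: without the reshape, a flat label vector would broadcast against the column of predictions.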
Define iterative optimization algorithms
Deep learning mostly uses mini-batch stochastic gradient descent (SGD) to update model parameters. Working on a small batch at a time saves memory and allows many more parameter updates per pass over the data, which tends to speed up convergence.
At each step, the SGD algorithm randomly selects part of the samples, for example 100 at a time, computes the loss on those 100 samples, computes the gradient of that loss, and uses the gradient to update the current parameters. There are three steps:
- Pick n samples at random
- Compute the loss on those n samples, compute the gradient, and update the parameters with the gradient
- Repeat steps 1 and 2
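The three steps above can be sketched in plain NumPy, with the gradient of the squared loss written out by hand instead of relying on automatic differentiation. This is a minimal sketch under assumed settings (synthetic data, the parameter values 2, -3.4 and 4.2, a learning rate of 0.03, batches of 10); none of the variable names come from the original code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 2
X = rng.normal(size=(n, d))
w_true = np.array([[2.0], [-3.4]])   # example values chosen for this sketch
b_true = 4.2
y = X @ w_true + b_true + rng.normal(scale=0.01, size=(n, 1))

w = np.zeros((d, 1))
b = 0.0
lr, batch_size, num_steps = 0.03, 10, 1000

for step in range(num_steps):
    # Step 1: pick a random mini-batch
    idx = rng.choice(n, size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    # Step 2: gradient of the batch loss, then parameter update
    residual = Xb @ w + b - yb                 # batch_size x 1
    grad_w = Xb.T @ residual / batch_size      # d x 1
    grad_b = residual.mean()
    w -= lr * grad_w
    b -= lr * grad_b
    # Step 3: the loop repeats steps 1 and 2

print(w.ravel(), b)   # should end up close to [2, -3.4] and 4.2
```

Unlike the epoch-based iterator used later, this sketch draws an independent random batch at every step, which keeps the code short at the cost of not visiting each sample exactly once per pass.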
So let’s take a look at the random sample selection code
def data_iter(features, labels, mini_batch):
    '''Data iteration function.
    Args:
    - features: feature matrix, n x d
    - labels: sample labels, n x 1
    - mini_batch: number of samples per batch
    Example:
    >>> mini_batch = 100
    >>> for X, y in data_iter(features, labels, mini_batch):
    >>>     # do gradient descent
    '''
    features = np.array(features)
    labels = np.array(labels)
    indices = list(range(len(features)))
    random.shuffle(indices)
    for i in range(0, len(indices), mini_batch):
        j = np.array(indices[i:min(i+mini_batch, len(features))])
        yield features[j], labels[j]
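The property that matters in `data_iter` is that one full pass visits every sample exactly once, in random order, with a possibly smaller final batch. The sketch below shows the same idea with `np.random.Generator.permutation`; the helper name `batch_indices` and the sizes 25 and 10 are hypothetical choices for this example.

```python
import numpy as np

def batch_indices(num_samples, batch_size, rng):
    # Shuffle all indices once, then slice them into consecutive batches
    order = rng.permutation(num_samples)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]  # last batch may be smaller

rng = np.random.default_rng(0)
batches = list(batch_indices(25, 10, rng))

print([len(batch) for batch in batches])        # [10, 10, 5]
seen = np.sort(np.concatenate(batches))
print(np.array_equal(seen, np.arange(25)))      # True
```

Python and NumPy slicing both clip at the end of the sequence, so the `min(...)` guard in the original function is a safe but not strictly necessary precaution.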
Next, let’s look at the code for updating model parameters:
def sgd(params, lr):
    '''Compute gradients and update model parameters.
    Args:
    - params: model parameters, in this case [w, b]
    - lr: learning rate'''
    for param in params:
        # t and l are the global tape and loss set in the training loop
        param.assign_sub(lr * t.gradient(l, param))
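For the loss implemented in `squared_loss`, that is, the sum of squared residuals divided by 2n, the gradients returned by `t.gradient` have a simple closed form. The derivation below is a sketch added for reference; the residual symbol $r$ and the learning rate symbol $\eta$ (standing for `lr` in the code) are notation introduced here, and during training $n$ is the mini-batch size.

```latex
\begin{aligned}
\hat{y} &= Xw + b, \qquad r = \hat{y} - y \\
L(w, b) &= \frac{1}{2n}\, r^\top r \\
\nabla_w L &= \frac{1}{n}\, X^\top r \\
\frac{\partial L}{\partial b} &= \frac{1}{n} \sum_{i=1}^{n} r_i \\
w &\leftarrow w - \eta\, \nabla_w L, \qquad b \leftarrow b - \eta\, \frac{\partial L}{\partial b}
\end{aligned}
```

The factor of 1/2 in the loss cancels the 2 that appears when differentiating the square, which is why it is conventionally included.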
That’s the key code. Let’s string them together:
# Generate simulated data
# 1000 samples, 2-d feature
num_samples = 1000
num_dim = 2
# True weight, bias
w_real = [2, -3.4]
b_real = 4.2
# Generate features that conform to normal distribution with a standard deviation of 1
features = tf.random.normal((num_samples, num_dim), stddev=1)
labels = features[:,0]*w_real[0] + features[:,1]*w_real[1] + b_real
# Add noise data to labels
labels += tf.random.normal(labels.shape, stddev=0.01)
# Learning rate, number of iterations
lr = 0.03
num_epochs = 3
# Initialize model parameters
w = tf.Variable(tf.random.normal([num_dim, 1], stddev=0.01))
b = tf.Variable(tf.zeros(1,))
mini_batch = 10
# Start training
for i in range(num_epochs):
    for X, y in data_iter(features, labels, mini_batch):
        # Record gradient process in memory
        with tf.GradientTape(persistent=True) as t:
            t.watch([w, b])
            # Calculate the loss of this small batch
            l = squared_loss(y, linear_reg(X, w, b), mini_batch)
        # Calculate gradients and update parameters
        sgd([w, b], lr)
    # Calculate the total error of this iteration
    train_loss = squared_loss(labels, linear_reg(features, w, b), len(features))
    print('epoch %d, loss %f' % (i + 1, tf.reduce_mean(train_loss)))
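When reading the printed loss, it may help to know roughly what value to expect once training has converged. Assuming the label noise has a standard deviation of 0.01, as in the script above, the loss cannot drop much below 0.01² / 2 = 0.00005, because that is what remains even at the true parameters. The NumPy sketch below checks this on synthetic data; the sample count and parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
noise_std = 0.01
X = rng.normal(size=(n, 2))
w_true = np.array([[2.0], [-3.4]])   # assumed example weights
b_true = 4.2                         # assumed example bias
y = X @ w_true + b_true + rng.normal(scale=noise_std, size=(n, 1))

# Loss evaluated at the true parameters: only the noise remains
r = y - (X @ w_true + b_true)
loss_at_truth = (r.T @ r).item() / (2 * n)

print(loss_at_truth)          # close to the value on the next line
print(noise_std ** 2 / 2)     # noise floor of the halved squared loss
```

A training loss that settles near this floor suggests the parameters have essentially converged; a loss far above it suggests more epochs or a different learning rate may be needed.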
Simple implementation
The code above implements linear regression step by step from first principles. The steps are clear but tedious. TF provides a rich library of ready-made components that can greatly improve productivity, so let's replace the code above with the methods provided by the TF library.
Let's use Keras to define a fully connected network with only one layer; you do not need to specify the parameters yourself:
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow import initializers as init
model = keras.Sequential()
model.add(layers.Dense(1, kernel_initializer=init.RandomNormal(stddev=0.01)))
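One way to see what "not specifying parameters" means in practice is to inspect the layer's weights after the first call. The sketch below assumes a TF2 installation and uses an all-zero input of shape (4, 2) chosen purely for illustration; the weight shapes are inferred from that input.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.Dense(1))

# The layer creates its weights when it first sees an input
X = tf.zeros((4, 2))            # 4 samples, 2 features (example shape)
y_hat = model(X)

kernel, bias = model.get_weights()
print(kernel.shape, bias.shape)  # (2, 1) (1,)
print(y_hat.shape)               # (4, 1)
```

The kernel plays the role of `w` and the bias the role of `b` from the from-scratch version, so a `Dense(1)` layer on two features has exactly three trainable numbers.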
Next set the Loss function to MSE:
from tensorflow import losses
loss = losses.MeanSquaredError()
Set the optimization policy to SGD:
from tensorflow.keras import optimizers
trainer = optimizers.SGD(learning_rate=0.03)
The code for reading the data set in random mini-batches is as follows:
from tensorflow import data as tfdata
batch_size = 10
dataset = tfdata.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(len(features)).batch(batch_size)
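To get a feel for what `shuffle(...).batch(...)` produces, the sketch below iterates over a small data set twice and records the batch sizes. It assumes a TF2 installation; the 25 random samples and the batch size of 10 are made-up values for this example.

```python
import tensorflow as tf

features = tf.random.normal((25, 2))
labels = tf.random.normal((25,))

dataset = tf.data.Dataset.from_tensor_slices((features, labels))
# A shuffle buffer as large as the data set gives a full shuffle
dataset = dataset.shuffle(buffer_size=25).batch(10)

for epoch in range(2):
    sizes = [int(X.shape[0]) for X, y in dataset]
    print(epoch, sizes)    # batch sizes: [10, 10, 5]
```

Each new `for` loop over the data set starts a fresh pass, and by default the shuffle order is redrawn on every pass, so this replaces the hand-written `data_iter` generator.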
As you can see, building the model amounts to setting a few configuration items without writing any logic. Putting the code above together:
import tensorflow as tf
from tensorflow import data as tfdata
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow import initializers as init
from tensorflow import losses
from tensorflow.keras import optimizers
# Set network structure: one fully connected layer, initialize model parameters
model = keras.Sequential()
model.add(layers.Dense(1, kernel_initializer=init.RandomNormal(stddev=0.01)))
# Loss function: MSE
loss = losses.MeanSquaredError()
# Optimization strategy: stochastic gradient descent
trainer = optimizers.SGD(learning_rate=0.03)
# Set the data set, and the number of samples in small batches
batch_size = 10
dataset = tfdata.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(len(features)).batch(batch_size)
num_epochs = 3
for epoch in range(1, num_epochs + 1):
    # Calculate in small batches
    for (batch, (X, y)) in enumerate(dataset):
        with tf.GradientTape() as tape:
            # Loss calculation; Keras losses take (y_true, y_pred)
            l = loss(y, model(X, training=True))
        # Calculate gradients and update parameters
        grads = tape.gradient(l, model.trainable_variables)
        trainer.apply_gradients(zip(grads, model.trainable_variables))
    # Total loss after this iteration
    l = loss(labels, model(features))
    print('epoch %d, loss: %f' % (epoch, l.numpy().mean()))
# Output model parameters
print(model.get_weights())
The code above can be copied and run directly, provided the required libraries are installed and `features` and `labels` have been generated as in the earlier section. Give it a try.
Summary
This article implements a simple linear regression model with TF2.0 in two ways:
- A from-scratch version that follows the basic steps of defining the model, the loss function, and the iterative algorithm, giving a minimal neural network that is small but complete
- A more streamlined version built from TF2.0's ready-made components, aimed at understanding how TF2.0 is used
References:
- Hands-on Deep Learning (TF2.0 version): Linear regression implemented from scratch
- Hands-on Deep Learning