This article is a set of study notes on TensorFlow 2.0 (hereinafter TF2.0), based on the textbook Hands-on Deep Learning (TF2.0 version).

TensorFlow can be used to implement linear regression because linear regression can be regarded as a fully connected network with only one layer and one neuron:

The figure above is the neural network representation of linear regression.

Implementing linear regression

To implement linear regression, we need to:

  1. Define the linear regression model
  2. Define the loss function
  3. Define the iterative optimization algorithm

These are also key points of machine learning theory, which we review along the way in this article.

Define the linear regression model

To implement an algorithm, we first express it in vector form, that is, we describe the model with vectors and matrices. The advantage is that computing on a whole batch with vector operations is much faster than looping over the samples one by one. The vector form of linear regression is:


$$\hat{y} = Xw + b$$

Here, $X$ is an $n \times d$ matrix, where $n$ is the number of samples and $d$ is the dimension of the features; $w$ is the model parameter, a $d \times 1$ vector; $b$ is the bias, a scalar; $\hat{y}$ is the predicted value of the $n$ samples, and it is also an $n \times 1$ vector.
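As a quick check of the vector form, the following NumPy sketch compares a per-sample loop with a single matrix product. The sizes and the values of w and b are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 2                      # 5 samples, 2 features (illustrative sizes)
X = rng.normal(size=(n, d))      # feature matrix, n x d
w = np.array([[2.0], [-1.0]])    # parameter vector, d x 1 (made-up values)
b = 0.5                          # bias, a scalar (made-up value)

# Per-sample loop: one dot product for each row of X
y_loop = np.zeros((n, 1))
for i in range(n):
    y_loop[i, 0] = sum(X[i, k] * w[k, 0] for k in range(d)) + b

# Vector form: one matrix product for the whole batch
y_vec = X @ w + b

print(y_vec.shape)
print(np.allclose(y_loop, y_vec))
```

Both versions give the same n x 1 result; the vector form simply hands the whole batch to one optimized matrix routine.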

The model is implemented with TF2.0 as follows:

import tensorflow as tf
import numpy as np
import random

def linear_reg(X, w, b):
  # matmul is matrix multiplication
  return tf.matmul(X, w) + b

Define the loss function

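Generally, the loss function of a regression model is MSE (Mean Squared Error):

$$L(w, b) = \frac{1}{2n}(y - \hat{y})^2$$

In the formula above, $y$ is the observed value of the samples, and $y$ and $\hat{y}$ are both $n \times 1$ vectors. Dividing by $n$ averages the loss over the $n$ samples, so that the loss does not depend on the number of samples. Since the loss is a scalar, the formula above needs to be rewritten as follows:

$$L(w, b) = \frac{1}{2n}(y - \hat{y})^\top (y - \hat{y})$$

The loss is implemented with TF2.0 as follows: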

def squared_loss(y, y_hat, n):
  y_observed = tf.reshape(y, y_hat.shape)
  return tf.matmul(tf.transpose(y_observed - y_hat), 
                   y_observed - y_hat) / 2 / n
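To see that the matrix form gives the same scalar as summing the squared errors one by one, here is a small NumPy check. The three observed values and predictions are made-up numbers for illustration.

```python
import numpy as np

y = np.array([[1.0], [2.0], [3.0]])       # observed values, n x 1 (made up)
y_hat = np.array([[1.5], [2.0], [2.0]])   # predictions, n x 1 (made up)
n = len(y)

diff = y - y_hat
# Matrix form: (y - y_hat)^T (y - y_hat) / (2n), a 1 x 1 matrix turned into a scalar
loss_matrix = (diff.T @ diff / (2 * n)).item()
# Element-wise form: sum of squared errors / (2n)
loss_sum = float(np.sum(diff ** 2) / (2 * n))

print(loss_matrix, loss_sum)
```

The two numbers agree, which is why the transposed product can stand in for the explicit sum.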

Define the iterative optimization algorithm

Deep learning mostly uses the minibatch stochastic gradient descent (SGD) algorithm to update the model parameters. It saves memory, increases the number of parameter updates, and speeds up the convergence of the model.

In each step, SGD randomly selects part of the samples, for example 100 at a time, and calculates the loss on those 100 samples. The gradient is computed from the loss and then used to update the current parameters. There are therefore three steps:

  1. Pick n samples at random
  2. Calculate the loss of the n samples, compute the gradient, and update the parameters with the gradient
  3. Repeat steps 1 and 2
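The three steps above can be sketched in plain NumPy, with the gradient of the MSE loss written out by hand rather than obtained from TF. The data, the true parameters w_true and b_true, and all hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, batch = 1000, 2, 10
w_true, b_true = np.array([[2.0], [-1.0]]), 0.5   # hypothetical ground truth
X = rng.normal(size=(n, d))
y = X @ w_true + b_true + rng.normal(scale=0.01, size=(n, 1))

w, b, lr = np.zeros((d, 1)), 0.0, 0.03
for epoch in range(3):
    order = rng.permutation(n)                 # step 1: random sample order
    for start in range(0, n, batch):
        idx = order[start:start + batch]       # indices of this mini-batch
        Xb, yb = X[idx], y[idx]
        err = Xb @ w + b - yb                  # prediction minus observation
        grad_w = Xb.T @ err / len(idx)         # step 2: gradient w.r.t. w
        grad_b = err.mean()                    #         gradient w.r.t. b
        w -= lr * grad_w                       #         parameter update
        b -= lr * grad_b
    # step 3: the loops repeat steps 1 and 2

print(w.ravel(), b)
```

After three passes over the data, the learned w and b should be close to the assumed true values.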

First, let's look at the code for random sample selection:

def data_iter(features, labels, mini_batch):
  '''
  Data iteration function.
  Args:
    - features: feature matrix, n x d
    - labels: sample labels, n x 1
    - mini_batch: number of samples per mini-batch
  >>> mini_batch = 100
  >>> for X, y in data_iter(features, labels, mini_batch):
  >>>   # do gradient descent
  '''
  features = np.array(features)
  labels = np.array(labels)
  indices = list(range(len(features)))
  random.shuffle(indices)
  for i in range(0, len(indices), mini_batch):
    j = np.array(indices[i:min(i+mini_batch, len(features))])
    yield features[j], labels[j]
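A minimal standard-library sketch of the same shuffle-then-slice idea shows that the batches cover every sample exactly once and that the last batch may be smaller. The helper batch_indices and its seed parameter are hypothetical names used only here.

```python
import random

def batch_indices(num_samples, mini_batch, seed=0):
    """Yield shuffled index batches that together cover every sample once."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    for i in range(0, num_samples, mini_batch):
        # Slicing past the end is safe, so the last batch is simply shorter
        yield indices[i:i + mini_batch]

batches = list(batch_indices(num_samples=25, mini_batch=10))
print([len(batch) for batch in batches])   # batch sizes: [10, 10, 5]
```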

Next, let’s look at the code for updating model parameters:

def sgd(params, lr):
  '''
  Compute gradients and update model parameters.
  Uses the global tape t and loss l from the training loop.
  Args: params: model parameters, here [w, b]; lr: learning rate
  '''
  for param in params:
    param.assign_sub(lr * t.gradient(l, param))
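For reference, the gradients that t.gradient computes for this model can also be written in closed form. This assumes the loss $L(w, b) = \frac{1}{2n}(y - \hat{y})^\top (y - \hat{y})$ with $\hat{y} = Xw + b$; the symbols $\eta$ for the learning rate lr and $\mathbf{1}$ for the all-ones vector are notation introduced here.

$$\frac{\partial L}{\partial w} = -\frac{1}{n} X^\top (y - \hat{y}), \qquad \frac{\partial L}{\partial b} = -\frac{1}{n} \mathbf{1}^\top (y - \hat{y})$$

$$w \leftarrow w - \eta \frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta \frac{\partial L}{\partial b}$$

Each call to sgd applies one such update, with $n$ equal to the mini-batch size.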

That’s the key code. Let’s string them together:

# Generate simulated data
# 1000 samples, 2-d features
num_samples = 1000
num_dim = 2
# True weights and bias
w_real = [2, 3.4]
b_real = 4.2
# Generate features that follow a normal distribution with a standard deviation of 1
features = tf.random.normal((num_samples, num_dim), stddev=1)
labels = features[:,0]*w_real[0] + features[:,1]*w_real[1] + b_real
# Add noise to the labels
labels += tf.random.normal(labels.shape, stddev=0.01)
# Learning rate, number of epochs
lr = 0.03
num_epochs = 3
# Initialize model parameters
w = tf.Variable(tf.random.normal([num_dim, 1], stddev=0.01))
b = tf.Variable(tf.zeros(1,))
mini_batch = 10
# Start training
for i in range(num_epochs):
    for X, y in data_iter(features, labels, mini_batch):
        # Record the forward computation on the tape so gradients can be taken;
        # persistent=True because sgd calls t.gradient once per parameter
        with tf.GradientTape(persistent=True) as t:
            t.watch([w, b])
            # Calculate the loss of this mini-batch
            l = squared_loss(y, linear_reg(X, w, b), mini_batch)
        # Calculate gradients and update parameters
        sgd([w, b], lr)
    # Calculate the total loss after this epoch
    train_loss = squared_loss(labels, linear_reg(features, w, b), len(features))
    print('epoch %d, loss %f' % (i + 1, tf.reduce_mean(train_loss)))

Simple implementation

The code above follows the principles of linear regression step by step. The steps are clear but tedious. In fact, TF provides a rich library of components that you can call directly, which greatly improves efficiency. Let's rewrite the code above with the components that TF provides.

First, use Keras to define a fully connected network with only one layer. Here you do not need to define the model parameters yourself:

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow import initializers as init

model = keras.Sequential()
model.add(layers.Dense(1, kernel_initializer=init.RandomNormal(stddev=0.01)))

Next, set the loss function to MSE:

from tensorflow import losses

loss = losses.MeanSquaredError()

Set the optimization algorithm to SGD:

from tensorflow.keras import optimizers

trainer = optimizers.SGD(learning_rate=0.03)

The code for drawing random mini-batches from the dataset is as follows:

from tensorflow import data as tfdata

batch_size = 10
dataset = tfdata.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(len(features)).batch(batch_size)

As you can see, building the model only requires setting a few configuration items, without writing the model logic yourself. Putting the code above together:

import tensorflow as tf
from tensorflow import data as tfdata
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow import initializers as init
from tensorflow import losses
from tensorflow.keras import optimizers

# features and labels are the simulated data generated in the previous section
# Set network structure: one fully connected layer, initialize model parameters
model = keras.Sequential()
model.add(layers.Dense(1, kernel_initializer=init.RandomNormal(stddev=0.01)))
# Loss function: MSE
loss = losses.MeanSquaredError()
# Optimization strategy: stochastic gradient descent
trainer = optimizers.SGD(learning_rate=0.03)
# Set the dataset and the number of samples per mini-batch
batch_size = 10
dataset = tfdata.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(len(features)).batch(batch_size)

num_epochs = 3
for epoch in range(1, num_epochs+1):
    # Calculate in mini-batches
    for (batch, (X, y)) in enumerate(dataset):
        with tf.GradientTape() as tape:
            # Loss calculation; the loss takes (y_true, y_pred), and y is
            # reshaped to match the (batch_size, 1) prediction
            y_hat = model(X, training=True)
            l = loss(tf.reshape(y, y_hat.shape), y_hat)
        # Calculate gradients and update parameters
        grads = tape.gradient(l, model.trainable_variables)
        trainer.apply_gradients(zip(grads, model.trainable_variables))

    # Total loss after this epoch
    y_hat = model(features)
    l = loss(tf.reshape(labels, y_hat.shape), y_hat)
    print('epoch %d, loss: %f' % (epoch, l.numpy()))
# Output model parameters
print(model.get_weights())

The code above can be copied and run directly, provided that features and labels from the previous section are defined and the dependent libraries are installed. Give it a try.

Summary

This article implements a simple linear regression model with TF2.0 in two ways:

  1. A from-scratch version that follows the basic steps of defining the model, the loss function, and the iterative algorithm. It is a neural network in the broad sense: small, but complete.
  2. A more concise version built with the rich components of TF2.0, intended to show how TF2.0 is used.

References:

  • Hands-on Deep Learning (TF2.0 version): Linear regression implemented from scratch
  • Hands-on Deep Learning