Preface

The main idea behind this blog post is to record some common TF2.0 API usage and show how easily and quickly a neural network can be built with tf.keras.

1. Let’s start with TF.Keras

With it, we can easily build the network model we want, stacking layer upon layer like building blocks. However, deep networks suffer from problems such as vanishing gradients, so being able to build a model is only the start; other techniques are needed to optimize its performance. For an introduction to the Fashion-MNIST dataset, see its page on GitHub.

2. Commonly used optimization methods for image classification

  • 1. Normalization (standardization) of image data: accelerates network convergence. Intuitively, with standardized features the loss contours are concentric circles and gradient descent heads straight to the center, while with unstandardized features the contours are irregular and the descent path zigzags.

  • 2. Data augmentation: link (see the first sketch after this list)
  • 3. Network hyperparameter search: find the best model hyperparameters, mainly via grid search, random search, genetic algorithms, and heuristic search
  • 4. Prevent model overfitting by adding dropout (forgetting) layers, regularization, early stopping, and similar methods (see the second sketch after this list)
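
As a minimal illustration of point 2 (my own sketch, not from the original post), Keras ships an ImageDataGenerator that yields randomly transformed copies of the training images on the fly; the parameter values below are arbitrary examples, not tuned ones:

# Data augmentation sketch (illustrative values, not tuned)
from tensorflow import keras
import numpy as np

datagen=keras.preprocessing.image.ImageDataGenerator(
    rotation_range=10,       # rotate randomly by up to 10 degrees
    width_shift_range=0.1,   # shift horizontally by up to 10%
    height_shift_range=0.1,  # shift vertically by up to 10%
    zoom_range=0.1)          # zoom in/out by up to 10%

# Fashion-MNIST images need a channel axis first: (N,28,28) -> (N,28,28,1)
# flow() then yields augmented batches that can be fed to training:
# batches=datagen.flow(x_train[...,np.newaxis],y_train,batch_size=32)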

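And a minimal sketch of point 4 (again my own illustration): an L2-regularized Dense layer, a Dropout layer, and the same EarlyStopping callback used in the training code further below:

# Overfitting prevention sketch (illustrative values)
from tensorflow import keras

model_sketch=keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28,28]),
    keras.layers.Dense(300,activation="relu",
                       kernel_regularizer=keras.regularizers.l2(1e-4)),  # L2 penalty on the weights
    keras.layers.Dropout(0.5),  # randomly zero half the activations during training
    keras.layers.Dense(10,activation="softmax"),
])

# stop training once val_loss improves by less than 1e-3 for 5 epochs in a row
early_stopping=keras.callbacks.EarlyStopping(patience=5,min_delta=1e-3)
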
3. Implementation code and results

# Import some common libraries; more will be added later
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import sklearn
import os
import sys

# Check the version and make sure it is 2.0
print(tf.__version__)

# Use the built-in keras module to load the data, split it into training, validation, and test sets, and standardize the training data
fashion_mnist=keras.datasets.fashion_mnist
(x_train_all,y_train_all),(x_test,y_test)=fashion_mnist.load_data()
print(x_train_all.shape)
print(y_train_all.shape)
print(x_test.shape)
print(y_test.shape)

# Split into training and validation sets
x_train,x_valid=x_train_all[5000:],x_train_all[:5000]
y_train,y_valid=y_train_all[5000:],y_train_all[:5000]

print(x_train.shape)
print(y_train.shape)
print(x_valid.shape)
print(y_valid.shape)


# Standardization: fit the scaler on the training set only,
# then apply the same transformation to the validation and test sets
from sklearn.preprocessing import StandardScaler

scaler=StandardScaler()
x_train_scaled=scaler.fit_transform(x_train.astype(np.float32).reshape(-1,1)).reshape(-1,28,28)
x_valid_scaled=scaler.transform(x_valid.astype(np.float32).reshape(-1,1)).reshape(-1,28,28)
x_test_scaled=scaler.transform(x_test.astype(np.float32).reshape(-1,1)).reshape(-1,28,28)
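Note that the scaler is fit on the training set only and then reused, via transform, on the validation and test sets, so all three splits are scaled with the training statistics. For image data, a simpler alternative (my note, assuming raw pixel values in [0, 255]) is to just divide by 255:

# Simpler alternative: scale pixels into [0,1] (assumes values in [0,255])
x_train_simple=x_train.astype(np.float32)/255.0
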
Visualize the images and their corresponding labels
# Show multiple images
def show_imgs(n_rows,n_cols,x_data,y_data,class_names):
    assert len(x_data)==len(y_data)  # images and labels must correspond one-to-one
    assert n_rows*n_cols<=len(x_data)  # make sure there is enough data to fill the grid
    plt.figure(figsize=(n_cols*2,n_rows*1.6))
    for row in range(n_rows):
        for col in range(n_cols):
            index=n_cols*row+col   # get the index of the currently displayed image
            plt.subplot(n_rows,n_cols,index+1)
            plt.imshow(x_data[index],cmap="binary",interpolation="nearest")
            plt.axis("off")
            plt.title(class_names[y_data[index]])
    plt.show()
    
class_names=['t-shirt','trouser','pullover','dress','coat','sandal','shirt','sneaker','bag','ankle boot']
show_imgs(5,5,x_train,y_train,class_names)

# Build a network model

model=keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28,28]))
model.add(keras.layers.Dense(300,activation="relu"))
model.add(keras.layers.Dense(100,activation="relu"))
model.add(keras.layers.Dense(10,activation="softmax"))
model.compile(loss="sparse_categorical_crossentropy",optimizer="adam",metrics=["acc"])
model.summary()

Where do the numbers in the Params column of the network summary come from? Each Dense layer computes y = Wx + b. To go from (None, 784) to (None, 300), the weight matrix W must have shape (784, 300) by the rules of matrix multiplication, and the bias b has size 300, so the layer has 784 × 300 + 300 = 235,500 parameters. That's a small detail worth mentioning.
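As a quick sanity check (my own addition), that arithmetic can be verified directly against the layer:

# W has shape (784,300) and b has shape (300,), so:
print(784*300+300)                     # 235500
print(model.layers[1].count_params())  # first Dense layer; should also print 235500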

# Train, keep the best model and training logs, and use early stopping to prevent overfitting
import datetime
current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
logdir = os.path.join('logs', current_time)
output_model=os.path.join(logdir,"fashionmnist_model.h5")
callbacks=[
    keras.callbacks.TensorBoard(log_dir=logdir),
    keras.callbacks.ModelCheckpoint(output_model,save_best_only=True),
    keras.callbacks.EarlyStopping(patience=5,min_delta=1e-3)
          ]


history=model.fit(x_train_scaled,y_train,epochs=30,validation_data=(x_valid_scaled,y_valid),callbacks=callbacks)

Previously I ran TensorBoard and ModelCheckpoint with a folder I named myself and got an error; after searching for the bug on Windows, this (building the log path with os.path.join and a timestamp) is a working solution. Then I opened TensorBoard to have a look. The best model is also saved as an H5 file so it can easily be loaded later.
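For reference (standard usage, not specific to this post), TensorBoard is launched from a terminal against the log directory, and the best model saved by ModelCheckpoint can be reloaded later:

# In a terminal: tensorboard --logdir logs
# Reload the best saved model:
best_model=keras.models.load_model(output_model)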

def plot_learning_curves(history):
    pd.DataFrame(history.history).plot(figsize=(8,5))
    plt.grid()
    plt.gca().set_ylim(0,1)
    plt.show()

plot_learning_curves(history)

These curves show how the loss and accuracy change over the course of training, similar to the TensorBoard plots above.

# Finally, accuracy on the test set
loss,acc=model.evaluate(x_test_scaled,y_test,verbose=0)
print("The loss on the test set is :",loss)
print("The accuracy on the test set is :",acc)

# Get the predicted labels on the test set and visualize them against the real labels
y_pred=model.predict(x_test_scaled)
predict = np.argmax(y_pred,axis=1) 

show_imgs(3,5,x_test,predict,class_names)
show_imgs(3,5,x_test,y_test,class_names)
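Beyond eyeballing the images, a per-class breakdown is also useful. This is my own addition, using scikit-learn (already imported above):

# Precision, recall, and F1 for each clothing class
from sklearn.metrics import classification_report
print(classification_report(y_test,predict,target_names=class_names))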

[Figures: predicted results and real results]

4. Conclusion

As the example above shows, building a model with tf.keras is written as

model=keras.models.Sequential()
model.add(...)
model.add(...)
...
model.compile(...)
model.fit(...)

Of course, it could also be written as

model=keras.models.Sequential([
    ...
])
# There's not much difference

And it can be written with the functional API

inputs=...
hidden1=...(inputs)
...

or as a subclass

class ...:
    ...

But as for the parameters in the model, such as the choice of loss function ("sparse_categorical_crossentropy", "categorical_crossentropy", or "binary_crossentropy") and when each is most appropriate, the choice of activation function in each layer, the choice of optimizer... I haven't given an example of using hyperparameter search to find the optimal model parameters, but I think I'll write a hyperparameter-search example next time.
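As a rule of thumb for the loss-function question (my note, not the author's): sparse_categorical_crossentropy expects integer labels, as used in this post; categorical_crossentropy expects one-hot labels; binary_crossentropy is for binary or multi-label outputs. For example:

# Integer labels (0..9)   -> loss="sparse_categorical_crossentropy"
# One-hot labels          -> loss="categorical_crossentropy"
# Binary / multi-label    -> loss="binary_crossentropy"
y_train_onehot=keras.utils.to_categorical(y_train,num_classes=10)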