The training and evaluation of models is at the core of the entire machine learning workflow. Only by mastering the correct training and evaluation methods, and applying them flexibly, can we carry out experimental analysis and verification quickly, and thereby understand our models more deeply.
Preface
This article focuses on the processes and methods for local training, evaluation, and prediction with Keras models, building on the three approaches to building models with Keras in TensorFlow 2.x. A Keras model can be trained and evaluated in two ways. One is to use the model's built-in APIs, such as model.fit(), model.evaluate(), and model.predict(), to perform the different operations. The other is to customize the training and evaluation process using eager execution and GradientTape objects. Both approaches work the same way for all Keras models; there is no essential difference between them. In general, we prefer the first approach because it is simpler and easier to use, while in some special cases we may consider a custom method for training and evaluation.
Built-in APIs for training and evaluation
Complete end-to-end example
Here is an end-to-end training and evaluation example implemented with the model's built-in APIs; think of it as solving a multi-class classification problem. The functional API is used to build the Keras model here, but the model could also be defined with the Sequential or subclassing approach. The sample code looks like this:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# Train and test data from numpy arrays.
x_train, y_train = (
    np.random.random((60000, 784)),
    np.random.randint(10, size=(60000, 1)),
)
x_test, y_test = (
    np.random.random((10000, 784)),
    np.random.randint(10, size=(10000, 1)),
)

# Reserve 10,000 samples for validation.
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

# Model creation.
inputs = keras.Input(shape=(784, ), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Model compilation.
model.compile(
    # Optimizer
    optimizer=keras.optimizers.RMSprop(),
    # Loss function to minimize
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    # List of metrics to monitor
    metrics=['sparse_categorical_accuracy'],
)

# Model training.
print('# Fit model on training data')
history = model.fit(
    x_train,
    y_train,
    batch_size=64,
    epochs=3,
    # We pass some validation data for monitoring validation loss and metrics
    # at the end of each epoch.
    validation_data=(x_val, y_val),
)
print('\nhistory dict:', history.history)

# Model evaluation.
print('\n# Evaluate on test data')
results = model.evaluate(x_test, y_test, batch_size=128)
print('test loss, test acc:', results)

# Model prediction.
# Generate predictions (probabilities -- the output of the last layer).
print('\n# Generate predictions for 3 samples')
predictions = model.predict(x_test[:3])
print('predictions shape:', predictions.shape)
As the code shows, to complete the whole process of model training and evaluation, the model must first be built. Then the model is compiled to specify the optimizer, loss function, and metrics to be used during training. Next, training and cross-validation (fit) begins; this step requires the training and validation data to be specified in advance, along with parameters such as epochs. The cross-validation operation is triggered automatically at the end of each training epoch. Finally, we evaluate and predict, and judge the quality of the model according to the evaluation and prediction results. That completes a full training and evaluation pass; let's now expand on some of the implementation details in the example.
Model compile
Model compilation should be carried out before model training, because only when we know what target to optimize, how to optimize it, and which indicators to watch can the model be trained and tuned properly. The compile method has three main parameters: loss, which indicates the target to be optimized; optimizer, which determines how that target is optimized; and the optional metrics, which lists the model indicators to track during training. The Keras API already includes many built-in loss functions, optimizers, and metrics that cover most training needs.
Loss classes live mainly under the tf.keras.losses module, which contains a variety of predefined losses, such as binary cross-entropy (BinaryCrossentropy), categorical cross-entropy, and mean squared error. The loss parameter passed to compile can be either a string such as 'binary_crossentropy' or a corresponding loss instance such as tf.keras.losses.BinaryCrossentropy(). When we need to set parameters of the loss function (such as from_logits=True in the example above), we must use the instance form.
Optimizers are available under tf.keras.optimizers, including SGD, Adam, and RMSprop. An optimizer can likewise be passed to the compile method as a string or an instance. Generally, the main optimizer parameter we need to set is the learning rate; other parameters can be set according to the specific implementation of each optimizer, or simply left at their defaults.
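As a minimal sketch of the two forms (the tiny model here is purely illustrative), the same model can be compiled either with string shorthands or with configurable instances:

```python
import tensorflow as tf
from tensorflow import keras

inputs = keras.Input(shape=(8,))
outputs = keras.layers.Dense(3)(inputs)
model = keras.Model(inputs, outputs)

# String form: convenient, but uses default hyperparameters.
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])

# Instance form: lets you set hyperparameters such as the learning rate.
model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
```

The string form is enough when the defaults suffice; switch to instances as soon as any hyperparameter needs tuning.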
Metrics live mainly under the tf.keras.metrics module, including AUC, commonly used in binary classification, and Recall. Similarly, a metric can be passed to the compile method as a string or an instance. Note that the compile method receives a list of metrics, so multiple metrics can be passed at once.
Of course, if the losses in the losses module or the metrics in the metrics module do not meet your requirements, you can also implement your own.
For custom losses there are two approaches. The first is to define a loss function that takes two arguments, y_true and y_pred, computes the loss inside the function, and returns it. The code is as follows:
def basic_loss_function(y_true, y_pred):
    return tf.math.reduce_mean(tf.abs(y_true - y_pred))

model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=basic_loss_function,
)
If you need a loss function that takes more parameters than the two above, you can use the subclassing approach instead: define a class that inherits from tf.keras.losses.Loss and implement its __init__(self, ...) and call(self, y_true, y_pred) methods, much like subclassing layers and models. For example, to implement a weighted binary cross-entropy loss, the code is as follows:
class WeightedBinaryCrossEntropy(keras.losses.Loss):
    """
    Args:
        pos_weight: Scalar to affect the positive labels of the loss function.
        weight: Scalar to affect the entirety of the loss function.
        from_logits: Whether to compute loss from logits or the probability.
        reduction: Type of tf.keras.losses.Reduction to apply to loss.
        name: Name of the loss function.
    """
    def __init__(self,
                 pos_weight,
                 weight,
                 from_logits=False,
                 reduction=keras.losses.Reduction.AUTO,
                 name='weighted_binary_crossentropy'):
        super().__init__(reduction=reduction, name=name)
        self.pos_weight = pos_weight
        self.weight = weight
        self.from_logits = from_logits

    def call(self, y_true, y_pred):
        ce = tf.losses.binary_crossentropy(
            y_true,
            y_pred,
            from_logits=self.from_logits,
        )[:, None]
        ce = self.weight * (ce * (1 - y_true) + self.pos_weight * ce * y_true)
        return ce

model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=WeightedBinaryCrossEntropy(
        pos_weight=0.5,
        weight=2,
        from_logits=True,
    ),
)
Custom metrics can also be defined by subclassing. Define a metric class that inherits from tf.keras.metrics.Metric and implement its four methods: __init__(self), which creates the state variables; update_state(self, y_true, y_pred, sample_weight=None), which updates the state variables; result(self), which returns the final result from the state variables; and reset_states(self), which reinitializes the state variables. For example, to implement a metric that counts true positives in a multi-class setting, the code would be as follows:
class CategoricalTruePositives(keras.metrics.Metric):
    def __init__(self, name='categorical_true_positives', **kwargs):
        super().__init__(name=name, **kwargs)
        self.true_positives = self.add_weight(name='tp', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.reshape(tf.argmax(y_pred, axis=1), shape=(-1, 1))
        values = tf.cast(y_true, 'int32') == tf.cast(y_pred, 'int32')
        values = tf.cast(values, 'float32')
        if sample_weight is not None:
            sample_weight = tf.cast(sample_weight, 'float32')
            values = tf.multiply(values, sample_weight)
        self.true_positives.assign_add(tf.reduce_sum(values))

    def result(self):
        return self.true_positives

    def reset_states(self):
        # The state of the metric will be reset at the start of each epoch.
        self.true_positives.assign(0.)

model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[CategoricalTruePositives()],
)
Some losses are defined inside layers, by calling self.add_loss() in the call method of a custom layer; these are added to the overall loss automatically during training, without manual intervention. By comparing the loss values printed during training before and after adding such a custom loss, you can confirm whether it is included in the overall loss. You can also print model.losses after building the model to see all of the model's losses. Note that regularization losses are built into all Keras layers, so you only need to pass the corresponding regularization arguments when creating a layer; there is no need to call add_loss() in the call method for them.
Likewise for metrics: call self.add_metric() in the call method of a custom layer to add a metric, and it will automatically appear among the overall metrics without manual intervention.
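A minimal sketch of self.add_loss() inside a layer (the layer name and the 1e-2 coefficient are made up for illustration; self.add_metric() follows the same pattern):

```python
import tensorflow as tf
from tensorflow import keras

class ActivityRegularizationLayer(keras.layers.Layer):
    """Pass-through layer that adds a loss proportional to its input activity."""

    def call(self, inputs):
        # Registered losses are collected in layer.losses / model.losses
        # and folded into the overall loss automatically during fit().
        self.add_loss(1e-2 * tf.reduce_sum(inputs))
        return inputs

layer = ActivityRegularizationLayer()
out = layer(tf.ones((2, 4)))  # sum of inputs is 8.0, so the registered loss is 0.08
```

Inspecting layer.losses after the call shows the registered tensor, which is how you can verify the loss really participates in training.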
Models built with the functional API can achieve the same effect by calling model.add_loss() and model.add_metric(). Example code is as follows:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784, ), name='digits')
x1 = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x2 = layers.Dense(64, activation='relu', name='dense_2')(x1)
outputs = layers.Dense(10, name='predictions')(x2)
model = keras.Model(inputs=inputs, outputs=outputs)

model.add_loss(tf.reduce_sum(x1) * 0.1)
model.add_metric(
    keras.backend.std(x1),
    name='std_of_activation',
    aggregation='mean',
)

model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(x_train, y_train, batch_size=64, epochs=1)
If you are compiling a multi-input, multi-output model, you can specify a different loss function and different metrics for each output, as described below.
Model training and validation (fit)
The model is trained by calling the model.fit() method, which takes training data and validation data that can be numpy arrays or dataset objects from the tf.data module. The fit method also accepts parameters such as epochs, batch_size, and steps_per_epoch to control the training process, as well as callbacks that let the model perform additional operations during training, such as TensorBoard logging.
The training and validation data for the model can be of numpy type, and the initial end-to-end example uses numpy arrays as input. Generally, numpy data is used as the input for training and evaluation when the amount of data is small and memory is sufficient.
For numpy data, if the epochs parameter is specified, the total amount of data trained on is the original sample count × epochs.
By default, in one training round (epoch), all of the original samples are trained once, and the next round trains on the same samples again. The number of steps per round is the original sample count / batch_size; if batch_size is not specified, it defaults to 32. Cross-validation is triggered at the end of each training round and is likewise performed on all validation samples; you can specify validation_batch_size to control the batch size of the validation data, and if it is not specified it defaults to batch_size.
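A quick sanity check of the steps-per-epoch arithmetic described above (the sample count is illustrative, matching the 50,000 training samples left after reserving the validation set in the earlier example):

```python
import math

num_samples = 50000   # training samples after reserving the validation set
batch_size = 64

# Each epoch visits every sample once; the last batch may be smaller.
steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)  # 782
```

50000 / 64 = 781.25, so fit runs 782 steps per epoch, with a final batch of 16 samples.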
For numpy data, if the steps_per_epoch parameter is set, specifying the number of steps to train in one round, the next round continues training from the next batch of data, until all epochs end or the training data is exhausted. For training not to end early due to data exhaustion, the total amount of data must be at least steps_per_epoch * epochs * batch_size. You can also set validation_steps, indicating the number of steps used for cross-validation; in that case, make sure the validation set holds at least validation_steps * validation_batch_size samples.
The fit method also provides the validation_split parameter to automatically reserve a fraction of the training set for validation. Its value ranges from 0 to 1; for example, 0.2 means 20% of the training set is used for validation. The fit method takes the last portion of the numpy arrays as the validation set by default.
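A minimal sketch of validation_split in action (the tiny model and random data here are toy placeholders):

```python
import numpy as np
from tensorflow import keras

inputs = keras.Input(shape=(4,))
outputs = keras.layers.Dense(1)(inputs)
model = keras.Model(inputs, outputs)
model.compile(optimizer='sgd', loss='mse')

x = np.random.random((100, 4)).astype('float32')
y = np.random.random((100, 1)).astype('float32')

# The last 20 samples (20%) are held out and never trained on.
history = model.fit(x, y, validation_split=0.2, epochs=1, verbose=0)
```

After fitting, history.history contains val_loss entries computed on the held-out slice.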
From TensorFlow 2.0 on, it is recommended to use dataset objects from the tf.data module as the data input for training and validation, since they can load and preprocess data in a faster, more scalable way.
The training code using a dataset is as follows:
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)

# Now we get a test dataset.
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_dataset = test_dataset.batch(64)

# Since the dataset already takes care of batching,
# we don't pass a `batch_size` argument.
model.fit(train_dataset, epochs=3, validation_data=val_dataset)
result = model.evaluate(test_dataset)
A dataset element is generally a 2-tuple: the first element is the model's input features (a dict or tuple of multiple features if the model has several inputs), and the second is the label.
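A small sketch of this element structure, using a dict of named features as the first tuple element (the feature names and shapes here are made up):

```python
import numpy as np
import tensorflow as tf

features = {
    'feature_a': np.zeros((4, 8), dtype='float32'),
    'feature_b': np.zeros((4, 3), dtype='float32'),
}
labels = np.zeros((4, 1), dtype='float32')

# Each dataset element is a (features, label) pair.
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
example_x, example_y = next(iter(dataset))
```

Iterating yields one (features, label) pair per sample; the dict keys must match the model's input names when used with fit.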
The from_tensor_slices method can generate a dataset directly from numpy arrays, which is convenient and quick and typically used during testing. For other common construction methods, such as building a dataset from TFRecord files or text-line files, refer to the relevant classes under the tf.data module.
A dataset can call built-in methods to preprocess data in advance, such as the shuffle, batch, and repeat operations. Shuffling reduces the probability of overfitting, and it is only a local perturbation: a buffer is first filled with data, batch elements are drawn at random from the buffer during training, and the resulting gaps are filled with subsequent data, achieving a local shuffle. Batching groups the data into batches and is often used to control and tune the model's training speed and effectiveness; since batching is already done on the dataset, the batch_size argument of fit should no longer be provided. Repeat replicates the data to make up for an insufficient data volume; if its count argument is specified, the entire dataset is repeated count times, otherwise it repeats indefinitely, in which case steps_per_epoch must be set or training will never terminate.
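The interplay of these three operations can be sketched on a tiny dataset:

```python
import tensorflow as tf

# 10 elements, shuffled within a buffer, repeated twice, batched by 4.
dataset = tf.data.Dataset.range(10).shuffle(buffer_size=10).repeat(2).batch(4)

# 10 samples x 2 repeats = 20 elements -> 5 batches of 4.
num_batches = sum(1 for _ in dataset)
print(num_batches)  # 5
```

Note the order matters: shuffle before repeat shuffles within each pass over the data, and batch last groups the already repeated stream.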
In the example above, all data in the train dataset is trained in each round, because the dataset is reset at the end of each round and then used again. However, when steps_per_epoch is specified, the dataset is not reset between rounds; iteration continues until all epochs end or the training data is exhausted, so make sure the training data comprises at least steps_per_epoch * epochs * batch_size samples. Similarly, you can specify validation_steps so that validation runs for the given number of steps; at the start of the next validation pass the validation dataset is reset, ensuring the same data is used for each cross-validation. The validation_split parameter does not apply to dataset inputs, because it needs to know the index of each sample, which is difficult under the dataset API.
When steps_per_epoch is not specified, numpy data and dataset data are processed the same way. When it is specified, note the difference in how the two are treated: numpy data is converted to a dataset internally, but that dataset is first repeated epochs times, and it is likewise not reset after each round; training simply continues after the last batch. Assuming the original data amount is n and steps_per_epoch is specified, the difference shows up in the amount of data actually available for training: n * epochs for numpy versus n for a dataset. Refer to the source code for details.
The dataset's map and prefetch methods are also quite practical. The map method takes a function that processes each element of the dataset and returns a new dataset; for example, after reading a text file with TextLineDataset, map can be used to extract some columns of each row as features and others as labels. The prefetch method prepares the data needed for the next training step in advance and keeps it in memory, which reduces the waiting time between training rounds.
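A minimal sketch of map and prefetch on a toy dataset (the feature/label split here is made up for illustration):

```python
import tensorflow as tf

raw = tf.data.Dataset.range(5)

# map: turn each raw element into a (feature, label) pair.
pairs = raw.map(lambda x: (tf.cast(x, tf.float32), x % 2))

# prefetch: prepare upcoming elements while the current batch trains;
# AUTOTUNE lets the runtime choose the buffer size.
pipeline = pairs.batch(2).prefetch(tf.data.AUTOTUNE)

features, labels = next(iter(pipeline))
```

The same pattern applies when the map function parses TFRecord examples or splits CSV lines into columns.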
In addition to training and validation data, you can also pass sample weights (sample_weight) and class weights (class_weight) to the fit method. These two parameters are usually used to deal with class imbalance: by giving higher weight to samples of under-represented classes, each class's contribution to the overall loss becomes more balanced.
For numpy input data, the two parameters above can be used directly. For example, to give class 5 a higher weight, you can use the following code:
import numpy as np

# Here's the same example using `class_weight`
class_weight = {
    0: 1., 1: 1., 2: 1., 3: 1., 4: 1.,
    # Set weight "2" for class "5",
    # making this class 2x more important
    5: 2.,
    6: 1., 7: 1., 8: 1., 9: 1.,
}
print('Fit with class weight')
model.fit(x_train, y_train,
          class_weight=class_weight,
          batch_size=64,
          epochs=4)

# Here's the same example using `sample_weight` instead:
sample_weight = np.ones(shape=(len(y_train), ))
sample_weight[y_train == 5] = 2.
print('\nFit with sample weight')
model.fit(
    x_train,
    y_train,
    sample_weight=sample_weight,
    batch_size=64,
    epochs=4,
)
For dataset inputs, the two parameters above cannot be used directly. Instead, include the sample weights in the dataset itself, so that each element is a 3-tuple of the form (input_batch, target_batch, sample_weight_batch). The sample code looks like this:
sample_weight = np.ones(shape=(len(y_train), ))
sample_weight[y_train == 5] = 2.

# Create a Dataset that includes sample weights
# (3rd element in the return tuple).
train_dataset = tf.data.Dataset.from_tensor_slices((
    x_train,
    y_train,
    sample_weight,
))

# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)
model.fit(train_dataset, epochs=3)
There are special moments in model training, such as the end of a batch or of an epoch, when additional processing can assist training; the cross-validation introduced above is one such operation. Others include automatically lowering the learning rate when training stagnates (the loss oscillating around some value) so that the loss keeps decreasing and convergence improves; saving the model's weights during training so that a restarted run can continue from the existing weights, reducing training time; and recording the model's loss and metrics at the end of each round for later TensorBoard analysis. These operations are an indispensable part of model training. They can all be implemented as callbacks under the tf.keras.callbacks module and passed to the fit method as a list for their respective purposes.
The following uses EarlyStopping as an example to illustrate how callbacks are used. Here, when the cross-validation loss val_loss improves by less than 1e-2 for at least 2 consecutive epochs, training is stopped early. The sample code is shown below:
callbacks = [
    keras.callbacks.EarlyStopping(
        # Stop training when `val_loss` is no longer improving
        monitor='val_loss',
        # "no longer improving" being defined as "no better than 1e-2 less"
        min_delta=1e-2,
        # "no longer improving" being further defined as "for at least 2 epochs"
        patience=2,
        verbose=1,
    )
]
model.fit(
    x_train,
    y_train,
    epochs=20,
    batch_size=64,
    callbacks=callbacks,
    validation_split=0.2,
)
Some commonly used callbacks should be understood and mastered, such as ModelCheckpoint, used to save model weights; TensorBoard, used to record metric information; and ReduceLROnPlateau, used to reduce the learning rate when training stagnates. For more callbacks, see the implementations in the tf.keras.callbacks module.
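A sketch of how these common callbacks might be configured together (the file paths and thresholds below are illustrative, not prescribed):

```python
from tensorflow import keras

callbacks = [
    # Save the best weights seen so far, judged by validation loss.
    keras.callbacks.ModelCheckpoint(
        filepath='ckpt/best.weights.h5',
        save_weights_only=True,
        save_best_only=True,
        monitor='val_loss',
    ),
    # Halve the learning rate when val_loss plateaus for 3 epochs.
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=3,
        min_lr=1e-5,
    ),
    # Write logs for TensorBoard.
    keras.callbacks.TensorBoard(log_dir='./logs'),
]
# Pass the list as `model.fit(..., callbacks=callbacks)`.
```

All three observe the same val_loss metric, so they require validation data (validation_data or validation_split) to be supplied to fit.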
You can of course also define your own callbacks. The subclass must inherit from the tf.keras.callbacks.Callback class and implement its built-in methods as needed. For example, to record the loss value at the end of each training batch, you can use the following code:
class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs):
        self.losses = []

    def on_batch_end(self, batch, logs):
        self.losses.append(logs.get('loss'))
Prior to TensorFlow 2.0, ModelCheckpoint content and TensorBoard content were recorded at the same time and stored in the same folder, whereas in the Keras API from 2.0 on they can be specified separately through different callbacks. Among the log files, those containing the checkpoint keyword are generally checkpoint files, and those containing the events.out.tfevents keyword are generally TensorBoard files.
Multi-input, multi-output models
Consider a multi-input, multi-output model consisting of two inputs and two outputs: score_output, representing a score, and class_output, representing a classification. The example code is as follows:
from tensorflow import keras
from tensorflow.keras import layers

image_input = keras.Input(shape=(32, 32, 3), name='img_input')
timeseries_input = keras.Input(shape=(None, 10), name='ts_input')

x1 = layers.Conv2D(3, 3)(image_input)
x1 = layers.GlobalMaxPooling2D()(x1)
x2 = layers.Conv1D(3, 3)(timeseries_input)
x2 = layers.GlobalMaxPooling1D()(x2)
x = layers.concatenate([x1, x2])

score_output = layers.Dense(1, name='score_output')(x)
class_output = layers.Dense(5, name='class_output')(x)

model = keras.Model(
    inputs=[image_input, timeseries_input],
    outputs=[score_output, class_output],
)
During model compilation, a single loss obviously cannot cover outputs whose losses must be computed differently, so loss can be specified as a list, each element of which corresponds to one output. Example code is as follows:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[
        keras.losses.MeanSquaredError(),
        keras.losses.CategoricalCrossentropy(from_logits=True),
    ],
    loss_weights=[1, 1],
)
In this case the model's optimization target is the sum of the individual loss values. If you want to give different weights to the different losses, set the loss_weights parameter, which receives a list of scalar coefficients used to weight the loss values of the model's different outputs. If only a single loss is specified for the model, it is applied to every output, which works when all of the model's outputs share the same loss computation.
Similarly, you can specify multiple metrics for the model. Note that since the metrics parameter is itself a list, metrics for multiple outputs should be specified as a two-dimensional list. Example code is as follows:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[
        keras.losses.MeanSquaredError(),
        keras.losses.CategoricalCrossentropy(from_logits=True),
    ],
    metrics=[
        [
            keras.metrics.MeanAbsolutePercentageError(),
            keras.metrics.MeanAbsoluteError(),
        ],
        [keras.metrics.CategoricalAccuracy()],
    ],
)
For outputs with explicit names, loss and metrics can be set in a dictionary manner. Example code is as follows:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={
        'score_output': keras.losses.MeanSquaredError(),
        'class_output': keras.losses.CategoricalCrossentropy(from_logits=True),
    },
    metrics={
        'score_output': [
            keras.metrics.MeanAbsolutePercentageError(),
            keras.metrics.MeanAbsoluteError(),
        ],
        'class_output': [
            keras.metrics.CategoricalAccuracy(),
        ],
    },
)
Loss may also not be specified for outputs that are used only for prediction. Example code is as follows:
# List loss version
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[
        None,
        keras.losses.CategoricalCrossentropy(from_logits=True),
    ],
)

# Or dict loss version
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={
        'class_output': keras.losses.CategoricalCrossentropy(from_logits=True),
    },
)
For training a multi-input/output model, data can be provided in the same way as in its compile method: multiple inputs and outputs can be specified as lists or dictionaries.
Example code for numpy data is as follows:
# Generate dummy numpy data
img_data = np.random.random_sample(size=(100, 32, 32, 3))
ts_data = np.random.random_sample(size=(100, 20, 10))
score_targets = np.random.random_sample(size=(100, 1))
class_targets = np.random.random_sample(size=(100, 5))

# Fit on lists
model.fit(
    x=[img_data, ts_data],
    y=[score_targets, class_targets],
    batch_size=32,
    epochs=3,
)

# Alternatively, fit on dicts
model.fit(
    x={
        'img_input': img_data,
        'ts_input': ts_data,
    },
    y={
        'score_output': score_targets,
        'class_output': class_targets,
    },
    batch_size=32,
    epochs=3,
)
Sample code for dataset type data is as follows:
# Generate dummy dataset data from numpy
train_dataset = tf.data.Dataset.from_tensor_slices((
    (img_data, ts_data),
    (score_targets, class_targets),
))

# Alternatively generate with dicts
train_dataset = tf.data.Dataset.from_tensor_slices((
    {
        'img_input': img_data,
        'ts_input': ts_data,
    },
    {
        'score_output': score_targets,
        'class_output': class_targets,
    },
))

train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)
model.fit(train_dataset, epochs=3)
Customize the training process
If you don't want to use the fit and evaluate methods provided by Model and would rather customize the training and evaluation process with lower-level APIs, GradientTape can help. During backpropagation, a deep neural network needs to compute the derivative (also known as the gradient) of the loss with respect to the weight matrices in order to update them and approach the optimal solution. GradientTape computes these derivatives automatically, without manual differentiation: it is essentially a recorder that records the operations of the forward pass and computes derivatives from that record.
The model construction process is no different from before; the changes are mainly in the training part. The sample code is as follows:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# Get the model.
inputs = keras.Input(shape=(784, ), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)

epochs = 3
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch, ))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        # Open a GradientTape to record the operations run
        # during the forward pass, which enables autodifferentiation.
        with tf.GradientTape() as tape:
            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch
            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Update training metric.
        train_acc_metric(y_batch_train, logits)

        # Log every 200 batches.
        if step % 200 == 0:
            print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
            print('Seen so far: %s samples' % ((step + 1) * 64))

    # Display metrics at the end of each epoch.
    train_acc = train_acc_metric.result()
    print('Training acc over epoch: %s' % (float(train_acc), ))
    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_states()

    # Run a validation loop at the end of each epoch.
    for x_batch_val, y_batch_val in val_dataset:
        val_logits = model(x_batch_val)
        # Update val metrics
        val_acc_metric(y_batch_val, val_logits)
    val_acc = val_acc_metric.result()
    val_acc_metric.reset_states()
    print('Validation acc: %s' % (float(val_acc), ))
Note the `with tf.GradientTape() as tape` block, which records the forward pass. The tape.gradient method then computes the derivative (gradient) of the loss with respect to all of the model's trainable weights (model.trainable_weights), and the optimizer uses these gradients to update the weight matrices.
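The mechanics can be seen on a single scalar: for y = x², the tape should recover dy/dx = 2x:

```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x               # forward pass is recorded on the tape
grad = tape.gradient(y, x)  # dy/dx = 2x = 6.0
print(float(grad))  # 6.0
```

In the training loop this same mechanism runs over every trainable weight instead of one variable.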
In the training loop above, the training metrics are updated on each batch (update_state()), the metric result (result()) is printed at the end of each epoch, and the metric is then reset (reset_states()) before the next epoch's recording begins; the cross-validation metrics are handled the same way.
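The update_state()/result() lifecycle can also be exercised in isolation (the predictions below are made up so that both are correct; in a loop, the reset would follow at each epoch boundary):

```python
from tensorflow import keras

m = keras.metrics.SparseCategoricalAccuracy()

# Accumulate statistics batch by batch.
m.update_state([[1], [2]], [[0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])

# Read out the aggregated result; here both predictions are correct.
acc = float(m.result())
print(acc)  # 1.0
```

Calling update_state again would keep accumulating into the same running average until the metric is reset.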
Note that, unlike training with the model's built-in APIs, losses defined within the model, such as regularization losses and losses added via add_loss, are not automatically added to loss_fn in custom training. To include them, the custom training flow must be modified to add all of the model's losses, obtained via model.losses, to the loss being optimized. The sample code looks like this:
with tf.GradientTape() as tape:
    logits = model(x_batch_train)
    loss_value = loss_fn(y_batch_train, logits)
    # Add extra losses created during this forward pass:
    loss_value += sum(model.losses)

grads = tape.gradient(loss_value, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))
References
- Keras model training and evaluation
- Keras model FIT method
- tf.data.Dataset