Premade Estimators

This document introduces the TensorFlow programming environment and shows you how to solve the Iris classification problem in TensorFlow.

Prerequisites

To use the sample code in this document, you need to do a few things:

  • Install TensorFlow
  • If you installed TensorFlow with virtualenv or Anaconda, activate your TensorFlow environment.
  • Install or upgrade pandas with the following command:

      pip install pandas

Getting the sample code

Follow these steps to get the sample code we will use:

  1. Download the TensorFlow Models repository from GitHub by typing:

     git clone https://github.com/tensorflow/models
  2. Go to the folder containing the samples:

     cd models/samples/core/get_started/

The program we will use in this document is premade_estimator.py. This program uses iris_data.py to get training data.

Running the program

You run TensorFlow programs the same way you run any other Python program. For example:

python premade_estimator.py

The program outputs some training logs followed by predictions for the test set. For example, the first line of the following output shows that the model assigns a 99.6% probability that the first example in the test set is a Setosa. Since that example really is a Setosa, the prediction is correct.

...
Prediction is "Setosa" (99.6%), expected "Setosa"

Prediction is "Versicolor" (99.8%), expected "Versicolor"

Prediction is "Virginica" (97.9%), expected "Virginica"

If the program produces errors instead of predictions, check the following:

  • Did you install TensorFlow correctly?
  • Are you using the correct version of TensorFlow?
  • Did you activate the environment in which TensorFlow is installed? (This applies only to certain installation methods.)

The program stack

Before diving into the details of the program itself, let's look at its programming environment. TensorFlow provides a programming stack consisting of multiple API layers.

We strongly recommend writing TensorFlow programs with the following APIs:

  • Estimators
  • Datasets

Classifying irises: an overview

The sample program in this document builds and tests a model that classifies Iris flowers into three species based on the size of their sepals and petals.

From left to right: Iris setosa (by Radomil, CC BY-SA 3.0), Iris versicolor (by Dlanglois, CC BY-SA 3.0), and Iris virginica (by Frank Mayfield, CC BY-SA 2.0).

The data set

The Iris dataset contains four features and one label. The four features describe the following botanical characteristics of an individual Iris flower:

  • Sepal length
  • Sepal width
  • Petal length
  • Petal width

Our model represents these features as float32 numerical data.

The label indicates the type of Iris, which must be one of the following:

  • Iris setosa (0)
  • Iris versicolor (1)
  • Iris virginica (2)

Our model represents these labels as int32 data.

The table below shows three examples from the dataset:

Sepal length  Sepal width  Petal length  Petal width  Species (label)
5.1           3.3          1.7           0.5          0 (Setosa)
5.0           2.3          3.3           1.0          1 (Versicolor)
6.4           2.8          5.6           2.2          2 (Virginica)
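As a minimal sketch of the representation described above, the three table rows can be held in NumPy arrays using the stated dtypes: float32 for the four features and int32 for the label. The array layout and the SPECIES list here are illustrative assumptions; the program itself loads its data with pandas.

```python
import numpy as np

# Species names indexed by label value (0, 1, 2), as listed above.
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# The three example rows from the table, one row per flower.
# Columns: sepal length, sepal width, petal length, petal width.
features = np.array([
    [5.1, 3.3, 1.7, 0.5],
    [5.0, 2.3, 3.3, 1.0],
    [6.4, 2.8, 5.6, 2.2],
], dtype=np.float32)

# One integer label per row.
labels = np.array([0, 1, 2], dtype=np.int32)

print(features.shape, features.dtype)   # (3, 4) float32
print([SPECIES[i] for i in labels])     # ['Setosa', 'Versicolor', 'Virginica']
```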

Algorithm

The program trains a deep neural network classifier model with the following topology:

  • Two hidden layers.
  • Each hidden layer has 10 nodes.

The following figure illustrates the features, the hidden layers, and the predictions (not all of the nodes in the hidden layers are shown).
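The 4 → 10 → 10 → 3 topology can be sketched as a NumPy forward pass. This assumes ReLU activations in the hidden layers (the default activation of DNNClassifier) and a softmax over the three output scores. The weights are random placeholders rather than trained values, so only the shapes and the fact that each row of probabilities sums to 1 are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 4 input features, two hidden layers of 10 nodes, 3 classes.
LAYER_SIZES = [4, 10, 10, 3]

# Placeholder (untrained) weights and zero biases for each dense layer.
weights = [rng.normal(scale=0.1, size=(m, n)).astype(np.float32)
           for m, n in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]
biases = [np.zeros(n, dtype=np.float32) for n in LAYER_SIZES[1:]]

def relu(x):
    return np.maximum(x, 0.0)

def softmax(logits):
    # Subtract each row's max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def forward(x):
    h = x
    # Hidden layers: affine transform followed by ReLU.
    for w, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ w + b)
    # Output layer: one score (logit) per class, turned into probabilities.
    logits = h @ weights[-1] + biases[-1]
    return softmax(logits)

x = np.array([[5.1, 3.3, 1.7, 0.5],
              [6.4, 2.8, 5.6, 2.2]], dtype=np.float32)
probs = forward(x)
print(probs.shape)  # (2, 3)
```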

Inference

Running a trained model on an unlabeled example produces three predictions: the probability that the flower belongs to each of the three Iris species. These three predictions sum to 1.0. For example, the prediction for an unlabeled example might look like this:

  • Probability of Iris Setosa is 0.03
  • Probability of Iris Versicolor is 0.95
  • Probability of Iris Virginica is 0.02

This prediction indicates a 95% probability that the given unlabeled example is an Iris Versicolor.
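The reasoning in that example can be checked with a few lines of plain Python. The probabilities are the ones listed above. Picking the class with the highest probability is the usual way to turn such a distribution into a single predicted species.

```python
import math

# Class probabilities from the example prediction above.
probabilities = {
    'Iris Setosa': 0.03,
    'Iris Versicolor': 0.95,
    'Iris Virginica': 0.02,
}

# The probabilities form a distribution, so they sum to 1.0
# (math.isclose tolerates floating-point rounding).
total = sum(probabilities.values())
print(math.isclose(total, 1.0))  # True

# The predicted species is the one with the highest probability.
predicted = max(probabilities, key=probabilities.get)
print(f'{predicted}: {probabilities[predicted]:.0%}')  # Iris Versicolor: 95%
```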

An overview of programming with Estimators

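An Estimator is TensorFlow's high-level representation of a complete model. Every Estimator, whether premade or custom, is a class derived from tf.estimator.Estimator.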

To write a TensorFlow program based on premade Estimators, you perform the following tasks:

  • Create one or more input functions.
  • Define the model's feature columns.
  • Instantiate an Estimator, specifying the feature columns and various hyperparameters.
  • Call one or more methods on the Estimator object, passing the appropriate input function as the source of the data.

Let's see how these tasks are implemented for Iris classification.

Creating an input function

You must create input functions to supply data for training, evaluation, and prediction.

An input function is a function that returns a tf.data.Dataset object, which outputs the following two-element tuple:

  • features – A Python dictionary in which:

       Each key is the name of a feature.
       Each value is an array containing all of that feature's values.
  • label – An array containing the value of the label for every example.

Just to demonstrate the format of an input function, here is a simple implementation:

import numpy as np

def input_evaluation_set():
    features = {'SepalLength': np.array([6.4, 5.0]),
                'SepalWidth':  np.array([2.8, 2.3]),
                'PetalLength': np.array([5.6, 3.3]),
                'PetalWidth':  np.array([2.2, 1.0])}
    labels = np.array([2, 1])
    return features, labels

Your input function may generate the features dictionary and label list in any way you like. However, we recommend using TensorFlow's Dataset API, which can parse all sorts of data. At a high level, the Dataset API consists of the following classes:

  • Dataset – Base class containing methods to create and transform datasets. It also allows you to initialize a dataset from data in memory or from a Python generator.
  • TextLineDataset – Reads lines from text files.
  • TFRecordDataset – Reads records from TFRecord files.
  • FixedLengthRecordDataset – Reads fixed-size records from binary files.
  • Iterator – Provides a way to access one dataset element at a time.

The Dataset API handles many common cases for you. For example, it makes it easy to read records from a large set of files in parallel and join them into a single stream.

To keep this example simple, we will load the data with pandas and build our input pipeline from that in-memory data.
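Loading with pandas might look like the following minimal sketch. It assumes a CSV file with one header row followed by four feature columns and an integer label column. The names load_data and CSV_COLUMN_NAMES and the inline CSV text are hypothetical stand-ins, not the contents of iris_data.py. The resulting train_x is a DataFrame, so train_x.keys() yields the feature names that the feature-column code later iterates over.

```python
import io

import pandas as pd

# Hypothetical column names; the file's own header row is replaced by these.
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth',
                    'PetalLength', 'PetalWidth', 'Species']

# Hypothetical in-memory stand-in for a training CSV file.
csv_text = """sepal_l,sepal_w,petal_l,petal_w,label
5.1,3.3,1.7,0.5,0
5.0,2.3,3.3,1.0,1
6.4,2.8,5.6,2.2,2
"""

def load_data(csv_file, y_name='Species'):
    # header=0 discards the first line and uses CSV_COLUMN_NAMES instead.
    train = pd.read_csv(csv_file, names=CSV_COLUMN_NAMES, header=0)
    # pop() removes the label column in place, leaving only the features.
    train_x, train_y = train, train.pop(y_name)
    return train_x, train_y

train_x, train_y = load_data(io.StringIO(csv_text))
print(list(train_x.keys()))
# ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
```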

Here is the input function this program uses for training, which you can find in iris_data.py:

import tensorflow as tf

def train_input_fn(features, labels, batch_size):
    """An input function for training."""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    return dataset.shuffle(1000).repeat().batch(batch_size)
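To make the effect of shuffle, repeat, and batch concrete, here is a plain-Python sketch of their semantics. It is not how tf.data is implemented. It assumes the shuffle buffer holds the whole dataset (true when the buffer of 1000 exceeds the number of examples), so each pass over the data is a full reshuffle. For readability it yields lists of (features, label) pairs, whereas TensorFlow yields a dictionary of batched tensors plus a batched label tensor.

```python
import itertools
import random

def slices(features, labels):
    """Yield one (feature_dict, label) pair per example, like from_tensor_slices."""
    keys = list(features)
    for i, label in enumerate(labels):
        yield {key: features[key][i] for key in keys}, label

def shuffle_repeat_batch(examples, batch_size, seed=0):
    """Plain-Python sketch of dataset.shuffle(...).repeat().batch(batch_size)."""
    examples = list(examples)
    rng = random.Random(seed)

    def endless_shuffled_stream():
        # repeat() with no count cycles forever; each pass is reshuffled.
        while True:
            order = examples[:]
            rng.shuffle(order)
            yield from order

    stream = endless_shuffled_stream()
    while True:
        # batch() groups consecutive elements, so a batch may span two passes.
        yield list(itertools.islice(stream, batch_size))

features = {'SepalLength': [6.4, 5.0, 5.1], 'PetalWidth': [2.2, 1.0, 0.5]}
labels = [2, 1, 0]

batches = shuffle_repeat_batch(slices(features, labels), batch_size=2)
first_four = [next(batches) for _ in range(4)]
seen_labels = [label for batch in first_four for _, label in batch]
print(seen_labels)
```

Because the stream never ends, training has to be stopped externally, which is why the train call is given a number of steps.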

Defining feature columns

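A feature column is an object describing how the model should use raw input data from the features dictionary. When you build an Estimator model, you pass it a list of feature columns that describes each of the features you want the model to use. The tf.feature_column module provides many options for representing data to the model.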

For Iris, the four raw features are numeric values, so we will build a list of feature columns that tells the Estimator model to represent each of the four features as a 32-bit floating-point value. The code to create the feature columns is therefore:

# Feature columns describe how to use the input.
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
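Conceptually, four numeric columns turn each example's features dictionary into a vector of four 32-bit floats, which feeds the network's input layer. The NumPy sketch below mimics only that effect. It stacks the columns in the order the keys are listed, which is an assumption made for the sketch; it is not how tf.feature_column is implemented, and TensorFlow may order the columns differently internally.

```python
import numpy as np

def to_input_matrix(features, column_names):
    # Each numeric column contributes one float32 value per example,
    # so N columns produce a matrix of shape (num_examples, N).
    return np.stack(
        [np.asarray(features[name], dtype=np.float32) for name in column_names],
        axis=1)

features = {'SepalLength': [6.4, 5.0], 'SepalWidth': [2.8, 2.3],
            'PetalLength': [5.6, 3.3], 'PetalWidth': [2.2, 1.0]}

# Mirrors the loop above: one column per key in the features dictionary.
column_names = [key for key in features.keys()]
x = to_input_matrix(features, column_names)
print(x.shape, x.dtype)  # (2, 4) float32
```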

Feature columns can be far more sophisticated than the simple numeric columns shown here.

Now that we have a definition of how the model represents the original features, we can start building an Estimator.

Instantiate an Estimator

The Iris problem is a classic classification problem. Fortunately, TensorFlow provides several premade classifier Estimators, including:

  • tf.estimator.DNNClassifier
  • tf.estimator.DNNLinearCombinedClassifier
  • tf.estimator.LinearClassifier

For the Iris problem, tf.estimator.DNNClassifier seems like the best choice. Here is how we instantiate this Estimator:

# Build a DNN with 2 hidden layers and 10 nodes in each hidden layer.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],
    # The model must choose between 3 classes.
    n_classes=3)
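As a worked check of how large this network is, the trainable parameters can be counted by hand. The sketch assumes fully connected layers, each with a weight matrix and one bias per output node, and one output score per class. For 4 features, hidden_units=[10, 10], and n_classes=3 that gives (4×10+10) + (10×10+10) + (10×3+3) = 50 + 110 + 33 = 193. The helper count_parameters is hypothetical.

```python
def count_parameters(n_features, hidden_units, n_classes):
    # Each dense layer has (inputs * outputs) weights plus one bias per output.
    sizes = [n_features] + list(hidden_units) + [n_classes]
    return sum(m * n + n for m, n in zip(sizes[:-1], sizes[1:]))

print(count_parameters(4, [10, 10], 3))  # 193
```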

Training, evaluation and prediction

Now that we have an Estimator object, we can call methods that do the following:

  • Train the model
  • Evaluate the trained model
  • Use the trained model to make predictions

Train the model

Train the model by calling the Estimator's train method as follows:

# Train the model.
classifier.train(
    input_fn=lambda: iris_data.train_input_fn(train_x, train_y, args.batch_size),
    steps=args.train_steps)

Here we wrap the input_fn call in a lambda to capture its arguments while still providing the Estimator with the input function it expects, one that takes no arguments. The steps argument tells the method to stop training after a given number of training steps.
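The lambda trick is ordinary Python closure behavior, which the sketch below illustrates. Both train_input_fn and call_without_arguments here are hypothetical stand-ins: the first simply echoes its arguments, and the second plays the role of an Estimator calling a zero-argument input function. functools.partial is shown as an equivalent alternative to the lambda.

```python
import functools

def train_input_fn(features, labels, batch_size):
    # Hypothetical stand-in for iris_data.train_input_fn: echoes its arguments.
    return features, labels, batch_size

def call_without_arguments(input_fn):
    # Hypothetical stand-in for an Estimator invoking a zero-argument input_fn.
    return input_fn()

train_x, train_y, batch_size = {'SepalLength': [6.4]}, [2], 100

# Both forms capture the arguments and expose a callable that takes none.
via_lambda = call_without_arguments(
    lambda: train_input_fn(train_x, train_y, batch_size))
via_partial = call_without_arguments(
    functools.partial(train_input_fn, train_x, train_y, batch_size))
print(via_lambda == via_partial)  # True
```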

Evaluate the trained model

Now that the model has been trained, we can get some statistics on its performance. The following code evaluates the accuracy of the trained model on the test data:

# Evaluate the model.
eval_result = classifier.evaluate(
    input_fn=lambda: iris_data.eval_input_fn(test_x, test_y, args.batch_size))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

Unlike our call to the train method, we did not pass the steps argument to evaluate. Our eval_input_fn yields only a single epoch of data, so evaluation stops once that data is exhausted.
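Without repeat, a batched dataset is finite: one pass over n examples with batch size b yields ceil(n / b) batches, and the last batch may be smaller. The sketch below shows this in plain Python with a hypothetical 30-example test set and a hypothetical batch size of 8. The helper batch_once is illustrative and not part of iris_data.py.

```python
def batch_once(examples, batch_size):
    """Yield batches from a single pass over the data (no repeat)."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]

examples = list(range(30))  # hypothetical 30-example test set
batches = list(batch_once(examples, batch_size=8))
print([len(batch) for batch in batches])  # [8, 8, 8, 6]
```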

Running this code produces the following output (or something similar):

Test set accuracy: 0.967
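The reported accuracy is the fraction of test examples whose predicted species matches the label. As a worked illustration with made-up labels, and assuming a test set of 30 examples, 29 correct predictions give 29/30 ≈ 0.967. The accuracy helper below is hypothetical and simply spells out that ratio.

```python
def accuracy(predicted, expected):
    # Fraction of examples whose predicted label matches the true label.
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

# Hypothetical labels: 29 of 30 predictions are correct.
expected = [0] * 10 + [1] * 10 + [2] * 10
predicted = expected[:-1] + [1]
print('Test set accuracy: {:0.3f}'.format(accuracy(predicted, expected)))
# Test set accuracy: 0.967
```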

Making predictions with the trained model

We now have a trained model that produces good evaluation results. We can use it to predict the species of Iris flowers from some unlabeled measurements. As with training and evaluation, we make predictions with a single function call:

# Generate predictions from the model.
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

predictions = classifier.predict(
    input_fn=lambda: iris_data.eval_input_fn(predict_x,
                                             labels=None,
                                             batch_size=args.batch_size))

The predict method returns a Python iterable that yields a dictionary of prediction results for each example. The following code prints a few predictions and their probabilities:

template = ('\nPrediction is "{}" ({:.1f}%), expected "{}"')

for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print(template.format(iris_data.SPECIES[class_id],
                          100 * probability, expec))
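To see how the loop reads each prediction dictionary, it can be run against hand-made stand-ins. The dictionaries below are hypothetical: they contain only the two keys the loop uses ('class_ids', a one-element sequence holding the predicted class index, and 'probabilities', one value per class), with invented probabilities. SPECIES is assumed to list species names in label order, as iris_data.SPECIES is used above.

```python
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Hypothetical prediction dictionaries containing only the keys the loop reads.
predictions = [
    {'class_ids': [0], 'probabilities': [0.996, 0.003, 0.001]},
    {'class_ids': [1], 'probabilities': [0.001, 0.998, 0.001]},
]
expected = ['Setosa', 'Versicolor']

template = '\nPrediction is "{}" ({:.1f}%), expected "{}"'

lines = []
for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]
    lines.append(template.format(SPECIES[class_id], 100 * probability, expec))

print(''.join(lines))
```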

Running the above code produces the following result:

...
Prediction is "Setosa" (99.6%), expected "Setosa"

Prediction is "Versicolor" (99.8%), expected "Versicolor"

Prediction is "Virginica" (97.9%), expected "Virginica"

Summary

Premade Estimators are an effective way to create standard models quickly.

Now that you have started writing TensorFlow programs, consider the following material:

  • Checkpoints
  • Datasets: a quick look
  • Creating custom Estimators