Premade Estimators
This document introduces the TensorFlow programming environment and shows you how to use TensorFlow to solve the Iris classification problem.
Prerequisites
To use the sample code in this document, you need to do a few things:
- Install TensorFlow
- If you installed TensorFlow with virtualenv or Anaconda, activate your TensorFlow environment.
- Install or upgrade pandas with the following command:
pip install pandas
Getting the sample code
Follow these steps to get the sample code we will use:
- Clone the TensorFlow Models repository from GitHub by entering the following command:
git clone https://github.com/tensorflow/models
- Change to the directory containing the samples:
cd models/samples/core/get_started/
The program we will use in this document is premade_estimator.py. This program uses iris_data.py to get training data.
Running the program
Running a TensorFlow program is the same as running any other Python program. For example:
python premade_estimator.py
The program outputs some training logs followed by predictions for the test set. For example, the first line of the following output shows that the model assigns a 99.6% probability to the first example in the test set being a Setosa. Since the expected label in the test set is indeed Setosa, the prediction is a good one.
...
Prediction is "Setosa" (99.6%), expected "Setosa"
Prediction is "Versicolor" (99.8%), expected "Versicolor"
Prediction is "Virginica" (97.9%), expected "Virginica"
If the program raises an error instead, check the following:
- Have you installed TensorFlow correctly?
- Are you using the correct version of TensorFlow?
- Have you activated the environment in which you installed TensorFlow? (This applies only to certain installation methods.)
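The first two checks can be partly automated. The following is a minimal sketch that uses only the Python standard library (Python 3.8 or later is assumed). The helper name check_package is hypothetical and is not part of the sample code.

```python
import importlib.util
from importlib import metadata


def check_package(name):
    """Return (installed, version) for a package name.

    `version` is None when the package cannot be found, or when no
    distribution metadata is registered under exactly this name.
    """
    if importlib.util.find_spec(name) is None:
        return False, None
    try:
        return True, metadata.version(name)
    except metadata.PackageNotFoundError:
        return True, None


for pkg in ("tensorflow", "pandas"):
    installed, version = check_package(pkg)
    print(pkg, "installed" if installed else "NOT installed", version or "")
```

Run it with the same Python interpreter and environment that you use to run premade_estimator.py, since each environment has its own set of installed packages.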
The programming stack
Before diving into the details of the program, let’s look at the programming environment. As shown below, TensorFlow provides a programming stack consisting of multiple API layers:
We strongly recommend writing TensorFlow programs with the following APIs:
- Estimators, which represent a complete model.
- Datasets, which build a data input pipeline.
Classifying irises: an overview
The sample program in this document builds and tests a model that classifies Iris flowers into three species based on the size of their sepals and petals.
From left to right: Iris setosa (by Radomil, CC BY-SA 3.0), Iris versicolor (by Dlanglois, CC BY-SA 3.0), and Iris virginica (by Frank Mayfield, CC BY-SA 2.0).
The data set
The Iris dataset contains four features and one label. The four features identify the following botanical characteristics of individual Iris flowers:
- Sepal length
- Sepal width
- Petal length
- Petal width
Our model represents these features as float32 numerical data.
The label identifies the Iris species, which must be one of the following:
- Iris setosa (0)
- Iris versicolor (1)
- Iris virginica (2)
Our model represents the label as int32 categorical data.
The table below shows three examples from the dataset:
Sepal length | Sepal width | Petal length | Petal width | Species (label) |
---|---|---|---|---|
5.1 | 3.3 | 1.7 | 0.5 | 0 (Setosa) |
5.0 | 2.3 | 3.3 | 1.0 | 1 (Versicolor) |
6.4 | 2.8 | 5.6 | 2.2 | 2 (Virginica) |
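As a rough illustration of the float32 and int32 representations mentioned above, the sketch below packs the three table rows into fixed-size binary records with Python's struct module. This is not how TensorFlow stores tensors internally. The record layout and the names RECORD and rows are assumptions made for the example.

```python
import struct

# Four float32 features followed by one int32 label, little-endian.
RECORD = struct.Struct("<4fi")

# (sepal length, sepal width, petal length, petal width, label)
rows = [
    (5.1, 3.3, 1.7, 0.5, 0),  # Setosa
    (5.0, 2.3, 3.3, 1.0, 1),  # Versicolor
    (6.4, 2.8, 5.6, 2.2, 2),  # Virginica
]

for row in rows:
    packed = RECORD.pack(*row)
    *features, label = RECORD.unpack(packed)
    # float32 keeps roughly 7 significant decimal digits, so the decoded
    # features match the originals only approximately.
    print(len(packed), [round(f, 4) for f in features], label)
```

Each example occupies 20 bytes: four 4-byte floats plus one 4-byte integer.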
The algorithm
The program trains a deep neural network classifier model with the following topology:
- Two hidden layers.
- Each hidden layer has 10 nodes.
The following figure shows the features, hidden layers, and predictions in the neural network (not all of the nodes in the hidden layers are shown):
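To make the topology concrete, here is a pure-Python sketch of a forward pass through a network with 4 inputs, two hidden layers of 10 nodes, and 3 outputs. The random weights, the ReLU activation, and the helper names are assumptions for illustration. The sample program leaves all of these details to the Estimator.

```python
import random

LAYER_SIZES = [4, 10, 10, 3]  # features, hidden, hidden, classes


def make_layers(sizes, seed=0):
    """Create one (weights, biases) pair per connection between layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Return the 3 output scores for one example."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:           # hidden layers only
            x = [max(0.0, v) for v in x]  # ReLU
    return x


layers = make_layers(LAYER_SIZES)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
logits = forward(layers, [5.1, 3.3, 1.7, 0.5])
print(n_params, len(logits))  # prints: 193 3
```

The count of 193 trainable parameters follows from the layer sizes: (4 × 10 + 10) + (10 × 10 + 10) + (10 × 3 + 3).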
Inference
Running the trained model on an unlabeled example yields three predictions: the probability that the flower belongs to each of the three Iris species. These three predictions sum to 1.0. For example, the prediction for an unlabeled example might look like this:
- Iris Setosa: 0.03
- Iris Versicolor: 0.95
- Iris Virginica: 0.02
This prediction indicates a 95% probability that the given unlabeled example is an Iris Versicolor.
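Probabilities of this kind are obtained by normalizing the network's three output scores so that they are positive and sum to 1.0, typically with a softmax. The following is a minimal sketch, with hypothetical scores chosen to reproduce the numbers above. The SPECIES list is local to the example.

```python
import math

SPECIES = ['Setosa', 'Versicolor', 'Virginica']


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1.0."""
    shifted = [s - max(scores) for s in scores]  # for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output scores: log-probabilities map back to themselves.
scores = [math.log(0.03), math.log(0.95), math.log(0.02)]
probabilities = softmax(scores)

# The predicted class is the one with the highest probability.
best = max(range(len(probabilities)), key=probabilities.__getitem__)
print(SPECIES[best], round(probabilities[best], 2))
```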
An overview of programming with Estimators
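An Estimator is TensorFlow's high-level representation of a complete model. Every Estimator is a class derived from tf.estimator.Estimator.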
To write a TensorFlow program based on a premade Estimator, you perform the following tasks:
- Create one or more input functions.
- Define the model's feature columns.
- Instantiate an Estimator, specifying the feature columns and various hyperparameters.
- Call one or more methods on the Estimator object, passing the appropriate input function as the source of the data.
Let’s see how these tasks are implemented for Iris classification.
Creating an input function
You must create input functions to supply data for training, evaluation, and prediction.
An input function returns a tf.data.Dataset object that yields the following two-element tuple:
- features: a Python dictionary in which each key is the name of a feature and each value is an array containing all of that feature's values.
- label: an array containing the label values for all examples.
To demonstrate the format, here is a simple implementation of an input function:
import numpy as np

def input_evaluation_set():
    features = {'SepalLength': np.array([6.4, 5.0]),
                'SepalWidth':  np.array([2.8, 2.3]),
                'PetalLength': np.array([5.6, 3.3]),
                'PetalWidth':  np.array([2.2, 1.0])}
    labels = np.array([2, 1])
    return features, labels
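The features dictionary is column-oriented: position i in every array, together with labels[i], describes one example. The sketch below uses plain lists instead of NumPy arrays to show that correspondence. The helper name iter_examples is hypothetical.

```python
def iter_examples(features, labels):
    """Yield (feature_dict, label) pairs, one per example."""
    n = len(labels)
    # Every feature column must supply exactly one value per label.
    for name, column in features.items():
        if len(column) != n:
            raise ValueError(
                '%s has %d values, expected %d' % (name, len(column), n))
    for i in range(n):
        yield {name: column[i] for name, column in features.items()}, labels[i]


features = {'SepalLength': [6.4, 5.0],
            'SepalWidth':  [2.8, 2.3],
            'PetalLength': [5.6, 3.3],
            'PetalWidth':  [2.2, 1.0]}
labels = [2, 1]

for example, label in iter_examples(features, labels):
    print(example, label)
```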
Your input function may generate the features dictionary and the label list in any way you like. However, we recommend using TensorFlow’s Dataset API, which can parse all sorts of data. At a high level, the Dataset API consists of the following classes:
The individual members are:
- Dataset: base class containing methods to create and transform datasets. It also allows you to initialize a dataset from in-memory data or from a Python generator.
- TextLineDataset: reads lines from text files.
- TFRecordDataset: reads records from TFRecord files.
- FixedLengthRecordDataset: reads fixed-size records from binary files.
- Iterator: provides a way to access one dataset element at a time.
The Dataset API can handle many common cases for you. For example, it makes it easy to read records in parallel from a large collection of files and join them into a single stream.
To keep this example simple, we load the data with pandas and build our input pipeline from this in-memory data.
Here is the input function used for training in this program, which can be found in iris_data.py:
import tensorflow as tf

def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    return dataset.shuffle(1000).repeat().batch(batch_size)
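Conceptually, shuffle randomizes the order of the examples, repeat restarts the dataset indefinitely, and batch groups consecutive examples. The generator below imitates that behavior in plain Python for a dataset small enough to fit in the shuffle buffer. It is a sketch of the semantics only, not of how tf.data is implemented, and the function name is hypothetical.

```python
import itertools
import random


def shuffle_repeat_batch(examples, batch_size, seed=None):
    """Yield batches forever, reshuffling the examples on every pass."""
    rng = random.Random(seed)

    def stream():
        while True:                 # repeat()
            order = list(examples)
            rng.shuffle(order)      # shuffle(), buffer >= dataset size
            yield from order

    it = stream()
    while True:                     # batch(batch_size)
        yield list(itertools.islice(it, batch_size))


# Hypothetical (feature, label) pairs standing in for real examples.
examples = [(6.4, 2), (5.0, 1), (5.1, 0), (5.9, 1), (6.9, 2)]
batches = list(itertools.islice(shuffle_repeat_batch(examples, 4, seed=0), 3))
for batch in batches:
    print(batch)
```

Because the stream never ends, a batch can straddle two passes over the data, and whoever consumes the batches has to decide when to stop.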
Defining feature columns
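A feature column describes how the model should use raw input data from the features dictionary. The tf.feature_column module provides many options for representing data to the model.
For Iris, the four raw features are numeric values, so we build a list of feature columns that tells the Estimator model to represent each of the four features as a 32-bit floating-point value. The code that creates the feature columns is: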
# Feature columns describe how to use the input.
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
Now that we have a definition of how the model represents the original features, we can start building an Estimator.
Instantiate an Estimator
The Iris problem is a classic classification problem. Fortunately, TensorFlow provides several premade classifier Estimators, including:
- tf.estimator.DNNClassifier
- tf.estimator.DNNLinearCombinedClassifier
- tf.estimator.LinearClassifier
For the Iris problem, tf.estimator.DNNClassifier seems like the best choice. Here’s how we instantiate this Estimator:
# Build a DNN with 2 hidden layers and 10 nodes in each hidden layer.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],
    # The model must choose between 3 classes.
    n_classes=3)
Training, evaluation and prediction
Now that we have an Estimator object, we can call methods that do the following:
- Train the model
- Evaluate the trained model
- Use the trained model to make predictions
Train the model
Call Estimator’s train method to train the model as follows:
# Train the model.
classifier.train(
    input_fn=lambda: iris_data.train_input_fn(train_x, train_y, args.batch_size),
    steps=args.train_steps)
Here we wrap the call to our input function in a lambda to capture its arguments, because the Estimator expects an input function that takes no arguments. The steps argument tells the method to stop training after the given number of training steps.
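The lambda pattern can be seen in isolation in the following sketch. The stand-in functions are hypothetical and involve no TensorFlow. They only mimic an API that calls input_fn() with no arguments.

```python
import functools


def train_input_fn(features, labels, batch_size):
    """Stand-in input function that just reports what it received."""
    return 'batches of %d from %d examples' % (batch_size, len(labels))


def train(input_fn, steps):
    """Stand-in for a method that calls input_fn with no arguments."""
    return input_fn(), steps


train_x = {'SepalLength': [6.4, 5.0]}
train_y = [2, 1]

# The lambda captures train_x, train_y, and the batch size.
result = train(input_fn=lambda: train_input_fn(train_x, train_y, 100),
               steps=1000)
print(result)

# functools.partial builds an equivalent zero-argument callable.
same = train(input_fn=functools.partial(train_input_fn, train_x, train_y, 100),
             steps=1000)
print(same == result)
```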
Evaluate trained models
Now that the model has been trained, we can get some data on its performance. The following code snippet evaluates the model’s accuracy on the test set:
# Evaluate the model.
eval_result = classifier.evaluate(
    input_fn=lambda: iris_data.eval_input_fn(test_x, test_y, args.batch_size))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
Unlike our call to the train method, we did not pass the steps argument to evaluate. Our eval_input_fn yields only a single epoch of data.
Running this code yields the following output (or something similar):
Test set accuracy: 0.967
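Accuracy is the fraction of test examples whose predicted class matches the label. The sketch below recomputes such a figure from hypothetical predictions (29 correct out of 30) and formats it the same way as the snippet above. The helper name and the data are assumptions made for the example.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the prediction equals the label."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Hypothetical labels and predictions: 30 examples, one mistake.
expected = [0] * 10 + [1] * 10 + [2] * 10
predicted = list(expected)
predicted[15] = 2  # a Versicolor misclassified as Virginica

eval_result = {'accuracy': accuracy(predicted, expected)}
print('Test set accuracy: {accuracy:0.3f}'.format(**eval_result))
```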
Make predictions with the trained model
We now have a trained model that produces good evaluation results. We can use it to predict the species of an Iris flower from unlabeled measurements. As with training and evaluation, we make predictions with a single function call:
# Generate predictions from the model.
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

predictions = classifier.predict(
    input_fn=lambda: iris_data.eval_input_fn(predict_x,
                                             labels=None,
                                             batch_size=args.batch_size))
The predict method returns a Python iterable that yields a dictionary of prediction results for each example. The following code prints a few predictions and their probabilities:
template = ('\nPrediction is "{}" ({:.1f}%), expected "{}"')

for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print(template.format(iris_data.SPECIES[class_id],
                          100 * probability, expec))
Running the above code produces the following result:
...
Prediction is "Setosa" (99.6%), expected "Setosa"
Prediction is "Versicolor" (99.8%), expected "Versicolor"
Prediction is "Virginica" (97.9%), expected "Virginica"
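Each dictionary yielded by predict is indexed as in the loop above. The sketch below runs the same formatting logic on a hand-written stand-in for one prediction. The dictionary values and the local SPECIES list are assumptions, and real results hold NumPy arrays rather than lists.

```python
# Assumed to mirror iris_data.SPECIES.
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Hand-written stand-in for one element yielded by classifier.predict.
pred_dict = {
    'class_ids': [1],
    'probabilities': [0.001, 0.998, 0.001],
}

template = 'Prediction is "{}" ({:.1f}%), expected "{}"'

class_id = pred_dict['class_ids'][0]  # index of the most likely class
probability = pred_dict['probabilities'][class_id]
line = template.format(SPECIES[class_id], 100 * probability, 'Versicolor')
print(line)
```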
Conclusion
Premade Estimators are an effective way to quickly create standard models.
Now that you have started writing TensorFlow programs, consider the following material:
- Checkpoints
- Datasets Quick Start
- Creating Custom Estimators