Contents
Introduction
Using the Python API
Using the C++ API
-
Introduction
Visual recognition seems to be a particularly simple task for our brains. Humans can easily tell a lion from a jaguar, read a road sign or recognize faces. But these are actually hard problems for a computer to deal with: they just seem easy because the brain is so good at understanding images.
In the past few years, the field of machine learning has made great strides on these problems. In particular, models called deep convolutional neural networks have proven able to handle difficult visual recognition tasks, matching or even exceeding human performance in some domains.
Researchers have demonstrated steady progress in computer vision by validating their work against ImageNet, an academic benchmark for computer vision. Successive models, each improving on the last and each setting a new state of the art, include QuocNet, AlexNet, Inception (GoogLeNet), and BN-Inception-v2. Researchers inside and outside Google have published papers describing all of these models, but the results are still hard to reproduce. We are now taking the next step and releasing the code for image recognition with our latest model, Inception-v3.
Inception-v3 was trained for the ImageNet Large Scale Visual Recognition Challenge using data from 2012.
Inception-v3 tackles the standard computer vision task in which the model tries to classify every image into 1,000 categories, such as 'zebra', 'Dalmatian', and 'dishwasher'.
To compare the models, we examine how often the model fails to predict the correct answer as one of its five most likely guesses, known as the "top-5 error rate". AlexNet achieved a top-5 error rate of 15.3% on the 2012 validation data set; Inception (GoogLeNet), BN-Inception-v2, and Inception-v3 achieve 6.67%, 4.9%, and 3.46%, respectively.
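As a minimal sketch of how this metric can be computed (the predictions and labels arrays here are hypothetical names for illustration, not part of the benchmark tooling):

import numpy as np

def top5_error_rate(predictions, labels):
    # predictions: (num_images, 1000) array of per-class scores.
    # labels: (num_images,) array of correct class IDs.
    top5 = np.argsort(predictions, axis=1)[:, -5:]  # five most likely classes
    # An image counts as an error when its true label is not among the five.
    misses = [label not in row for row, label in zip(top5, labels)]
    return float(np.mean(misses))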
How do humans perform on the ImageNet challenge? Andrej Karpathy attempted to measure his own performance and reported a top-5 error rate of 5.1% in a blog post.
This section describes how to use Inception-v3: how to classify images into those 1,000 categories with Python or C++. It also discusses how to extract higher-level features from the model for reuse in other vision tasks.
-
Using the Python API
When you run it for the first time, classify_image.py downloads the trained model from tensorflow.org, so you will need about 200 MB of free disk space.
First, clone the TensorFlow models repository from GitHub:
git clone https://github.com/tensorflow/models.git
cd models/tutorials/image/imagenet
The contents of classify_image.py are as follows:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import os.path
import re
import sys
import tarfile

import numpy as np
from six.moves import urllib
import tensorflow as tf

FLAGS = None

# pylint: disable=line-too-long
DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
# pylint: enable=line-too-long


class NodeLookup(object):
  """Converts integer node ID's to human readable labels."""

  def __init__(self,
               label_lookup_path=None,
               uid_lookup_path=None):
    if not label_lookup_path:
      label_lookup_path = os.path.join(
          FLAGS.model_dir, 'imagenet_2012_challenge_label_map_proto.pbtxt')
    if not uid_lookup_path:
      uid_lookup_path = os.path.join(
          FLAGS.model_dir, 'imagenet_synset_to_human_label_map.txt')
    self.node_lookup = self.load(label_lookup_path, uid_lookup_path)

  def load(self, label_lookup_path, uid_lookup_path):
    """Loads a human readable English name for each softmax node.

    Args:
      label_lookup_path: string UID to integer node ID.
      uid_lookup_path: string UID to human-readable string.

    Returns:
      dict from integer node ID to human-readable string.
    """
    if not tf.gfile.Exists(uid_lookup_path):
      tf.logging.fatal('File does not exist %s', uid_lookup_path)
    if not tf.gfile.Exists(label_lookup_path):
      tf.logging.fatal('File does not exist %s', label_lookup_path)

    # Loads mapping from string UID to human-readable string
    proto_as_ascii_lines = tf.gfile.GFile(uid_lookup_path).readlines()
    uid_to_human = {}
    p = re.compile(r'[n\d]*[ \S,]*')
    for line in proto_as_ascii_lines:
      parsed_items = p.findall(line)
      uid = parsed_items[0]
      human_string = parsed_items[2]
      uid_to_human[uid] = human_string

    # Loads mapping from string UID to integer node ID.
    node_id_to_uid = {}
    proto_as_ascii = tf.gfile.GFile(label_lookup_path).readlines()
    for line in proto_as_ascii:
      if line.startswith('  target_class:'):
        target_class = int(line.split(': ')[1])
      if line.startswith('  target_class_string:'):
        target_class_string = line.split(': ')[1]
        node_id_to_uid[target_class] = target_class_string[1:-2]

    # Loads the final mapping of integer node ID to human-readable string
    node_id_to_name = {}
    for key, val in node_id_to_uid.items():
      if val not in uid_to_human:
        tf.logging.fatal('Failed to locate: %s', val)
      name = uid_to_human[val]
      node_id_to_name[key] = name

    return node_id_to_name

  def id_to_string(self, node_id):
    if node_id not in self.node_lookup:
      return ''
    return self.node_lookup[node_id]


def create_graph():
  """Creates a graph from saved GraphDef file and returns a saver."""
  # Creates graph from saved graph_def.pb.
  with tf.gfile.FastGFile(os.path.join(
      FLAGS.model_dir, 'classify_image_graph_def.pb'), 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')


def run_inference_on_image(image):
  """Runs inference on an image.

  Args:
    image: Image file name.

  Returns:
    Nothing
  """
  if not tf.gfile.Exists(image):
    tf.logging.fatal('File does not exist %s', image)
  image_data = tf.gfile.FastGFile(image, 'rb').read()

  # Creates graph from saved GraphDef.
  create_graph()

  with tf.Session() as sess:
    # Some useful tensors:
    # 'softmax:0': A tensor containing the normalized prediction across
    #   1000 labels.
    # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
    #   float description of the image.
    # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
    #   encoding of the image.
    # Runs the softmax tensor by feeding the image_data as input to the graph.
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
    predictions = np.squeeze(predictions)

    # Creates node ID --> English string lookup.
    node_lookup = NodeLookup()

    top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
    for node_id in top_k:
      human_string = node_lookup.id_to_string(node_id)
      score = predictions[node_id]
      print('%s (score = %.5f)' % (human_string, score))


def maybe_download_and_extract():
  """Download and extract model tar file."""
  dest_directory = FLAGS.model_dir
  if not os.path.exists(dest_directory):
    os.makedirs(dest_directory)
  filename = DATA_URL.split('/')[-1]
  filepath = os.path.join(dest_directory, filename)
  if not os.path.exists(filepath):
    def _progress(count, block_size, total_size):
      sys.stdout.write('\r>> Downloading %s %.1f%%' % (
          filename, float(count * block_size) / float(total_size) * 100.0))
      sys.stdout.flush()
    filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress)
    print()
    statinfo = os.stat(filepath)
    print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
  tarfile.open(filepath, 'r:gz').extractall(dest_directory)


def main(_):
  maybe_download_and_extract()
  image = (FLAGS.image_file if FLAGS.image_file else
           os.path.join(FLAGS.model_dir, 'cropped_panda.jpg'))
  run_inference_on_image(image)


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  # classify_image_graph_def.pb:
  #   Binary representation of the GraphDef protocol buffer.
  # imagenet_synset_to_human_label_map.txt:
  #   Map from synset ID to a human readable string.
  # imagenet_2012_challenge_label_map_proto.pbtxt:
  #   Text representation of a protocol buffer mapping a label to synset ID.
  parser.add_argument(
      '--model_dir',
      type=str,
      default=r'C:\Users\Administrator\Desktop\imagenet',
      help="""\
      Path to classify_image_graph_def.pb,
      imagenet_synset_to_human_label_map.txt, and
      imagenet_2012_challenge_label_map_proto.pbtxt.\
      """
  )
  parser.add_argument(
      '--image_file',
      type=str,
      default=r'C:\Users\Administrator\Desktop\imagenet\cropped_panda.jpg',
      help='Absolute path to image file.'
  )
  parser.add_argument(
      '--num_top_predictions',
      type=int,
      default=5,
      help='Display this many predictions.'
  )
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
Run the following command:
python classify_image.py
The above command classifies the supplied image of a panda.
If the model runs correctly, the script produces the following output:
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.88493)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00878)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00317)
custard apple (score = 0.00149)
earthstar (score = 0.00127)
If you want to classify other JPEG images, simply change the --image_file argument.
If you downloaded the model data to a different directory, point --model_dir at the directory you used.
On Windows, you can download this program directly from GitHub: github.com/tensorflow/…
However, the model download often fails, so here I am sharing a demo I have already debugged: download.csdn.net/download/m0…
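The introduction also mentioned extracting higher-level features for other vision tasks. Here is a minimal sketch of that, assuming create_graph() from the listing above has already imported the model into the default graph; it fetches the 'pool_3:0' tensor named in the script's comments, which holds a 2048-float description of the image:

import numpy as np
import tensorflow as tf

def extract_features(image_path):
    # Read the raw JPEG bytes, exactly as run_inference_on_image() does.
    image_data = tf.gfile.FastGFile(image_path, 'rb').read()
    with tf.Session() as sess:
        # 'pool_3:0' is the next-to-last layer: 2048 floats per image.
        pool3 = sess.graph.get_tensor_by_name('pool_3:0')
        features = sess.run(pool3, {'DecodeJpeg/contents:0': image_data})
    return np.squeeze(features)  # a (2048,) vector for reuse in other tasks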
-
Using the C++ API
You can also run the same Inception-v3 model with C++, for use in a production environment. To do this, download the archive containing the GraphDef that defines the model, as follows (run from the root of the TensorFlow codebase):
curl -L "https://storage.googleapis.com/download.tensorflow.org/models/inception_v3_2016_08_28_frozen.pb.tar.gz" |
tar -C tensorflow/examples/label_image/data -xz
Next, we need to compile the C++ binary that contains the code to load and run the graph. If you have followed the instructions for downloading the TensorFlow source on your platform, you should be able to build the example by running this command from your shell terminal:
bazel build tensorflow/examples/label_image/...
The above command should create an executable binary, which can then be run as follows:
bazel-bin/tensorflow/examples/label_image/label_image
This uses the default sample image that ships with the framework, and the output should look something like this:
I tensorflow/examples/label_image/main.cc:206] military uniform (653): 0.834306
I tensorflow/examples/label_image/main.cc:206] mortarboard (668): 0.0218692
I tensorflow/examples/label_image/main.cc:206] academic gown (401): 0.0103579
I tensorflow/examples/label_image/main.cc:206] pickelhaube (716): 0.00800814
I tensorflow/examples/label_image/main.cc:206] bulletproof vest (466): 0.00535088
In this example, we used the default image of Admiral Grace Hopper, and as you can see, the network correctly identifies her as wearing a military uniform, with a score of 0.8.
For details of how it works, see the tensorflow/examples/label_image/main.cc file (www.tensorflowers.cn/t/7558). We hope this code helps you integrate TensorFlow into your own applications, so we will walk through the main functions step by step:
Command-line flags control where the files are loaded from and the properties of the input image. The model expects square 299x299 RGB images, so the input_width and input_height flags should be set to those values. We also need to scale the pixel values from integers between 0 and 255 to the floating-point values the graph operates on. The scaling is controlled by the input_mean and input_std flags: each pixel value has input_mean subtracted from it and is then divided by input_std.
These values may look strange, but they are simply what the original model author used for the training input images. If you have trained your own graph, just adjust the values to match whatever you used during training.
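A minimal NumPy sketch of that scaling (the mean and std values here are illustrative assumptions, not verified defaults; use whatever your model was trained with):

import numpy as np

input_mean, input_std = 128.0, 128.0  # assumed values for illustration
# A stand-in 299x299 RGB image with integer pixel values in [0, 255].
pixels = np.random.randint(0, 256, size=(299, 299, 3)).astype(np.float32)
# Subtract input_mean from each pixel value, then divide by input_std.
normalized = (pixels - input_mean) / input_std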
You can look at the ReadTensorFromImageFile() function to see how these flags are applied to an image.
// Given an image file name, read in the data, try to decode it as an image,
// resize it to the requested size, and then scale the values as desired.
Status ReadTensorFromImageFile(string file_name, const int input_height,
                               const int input_width, const float input_mean,
                               const float input_std,
                               std::vector<Tensor>* out_tensors) {
  tensorflow::GraphDefBuilder b;
We start by creating a GraphDefBuilder, an object we can use to specify a model to run or load.
  string input_name = "file_reader";
  string output_name = "normalized";
  tensorflow::Node* file_reader =
      tensorflow::ops::ReadFile(tensorflow::ops::Const(file_name, b.opts()),
                                b.opts().WithName(input_name));
Nodes are then created for the small model we want to run: they load, resize, and scale the pixel values to get the result the main model expects as its input. The first node we create is just a Const op that holds a tensor with the file name of the image to load. That tensor is then passed as the first input to the ReadFile op. You may have noticed that we pass b.opts() as the last argument to all the op-creation functions; this ensures that the node is added to the model definition held in the GraphDefBuilder. We also name the ReadFile operator by making a WithName() call on b.opts(). Naming the node is not strictly necessary (the node is automatically assigned a name if you don't), but it does make debugging easier.
  // Now try to figure out what kind of file it is and decode it.
  const int wanted_channels = 3;
  tensorflow::Node* image_reader;
  if (tensorflow::StringPiece(file_name).ends_with(".png")) {
    image_reader = tensorflow::ops::DecodePng(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("png_reader"));
  } else {
    // Assume if it's not a PNG then it must be a JPEG.
    image_reader = tensorflow::ops::DecodeJpeg(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("jpeg_reader"));
  }
  // Now cast the image data to float so we can do normal math on it.
  tensorflow::Node* float_caster = tensorflow::ops::Cast(
      image_reader, tensorflow::DT_FLOAT, b.opts().WithName("float_caster"));
  // The convention for image ops in TensorFlow is that all images are expected
  // to be in batches, so that they're four-dimensional arrays with indices of
  // [batch, height, width, channel]. Because we only have a single image, we
  // have to add a batch dimension of 1 to the start with ExpandDims().
  tensorflow::Node* dims_expander = tensorflow::ops::ExpandDims(
      float_caster, tensorflow::ops::Const(0, b.opts()), b.opts());
  // Bilinearly resize the image to fit the required dimensions.
  tensorflow::Node* resized = tensorflow::ops::ResizeBilinear(
      dims_expander, tensorflow::ops::Const({input_height, input_width},
                                            b.opts().WithName("size")),
      b.opts());
  // Subtract the mean and divide by the scale.
  tensorflow::ops::Div(
      tensorflow::ops::Sub(
          resized, tensorflow::ops::Const({input_mean}, b.opts()), b.opts()),
      tensorflow::ops::Const({input_std}, b.opts()),
      b.opts().WithName(output_name));
Next, we keep adding more nodes: to decode the file data as an image, to cast the integers into floating-point values, to resize the image, and finally to run the subtraction and division operations on the pixel values.
// This runs the GraphDef network definition that we've just constructed, and
// returns the results in the output tensor.
tensorflow::GraphDef graph;
TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));
Finally, we have a model definition stored in b, and we turn it into a full graph definition with the ToGraphDef() function.
  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  TF_RETURN_IF_ERROR(session->Run({}, {output_name}, {}, out_tensors));
  return Status::OK();
Next, we create a tensorflow::Session object (the interface for actually running the graph) and run it, specifying which node we want the output from and where to put the output data.
This gives us a vector of Tensor objects, which in this case we know will be only a single object long. You can think of a Tensor as a multidimensional array in this context: it holds a 299-pixel-high, 299-pixel-wide, 3-channel image as floating-point values. If you already have your own image-processing framework in your product, you should be able to use it instead, as long as you apply the same transformations before you feed images into the main graph.
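In rough NumPy terms (an analogy only, not the C++ API), the single tensor described above is shaped like this:

import numpy as np

# One batch entry, 299 pixels high, 299 pixels wide, 3 float channels.
image_tensor = np.zeros((1, 299, 299, 3), dtype=np.float32)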
This is a simple example of dynamically creating a small TensorFlow graph in C++, but for the pre-trained Inception model we need to load a much larger definition from a file. See the LoadGraph() function for how that is done.
// Reads a model graph definition from disk, and creates a session object you
// can use to run it.
Status LoadGraph(string graph_file_name,
                 std::unique_ptr<tensorflow::Session>* session) {
  tensorflow::GraphDef graph_def;
  Status load_graph_status =
      ReadBinaryProto(tensorflow::Env::Default(), graph_file_name, &graph_def);
  if (!load_graph_status.ok()) {
    return tensorflow::errors::NotFound("Failed to load compute graph at '",
                                        graph_file_name, "'");
  }
If you have been through the image-loading code, a lot of the terms should seem familiar. Rather than using a GraphDefBuilder to produce a GraphDef object, here we load a protobuf file that contains the GraphDef directly.
  session->reset(tensorflow::NewSession(tensorflow::SessionOptions()));
  Status session_create_status = (*session)->Create(graph_def);
  if (!session_create_status.ok()) {
    return session_create_status;
  }
  return Status::OK();
}
We then create a Session object from that GraphDef and pass it back to the caller so that they can run it later.
The GetTopLabels() function is a lot like the image loading, except that in this case we want to take the results of running the main graph and turn them into a sorted list of the highest-scoring labels. Just like the image loader, it creates a GraphDefBuilder, adds a couple of nodes to it, and then runs the short graph to get a pair of output tensors. In this case they represent the sorted scores and index positions of the highest results.
// Analyzes the output of the Inception graph to retrieve the highest scores and
// their positions in the tensor, which correspond to categories.
Status GetTopLabels(const std::vector<Tensor>& outputs, int how_many_labels,
                    Tensor* indices, Tensor* scores) {
  tensorflow::GraphDefBuilder b;
  string output_name = "top_k";
  tensorflow::ops::TopK(tensorflow::ops::Const(outputs[0], b.opts()),
                        how_many_labels, b.opts().WithName(output_name));
  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensors.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));
  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  // The TopK node returns two outputs, the scores and their original indices,
  // so we have to append :0 and :1 to specify them both.
  std::vector<Tensor> out_tensors;
  TF_RETURN_IF_ERROR(session->Run({}, {output_name + ":0", output_name + ":1"},
                                  {}, &out_tensors));
  *scores = out_tensors[0];
  *indices = out_tensors[1];
  return Status::OK();
}
The PrintTopLabels() function takes these sorted results and prints them out in a friendly way. The CheckTopLabel() function is very similar, but for debugging purposes it makes sure that the most likely label is the value we expect.
Finally, main() binds all of these calls together.
int main(int argc, char* argv[]) {
  // We need to call this to set up global state for TensorFlow.
  tensorflow::port::InitMain(argv[0], &argc, &argv);
  Status s = tensorflow::ParseCommandLineFlags(&argc, argv);
  if (!s.ok()) {
    LOG(ERROR) << "Error parsing command line flags: " << s.ToString();
    return -1;
  }

  // First we load and initialize the model.
  std::unique_ptr<tensorflow::Session> session;
  string graph_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_graph);
  Status load_graph_status = LoadGraph(graph_path, &session);
  if (!load_graph_status.ok()) {
    LOG(ERROR) << load_graph_status;
    return -1;
  }
Load the main graph.
  // Get the image from disk as a float array of numbers, resized and normalized
  // to the specifications the main graph expects.
  std::vector<Tensor> resized_tensors;
  string image_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_image);
  Status read_tensor_status = ReadTensorFromImageFile(
      image_path, FLAGS_input_height, FLAGS_input_width, FLAGS_input_mean,
      FLAGS_input_std, &resized_tensors);
  if (!read_tensor_status.ok()) {
    LOG(ERROR) << read_tensor_status;
    return -1;
  }
  const Tensor& resized_tensor = resized_tensors[0];
Load, resize, and process the input image.
  // Actually run the image through the model.
  std::vector<Tensor> outputs;
  Status run_status = session->Run({{FLAGS_input_layer, resized_tensor}},
                                   {FLAGS_output_layer}, {}, &outputs);
  if (!run_status.ok()) {
    LOG(ERROR) << "Running model failed: " << run_status;
    return -1;
  }
Here we run the loaded graph with the image as input.
  // This is for automated testing to make sure we get the expected result with
  // the default settings. We know that label 866 (military uniform) should be
  // the top label for the Admiral Hopper image.
  if (FLAGS_self_test) {
    bool expected_matches;
    Status check_status = CheckTopLabel(outputs, 866, &expected_matches);
    if (!check_status.ok()) {
      LOG(ERROR) << "Running check failed: " << check_status;
      return -1;
    }
    if (!expected_matches) {
      LOG(ERROR) << "Self-test failed!";
      return -1;
    }
  }
For automated testing, this check makes sure we get the expected output with the default settings.
// Do something interesting with the results we've generated.
Status print_status = PrintTopLabels(outputs, FLAGS_labels);
Finally, print the labels we found.
  if (!print_status.ok()) {
    LOG(ERROR) << "Running print failed: " << print_status;
    return -1;
  }
In this example, we use TensorFlow's Status object for error handling. It is convenient because it lets you check with the ok() checker whether any error has occurred and, if so, print a readable error message.
This example demonstrates object recognition, but you should be able to use very similar code for other models you have found or trained yourself, across all kinds of domains. We hope this small example gives you some ideas for using TensorFlow in your own products.