Preface
TensorFlow's official website currently provides release packages only for Python, C, Java, and Go; there is no C++ release package. The official website also states that the stability of APIs other than Python is not guaranteed, and the Python API remains the most complete. Python has a clear advantage in development efficiency and ease of use, but as an interpreted language it has significant performance drawbacks. For all kinds of AI services, the current trend is to use Python as the tool for quickly building and training models, and a compiled language (such as C++ or Java) to implement the serving program. This article focuses on serving TensorFlow models from C++ and the various problems encountered along the way.
Implementation scheme
There are two ways to use the TensorFlow C++ library:
(1) The best way, of course, is to build the graph directly in C++, but the current C++ API is not as full-featured as the Python API. See the official example of building a small graph in C++ (a minimal sketch follows this list). The C++ API also includes classes for the CPU and GPU kernel implementations that can be used to add new ops; see www.tensorflow.org/extend/addi…
(2) The more common way is for C++ to load a graph generated with Python. This article mainly describes this scheme.
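For reference, here is a minimal sketch of approach (1): building and running a tiny graph entirely with the C++ client API. It assumes the headers and libtensorflow_cc.so produced by the build step below are available; the graph itself (a single 2x2 matrix multiplication) is purely illustrative.

#include <iostream>
#include <vector>
#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"

int main()
{
    using namespace tensorflow;
    using namespace tensorflow::ops;

    // Build a tiny graph: y = A * x
    Scope root = Scope::NewRootScope();
    auto A = Const(root, {{1.f, 2.f}, {3.f, 4.f}});
    auto x = Const(root, {{1.f}, {1.f}});
    auto y = MatMul(root.WithOpName("y"), A, x);

    // Run the graph in a ClientSession and print the result.
    ClientSession session(root);
    std::vector<Tensor> outputs;
    Status status = session.Run({y}, &outputs);
    if (!status.ok()) {
        std::cerr << status.ToString() << std::endl;
        return 1;
    }
    std::cout << outputs[0].matrix<float>() << std::endl;  // expect [[3], [7]]
    return 0;
}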
Implementation steps
(1) compile the TensorFlow C++ .so from source; (2) model training and output; (3) model freezing; (4) model loading and running; (5) runtime problems
(1) Source code compilation
Environment: tlinux 2.2, GCC >= 4.8.5. Components to be installed: protobuf 3.3.0, bazel 0.5.0, python 2.7, java8. Machine: 4 GB memory.
A. Install java8: yum install java
B. Install protobuf 3.3.0: download from github.com/google/prot… and run
./configure && make && make install
C. Install bazel: download from github.com/bazelbuild/… and run
sh bazel-0.5.0-installer-linux-x86_64.sh
D. Compile the source code, using the latest release from github.com/tensorflow/…:
bazel build //tensorflow:libtensorflow_cc.so
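Once the build finishes, the library is at bazel-bin/tensorflow/libtensorflow_cc.so. A quick way to verify the build is to compile a tiny program against it; the file name (check_tf.cc) and include/library paths mentioned here are assumptions and depend on where you put the build outputs.

#include <iostream>
#include "tensorflow/core/public/version.h"
#include "tensorflow/core/public/session.h"

int main()
{
    // Print the library version and create an empty session to prove linking works.
    std::cout << "TensorFlow C++ library version: " << TF_VERSION_STRING << std::endl;

    tensorflow::Session* session = nullptr;
    tensorflow::Status status = tensorflow::NewSession(tensorflow::SessionOptions(), &session);
    std::cout << (status.ok() ? "NewSession OK" : status.ToString()) << std::endl;
    delete session;
    return 0;
}

Compile it against the headers in the TensorFlow source tree (plus bazel-genfiles for the generated protobuf headers) and link against bazel-bin/tensorflow/libtensorflow_cc.so; depending on the TensorFlow version you may also need to link libtensorflow_framework.so and add the Eigen and protobuf include paths.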
Problems you may encounter during compilation:
Problem 1: fatal error: unsupported/Eigen/CXX11/Tensor: No such file or directory
Fix: install Eigen 3.3 or above.
Problem 2: java.io.IOException: Cannot run program "patch"
Fix:
yum install patch
Problem 3: insufficient memory during the build. With only 4 GB of RAM the build can fail; reducing bazel's parallelism (for example with --jobs) or adding swap space usually helps.
(2) Model training and output
For model training and output, refer to the example at blog.metaflow.fr/tensorflow-… ; there are also many examples on Google. After training and saving, you typically end up with a checkpoint file plus the graph definition and variable data files.
(3) Model freezing
There are three ways to freeze the model:
A. The freeze_graph command-line tool
bazel build tensorflow/python/tools:freeze_graph && \
bazel-bin/tensorflow/python/tools/freeze_graph \
    --input_graph=graph.pb \
    --input_checkpoint=checkpoint \
    --output_graph=./frozen_graph.pb \
    --output_node_names=output/output/scores
B. Use the freeze_graph.py tool
import os

from tensorflow.python.tools import freeze_graph

# FLAGS.model_dir and FLAGS.checkpoint_dir are assumed to be defined elsewhere in the script.

# We save out the graph to disk, and then call the const conversion routine.
checkpoint_state_name = "checkpoint"
input_graph_name = "graph.pb"
output_graph_name = "frozen_graph.pb"

input_graph_path = os.path.join(FLAGS.model_dir, input_graph_name)
input_saver_def_path = ""
input_binary = False
input_checkpoint_path = os.path.join(FLAGS.checkpoint_dir, 'saved_checkpoint') + "0"
# Note that this normally should be only "output_node"!!!
output_node_names = "output/output/scores"
restore_op_name = "save/restore_all"
filename_tensor_name = "save/Const:0"
output_graph_path = os.path.join(FLAGS.model_dir, output_graph_name)
clear_devices = False

freeze_graph.freeze_graph(input_graph_path, input_saver_def_path,
                          input_binary, input_checkpoint_path,
                          output_node_names, restore_op_name,
                          filename_tensor_name, output_graph_path,
                          clear_devices)
C. Use the TensorFlow Python API
import os
import argparse

import tensorflow as tf
from tensorflow.python.framework import graph_util

dir = os.path.dirname(os.path.realpath(__file__))

def freeze_graph(model_folder):
    # We retrieve our checkpoint fullpath
    checkpoint = tf.train.get_checkpoint_state(model_folder)
    input_checkpoint = checkpoint.model_checkpoint_path

    # We precise the file fullname of our freezed graph
    absolute_model_folder = "/".join(input_checkpoint.split('/')[:-1])
    output_graph = absolute_model_folder + "/frozen_model.pb"
    print(output_graph)

    # Before exporting our graph, we need to precise what is our output node
    # This is how TF decides what part of the Graph he has to keep and what part it can dump
    # NOTE: this variable is plural, because you can have multiple output nodes
    output_node_names = "output/output/scores"

    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True

    # We import the meta graph and retrieve a Saver
    saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

    # We retrieve the protobuf graph definition
    graph = tf.get_default_graph()
    input_graph_def = graph.as_graph_def()

    # Fix batch norm nodes
    for node in input_graph_def.node:
        if node.op == 'RefSwitch':
            node.op = 'Switch'
            for index in xrange(len(node.input)):
                if 'moving_' in node.input[index]:
                    node.input[index] = node.input[index] + '/read'
        elif node.op == 'AssignSub':
            node.op = 'Sub'
            if 'use_locking' in node.attr:
                del node.attr['use_locking']

    # We start a session and restore the graph weights
    with tf.Session() as sess:
        saver.restore(sess, input_checkpoint)

        # We use a built-in TF helper to export variables to constants
        output_graph_def = graph_util.convert_variables_to_constants(
            sess,                          # The session is used to retrieve the weights
            input_graph_def,               # The graph_def is used to retrieve the nodes
            output_node_names.split(",")   # The output node names are used to select the useful nodes
        )

        # Finally we serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_folder", type=str, help="Model folder to export")
    args = parser.parse_args()
    freeze_graph(args.model_folder)
Pitfall: the BatchNorm bug
In an actual project, loading a model frozen with methods A or B through the TensorFlow C++ API fails with an error, and the same error appears when the model is loaded with the TensorFlow Python API.
The reason is that BatchNorm is used in the model; the fix is the batch-norm node rewriting shown in method C above.
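Alternatively, if you would rather not re-freeze the model, the same rewrite can in principle be applied on the C++ side, on the GraphDef after ReadBinaryProto and before Session::Create (GraphDef is a plain protobuf, so its nodes can be edited directly). This is an untested sketch of that idea, mirroring the Python fix in method C:

#include <string>
#include "tensorflow/core/framework/graph.pb.h"

// Rewrite BatchNorm-related nodes in a frozen GraphDef, mirroring the Python fix above.
void FixBatchNormNodes(tensorflow::GraphDef* graph_def)
{
    for (int i = 0; i < graph_def->node_size(); ++i) {
        tensorflow::NodeDef* node = graph_def->mutable_node(i);
        if (node->op() == "RefSwitch") {
            node->set_op("Switch");
            for (int j = 0; j < node->input_size(); ++j) {
                if (node->input(j).find("moving_") != std::string::npos) {
                    node->set_input(j, node->input(j) + "/read");
                }
            }
        } else if (node->op() == "AssignSub") {
            node->set_op("Sub");
            node->mutable_attr()->erase("use_locking");
        }
    }
}

Such a helper would be called between ReadBinaryProto and m_session->Create(graph_def) in the Init() function shown later.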
(4) Model loading and running
Building inputs and outputs
Compared with Python's numpy, tensorflow::Tensor and Eigen::Tensor are quite awkward to use, especially for dynamic matrices. If your compiler supports C++14, you can use xtensor, which is about as powerful as numpy and has similar usage. If you are limited to C++11, read up on the Eigen library and the tensorflow::Tensor documentation. Some simple usage examples:
Matrix assignment:
tensorflow::Tensor four_dim_plane(DT_FLOAT, tensorflow::TensorShape({1, MODEL_X_AXIS_LEN, MODEL_Y_AXIS_LEN, fourth_dim_size}));
auto plane_tensor = four_dim_plane.tensor<float, 4>();
for (uint32_t k = 0; k < array_plane.size(); ++k)
{
    for (uint32_t j = 0; j < MODEL_Y_AXIS_LEN; ++j)
    {
        for (uint32_t i = 0; i < MODEL_X_AXIS_LEN; ++i)
        {
            plane_tensor(0, i, j, k) = array_plane[k](i, j);
        }
    }
}
SOFTMAX:
Eigen::Tensor<float, 1> ModelApp::TensorSoftMax(const Eigen::Tensor<float, 1>& tensor)
{
    Eigen::Tensor<float, 0> max = tensor.maximum();
    auto e_x = (tensor - tensor.constant(max())).exp();
    Eigen::Tensor<float, 0> e_x_sum = e_x.sum();
    return e_x / e_x_sum();
}
Model loading and session initialization:
int32_t ModelApp::Init(const std::string& graph_file, Logger *logger)
{
    auto status = NewSession(SessionOptions(), &m_session);
    if (!status.ok())
    {
        LOG_ERR(logger, "New session failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_NEW_TENSORFLOW_SESSION;
    }

    GraphDef graph_def;
    status = ReadBinaryProto(Env::Default(), graph_file, &graph_def);
    if (!status.ok())
    {
        LOG_ERR(logger, "Read binary proto failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_READ_BINARY_PROTO;
    }

    status = m_session->Create(graph_def);
    if (!status.ok())
    {
        LOG_ERR(logger, "Session create failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_CREATE_TENSORFLOW_SESSION;
    }
    return Error::SUCCESS;
}
Run:
The TensorFlow library is thread-safe here (from version 0.10 onward), so Predict can be called from multiple threads; a sketch of concurrent calls follows the example below.
int32_t ModelApp::Predict(const Action& action, std::vector<int>* info, Logger *logger)
{
    ...
    auto tensor_x = m_writer->Generate(action, logger);
    Tensor phase_train(DT_BOOL, TensorShape());
    phase_train.scalar<bool>()() = false;
    std::vector<std::pair<std::string, Tensor>> inputs = {
        {"input_x", tensor_x},
        {"phase_train", phase_train}
    };

    std::vector<Tensor> result;
    auto status = m_session->Run(inputs, {"output/output/scores"}, {}, &result);
    if (!status.ok())
    {
        LOG_ERR(logger, "Session run failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_TENSORFLOW_EXECUTION;
    }
    ...
    auto scores = result[0].flat<float>();
    ...
    return Error::SUCCESS;
}
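As a sketch of the concurrent use mentioned above, the snippet below calls Predict from several std::threads that share one ModelApp instance. The fixed thread count, the single Action value, and the helper name RunConcurrentPredictions are placeholders; only the pattern of sharing one loaded session across threads is the point.

#include <thread>
#include <vector>

// Hypothetical usage: one ModelApp (one loaded session) shared by several worker threads.
void RunConcurrentPredictions(ModelApp& app, const Action& action, Logger* logger)
{
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([&app, &action, logger]() {
            std::vector<int> info;
            // Session::Run is thread-safe, so no extra locking is needed around Predict.
            int32_t ret = app.Predict(action, &info, logger);
            if (ret != Error::SUCCESS) {
                LOG_ERR(logger, "Predict failed in worker thread: %d", ret);
            }
        });
    }
    for (auto& t : workers) {
        t.join();
    }
}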
(5) Runtime problems
Problem 1: runtime warnings
2017-08-16 14:11:14.393295: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
These warnings appear because the CPU acceleration instructions were not compiled into the TensorFlow .so; they can be enabled at compile time. In the absence of a GPU, adding them can improve CPU computation speed by roughly 10%.
bazel build -c opt --copt=-mavx --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 -k //tensorflow:libtensorflow_cc.so
It is important to note that not all CPUs support these instruction sets, so be sure to test on the real target machines to avoid the process aborting with an illegal-instruction error. A runtime self-check is sketched below.
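As a defensive measure, the serving process can check at startup whether the host CPU actually supports the instruction sets the library was built with. This sketch uses GCC's __builtin_cpu_supports (available in GCC >= 4.8) and is independent of TensorFlow; the feature list is an assumption chosen to match the build flags above.

#include <cstdio>
#include <cstdlib>

// Abort early with a clear message instead of dying later with an illegal instruction.
static void CheckCpuFeaturesOrDie()
{
    __builtin_cpu_init();
    const bool ok = __builtin_cpu_supports("avx") &&
                    __builtin_cpu_supports("fma") &&
                    __builtin_cpu_supports("sse4.2");
    if (!ok) {
        std::fprintf(stderr, "CPU lacks AVX/FMA/SSE4.2 required by this libtensorflow_cc build\n");
        std::exit(1);
    }
}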
Problem 2: mixing the C++ libtensorflow with Python tensorflow
To verify the accuracy of the C++ model-loading path, we used swig to wrap the C++ API as a Python library so it could be called from Python. When you import tensorflow as tf and also import the swig-wrapped Python module in the same process, the program core dumps.
This is a known issue that TensorFlow does not officially intend to solve.
This article is published on the Tencent Cloud Technology Community with the author's authorization.
Original link: https://cloud.tencent.com/community/article/867586?utm_source=jueji