Preface

TensorFlow's official website currently provides release packages only for Python, C, Java, and Go; there is no C++ release package. The site also states that it does not guarantee the stability of APIs other than Python, and that the Python API is the most complete. Python's great strength is development efficiency and ease of use, but as an interpreted language it has significant performance drawbacks. For AI services of all kinds, the current trend is to use Python as the tool for quickly building models and a compiled language (such as C++ or Java) to implement the serving program. This article focuses on serving TensorFlow models from C++ and the various problems encountered along the way.

Implementation scheme

There are two ways to use the TensorFlow C++ library:

(1) The ideal way, of course, is to build the graph directly in C++, but the current TensorFlow C++ API is not as full-featured as the Python API; see the official example that builds a small graph in C++ (a minimal sketch of this approach follows the list below). The C++ API also includes the classes for the numeric kernel implementations on CPU and GPU, which can be used to add new ops; see www.tensorflow.org/extend/addi…

(2) The more common way: build and train the graph in Python, export it, then load and run it from C++. This is the approach this article describes.
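Here is a minimal sketch of approach (1), modeled on the official "small graph in C++" example. It assumes the tensorflow/cc client headers from a source build are on the include path:

#include <iostream>
#include <vector>

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"

int main()
{
    using namespace tensorflow;
    using namespace tensorflow::ops;

    // Build a tiny graph: multiply a 2x2 constant by the identity matrix.
    Scope root = Scope::NewRootScope();
    auto a = Const(root, {{1.f, 2.f}, {3.f, 4.f}});
    auto b = Const(root, {{1.f, 0.f}, {0.f, 1.f}});
    auto product = MatMul(root, a, b);

    // Run the graph in-process and fetch the result.
    ClientSession session(root);
    std::vector<Tensor> outputs;
    TF_CHECK_OK(session.Run({product}, &outputs));
    std::cout << outputs[0].matrix<float>() << std::endl;
    return 0;
}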

Implementation steps

(1) Compile the TensorFlow source into a C++ .so library
(2) Model training and output
(3) Model freezing
(4) Model loading and running
(5) Runtime problems

(1) Source code compilation

Environment: tlinux2.2, GCC >= 4.8.5
Components to install: protobuf 3.3.0, bazel 0.5.0, python 2.7, java8
Machine: 4 GB of memory

A. Install java8:

yum install java

B. Install protobuf 3.3.0: download from github.com/google/prot… and build:

./configure && make && make install

C. Install bazel: download from github.com/bazelbuild/… and run:

sh bazel-0.5.0-installer-linux-x86_64.sh

D. Compile the source code, using the latest release from github.com/tensorflow/… :

bazel build //tensorflow:libtensorflow_cc.so
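To confirm that the resulting libtensorflow_cc.so links correctly, a minimal test program helps; the following is a sketch, assuming the TensorFlow source tree and its generated headers are on the include path:

#include <iostream>

#include "tensorflow/core/public/session.h"
#include "tensorflow/core/public/version.h"

int main()
{
    // TF_VERSION_STRING is defined in tensorflow/core/public/version.h.
    std::cout << "linked against TensorFlow " << TF_VERSION_STRING << std::endl;

    // Creating an empty session exercises the core runtime inside the .so.
    tensorflow::Session* session = nullptr;
    tensorflow::Status status =
        tensorflow::NewSession(tensorflow::SessionOptions(), &session);
    std::cout << (status.ok() ? "NewSession OK" : status.ToString()) << std::endl;
    delete session;
    return 0;
}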

Problems you may encounter during compilation:

Problem 1: fatal error: unsupported/Eigen/CXX11/Tensor: No such file or directory

Fix: install Eigen 3.3 or above.

Problem 2: java.io.IOException: Cannot run program "patch"

Fix:

yum install patch

Problem 3: insufficient memory during the build

(2) Model training and output

For model training and export, refer to the practical example at blog.metaflow.fr/tensorflow-… (many similar tutorials can be found via Google). Saving the trained model produces the following files: the checkpoint files and the graph definition (graph.pb and the corresponding .meta file), which the freezing step below consumes.

(3) Model freezing

There are three ways to freeze the model:

A. Use the freeze_graph tool

bazel build tensorflow/python/tools:freeze_graph && \
bazel-bin/tensorflow/python/tools/freeze_graph \
        --input_graph=graph.pb \
        --input_checkpoint=checkpoint \
        --output_graph=./frozen_graph.pb \
        --output_node_names=output/output/scores

B. Use the freeze_graph.py tool

# Assumes FLAGS provides model_dir and checkpoint_dir (e.g. via tf.app.flags).
import os

from tensorflow.python.tools import freeze_graph

# We save out the graph to disk, and then call the const conversion
# routine.
checkpoint_state_name = "checkpoint"
input_graph_name = "graph.pb"
output_graph_name = "frozen_graph.pb"

input_graph_path = os.path.join(FLAGS.model_dir, input_graph_name)
input_saver_def_path = ""
input_binary = False
input_checkpoint_path = os.path.join(FLAGS.checkpoint_dir, 'saved_checkpoint') + "0"
# Note that this normally should be only "output_node"!!!
output_node_names = "output/output/scores"
restore_op_name = "save/restore_all"
filename_tensor_name = "save/Const:0"
output_graph_path = os.path.join(FLAGS.model_dir, output_graph_name)
clear_devices = False

freeze_graph.freeze_graph(input_graph_path, input_saver_def_path,
                          input_binary, input_checkpoint_path,
                          output_node_names, restore_op_name,
                          filename_tensor_name, output_graph_path,
                          clear_devices)

C. Use the TensorFlow Python API directly

import os, argparse

import tensorflow as tf
from tensorflow.python.framework import graph_util

dir = os.path.dirname(os.path.realpath(__file__))

def freeze_graph(model_folder):
    # We retrieve our checkpoint fullpath
    checkpoint = tf.train.get_checkpoint_state(model_folder)
    input_checkpoint = checkpoint.model_checkpoint_path

    # We precise the file fullname of our freezed graph
    absolute_model_folder = "/".join(input_checkpoint.split('/')[:-1])
    output_graph = absolute_model_folder + "/frozen_model.pb"
    print output_graph

    # Before exporting our graph, we need to precise what is our output node
    # This is how TF decides what part of the Graph he has to keep and what part it can dump
    # NOTE: this variable is plural, because you can have multiple output nodes
    output_node_names = "output/output/scores"

    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True

    # We import the meta graph and retrieve a Saver
    saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

    # We retrieve the protobuf graph definition
    graph = tf.get_default_graph()
    input_graph_def = graph.as_graph_def()

    # Fix batch norm nodes
    for node in input_graph_def.node:
        if node.op == 'RefSwitch':
            node.op = 'Switch'
            for index in xrange(len(node.input)):
                if 'moving_' in node.input[index]:
                    node.input[index] = node.input[index] + '/read'
        elif node.op == 'AssignSub':
            node.op = 'Sub'
            if 'use_locking' in node.attr:
                del node.attr['use_locking']

    # We start a session and restore the graph weights
    with tf.Session() as sess:
        saver.restore(sess, input_checkpoint)
        # We use a built-in TF helper to export variables to constants
        output_graph_def = graph_util.convert_variables_to_constants(
            sess,  # The session is used to retrieve the weights
            input_graph_def,  # The graph_def is used to retrieve the nodes
            output_node_names.split(",")  # The output node names are used to select the useful nodes
        )

        # Finally we serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_folder", type=str, help="Model folder to export")
    args = parser.parse_args()

    freeze_graph(args.model_folder)

Pitfall: the BatchNorm bug

In our actual project, a model frozen with method A or B fails with an error when loaded through the TensorFlow C++ API, and loading the same model through TensorFlow Python reports the same error.

The cause is that the model uses BatchNorm; the fix is the node rewriting shown in method C above (replacing RefSwitch ops with Switch and AssignSub ops with Sub).

(4) Model loading and running

Building inputs and outputs

Compared with Python's numpy, tensorflow::Tensor and Eigen::Tensor are awkward to work with, especially for dynamic matrices. If your compiler supports C++14, consider xtensor, which is nearly as powerful as numpy and similar to use. If you are limited to C++11, study the Eigen library and the tensorflow::Tensor documentation. Some simple usage examples:

Matrix assignment:

tensorflow::Tensor four_dim_plane(DT_FLOAT, tensorflow::TensorShape({1, MODEL_X_AXIS_LEN, MODEL_Y_AXIS_LEN, fourth_dim_size}));
auto plane_tensor = four_dim_plane.tensor<float, 4>();
for (uint32_t k = 0; k < array_plane.size(); ++k)
{
    for (uint32_t j = 0; j < MODEL_Y_AXIS_LEN; ++j)
    {
        for (uint32_t i = 0; i < MODEL_X_AXIS_LEN; ++i)
        {
            plane_tensor(0, i, j, k) = array_plane[k](i, j);
        }
    }
}

Softmax:

Eigen::Tensor<float, 1> ModelApp::TensorSoftMax(const Eigen::Tensor<float, 1>& tensor)
{
    Eigen::Tensor<float, 0> max = tensor.maximum();
    auto e_x = (tensor - tensor.constant(max())).exp();
    Eigen::Tensor<float, 0> e_x_sum = e_x.sum();
    return e_x / e_x_sum();
}
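Getting model output back into such helpers involves the same friction. Below is a small illustrative helper (not from the original project) that copies a tensorflow::Tensor into a rank-1 Eigen::Tensor so it can be fed to TensorSoftMax; it assumes the tensor holds floats:

#include "tensorflow/core/framework/tensor.h"
#include "unsupported/Eigen/CXX11/Tensor"

// flat<float>() returns an Eigen::TensorMap over the tensor's buffer;
// copy it into an owning Eigen::Tensor for further Eigen math.
Eigen::Tensor<float, 1> ToEigen1D(const tensorflow::Tensor& t)
{
    auto flat = t.flat<float>();
    Eigen::Tensor<float, 1> out(flat.size());
    for (int64_t i = 0; i < flat.size(); ++i)
    {
        out(i) = flat(i);
    }
    return out;
}

For example, the scores tensor fetched by Session::Run below could be converted with ToEigen1D(result[0]) before calling TensorSoftMax.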

Model loading and session initialization:

int32_t ModelApp::Init(const std::string& graph_file, Logger *logger)
{
    auto status = NewSession(SessionOptions(), &m_session);
    if (!status.ok())
    {
        LOG_ERR(logger, "New session failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_NEW_TENSORFLOW_SESSION;
    }

    GraphDef graph_def;
    status = ReadBinaryProto(Env::Default(), graph_file, &graph_def);
    if (!status.ok())
    {
        LOG_ERR(logger, "Read binary proto failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_READ_BINARY_PROTO;
    }

    status = m_session->Create(graph_def);
    if (!status.ok())
    {
        LOG_ERR(logger, "Session create failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_CREATE_TENSORFLOW_SESSION;
    }
    return Error::SUCCESS;
}

Run:

TensorFlow releases from 0.10 onward are thread-safe, so Predict can be called from multiple threads concurrently:

int32_t ModelApp::Predict(const Action& action, std::vector<int>* info, Logger *logger)
{
    ...
    auto tensor_x = m_writer->Generate(action, logger);

    Tensor phase_train(DT_BOOL, TensorShape());
    phase_train.scalar<bool>()() = false;
    std::vector<std::pair<std::string, Tensor>> inputs = {
        {"input_x", tensor_x},
        {"phase_train", phase_train}
    };

    std::vector<Tensor> result;
    auto status = m_session->Run(inputs, {"output/output/scores"}, {}, &result);
    if (!status.ok())
    {
        LOG_ERR(logger, "Session run failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_TENSORFLOW_EXECUTION;
    }
    ...
    auto scores = result[0].flat<float>();
    ...
    return Error::SUCCESS;
}
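As a usage sketch (illustrative only; ModelApp, Action, and Logger are the article's types, and RunConcurrentPredicts is a hypothetical helper), several worker threads can share one ModelApp instance because the underlying Session::Run is thread-safe:

#include <thread>
#include <vector>

// Assumes app has already been Init()'ed and action/logger are prepared.
void RunConcurrentPredicts(ModelApp& app, const Action& action, Logger* logger)
{
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
    {
        workers.emplace_back([&]() {
            std::vector<int> info;
            // All threads share the same tensorflow::Session inside app.
            app.Predict(action, &info, logger);
        });
    }
    for (auto& t : workers)
    {
        t.join();
    }
}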

(5) Runtime problems

Fault 1: runtime warnings

2017-08-16 14:11:14.393295: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.

These warnings mean that the CPU acceleration instructions were not compiled into the TensorFlow .so library; they can be enabled at build time. Without a GPU, enabling them improved our CPU computation speed by roughly 10%.

bazel build -c opt --copt=-mavx --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 -k //tensorflow:libtensorflow_cc.so

Note that not all CPUs support these instruction sets, so be sure to test on the actual deployment machines, or the process may abort at startup; a runtime check like the sketch below can guard against this.
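A minimal sketch of such a check, using the CPU-feature helpers from TensorFlow's own source tree (the header path and enum names below are taken from the TF source and should be verified against your version):

#include <cstdio>

#include "tensorflow/core/platform/cpu_info.h"

int main()
{
    using tensorflow::port::CPUFeature;
    using tensorflow::port::TestCPUFeature;

    // Refuse to run the accelerated build on a CPU missing these instruction sets.
    if (!TestCPUFeature(CPUFeature::SSE4_2) ||
        !TestCPUFeature(CPUFeature::AVX) ||
        !TestCPUFeature(CPUFeature::FMA))
    {
        std::fprintf(stderr, "CPU lacks SSE4.2/AVX/FMA; fall back to the plain build\n");
        return 1;
    }
    std::printf("CPU supports all compiled instruction sets\n");
    return 0;
}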

Fault 2: mixing the C++ libtensorflow library with Python TensorFlow

To verify that the model loaded through C++ produces the same results, we used SWIG to wrap our C++ API as a Python library so it could be called from Python. When a process does import tensorflow as tf and also imports the SWIG-wrapped interface, it core dumps.

According to the official responses, this is a problem TensorFlow does not intend to fix.

This article is from the Global Artificial Intelligence WeChat public account.



This article was published on the Tencent Cloud technology community with the author's authorization.

Original link: https://cloud.tencent.com/community/article/867586?utm_source=jueji
