This is the first article of the new TensorRT series. Why "new"? Because two earlier articles already covered TensorRT, back at version 5.0. It has been a while since I last wrote about TensorRT, and I'm glad to start again with the new version.
Everything explained here is based on TensorRT 7.0.
TensorRT changed quite a bit at version 7 and added many new features, but its core workings remain the same.
- TensorRT is used to accelerate deep learning
- Speed up neural Network with TensorRT (Read ONNX model and run)
This article mainly explains:
- How to use TensorRT custom plugins
- How to add your own custom operator
After reading it you should be able to avoid quite a few pitfalls. Do come back and visit often.
Preface
As TensorRT continues to evolve (v5 -> v6 -> v7), the way plugins are used keeps being updated, and the plugin interfaces keep changing as well, for example from IPluginV2IOExt in v5 to IPluginV2DynamicExt in v6. We don't know whether new APIs will appear in the future, but that is nothing to worry about: TensorRT's backward compatibility is quite good, so there is no need to fear that an old plugin will stop working on a new version.
Current plugin-API:
The main purpose of a TensorRT plugin is to let us implement operators that TensorRT does not currently support. We implement our own op through the interfaces TensorRT provides, so the plugin's life cycle also has to follow TensorRT's rules.
A quick overview
As of this writing, the master branch of the TensorRT plugin repository is at version 7.2:
Github.com/NVIDIA/Tens…
TensorRT has open-sourced its plugin section, and quite a few plugins are already available there. You can read the source code and learn how to write a plugin by imitating them.
If you want to add your own operator, you can add it to the official plugin library, recompile the library, and replace the original libnvinfer_plugin.so.7 with the one you just built. Alternatively, write a plugin of your own in the same style as the official ones, compile it into a .so, and link that dynamic library in your TensorRT inference project.
In the rest of this article, the IPlugin we need to write is referred to as the plugin op for short.
Start writing the plugin
If you are interested, take a look at TensorRT's official documentation first; the purpose of this article is to help you step into as few pits as possible.
First, following the layout of the official plugins (any of them will do as a template):
Prepare your own plugin files, custom.cpp and custom.h, copy the official code over and replace it with your own implementation. Use the latest IPluginV2DynamicExt class as the interface.
We need to write two classes:
- MyCustomPlugin, which inherits from IPluginV2DynamicExt and contains the concrete implementation of the plugin
- MyCustomPluginCreator, which inherits from BaseCreator and is the plugin factory class used to create the plugin on demand
By the way, the plugin class inherits IPluginV2DynamicExt in order to support dynamic shapes; other plugin interfaces such as IPluginV2IOExt are largely similar to it.
// Inherit IPluginV2DynamicExt
class MyCustomPlugin final : public nvinfer1::IPluginV2DynamicExt
class MyCustomPluginCreator : public BaseCreator
MyCustomPlugin plug-in class
Overview:
class MyCustomPlugin final : public nvinfer1::IPluginV2DynamicExt
{
public:
    MyCustomPlugin(int in_channel,
                   const std::vector<float>& weight,
                   const std::vector<float>& bias);
    MyCustomPlugin(int in_channel,
                   nvinfer1::Weights const& weight,
                   nvinfer1::Weights const& bias);
    MyCustomPlugin(void const* serialData, size_t serialLength);
    MyCustomPlugin() = delete;
    ~MyCustomPlugin() override;

    int getNbOutputs() const override;
    DimsExprs getOutputDimensions(int outputIndex, const nvinfer1::DimsExprs* inputs, int nbInputs, nvinfer1::IExprBuilder& exprBuilder) override;
    int initialize() override;
    void terminate() override;
    size_t getWorkspaceSize(const nvinfer1::PluginTensorDesc* inputs, int nbInputs, const nvinfer1::PluginTensorDesc* outputs, int nbOutputs) const override;
    int enqueue(const nvinfer1::PluginTensorDesc* inputDesc, const nvinfer1::PluginTensorDesc* outputDesc,
                const void* const* inputs, void* const* outputs,
                void* workspace,
                cudaStream_t stream) override;
    size_t getSerializationSize() const override;
    void serialize(void* buffer) const override;
    bool supportsFormatCombination(int pos, const nvinfer1::PluginTensorDesc* inOut, int nbInputs, int nbOutputs) override;
    const char* getPluginType() const override;
    const char* getPluginVersion() const override;
    void destroy() override;
    nvinfer1::IPluginV2DynamicExt* clone() const override;
    void setPluginNamespace(const char* pluginNamespace) override;
    const char* getPluginNamespace() const override;
    DataType getOutputDataType(int index, const nvinfer1::DataType* inputTypes, int nbInputs) const override;
    void attachToContext(cudnnContext* cudnn, cublasContext* cublas, nvinfer1::IGpuAllocator* allocator) override;
    void detachFromContext() override;
    void configurePlugin(const nvinfer1::DynamicPluginTensorDesc* in, int nbInputs,
                         const nvinfer1::DynamicPluginTensorDesc* out, int nbOutputs) override;

private:
    int _in_channel;
    std::vector<float> _weight;
    std::vector<float> _bias;
    float* _d_weight;
    float* _d_bias;
    bool _initialized;
    const char* mPluginNamespace;
    std::string mNamespace;
};
Member variables
If your plugin has weights (such as weight and bias) and parameters (such as kernel size and padding in conv), you need to define them as private member variables.
Taking MyCustomPlugin as an example, suppose it has two weights, weight and bias, and one parameter, in_channel:
private:
    int _in_channel;                // parameter
    std::vector<float> _weight;     // weight stored in CPU memory
    std::vector<float> _bias;       // bias stored in CPU memory
    float* _d_weight;               // weight stored in GPU memory
    float* _d_bias;                 // bias stored in GPU memory
    bool _initialized;
    cudnnHandle_t _cudnn_handle;
    const char* mPluginNamespace;
    std::string mNamespace;
Constructors and destructors
There are typically three constructors.
The first is used during the parse phase: PluginCreator calls it when creating the plugin, passing in the weights and parameters.
The second is used to copy the plugin during the clone phase.
The third is used during the deserialize phase: it takes the serialized weights and parameters and creates the plugin from them.
Take our MyCustomPlugin for example:
MyCustomPlugin(int in_channel, nvinfer1::Weights const& weight, nvinfer1::Weights const& bias);
MyCustomPlugin(int in_channel, const std::vector<float>& weight, const std::vector<float>& bias);
MyCustomPlugin(void const* serialData, size_t serialLength);
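For reference, a minimal sketch of the third (deserialization) constructor; it assumes the serialize_value / deserialize_value helpers used by the official plugins (the same helpers that appear in the serialize section later on):
// Sketch only: restore the members in the same order serialize() writes them.
// deserialize_value is assumed to be the helper paired with serialize_value.
MyCustomPlugin::MyCustomPlugin(void const* serialData, size_t serialLength)
{
    deserialize_value(&serialData, &serialLength, &_in_channel);
    deserialize_value(&serialData, &serialLength, &_weight);
    deserialize_value(&serialData, &serialLength, &_bias);
    _initialized = false;
}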
The destructor needs to call terminate, which frees the GPU memory the op allocated earlier:
MyCustomPlugin::~MyCustomPlugin()
{
    terminate();
}
Note that the default constructor needs to be deleted:
MyCustomPlugin() = delete;
getNbOutputs
Returns the number of tensors the plugin op outputs. MyCustomPlugin, for example, only outputs one tensor, so it returns 1:
// MyCustomPlugin returns one output.
int MyCustomPlugin::getNbOutputs() const
{
    return 1;
}
initialize
This function is executed before the plugin is about to run.
It mainly initializes parameters that need memory allocated in advance, generally things required by the CUDA operations (for example, a conv op needs its weights and bias in GPU memory before it can run the convolution). If our operator needs such parameters, this is where we allocate GPU memory for them.
Note that if the plugin op needs a relatively large chunk of GPU memory, it is better not to allocate it yourself here; instead, use the workspace pointer that TensorRT passes in through the official interface. The reason is that if the plugin is used many times in a network and each instance allocates a lot of GPU memory itself, TensorRT will allocate that memory once per instance when building the network, which can easily lead to running out of GPU memory.
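To make this concrete, here is a hedged sketch of initialize/terminate for MyCustomPlugin that copies the CPU-side weight vectors into GPU memory and frees them again (error checking omitted; cudaMalloc/cudaMemcpy/cudaFree are plain CUDA runtime calls):
// Sketch: allocate GPU memory for weight/bias and copy them over from the CPU-side vectors.
int MyCustomPlugin::initialize()
{
    if (_initialized)
        return 0;
    cudaMalloc(reinterpret_cast<void**>(&_d_weight), _weight.size() * sizeof(float));
    cudaMalloc(reinterpret_cast<void**>(&_d_bias), _bias.size() * sizeof(float));
    cudaMemcpy(_d_weight, _weight.data(), _weight.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(_d_bias, _bias.data(), _bias.size() * sizeof(float), cudaMemcpyHostToDevice);
    _initialized = true;
    return 0;
}

// terminate() frees whatever initialize() allocated; the destructor calls it (see above).
void MyCustomPlugin::terminate()
{
    if (!_initialized)
        return;
    cudaFree(_d_weight);
    cudaFree(_d_bias);
    _initialized = false;
}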
getOutputDataType
Generally, the plugin op returns the same data type as its input:
nvinfer1::DataType InstanceNormalizationPlugin::getOutputDataType(
    int index, const nvinfer1::DataType* inputTypes, int nbInputs) const
{
    ASSERT(inputTypes && nbInputs > 0 && index == 0);
    return inputTypes[0];
}
getWorkspaceSize
This function returns the actual size in bytes of the intermediate GPU memory the plugin op needs, so that the space can be requested through the official TensorRT interface rather than allocated by hand.
Work out here how much GPU memory the op needs at runtime; TensorRT will then provide that space directly and we do not have to allocate it ourselves.
size_t MyCustomPlugin::getWorkspaceSize(const nvinfer1::PluginTensorDesc* inputs, int nbInputs, const nvinfer1::PluginTensorDesc* outputs, int nbOutputs) const
{
    // Compute the amount of intermediate GPU memory the op needs during its forward pass
size_t need_num;
return need_num * sizeof(float);
}
enqueue
This is where the plugin op actually runs. Our own CUDA kernels go here (a pure C++ implementation also works, but since it runs on the CPU it will be slower). As usual, it takes the inputs, computes the outputs, and writes them to the corresponding pointers.
int enqueue(const nvinfer1::PluginTensorDesc* inputDesc, const nvinfer1::PluginTensorDesc* outputDesc,
            const void* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream)
{
    // If fun is an intermediate buffer you need, take it from the workspace that TensorRT allocated for you
    float* fun = static_cast<float*>(workspace);
}
Note that if our operation needs intermediate buffers in GPU memory, we obtain them through the workspace pointer; the snippet above shows the basic usage.
The .cu implementation defaults to FP32. If a plugin op does not support FP16, TensorRT automatically switches to FP32 when it reaches that op and switches back once the op has finished.
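Putting it together, a hedged sketch of what MyCustomPlugin's enqueue might look like; my_custom_op_kernel is a hypothetical kernel name, not a real one, and stands in for your own .cu implementation:
// Sketch only: read shapes from the descriptors, take intermediate buffers from the
// workspace, and launch a (hypothetical) CUDA kernel on the stream TensorRT gives us.
int MyCustomPlugin::enqueue(const nvinfer1::PluginTensorDesc* inputDesc,
                            const nvinfer1::PluginTensorDesc* outputDesc,
                            const void* const* inputs, void* const* outputs,
                            void* workspace, cudaStream_t stream)
{
    const float* input = static_cast<const float*>(inputs[0]);
    float* output      = static_cast<float*>(outputs[0]);

    // Any intermediate buffer comes out of the workspace sized by getWorkspaceSize() above.
    float* intermediate = static_cast<float*>(workspace);

    // Total element count of the output, computed from its runtime dims.
    int count = 1;
    for (int i = 0; i < outputDesc[0].dims.nbDims; ++i)
    {
        count *= outputDesc[0].dims.d[i];
    }

    // Hypothetical kernel launch; grid/block sizes are placeholders:
    // my_custom_op_kernel<<<(count + 255) / 256, 256, 0, stream>>>(
    //     input, _d_weight, _d_bias, intermediate, output, count);

    return 0;
}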
getOutputDimensions
When TensorRT runs with dynamic shapes, the batch dimension must be explicit; that is, the dimensions TensorRT handles change from the three-dimensional [3,-1,-1] to [1,3,-1,-1]. The latest onnx-tensorrt also requires an explicit batch size, and that batch dimension is available inside getOutputDimensions.
In the old IPluginV2 class, getOutputDimensions was defined as follows:
virtual Dims getOutputDimensions(int index, const Dims* inputs, int nbInputDims) TRTNOEXCEPT = 0;
Copy the code
The new IPluginV2DynamicExt class defines it as follows:
virtual DimsExprs getOutputDimensions(int outputIndex, const DimsExprs* inputs, int nbInputs, IExprBuilder& exprBuilder) = 0;
What we need to do in this member function is derive the op's output dimensions from its input dimensions. Note that although the output dimensions depend on the input dimensions, they must be determined before the op actually runs. If the output dimensions of our plugin op can only be known by actually executing it, this function cannot express that.
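As a hedged example, assuming our plugin's output keeps the shape of its first input (a common case); more complex shapes can be built with exprBuilder.constant() and exprBuilder.operation():
// Sketch: the single output has exactly the same dimensions as input 0.
nvinfer1::DimsExprs MyCustomPlugin::getOutputDimensions(
    int outputIndex, const nvinfer1::DimsExprs* inputs, int nbInputs,
    nvinfer1::IExprBuilder& exprBuilder)
{
    // outputIndex is always 0 here, since getNbOutputs() returns 1.
    return inputs[0];
}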
set/getPluginNamespace
Sets the namespace for this plugin; if you do not set one, the default is "". Note that plugins with the same name in the same namespace will conflict.
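These two members are usually just a setter/getter pair; a minimal sketch using the mNamespace member declared above:
// Sketch: store and return the namespace string.
void MyCustomPlugin::setPluginNamespace(const char* pluginNamespace)
{
    mNamespace = pluginNamespace;
}

const char* MyCustomPlugin::getPluginNamespace() const
{
    return mNamespace.c_str();
}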
PluginFieldCollection
This is both a member variable and the return type of the getFieldNames member function. Its main job is to pass the weights and parameters the plugin op needs. It is not used during actual engine inference, but during parsing (e.g. in caffe2trt or onnx2trt).
When these parsers parse the op, its weights and parameters go through the chain model -> TensorRT engine -> TensorRT runtime.
For example, in onnx-tensorrt we use DEFINE_BUILTIN_OP_IMPORTER to register the op; the ONNX model is then parsed and built node by node according to the registered ops. If we call our op my_custom_op, then inside DEFINE_BUILTIN_OP_IMPORTER(my_custom_op) we do the following:
DEFINE_BUILTIN_OP_IMPORTER(my_custom_op)
{
    ASSERT(inputs.at(0).is_tensor(), ErrorCode::kUNSUPPORTED_NODE);
    ...
    const std::string pluginName = "CUSTOM-OP";
    const std::string pluginVersion = "001";

    // f holds the weights and parameters the op needs, read from the onnx model
    std::vector<nvinfer1::PluginField> f;
    f.emplace_back("in_channel", &in_channel, nvinfer1::PluginFieldType::kINT32, 1);
    f.emplace_back("weight", kernel_weights.values, nvinfer1::PluginFieldType::kFLOAT32, kernel_weights.count());
    f.emplace_back("bias", bias_weights.values, nvinfer1::PluginFieldType::kFLOAT32, bias_weights.count());

    // Fetch the plugin from the plugin registry and pass in the weights and parameters
    nvinfer1::IPluginV2* plugin = importPluginFromRegistry(ctx, pluginName, pluginVersion, node.name(), f);

    RETURN_FIRST_OUTPUT(ctx->network()->addPluginV2(tensors.data(), tensors.size(), *plugin));
}
Inside the importPluginFromRegistry function, you can see that the parameters are handed to the plugin's createPlugin through the fc variable:
nvinfer1::IPluginV2* importPluginFromRegistry(IImporterContext* ctx, const std::string& pluginName,
    const std::string& pluginVersion, const std::string& nodeName,
    const std::vector<nvinfer1::PluginField>& pluginFields)
{
    const auto mPluginRegistry = getPluginRegistry();
    const auto pluginCreator
        = mPluginRegistry->getPluginCreator(pluginName.c_str(), pluginVersion.c_str(), "ONNXTRT_NAMESPACE");

    if (!pluginCreator)
    {
        return nullptr;
    }

    // Pack the weight and parameter information to pass to the plugin
    nvinfer1::PluginFieldCollection fc;
    fc.nbFields = pluginFields.size();
    fc.fields = pluginFields.data();

    return pluginCreator->createPlugin(nodeName.c_str(), &fc);
}
In the steps above, pluginName and pluginVersion are used to look up the MyCustomPluginCreator, whose createPlugin member function is what we need to write (described below).
configurePlugin
Configures the plugin op: checks that the number and types of inputs and outputs are correct. Through this configuration TensorRT can also select the appropriate algorithms to tune the model.
I have not tried the automatic tuning myself; the execution code of a hand-written plugin is usually fixed, and the so-called tuning step probably matters more for the official ops.
The configurePlugin function below simply checks the number of inputs and outputs and their types.
void MyCustomPlugin::configurePlugin(
    const nvinfer1::DynamicPluginTensorDesc* inputs, int nbInputs,
    const nvinfer1::DynamicPluginTensorDesc* outputs, int nbOutputs)
{
    // Validate input arguments
    assert(nbOutputs == 1);
    assert(nbInputs == 2);
    assert(mType == inputs[0].desc.type);
}
clone
Clones this plugin object to TensorRT's builder, network or engine. This member function calls the second constructor mentioned above:
MyCustomPlugin(int in_channel, const std::vector<float>& weight, const std::vector<float>& bias);
The weights and arguments of the plugin to be cloned are passed to the constructor.
IPluginV2DynamicExt* MyCustomPlugin::clone() const
{
    auto plugin = new MyCustomPlugin{_in_channel, _weight, _bias};
    plugin->setPluginNamespace(mPluginNamespace);
    return plugin;
}
The clone member function is mainly there to pass on the constant weights and parameters and to duplicate the plugin as many times as needed, so that it can be used by different engines, builders or networks.
getSerializationSize
Returns how many bytes need to be written to buffer during serialization.
size_t MyCustomPlugin::getSerializationSize() const
{
    return (serialized_size(_in_channel) +
            serialized_size(_weight) +
            serialized_size(_bias));
}
supportsFormatCombination
TensorRT calls this method to ask whether the input/output at index pos supports the format and data type given by inOut[pos].format and inOut[pos].type.
Return true if the plugin supports the format/data type at inOut[pos]. If the answer depends on other inputs/outputs, the plugin may base its result on the formats/data types in inOut[0..pos-1], which are guaranteed to already be set to values the plugin supports. The function does not need to inspect inOut[pos+1..nbInputs+nbOutputs-1]; the decision for pos must be based only on inOut[0..pos].
bool MyCustomPlugin::supportsFormatCombination(
    int pos, const nvinfer1::PluginTensorDesc* inOut, int nbInputs, int nbOutputs)
{
    // Suppose there is one input and one output
    assert(0 <= pos && pos < 2);
    const auto* in = inOut;
    const auto* out = inOut + nbInputs;
    switch (pos)
    {
    case 0:
        return in[0].type == DataType::kFLOAT &&
               in[0].format == nvinfer1::TensorFormat::kLINEAR;
    case 1:
        return out[0].type == in[0].type &&
               out[0].format == nvinfer1::TensorFormat::kLINEAR;
    }
    return false;
}
serialize
Serialize the required data into buffer in sequence.
void MyCustomPlugin::serialize(void* buffer) const
{
    serialize_value(&buffer, _in_channel);
    serialize_value(&buffer, _weight);
    serialize_value(&buffer, _bias);
}
attachToContext
If the op needs additional resources, such as a cublas handle, it can directly use the one provided internally by TensorRT:
void MyCustomPlugin::attachToContext(cudnnContext* cudnnContext, cublasContext* cublasContext, IGpuAllocator* gpuAllocator)
{
    mCublas = cublasContext;
}
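The matching detachFromContext is where anything grabbed in attachToContext would be released; in this sketch there is nothing we own, so it just drops the reference (mCublas is the member assumed in the snippet above):
// Sketch: the cublas handle is owned by TensorRT, so we only clear our reference to it.
void MyCustomPlugin::detachFromContext()
{
    mCublas = nullptr;
}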
MyCustomPluginCreator plug-in factory class
Overview:
class MyCustomPluginCreator : public BaseCreator
{
public:
    MyCustomPluginCreator();
    ~MyCustomPluginCreator() override = default;

    const char* getPluginName() const override;             // not discussed in detail
    const char* getPluginVersion() const override;          // not discussed in detail
    const PluginFieldCollection* getFieldNames() override;  // not discussed in detail
    IPluginV2DynamicExt* createPlugin(const char* name, const nvinfer1::PluginFieldCollection* fc) override;
    IPluginV2DynamicExt* deserializePlugin(const char* name, const void* serialData, size_t serialLength) override;

private:
    static PluginFieldCollection mFC;
    static std::vector<PluginField> mPluginAttributes;
    std::string mNamespace;
};
The constructor
Create an empty mPluginAttributes and use it to initialize mFC.
MyCustomPluginCreator::MyCustomPluginCreator()
{
    mPluginAttributes.emplace_back(PluginField("in_channel", nullptr, PluginFieldType::kINT32, 1));
    mPluginAttributes.emplace_back(PluginField("weight", nullptr, PluginFieldType::kFLOAT32, 1));
    mPluginAttributes.emplace_back(PluginField("bias", nullptr, PluginFieldType::kFLOAT32, 1));
    mFC.nbFields = mPluginAttributes.size();
    mFC.fields = mPluginAttributes.data();
}
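The three members marked "not discussed in detail" in the overview are essentially one-liners. A sketch follows, where the "CUSTOM-OP" / "001" strings are simply the name and version assumed in the onnx importer example earlier:
// Sketch: return the plugin name/version used for registry lookup, and the field
// collection built in the constructor above.
const char* MyCustomPluginCreator::getPluginName() const
{
    return "CUSTOM-OP";
}

const char* MyCustomPluginCreator::getPluginVersion() const
{
    return "001";
}

const PluginFieldCollection* MyCustomPluginCreator::getFieldNames()
{
    return &mFC;
}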
createPlugin
This member function creates the plugin from the PluginFieldCollection: it pulls out the weights and parameters the op needs one by one and then calls the first constructor mentioned above:
MyCustomPlugin(int in_channel, nvinfer1::Weights const& weight, nvinfer1::Weights const& bias);
to create the plugin.
MyCustomPlugin example:
IPluginV2DynamicExt* MyCustomPluginCreator::createPlugin(const char* name, const nvinfer1::PluginFieldCollection* fc)
{
    int in_channel;
    std::vector<float> weight;
    std::vector<float> bias;
    const PluginField* fields = fc->fields;
    for (int i = 0; i < fc->nbFields; ++i)
    {
        const char* attrName = fields[i].name;
        if (!strcmp(attrName, "in_channel"))
        {
            ASSERT(fields[i].type == PluginFieldType::kINT32);
            in_channel = *(static_cast<const int32_t*>(fields[i].data));
        }
        else if (!strcmp(attrName, "weight"))
        {
            ASSERT(fields[i].type == PluginFieldType::kFLOAT32);
            int size = fields[i].length;
            weight.reserve(size);
            const auto* w = static_cast<const float*>(fields[i].data);
            for (int j = 0; j < size; j++)
            {
                weight.push_back(*w);
                w++;
            }
        }
        else if (!strcmp(attrName, "bias"))
        {
            ASSERT(fields[i].type == PluginFieldType::kFLOAT32);
            int size = fields[i].length;
            bias.reserve(size);
            const auto* w = static_cast<const float*>(fields[i].data);
            for (int j = 0; j < size; j++)
            {
                bias.push_back(*w);
                w++;
            }
        }
    }

    Weights weightWeights{DataType::kFLOAT, weight.data(), (int64_t) weight.size()};
    Weights biasWeights{DataType::kFLOAT, bias.data(), (int64_t) bias.size()};

    MyCustomPlugin* obj = new MyCustomPlugin(in_channel, weightWeights, biasWeights);
    obj->setPluginNamespace(mNamespace.c_str());
    return obj;
}
deserializePlugin
This function is called by onnx-tensorrt's conversion op TRT_PluginV2: it reads the serialized data stored in the onnx model and deserializes the plugin into the network.
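A minimal sketch of deserializePlugin: it simply hands the serialized buffer to the deserialization constructor written earlier:
// Sketch: build the plugin back from the serialized byte buffer.
IPluginV2DynamicExt* MyCustomPluginCreator::deserializePlugin(
    const char* name, const void* serialData, size_t serialLength)
{
    auto* obj = new MyCustomPlugin(serialData, serialLength);
    obj->setPluginNamespace(mNamespace.c_str());
    return obj;
}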
Some official plugin considerations
There are a few minor issues to watch out for when using the official plugins.
Topk problem
The official TopK plugin supports at most k <= 3840; otherwise it reports:
[TensorRT] ERROR: Parameter check failed at: ../builder/Layers.cpp::TopKLayer::3137, condition: k > 0 && k <= MAX_TOPK_K
Related questions: github.com/tensorflow/…
BatchedNMS problem
The official BatchedNMS supports a topK of at most 4096 and will crash with anything larger. You can modify the source code to raise this limit, but there are still bugs:
void (*kernel[])(const int, const int, const int, const int, const float,
                 const bool, const bool, float*, T_SCORE*, int*,
                 T_SCORE*, int*, bool)
    = {P(1), P(2), P(3), P(4), P(5), P(6), P(7), P(8), P(9), P(10),
       P(11), P(12), P(13), P(14), P(15), P(16)};
About plugin registration
A brief description of the plugin registration process.
When the NvInferRuntimeCommon.h header is included, you get access to getPluginRegistry, which holds all the registered IPluginCreators; at run time we fetch the one we need with the getPluginCreator function.
There are two ways to register a plugin. The first can be seen in the official plugin code:
extern "C" {
bool initLibNvInferPlugins(void* logger, const char* libNamespace)
{ initializePlugin<nvinfer1::plugin::GridAnchorPluginCreator>(logger, libNamespace); initializePlugin<nvinfer1::plugin::NMSPluginCreator>(logger, libNamespace); initializePlugin<nvinfer1::plugin::ReorgPluginCreator>(logger, libNamespace); .return true;
}
Copy the code
Where the initializePlugin function executes the addPluginCreator function:
template <typename CreatorType>
void initializePlugin(void* logger, const char* libNamespace)
{
PluginCreatorRegistry::getInstance().addPluginCreator<CreatorType>(logger, libNamespace);
}
The addPluginCreator function then executes getPluginRegistry()->registerCreator to register the pluginCreator, which completes the registration task:
void addPluginCreator(void* logger, const char* libNamespace)
{
    ...
    if (mRegistryList.find(pluginType) == mRegistryList.end())
    {
        bool status = getPluginRegistry()->registerCreator(*pluginCreator, libNamespace);
        if (status)
        {
            mRegistry.push(std::move(pluginCreator));
            mRegistryList.insert(pluginType);
            verboseMsg = "Plugin creator registration succeeded - " + pluginType;
        }
        else
        {
            errorMsg = "Could not register plugin creator: " + pluginType;
        }
    }
    else
    {
        verboseMsg = "Plugin creator already registered - " + pluginType;
    }
    ...
}
The other way is to register directly via REGISTER_TENSORRT_PLUGIN:
//!
//! \brief Return the plugin registry
//!
// When the NvInferRuntimeCommon.h header is included, getPluginRegistry becomes available
extern "C" TENSORRTAPI nvinfer1::IPluginRegistry* getPluginRegistry();

namespace nvinfer1
{

template <typename T>
class PluginRegistrar
{
public:
    PluginRegistrar() { getPluginRegistry()->registerCreator(instance, ""); }

private:
    T instance{};
};

#define REGISTER_TENSORRT_PLUGIN(name) \
    static nvinfer1::PluginRegistrar<name> pluginRegistrar##name {}

} // namespace nvinfer1
That is, if we have already put REGISTER_TENSORRT_PLUGIN(BatchedNMSPluginCreator); in the plugin's .h file, there is no need to write an initLibNvInferPlugins()-style function like the official one that registers each plugin in turn.
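For the plugin written in this article, that registration would simply be the following line, placed for example in custom.h or custom.cpp:
// Register MyCustomPluginCreator with the global TensorRT plugin registry at static-initialization time.
REGISTER_TENSORRT_PLUGIN(MyCustomPluginCreator);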
Communication
If you share these interests, Lao Pan would be happy to hear from you; if you like this content, follows and support are welcome. The blog publishes an in-depth original article every week; follow the public account "Oldpan Blog" so you don't miss the latest posts. Lao Pan has also put together some of his own collected material that may help you: reply "888" on the public account to get his learning roadmap and an article index, with more waiting for you to dig into. If you don't want to miss Lao Pan's latest updates, check out the mystery link.