Introducing TensorFlow Lite
TensorFlow Lite is a lightweight solution designed to address the problem that TensorFlow is too heavyweight for mobile and embedded platforms. Although it ships alongside TensorFlow, it is essentially a separate project that shares little code with it. TensorFlow itself is designed for desktop and server applications and is not optimized for ARM mobile platforms, so it is awkward to use directly on mobile or embedded devices. TensorFlow Lite implements a low-power, low-latency machine learning framework for mobile platforms and produces a much smaller compiled binary. It supports not only traditional ARM acceleration but also the Android Neural Networks API (NNAPI), giving better performance on NNAPI-enabled devices.
TensorFlow Lite not only uses the ARM NEON instruction set for acceleration, but also ships pre-fused activation functions and quantization support, which speed up execution and reduce model size.
TensorFlow Lite features
TensorFlow Lite has a number of features that enable it to perform well on mobile platforms. The main ones are:
- Support for a set of core operators, all of which accept both floating-point and quantized data, individually optimized and tuned for mobile platforms. These operators also include pre-fused activation functions, which improve computing performance on mobile platforms while preserving the accuracy of quantized calculations. We can use these operators to create and execute our own models, or write custom operator implementations when a model needs operators that are not built in.
- A new model file format defined for mobile platforms, based on FlatBuffers. FlatBuffers is a high-performance, open source, cross-platform serialization library. It is very similar to Protocol Buffers, but with one big difference: FlatBuffers does not require a parsing/unpacking step before the data can be accessed (whereas Protocol Buffers must first decode its compact wire format), because FlatBuffers data is generally laid out memory-aligned. In addition, the FlatBuffers code footprint is smaller than that of Protocol Buffers, making it easier to integrate on mobile platforms. As a result, the TensorFlow Lite and TensorFlow model file formats differ.
- An interpreter optimized for mobile platforms, which keeps the whole code base leaner and faster. The optimization idea is to use static graph ordering to speed up decisions at runtime, together with a custom memory allocator that reduces dynamic memory allocation; this cuts model loading and initialization time and resource consumption, and improves execution speed.
- Hardware acceleration interfaces: one for traditional ARM instruction set acceleration, and one for the Android Neural Networks API. If the target device runs Android 8.1 (API level 27) or later, Android's built-in acceleration API can be used to speed up overall execution.
- A model conversion tool that converts trained TensorFlow models into TensorFlow Lite models, bridging the difference between the TensorFlow and TensorFlow Lite model formats.
- A very small compiled binary. Built with ARM Clang at the O3 optimization level, the whole library comes to less than 300 KB, which covers the operators required by most current deep learning networks.
- Both Java and C++ APIs, which make it easy to integrate TensorFlow Lite into Android apps and embedded applications (a minimal C++ usage sketch follows this list).
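To make the C++ API concrete, here is a minimal sketch of loading and running a .tflite model. The file name model.tflite and the single float input/output are assumptions for illustration; the header paths follow the tensorflow/lite layout (older releases placed the code under tensorflow/contrib/lite), so adjust them to the version you build against.

```cpp
#include <cstdio>
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // Memory-map the FlatBuffers model file; no parsing step is required.
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return 1;

  // The resolver maps operator codes stored in the model to the kernel
  // implementations linked into the binary.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter) return 1;

  // Let the static memory planner allocate all tensor buffers up front.
  if (interpreter->AllocateTensors() != kTfLiteOk) return 1;

  // Fill the first input tensor (assumed float here) and run the graph.
  float* input = interpreter->typed_input_tensor<float>(0);
  input[0] = 1.0f;
  if (interpreter->Invoke() != kTfLiteOk) return 1;

  float* output = interpreter->typed_output_tensor<float>(0);
  std::printf("output[0] = %f\n", output[0]);
  return 0;
}
```

The same flow applies on Android behind the Java API: the Java classes wrap exactly this load/allocate/invoke sequence through JNI.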
TensorFlow Lite architecture
To understand how TensorFlow Lite achieves these features, we first need to understand its architectural design, which is shown in the following figure.
We can understand the relationship between TensorFlow Lite and TensorFlow this way. First, we train a model with TensorFlow. Then we use the TensorFlow Lite model converter to turn the TensorFlow model into a TensorFlow Lite model file (.tflite format). The converted file can then be used in mobile applications: on Android and iOS, TensorFlow Lite loads the converted .tflite model file. TensorFlow Lite provides the following invocation methods.
- Java API: a Java wrapper around the C++ API on Android, easy to call directly from the application layer of an Android app.
- C++ API: used to load TensorFlow Lite model files and to construct and invoke the interpreter. This API is available on both Android and iOS.
- Interpreter: executes the model, invoking kernels (operator implementations) according to the network structure. Kernel loading is selective: without the accelerated kernels linked in, the interpreter needs only about 100 KB; with all of them linked, it is still only about 300 KB, which is very small overall. On some Android devices, the interpreter calls the Android Neural Networks API directly for hardware acceleration.
Finally, we can implement our own kernels through the C++ API and load them through the interpreter, so that our deep networks run on the device.
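As a hedged sketch of what such a custom kernel looks like, the following registers a hypothetical operator named "MyOp" with the builtin resolver. TfLiteRegistration and the resolver's AddCustom method are the real extension points; the operator name and the empty function bodies are placeholders for illustration only.

```cpp
#include "tensorflow/lite/context.h"           // TfLiteContext, TfLiteNode, TfLiteRegistration
#include "tensorflow/lite/kernels/register.h"  // BuiltinOpResolver

// Prepare runs once when tensors are (re)allocated: a real kernel would
// validate input counts/types and resize its output tensors here.
TfLiteStatus MyOpPrepare(TfLiteContext* context, TfLiteNode* node) {
  return kTfLiteOk;
}

// Invoke runs on every execution: the actual computation over the node's
// input and output tensors goes here.
TfLiteStatus MyOpInvoke(TfLiteContext* context, TfLiteNode* node) {
  return kTfLiteOk;
}

TfLiteRegistration* Register_MY_OP() {
  // The leading fields are {init, free, prepare, invoke}; init/free are
  // optional and left null in this sketch.
  static TfLiteRegistration r = {nullptr, nullptr, MyOpPrepare, MyOpInvoke};
  return &r;
}

// Make the kernel visible under the custom op name used in the .tflite file.
void AddMyOp(tflite::ops::builtin::BuiltinOpResolver* resolver) {
  resolver->AddCustom("MyOp", Register_MY_OP());
}
```

A resolver extended this way is passed to InterpreterBuilder exactly as in the earlier sketch, so the interpreter can find the custom kernel when it encounters the op in the model file.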
TensorFlow Lite code structure
TensorFlow Lite and TensorFlow are two largely independent code bases; the complete code can be found in TensorFlow's GitHub repository. Here is a brief introduction to the code structure of TensorFlow Lite, which will be of great help later on. There is a lite directory under the TensorFlow code root that contains the complete TensorFlow Lite code. To make it easier for the reader to navigate, the following table introduces the TensorFlow Lite code structure and some important files.
Directory/File | Description |
---|---|
c | TensorFlow Lite C API layer implementation; this part of the interface can also be used from C++ |
core | TensorFlow Lite core interfaces and implementation, mainly including the SubGraph structure, FlatBuffer conversion, and model operator resolution |
delegates | TensorFlow Lite delegate implementations. This code is currently an experimental interface, mainly used to let TensorFlow Lite call the GPU or the NNAPI to perform the actual computation, so that a consistent invocation interface is presented to the caller no matter what actually performs the computation |
java | TensorFlow Lite Java interface implementation, including the C++ JNI implementation and the Java class definitions; this is detailed later in the discussion of the mobile platform implementation |
kernels | TensorFlow Lite's core kernel implementations, including all activation functions and operator implementations, which satisfy most deep learning tasks |
kernels/internal | The core internal implementation of TensorFlow Lite, providing the infrastructure needed by the operator implementations, including basic type definitions, the tensor implementation, quantization support, and even an MFCC algorithm. The kernels/internal/optimized directory contains implementations optimized for different mobile platforms, including acceleration based on Eigen or NEON; the NEON implementation includes a number of assembly and C API optimizations |
nnapi | Android NNAPI implementation; this part can only be used on Android |
profiling | TensorFlow Lite internal performance measurement and statistics implementation |
python | TensorFlow Lite Python interface implementation, including the Python C layer implementation and the Python-layer class and function definitions; this is detailed later in the discussion of the mobile platform implementation |
schema | FlatBuffer schema definitions for TensorFlow Lite data structures, mainly including the header file generator and the model definition files. The schema definition consists of the FBS file (the original definition) and the header file generated from it by the FlatBuffers compiler for use in C code |
toco | Model conversion tool implementation, including model conversion, model quantization, and other model optimizations. It supports direct invocation from the command line as well as a Python interface, so it is easy to integrate into our own build toolchain |
tools | Internal tool library implementations, including a model accuracy measurement library, an interpreter performance statistics library, a model optimization library, a model validation library, and so on |
allocation.h/cc | Internal allocator implementation; the main purpose is to optimize memory allocation for mobile platforms |
builtin_ops.h | Internal operator definitions, mainly the enumeration values of the built-in operators |
context.h | TensorFlow Lite Context implementation. Since the TensorFlow Lite Context is already defined in the C implementation, this file simply refers to the internal implementation of the C-layer API |
graph_info.h/cc | TensorFlow Lite graph information implementation, mainly defining the graph data structure, including all nodes of the graph, inputs, outputs, and storage variables |
interpreter.h/cc | Interpreter implementation for TensorFlow Lite. Defines the Interpreter class, which is the outer interface through which the C++ layer calls TensorFlow Lite. This class mainly supports: 1. building the internal structure of the graph and model; 2. adding internal nodes to the graph and setting node parameters; 3. getting/setting the inputs of the graph; 4. getting/setting the outputs of the graph; 5. getting/setting the variables of the graph; 6. getting/modifying the internal tensor data of the graph; 7. other parameter settings; 8. executing the model |
model.h/cc | TensorFlow Lite model implementation, mainly including the FlatBufferModel class and the InterpreterBuilder class. The FlatBufferModel class is responsible for reading the data from the model file and converting it into TensorFlow Lite's internal model data structure. The InterpreterBuilder class is responsible for building an Interpreter object from a user-specified FlatBufferModel and OpResolver, avoiding the need to build the graph manually at runtime (see the sketch after this table) |
nnapi_delegate.h/cc | The delegate implementation for NNAPI, primarily used to separate the interface from the implementation |
string_util.h/cc | StringRef type implementation, defining and encapsulating internal string storage and the associated utility functions |
util.h/cc | Other utility function implementations |
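To show how the FlatBufferModel/InterpreterBuilder pair and the Interpreter's get/set surface described above fit together, here is a small hedged sketch that builds an interpreter and dumps the name, type, and byte size of every tensor in the graph. The file name model.tflite is an assumption, and the header paths again follow the tensorflow/lite layout.

```cpp
#include <cstdio>
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // FlatBufferModel reads (memory-maps) the model file.
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return 1;

  // InterpreterBuilder wires the model and the op resolver into an
  // Interpreter, so the graph never has to be built by hand at runtime.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter || interpreter->AllocateTensors() != kTfLiteOk) return 1;

  // Walk every tensor in the graph and print its metadata.
  for (int i = 0; i < static_cast<int>(interpreter->tensors_size()); ++i) {
    const TfLiteTensor* t = interpreter->tensor(i);
    std::printf("tensor %d: name=%s type=%d bytes=%zu\n", i,
                t->name ? t->name : "(unnamed)", static_cast<int>(t->type),
                t->bytes);
  }
  return 0;
}
```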