New Jiyuan translation

Source: AWS AI Blog



Author: Li Mu

Translator: Fei Xinxin

A new compiler, the NNVM compiler, can greatly simplify the design of new AI front-end frameworks and back-end hardware, and provide users with consistent results across a variety of front ends and back ends. The NNVM compiler compiles high-level computation graphs into optimized machine code, matching or even exceeding state-of-the-art performance on two very different kinds of hardware (ARM CPUs and Nvidia GPUs) with minimal effort.


There are many artificial intelligence (AI) frameworks you can choose from to develop AI algorithms, and you can also choose from a variety of hardware to train and deploy AI models. This diversity of frameworks and hardware is critical to keeping the AI ecosystem healthy, but it also presents several challenges for AI developers. This article outlines these challenges and introduces a compiler solution to help solve them.

Let’s first review the challenges, introduce the University of Washington and AWS research team, and then explain how the compiler works.

First, switching from one AI framework to another is cumbersome because of differences between front-end interfaces and back-end implementations. In addition, algorithm developers may use multiple frameworks across their development and delivery processes. At AWS, we have customers who want to deploy their Caffe models on MXNet to enjoy accelerated performance on Amazon EC2. According to Joaquin Candela’s recent blog post, users may use PyTorch for rapid development and then deploy on Caffe2. However, we have also heard how difficult it can be to debug the differences in results that appear after a model is converted from one framework to another.

Second, framework developers need to maintain multiple back ends to guarantee performance on hardware ranging from smartphone chips to data center GPUs. MXNet, for example, has a portable C++ implementation built from scratch and also ships with specialized back-end support such as cuDNN for Nvidia GPUs and MKLML for Intel CPUs. Ensuring that these different back ends deliver consistent numerical results to users can be challenging.

Finally, every new chip built by chip makers also needs to support multiple AI frameworks. The workloads in each framework are represented and executed in their own unique ways, so even an operation as common as convolution may need to be defined differently. Supporting multiple frameworks therefore requires a great deal of engineering effort in chip design and manufacturing.

Different AI frameworks and hardware bring significant benefits to users, but providing consistency to end users is challenging for AI developers. Fortunately, we are not the first to face this problem: computer science has a long history of running different programming languages on different hardware, and a key technique for solving this problem is the compiler. Motivated by compilation technology, a group of researchers from the Paul Allen School of Computer Science and Engineering at the University of Washington (UW), including Tianqi Chen, Thierry Moreau, Haichen Shen, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy, together with Ziheng Jiang of the AWS AI team, proposed the TVM stack to simplify this problem.

AWS is pleased to join the UW research team today in announcing an end-to-end compiler based on the TVM stack, which compiles workloads from various deep learning front ends directly into optimized machine code.

Let’s look at the compiler architecture first.

Note that a typical AI framework can be divided into three parts:

  • The front end exposes an easy-to-use interface to users;

  • Workloads received from the front end are usually represented as computation graphs composed of data variables (A, B, and C) and operators (* and +); a toy sketch of such a graph appears right after this list;

  • Operators, from basic arithmetic operations to neural network layers, can be implemented and optimized for multiple hardware back ends.
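
As a purely illustrative sketch of the second point above, here is a toy Python representation of the expression A * B + C as a graph of variable and operator nodes. The Node class and helper functions are hypothetical and not part of NNVM; they only show the kind of structure such a computation graph captures.

```python
# Toy computation-graph sketch (hypothetical classes, not the NNVM API).
class Node:
    def __init__(self, name, op=None, inputs=()):
        self.name = name          # variable name or operator result name
        self.op = op              # None for data variables, "*" or "+" for operators
        self.inputs = list(inputs)

def variable(name):
    return Node(name)

def mul(a, b):
    return Node("mul_out", op="*", inputs=[a, b])

def add(a, b):
    return Node("add_out", op="+", inputs=[a, b])

# The workload A * B + C as a graph the framework can later optimize and execute.
A, B, C = variable("A"), variable("B"), variable("C")
out = add(mul(A, B), C)

def print_graph(node, indent=0):
    """Print the graph as a tree of operators and variables."""
    print("  " * indent + (node.op if node.op else node.name))
    for inp in node.inputs:
        print_graph(inp, indent + 1)

print_graph(out)  # prints +, then *, A, B, then C
```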

The new compiler, the NNVM compiler, is based on two components in the TVM stack: NNVM for computation graphs and TVM for tensor operators.

NNVM – Computation graph intermediate representation (IR) stack

The goal of NNVM is to represent workloads from different frameworks as standardized computation graphs and then transform these high-level graphs into execution graphs. This graph representation is inspired by layer definitions in Keras and tensor operators in numpy.

NNVM also comes with routines called passes, following the LLVM convention. These routines can add new attributes to the graph so that it can be executed, or modify the graph to make it more efficient; a toy illustration of such a pass follows below.
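
Purely as an illustration of what a pass does (this is not the NNVM pass API; real passes are registered inside NNVM and operate on NNVM graphs), here is a toy shape-inference pass that walks the hypothetical graph from the earlier sketch and attaches an inferred shape to every node, the kind of attribute an execution engine needs.

```python
# Toy "InferShape"-style pass over the hypothetical graph defined earlier
# (illustrative only; it reuses the Node objects and `out` from that sketch).
def infer_shape_pass(node, input_shapes):
    """Annotate every node with a .shape attribute."""
    if node.op is None:                       # data variable: look up its shape
        node.shape = input_shapes[node.name]
    else:                                     # elementwise * or +: infer from inputs
        for inp in node.inputs:
            infer_shape_pass(inp, input_shapes)
        assert len({inp.shape for inp in node.inputs}) == 1, \
            "elementwise operators need matching input shapes"
        node.shape = node.inputs[0].shape
    return node

annotated = infer_shape_pass(out, {"A": (2, 3), "B": (2, 3), "C": (2, 3)})
print(annotated.shape)  # (2, 3)
```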

TVM – Tensor IR stack

TVM, which originated from Halide, implements the operators used in the computation graphs and optimizes them for the target back-end hardware. Unlike NNVM, TVM provides a hardware-independent domain-specific language that simplifies operator implementation at the tensor index level. TVM also provides scheduling primitives (such as multithreading, tiling, and caching) to optimize the computation and take full advantage of hardware resources. These schedules are hardware-specific and can either be hand-coded or searched for automatically to find optimized patterns.
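
As a small example, the sketch below uses the TVM Python API as it existed around the time of this announcement (tvm.placeholder, tvm.compute, tvm.create_schedule; later releases moved these under tvm.te). The computation is declared once in a hardware-independent way, and the schedule then applies loop splitting, parallelization, and vectorization for a CPU target.

```python
import tvm

# Hardware-independent description of an elementwise workload: C = A * 2 + B.
n = tvm.var("n")
A = tvm.placeholder((n,), name="A")
B = tvm.placeholder((n,), name="B")
C = tvm.compute(A.shape, lambda i: A[i] * 2.0 + B[i], name="C")

# Hardware-specific schedule: split the loop, run the outer chunks on
# multiple threads, and vectorize the inner loop to use SIMD units.
s = tvm.create_schedule(C.op)
xo, xi = s[C].split(C.op.axis[0], factor=64)
s[C].parallel(xo)
s[C].vectorize(xi)

# Inspect the lowered loop structure, then generate machine code via LLVM.
print(tvm.lower(s, [A, B, C], simple_mode=True))
func = tvm.build(s, [A, B, C], target="llvm")
```

The same compute definition can be paired with a different schedule, for example one that binds loops to GPU thread indices, to target a GPU instead.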

The following figure shows the supported front and back ends.

MXNet is supported by converting its computation graphs directly into NNVM graphs; Keras is supported in a similar manner but is still under development. The NNVM compiler can also take model formats such as CoreML as input, so any framework that can export these formats can use this compilation stack.
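
As a sketch of what such a front-end conversion looked like with the original NNVM release (the nnvm.frontend functions are from that era's API, and later TVM versions replaced this path with Relay importers; the checkpoint name below is a placeholder):

```python
import mxnet as mx
import nnvm.frontend

# Load a pretrained MXNet model (symbol plus parameters); the checkpoint
# prefix "resnet18" is a placeholder for whatever model you have on disk.
sym, arg_params, aux_params = mx.model.load_checkpoint("resnet18", 0)

# Convert the MXNet computation graph into an NNVM graph and parameter dict.
nnvm_sym, nnvm_params = nnvm.frontend.from_mxnet(sym, arg_params, aux_params)

# Other formats were imported the same way, e.g. a CoreML model:
#   import coremltools
#   mlmodel = coremltools.models.MLModel("model.mlmodel")
#   nnvm_sym, nnvm_params = nnvm.frontend.from_coreml(mlmodel)
```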

TVM currently ships with multiple code generators that support a variety of back-end hardware. For example, TVM generates LLVM IR for CPUs such as x86 and ARM, and can also output CUDA, OpenCL, and Metal kernels for various GPUs.

Adding new support is also simple. For a new front end, we only need to convert its workloads into NNVM's representation, which defines the computation graph and operator specifications. To add new hardware, we can reuse TVM's operator implementations and only need to specify an efficient schedule.

We used MXNet as the front end to demonstrate NNVM compiler performance on two hardware configurations: an ARM CPU on a Raspberry Pi and an Nvidia GPU on AWS. While the architectural differences between the two chips are huge, the only difference in the code is in the scheduling part.
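
The sketch below shows roughly what this looks like in code, reusing nnvm_sym and nnvm_params from the front-end sketch above and assuming the nnvm.compiler and tvm.contrib.graph_runtime APIs of the original release; the input and output shapes and the exact ARM target string are placeholders. Only the target string (and hence the schedules TVM picks) changes between the GPU and the Raspberry Pi.

```python
import numpy as np
import nnvm.compiler
import tvm
from tvm.contrib import graph_runtime

shape_dict = {"data": (1, 3, 224, 224)}  # placeholder input shape

# Compile the NNVM graph and parameters for an Nvidia GPU.
graph, lib, params = nnvm.compiler.build(
    nnvm_sym, target="cuda", shape=shape_dict, params=nnvm_params)

# For the Raspberry Pi, only the target changes; the graph and operator
# definitions are reused, with ARM-specific schedules selected by TVM:
#   target = "llvm -target=armv7l-none-linux-gnueabihf"

# Deploy with the lightweight graph runtime (local GPU shown here).
ctx = tvm.gpu(0)
module = graph_runtime.create(graph, lib, ctx)
module.set_input(**params)
data = np.random.uniform(size=shape_dict["data"]).astype("float32")
module.set_input("data", tvm.nd.array(data, ctx))
module.run()
out = module.get_output(0, tvm.nd.empty((1, 1000)))  # placeholder output shape
```

For the Raspberry Pi, the resulting library would be exported and loaded on the device rather than run locally.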

The GPU schedules were written mainly by Leyuan Wang (AWS) and Yuwei Hu (TuSimple) during their internships. We compared the NNVM compiler against MXNet with the cuDNN back end on an Nvidia K80. Operators that are not efficiently supported in cuDNN, such as depthwise convolution, are implemented with hand-optimized CUDA kernels. As you can see, the NNVM compiler runs ResNet18 and MobileNet slightly faster (about 1.2x) than the cuDNN back end.

In the case of the Raspberry Pi, we selected optimal schedules through an automatic tuner: we benchmarked operator performance directly on the Raspberry Pi to find the best schedule for each operator at a given shape.
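
The post does not describe the tuner itself, but the flavor of such a search can be sketched with the same era's TVM API: generate a few candidate schedules for an operator at a fixed shape, benchmark each one, and keep the fastest. The vector-add workload and split factors below are placeholders for illustration.

```python
import numpy as np
import tvm

# One operator at one fixed shape (placeholder workload: vector add, n = 4096).
n = 4096
A = tvm.placeholder((n,), name="A")
B = tvm.placeholder((n,), name="B")
C = tvm.compute((n,), lambda i: A[i] + B[i], name="C")

ctx = tvm.cpu(0)
a = tvm.nd.array(np.random.uniform(size=n).astype("float32"), ctx)
b = tvm.nd.array(np.random.uniform(size=n).astype("float32"), ctx)
c = tvm.nd.array(np.zeros(n, dtype="float32"), ctx)

# Try a handful of candidate schedules (different split factors) and time each.
best = None
for factor in [8, 16, 32, 64]:
    s = tvm.create_schedule(C.op)
    xo, xi = s[C].split(C.op.axis[0], factor=factor)
    s[C].vectorize(xi)
    func = tvm.build(s, [A, B, C], target="llvm")
    timer = func.time_evaluator(func.entry_name, ctx, number=100)
    cost = timer(a, b, c).mean
    if best is None or cost < best[1]:
        best = (factor, cost)

print("best split factor:", best[0], "mean time (s):", best[1])
```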

Again, we compared the NNVM compiler to MXNet. MXNet had OpenBLAS and NNPACK enabled by default, and we also manually turned on Winograd convolution in NNPACK for best performance.

As you can see, the NNVM compiler is 2.2 times faster on ResNet18. On MobileNet, the gap is 11.5x. This is mainly because depthwise convolution is not optimized in MXNet (due to the lack of such an operator in its DNN libraries), whereas the NNVM compiler benefits from generating efficient code for it directly.

We have introduced the NNVM compiler, which compiles high-level computation graphs into optimized machine code. The NNVM compiler is based on two components in the TVM stack: NNVM provides the specification of the computation graph and operators together with graph optimization routines, and the operators are implemented and optimized for the target hardware using TVM. We demonstrated that, with very little effort, this compiler can match or even exceed state-of-the-art performance on two completely different kinds of hardware (an ARM CPU and an Nvidia GPU).

We hope that the NNVM compiler will greatly simplify the design of new AI front-end frameworks and back-end hardware, and help provide consistent results for users across the various front ends and back ends.

Original link: https://amazonaws-china.com/cn/blogs/ai/introducing-nnvm-compiler-a-new-open-end-to-end-compiler-for-ai-frameworks/