At the end of last week's Open Source Deep Learning Framework Project Participation post, we mentioned that MegEngine has implemented MegEngine.js, the JavaScript version of MegEngine, built with the help of community developers. With it, you can quickly deploy MegEngine models in a JavaScript environment.

This project is part of the Open Source Software Supply Chain Illumination Project – Summer 2021. This article is an excerpt from the final report written by Tricster, the developer of MegEngine.js. Enjoy~

Project information

Scheme description

Use WebAssembly to bring MegEngine to the Web.

My implementation keeps most of the C++ source code, rewrites the Python parts in TypeScript, and finally uses WebAssembly to connect the TypeScript to the C++.

The advantage is that reusing MegEngine's operators, and even its model definition and serialization format, ensures maximum compatibility between MegEngine.js and MegEngine.

Why is MegEngine.js needed?

Before building a wheel, it is best to understand the wheel's value and avoid reinventing it. The value of MegEngine.js lies mainly in two aspects:

Growing demand for on-device computing

With the development of deep learning, users are becoming more aware of privacy and data protection. If an application needs to upload sensitive data (ID photos, etc.) to a server, users will understandably be worried. At the same time, the increasing computing power of edge devices makes on-device computing feasible. There are also environments that cannot call system-level compute APIs at all: a WeChat mini program, for example, runs inside another application and has no direct access to system APIs, so with no better way to compute locally, many small deep learning applications still have to send data to a server, which does not work in high-risk scenarios.

Growing demand on the Web side

It must be admitted that the Web is highly expressive, and many novel ideas can be implemented on the Web with good results. However, almost no deep learning framework currently provides a JavaScript interface, so they cannot run on the Web. If a deep learning framework could be run conveniently on the Web, many interesting applications would follow.

What is the architecture of MegEngine.js?

A good way to get to know a project quickly is to first look at its architecture and the technologies it uses from a high level, and then dive into the details of the code.

The architecture of most deep learning frameworks

It is not hard to find that almost all deep learning frameworks have similar architectures, which are mainly divided into three parts:

  1. Basic computing module: supports different devices and architectures, provides a unified interface upward, and performs computation efficiently; generally written in C or C++.
  2. Framework logic module: builds on the basic computing module to implement the main logic of deep learning training and inference, including but not limited to building the computation graph, implementing the differentiation module, and handling serialization and deserialization; also written in C++.
  3. External interfaces: many users of deep learning frameworks are not familiar with C++, so bindings for the C++ core must be created in various other languages. The most common approach is to use Pybind to create a Python binding. In this way, users keep the ease of use of Python while retaining the good performance of C++.

PyTorch, for example, has this three-tier structure:

  1. ATen and C10 provide the basic computing power.
  2. The core logic is implemented in C++.
  3. The C++ part is called from Python as an extension, with only simple wrapping on the Python side.

Taking MegEngine as an example

The file structure of MegEngine is fairly clear; it is basically as follows:

```
.
├── dnn
├── imperative
└── src
```

Although MegEngine has a similar structure, it’s still a bit different.

MegDnn in the DNN folder is a low-level computing module that supports different architectures and platforms, such as x86, CUDA, and ARM. These modules are implemented differently, but they all provide a common interface to be called by MegEngine.

As shown in the figure below, operators for different architectures form a tree structure according to their inclusion relations. Although the leaf operators are the ones commonly used, the naive and fallback operators are also an important part of development and can be very helpful when implementing new operators.

In addition, adopting such a tree structure allows good code reuse: for example, we can implement only part of an operator and let the rest fall back to an existing implementation higher up the tree, which saves a lot of work.

MegDnn operator organization diagram
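To make the reuse idea concrete, here is an illustrative TypeScript sketch of this kind of fallback dispatch. It is not MegDnn code, just the pattern: an architecture-specific registry resolves an operator locally if it has an optimized version, and otherwise walks up toward fallback and naive.

```typescript
// Illustrative sketch of the fallback-dispatch pattern, not actual MegDnn code.
type Kernel = (a: Float32Array, b: Float32Array) => Float32Array;

class OperatorRegistry {
  private kernels = new Map<string, Kernel>();

  // parent is the registry to fall back to, e.g. x86 -> fallback -> naive
  constructor(private parent?: OperatorRegistry) {}

  register(name: string, kernel: Kernel): void {
    this.kernels.set(name, kernel);
  }

  // Walk up the tree until some ancestor provides an implementation.
  lookup(name: string): Kernel {
    const kernel = this.kernels.get(name);
    if (kernel) return kernel;
    if (this.parent) return this.parent.lookup(name);
    throw new Error(`no implementation for operator: ${name}`);
  }
}

const naive = new OperatorRegistry();
const fallback = new OperatorRegistry(naive);
const x86 = new OperatorRegistry(fallback);

// naive implements everything (slowly); x86 overrides only what it optimizes.
naive.register("add", (a, b) => a.map((v, i) => v + b[i]));

const add = x86.lookup("add"); // resolves to the naive version via fallback
```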

src contains the main code of MegEngine: how to build a static graph, the basic definition of Tensor, and many optimizations for memory and computation graphs. With just the MegDnn and src code, MegEngine can already perform efficient inference (inference only); this part does not include what is needed to train a model and is used more in deployment scenarios.

Finally, imperative completes the remaining parts of a neural network framework, such as backpropagation, the definitions of various layers, and optimizers, and provides an easy-to-use external interface through Python. It is worth mentioning that in imperative, C++ and Python are deeply coupled via Pybind, so Python is no longer just an exposed interface but participates in the execution logic as part of the framework. The conversion of a dynamic graph into a static graph is a good example: it uses decorators on the Python side while cooperating with the static graph part in C++.

This architecture is intuitive and flexible. To improve the underlying computing modules, you only need to modify MegDnn; to add static-graph-related features, you only change the src part; to add more interface functionality, you only modify imperative.

In theory, to port MegEngine to another language you could just replace imperative, but since C++ and Python are tightly coupled there, you first have to strip out all the Python parts, then add an implementation in the target language (C++, JS, or anything else) as needed.

Design idea of MegEngine.js

Based on the above analysis, MegEngine.js uses the architecture shown below.

Reuse the implementation of MegEngine, including the computing modules and the computation graph; then, imitating the Python part, write a Runtime in C++ that provides the functionality found in imperative and stores all state; finally, use WebAssembly to expose all of the above modules to TypeScript, and implement the remaining logic in TypeScript, providing an easy-to-use interface for users.

With this architecture, MegEngine.js is better understood as a top-level module integrated into MegEngine, rather than a Web-side deep learning framework implemented from scratch like TensorFlow.js. The advantage is that MegEngine.js not only enjoys the highly optimized features of MegEngine, but can also directly run models trained with MegEngine, paving the way for future deployment.

What is the status of MegEngine.js?

From a framework point of view

At present, MegEngine.js is a working framework, which proves the feasibility of the whole approach. Users can directly run static graph models exported from MegEngine, or build their own network from scratch, train it, run inference, and load and save models, all in a browser. The same can be done in a Node.js environment.

MegEngine.js has been published to npm and can be easily installed from there.

megenginejs
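As a quick taste, here is a minimal usage sketch. Note that the identifiers below (tensor, add) are illustrative rather than confirmed; consult the package documentation for the actual exported API.

```typescript
// Install first: npm install megenginejs
// NOTE: the function names below are illustrative; check the package docs
// for the actual exported API.
import * as mge from "megenginejs";

// Hypothetical usage: create two tensors and run an element-wise op.
const a = mge.tensor([1, 2, 3, 4]);
const b = mge.tensor([4, 3, 2, 1]);
const c = mge.add(a, b);
console.log(c.toString());
```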

From a task-completion point of view

All tasks listed in the original mission statement have been completed:

  1. Models and data can be loaded

Static graph models dumped by MegEngine can be loaded and run directly, with support for the graph optimizations and memory optimizations from the original framework.

  2. The forward OPs of Dense/Matmul (required) pass unit tests

Twenty-one common operators, including Matmul, have been implemented, and all pass their unit tests.

  3. Run the linear regression model forward; run it backward and train it

The task is complete. For the implementation, see Demo3.

  4. Run the MNIST model forward; run it backward and train it

The task is complete. For the implementation, see Demo4.

  5. MNIST demo

MNIST training and validation are complete, but the related visualizations (loss curve, accuracy curve, test samples) have not been implemented. See Demo4.

Resolve performance bottlenecks

In addition, due to the limitations of WebAssembly and the cross-platform nature of the Web, I could not use the highly optimized operators in MegEngine, so the experience was not smooth at first. Midway through the project I therefore referred to TensorFlow.js, introduced XNNPACK, and implemented a new set of operators, which effectively improved the speed of MegEngine.js.

In an operator benchmark on macOS, the running time of the convolution operator was reduced by 83%:

```
WASM.BENCHMARK_CONVOLUTION_MATRIX_MUL (36430 ms)   <- before (naive operator)
WASM.BENCHMARK_CONVOLUTION_MATRIX_MUL (6169 ms)    <- after (with XNNPACK)
```

For MNIST training in Safari, the time of a single training round dropped by 52%.

Showcase of main results

Demo1

MegEngine.js Playground. Users are free to experiment with MegEngine.js.

Megengine.js Starter

Demo2

MegEngine.js Model Executor. Users can load a MegEngine model and run inference with it. The model used in the demo was exported using sample code from the official MegEngine repository.

Megengine.js Model Executor

Demo3

Linear Regression. A linear regression demo showing dynamic training with MegEngine.js.

Megengine.js Linear Regression

Demo4

MegEngine.js MNIST. A complete handwritten digit recognition training and validation demo.

Megengine.js Mnist

More Demo

See the Example folder in the repository.

Megenginejs/Example · Megjs · Summer201/210040016

What problems were encountered in implementing MegEngine.js?

Although the architecture was conceived from the beginning and the layering was fairly clear, many problems were encountered.

Compilation problems

Problem description

MegEngine is written in C++, so the first step is to compile MegEngine into WebAssembly. Emscripten can compile a simple C++ program to WASM, but a project of MegEngine's size cannot be compiled without modification.

The solution

The biggest problem is that the MegDnn operator library contains too many platform-dependent parts and optimizations. After trying many approaches, there was still no way to include those optimizations, so all of them had to be removed: only after falling back to the naive arch and turning off other compilation options did the build complete.

However, this forces the use of slower operators, so the overall speed of the framework is not ideal.

Interaction problem

Problem description

Both MegEngine and MegEngine.js need the underlying C++ to interact with another language. With Pybind, C++ and Python can be combined closely, creating and managing C++ objects from Python. On the Emscripten side, however, you can either use the lower-level ccall and cwrap, or use Embind to bind C++ objects to JavaScript. Embind mimics Pybind but does not provide as good a way to manage C++ objects, so it cannot couple JavaScript and C++ as tightly as Pybind does.

Ideally, JS and C++ would manage the same variables, the way Python does with MegEngine: the Tensor created in Python inherits from the Tensor created in C++, so when a Tensor goes out of scope in Python and is collected by the GC, the resources created in C++ are destroyed as well, and a Tensor can be passed back and forth between C++ and Python as a parameter. That is tightly coupled and very intuitive.

cwrap and ccall only support basic types. Embind supports binding custom classes, but it is cumbersome to use: objects created this way must be deleted manually, which adds a lot of burden.
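For illustration, this is what the manual-deletion burden looks like with an Embind-bound class. Calling .delete() is Embind's standard way to free the underlying C++ object; the Tensor binding itself is a hypothetical name.

```typescript
declare const Module: any; // Emscripten module object

// Every object created from an Embind-bound class owns C++ heap memory.
// The Tensor class here is hypothetical, not the actual megenginejs binding.
const t = new Module.Tensor([2, 2]); // allocates on the C++ heap
try {
  // ... use t ...
} finally {
  t.delete(); // must be called by hand, or the C++ memory leaks
}
```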

The solution

To deal with this, I built a Runtime in C++ that manages the lifecycle of each Tensor and keeps track of the state produced while the program runs.

For example, when you create a Tensor in JS, the actual data is copied into C++, a Tensor is created in C++ to actually manage that data (this is also the Tensor used in MegEngine), and the C++ Runtime takes over its management; the Tensor's ID is then handed back to JS. So a Tensor in JS is more like a pointer to the corresponding Tensor in C++.

With this approach, you have to maintain the correspondence between C++ and JS at the Tensor level yourself, but the interaction between JS and C++ becomes much easier: since only the ID crosses the boundary, a Tensor can be passed with plain ccall, cwrap, or Embind.
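A minimal sketch of this ID-based handle pattern follows. Module.cwrap, Module._malloc, Module._free, and Module.HEAPF32 are standard Emscripten facilities; the exported C functions (mge_create_tensor, mge_destroy_tensor) and the Tensor wrapper are hypothetical names for illustration, not the actual MegEngine.js code.

```typescript
declare const Module: any; // Emscripten module object

// Hypothetical exported C functions registered with the C++ Runtime.
const createTensor = Module.cwrap("mge_create_tensor", "number",
                                  ["number", "number"]); // (dataPtr, len) -> id
const destroyTensor = Module.cwrap("mge_destroy_tensor", null, ["number"]);

class Tensor {
  constructor(readonly id: number) {}

  static fromArray(data: Float32Array): Tensor {
    // Copy the data into the WASM heap so the C++ side can own it.
    const ptr = Module._malloc(data.length * 4);
    Module.HEAPF32.set(data, ptr >> 2);
    const id = createTensor(ptr, data.length); // the Runtime takes ownership
    Module._free(ptr); // the staging buffer is no longer needed
    return new Tensor(id); // the JS object is just a pointer-like handle
  }

  free(): void {
    destroyTensor(this.id); // ask the C++ Runtime to release the tensor
  }
}
```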

Of course, there are drawbacks: the C++ and JS sides are designed separately, which requires a lot of repetitive functions.

The GC problem

Problem description

Both JS and Python have GC. Python's GC plays a big role in MegEngine, reclaiming a Tensor as soon as it is no longer used, which is efficient. In JS it is more complicated: although JS does have GC, its recycling strategy is far less aggressive than Python's, perhaps due to browser usage scenarios or JS design philosophy. There is no way to determine whether or when a variable is reclaimed, nor, in general, to run a callback when a variable is reclaimed.

The solution

To solve this problem, I implemented a naive mark-based method that reclaims variables once they leave a Scope, to avoid running out of memory. However, this approach is still too simple: while it does prevent memory overflow, it is not very efficient.
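The idea is similar in spirit to tf.tidy in TensorFlow.js: every tensor created inside a scope is tracked, and when the scope ends, everything that does not escape it is freed. Here is a naive sketch, reusing the hypothetical Tensor handle from the earlier example; the bookkeeping is illustrative only.

```typescript
// Naive scope-based recycling: track tensors created inside a scope and
// free the ones that are not returned from it. Illustrative sketch only.
const scopes: Tensor[][] = [];

// Tensor-creating ops would call track() on every new tensor.
function track<T extends Tensor>(t: T): T {
  if (scopes.length > 0) scopes[scopes.length - 1].push(t);
  return t;
}

function tidy<T extends Tensor>(fn: () => T): T {
  scopes.push([]);
  const result = fn();
  const created = scopes.pop()!;
  for (const t of created) {
    if (t !== result) t.free(); // only the returned tensor survives the scope
  }
  return result;
}
```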

On finalizers

The new JS standard adds a mechanism that allows a callback function (a finalizer) to be invoked when a variable is collected by the GC, in order to clean up resources. However, there is no guarantee of when, or even whether, a finalizer runs. Moreover, the Tensor data actually lives in WebAssembly memory, and the JS GC does not monitor memory usage inside WASM: even if the WASM memory is full, the GC will not collect anything, because the corresponding objects on the JS side occupy very little memory.

For these two reasons, finalizers are not a good choice.

P.S. Many browsers don’t support Finalizers yet.
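For reference, the mechanism in question is FinalizationRegistry (added in ES2021). A sketch of how it would look, with the reasons it falls short noted in the comments; it reuses the hypothetical Tensor handle and destroyTensor export from the earlier sketch.

```typescript
// FinalizationRegistry runs a callback some unspecified time after an
// object is garbage-collected; there is no guarantee of when, or even if.
// Worse, the JS GC never sees the WASM-side memory that a tiny JS handle
// keeps alive, so it feels no pressure to collect at all.
const registry = new FinalizationRegistry<number>((id) => {
  destroyTensor(id); // hypothetical export from the handle sketch above
});

function makeTracked(data: Float32Array): Tensor {
  const t = Tensor.fromArray(data);
  registry.register(t, t.id); // heldValue must not reference t itself
  return t;
}
```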

Performance issues

Problem description

As mentioned earlier, many things were sacrificed to get MegEngine to compile to WebAssembly, including the high-performance operators. While the framework worked, it was not efficient enough for normal use. The reason is simple: MegEngine contains no optimizations for the Web platform, so to solve this I had to implement a set of operators for the Web myself.

The solution

There are not many BLAS libraries optimized for the Web. Google's XNNPACK, which is based on PyTorch's QNNPACK, is also used in TensorFlow.js, so I introduced XNNPACK here. Due to XNNPACK's limitations, not every operator could be ported to it, but the speed did improve.

What happens after MegEngine.js?

After three months of development, I got to know MegEngine better and better and want to remain part of the community. MegEngine.js has basic functionality, but it is still a long way from a complete framework, and there is much more work to be done.

Further improve the various modules

A qualified deep learning framework should have comprehensive operator and module support. At present, MegEngine.js supports only a few operators and modules; more practical operators need to be added to push the framework forward.

Further improve performance

Performance improvements are never finished, and in this fast-moving era running speed is an indicator that cannot be ignored. Although introducing XNNPACK improved speed, it is not enough, both because operator support is incomplete and because there should still be more room for improvement.

Further optimize the framework

Do not over-optimize, but do not let the code stagnate either. In due course (once the necessary functional modules are complete), it may be necessary to make MegEngine.js easier to use and to consider more edge cases.

Further reading

【Author's blog】 Deep learning on the Web | Avalon

Tutorial on using MegEngine.js in a mini program

More developers are welcome to join the MegEngine community; there is a participation tutorial and a task list suitable for beginners:

Guide to participating in the open source deep learning framework project (includes a beginner-friendly task list)

MegEngine Technical Exchange Group, QQ group number: 1029741705