Compiling OpenCC to WASM with Emscripten to convert between Simplified and Traditional Chinese in the browser

  • GitHub project
  • Playground and documentation


OpenCC is great, but unfortunately using it has meant running a server-side service. I had always wanted to run it directly in the browser and convert text right on the page.

When I discovered that Tesseract.js was compiled with Emscripten, I was surprised by how mature WASM-related technology had become. So I decided to build a WASM version of OpenCC and explore Emscripten along the way.

This project adds to and modifies OpenCC and compiles it with Emscripten. On top of OpenCC's ability to convert between Simplified and Traditional Chinese, it offers the following:

  • It runs directly in the browser environment.
  • It runs in Node and Electron without compiling a native addon, which avoids complicated addon deployment. In theory it should also work in React Native and Web Workers (untested).
  • Dictionary loading is separated from text conversion: only the dictionary data that is needed gets loaded in the browser, and a custom loading method can be supplied, which makes it easy to load data from a CDN.
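
To make the last point concrete, here is a minimal TypeScript sketch of what "loading separated from conversion" can look like. It is not this project's actual API: `DictLoader`, `parseDict`, `convert`, `loadConverter`, and the dictionary names are all hypothetical. The tab-separated dictionary text and the greedy longest-match conversion are simplifying assumptions for illustration only.

```typescript
// Hypothetical sketch: none of these names come from the real project.

/** A loader returns the raw text of one dictionary (from a CDN, disk, memory...). */
type DictLoader = (name: string) => Promise<string>;

/** Parse "key<TAB>value1 value2" lines (assumed format), keeping the first candidate. */
function parseDict(text: string): Map<string, string> {
  const table = new Map<string, string>();
  for (const line of text.split("\n")) {
    const [from, to] = line.trim().split("\t");
    if (from && to) table.set(from, to.split(" ")[0]);
  }
  return table;
}

/** Toy conversion: at each position, replace the longest key that matches. */
function convert(text: string, table: Map<string, string>): string {
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  let maxLen = 1;
  table.forEach((_to, from) => {
    maxLen = Math.max(maxLen, Array.from(from).length);
  });
  let out = "";
  let i = 0;
  while (i < chars.length) {
    let matched = false;
    for (let len = Math.min(maxLen, chars.length - i); len >= 1; len--) {
      const hit = table.get(chars.slice(i, i + len).join(""));
      if (hit !== undefined) {
        out += hit;
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      out += chars[i]; // no dictionary entry: pass the character through
      i += 1;
    }
  }
  return out;
}

/** Load only the named dictionaries through the supplied loader, then return a converter. */
async function loadConverter(
  names: string[],
  load: DictLoader
): Promise<(text: string) => string> {
  const table = new Map<string, string>();
  const texts = await Promise.all(names.map((name) => load(name)));
  for (const text of texts) {
    parseDict(text).forEach((to, from) => table.set(from, to));
  }
  return (input) => convert(input, table);
}

// In-memory loader for the demo. In a browser, a CDN-backed loader could be
// something like: (name) => fetch(`${cdnBase}/${name}.txt`).then((r) => r.text())
// where cdnBase is a placeholder, not a real URL from this project.
const demoDicts: Record<string, string> = {
  chars: "汉\t漢\n语\t語",
  phrases: "头发\t頭髮",
};
const memoryLoader: DictLoader = async (name) => demoDicts[name] ?? "";

loadConverter(["chars", "phrases"], memoryLoader).then((toTraditional) => {
  console.log(toTraditional("汉语头发")); // prints "漢語頭髮"
});
```

Because the converter only ever sees a `DictLoader`, the same conversion code can be fed from a CDN in the browser or from the file system in Node, and a page that needs only one conversion direction loads only those dictionaries.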

Results

You can go directly to the documentation page and try a conversion in your browser:

wasm-opencc: the WASM build of OpenCC (Open Chinese Convert), which runs directly in the browser.

oyyd.github.io

After compiling, the relevant file sizes are as follows (excluding dictionary files):

  • opencc-asm.js (655kB, 164kB after gzip)
  • opencc-asm.js.mem (25kB, 8kB after gzip)

Although the code is bulky, it is perfectly usable in a web environment with proper caching. You can also install it with npm and run it directly in Node; no memory problems have turned up so far.


Some thoughts on WASM and Emscripten

Before this project, my biggest question about WASM was whether Emscripten/WebAssembly was mature enough yet. If you are looking for something that works out of the box, has complete documentation and an active community, and rarely gets in your way, my answer is “no.” The documentation is in fact fairly complete, but it is hard to find the relevant page when you hit a problem. You definitely need to know C/C++ and its build tools. I ran into a lot of problems, especially with memory operations, where Emscripten would throw a bare error number with no other message, which made the cause very hard to locate. There may be tools along the lines of GDB or LLDB that help with this, but I am not aware of them.

But if you accept that the problems WebAssembly is trying to solve are not easy ones, and you are willing to spend time working through them, I think you will finish a project finding Emscripten more mature than you expected going in. Since finishing this project, my testing has not turned up any memory-related problems (of course, that is down to the JS runtime environment itself, so it is probably not worth mentioning). Once I had solved or worked around the few problems I did hit, most of the remaining code just worked, and everything else was plain JS wrapping the calls.

Also, I did not originally plan to provide a Node version, because @byvoid had already built the addon version himself. But remembering the trouble I have had developing addons, and knowing how hard addons are to maintain and deploy, I built a version that runs on Node as well. So I think WASM still has an advantage in Node, even though an addon can call C/C++ directly: you can build a version that needs no compilation on install and also runs in the browser, in less time, without having to understand v8.h, node.h, or NAN. You only need to learn the much simpler Embind.

Looking back, the biggest advantage of WebAssembly, as its documentation says, is that you can run LLVM-compiled projects directly from JS, much as you would with a Node addon. If you are adopting it purely to speed up your application, you should be careful.