Background

When building front ends, we often need to reproduce the icons that appear in design mockups. Most of the time the icons in a mockup carry no corresponding type name, and picking the right one out of several hundred icons by eye makes for a miserable experience.

So last year the author submitted a pull request to the Ant Design open source project. The PR contributed a screenshot-based icon search feature powered by deep learning: users can click, drag, or paste a screenshot of an icon from a design mockup (or any image) to upload it, and the feature returns the best-matching icons along with their match scores. And all of the recognition runs in the front end!

The effect is as follows:


You can also try it directly on the official website:

ant.design/components/…

So how does this technology work? This article will reveal it step by step:

  • Introduction to Deep Learning
  • Sample generation
  • Model training
  • Model conversion and compression
  • TensorFlow.js recognition

Introduction to Deep Learning

As mentioned earlier, this feature is built on deep learning. So what is deep learning? Deep learning is a type of machine learning, which can be simply described as:

Machine learning is the study of computer algorithms that can be automatically improved through “experience”.


The key word is experience. Humans have been using experience to solve problems for a very long time. As far back as the Middle Ages, for example, a standard foot length was determined by measuring the feet of 16 men and taking the average.


For another example: given a lot of data pairing height with weight, and then given a person’s height, can you estimate their weight?


Of course you can! Fit the formula y = ax + b to the data to find a and b, then plug in the height. Simple school math. In machine learning, a is called the weight and b is called the bias. This is already machine learning, and more specifically, linear regression.
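To make the y = ax + b idea concrete, here is a minimal hand-rolled least squares fit in JavaScript (an illustrative sketch, not code from the Ant Design PR; the height/weight numbers are made up):

```javascript
// Fit y = a * x + b by ordinary least squares.
// a is the "weight", b is the "bias".
function fitLinear(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((sum, x) => sum + x, 0) / n;
  const meanY = ys.reduce((sum, y) => sum + y, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  const a = num / den;
  const b = meanY - a * meanX;
  return { a, b };
}

// Made-up height (cm) / weight (kg) pairs:
const heights = [150, 160, 170, 180];
const weights = [50, 56, 62, 68];
const { a, b } = fitLinear(heights, weights);
console.log(a, b); // the weight and bias learned from "experience"
```

With these numbers the fit lands on a = 0.6 and b = -40, so a 165 cm person is estimated at 59 kg.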

Since machines can learn patterns in numbers, what if we turn pictures, speech, or text into numbers and let computers learn from them? Can computers recognize their patterns too? Of course they can! The underlying models are just much more complicated.

Image classification


Voice assistant


We use a deep learning model called a convolutional neural network (CNN) to classify icon screenshots.

Whether it is simple linear regression or complex deep learning, the model learns from experience. In machine learning, this experience is called a sample. So first we need to generate samples for the machine to learn from.

Sample generation

In this icon classification task, each sample consists of two parts:

  • The image
  • The label corresponding to the image

A label is the category name of an image. If you want to tell cats from dogs in images, then “cat” and “dog” are the labels.


Research shows that the more samples you have, the better a deep learning model performs. We therefore used a sample page + Puppeteer + FaaS pipeline to quickly generate tens of thousands of icon images and their corresponding labels. How does that work?

  1. Write a sample page: a new front-end page that renders exactly one Ant Design icon at a time. The icon may be any of the 300+ Ant Design icons, and its size, color, position, and so on are all randomized.
  2. Loop screenshots with Puppeteer: once the sample page is ready, we open it with Puppeteer (a headless browser) and automatically loop the refresh-and-screenshot operation, generating tens of thousands of images.
  3. Scale out with FaaS: generating tens of thousands of images on a single PC is too slow, so we ran the screenshots concurrently on Alibaba Cloud Function Compute (FaaS) with 100 instances. Measured throughput was about 20,000 images per minute.
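The randomized rendering in step 1 can be sketched as a plain function that picks render parameters for each screenshot. This is a hypothetical sketch: the icon names, numeric ranges, and field names below are illustrative, not taken from the actual sample page.

```javascript
// Pick random render parameters for one sample-page screenshot.
// ICON_NAMES is a tiny stand-in for the full 300+ Ant Design icon list.
const ICON_NAMES = ['SearchOutlined', 'HomeOutlined', 'SettingFilled', 'StarTwoTone'];

function randomIconConfig() {
  return {
    name: ICON_NAMES[Math.floor(Math.random() * ICON_NAMES.length)],
    size: 16 + Math.floor(Math.random() * 48),      // font size in px, 16-63
    color: `hsl(${Math.floor(Math.random() * 360)}, 80%, 50%)`,
    left: Math.floor(Math.random() * 80),           // position offset in %
    top: Math.floor(Math.random() * 80),
  };
}

// The sample page renders one icon per config; the `name` field
// doubles as the label saved alongside each screenshot.
console.log(randomIconConfig());
```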

With that, we have our samples.


Model training

Once you have the samples, you can start training the model. We use the TensorFlow framework. The official site provides an image classification example based on transfer learning, which can be downloaded and run directly, with its parameters pointed at the samples we just generated.

github.com/tensorflow/…

Training runs fine on a PC. It is not fast, but roughly a lunch break is enough!

It is worth mentioning that Alibaba Cloud offers the PAI service, which has a ready-made image classification algorithm and provides GPUs for accelerated training. The author did not use PAI’s own image classification algorithm, but deploying the TensorFlow code to PAI for training was fast!

Model conversion and compression

Once the model is trained, it can be used for recognition directly. But since it is Python code, it would have to be deployed on a server before anyone could use it, which has many disadvantages:

  • Server cost: deploying the model requires servers, and Ant Design is an open source project; we don’t want costs that grow linearly with usage.
  • Recognition speed: a centralized server is far from overseas users, so their experience would inevitably suffer.
  • Stability: Ant Design is used by hundreds of thousands of developers. If the server has a problem, the impact is too wide, and you may not sleep well at night.
  • Security: the Ant Design site is served statically, without any authentication or authorization, so an open interface would inevitably raise security issues.

With this in mind, we decided to convert the model into a TensorFlow.js model that users download into their browsers, where recognition runs locally. This has many benefits:

  • Edge computing: every user’s machine has a GPU. Once the model is downloaded to the browser, we can use the computing power of a large number of users’ GPUs, saving server costs and avoiding worries about server attacks and stability.
  • Fast recognition: since the model runs in the user’s browser, recognition is nearly real-time, with no network round-trip.

We use tfjs-converter for model conversion and compression:

github.com/tensorflow/…

We used MobileNet for transfer learning. The original model was 16 MB; after compression it is around 3 MB, published to the jsDelivr CDN: globally accelerated and permanently available.

TensorFlow.js recognition

Now that we have the model, all that is left is to write some TensorFlow.js code to run the recognition.

First, load the model file:

```javascript
const MODEL_PATH =
  'https://cdn.jsdelivr.net/gh/lewis617/[email protected]/model/model.json';
model = await tfconv.loadGraphModel(MODEL_PATH);
```

Then, convert the icon screenshot into a tensor:

A tensor is a data structure much like a multidimensional array. In TensorFlow, both the inputs and the outputs of a model are tensors, so whether you are training or recognizing, you must first convert your data into tensors.
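For intuition, a tensor can be pictured as a nested array plus a shape. A tiny illustrative sketch in plain JavaScript (not from the actual code):

```javascript
// A 2×2 RGB "image" as a nested array; its shape is [2, 2, 3]:
// height, width, and color channels.
const image = [
  [[255, 0, 0], [0, 255, 0]],
  [[0, 0, 255], [255, 255, 255]],
];

// Derive the shape by walking down the first element of each level.
function shapeOf(tensorLike) {
  const shape = [];
  let cur = tensorLike;
  while (Array.isArray(cur)) {
    shape.push(cur.length);
    cur = cur[0];
  }
  return shape;
}

console.log(shapeOf(image)); // [ 2, 2, 3 ]
```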


```javascript
// Convert the image element to a tensor
const img = tf.browser.fromPixels(imgEl).toFloat();
const offset = tf.scalar(127.5);
// Normalize the image from [0, 255] to [-1, 1].
const normalized = img.sub(offset).div(offset);
// Resize to the model's input size if necessary
let resized = normalized;
if (img.shape[0] !== IMAGE_SIZE || img.shape[1] !== IMAGE_SIZE) {
  const alignCorners = true;
  resized = tf.image.resizeBilinear(
    normalized,
    [IMAGE_SIZE, IMAGE_SIZE],
    alignCorners,
  );
}
// Reshape the tensor to fit the model: [batch, height, width, channels]
const batched = resized.reshape([-1, IMAGE_SIZE, IMAGE_SIZE, 3]);
```
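The normalization step above, written out on a single pixel value rather than a whole tensor, is just this arithmetic (a plain-number sketch for illustration):

```javascript
// Map a pixel value from [0, 255] to [-1, 1] using the same
// offset of 127.5 as the tensor code: (v - 127.5) / 127.5.
const normalize = v => (v - 127.5) / 127.5;

console.log(normalize(0), normalize(127.5), normalize(255)); // -1 0 1
```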

Then, run the recognition:

```javascript
// Run inference and read the probabilities back into a JS array
const pred = model.predict(batched).squeeze().arraySync();
// Take the top 5 classes along with their scores
const predictions = findIndicesOfMax(pred, 5).map(i => ({
  className: ICON_CLASSES[i],
  score: pred[i],
}));
```
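The helper findIndicesOfMax is not shown in the snippet; a minimal top-k implementation might look like this (an assumed sketch, not necessarily the PR’s exact code):

```javascript
// Return the indices of the k largest values, highest first.
function findIndicesOfMax(values, k) {
  return values
    .map((value, index) => ({ value, index }))
    .sort((a, b) => b.value - a.value)
    .slice(0, k)
    .map(item => item.index);
}

console.log(findIndicesOfMax([0.1, 0.5, 0.2, 0.9], 2)); // [ 3, 1 ]
```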

And there is the final result!

Complete code:

github.com/lewis617/an…

The author’s team is hiring

The techniques above are only the tip of the iceberg of intelligent front-end work. The Alibaba CCO front-end team, where the author works, has explored and practiced in many more areas of front-end intelligence. Interested readers are welcome to join us: [email protected].