Text: Alibaba Taobao F(x) Team


Preface

To help you learn Pipcook and machine learning, we have prepared a series of hands-on tutorials that explain how to use Pipcook in daily development, covering front-end component recognition, image style transfer, AI poetry, and automatic blog classification. If you want to learn about Pipcook 1.0, read the article "AI ❤️ JavaScript, Pipcook 1.0".

Background

Have you ever been in a front-end business scenario where you have an image and want an automatic way to recognize what it shows, whether it is a button, a navigation bar, or a table? This is a typical image classification task.


The task of predicting an image's category is called image classification. The purpose of training an image classification model is to recognize various kinds of images.


This identification can be useful for code generation, automated testing, and so on.


For example, suppose we have a Sketch design draft made up of different components. We can walk through every layer of the design and, for each layer, use the image classification model to identify which component it is. Then we can replace the original layer with the corresponding front-end component, which generates the front-end code, and a mid/back-office page is done.


In the automated testing scenario, we likewise need the ability to identify the type of each layer. For layers identified as buttons, we can automatically click them to see whether they work; for layers identified as feed streams, we can automatically track the loading time of each item to monitor performance, and so on.


Sample scenario

For example, in the scenario where charts are automatically generated for mid- and back-office pages, we need to identify which components are line charts and which are bar charts, pie charts, or ring charts, as shown below:




Figure 1 Line chart


Figure 2 Pie chart


Figure 3 Ring chart


Figure 4 Bar chart


After training is done, the model gives us the prediction we want for each image. For example, when we feed in the line chart from Figure 1, the model gives a prediction similar to the following:

[[0.1, 0.9, 0.05, 0.05]]

At the same time, a LabelMap is generated during training. The LabelMap is a mapping between a serial number and the actual class name. It is needed because class names in the real world are text, but before entering the model, the text must be converted into numbers. Here is a LabelMap:

{
  "column": 0,
  "line": 1,
  "pie": 2,
  "ring": 3
}

First of all, why is the prediction a two-dimensional array? The model allows you to predict multiple images at once, so the outer array has one element per predicted image. For each image, the model provides an inner array describing the probability of each class. As the LabelMap shows, the classes are arranged in the order column, line, pie, ring, so in the prediction above, line has the highest confidence at 0.9. This image is therefore predicted to be a line chart, which means the prediction is correct.
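As a minimal sketch of this post-processing step (the labelMap values mirror the example above; everything else here is illustrative, not a Pipcook API):

const labelMap = { "column": 0, "line": 1, "pie": 2, "ring": 3 };

// Invert the LabelMap so a class name can be looked up by its index.
const labels = [];
for (const [name, index] of Object.entries(labelMap)) {
  labels[index] = name;
}

// For each image row, pick the class with the highest confidence (argmax).
function toLabels(predictions) {
  return predictions.map((scores) => {
    const best = scores.indexOf(Math.max(...scores));
    return { label: labels[best], confidence: scores[best] };
  });
}

console.log(toLabels([[0.1, 0.9, 0.05, 0.05]]));
// [ { label: 'line', confidence: 0.9 } ]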

Data preparation

For an image classification task like this, we need to organize our dataset in a certain format.


We need to divide the dataset into train, validation, and test sets in a certain proportion. The training set is used to train the model, while the validation and test sets are used to evaluate it. The validation set evaluates the model during training, so we can conveniently check for over-fitting and convergence; the test set evaluates the model as a whole after all training is complete.


Within the train/validation/test sets, we organize the data by category. For example, if we have two categories, line and ring, we create a folder for each category name and place the images under the corresponding folder. The overall directory structure is:

  • train
    • ring
      • xx.jpg
      • ...
    • line
      • xx.jpg
      • ...
    • column
      • ...
    • pie
      • ...
  • validation
    • ring
      • xx.jpg
      • ...
    • line
      • xx.jpg
      • ...
    • column
      • ...
    • pie
      • ...
  • test
    • ring
      • xx.jpg
      • ...
    • line
      • xx.jpg
      • ...
    • column
      • ...
    • pie
      • ...

We have prepared such a dataset; you can download it and take a look: download address
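If you are preparing your own dataset, a small script can produce this layout. Below is a hypothetical helper (not part of Pipcook) that splits a flat, per-category image folder into the train/validation/test structure above; the 80/10/10 ratio and folder names are assumptions you can adjust:

const fs = require('fs');
const path = require('path');

// srcDir contains one sub-folder per category (e.g. line/, ring/, ...).
function splitDataset(srcDir, destDir, ratios = { train: 0.8, validation: 0.1, test: 0.1 }) {
  for (const category of fs.readdirSync(srcDir)) {
    // Shuffle so each split gets a representative sample of the category.
    const images = fs.readdirSync(path.join(srcDir, category))
      .sort(() => Math.random() - 0.5);
    let start = 0;
    for (const [split, ratio] of Object.entries(ratios)) {
      const count = Math.round(images.length * ratio);
      const dir = path.join(destDir, split, category);
      fs.mkdirSync(dir, { recursive: true });
      for (const img of images.slice(start, start + count)) {
        fs.copyFileSync(path.join(srcDir, category, img), path.join(dir, img));
      }
      start += count;
    }
  }
}

splitDataset('./labeled-images', './dataset');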

Start training

Once the dataset is ready, we can start training. Pipcook makes training an image classifier very convenient; you only need to build the following pipeline:


{
  "plugins": {
    "dataCollect": {
      "package": "@pipcook/plugins-image-classification-data-collect",
      "params": {
        "url": "http://ai-sample.oss-cn-hangzhou.aliyuncs.com/pipcook/datasets/component-recognition-image-classification/component-recognition-classification.zip"
      }
    },
    "dataAccess": {
      "package": "@pipcook/plugins-pascalvoc-data-access"
    },
    "dataProcess": {
      "package": "@pipcook/plugins-image-data-process",
      "params": {
        "resize": [224, 224]
      }
    },
    "modelDefine": {
      "package": "@pipcook/plugins-tensorflow-mobilenet-model-define",
      "params": {
        "batchSize": 8,
        "freeze": false
      }
    },
    "modelTrain": {
      "package": "@pipcook/plugins-image-classification-tensorflow-model-train",
      "params": {
        "epochs": 15
      }
    },
    "modelEvaluate": {
      "package": "@pipcook/plugins-image-classification-tensorflow-model-evaluate"
    }
  }
}

From the plugins above, we can see what each one is used for:


  1. @pipcook/plugins-image-classification-data-collect downloads datasets in the image classification format described above. We mainly need to provide the url parameter, for which we supply the address of the dataset prepared earlier.
  2. @pipcook/plugins-pascalvoc-data-access: now that we have downloaded the dataset, we need to convert it into Pipcook's dataset format so the model can access it in later stages.
  3. @pipcook/plugins-image-data-process: for image classification we need to perform some necessary operations on the raw data. For example, image classification requires all images to be the same size, so we use this plugin to resize them.
  4. @pipcook/plugins-tensorflow-mobilenet-model-define: the MobileNet model is used for training; it is generally suited to moderately complex data. For more complex datasets, we recommend @pipcook/plugins-tensorflow-resnet-model-define.
  5. @pipcook/plugins-image-classification-tensorflow-model-train performs the training. It is a general TensorFlow-based training plugin for image classification and is independent of the model chosen in the previous stage.
  6. @pipcook/plugins-image-classification-tensorflow-model-evaluate evaluates the model, that is, measures its performance on the test set. It is also a general TensorFlow-based plugin for image classification, independent of the model chosen earlier.


MobileNet is a lightweight model that can be trained on a CPU. If you use ResNet, due to the size of the model itself, we recommend running this pipeline on a GPU machine with a CUDA environment.
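To switch models, only the modelDefine stage of the pipeline needs to change. A sketch, assuming the ResNet plugin accepts the same stage shape (its params may differ, so check the plugin's documentation):

"modelDefine": {
  "package": "@pipcook/plugins-tensorflow-resnet-model-define"
}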


CUDA, short for Compute Unified Device Architecture, is a parallel computing platform and programming model created by NVIDIA for their graphics processing units (GPUs).

With CUDA, GPUs can easily be used for general-purpose computing (much like numerical computation on a CPU). Before CUDA, GPUs were generally used only for graphics rendering (e.g., through OpenGL or DirectX).

Save the pipeline above as image-classification.json, then run:

pipcook run image-classification.json --verbose --tuna

Models tend to converge within 10-20 epochs, depending, of course, on the complexity of your dataset. Model convergence means that the loss is low enough and the accuracy high enough that the model's performance no longer changes significantly from epoch to epoch.


The specific logs are as follows:

Epoch 1/15
187/187 [==============================] - 12s 65ms/step - loss: 0.0604 - accuracy: 0.9823 - val_loss: 8.8755 - val_accuracy: 0.4112
Epoch 2/15
187/187 [==============================] - 11s 61ms/step - loss: 0.0056 - accuracy: 0.9993 - val_loss: 5.5883 - val_accuracy: 0.4925
Epoch 3/15
187/187 [==============================] - 11s 59ms/step - loss: 0.0107 - accuracy: 0.9980 - val_loss: 0.3830 - val_accuracy: 0.8388
...
187/187 [==============================] - 11s 61ms/step - loss: 3.0090e-05 - accuracy: 1.0000 - val_loss: 1.5646e-08 - val_accuracy: 1.0000
Epoch 14/15
187/187 [==============================] - 11s 61ms/step - loss: 5.1657e-05 - accuracy: 1.0000 - val_loss: 1.9073e-08 - val_accuracy: 1.0000
Epoch 15/15
187/187 [==============================] - 11s 61ms/step - loss: 5.1657e-05 - accuracy: 1.0000 - val_loss: 1.9073e-08 - val_accuracy: 1.0000

After training, an output directory is generated under the current directory. It is a brand-new NPM package, so we first install its dependencies:

cd output
BOA_TUNA=1 npm install

After setting up the environment (setting BOA_TUNA=1 uses the Tsinghua mirror to download the Python-related dependencies), we can start predicting:

const predict = require('./output');

(async () => {
  const v1 = await predict('./test.jpg');
  console.log(v1); // [[0.1, 0.9, 0.05, 0.05]]
})();

Note that the prediction we return is the probability of each category; you can process these probabilities to get the result you want.
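For instance, a common post-processing step is to accept the top class only when its confidence clears a threshold. A minimal sketch (the 0.6 threshold and the 'unknown' fallback are assumptions, not part of Pipcook):

// Fall back to 'unknown' when the model is not confident enough.
function decide(scores, labels, threshold = 0.6) {
  const best = scores.indexOf(Math.max(...scores));
  return scores[best] >= threshold ? labels[best] : 'unknown';
}

console.log(decide([0.1, 0.9, 0.05, 0.05], ['column', 'line', 'pie', 'ring'])); // 'line'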

Conclusion

With that, the component recognition task based on an image classification model is complete. After finishing the pipeline in our example, if you are interested in this type of task, you can start preparing your own dataset for training. We described the dataset format in detail in the data preparation section; just follow that file directory layout and you can easily prepare data that fits our image classification pipeline.


By now you should have learned how to classify front-end components in images, which can be applied to some relatively simple cases. In real component recognition scenarios, however, a single image often contains multiple different components, and a pure classification model may not meet that need. In the next article, we will show you how to use Pipcook to identify multiple components in a design draft.