This article introduces Fliggy's Double Eleven interactive easter egg, “Flying Pig, Find a Find”, a purely client-side game based on AI image recognition.

Using this game as a starting point, it discusses:

  1. How to use TensorFlow to quickly train a personalized image recognition model
  2. How to use tensorflow.js to load and use models on the front end, and how to integrate the whole experience into the React.js framework
  3. Some pitfalls to watch out for
  4. How far can front-end developers go with tensorflow.js, and what are the chances of landing it in real business?
  5. What technologies can it be combined with, and what might the future look like?

You can scan this QR code with Alipay or Fliggy to try it:

Machine Learning 101

Machine learning is an algorithm, not a mystery. The industry even avoids the term artificial intelligence (AI) because it implies intelligence, and machine learning is not yet intelligent. A baby can learn without rules or goals, and needs to see only 5-10 cats to recognize what a cat is. Machine learning, by contrast, is unwieldy: it requires millions of images to be fed in, and the results are highly sensitive to the data sources.

How does the AI recognize things?

  • A simplified model is like a system of simultaneous equations in three unknowns with a fixed solution, except that the computer has to solve equations with millions of unknowns.
  • Neural network: each unknown is a “parameter”, and each set of simultaneous equations is called a “layer” of the neural network. Common simple models may have 3-6 layers; complex models can have 50+ layers.
  • Training: the process of solving these equations. The computer gradually approximates the correct “parameters” through an approximation algorithm. After thousands of training iterations, the parameters are close enough to the correct answer, and an AI model has been trained. Feed it the pixel data of another image and it will answer 0.99, i.e. 99% cat. (A toy sketch of this approximation process follows.)
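To make "gradually approximating the parameters" concrete, here is a toy sketch (purely our illustration, nothing from a real model): it "trains" a single parameter w so that y = w * x fits one example, nudging w a little on every iteration. A real model applies the same idea to millions of parameters at once.

// Toy illustration of training by successive approximation.
// Goal: find w such that y = w * x fits the example (x = 2, y = 6).
let w = 0; // start from a blank "parameter"
const learningRate = 0.01;
for (let i = 0; i < 1000; i++) {
  const prediction = w * 2;      // model output for x = 2
  const error = prediction - 6;  // distance from the correct answer
  w -= learningRate * error * 2; // nudge w against the error
}
console.log(w); // ≈ 3, the "trained" parameter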

The significance of Client Side Machine Learning

1. There are two types of client-side machine learning

  • Using a trained model on the client side
  • Training the model on the client side

The latter is not yet mature, and training places high demands on the device, such as needing GPU-accelerated training through WebGL.

2. Pros and cons of client-side machine learning

  • Advantages

    • Low development cost: everything from training through development to release can be done by the front end, greatly improving iteration efficiency
    • Works locally without a network connection, which also benefits privacy-sensitive applications
    • After loading resources once, there is no need to repeatedly ship data to the server and fetch AI analysis results, which is ideal for high-frequency AI analysis scenarios such as image recognition
  • Disadvantages: model size is limited; a full, non-mobile-customized model often weighs hundreds of MB, which is unsuitable for loading directly in an H5 page

How to simply run an existing model using tensorflow.js

Tensorflow.js comprises four APIs:

  • Core: the lowest-level API, used to create models; it was previously deeplearn.js
  • Layers: a high-level API similar to Keras
  • Data: an API for loading and processing data, similar to tf.data
  • Converter: a loader for importing TensorFlow models into JS

There are three levels of usage:

  a. Using existing models, including how to load them and the high-level APIs each model provides
  b. Retraining existing models
  c. Fully customizing and creating brand-new models

This article covers a and b, focusing on a; b is analyzed only at the level of principles.

1. Obtaining images from the camera

Calling the camera (WebAR; with the new WebAR template and the new version of WebAR). The usual way for an ordinary H5 page to call the camera, shown below, cannot be used here because of security restrictions:

if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: false,
    video: { facingMode: 'environment' }
  });
}

We used Alipay's webar.js to open the camera instead. WebAR provides the getWebCameraAsync method to get the camera:

// ...
<canvas ref={node => (this.glCanvas = node)} />
// ...

initCamera = async () => {
  // Get the WebGL context
  const cvs = this.glCanvas;
  const gl = cvs.getContext('webgl');
  // Get the rear-facing camera
  this.camera = await getWebCameraAsync({ facing: CAMERA_FACING_BACK });
  // Open the camera
  await this.camera.openAsync();
  // Project the camera feed onto the canvas
  this.displayTarget = this.camera.createDisplayTarget('ar-container', {
    autoResize: 0,
    gl
  });
  this.displayTarget.loop();
};

WebAR's camera provides pause, resume, snapshot, close, and other methods for starting, pausing, capturing, and destroying; see the documentation for details.
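A hypothetical usage sketch based only on the method names above (the exact signatures and return values are in the WebAR documentation, so treat this as pseudocode):

// Hypothetical sketch: the method names come from the docs above, but
// whether each one returns a Promise is an assumption.
await this.camera.pause();   // freeze the camera preview
await this.camera.resume();  // resume the preview
const frame = await this.displayTarget.snapshotImageDataURLAsync({ type: 'imageData' }); // capture
await this.camera.close();   // destroy the camera and free resources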

2. AI processing and prediction of image content

  • Loading the model on the front end:

    Use TFJS's Converter. The 0.x and 1.x versions differ in save format and call API; for the new version, see the Save and Load Models documentation. The old version works as follows:

import { loadFrozenModel } from '@tensorflow/tfjs-converter';

const WEIGHT_MANIFEST_FILE_URL =
  'https://gw.alipayobjects.com/os/fliggy-play/181301-3/weights_manifest.json';
const MODEL_FILE_URL =
  'https://gw.alipayobjects.com/os/fliggy-play/181301-3/tensorflowjs_model.pb';
this.model = await loadFrozenModel(MODEL_FILE_URL, WEIGHT_MANIFEST_FILE_URL);

And the prediction is very simple:

this.model.execute(data, 'final_result');
// Remember to dispose of the model in willUnmount with this.model.dispose()

The old version of TFJS needs to load three model files:

  1. tensorflowjs_model.pb (the model file)
  2. weights_manifest.json (the weight manifest)
  3. group1-shard1of1 (the binary weight table)

group1-shard1of1 is not referenced directly in code, but it must be placed in the same path as the manifest (the manifest references it by a relative path); otherwise an error occurs.
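For illustration, a 0.x weights manifest is roughly shaped like this (all values below are made up); the "paths" entries are relative, which is why the shard file must sit next to the manifest:

[{
  "paths": ["group1-shard1of1"],
  "weights": [
    { "name": "MobilenetV1/Conv2d_0/weights", "shape": [3, 3, 3, 8], "dtype": "float32" }
  ]
}]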

  • Using the model for processing and prediction: the API also differs slightly between versions 0.x and 1.x. For example, reading pixel information from an image is fromPixels in 0.x and browser.fromPixels in 1.x (a small shim, shown below, can smooth this over).
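If one codebase has to run against either version, a minimal shim (our own convenience, not an official API) can pick whichever location exists:

// Minimal cross-version shim: prefer the 1.x location of fromPixels,
// fall back to the 0.x one.
const fromPixels =
  tfc.browser && tfc.browser.fromPixels ? tfc.browser.fromPixels : tfc.fromPixels;
const pixels = fromPixels(img);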

Capture and process pixel data from screenshots:

import * as tfc from '@tensorflow/tfjs-core';
// ...
// Capture a frame with the WebAR camera. The await has to stay outside
// tfc.tidy, which only accepts a synchronous function.
const img = await this.displayTarget.snapshotImageDataURLAsync({ type: 'imageData' });
const result = tfc.tidy(() => {
  // Read the pixel data into a tensor
  const pixels = tfc.browser.fromPixels(img);
  // Crop a centered 224x224 region
  const centerHeight = pixels.shape[0] / 2;
  const beginHeight = centerHeight - 112;
  const centerWidth = pixels.shape[1] / 2;
  const beginWidth = centerWidth - 112;
  const pixelsCropped = pixels.slice(
    [beginHeight, beginWidth, 0],
    [224, 224, 3]
  );
  // Feed the normalized 224x224 pixels into the model and return its prediction
  return predict(pixelsCropped);
});

See GitLab for the specific implementation of predict, which is not expanded here. At this point, you can load an existing model, take a screenshot from the camera, and get back a prediction (an array containing the predicted class and its confidence).
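Since predict is not shown here, the following is a minimal sketch of what such a function could look like for a retrained classifier. The [0, 1] normalization, the 'final_result' output node, and the OBJECT_CLASSES name table (generated during retraining, see below) are assumptions modeled on the retrain examples, not the project's actual code:

// Minimal sketch, not the project's actual predict().
predict = pixelsCropped => {
  // uint8 [0, 255] -> float [0, 1], plus a batch dimension
  const input = pixelsCropped.toFloat().div(255).expandDims(0);
  const output = this.model.execute(input, 'final_result');
  const scores = output.dataSync();
  // Pick the highest-confidence class
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return [OBJECT_CLASSES[best], scores[best]];
};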

3. Optimization notes

  • Wrap model prediction in tfc.tidy to clean up intermediate tensors in memory
  • Warm the model up first by running one prediction on a set of empty data: tfc.zeros([VIDEO_PIXELS, VIDEO_PIXELS, 3]) (a minimal sketch follows after the loading example below)
  • Remember to destroy the model and camera in willUnmount to free memory
  • Loading optimization

Taking advantage of React's Suspense and lazy syntax, we code-split the main Game component (together with all the AI model loading, warm-up, and camera calls) and make sure the Game component starts preloading immediately after the Menu loads, achieving a satisfying game loading experience.

const Menu = lazy(() => import('./Menu'));
const Playground = lazy(() => import('./Playground'));

function Game() {
  const [isMenu, set] = useState(true);

  useEffect(() => {
    import('./Playground');
  }, []);
  // ...
  return (
    <div>
      {isMenu ? (
        <Suspense fallback={<Loading show />}>
          <Menu toGame={toGame} />
        </Suspense>
      ) : (
        <Suspense fallback={<Loading show />}>
          <div className="game-page">
            <Playground backToMenu={backToMenu} />
          </div>
        </Suspense>
      )}
    </div>
  );
}
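As promised in the warm-up bullet above, a minimal sketch of the warm-up itself (our illustration; VIDEO_PIXELS = 224 is an assumption matching the 224x224 crop used earlier): one throwaway prediction right after the model loads, so the first real camera frame does not pay for uploading weights to the GPU and compiling shaders.

// Minimal warm-up sketch: run one dummy prediction after loading.
const VIDEO_PIXELS = 224;
tfc.tidy(() => {
  const dummy = tfc.zeros([VIDEO_PIXELS, VIDEO_PIXELS, 3]);
  this.model.execute(dummy.expandDims(0), 'final_result');
});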

How to retrain a model

1. What is retraining?

An image recognition model has millions of parameters, and training one from scratch requires massive data and hundreds of GPU-hours. For front-end developers who want to adopt this at low cost, training from scratch is not viable. But existing models probably will not meet a customization requirement like recognizing a "flying pig". The best approach here is retraining, specifically transfer learning: reusing the results of an existing trained model and adding a layer of neural network on top to turn them into customized training results.

The result is certainly not as good as training from scratch, but the precision of this simple approach is high enough to make it an excellent solution in terms of input versus output. Retraining runs on my Mac Pro without GPU acceleration and completes 3000 iterations in about 30 minutes (the first run takes longer).
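For intuition only, here is what "adding a layer on top of an existing model" can look like in tfjs-layers code. This is a conceptual sketch under assumed names (base, NUM_CLASSES), not the pipeline used in this project, which retrains in Python via the Hub script below:

import * as tf from '@tensorflow/tfjs';

// Conceptual transfer-learning sketch. Assume `base` is a pre-trained
// model truncated to output a 1024-dim embedding.
const head = tf.sequential();
head.add(tf.layers.dense({
  inputShape: [1024],
  units: NUM_CLASSES,        // number of custom classes to recognize
  activation: 'softmax'
}));
head.compile({ optimizer: 'adam', loss: 'categoricalCrossentropy' });
// Only the new head gets trained, on embeddings from the frozen base:
// await head.fit(base.predict(images), labels);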

2. How to retrain

We used the ready-made image retrain model on TensorFlow Hub. For specific training details, refer to the README of Google's Emoji Scavenger Hunt repository: put the collected images into the training data folder according to the classification below, install Python and the other environment dependencies, and run the Hub retrain script.

data
└── images
    ├── cat
    │   ├── cat1.jpg
    │   ├── cat2.jpg
    │   └── ...
    ├── dog
    │   ├── dog1.jpg
    │   ├── dog2.jpg
    │   └── ...
    └── ...

Emoji Scavenger Hunt even provides a Dockerfile covering the entire process; just install Docker and run it:

$ cd training
$ docker build -t model-builder .
$ docker run -v /path/to/data:/data -it model-builder

After 4000 training iterations, the data/saved_model_web folder is finally generated: three files for tensorflow.js to load, plus a .ts file containing the names of all the trained object classes.
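The generated .ts file is essentially just a lookup table from output index to class name, roughly of this shape (the export name and entries below are illustrative; they depend on your training data):

// Illustrative shape of the generated class-name file.
export const OBJECT_CLASSES = {
  0: 'cat',
  1: 'dog',
  2: 'unknown'
};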

3. How to optimize the training results

  • In addition to the necessary object classes, train an unknown class on all the "noise", such as various background images. This reduces the risk of "overfitting", for example the model latching onto a type of background in the images as its cue for distinguishing objects.

  • Pay attention to the selection of material: the exposure and angle of the photos affect the final training results, so keep them as close to the real scene as possible.

  • When imageOptim compresses an image, the EXIF orientation information can be lost, leaving the image used for recognition rotated sideways; this small, subtle problem greatly hurts recognition accuracy (see reference 3). Use a batch-processing tool to rotate images to the correct orientation before compression; one example follows.
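As one example of such batch processing (our suggestion, not necessarily the tool the team used), ImageMagick can bake the EXIF orientation into the actual pixels before compression:

$ mogrify -auto-orient data/images/*/*.jpg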

Outlook: combining with business

Image recognition is just a starting point; there are tons of models on the Hub that developers have already trained, ready to use directly or to retrain. Commonly used models currently fall into these categories:

  1. Image recognition (what the image is)
  2. Object recognition (whether something is in the image, and where)
  3. Pose recognition (the posture of a person in the image)
  4. Speech recognition (high precision)
  5. Text classification (from semantic analysis to sentiment judgment)


In fact, AI and AR are a very natural combination. Image recognition, object recognition, and pose recognition interpret the real world into meaningful data via camera frames, and AR enhances the experience of the real world with that data. There are a lot of interesting things to do here.

The principle of the easter egg is as follows:

As you can see, AI has completed the step of acquiring data from reality well; that is, it gives the front end a point of contact with the real world. Going one step further and using that contact plus AR technology to "augment reality" leaves enormous room for imagination.

For example, the ModiFace lipstick try-on, built in deep cooperation between the Taobao mini-program team and the L'Oréal Group, is a distinctive example. Under the Fliggy theme, how to combine this with gameplay around various travel destinations is the direction of the author's future thinking.

References:

  1. A look at how we built the Emoji Scavenger Hunt using TensorFlow.js
  2. How to Retrain an Image Classifier for New Categories
  3. The computer vision model doesn't work well? You may be fooled by the camera's EXIF information
  4. WebAR documentation
  5. Save and Load Models