Author: Li Chaoqian, senior manager of live-streaming research and development at “Learn from Who”

I have actually been following ARCore for quite a while, but I never found the time to write a summary. At the invitation of a friend, let’s have a good talk about ARCore today.

I won’t go into the history of ARCore or its competition with Apple’s ARKit; you can find plenty of that online. What is genuinely scarce on the web is in-depth material about ARCore.

This article has two main purposes. The first is to introduce the basic concepts of ARCore; understanding these concepts is key to any subsequent in-depth study. The second is to analyze how ARCore works, so that you can understand it more easily.

In addition, the basic concepts of ARCore and ARKit are very close, and once you understand one, you basically know the other.

Basic concepts of ARCore

ARCore’s job is to do two things: first track the phone’s movements and then build its understanding of the real world.

ARCore’s motion tracking identifies feature points in the camera image and tracks how those points move over time. Combining the movement of these feature points with readings from the phone’s inertial sensors, ARCore calculates the phone’s position and orientation, which it calls the pose.

In addition to identifying these feature points, ARCore can detect flat surfaces such as floors and desktops, and estimate the ambient light intensity. Together, this information lets ARCore build its own model of the real world. Once such a model exists, you can place virtual content on it.

How does ARCore do that? It uses three key technologies to integrate virtual content with the real world:

  • Motion tracking
  • Environmental understanding
  • Light estimation

Motion tracking

As the phone moves, ARCore can determine its position and orientation (its pose) relative to the real world.

As the phone moves through the real world, ARCore uses a process called concurrent odometry and mapping to work out where the phone is relative to its surroundings.

ARCore detects visually distinct features in the camera image, called feature points, and uses them to compute its change in position. Over time, by combining this visual information with inertial measurements from the IMU, ARCore estimates the camera’s pose (position and orientation) relative to the real world.

By aligning the pose of the virtual camera that renders 3D content with the pose of the physical camera, developers can render virtual content from the correct angle. Drawing the rendered virtual objects on top of the image captured by the camera makes the virtual content look like part of the real world.
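To make this concrete, here is a minimal sketch of reading the camera pose and the rendering matrices once per frame. It follows the preview-style API used by the demo analysed later in this article; getViewMatrix() and getProjectionMatrix() are assumed helpers from that SDK and may differ in other versions.

// Sketch only: per-frame pose and matrices, preview-style ARCore API.
Frame frame = mSession.update();                      // latest camera frame and tracking state

if (frame.getTrackingState() == TrackingState.TRACKING) {
    Pose cameraPose = frame.getPose();                // phone's position and orientation in the world

    float[] viewmtx = new float[16];
    frame.getViewMatrix(viewmtx, 0);                  // assumed preview-SDK helper

    float[] projmtx = new float[16];
    mSession.getProjectionMatrix(projmtx, 0, 0.1f, 100.0f);  // assumed helper; near/far clip planes

    // Render virtual content with viewmtx/projmtx so it lines up with the real camera.
}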

Environmental understanding

ARCore lets the phone detect the position and size of horizontal surfaces such as the ground, desks, and bookshelves, so that virtual objects can be placed on the detected planes.

How does it do it? ARCore continuously improves its understanding of real-world environments by detecting feature points and planes.

In addition to finding clusters of feature points on common horizontal surfaces such as desktops, ARCore can determine the boundaries of each plane and provide this information to your application, so that developers can place virtual objects on those flat surfaces.

Because ARCore uses feature points to detect planes, flat surfaces without textures (such as white desktops) may not be correctly detected.
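Below is a rough sketch of enumerating the detected planes with the preview-style API used later in this article; getTrackingState(), getCenterPose(), getExtentX(), and getExtentZ() are assumed accessor names and may differ in other SDK versions (only getAllPlanes() appears in the demo code itself).

// Sketch: walk over the planes ARCore has detected so far (preview-style API).
for (Plane plane : mSession.getAllPlanes()) {
    // Skip planes that ARCore is not currently tracking (assumed accessor).
    if (plane.getTrackingState() != Plane.TrackingState.TRACKING) {
        continue;
    }
    Pose center = plane.getCenterPose();   // pose of the plane's center (assumed accessor)
    float extentX = plane.getExtentX();    // size along the plane's local X axis (assumed)
    float extentZ = plane.getExtentZ();    // size along the plane's local Z axis (assumed)
    // Virtual objects can be placed anywhere inside these boundaries.
}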

Light estimation

ARCore lets the phone estimate the light intensity of the current environment, which makes virtual objects look more realistic in the real world.
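A minimal sketch of how the estimate feeds into rendering, assuming the preview SDK’s LightEstimate.getPixelIntensity() accessor and an object renderer that accepts the intensity as a draw parameter (frame, viewmtx, projmtx and mVirtualObject come from the surrounding rendering code):

// Sketch: read the light estimate for the current frame and use it when drawing.
LightEstimate lightEstimate = frame.getLightEstimate();
float lightIntensity = lightEstimate.getPixelIntensity();  // assumed preview-SDK accessor

// Assumed renderer signature: shade the virtual object to roughly match the
// real scene's brightness.
mVirtualObject.draw(viewmtx, projmtx, lightIntensity);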

User interaction

ARCore uses hit testing to take an (x, y) coordinate on the phone screen (obtained, for example, by tapping the screen or through some other interaction), project a ray from it into the camera’s 3D coordinate system, and return all planes and feature points that the ray intersects, along with the pose of each intersection in world coordinates. This lets users interact with objects in the ARCore environment.
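In code, that interaction boils down to a few calls. A simplified sketch (the full version from the demo appears later in this article; tap is the MotionEvent for the touch and frame is the current Frame):

// Sketch: turn a screen tap into a hit against the real-world geometry.
for (HitResult hit : frame.hitTest(tap)) {
    // Results are sorted by depth; take the first plane hit inside its polygon.
    if (hit instanceof PlaneHitResult && ((PlaneHitResult) hit).isHitInPolygon()) {
        Pose hitPose = hit.getHitPose();   // intersection pose in world coordinates
        // hitPose can now be used to place content (typically via an anchor).
        break;
    }
}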

Anchor points and tracking

As ARCore’s understanding of its own position and of the environment changes, it adjusts its pose accordingly. If we want to place a virtual object in the ARCore environment, we first need to define an anchor so that ARCore can keep tracking the object’s position over time. An anchor is usually created from the pose returned by a hit test.

Pose changes matter because only through poses can ARCore update the position of environmental objects (such as planes and feature points) over time. ARCore treats planes and points as special types of trackable objects. You can anchor virtual objects to these trackables to keep a stable relationship between the virtual objects and the trackables as the device moves. It is as if you placed a virtual vase on a desktop: if ARCore later adjusts the pose associated with the desktop, the vase will still appear to stay on the desktop.
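As a sketch of the vase example, using the same calls as the demo code later in this article (addAnchor, getHitPose, toMatrix, updateModelMatrix); Anchor.getPose() is assumed to be available:

// Sketch: anchor a virtual vase to the pose returned by a hit test.
Anchor vaseAnchor = mSession.addAnchor(hit.getHitPose());   // hit: a HitResult on the desktop plane

// Then, on every frame, re-read the anchor's pose. If ARCore later refines its
// understanding of the desktop, the anchor's pose moves with it, so the vase
// keeps sitting on the desktop.
float[] vaseModelMatrix = new float[16];
vaseAnchor.getPose().toMatrix(vaseModelMatrix, 0);          // assumed: Anchor exposes getPose()
mVirtualObject.updateModelMatrix(vaseModelMatrix, /*scaleFactor=*/1.0f);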

Introduction to the ARCore core classes

Session

The com.google.ar.core.Session class manages the AR system’s state and handles the session lifecycle. It is the main entry point to the ARCore API: it lets you create a session, configure it, start and stop it, and, most importantly, receive frames that give access to the camera image and the device pose.

Config

The com.google.ar.core.Config class holds the settings used to configure a Session.

Frame

The com.google.ar.core.Frame class holds the per-frame state information; calling the session’s update() method updates the AR system and returns a new Frame.

HitResult

The com.google.ar.core.HitResult class describes the intersection between a hit-test ray and the estimated real-world geometry.

Point

The com.google.ar.core.Point class represents a spatial point that ARCore is tracking. It is the kind of result returned when an anchor is created (via the createAnchor method) or when hit detection is performed (via the hitTest method).

PointCloud

The com.google.ar.core.PointCloud class contains a set of observed 3D points and their confidence values.

Plane

The com.google.ar.core.Plane class describes the latest information known about a real-world planar surface.

Anchor

The com.google.ar.core.Anchor class describes a fixed position and orientation in the real world. To keep that location fixed in physical space, the numerical description of the position is updated as ARCore’s understanding of the space improves.

Pose

The com.google.ar.core.Pose class represents an immutable rigid transformation from one coordinate space to another. Throughout the ARCore APIs, poses always describe the transformation from an object’s local coordinate space to the world coordinate space.

As ARCore’s understanding of the environment changes, it adjusts its model of the world coordinate system to match the real world. When that happens, the positions (coordinates) of the camera and of anchors can change noticeably so that the objects they represent stay in the proper places.

This means each frame should be treated as having its own independent world coordinate space; anchor and camera coordinates should not be reused outside the render frame they came from. If a position needs to persist beyond a single render frame, either create an anchor for it or express it relative to an existing nearby anchor.
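As a small illustration of “local coordinate space to world coordinate space”, a Pose can be turned into a 4x4 matrix (as the demo does later with toMatrix) and applied to a point with Android’s standard Matrix helper; this is a sketch, with pose standing in for any Pose obtained from an anchor:

import android.opengl.Matrix;

// Sketch: map a point from an object's local space into world space via its Pose.
float[] localToWorld = new float[16];
pose.toMatrix(localToWorld, 0);                      // column-major 4x4 transform

float[] localPoint = {0.0f, 0.1f, 0.0f, 1.0f};       // e.g. 10 cm above the local origin
float[] worldPoint = new float[4];
Matrix.multiplyMV(worldPoint, 0, localToWorld, 0, localPoint, 0);

// worldPoint is only valid for the current render frame; recompute it from the
// anchor's pose each frame rather than caching it.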

ImageMetadata

The com.google.ar.core.ImageMetadata class provides access to the metadata of the camera image capture result.

LightEstimate

The com.google.ar.core.LightEstimate class holds the estimated lighting of the real scene. It is obtained by calling getLightEstimate() on a Frame.
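To tie these classes together, here is a minimal sketch of how they are typically used in a single frame; update(), getPose(), getLightEstimate(), getPointCloud(), and getAllPlanes() all appear in the demo code analysed below.

// Sketch: one frame's worth of queries against the core classes.
private void queryFrame(Session session) {
    Frame frame = session.update();                   // Frame: per-frame snapshot of the AR state
    Pose devicePose = frame.getPose();                // Pose: camera position/orientation in world space
    LightEstimate light = frame.getLightEstimate();   // LightEstimate: estimated scene lighting
    PointCloud cloud = frame.getPointCloud();         // PointCloud: currently tracked feature points

    for (Plane plane : session.getAllPlanes()) {      // Plane: detected real-world surfaces
        // Place or update virtual content relative to each tracked plane here.
    }
}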

Example analysis

Google’s ARCore SDK ships with several sample programs, and with the basics above it is easy to follow the flow of Google’s demo program.

Create Session and Config

A good place to create Session and Config is in the onCreate method of your Activity.

mSession = new Session(/*context=*/this);

mDefaultConfig = Config.createDefaultConfig();
if (!mSession.isSupported(mDefaultConfig)) {
    Toast.makeText(this, "This device does not support AR", Toast.LENGTH_LONG).show();
    finish();
    return;
}
  • Session: the management class for ARCore, and a very important one. Opening and closing ARCore, fetching video frames, and so on are all done through it.
  • Config: stores configuration information, such as the plane-finding mode and lighting mode. So far this class is fairly simple and does not hold much.
  • isSupported: this method mainly checks the SDK version and the device model. Officially, only a few Google and Samsung phones are currently available for testing; other models do not support ARCore, although some can run it through a cracked SDK. The Config parameter of this method is not actually used.

Create GLSurfaceView for AR presentation

In the demo provided by Google, a GLSurfaceView is used for the AR presentation. Anyone who has done video development knows that Android offers three views that can be used for video rendering:

  • SurfaceView
  • GLSurfaceView
  • TextureView

Among them, SurfaceView is the most flexible and efficient but the most troublesome to use. GLSurfaceView is much simpler: you only need to implement its Renderer interface. TextureView is the easiest to use, since Android’s window manager does much of the work for you, but it is the least flexible.

To render efficiently, Google makes extensive use of OpenGL in the demo. OpenGL is far too large a topic to cover in one or two articles, and it is not the focus of this one, so we won’t go into it here; there is plenty of material online.

mSurfaceView = (GLSurfaceView) findViewById(R.id.surfaceview);
...
mSurfaceView.setPreserveEGLContextOnPause(true);
mSurfaceView.setEGLContextClientVersion(2);
mSurfaceView.setEGLConfigChooser(8, 8, 8, 8, 16, 0); // Alpha used for plane blending.
mSurfaceView.setRenderer(this);
mSurfaceView.setRenderMode(GLSurfaceView.RENDERMODE_CONTINUOUSLY);

This code first obtains the GLSurfaceView object from the resource file, then associates the GLSurfaceView with the EGL context and makes the Activity the GLSurfaceView’s callback object (that is, the Activity implements the GLSurfaceView.Renderer interface, so its onSurfaceCreated, onSurfaceChanged, and onDrawFrame methods get called back). Finally, the render mode is set to RENDERMODE_CONTINUOUSLY, which makes the GLSurfaceView render continuously.
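For reference, the Renderer interface the Activity implements looks roughly like this (a skeleton only; the class name is a placeholder and the real demo does much more in each callback):

public class MyArActivity extends Activity implements GLSurfaceView.Renderer {

    @Override
    public void onSurfaceCreated(GL10 gl, EGLConfig config) {
        // One-time GL setup: create the background texture, shaders, and renderers.
    }

    @Override
    public void onSurfaceChanged(GL10 gl, int width, int height) {
        // The surface size changed (e.g. rotation): update the GL viewport.
        GLES20.glViewport(0, 0, width, height);
    }

    @Override
    public void onDrawFrame(GL10 gl) {
        // Called continuously in RENDERMODE_CONTINUOUSLY: update ARCore and draw the scene.
    }
}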

Creating various threads

To understand this section, you need to know how AR works in detail. Let me give you a brief explanation here.

Background display

As anyone who has used AR knows, AR places virtual objects into a real scene. So where does this real scene come from? From the phone’s camera, of course.

The video from the camera is used as the AR background. In essence, AR overlays virtual objects on that video, but not naively: a lot of computation goes into finding the plane positions in the video before anything is placed.

Video capture on Android is relatively straightforward; live-streaming systems and camera apps, for example, all use this technique.

Plane detection

As we said above, AR = real-time video + virtual objects. However, virtual objects cannot simply be pasted onto the video. Instead, each video frame is analyzed first: the planes in the frame are found and a position is chosen, and only then are the virtual objects placed on the video. That’s AR 🙂

Point cloud

As we now know, AR = live video + planes + virtual objects. In addition, AR must be able to track the virtual objects, that is, look at the same object from different angles and obtain different poses; this is where “point cloud” technology comes in. So what is a point cloud? As the name suggests, it is a collection of points that, figuratively speaking, looks a bit like a cloud. Each point in the point cloud is a feature point extracted from the camera image.

Placing virtual items

Once we have found the planes and have a way to track them, we can place the prepared virtual objects on a plane, and now we have real AR.

OK, now that we know the basics, how does the Google demo do all this?

Create a thread

For each of the above points, the Demo starts a thread with the following code:

...
// Create the texture and pass it to ARCore session to be filled during update().
mBackgroundRenderer.createOnGlThread(/*context=*/this);
mSession.setCameraTextureName(mBackgroundRenderer.getTextureId());

// Prepare the other rendering objects.
try {
    mVirtualObject.createOnGlThread(/*context=*/this, "andy.obj", "andy.png");
    mVirtualObject.setMaterialProperties(0.0f, 3.5f, 1.0f, 6.0f);
    ...
} catch (IOException e) {
    Log.e(TAG, "Failed to read obj file");
}
try {
    mPlaneRenderer.createOnGlThread(/*context=*/this, "trigrid.png");
} catch (IOException e) {
    Log.e(TAG, "Failed to read plane texture");
}
mPointCloud.createOnGlThread(/*context=*/this);
...

The code above first creates the background thread, which renders the video from the camera to the screen as the background. Where does the data come from? The camera data is obtained via Session.update() and handed to the background thread through the texture.

For those unfamiliar with textures, think of a texture as a block of memory.

Next, the virtual-object thread is started; it draws the virtual objects and updates their poses as the viewing angle changes. Then a plane thread is created to draw the planes. Finally, the point cloud thread is started to draw the feature points.

At this point, the various threads are created. Now let’s talk about how to render.
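All of the rendering steps below happen inside onDrawFrame. Here is a simplified sketch of the overall order (handleTap() is a hypothetical helper standing in for the hit-test code shown next; the real demo’s method differs in detail):

@Override
public void onDrawFrame(GL10 gl) {
    // Clear the previous frame.
    GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT | GLES20.GL_DEPTH_BUFFER_BIT);

    try {
        Frame frame = mSession.update();     // 1. get the latest camera frame and poses
        handleTap(frame);                    // 2. hit test for any queued screen taps (hypothetical helper)
        mBackgroundRenderer.draw(frame);     // 3. draw the camera image as the background

        if (frame.getTrackingState() == TrackingState.TRACKING) {
            // 4. draw the point cloud, the detected planes, and the virtual objects
            //    (each step is shown individually in the sections below).
        }
    } catch (Throwable t) {
        Log.e(TAG, "Exception on the OpenGL thread", t);
    }
}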

Hit detection and rendering

Hit testing

When we want to draw virtual objects to the background, we first do hit detection. The code is as follows:

MotionEvent tap = mQueuedSingleTaps.poll();
if (tap != null && frame.getTrackingState() == TrackingState.TRACKING) {
    for (HitResult hit : frame.hitTest(tap)) {
        // Check if any plane was hit, and if it was hit inside the plane polygon.
        if (hit instanceof PlaneHitResult && ((PlaneHitResult) hit).isHitInPolygon()) {
            // Cap the number of objects created. This avoids overloading both the
            // rendering system and ARCore.
            if (mTouches.size() >= 16) {
                mSession.removeAnchors(Arrays.asList(mTouches.get(0).getAnchor()));
                mTouches.remove(0);
            }
            // Adding an Anchor tells ARCore that it should track this position in
            // space. This anchor will be used in PlaneAttachment to place the 3d model
            // in the correct position relative both to the world and to the plane.
            mTouches.add(new PlaneAttachment(
                ((PlaneHitResult) hit).getPlane(),
                mSession.addAnchor(hit.getHitPose())));

            // Hits are sorted by depth. Consider only closest hit on a plane.
            break;
        }
    }
}

In this example, the code checks whether there is a tap event and whether the frame is in the tracking state. If so, a hit test is performed to see whether a plane was hit; if a plane was hit, an anchor is created and attached to that plane.

Rendering the background

// Draw background.
mBackgroundRenderer.draw(frame);

The above code pushes the contents of the texture to EGL, and the rendering thread created above takes the data from the EGL context and renders the video to the screen.

Draw the point cloud

mPointCloud.update(frame.getPointCloud());
mPointCloud.draw(frame.getPointCloudPose(), viewmtx, projmtx);

Similarly, with the above code, data can be passed to the point cloud thread to draw the point cloud.

Draw the plane

// Visualize planes.
mPlaneRenderer.drawPlanes(mSession.getAllPlanes(), frame.getPose(), projmtx);

Pass the data to the plane thread through the above code to draw the plane.

Draw virtual items

for (PlaneAttachment planeAttachment : mTouches) {
    if (!planeAttachment.isTracking()) {
        continue;
    }
    // Get the current combined pose of an Anchor and Plane in world space. The Anchor
    // and Plane poses are updated during calls to session.update() as ARCore refines
    // its estimate of the world.
    planeAttachment.getPose().toMatrix(mAnchorMatrix, 0);

    // Update and draw the model and its shadow.
    mVirtualObject.updateModelMatrix(mAnchorMatrix, scaleFactor);
    mVirtualObjectShadow.updateModelMatrix(mAnchorMatrix, scaleFactor);
}


Finally, all anchors are traversed and a virtual object is drawn at each one.

At this point, our analysis of ARCore comes to an end.

Summary

ARCore is quite difficult for beginners, because there are a lot of new concepts to absorb.

On top of that, only a few phone models can run ARCore at present, and those models are not widely used in China, so most people cannot experiment with it, which further raises the barrier to learning.

Beyond those two points, ARCore relies on a lot of OpenGL knowledge, and OpenGL is a deep subject in its own right, which makes the learning curve even steeper.

For these three reasons, it is fair to say that the barrier to learning ARCore is considerably higher than for Apple’s ARKit.

I hope you found this article helpful.

Reference

ARCore on GitHub


In fact, AR is already widely used in live-streaming scenarios; for example, some live broadcasts can put sunglasses or other animated AR accessories on the host’s face.

So far we have covered the basics of ARKit and ARCore; in the future we will share practical cases that combine ARKit, ARCore, and live streaming. Stay tuned!

You are also welcome to join the RTC developer community to exchange experience with other developers and take part in more technical events.