I have been working with SLAM for three years, since I first entered the field. I have worked with both 2D laser SLAM and 3D laser SLAM, and later with visual SLAM. Here I will briefly summarize what each module of a SLAM system does and how it is implemented, as an introduction to this series of articles.

1. What is SLAM

SLAM stands for Simultaneous Localization and Mapping. It has two purposes: one is to localize the robot, and the other is to build a map of the surrounding environment. The two are interdependent, and only by solving them simultaneously can the whole problem be solved well.

Why do we have to do it at the same time?

People understand their surroundings by perceiving with their eyes and by touching with their hands and other limbs. The human brain automatically builds a model of the surrounding environment (map building), so that a person can reach an object such as a water cup by feel, even with eyes closed. At the same time, through visual perception a person knows that the hand is in front of the cup (localization) and can touch it by reaching forward. That is why, when looking at a cup, almost everyone can reach it accurately.

The same goes for a robot that wants to navigate around a house. It builds a map of the environment (map building) using sensors such as lidar and cameras. When the robot moves one meter forward, it maps its new surroundings and must place that new piece of map in exactly the right position relative to what was built earlier, which requires knowing where it is (localization).

Localization means determining the robot's position in the current map by matching the environmental information currently perceived by the sensors against the already-constructed map. Localization can be accurate only when the map is accurate.

Map building means turning the environmental information currently perceived by the sensors into a map, placed at the robot's current position. Only when the localization is accurate will the constructed map be consistent with the real environment.

Therefore, localization and map building are interdependent and must be solved at the same time to build a good map.

2. What are the purposes and applications of SLAM

In my opinion, the biggest application of SLAM is map building: the map produced by a SLAM run can be reused later. And because SLAM itself performs localization, it can also serve as a localization algorithm when no map needs to be saved.

  • The 2D grid map constructed by 2D laser SLAM can be used for robot localization and navigation.
  • The 3D point cloud map built by 3D laser SLAM can be used for localization and navigation of autonomous vehicles, and also for 3D modeling.
  • The sparse point cloud map constructed by visual SLAM can be used for localization.
  • The semi-dense and dense point cloud maps built by visual SLAM can be used for localization and navigation, for interactive scenes in the VR field, and for 3D modeling.

[Figure: a 2D grid map constructed by 2D laser SLAM]

[Figure: a 3D point cloud map constructed by 3D laser SLAM]

[Figure: a sparse point cloud map built by visual SLAM (ORB-SLAM2)]

[Figure: a dense point cloud map built by visual SLAM]

3. Three modules of SLAM

As is well known, the overall framework of SLAM is by now fairly fixed. It is divided into a front-end odometry module, a back-end optimization module, and a loop closure detection module.

Next, I’ll briefly describe what these three modules do and how to implement each one.

3.1 Front-end odometry

3.1.1 What is front-end odometry

A robot's wheels carry sensors called encoders that measure how far the wheels have turned. Front-end odometry serves the same purpose: it measures how far the robot has traveled from the beginning to the present, that is, the relative distance and relative attitude (pose) with respect to the initial position.

3.1.2 How to achieve this

For laser SLAM, the lidar frequency is usually between 10 Hz and 40 Hz. If we can determine how far the robot moved in the interval between the first and second frames of lidar data, and then how far it moved between the second and third frames (the pose transformation), and so on, we can always determine how far the robot has traveled, i.e., its current pose relative to the initial position.
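To make this concrete, here is a minimal sketch of chaining 2D pose increments in Python. It is my own illustration, not code from any particular SLAM system; the compose function and the hard-coded increments are invented for the example.

    # A minimal dead-reckoning sketch: poses are (x, y, theta) in 2D.
    import math

    def compose(pose, delta):
        """Apply the relative motion `delta` (in the robot frame) to `pose`."""
        x, y, th = pose
        dx, dy, dth = delta
        return (x + dx * math.cos(th) - dy * math.sin(th),
                y + dx * math.sin(th) + dy * math.cos(th),
                th + dth)

    # Pretend scan matching gave us these frame-to-frame increments:
    increments = [(1.0, 0.0, 0.0), (1.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0)]

    pose = (0.0, 0.0, 0.0)           # start at the origin
    for delta in increments:
        pose = compose(pose, delta)  # accumulate motion frame by frame
    print(pose)                      # current pose relative to the start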

For visual SLAM, the camera data arrives frame by frame: RGB color images, or color images plus depth images. The general approach is to extract feature points from each image and determine those feature points' coordinates in space; from these feature points we determine the robot's pose transformation between two consecutive frames, then the transformation between the second and third frames, and so on, which lets us determine the robot's current pose relative to the initial position.
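As a rough sketch of such a pipeline, the following snippet estimates the relative pose between two consecutive camera frames using OpenCV's ORB features and an essential-matrix decomposition. It is my illustration, not any particular system's code: img1, img2, and the intrinsic matrix K are assumed inputs, and note that a monocular camera only recovers translation up to an unknown scale.

    import cv2
    import numpy as np

    def relative_pose(img1, img2, K):
        """Estimate R, t between two grayscale frames (toy sketch)."""
        orb = cv2.ORB_create(2000)                # detect ORB feature points
        kp1, des1 = orb.detectAndCompute(img1, None)
        kp2, des2 = orb.detectAndCompute(img2, None)

        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

        pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
        pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

        # RANSAC on the essential matrix rejects bad matches, then R, t
        # are recovered; monocular translation is only up to scale.
        E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                       prob=0.999, threshold=1.0)
        _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
        return R, t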

This process determines, each time a new frame of data arrives, the robot's pose transformation relative to the initial time. This is exactly the process of localization.

3.1.3 Concrete implementation method

For laser SLAM, the process of finding the pose transformation between the previous frame of lidar data and the current frame is generally referred to as scan matching. A scan is one frame of lidar data; by matching it against the previous frame of data, we determine the pose change.

The common scan-matching methods are (a toy scan-to-scan example follows the list):

  • Scan-to-scan: matches lidar data against lidar data
  • Scan-to-map: matches lidar data against a map
  • Scan-to-submap: matches lidar data against a submap
  • Map-to-map: matches maps against maps
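Below is a toy point-to-point ICP, the classic scan-to-scan matcher, to make the idea concrete. It is a hand-rolled sketch under simplifying assumptions (a good initial alignment, 2D points, no outlier rejection); production systems use much more robust variants. prev and curr are assumed to be (N, 2) arrays of points from two consecutive scans.

    import numpy as np
    from scipy.spatial import cKDTree

    def icp(prev, curr, iters=20):
        """Estimate the rotation R and translation t mapping curr onto prev."""
        R, t = np.eye(2), np.zeros(2)          # current estimate of the motion
        tree = cKDTree(prev)
        for _ in range(iters):
            moved = curr @ R.T + t             # apply the current estimate
            _, idx = tree.query(moved)         # nearest neighbors in prev scan
            target = prev[idx]
            # Closed-form best-fit rotation/translation (SVD / Kabsch).
            mc, mt = moved.mean(0), target.mean(0)
            U, _, Vt = np.linalg.svd((moved - mc).T @ (target - mt))
            dR = Vt.T @ U.T
            if np.linalg.det(dR) < 0:          # avoid reflections
                Vt[-1] *= -1
                dR = Vt.T @ U.T
            dt = mt - dR @ mc
            R, t = dR @ R, dR @ t + dt         # compose the update
        return R, t                            # pose change between the scans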

For visual SLAM, the pose transformation between the previous frame image and the current frame image is generally obtained through BA (Bundle Adjustment). There are many methods for solving BA. As I don't know much about visual SLAM at present, I will not explain more here.
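Still, to give a feel for what BA minimizes, here is the reprojection-error residual for a single 3D point seen by a single camera. This is my own sketch; a real BA jointly optimizes many poses and points with a solver such as g2o or Ceres.

    import numpy as np

    def reprojection_error(R, t, K, point3d, observed_uv):
        """Residual that bundle adjustment drives toward zero."""
        p_cam = R @ point3d + t    # transform the point into the camera frame
        uvw = K @ p_cam            # project with the intrinsic matrix
        uv = uvw[:2] / uvw[2]      # perspective division to pixel coordinates
        return uv - observed_uv    # difference from the measured pixel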

3.2 Back-end optimization

3.2.1 Why is back-end optimization needed

Neither the odometry obtained from the encoders nor the odometry calculated by the front end is completely accurate.

Even with a very accurate encoder, when a wheel slips on smooth ground (the measured value becomes larger than the actual motion) or when a wheel passes over a pit or a bump of dirt, the odometry reading will not match the actual motion.

Similarly, since no sensor is perfectly accurate, the odometry values we calculate with front-end odometry must also contain errors, and these errors accumulate over time.

This makes the estimated pose drift further and further from the robot's actual position, and in the end the discrepancy prevents a good map from being built.

3.2.2 What is back-end optimization

Since front-end odometry accumulates error, is there a way to reduce or even eliminate this accumulated error?

This is the role of back-end optimization. The robot's individual poses and the generated map data are assembled into a graph structure and jointly optimized. The optimization spreads the total error over every pose and every piece of map data; when the optimization process works well, the accumulated error can be reduced to a negligible level.

For laser SLAM and visual SLAM, the back-end optimization process is similar: both use a graph structure to reduce the error. However, because the sensor data types differ, the concrete implementations are not exactly the same.
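As a toy illustration of how joint optimization spreads error, consider a 1D world with three poses and one landmark seen at the beginning and at the end. The numbers and the linear least-squares formulation are invented for this sketch; real back ends optimize 2D/3D pose graphs with solvers such as g2o, Ceres, or GTSAM.

    import numpy as np

    # Unknowns: poses x0, x1, x2 and landmark l. Each row of A·x = b is one
    # measurement: fix x0 = 0; drifting odometry x1-x0 = 1.1 and x2-x1 = 1.1;
    # the same landmark seen twice, l-x0 = 2.0 and l-x2 = 0.0. The odometry
    # and the landmark measurements disagree by 0.2 m.
    #              x0  x1  x2   l
    A = np.array([[ 1,  0,  0,  0],
                  [-1,  1,  0,  0],
                  [ 0, -1,  1,  0],
                  [-1,  0,  0,  1],
                  [ 0,  0, -1,  1]], dtype=float)
    b = np.array([0.0, 1.1, 1.1, 2.0, 0.0])

    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(x)   # the 0.2 m of inconsistency is spread over all variables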

3.3 Loop closure detection

Back-end optimization can reduce the pose error, but is there a strong constraint we can add to the equations being optimized?

The answer, of course, is loop closure detection.

When we humans start from the east gate of a park, wander around for 10 minutes, and come back to the east gate, we can easily tell that this is where we started, and that it is the same east gate as before.

However, for the robot it also takes 10 minutes to walk back to the east gate. Since the robot's pose is obtained by gradual accumulation, the calculation carries a cumulative error. When the robot returns to the east gate, it may think it is still 20 meters away from the gate; those 20 meters are the deviation accumulated by the robot's localization over time.

By some means, we can compare the environmental information currently perceived by the sensors with the map the robot built earlier. If they match well, we conclude that the robot has been to this place before, so its current pose should not be far from the pose recorded when it passed by before (a constraint).
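One simple stand-in for such a check, purely as a sketch: summarize each scan as a normalized histogram of ranges and compare histograms by cosine similarity. Real systems use much stronger place-recognition descriptors (for lidar, e.g. Scan Context; for images, bag-of-words approaches such as DBoW2); every name and threshold below is invented for illustration.

    import numpy as np

    def descriptor(ranges, bins=32, max_range=10.0):
        """Summarize one lidar scan as a normalized histogram of ranges."""
        hist, _ = np.histogram(ranges, bins=bins, range=(0.0, max_range))
        return hist / max(hist.sum(), 1)

    def find_loop_candidate(curr_ranges, past_descriptors, thresh=0.9):
        d = descriptor(curr_ranges)
        # Cosine similarity against every previously visited place.
        sims = [float(d @ p) /
                (np.linalg.norm(d) * np.linalg.norm(p) + 1e-9)
                for p in past_descriptors]
        best = int(np.argmax(sims))
        return best, sims[best] > thresh   # matched index + decision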

We can feed this constraint into the back-end optimization process as a new and very strong constraint; with this strong constraint in place, the optimization can significantly eliminate the accumulated error.
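Continuing the earlier 1D toy: the robot walks from x0 to x3 and returns to its starting point, drifting odometry claims each step was 1.05 m, and a single heavily weighted loop closure edge says x3 and x0 coincide. The weights and numbers are invented; the point is only that one strong constraint lets least squares cancel most of the drift.

    import numpy as np

    w = 10.0                            # loop closures get a large weight
    #               x0  x1  x2  x3
    A = np.array([[  1,  0,  0,  0],    # prior: x0 = 0
                  [ -1,  1,  0,  0],    # odometry: x1 - x0 = 1.05
                  [  0, -1,  1,  0],    # odometry: x2 - x1 = 1.05
                  [  0,  0, -1,  1],    # odometry: x3 - x2 = 1.05
                  [ -w,  0,  0,  w]])   # loop closure: w * (x3 - x0) = 0
    b = np.array([0.0, 1.05, 1.05, 1.05, 0.0])

    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(x)   # x3 is pulled back near x0; the drift spreads over the chain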

4. Development of SLAM

4.1 Maps with high accuracy

This is the focus of current academic research, and a great deal of work goes into how to build more and more accurate maps.

4.2 Mapping in long-term environments

The environments people live in never stay the same, yet the map we save after each SLAM run never changes again. How to update the previously built map while running navigation tasks, without consuming too many resources, is also a research focus. Several companies can already do this.

4.3 Reducing the amount of computation

Current SLAM algorithms are computationally heavy. If dense map building could run in real time, it would take the VR field to a new height. That is why I have been following the VR space.

4.4 Semantic SLAM

People can perceive the environment through their eyes and recognize the names and types of the objects they see. Semantic SLAM can do the same, and environment maps with semantic labels will be even more useful for navigation.

4.5 Perception replaces SLAM

As robots' ability to understand gradually improves, we will no longer need to simply build a map. What the robot sees can itself be the map: it can perceive the environment like a human and determine its own position, so there is no need to save a map.

Conclusion

This article has briefly explained my understanding of laser SLAM and visual SLAM. Since my understanding is limited, please forgive any mistakes and point them out in the comments. Thank you very much.

The next article will provide a brief overview of the main directions and functions of this series.


This article will be synchronized to the official account "Build SLAM from scratch"; welcome to follow.