Shiyuan ENOW Big Front End

Company official website: CVTE (Guangzhou Shiyuan Electronic Technology Co., Ltd.)

Team: the ENOW team of the CVTE Software Platform Center for Future Education

Preface

This article assumes that the reader has a certain understanding of vectors and matrices.

Everyone is familiar with coordinate systems: when writing CSS, absolute positioning describes the position of a DOM element through a coordinate system. In WebGL, coordinate systems are not only widely used but also come in many varieties: some describe the model itself, others describe the 3D scene.

These coordinate systems each have their own usage scenarios and are independent of each other. But we can string them together through coordinate transformations and ultimately tell the computer where on the screen to draw our vertices.

This article starts with a model and describes, step by step, how it is rendered to the screen. Along the way, you will learn which coordinate systems WebGL involves, why there are so many of them, what each one does, and how they relate to each other.

The overall concept

Types of coordinate systems

As mentioned earlier, there are many types of coordinate systems used in WebGL. They are:

  • Model coordinate system (object space)
  • World coordinate system (world space)
  • Observation coordinate system (view space)
  • Clipping coordinate system (clip space)
  • Normalized device coordinate system (NDC, Normalized Device Coordinates)
  • Screen coordinate system (screen space)

Coordinate system transformation pipeline

Transformations between these coordinate systems are carried out through transformation matrices. That is, if a point has coordinates P_A in coordinate system A, and the transformation matrix from coordinate system A to coordinate system B is M, then the point's coordinates P_B in coordinate system B are:

P_B = M · P_A
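
To make this concrete, here is a minimal sketch using the gl-matrix library (the library choice and the translation values are illustrative assumptions, not anything WebGL mandates):

```typescript
import { mat4, vec4 } from 'gl-matrix';

// M: the transformation matrix from coordinate system A to B.
// Here we use a simple translation by (2, 0, 0) as a stand-in.
const M = mat4.create();            // identity
mat4.translate(M, M, [2, 0, 0]);

// pA: the point's homogeneous coordinates in system A.
const pA = vec4.fromValues(1, 1, 1, 1);

// pB = M * pA: the same point expressed in system B.
const pB = vec4.create();
vec4.transformMat4(pB, pA, M);

console.log(pB); // => [3, 1, 1, 1]
```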

Given the coordinates of a point in the model coordinate system, the computer transforms them step by step: first into the world coordinate system, then onward until it reaches the screen coordinate system, so that it knows where on the screen to render the point.

Matrix multiplication is not commutative, so these coordinate transformations must be applied in a fixed order; otherwise we end up with the wrong coordinates. This fixed order of coordinate transformations is what we call the coordinate system transformation pipeline.

The figure above shows the coordinate systems involved in the pipeline, their order, and how they transform into one another. This diagram is very important, and we will explain each of its parts in detail below.
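
To make the fixed order concrete, here is a hedged sketch of the whole pipeline as matrix composition, again with gl-matrix; the camera and projection parameters are made-up values:

```typescript
import { mat4, vec4 } from 'gl-matrix';

const model = mat4.create();       // model space -> world space
const view = mat4.create();        // world space -> view space
const projection = mat4.create();  // view space  -> clip space

mat4.translate(model, model, [5, 0, -3]);
mat4.lookAt(view, [0, 2, 10], [0, 0, 0], [0, 1, 0]);
mat4.perspective(projection, Math.PI / 4, 16 / 9, 0.1, 100);

// The order matters: the matrix nearest the vertex applies first.
// clipPos = projection * view * model * modelPos
const mvp = mat4.create();
mat4.multiply(mvp, projection, view);
mat4.multiply(mvp, mvp, model);

const modelPos = vec4.fromValues(0, 1, 0, 1); // a vertex in model space
const clipPos = vec4.create();
vec4.transformMat4(clipPos, modelPos, mvp);
// The GPU then performs perspective division and the viewport transform.
```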

Why do we need so many coordinate systems

You might ask: if we can convert between them anyway, why do we need so many coordinate systems? This is not hard to understand if you think of software layering. Each coordinate system has a scene it is suited for, and they are relatively independent; the final result is obtained only by connecting them with the transformation pipeline. But it is not the program that really needs all these coordinate systems and layers; it is us. Our brains cannot process too much information at once, so we break things down and focus on small, independent questions whose answers combine to solve the big problem.

Model coordinate system

To build a 3D scene, we first need a model, like this little forklift.

A model coordinate system is a coordinate system based on the model itself; it describes where the points of the model sit relative to the model, regardless of where the model will end up. The origin of this coordinate system is usually at the center of the model, but it can also be somewhere else, such as the bottom center.
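
For example, the vertices of a unit cube might be written down in model space like this (a hypothetical layout; real model formats vary):

```typescript
// Model-space positions of a unit cube centered at the model's origin.
// Each vertex is expressed relative to the model itself, not to any scene.
const cubeVertices = new Float32Array([
  -0.5, -0.5, -0.5,   0.5, -0.5, -0.5,   0.5,  0.5, -0.5,  -0.5,  0.5, -0.5, // back corners
  -0.5, -0.5,  0.5,   0.5, -0.5,  0.5,   0.5,  0.5,  0.5,  -0.5,  0.5,  0.5, // front corners
]);
// Wherever the cube ends up in the world, these numbers never change;
// placement is the job of the model matrix, described next.
```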

World coordinate system

As the name implies, the world coordinate system is the coordinate system used to describe the entire 3D scene, and its origin is the point (0, 0, 0). We put our model into the world coordinate system; if we do not transform it, the model sits at the origin. More often than not we need to place the model away from the origin, and that is when we need a transformation. The matrix that transforms the model coordinate system into the world coordinate system is called the model matrix. It combines three basic transformations: rotation, scaling, and translation. After the model transformation, the forklift is placed somewhere in the world coordinate system:
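
A sketch of building such a model matrix with gl-matrix (the concrete values are invented for illustration). Note that with this call order the vertex is effectively scaled first, then rotated, then translated:

```typescript
import { mat4 } from 'gl-matrix';

const model = mat4.create();               // start from the identity
mat4.translate(model, model, [4, 0, -2]);  // place the model in the world
mat4.rotateY(model, model, Math.PI / 6);   // turn it 30 degrees around the y axis
mat4.scale(model, model, [2, 2, 2]);       // double its size
// Applied to a model-space vertex, this matrix yields world-space coordinates.
```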

Observation coordinate system

When you have not looked at the flower, it rests in silence together with your mind; when you come to look at it, its colors at once become clear.

The world coordinate system seems sufficient to describe a 3D scene, but the same world, viewed from different angles and positions, obviously looks different. So how do we describe the effect of that angle and position? To solve this problem, we introduce the observation coordinate system. View space is also called camera space or eye space. It simulates what a human eye or camera sees: the position of the eye/camera becomes the origin, and the z-axis is aligned with the viewing direction (in the usual WebGL convention, the camera looks down the negative z-axis).

The world coordinate system is transformed into the observation coordinate system through the view matrix. If we place our eye/camera at different positions in the world coordinate system, facing different directions, we get different view matrices, and the same point in the world coordinate system will have different coordinates in the observation coordinate system. This simulates how the eye/camera, observing the world from different positions and directions, sees different results.
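
In code, the view matrix is typically built from an eye position, a target, and an up direction; a hedged gl-matrix sketch with invented values:

```typescript
import { mat4 } from 'gl-matrix';

const view = mat4.create();
mat4.lookAt(
  view,
  [0, 3, 8],  // eye: where the camera sits in world space
  [0, 0, 0],  // center: the point the camera looks at
  [0, 1, 0],  // up: which way is "up" for the camera
);
// Move the eye or the target and you get a different view matrix,
// hence different view-space coordinates for the same world point.
```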


Clipping coordinate system

At this point we have a coordinate system centered on our eye/camera, with the heroic feeling of being the center of the universe. Unfortunately, our screens are 2D and far too small to hold the whole wide world. This is where the clipping coordinate system comes in.

The clipping coordinate system does two things: it limits the visible range, and it prepares for the subsequent transition from a 3D world to a 2D one. To do both, we first need to define a visible range, and that range is defined by the projection transformation matrix.

The projection transformation matrix transforms the observation coordinate system into the clipping coordinate system. There are generally two kinds: the orthographic projection matrix and the perspective projection matrix.

Orthographic projection

As shown in the figure above, an orthographic projection limits the visible range by defining a cuboid; points inside it will eventually be displayed on the screen. In addition, after the projection transformation (whether orthographic or perspective), the coordinates gain a w component, so that (x, y, z) becomes (x, y, z, w). This w component is used in subsequent transformations to capture the effect of a point's distance from the origin of the observation coordinate system. Introducing the w component is the second thing the clipping coordinate system does: it prepares for the later transformation of the 3D world into the 2D world. In an orthographic projection the w component is always 1, meaning that distance from the observation point (eye/camera) makes no difference.
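
A sketch of an orthographic projection with gl-matrix (the cuboid bounds and the test point are made-up values), showing that w stays 1:

```typescript
import { mat4, vec4 } from 'gl-matrix';

// The visible cuboid: left, right, bottom, top, near, far.
const projection = mat4.create();
mat4.ortho(projection, -10, 10, -10, 10, 0.1, 100);

// A view-space point 50 units in front of the camera:
const viewPos = vec4.fromValues(4, 2, -50, 1);
const clipPos = vec4.create();
vec4.transformMat4(clipPos, viewPos, projection);
console.log(clipPos[3]); // => 1: under an orthographic projection, w is always 1
```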

Perspective projection

As shown in the figure above, the visible range of a perspective projection is a frustum, a pyramid with its top cut off. Similarly, the coordinates gain a w component after the projection transformation. Unlike the orthographic case, this component is not always 1: the farther a point is from the observation point, the larger its w component, which means that distance from the observation point does affect the coordinates.
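
The same experiment with a perspective projection (the field of view, aspect ratio, and the two test points are invented values) shows w growing with distance:

```typescript
import { mat4, vec4 } from 'gl-matrix';

// The visible frustum: vertical field of view, aspect ratio, near, far.
const projection = mat4.create();
mat4.perspective(projection, Math.PI / 3, 16 / 9, 0.1, 100);

// Two view-space points with the same x and y, one near and one far:
const nearPoint = vec4.fromValues(1, 1, -5, 1);
const farPoint = vec4.fromValues(1, 1, -50, 1);

const nearClip = vec4.create();
const farClip = vec4.create();
vec4.transformMat4(nearClip, nearPoint, projection);
vec4.transformMat4(farClip, farPoint, projection);

console.log(nearClip[3], farClip[3]); // => 5 50: w equals the view-space
// distance along -z, so it grows as the point recedes from the camera
```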

Some questions

Reading this far, you may have some questions; I will try to answer a few of them.

Why is it called a projection transformation?

Recall one of the things the clipping coordinate system does: prepare for the subsequent transition from a 3D world to a 2D world. So far our world has been 3D, but our screens are flat and can only display 2D images. This requires us to map points from a 3D coordinate system onto a 2D screen, just as sunlight projects objects onto the ground in real life. If we loosely regard the clipping coordinate system as a 2D coordinate system, then the projection transformation projects the 3D observation coordinate system into it. That is where the name comes from.

What is the use of the w component?

For any given coordinate, the three components x, y, and z obviously change as a result of the projection transformation, so what does the new w do? As mentioned above, the clipping coordinate system prepares for the subsequent 3D-to-2D step; the system itself is clearly still 3D (counting the w component, it is actually 4D), and the w component is precisely what does the preparing.

The w component plays a key role in the subsequent transformations: it preserves the result of the projection calculation so that the coordinates on the 2D screen can later be obtained by a simple computation. It mainly preserves two pieces of information: how large the coordinate is after the projection transformation, and how far the point is from the observation point.

What are the uses of orthographic projection and perspective projection?

Readers who have used Three.js may recall that it has two types of cameras: orthographic and perspective. They correspond, respectively, to applying an orthographic or a perspective projection transformation to the observation coordinate system. Perspective projection, as its name implies, simulates the perspective effect of the human eye. This effect has two characteristics:

  1. Near the viewer, only a small range is visible; far away, a large range is visible. For example, standing on a hill you can see an entire distant city, yet of the tree right next to you, you may see only its lower part.
  2. Things that are near look bigger and things that are far look smaller.

The first characteristic explains why a perspective projection's visible range is a frustum with a smaller near plane and a larger far plane. The second explains why the w component grows as the coordinate moves away from the observation point. Orthographic projection, by contrast, is a little unnatural. First, its visible range is a cuboid, so there is no effect of seeing only a small range up close. Second, the w components of its output coordinates are all 1, so near things do not look larger than far things. As a result, an orthographic projection preserves an object's original shape and size, which is very useful for design work: when designing a 3D scene, you do not want to keep wondering whether two balls that look equally big really are the same size, or whether one of them is simply farther away.

On the left is the perspective projection effect; on the right is the orthographic projection effect.

Normalized device coordinate system (NDC)

From here you’re in the GPU world. Normalized device coordinates, as the name suggests, normalize coordinates.

We have screens of many different sizes, and we need a unified coordinate system to describe where a point should be rendered on any of them. That coordinate system is NDC. For example, suppose we define a point at (1, 1) in NDC. When the GPU draws it on a 400×400 screen, it only needs to know that NDC (1, 1) corresponds to (400, 400) on that screen, i.e. that the point should be rendered at pixel (400, 400).

Converting from the clipping coordinate system to NDC is simple, requiring only perspective division. As the name suggests, it just divides: each of the x, y, and z components is divided by the w component, producing new x, y, and z components. Clearly, the farther a point is from the observation point, the larger its w, and the smaller x and y become after the division, so the point appears smaller on screen. The divided z becomes a depth value that the depth test uses so that nearer points cover farther ones. In this way the information preserved in w is consumed, and the transformation from 3D to 2D is complete.
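
Perspective division itself is plain arithmetic; a minimal sketch, with invented clip-space numbers:

```typescript
// Divide x, y, and z by w to go from clip space to NDC.
function perspectiveDivide([x, y, z, w]: number[]): number[] {
  return [x / w, y / w, z / w];
}

// Two points with identical x and y offsets; the far one has the larger w:
console.log(perspectiveDivide([1, 1, 2, 5]));   // near: x and y stay large
console.log(perspectiveDivide([1, 1, 48, 50])); // far: x and y shrink toward 0
```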

Screen coordinate system

The coordinates in the NDC coordinate system cannot, of course, be drawn directly to the screen (NDC coordinates all lie between -1 and 1); one final transformation is required before drawing.

The screen coordinate system describes the real screen space. To convert from the NDC coordinate system to the screen coordinate system, a viewport transformation is required. The viewport transformation matrix defines the correspondence between screen coordinates and NDC coordinates; screens with different resolutions have different viewport transformation matrices. With that, our conversion is finally complete.
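
As a sketch, the mapping for x and y can be written like this (in a real WebGL program you simply declare the viewport with gl.viewport() and the GPU applies the transform; the function below assumes the WebGL convention with the origin at the bottom-left corner):

```typescript
// Map NDC, where each axis runs from -1 to 1, to pixel coordinates.
function ndcToScreen(
  ndcX: number,
  ndcY: number,
  width: number,
  height: number,
): [number, number] {
  return [((ndcX + 1) / 2) * width, ((ndcY + 1) / 2) * height];
}

console.log(ndcToScreen(1, 1, 400, 400)); // => [400, 400], matching the
// earlier example of NDC (1, 1) on a 400x400 screen
```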

Conclusion

This article has walked through the various coordinate systems involved in WebGL and how they fit together, and explained some of the details along the way. The next article, WebGL Coordinate Systems II, will focus on how the transformation matrices between these coordinate systems are derived.

Afterword

While writing this article, I kept asking myself: with similar "popular science" articles appearing one after another, what value can my little article bring to readers? Compared with some excellent articles, its value is indeed limited. Still, it offers another way of explaining things, perhaps one closer to how certain readers think. If even one such reader gains something from it, then this little article has not wasted space on the network. It also mixes in a lot of my personal understanding; while that may make obscure concepts easier to grasp, it may not always be accurate, so if you spot errors or omissions, please point them out in the comments.

References

  1. Introduction and Practice of webGL
  2. LearnOpenGL
  3. Interactive Computer Graphics: A Top-Down Approach with WebGL (7th edition)