Vertex Transformation in the Rendering Pipeline – Zhihu (zhihu.com)
Derivation of the MVP Matrix in the Rendering Pipeline – Zhihu (zhihu.com)
An overview of the whole process, and the space each stage lives in:
- Vertex data input (the actual position of an object in three-dimensional space): object (local) coordinate system
- Vertex shader (applies the model and view transforms, so the camera sits at the origin): camera space
- Projection matrix (performs the projection): clip space
- Perspective division: NDC space
- Viewport transform: screen-space coordinates
In the graphics rendering pipeline, a vertex coordinate passes in turn through the local coordinate system, the world coordinate system, the camera coordinate system, the clip coordinate system, and finally the window coordinate system, where it is displayed on the screen.
Each of these steps applies a transformation that takes the coordinate from one frame to the next. Below, I describe how each transformation works.
Note that this article is for OpenGL.
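To fix ideas before the derivations, here is a minimal GLSL vertex shader showing the whole chain. This is my own sketch, not code from the article; the uniform names follow three.js/OpenGL conventions.

// Sketch: the MVP chain in a vertex shader. The hardware performs
// perspective division and the viewport transform after this stage.
uniform mat4 modelMatrix;      // local space  -> world space
uniform mat4 viewMatrix;       // world space  -> camera space
uniform mat4 projectionMatrix; // camera space -> clip space
attribute vec3 position;       // local-space vertex position

void main() {
    gl_Position = projectionMatrix * viewMatrix * modelMatrix * vec4( position, 1.0 );
}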
Local space -> World space
This transformation places the model in world space with a certain scale, rotation, and translation. The step is relatively simple: just apply the corresponding matrix to the model's local-space coordinates.
For example, scale the model by $S$, then rotate it about the Z axis by an angle $\theta$, and then translate it by $T$. Note that the order of the transformations is fixed: scale first, then rotate, and finally translate. Based on this, we can construct the model transformation matrix $M_{model} = T \cdot R_z(\theta) \cdot S$.
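Written out in homogeneous coordinates (the article's concrete numbers were lost in translation, so generic symbols are used here), the three factors are:

$$
S = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},\quad
R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},\quad
T = \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$

and $M_{model} = T \cdot R_z(\theta) \cdot S$ applies them right-to-left: scale, then rotate, then translate.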
World space -> Camera space
First, define the camera:
- Its position $\vec{e}$
- The gaze (viewing) direction $\hat{g}$
- The up direction $\hat{t}$
The schematic diagram is as follows:
Here is a property to note: if the camera and everything it sees move together, the image the camera sees does not change. This means we can move the camera to the origin of the world frame, align its up direction with the world Y axis and its gaze direction with the world -Z axis, and then apply the same transformation to every object.
Mathematically, the process goes like this:
- Move the camera position $\vec{e}$ to the origin of the coordinates
- Rotate the gaze direction $\hat{g}$ to $-Z$
- Rotate the up direction $\hat{t}$ to $Y$
- Rotate $\hat{g} \times \hat{t}$ to $X$
This is roughly divided into two steps: first a translation, then a rotation, namely $M_{view} = R_{view} T_{view}$.
Translation part:
$$T_{view} = \begin{bmatrix} 1 & 0 & 0 & -x_e \\ 0 & 1 & 0 & -y_e \\ 0 & 0 & 1 & -z_e \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
For the rotation part, a short digression first. In two dimensions, rotation by an angle $\theta$ is:
$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
By definition, rotation by $\theta$ and rotation by $-\theta$ are mutually inverse, that is, $R_{-\theta} = R_\theta^{-1}$. Since $\cos(-\theta) = \cos\theta$ and $\sin(-\theta) = -\sin\theta$, we also have $R_{-\theta} = R_\theta^{T}$. Therefore, for a rotation transformation, the inverse of the rotation matrix equals its transpose:
$$R^{-1} = R^{T}$$
Going back to the rotation step above: it is not very convenient to directly find the matrix that rotates the camera's axes onto the world axes, but the other direction is easy, the rotation that takes the world axes onto the camera's axes ($X \to \hat{g} \times \hat{t}$, $Y \to \hat{t}$, $Z \to -\hat{g}$):
$$R_{view}^{-1} = \begin{bmatrix} x_{\hat{g} \times \hat{t}} & x_{\hat{t}} & x_{-\hat{g}} & 0 \\ y_{\hat{g} \times \hat{t}} & y_{\hat{t}} & y_{-\hat{g}} & 0 \\ z_{\hat{g} \times \hat{t}} & z_{\hat{t}} & z_{-\hat{g}} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Because the inverse of a rotation matrix equals its transpose, it can be concluded that:
$$R_{view} = (R_{view}^{-1})^{T} = \begin{bmatrix} x_{\hat{g} \times \hat{t}} & y_{\hat{g} \times \hat{t}} & z_{\hat{g} \times \hat{t}} & 0 \\ x_{\hat{t}} & y_{\hat{t}} & z_{\hat{t}} & 0 \\ x_{-\hat{g}} & y_{-\hat{g}} & z_{-\hat{g}} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Accordingly, without considering scaling, it can be concluded that:
$$M_{view} = R_{view} T_{view}$$
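To make the two steps concrete, here is a minimal GLSL-style sketch of building $M_{view}$ from the camera definition above. This is my own illustration, not code from the article, and the function name is made up.

// Sketch: build M_view = R_view * T_view from position e, gaze g, up t.
// GLSL mat4 constructors take COLUMNS, so each vec4 below is one column.
mat4 makeViewMatrix(vec3 e, vec3 g, vec3 t) {
    vec3 w = normalize(-g);          // camera +Z axis (opposite the gaze)
    vec3 u = normalize(cross(t, w)); // camera +X axis (the g x t direction)
    vec3 v = cross(w, u);            // camera +Y axis
    return mat4(
        vec4(u.x, v.x, w.x, 0.0),
        vec4(u.y, v.y, w.y, 0.0),
        vec4(u.z, v.z, w.z, 0.0),
        vec4(-dot(u, e), -dot(v, e), -dot(w, e), 1.0) // rotated -e
    );
}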
Camera space -> Clip space
The viewing box created by the projection matrix is called a frustum, and every coordinate that falls inside the frustum will end up on the user's screen. The process of converting coordinates within this specific range into a normalized device coordinate system is called projection, because the projection matrix projects 3D coordinates into normalized device coordinates that can easily be mapped to 2D viewport coordinates.
Note that OpenGL uses a right-handed coordinate system, but NDC is left-handed; pay special attention to this!
Orthographic projection
Let's first define the orthographic viewing volume $[l, r] \times [b, t] \times [f, n]$ (note that $n$ and $f$ are both negative numbers; $f$ is the far plane, so $f < n$). It is a cuboid. What we need to do is convert it into the canonical cube $[-1, 1]^3$. **Note that here $[n, f]$ maps to $[1, -1]$ in NDC.** Note the Z axis in the third picture below.
This takes two steps: a translation and then a scaling. Note: $r - l > 0$, $t - b > 0$, $f - n < 0$.
$$M_{ortho} = \begin{bmatrix} \frac{2}{r-l} & 0 & 0 & 0 \\ 0 & \frac{2}{t-b} & 0 & 0 \\ 0 & 0 & \frac{2}{n-f} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -\frac{r+l}{2} \\ 0 & 1 & 0 & -\frac{t+b}{2} \\ 0 & 0 & 1 & -\frac{n+f}{2} \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
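For reference, a minimal GLSL-style sketch of this matrix (mine, not the article's; the function name is made up):

// Sketch: M_ortho = scale * translate for the volume [l,r] x [b,t] x [f,n],
// with n and f negative and n > f. GLSL mat4 constructors take COLUMNS.
mat4 makeOrthoMatrix(float l, float r, float b, float t, float n, float f) {
    return mat4(
        vec4(2.0 / (r - l), 0.0, 0.0, 0.0),
        vec4(0.0, 2.0 / (t - b), 0.0, 0.0),
        vec4(0.0, 0.0, 2.0 / (n - f), 0.0),
        vec4(-(r + l) / (r - l), -(t + b) / (t - b), -(n + f) / (n - f), 1.0)
    );
}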
Perspective projection
For perspective projection, there are two steps:
- First, "squish" the frustum into a cuboid ($n \to n$, $f \to f$) with a matrix $M_{persp \to ortho}$;
- Then, apply the orthographic projection $M_{ortho}$ derived above.
That is, $M_{persp} = M_{ortho} M_{persp \to ortho}$.
Look at the picture below:
According to the relationship of similar triangles, it can be concluded that:
$$y' = \frac{n}{z} y$$
Similarly, it can be concluded:
$$x' = \frac{n}{z} x$$
Thus, the following relationship can be obtained:
$$M_{persp \to ortho} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} nx/z \\ ny/z \\ \text{unknown} \\ 1 \end{pmatrix}$$
Now, a property of homogeneous coordinates: in a 3D coordinate system, $(x, y, z, 1)$, $(kx, ky, kz, k)$ for any $k \neq 0$, and $(xz, yz, z^2, z)$ for $z \neq 0$ all represent the same point $(x, y, z)$. For example, $(1, 0, 0, 1)$ and $(2, 0, 0, 2)$ both represent the point $(1, 0, 0)$.
Therefore, multiplying each component by $z$, the relationship is as follows:
$$M_{persp \to ortho} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} \cong \begin{pmatrix} nx \\ ny \\ \text{unknown} \\ z \end{pmatrix}$$
Further, we can get:
$$M_{persp \to ortho} = \begin{bmatrix} n & 0 & 0 & 0 \\ 0 & n & 0 & 0 \\ ? & ? & ? & ? \\ 0 & 0 & 1 & 0 \end{bmatrix}$$
Now only the third row of the matrix is unknown. By observing the perspective frustum above, the following inferences can be drawn:
- Points on the near plane do not change;
- For points on the far plane, the Z coordinate does not change.
According to inference 1, a point $(x, y, n, 1)$ on the near plane does not change under the transform. That is:
$$M_{persp \to ortho} \begin{pmatrix} x \\ y \\ n \\ 1 \end{pmatrix} = \begin{pmatrix} x \\ y \\ n \\ 1 \end{pmatrix} \cong \begin{pmatrix} nx \\ ny \\ n^2 \\ n \end{pmatrix}$$
Because $n^2$ has nothing to do with either $x$ or $y$, the third row of $M_{persp \to ortho}$ must have the form $\begin{pmatrix} 0 & 0 & A & B \end{pmatrix}$.
According to:
$$\begin{pmatrix} 0 & 0 & A & B \end{pmatrix} \begin{pmatrix} x \\ y \\ n \\ 1 \end{pmatrix} = n^2$$
it can be concluded that:
$$An + B = n^2$$
According to inference 2, the center of the far plane $(0, 0, f, 1)$ is still itself after the transformation. As follows:
$$M_{persp \to ortho} \begin{pmatrix} 0 \\ 0 \\ f \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ f \\ 1 \end{pmatrix} \cong \begin{pmatrix} 0 \\ 0 \\ f^2 \\ f \end{pmatrix}$$
That is:
$$Af + B = f^2$$
Here, the system of equations can be derived:
$$\begin{cases} An + B = n^2 \\ Af + B = f^2 \end{cases}$$
Solving it, we get:
$$A = n + f, \qquad B = -nf$$
Finally, the perspective projection matrix:
$$M_{persp \to ortho} = \begin{bmatrix} n & 0 & 0 & 0 \\ 0 & n & 0 & 0 \\ 0 & 0 & n+f & -nf \\ 0 & 0 & 1 & 0 \end{bmatrix}, \qquad M_{persp} = M_{ortho} M_{persp \to ortho}$$
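Combining the two steps in code, a minimal GLSL-style sketch (mine, not the article's), reusing the makeOrthoMatrix sketch above:

// Sketch: M_persp = M_ortho * M_persp2ortho in the article's convention
// (camera looks down -Z, n and f negative, n > f).
// GLSL mat4 constructors take COLUMNS.
mat4 makePerspMatrix(float l, float r, float b, float t, float n, float f) {
    mat4 persp2ortho = mat4(
        vec4(n, 0.0, 0.0, 0.0),
        vec4(0.0, n, 0.0, 0.0),
        vec4(0.0, 0.0, n + f, 1.0),
        vec4(0.0, 0.0, -n * f, 0.0)
    );
    return makeOrthoMatrix(l, r, b, t, n, f) * persp2ortho;
}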
Clip space -> Window space
- Perspective division divides a clip-space point by its own w component, after which the point's w value is 1.
- After clipping and perspective division, all visible points are in normalized device coordinates (NDC), that is, every coordinate lies in the range $[-1, 1]$.
- NDC to window space: define a screen of width $w$ and height $h$, with coordinates $(0, 0)$ at the lower-left corner and $(w, h)$ at the upper-right corner. The X and Y coordinates are transformed from $[-1, 1]$ to $[0, w]$ and $[0, h]$ respectively.
- This is done in two steps: a translation matrix that translates by $[1, 1, 1]$, then a scaling matrix that scales by $[w/2, h/2, 1/2]$.
- For the Z coordinate, $[-1, 1]$ is mapped to $[0, 1]$ (scale first, then offset).
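As a one-line sketch of that mapping (mine, not the article's; the function name is made up):

// Sketch: NDC [-1,1]^3 -> window coordinates [0,w] x [0,h] x [0,1],
// i.e. translate by (1,1,1) then scale by (w/2, h/2, 1/2).
vec3 ndcToWindow(vec3 ndc, float w, float h) {
    return (ndc + 1.0) * vec3(0.5 * w, 0.5 * h, 0.5);
}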
Space conversion in post-processing to obtain the view-space Z value
- Given a depth value sampled from the depth map, the view-space Z value can be obtained:
float viewZ = getViewZ( depth ); // This viewZ is used to get clipW
float getViewZ( const in float depth ) {
#if PERSPECTIVE_CAMERA == 1
return perspectiveDepthToViewZ( depth, cameraNear, cameraFar );
#else
return orthographicDepthToViewZ( depth, cameraNear, cameraFar );
#endif
}
float perspectiveDepthToViewZ( const in float invClipZ, const in float near, const in float far ) {
return ( near * far ) / ( ( far - near ) * invClipZ - far );
}
float orthographicDepthToViewZ( const in float linearClipZ, const in float near, const in float far ) {
return linearClipZ * ( near - far ) - near;
}
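A quick sanity check (mine, not from the article), assuming view-space Z is negative in front of the camera: plugging the boundary depths into perspectiveDepthToViewZ gives

$$\text{depth}=0:\ \frac{near \cdot far}{(far-near)\cdot 0 - far} = -near, \qquad \text{depth}=1:\ \frac{near \cdot far}{(far-near)\cdot 1 - far} = -far$$

so the near plane maps back to $viewZ = -near$ and the far plane to $viewZ = -far$, as expected.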
- The clip-space w value can be obtained from the view-space z value and the camera projection matrix (the two entries below are the elements of the projection matrix's fourth row that multiply z and w). In the orthographic case w is 1; in the perspective case:
float clipW = cameraProjectionMatrix[2][3] * viewZ + cameraProjectionMatrix[3][3];
Space conversion in post-processing to obtain the view-space coordinates
- In post-processing, screen space -> clip space: convert vUV from the [0, 1] interval to the [-1, 1] interval
- Subtract 0.5, then multiply by 2:
vec4 clipSpace = vec4( ( vec3( vUV, depth ) - 0.5 ) * 2.0, 1.0 );
- Clip space -> projection space (multiply by w to undo the perspective division)
vec4 projectionSpace = clipSpace * clipW;
- Projection space -> view space
vec4 viewSpace = cameraInverseProjectionMatrix * projectionSpace;
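Putting the three steps together, in the style of three.js's SSAO shader (a sketch; the function name follows three.js conventions but is my addition here):

// Sketch: reconstruct the view-space position of a screen pixel.
vec3 getViewPosition( const in vec2 vUV, const in float depth, const in float viewZ ) {
    float clipW = cameraProjectionMatrix[2][3] * viewZ + cameraProjectionMatrix[3][3];
    vec4 clipSpace = vec4( ( vec3( vUV, depth ) - 0.5 ) * 2.0, 1.0 );
    clipSpace *= clipW; // undo the perspective division
    return ( cameraInverseProjectionMatrix * clipSpace ).xyz;
}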
Space conversion to get the coordinates of screen space
- Project, apply the perspective division, then map xy from [-1, 1] to [0, 1] with * 0.5 + 0.5:
vec4 samplePointNDC = cameraProjectionMatrix * vec4( samplePoint, 1.0 ); // Projection space
samplePointNDC /= samplePointNDC.w; // Perspective division
vec2 samplePointUv = samplePointNDC.xy * 0.5 + 0.5; // Screen space
Supplement
I came across a diagram that clearly describes the transformation process described above, which is also recorded here: