Basic overview

As we all know, OpenGL is a 3D graphics library that is widely used across many devices. But our displays are flat 2D surfaces, so how does OpenGL map 3D graphics onto a 2D screen? That is what OpenGL's coordinate transformations are all about. What we actually see is a projection of the 3D world onto the 2D screen. Through coordinate transformations, OpenGL projects 3D objects onto the 2D screen from a given viewing angle; then, after rasterization and pixel shading, the 3D objects are mapped into pixels on the screen. The coordinate transformation process of OpenGL is as follows:

  • The vertex shader performs the model, view, and projection transformations, which determine the position of a primitive in 3D space.
  • Perspective division and the viewport transformation are done during the primitive assembly stage, which determines the position of a primitive on the screen.

Let’s take a quick look at the process:

  1. First, the input vertices are typically 3D models represented in local coordinates. Local coordinates describe an isolated 3D model, and the origin is usually at the center of the model. Each 3D model has its own local coordinate system, independent of the others.
  2. When we need to render multiple 3D objects at the same time, we need to transform the different 3D models into a unified coordinate system: the World Coordinate system. To transform an object from local coordinates to world coordinates, we use a Model matrix. The model matrix can express a variety of transformations: translation, scaling, rotation, reflection, shearing, etc. For example, by translating, we can draw the same 3D model at different locations in the world coordinate system.
  3. Multiple objects in the world coordinate system together form a 3D scene. Looking at the 3D scene from different angles, we see different projections. OpenGL introduces the concept of a camera coordinate system, i.e. observing the whole 3D scene from the camera position. To transform an object from the world frame to the camera frame, we use a View matrix. The View matrix encodes the basic information of the camera coordinate system, such as the camera position, direction vector, and up vector. Left-multiplying the world coordinates of a vertex A by the View matrix transforms A into the camera coordinate system. The same 3D object has one set of coordinates in the world frame and another in the camera frame; the View transformation takes the coordinates of the object from the world frame to the camera frame.
  4. We are looking at a 3D scene through a 2D screen, and the screen itself is not infinite. Therefore, when viewing a 3D scene from the camera's perspective, you may not see the whole scene, so the invisible parts must be clipped away. The projection transformation is responsible for clipping. The projection matrix specifies a View Frustum: objects inside the view frustum appear on the projection plane, while objects outside it are clipped. There are many types of projection; in OpenGL, we mainly consider Perspective Projection and Orthographic Projection, whose differences are discussed in detail later. The Projection matrix transforms the object from the camera frame to the clip frame. In clip coordinates, a visible range is specified for each of the X, Y, and Z axes; any vertex outside the visible range is clipped.
  5. The visible range specified in clip coordinates may differ per axis. To obtain a unified coordinate system, Perspective Division is applied to the clip coordinates to obtain Normalized Device Coordinates (NDC). Perspective division divides the clip coordinates by the homogeneous component w:

         (x_n, y_n, z_n) = (x_c / w_c, y_c / w_c, z_c / w_c)

     In the NDC coordinate system, each of the X, Y, and Z axes spans [-1, 1]. The NDC space can therefore be thought of as a cube with side length 2, and all visible objects lie inside the cube.

  6. NDC coordinates range over [-1, 1], but screen sizes vary, so how does OpenGL map NDC coordinates to screen coordinates? That is the job of the Viewport Transform. In OpenGL, we simply call glViewport to specify the position and size of the drawing area, and the system performs the viewport transform for us. After the viewport transform, we obtain screen coordinates on the 2D screen. Note that screen coordinates are not the same as pixel positions. A screen coordinate is the exact position of a vertex on the screen and can be any real number, whereas pixel positions are integers (specific pixels). The viewport transform maps NDC coordinates to screen coordinates; the final pixel positions are not yet produced. Mapping screen coordinates to pixel positions is done later by rasterization.

In OpenGL, the local, world, and camera coordinate systems are right-handed, while the final clip coordinate system and the normalized device coordinate system are left-handed. In the diagram of left- and right-handed systems below, the thumb, index finger, and middle finger point in the positive directions of the X, Y, and Z axes respectively.

Below we derive and apply the model, view, projection, and viewport transformations in turn.

Model transformation

Model transformation adjusts the position of the 3D model in the world coordinate system through operations such as translation, scaling, rotation, reflection, and shearing. It is accomplished through the model matrix. Let's look at the derivation of each model matrix.

Translational transform

Translation simply moves a vertex A = (x, y, z) to another position A′ = (x′, y′, z′). The displacement is D = A′ − A = (x′ − x, y′ − y, z′ − z) = (t_x, t_y, t_z), so A′ can be expressed in terms of vertex A:

    x′ = x + t_x,  y′ = y + t_y,  z′ = z + t_z

Expressed with a translation matrix in homogeneous coordinates:

    | x′ |   | 1 0 0 t_x |   | x |
    | y′ | = | 0 1 0 t_y | * | y |
    | z′ |   | 0 0 1 t_z |   | z |
    | 1  |   | 0 0 0 1   |   | 1 |
Here T is the translation matrix, and t_x, t_y, t_z are the displacements along the X, Y, and Z axes. This might seem tedious, but in OpenGL we can use the GLM library to perform the translation.

glm::mat4 model(1.0f); // Define identity matrix
model = glm::translate(model, glm::vec3(1.0f, 1.0f, 1.0f));

The above code defines the translation model matrix, representing a displacement of 1 along each of the X, Y, and Z axes.

Scaling transformation

Objects can be scaled on the X, Y, and Z axes independently. For scaling centered at the origin, assume that vertex A = (x, y, z) is scaled by factors s_x, s_y, s_z on the X, Y, and Z axes respectively, giving the scaled vertex A′ = (s_x * x, s_y * y, s_z * z). Expressed with a scaling matrix:

    | x′ |   | s_x 0   0   0 |   | x |
    | y′ | = | 0   s_y 0   0 | * | y |
    | z′ |   | 0   0   s_z 0 |   | z |
    | 1  |   | 0   0   0   1 |   | 1 |
Here S is the scaling matrix. By default, the center of scaling is the origin of coordinates. If we want to scale the object about a specified point P instead, we can follow these steps:

  1. Translate so that point P moves to the origin.
  2. Scale by the specified factors about the origin.
  3. Translate P back to its original position.

The whole process can be combined into a single matrix: T(P) * S * T(−P).

In OpenGL, we can scale using the GLM library:

glm::mat4 model(1.0f); // Define identity matrix
model = glm::scale(model, glm::vec3(2.0f, 0.5f, 1.0f));

The code above defines the scaling model matrix: 2x on the X axis, 0.5x on the Y axis, and unchanged on the Z axis.

Rotation transformation

In 3D space, rotation requires an axis of rotation and an angle: the object rotates by the specified angle about the given axis. Let's first see what the rotation matrix about the Z axis looks like. Suppose vertex P has original coordinates (x, y, z) and distance r from the origin, so that x = r cos φ and y = r sin φ. Rotating counterclockwise about the Z axis by an angle θ (per the right-hand rule) gives the new coordinates (x′, y′, z′). Since the z coordinate is unchanged by the rotation, we temporarily ignore it and obtain:

    x′ = r cos(φ + θ) = x cos θ − y sin θ
    y′ = r sin(φ + θ) = x sin θ + y cos θ
    z′ = z

From the formulas above, the rotation matrix about the Z axis is:

    R_z(θ) = | cos θ  −sin θ  0  0 |
             | sin θ   cos θ  0  0 |
             | 0       0      1  0 |
             | 0       0      0  1 |
Similarly, the rotation matrix about the X axis is:

    R_x(θ) = | 1  0       0      0 |
             | 0  cos θ  −sin θ  0 |
             | 0  sin θ   cos θ  0 |
             | 0  0       0      1 |
And the rotation matrix about the Y axis is:

    R_y(θ) = |  cos θ  0  sin θ  0 |
             |  0      1  0      0 |
             | −sin θ  0  cos θ  0 |
             |  0      0  0      1 |
In OpenGL, we can use the GLM library to implement rotation transformations:

glm::mat4 model(1.0f); // Define identity matrix
model = glm::rotate(model, glm::radians(45.0f), glm::vec3(0.4f, 0.6f, 0.8f));

The above code rotates 45 degrees counterclockwise (by the right-hand rule) around the axis (0.4f, 0.6f, 0.8f).

When performing rotations, a common confusion is whether clockwise or counterclockwise counts as positive. The left-hand rule and right-hand rule determine the positive direction of rotation around an axis.

In OpenGL, we use the right-hand rule: point the thumb along the positive direction of the rotation axis, and the curl of the remaining fingers gives the positive direction of rotation. So a rotation of −45 degrees here would be clockwise.

The order of model transformation

Because matrix multiplication is not commutative, the order of translation, rotation, and scaling matters. Generally, scale first, then rotate, then translate; of course, the actual situation must be considered. Also note that GLM applies the matrices in the reverse of the order in which they are written. In the code below, although the writing order is translate, rotate, scale, the resulting model matrix scales first, then rotates, and finally translates.

glm::mat4 model(1.0f); // Define identity matrix
model = glm::translate(model, glm::vec3(1.0f, 1.0f, 1.0f));
model = glm::rotate(model, glm::radians(45.0f), glm::vec3(0.4f, 0.6f, 0.8f));
model = glm::scale(model, glm::vec3(2.0f, 0.5f, 1.0f));

View transformation

After the model transformation, all coordinates are in the world coordinate system. This section observes the whole world space from the camera's point of view. First we need to define a camera coordinate system. In general, the following are required to define a coordinate system:

  1. The dimension of the coordinate system: 2D, 3D, 4D, etc.
  2. The axis vectors of the coordinate space, such as the X, Y, and Z axes, called basis vectors. The basis vectors are usually orthogonal, and all vertices in the coordinate system are represented in terms of them.
  3. The origin O of the coordinate system, the reference point for all other points. In a nutshell: a coordinate system = basis vectors + origin O.

The same vertex has different coordinates in different coordinate systems, so how do we convert vertex coordinates from the world coordinate system to the camera coordinate system? To transform between two coordinate systems, we need a transformation matrix: the coordinate representation of one frame's origin and basis vectors in the other frame. Assume coordinate systems A and B and a vertex V; then the coordinates of V in A and B are related as follows:

    V_A = M_{B→A} * V_B
    V_B = M_{A→B} * V_A

A quick explanation: the coordinates of V in frame A equal the transformation matrix formed by frame B's basis vectors and origin expressed in frame A, multiplied by the coordinates of V in frame B; and symmetrically, the coordinates of V in frame B equal the matrix formed by frame A's basis vectors and origin expressed in frame B, multiplied by the coordinates of V in frame A.

M_{B→A} and M_{A→B} are inverses of each other. The key to converting between coordinate systems is therefore finding the transformation matrix. How is M_{A→B} computed? Suppose the three basis vectors of frame A and its origin, expressed in frame B's coordinate space, are u_x, u_y, u_z, and O_A; then M_{A→B} is built with these as its columns:

    M_{A→B} = | u_x.x  u_y.x  u_z.x  O_A.x |
              | u_x.y  u_y.y  u_z.y  O_A.y |
              | u_x.z  u_y.z  u_z.z  O_A.z |
              | 0      0      0      1     |

M_{B→A} is computed similarly and will not be repeated here.

Now let's see how OpenGL's view transformation matrix is computed. There are two coordinate systems, the world coordinate system W and the camera coordinate system E, and a vertex V whose coordinates in the world coordinate system are known. What are the coordinates of V in camera coordinates? According to the formula above, we first need to find the matrix M_{W→E}.

As we know, the origin of the world coordinate system is O = (0, 0, 0), and its three basis vectors are the X axis (1, 0, 0), the Y axis (0, 1, 0), and the Z axis (0, 0, 1). In theory, four parameters are needed to define a camera coordinate system:

  1. The position of the camera in the world coordinate system (the origin of the camera coordinate system).
  2. The viewing direction of the camera (the Z basis vector of the camera coordinate system).
  3. A vector pointing to the right of the camera (the X basis vector of the camera coordinate system).
  4. A vector pointing above the camera (the Y basis vector of the camera coordinate system).

With these four parameters, we actually create a coordinate system with three unit axes perpendicular to each other, starting with the camera position.

In practice, we only need to specify three parameters:

  1. The camera position vector (eye)
  2. The camera's target position vector (target)
  3. The up vector (up)

The unit basis vectors of the camera coordinate system are derived from these three parameters as follows:

  1. First, compute the camera's direction vector D = eye − target. (This direction vector is the positive Z axis of the camera coordinate system, which is opposite to the actual viewing direction.) Then compute the unit direction vector:

         d = D / |D|

  2. From the up vector and the unit direction vector d, compute the camera's right vector R = up × d, and then the unit right vector:

         r = R / |R|

  3. Finally, from the unit direction vector d and the unit right vector r, compute the unit up vector:

         u = d × r
In this way, the three unit basis vectors of the camera coordinate system, r, u, and d, together with the camera position vector eye, are determined. These four quantities define the camera coordinate system: the camera position is the origin, the unit right vector points along the positive X axis, the unit up vector along the positive Y axis, and the unit direction vector along the positive Z axis.

Now that the camera coordinate system is defined, the next step is to transform a vertex V from the world coordinate system into it. As above, the coordinates of V in the camera coordinate system E are:

    V_E = M_{W→E} * V_W

So the key is the transformation matrix M_{W→E}. From the camera's basis vectors and origin expressed in world space, we get:

    M_{E→W} = | r.x  u.x  d.x  eye.x |
              | r.y  u.y  d.y  eye.y |
              | r.z  u.z  d.z  eye.z |
              | 0    0    0    1     |

The final transformation matrix M_{W→E} is its inverse; because the upper-left 3x3 block is orthogonal, the inverse is:

    M_{W→E} = | r.x  r.y  r.z  −dot(r, eye) |
              | u.x  u.y  u.z  −dot(u, eye) |
              | d.x  d.y  d.z  −dot(d, eye) |
              | 0    0    0     1           |

Here dot denotes the dot product of vectors, which is a scalar. Multiplying M_{W→E} by the world coordinates of vertex V gives its camera coordinates V_E.

The M_{W→E} matrix above is the View transformation matrix.

Let's take an example. Suppose the camera is at (0, 0, 3), looking at the origin of the world coordinate system (0, 0, 0), with up vector (0, 1, 0), and vertex V has world coordinates (1, 1, 0). Then the basis vectors and origin of the camera coordinate system are:

  1. d = (0, 0, 3) / |(0, 0, 3)| = (0, 0, 1)
  2. r = up × d = (1, 0, 0)
  3. u = d × r = (0, 1, 0)
  4. eye = (0, 0, 3)

So the corresponding View transformation matrix is:

    M_{W→E} = | 1 0 0  0 |
              | 0 1 0  0 |
              | 0 0 1 −3 |
              | 0 0 0  1 |

Finally, the coordinates of vertex V in the camera coordinate system are:

    V_E = M_{W→E} * (1, 1, 0, 1)^T = (1, 1, −3, 1)^T
Although the above process is complicated, in OpenGL, we can define the View matrix through the GLM library. For the above example, the View matrix can be obtained through the lookAt function.

glm::mat4 view;
view = glm::lookAt(glm::vec3(0.0f, 0.0f, 3.0f), glm::vec3(0.0f, 0.0f, 0.0f), glm::vec3(0.0f, 1.0f, 0.0f));

After verification, the View matrix obtained from the lookAt function is:

    | 1 0 0  0 |
    | 0 1 0  0 |
    | 0 0 1 −3 |
    | 0 0 0  1 |
Obviously, the View matrix derived from the lookAt function is the same as the View matrix derived above.

Projection transformation

After the model and view transformations, the 3D model is in the camera coordinate system. The projection transformation in this section transforms the object from the camera coordinate system to the clip coordinate system, preparing for the viewport transformation in the next step. The projection transformation determines which objects in the scene can be displayed on the screen by specifying the view volume: objects inside the view volume appear on the projection plane, while objects outside it do not. In OpenGL, we mainly consider perspective projection and orthographic projection; the differences between them are as follows:

Both perspective and orthographic projections can be specified by six parameters (GLdouble left, GLdouble right, GLdouble bottom, GLdouble top, GLdouble near, GLdouble far). (left, bottom) specifies the lower-left corner of the near clipping plane, (right, top) its upper-right corner, −near the z value of the near clipping plane, and −far the z value of the far clipping plane. We now use these six parameters to derive the projection matrices.

In the camera coordinate system, the camera looks down the −Z axis, so the near clipping plane is z = −near and the far clipping plane is z = −far. OpenGL images on the near plane.

The perspective projection transformation specified by the above six parameters is as follows:

The orthogonal projection transformation specified by the above six parameters is shown as follows:

After the projection transformation and perspective division, vertices in the camera coordinate system are mapped into a standard cube, the NDC coordinate system: on the X axis, [left, right] maps to [−1, 1]; on the Y axis, [bottom, top] maps to [−1, 1]; and on the Z axis, [−near, −far] maps to [−1, 1]. The matrix derivations below use these mappings. Let's look at the derivation of the two projection matrices in turn.

Perspective projection

The coordinates of perspective projection and perspective division are mapped as follows:

In the figure above, the camera coordinate system is right-handed, and the NDC coordinate system is left-handed. The Z axis of the NDC coordinate system points to the -z axis of the camera coordinate system.

Assume vertex V has camera-space coordinates V_e = (x_e, y_e, z_e, w_e), clip-space coordinates V_c = (x_c, y_c, z_c, w_c), and NDC coordinates after perspective division V_n = (x_n, y_n, z_n). Our goal is to compute the projection matrix M_persp such that:

    V_c = M_persp * V_e

Meanwhile, perspective division gives:

    (x_n, y_n, z_n) = (x_c / w_c, y_c / w_c, z_c / w_c)
First, consider how M_persp transforms the X and Y axes. When vertex V is projected onto the near plane, we get the projected point V_p = (x_p, y_p, −near). By similar triangles (see the left diagram), the projected value on the X axis is:

    x_p = near * x_e / (−z_e)      (1)

Similarly, from the right diagram, the projected value on the Y axis is:

    y_p = near * y_e / (−z_e)      (2)
From equations (1) and (2), both projected values are divided by −z_e, i.e. they are inversely proportional to it. This serves as the clue for perspective division: let the homogeneous component of the clip coordinates be w_c = −z_e, so the fourth row of M_persp is (0, 0, −1, 0). That is, w_c = −z_e.

Next, we use the mapping between the projected coordinates and NDC coordinates to derive the first two rows of M_persp. The relationship between x_p and x_n is linear:

    x_n = K * x_p + P

Substituting the mapping of [left, right] to [−1, 1] gives the linear equations 1 = K * right + P and −1 = K * left + P, and solving them yields:

    x_n = (2 * x_p) / (right − left) − (right + left) / (right − left)      (3)

Substituting formula (1) into formula (3), we obtain:

    x_n = [ (2 * near / (right − left)) * x_e + ((right + left) / (right − left)) * z_e ] / (−z_e)      (4)

And because x_n = x_c / w_c with w_c = −z_e, the formula can be further simplified:

    x_c = (2 * near / (right − left)) * x_e + ((right + left) / (right − left)) * z_e      (5)

From formula (5), the first row of the matrix is obtained:

    | 2n/(r−l)  0  (r+l)/(r−l)  0 |
Let's continue with y_n: the mapping of [bottom, top] to [−1, 1] gives:

    y_n = (2 * y_p) / (top − bottom) − (top + bottom) / (top − bottom)      (6)

And because y_n = y_c / w_c with w_c = −z_e, substituting formula (2) simplifies this to:

    y_c = (2 * near / (top − bottom)) * y_e + ((top + bottom) / (top − bottom)) * z_e      (7)

From formula (7), the second row of the matrix is obtained:

    | 0  2n/(t−b)  (t+b)/(t−b)  0 |
Next we need the coefficients of the third row, which produces z_c. This case differs from x and y: in camera coordinates, z always projects to −near on the near plane, so we cannot reuse the projected value. We do know that z_c is independent of the x_e and y_e components, so the third row has the form (0, 0, A, B). Because z_n = z_c / w_c and w_c = −z_e, we obtain:

    z_n = (A * z_e + B * w_e) / (−z_e)

And because in camera coordinates w_e = 1, it can be further simplified:

    z_n = (A * z_e + B) / (−z_e)      (8)

Plugging in the mapping of z_e over [−near, −far] to z_n over [−1, 1]:

    −1 = (−A * near + B) / near,   1 = (−A * far + B) / far

Solving these two equations gives the relationship between A and B:

    A = −(far + near) / (far − near),   B = −2 * far * near / (far − near)      (9)

With A and B known from equation (9), the final matrix M_persp is:

    M_persp = | 2n/(r−l)  0         (r+l)/(r−l)   0          |
              | 0         2n/(t−b)  (t+b)/(t−b)   0          |
              | 0         0         −(f+n)/(f−n)  −2fn/(f−n) |
              | 0         0         −1            0          |
In general, the projection's view volume is symmetric, i.e. left = −right and bottom = −top, so that:

    right + left = 0,   top + bottom = 0

The matrix M_persp then simplifies to:

    M_persp = | n/r  0    0             0          |
              | 0    n/t  0             0          |
              | 0    0    −(f+n)/(f−n)  −2fn/(f−n) |
              | 0    0    −1            0          |

Besides specifying the perspective projection matrix with (left, right, bottom, top, near, far), it can also be generated with the function glm::perspective by specifying fov, aspect, near, and far, as shown below. Here a 45-degree field of view is specified, and the near and far planes are 0.1f and 100.0f respectively:

glm::mat4 proj = glm::perspective(glm::radians(45.0f), width / height, 0.1f, 100.0f);

A schematic diagram of the field of view is shown below:



Since the view volume is symmetric, we have:

    top = near * tan(fov / 2)      (10)
    right = top * aspect           (11)

Substituting formulas (10) and (11) into the M_persp matrix above gives the perspective matrix expressed in terms of fov and aspect:

    M_persp = | 1/(aspect*tan(fov/2))  0             0             0          |
              | 0                      1/tan(fov/2)  0             0          |
              | 0                      0             −(f+n)/(f−n)  −2fn/(f−n) |
              | 0                      0             −1            0          |
Left-multiplying the vertices in the camera coordinate system by M_persp transforms them to the clip coordinate system; then, through perspective division, we obtain NDC coordinates.

Orthographic projection

The orthographic projection matrix is simpler than the perspective projection matrix, as shown below:

    M_ortho = | 2/(r−l)  0        0         −(r+l)/(r−l) |
              | 0        2/(t−b)  0         −(t+b)/(t−b) |
              | 0        0        −2/(f−n)  −(f+n)/(f−n) |
              | 0        0        0          1           |

For the orthographic projection transformation, the coordinates projected onto the near plane are simply (x_p, y_p) = (x_e, y_e), so x_e and x_n, y_e and y_n, and z_e and z_n are directly related by linear mappings, and we only need to find the linear-equation coefficients. The mappings for the X, Y, and Z axes are as follows:

    x_e ↔ x_n : [left, right] maps to [−1, 1]
    y_e ↔ y_n : [bottom, top] maps to [−1, 1]
    z_e ↔ z_n : [−near, −far] maps to [−1, 1]

From the mappings above, and w_e = 1 in camera coordinates, three linear equations are obtained:

    x_n = (2 / (right − left)) * x_e − (right + left) / (right − left)
    y_n = (2 / (top − bottom)) * y_e − (top + bottom) / (top − bottom)
    z_n = (−2 / (far − near)) * z_e − (far + near) / (far − near)

From these three linear equations, the orthographic projection matrix above is obtained:

    M_ortho = | 2/(r−l)  0        0         −(r+l)/(r−l) |
              | 0        2/(t−b)  0         −(t+b)/(t−b) |
              | 0        0        −2/(f−n)  −(f+n)/(f−n) |
              | 0        0        0          1           |
If the view volume is symmetric, i.e. left = −right and bottom = −top, then:

    right + left = 0,   top + bottom = 0

and the orthographic projection matrix M_ortho can be further simplified to:

    M_ortho = | 1/r  0    0         0            |
              | 0    1/t  0         0            |
              | 0    0    −2/(f−n)  −(f+n)/(f−n) |
              | 0    0    0          1           |
Viewport transformation

After the projection transformation and perspective division, invisible objects have been clipped away and we have NDC coordinates. The last step is to map the NDC coordinates to screen coordinates (x_s, y_s, z_s). In OpenGL this is specified as follows:

// Specify the window position and size
glViewport(GLint x, GLint y, GLsizei width, GLsizei height);
// Specify the window depth range
glDepthRangef(GLclampf near, GLclampf far);

Then the linear mapping between NDC coordinates and screen coordinates can be obtained:

    x_n ↔ x_s : [−1, 1] maps to [x, x + width]
    y_n ↔ y_s : [−1, 1] maps to [y, y + height]
    z_n ↔ z_s : [−1, 1] maps to [near, far]

Therefore, for each component we set up a linear equation of the form:

    s = K * n + P

and solve for the coefficient K and the constant P.
Substituting the mappings above into the linear equations gives the parameters for each component:

    x_s = (width / 2) * x_n + (x + width / 2)
    y_s = (height / 2) * y_n + (y + height / 2)
    z_s = ((far − near) / 2) * z_n + (far + near) / 2

From these components, the viewport transformation matrix is obtained:

    M_viewport = | width/2  0         0        x + width/2  |
                 | 0        height/2  0        y + height/2 |
                 | 0        0         (f−n)/2  (f+n)/2      |
                 | 0        0         0        1            |
Thus, the screen coordinates are obtained by left-multiplying the NDC coordinates by the viewport matrix.

For 2D rendering, near and far are often both set to 0; the third row of the viewport matrix is then all zeros, so after the viewport transformation the Z value of the screen coordinates is 0.

This concludes OpenGL's whole coordinate transformation process. The key is practice, practice, practice!
