Many OpenGL ES tutorials have an example of showing an image as a texture on the screen.

Because the texture coordinate system is oriented differently from the screen coordinate system, the image appears upside down when rendered on the screen.

One solution is to multiply the vertex coordinates by a matrix that rotates them 180 degrees about the Z-axis, so that the image displays correctly:


$$
\begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$
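As a minimal sketch of this fix, here is the same matrix built in plain Python rather than in OpenGL ES code. The helper names `rotation_z` and `mat_vec` and the sample quad are illustrative assumptions, not part of any OpenGL API.

```python
import math

def rotation_z(theta):
    """Return the 4x4 matrix (list of rows) rotating by theta radians about the Z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c,   -s,  0.0, 0.0],
        [s,    c,  0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def mat_vec(m, v):
    """Multiply matrix m (list of rows) by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Hypothetical full-screen quad, one (x, y, z, w) vertex per entry.
quad = [
    [-1.0, -1.0, 0.0, 1.0],
    [ 1.0, -1.0, 0.0, 1.0],
    [-1.0,  1.0, 0.0, 1.0],
    [ 1.0,  1.0, 0.0, 1.0],
]

# Rotate every vertex by 180 degrees (pi radians) about the Z-axis.
rotated = [mat_vec(rotation_z(math.pi), v) for v in quad]
for before, after in zip(quad, rotated):
    print(before, "->", [round(x, 6) for x in after])
```

Each vertex ends up at the diagonally opposite corner: x and y both change sign, while z and w are untouched.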

So how do we understand this rotation matrix?

The first obstacle to understanding it is this:

Why is it a 4×4 matrix? In fact, the rotation, scaling, and translation matrices are all 4×4.

Intuitively, x, y, and z are enough to express coordinates in three-dimensional space, so a 3×3 matrix should be enough to transform them. Why do we need 4×4?

To answer this question, let’s first look at the geometric relationship between vectors and matrices, and then solve the puzzle step by step by deriving the rotation matrix and the translation matrix.

Vectors and matrices

On a geometric plane, we can think of any point on the plane as a vector associated with the origin.

As shown in the figure, point A can be expressed as the vector $\vec{OA}$. Points I (1, 0) and J (0, 1) lie on the X-axis and Y-axis respectively, and they too form vectors with the origin. For simplicity, we denote these vectors by $\vec{i}$ and $\vec{j}$.

Since the coordinates of point A are (3, 2), expressing $\vec{OA}$ in terms of $\vec{i}$ and $\vec{j}$ gives:

$$
3\vec{i} + 2\vec{j} = \vec{OA}
$$

Geometrically, $\vec{i}$ is stretched to $3\vec{i}$, $\vec{j}$ is stretched to $2\vec{j}$, and adding the two vectors gives $\vec{OA}$.

The coordinates of $\vec{i}$ are (1, 0) and those of $\vec{j}$ are (0, 1). Writing the equation above with column vectors:


$$
3 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 2 \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}
$$

This is just a simple vector operation, which works out as follows:


$$
\begin{bmatrix} 3 \times 1 \\ 3 \times 0 \end{bmatrix} + \begin{bmatrix} 2 \times 0 \\ 2 \times 1 \end{bmatrix} = \begin{bmatrix} 3 \times 1 + 2 \times 0 \\ 3 \times 0 + 2 \times 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}
$$

Does this calculation feel familiar? It is exactly matrix-vector multiplication. The operation amounts to multiplying a vector by a matrix:


$$
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \times 1 + 2 \times 0 \\ 3 \times 0 + 2 \times 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}
$$

The identity matrix $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, read as two columns, holds the coordinates of points I and J, which are also the vectors $\vec{i}$ and $\vec{j}$. In mathematics, $\vec{i}$ and $\vec{j}$ are called the basis vectors of this coordinate system.
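The equivalence between "scale the basis vectors and add them" and "multiply a matrix by a vector" can be checked with a short Python sketch. The function names are illustrative and do not come from any library.

```python
def linear_combination(coeffs, basis):
    """Scale each basis vector by its coefficient and add the results."""
    dim = len(basis[0])
    return [sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(dim)]

def mat_vec(m, v):
    """Multiply matrix m (list of rows) by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

i_vec = [1, 0]  # basis vector along the X-axis
j_vec = [0, 1]  # basis vector along the Y-axis

# Matrix whose columns are i_vec and j_vec (here, the identity matrix).
basis_matrix = [[i_vec[0], j_vec[0]],
                [i_vec[1], j_vec[1]]]

oa_from_basis = linear_combination([3, 2], [i_vec, j_vec])
oa_from_matrix = mat_vec(basis_matrix, [3, 2])
print(oa_from_basis, oa_from_matrix)  # both are [3, 2]
```

Swapping in different basis vectors changes both results in the same way, which is the idea the rotation derivation relies on.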

Derivation of rotation matrix

Let’s now rotate the entire coordinate system 45° counterclockwise about the origin:

After the rotation, points I and J move to the positions $i'$ and $j'$.

Simple trigonometry gives their coordinates: $\vec{i'} = \begin{bmatrix} \cos 45^\circ \\ \sin 45^\circ \end{bmatrix}$ and $\vec{j'} = \begin{bmatrix} -\sin 45^\circ \\ \cos 45^\circ \end{bmatrix}$.

What are the coordinates of A after the rotation? Following the same approach as above, $\vec{i'}$ is stretched to $3\vec{i'}$, $\vec{j'}$ is stretched to $2\vec{j'}$, and adding the two vectors gives $\vec{OA'}$. Combined with the matrix-vector derivation in the previous section, this can be written as:


$$
\begin{bmatrix} \cos 45^\circ & -\sin 45^\circ \\ \sin 45^\circ & \cos 45^\circ \end{bmatrix} \begin{bmatrix} 3 \\ 2 \end{bmatrix}
$$

Isn’t the matrix on the left the rotation matrix we saw at the beginning? The only difference is that this one is a rotation matrix in two dimensions:


$$
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
$$

Combining the figures and the calculations, we can understand this two-dimensional matrix as follows: its columns are the two basis vectors of a coordinate system, and any vector from the origin to a point in that coordinate system can be expressed as a combination of those two basis vectors. So rotating a point is equivalent to rotating the coordinate system the point lives in, and the position of the rotated point can be computed from the transformed basis vectors.
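To make this concrete, the sketch below rotates point A (3, 2) by 45° by combining the rotated basis vectors. The function name `rotate_2d` is a hypothetical helper chosen for this illustration.

```python
import math

def rotate_2d(point, degrees):
    """Rotate a 2D point counterclockwise about the origin by the given angle."""
    theta = math.radians(degrees)
    # The columns of the rotation matrix are the rotated basis vectors i' and j'.
    i_prime = (math.cos(theta), math.sin(theta))
    j_prime = (-math.sin(theta), math.cos(theta))
    x, y = point
    # OA' = x * i' + y * j'
    return (x * i_prime[0] + y * j_prime[0],
            x * i_prime[1] + y * j_prime[1])

a_rotated = rotate_2d((3, 2), 45)
print(a_rotated)  # approximately (0.7071, 3.5355)
```

The rotated point is still at distance $\sqrt{13}$ from the origin, as expected for a pure rotation.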

In mathematics, this kind of transformation is called a linear transformation, and a linear transformation is carried out by matrix multiplication.

Linear transformation: a special mapping between two vector spaces (including abstract vector spaces made up of functions) that preserves vector addition and scalar multiplication.

Geometrically, linear transformations have the following characteristics:

  • Straight lines remain straight after the transformation

  • The origin stays fixed before and after the transformation
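These properties can be spot-checked numerically. The sketch below uses the 45° rotation matrix as the example transformation. The sample vectors `u` and `v` and the scalar `k` are arbitrary values picked for illustration.

```python
import math

def mat_vec(m, v):
    """Multiply matrix m (list of rows) by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def close(p, q):
    """Compare two vectors component by component, allowing rounding error."""
    return all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(p, q))

theta = math.radians(45)
rot = [[math.cos(theta), -math.sin(theta)],
       [math.sin(theta),  math.cos(theta)]]

u, v, k = [3.0, 2.0], [1.0, 3.0], 2.5

# Vector addition is preserved: T(u + v) equals T(u) + T(v).
lhs_add = mat_vec(rot, [u[0] + v[0], u[1] + v[1]])
rhs_add = [a + b for a, b in zip(mat_vec(rot, u), mat_vec(rot, v))]

# Scalar multiplication is preserved: T(k * u) equals k * T(u).
lhs_scale = mat_vec(rot, [k * u[0], k * u[1]])
rhs_scale = [k * a for a in mat_vec(rot, u)]

# The origin stays fixed: T(0) equals 0.
origin_image = mat_vec(rot, [0.0, 0.0])

print(close(lhs_add, rhs_add), close(lhs_scale, rhs_scale), close(origin_image, [0.0, 0.0]))
```

A check on a few sample vectors is not a proof, but it shows what "preserves vector addition and scalar multiplication" means in practice.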

Going from two dimensions to three works the same way, except that we add the Z-axis:


$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

Read column by column, this matrix holds the basis vectors of the x, y, and z axes, and it is also the identity matrix.

In the same way, the matrix for rotating three-dimensional coordinates about the z-axis is:


$$
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

Derivation of translation matrix

Now, can translation also be expressed as a matrix times a vector? Let’s go back to the two-dimensional plane and see what happens when we move A to B.

Moving A (3, 2) to B (4, 5) means moving A 1 unit to the right and then 3 units up, so that x increases by 1 and y increases by 3:


$$
\begin{bmatrix} x + 1 \\ y + 3 \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 + 1 \\ 2 + 3 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}
$$

As the calculation shows, translation is really vector addition, namely:


$$
\vec{OA} + \vec{OC} = \vec{OB}
$$

$$
\begin{bmatrix} 3 \\ 2 \end{bmatrix} + \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}
$$

We can further understand this by using the parallelogram rule for vector addition, as shown in the following figure:

With a 2×2 matrix alone, we cannot express translation as a matrix multiplication.
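One way to see why: whatever its entries, a 2×2 matrix always sends the origin to itself, whereas a translation by (1, 3) has to move the origin to (1, 3). Writing the unknown entries as a, b, c, d (placeholder symbols introduced only for this illustration):

$$
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \neq \begin{bmatrix} 1 \\ 3 \end{bmatrix}
$$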

In fact, translation is an affine transformation.

An affine transformation, also known as an affine mapping, is in geometry a linear transformation of a vector space followed by a translation, which maps it into another vector space.

In terms of geometric intuition, an affine transformation, unlike a linear transformation, does not need to keep the origin fixed.

So let’s look at the move from point A to point B in a different way. Instead of moving the point, we move the entire coordinate system: the point still goes from A to B, and the origin moves to the point O' (1, 3).

What we want to construct is a matrix multiplication equation like the one below, so that a single general computational model can handle every coordinate transformation:


$$
\begin{bmatrix} \; ? \; \end{bmatrix} \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}
$$

At this point, it is finally time to bring in homogeneous coordinates.

Homogeneous coordinates represent an n-dimensional vector by an (n+1)-dimensional vector. They are the coordinate system used in projective geometry, just as Cartesian coordinates are used in Euclidean geometry.

To put it more colloquially, we need a higher dimension to deal with this problem.

By adding a dimension, we can handle an affine transformation in the lower dimension as a linear transformation in the higher dimension.

This may sound philosophical at first, but the following calculation shows why it works:


$$
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \times 3 + 0 \times 2 + 1 \times 1 \\ 0 \times 3 + 1 \times 2 + 3 \times 1 \\ 0 \times 3 + 0 \times 2 + 1 \times 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \\ 1 \end{bmatrix}
$$

Looking at the result of the operation above, $\begin{bmatrix} 4 \\ 5 \\ 1 \end{bmatrix}$ is just $\begin{bmatrix} 4 \\ 5 \end{bmatrix}$ with an extra z coordinate.

By going up one dimension, we turn the translation problem in two dimensions into a matrix-vector multiplication in three dimensions. So in two dimensions, the translation matrix is:


$$
\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}
$$

Here $t_x$ and $t_y$ are the distances moved along the x and y axes.
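The move from A (3, 2) to B (4, 5) can be reproduced with this matrix in a few lines of Python. This is a minimal sketch, and `translation_2d` and `mat_vec` are hypothetical helper names.

```python
def translation_2d(tx, ty):
    """Return the 3x3 homogeneous matrix that shifts a 2D point by (tx, ty)."""
    return [[1, 0, tx],
            [0, 1, ty],
            [0, 0, 1]]

def mat_vec(m, v):
    """Multiply matrix m (list of rows) by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

a_homogeneous = [3, 2, 1]  # point A with the extra coordinate set to 1
b_homogeneous = mat_vec(translation_2d(1, 3), a_homogeneous)
print(b_homogeneous)       # [4, 5, 1]

b = b_homogeneous[:2]      # drop the extra coordinate to get B
print(b)                   # [4, 5]
```

The extra coordinate fixed at 1 is what lets the third column of the matrix be added to x and y.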

The same applies to three-dimensional coordinate systems. To translate three-dimensional coordinates, we again raise the dimension by introducing homogeneous coordinates. The translation matrix in three dimensions is then:


$$
\begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$

Conclusion

So now we can answer the original question: why are the transformation matrices in OpenGL 4×4?

Imagine a scenario: we want to rotate the vertex coordinates by a certain angle and then translate them by a certain distance. If rotation used a 3×3 matrix while translation used a 4×4 matrix, chaining the two operations would be very complicated.

Therefore, to unify translation, rotation, and other transformations under matrix multiplication, using 4×4 matrices throughout both covers every case and keeps the calculation convenient.
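As a closing illustration, the sketch below folds a rotation followed by a translation into a single 4×4 matrix. It is plain Python under assumed helper names (`rotation_z`, `translation`, `mat_mul`, `mat_vec`) and uses the column-vector convention of this article. It is not how any particular OpenGL library stores or multiplies its matrices.

```python
import math

def rotation_z(theta):
    """Return the 4x4 matrix rotating by theta radians about the Z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def translation(tx, ty, tz):
    """Return the 4x4 matrix translating by (tx, ty, tz)."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def mat_mul(a, b):
    """Return the product a * b of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def mat_vec(m, v):
    """Multiply matrix m (list of rows) by column vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Rotate 90 degrees about Z first, then translate by (1, 3, 0).
# With column vectors, the matrix that is applied first sits on the right.
combined = mat_mul(translation(1, 3, 0), rotation_z(math.pi / 2))

p = [3, 2, 0, 1]  # point (3, 2, 0) in homogeneous coordinates
step_by_step = mat_vec(translation(1, 3, 0), mat_vec(rotation_z(math.pi / 2), p))
in_one_go = mat_vec(combined, p)
print([round(x, 6) for x in in_one_go])  # approximately [-1.0, 6.0, 0.0, 1.0]
```

Because both transformations are 4×4, they multiply into one matrix that can be computed once and then applied to every vertex.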