The core of DIBR is to first project the reference image into 3D Euclidean space using depth information, and then project the 3D space points onto the imaging plane of the virtual camera. In computer graphics this is called the 3D Image Warping technique.

 

Pixel coordinates and image coordinates

As shown in Figure 1, the position of a three-dimensional space point on the imaging plane can be represented in the pixel coordinate system or the image coordinate system.

In the rectangular integer pixel coordinate system u-v, the coordinates (u, v)^T of each pixel give the column and row of that pixel in the two-dimensional image array. Since this coordinate system does not express the actual position of pixels in physical units, it is necessary to establish the image coordinate system x-y, expressed in physical units. This coordinate system takes a point M in the image as its origin, with the x and y axes parallel to the u and v axes respectively. If the coordinates of M in the u-v coordinate system are (m_u, m_v), and the physical dimensions of each pixel along the x and y axes are d_x and d_y, then the coordinates of any pixel in the two coordinate systems are related as follows:

\[ u = \frac{x}{d_x} + m_u, \qquad v = \frac{y}{d_y} + m_v \tag{1} \]

In homogeneous coordinates this is expressed as:

\[
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} 1/d_x & 0 & m_u \\ 0 & 1/d_y & m_v \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\tag{2}
\]
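As a concrete check of Equation 2, the following MATLAB sketch maps one image coordinate into pixel coordinates. The pixel size (dx, dy) and the origin offset (mu, mv) are illustrative assumptions, not values from the text.

% Minimal sketch of Equation 2: image coordinates -> pixel coordinates.
% All numeric values below are assumed for illustration.
dx = 0.01; dy = 0.01;        % physical pixel size (mm), assumed
mu = 512;  mv = 384;         % pixel coordinates of the origin M, assumed

A = [1/dx 0    mu;
     0    1/dy mv;
     0    0    1];

xy = [1.5; -0.8; 1];         % a point in image coordinates (homogeneous)
uv = A * xy;                 % its pixel coordinates: (662, 304, 1)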

Pinhole camera model

We consider the central projection of a space point onto a plane. Let the projection center be at the origin of a Euclidean coordinate system, and let the plane Z = f be called the image plane or focal plane. In the pinhole camera model, a point with spatial coordinates X = (X, Y, Z)^T is mapped to the point on the image plane where the line joining X to the projection center intersects that plane. As Figure 2 illustrates, it follows quickly from similar triangles that the point (X, Y, Z)^T is mapped to the point (fX/Z, fY/Z, f)^T on the image plane. Therefore, the central projection from world coordinates to image coordinates is:

\[ (X, Y, Z)^T \mapsto (fX/Z,\; fY/Z)^T \tag{3} \]

This is a mapping from three-dimensional Euclidean space to two-dimensional Euclidean space. The projection center is called the camera center, also known as the optical center. The perpendicular from the camera center to the image plane is called the camera's principal axis or principal ray. The intersection of the principal axis and the image plane is called the principal point, and the plane through the camera center parallel to the image plane is called the principal plane of the camera. If the world and image points are represented by homogeneous vectors, then the central projection can be expressed very simply as a linear mapping between homogeneous coordinates.

FIG. 2 Schematic diagram of pinhole camera model

Equation 3 can be written in matrix form as:

\[
\begin{bmatrix} fX \\ fY \\ Z \end{bmatrix} =
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\tag{4}
\]

This matrix can be written as diag(f, f, 1)[I | 0], where diag(f, f, 1) is a diagonal matrix and [I | 0] denotes a matrix partitioned into a 3 × 3 identity matrix and a zero column vector.

We now introduce the following notation: the world point X is represented by the 4-dimensional homogeneous vector (X, Y, Z, 1)^T; the image point x is represented by a 3-dimensional homogeneous vector; and P denotes the 3 × 4 homogeneous camera projection matrix. Equation 4 can then be written as:

\[ x = P X \tag{5} \]

This defines the camera matrix of the central-projection pinhole model as:

\[ P = \operatorname{diag}(f, f, 1)\,[\,I \mid 0\,] \tag{6} \]
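A short MATLAB sketch of Equations 4 through 6 may help: it builds the canonical camera matrix P = diag(f, f, 1)[I | 0] and projects one world point. The focal length and the point are illustrative assumptions.

% Minimal sketch of Equations 4-6: canonical pinhole projection.
f = 35;                                    % focal length, assumed units
P = diag([f f 1]) * [eye(3) zeros(3,1)];   % P = diag(f,f,1)[I | 0]

X = [10; 5; 100; 1];                       % world point (homogeneous)
x = P * X;                                 % image point (homogeneous)
x = x / x(3);                              % dehomogenize: (fX/Z, fY/Z, 1) = (3.5, 1.75, 1)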

 

Figure 3 shows the camera coordinate system and the image coordinate system.

FIG. 3 Camera coordinate system and image coordinate system

Equation 3 assumes that the coordinate origin of the image plane is at the principal point, which may not be the case, as shown in FIG. 3. In that case, the central projection mapping can be expressed as:

\[ (X, Y, Z)^T \mapsto (fX/Z + p_x,\; fY/Z + p_y)^T \tag{7} \]

where (p_x, p_y)^T are the coordinates of the principal point. In homogeneous coordinates the equation becomes:

\[
\begin{bmatrix} fX + Zp_x \\ fY + Zp_y \\ Z \end{bmatrix} =
\begin{bmatrix} f & 0 & p_x & 0 \\ 0 & f & p_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\tag{8}
\]

If we define:

\[
K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}
\tag{9}
\]

then Equation 7 takes the succinct form:

\[ x = K\,[\,I \mid 0\,]\,X_{\mathrm{cam}} \tag{10} \]

Matrix K is called the camera calibration matrix. In Equation 10 we write (X, Y, Z, 1)^T as X_cam to emphasize that the camera is placed at the origin of a Euclidean coordinate system with its principal axis pointing along the Z axis, and that the point X_cam is expressed in this coordinate system, called the camera coordinate system.
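As a sketch of Equations 9 and 10, the MATLAB lines below project a point given in camera coordinates through K[I | 0]. The values of f, p_x and p_y are illustrative assumptions, expressed directly in pixel units for readability.

% Minimal sketch of Equations 9-10: projection with a principal point offset.
f = 1600; px = 512; py = 384;         % assumed, in pixel units
K = [f 0 px;
     0 f py;
     0 0 1];

Xcam = [0.5; 0.25; 2.0; 1];           % point in camera coordinates (homogeneous)
x = K * [eye(3) zeros(3,1)] * Xcam;   % x = K [I | 0] Xcam
x = x / x(3);                         % (f*X/Z + px, f*Y/Z + py, 1) = (912, 584, 1)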

The pixels of some CCD cameras, however, may not be square. If image coordinates are measured in pixels, unequal scale factors must be introduced in each direction. Specifically, if the number of pixels per unit distance of image coordinates in the x and y directions is m_x and m_y respectively, then the transformation from camera coordinates to pixel coordinates is obtained by multiplying the matrix in Equation 9 by an additional factor diag(m_x, m_y, 1). The general form of the camera calibration matrix is therefore:

\[
K = \begin{bmatrix} \alpha_x & 0 & u_0 \\ 0 & \alpha_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\tag{11}
\]

where α_x = f m_x and α_y = f m_y express the focal length of the camera in pixel units in the x and y directions respectively. Similarly, x_0 = (u_0, v_0)^T is the principal point in pixel units, with coordinates u_0 = m_x p_x and v_0 = m_y p_y.
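The following MATLAB sketch assembles the general calibration matrix of Equation 11 from an assumed focal length, sensor size, resolution, and principal point; every numeric value is an illustrative assumption.

% Minimal sketch of Equation 11: general calibration matrix.
f  = 0.035;                           % focal length (m), assumed
mx = 1024 / 0.0224;                   % pixels per metre in x, assumed sensor
my = 768  / 0.0168;                   % pixels per metre in y, assumed sensor
px = 0.0112; py = 0.0084;             % principal point (m), assumed

ax = f * mx;  ay = f * my;            % focal length in pixel units
u0 = mx * px; v0 = my * py;           % principal point in pixel units

K = [ax 0  u0;
     0  ay v0;
     0  0  1];                        % here K = [1600 0 512; 0 1600 384; 0 0 1]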

 

Camera parameters

Space points are usually represented in a different Euclidean coordinate system, called the world coordinate system. The world and camera coordinate systems are related by a rotation and a translation, as shown in Figure 4.

FIG. 4 Euclidean representation between the world and camera coordinates

Let the homogeneous vector X represent the coordinates of a point in the world coordinate system, and let X_cam be the same point represented in the camera coordinate system. Then we can write

\[ X_{\mathrm{cam}} = R(X - C) \tag{12} \]

where C represents the coordinates of the camera center in the world coordinate system, and R is a 3 × 3 rotation matrix representing the orientation of the camera coordinate system. This equation can be written in homogeneous coordinates as:

\[
X_{\mathrm{cam}} =
\begin{bmatrix} R & -RC \\ 0^T & 1 \end{bmatrix} X
\tag{13}
\]

Combining this with Equation 10 gives:

\[ x = K R\,[\,I \mid -C\,]\,X \tag{14} \]

For convenience, the camera center is usually not made explicit; instead, the transformation from the world coordinate system to the camera coordinate system is written as X_cam = RX + t. With this representation, the camera matrix simplifies to:

\[ P = K\,[\,R \mid t\,] \tag{15} \]

It is not hard to see from Equation 14 that t = -RC.
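To tie Equations 13 through 15 together, here is a MATLAB sketch that composes the full camera matrix P = K[R | t] with t = -RC. The calibration matrix, rotation, and camera center are illustrative assumptions.

% Minimal sketch of Equations 13-15: intrinsics plus extrinsics.
K = [1600 0 512; 0 1600 384; 0 0 1];  % assumed calibration matrix
theta = deg2rad(10);                  % assumed rotation about the y axis
R = [ cos(theta) 0 sin(theta);
      0          1 0;
     -sin(theta) 0 cos(theta)];
C = [0.2; 0; 0];                      % assumed camera center (world coords)
t = -R * C;                           % translation, t = -RC

P  = K * [R t];                       % 3 x 4 camera matrix, Equation 15
Xw = [1; 0.5; 3; 1];                  % a world point (homogeneous)
x  = P * Xw;  x = x / x(3);           % its pixel coordinates in this camera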

 

3D Image Warping equation

As shown in FIG. 5, let the homogeneous vector of a space point in the world coordinate system be P_w = (X_w, Y_w, Z_w, 1)^T, and let the pixel coordinates of this point projected onto the reference view image plane and the virtual view image plane be p_1 = (u_1, v_1, 1)^T and p_2 = (u_2, v_2, 1)^T respectively. For the camera coordinate systems of the reference view and the virtual view, the rotation matrices and translation vectors are denoted R_1, R_2 and t_1 = -R_1 C_1, t_2 = -R_2 C_2 respectively. The two projections are then:

\[ \lambda_1 p_1 = K_1\,[\,R_1 \mid t_1\,]\,P_w \tag{16} \]

\[ \lambda_2 p_2 = K_2\,[\,R_2 \mid t_2\,]\,P_w \tag{17} \]

 

FIG. 5 Virtual viewpoint rendering based on 3D Image Warping equation

K_1 and K_2 denote the intrinsic parameter matrices of the reference camera and the virtual camera respectively, while λ_1 and λ_2 are the corresponding homogeneous scale factors. From Equation 16, the coordinates of the space point P_w in three-dimensional Euclidean space can be expressed as:

\[ (X_w, Y_w, Z_w)^T = R_1^{-1}\left(\lambda_1 K_1^{-1} p_1 - t_1\right) \tag{18} \]

Finally, substituting Equation 18 into Equation 17 yields the pixel coordinates p_2 of point P_w in the virtual view image:

\[ \lambda_2 p_2 = \lambda_1 K_2 R_2 R_1^{-1} K_1^{-1} p_1 + K_2\left(t_2 - R_2 R_1^{-1} t_1\right) \tag{19} \]

 

 

Here λ_1 is the camera scale coefficient, which is usually taken to be the depth value of the pixel.
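Putting Equations 18 and 19 together, the MATLAB sketch below warps a single reference pixel with known depth into the virtual view. All camera parameters, the pixel position, and its depth are illustrative assumptions. Applied per pixel over a whole depth map, this is the core operation behind view-synthesis routines such as the cd2lr call in the code below.

% Minimal sketch of Equations 18-19: 3D Image Warping for one pixel.
K1 = [1600 0 512; 0 1600 384; 0 0 1]; % reference intrinsics, assumed
K2 = K1;                              % virtual intrinsics, assumed equal
R1 = eye(3);  t1 = [0; 0; 0];         % reference extrinsics, assumed
R2 = eye(3);  t2 = [-0.05; 0; 0];     % virtual camera shifted 5 cm, assumed

p1      = [300; 200; 1];              % pixel in the reference image
lambda1 = 2.5;                        % its depth (m), from the depth map

% Equation 18: back-project the pixel into 3D Euclidean space.
Pw = R1 \ (lambda1 * (K1 \ p1) - t1);

% Equation 19: re-project the 3D point into the virtual view.
x  = K2 * (R2 * Pw + t2);
p2 = x / x(3);                        % pixel in the virtual image; x(3) = lambda2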

 

References:

Research on Virtual Viewpoint Synthesis Technology of Free-Viewpoint Stereo TV Systems

 

rows = 768;                 % image height in pixels
cols = 1024;                % image width in pixels
numf = 1;                   % number of frames to process
Color = 'Color.yuv';        % reference color sequence (YUV)
Depth = 'Depth.yuv';        % per-pixel depth sequence (YUV)

% Stereoscopic view generation
[L, R] = cd2lr(Color, Depth, rows, cols, numf);
figure, imshow(yuv2rgb(L{1}.luma, L{1}.chroma1, L{1}.chroma2));
figure, imshow(yuv2rgb(R{1}.luma, R{1}.chroma1, R{1}.chroma2));

% Stereoscopic view generation, luma component only
[L, R] = cd2lrluma(Color, Depth, rows, cols, numf);
figure, imshow(uint8(L{1}.luma));
figure, imshow(uint8(R{1}.luma));