Three-dimensional reconstruction in binocular vision: thoughts on hierarchical reconstruction
FesianXu 2020.7.22 (written while interning at ANT FINANCIAL)

Preface

This article is a note from the author's reading of chapter 10 of [1]. From a macroscopic point of view, it describes several hierarchical methods of binocular 3D reconstruction, from projective reconstruction through affine reconstruction and similarity reconstruction to, finally, Euclidean reconstruction. As an introductory article, it only presents the ideas of these methods without many details, which will be covered in future blog posts. If there are any errors, please contact the author to point them out; please credit the source when reproducing.

∇ Contact information:

e-mail: [email protected]

QQ: 973926198

github: github.com/FesianXu

Zhihu column: Computer Vision/Computer Graphics theory and Application

Wechat official account:


Note: before reading this article, readers are strongly recommended to read [4] and [5] first, in order to understand the hierarchy of geometric transformations (the differences among projective, affine, similarity and Euclidean transformations) and the definitions and applications of conics and quadrics; this article builds on that prior knowledge throughout. Likewise, the camera's intrinsic and extrinsic parameters [6] must be understood as basic imaging knowledge.

Introduction to binocular 3D reconstruction

In 3D reconstruction we want to recover the structure of the reconstructed object (i.e., the locations of its points in the 3D world). In binocular 3D reconstruction, as "binocular" suggests, we assume that two cameras observe an object and obtain pictures describing the same object from multiple perspectives. In general, in 3D reconstruction tasks we can obtain the corresponding-point relations between these multi-view images through a series of pre-processing algorithms. As shown in Fig. 1, point $\mathbf{p}_l$ and point $\mathbf{p}_r$ are both projections of the 3D object point $\mathbf{P}$; we call such a pair of points a pair of corresponding points. In previous blog posts [2,3] the author introduced the epipolar constraint and image rectification for corresponding points; interested readers may refer to them.

Fig.1. The corresponding points between binocular multi-view images are all projections of the same object point $\mathbf{P}$ in the three-dimensional world.

Generally speaking, in 3D reconstruction tasks we assume that the position of the 3D point behind each pair of corresponding points is unknown and must be solved for, and that the camera orientations, positions and intrinsic calibration matrices are also unknown (i.e., both extrinsics and intrinsics are unknown). The whole reconstruction task is to compute camera matrices $\mathrm{P}$ and $\mathrm{P}^{\prime}$ such that for the 3D points $\mathbf{X}_i$:


$$
\mathbf{x}_i = \mathrm{P} \mathbf{X}_i, \quad \mathbf{x}_i^{\prime} = \mathrm{P}^{\prime} \mathbf{X}_i, \quad \forall i \tag{1}
$$

Here $i$ indexes the pairs of corresponding points. When too few corresponding points are given, this task obviously cannot be completed. However, if enough corresponding points are given, there are enough constraints to uniquely determine a fundamental matrix [2], as shown in Equation (2) (of course, solving for the fundamental matrix is not so simple; that is not covered in this article). At this point the entire 3D reconstruction is subject to a phenomenon called projective ambiguity, which will be described below. For a camera without calibration, this is the best result binocular 3D reconstruction can obtain without any prior knowledge. After adding other prior knowledge of the scene (such as parallel-line constraints, perpendicular-line constraints, or the assumption that both cameras share the same intrinsics), projective ambiguity can be reduced to affine ambiguity or even similarity ambiguity.


$$
\mathbf{x}^{\mathrm{T}} \mathcal{F} \mathbf{x}^{\prime} = 0 \tag{2}
$$

Overall, the reconstruction process can be divided into a trilogy of steps:

  1. Compute the fundamental matrix from the corresponding points.
  2. Compute the camera matrices $\mathrm{P}, \mathrm{P}^{\prime}$ from the fundamental matrix.
  3. For each pair of corresponding points $\mathbf{x}_i \leftrightarrow \mathbf{x}_i^{\prime}$, compute its 3D position in space by triangulation.

Note that the reconstruction methods introduced in this article are conceptual. Readers should be clear that they should not try to reconstruct real scene images by naively implementing the methods sketched here: in real images the reconstruction process faces all kinds of noise (for example, the corresponding points may be inaccurate), so robust estimation is needed. Those practical methods will be covered in future blog posts.
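Still, to make step 1 of the trilogy concrete, here is a minimal eight-point sketch in numpy. It is a sketch under strong assumptions, not a practical implementation: the helper name `eight_point_F` is hypothetical, it follows this article's convention $\mathbf{x}^{\mathrm{T}}\mathcal{F}\mathbf{x}^{\prime} = 0$ from Equation (2), and it omits the Hartley normalization and robust estimation that real data requires.

```python
import numpy as np

def eight_point_F(x, xp):
    """Minimal eight-point estimate of F with x_i^T F x'_i = 0 (Eq. (2)).
    x, xp: (N, 2) pixel coordinates of corresponding points, N >= 8."""
    u, v = x[:, 0], x[:, 1]
    up, vp = xp[:, 0], xp[:, 1]
    # each correspondence gives one linear equation in the 9 entries of F
    A = np.stack([u*up, u*vp, u, v*up, v*vp, v, up, vp, np.ones(len(x))], axis=1)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # null vector of A, reshaped row-major
    U, S, Vt = np.linalg.svd(F)
    return U @ np.diag([S[0], S[1], 0.0]) @ Vt  # enforce rank 2
```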

Triangulation: as shown in Fig. 2, once the camera matrices $\mathrm{P}, \mathrm{P}^{\prime}$ have been computed, we know that $\mathbf{x}$ and $\mathbf{x}^{\prime}$ satisfy the constraint $\mathbf{x}^{\mathrm{T}}\mathcal{F}\mathbf{x}^{\prime} = 0$; in other words, $\mathbf{x}$ lies on the epipolar line $\mathcal{F}\mathbf{x}^{\prime}$. This in turn means that the rays back-projected from the image points $\mathbf{x}, \mathbf{x}^{\prime}$ are coplanar, so the two back-projected rays intersect at a point $\mathbf{X}$. By means of triangulation we can therefore locate any point in 3D space except points on the baseline: there the back-projected rays are collinear, so their intersection cannot be uniquely determined.

Fig.2. Determining points in three-dimensional space by triangulation.
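To sketch step 3, the homogeneous DLT method triangulates a point by stacking, for each view, two linear constraints taken from the cross product of the image point with $\mathrm{P}\mathbf{X}$. The helper name `triangulate_dlt` is hypothetical, and the sketch assumes noise-free matches away from the baseline.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Triangulate one 3D point X from x1 ~ P1 X and x2 ~ P2 X.
    P1, P2: (3, 4) camera matrices; x1, x2: (u, v) pixel coordinates."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],   # u1 * (row 3 of P1) - row 1 of P1
        x1[1] * P1[2] - P1[1],   # v1 * (row 3 of P1) - row 2 of P1
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # X is the null vector of A
    X = Vt[-1]
    return X / X[3]               # normalize the homogeneous coordinate
```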

Reconstruction ambiguity

The reconstruction process is always more or less ambiguous, and to resolve the ambiguity we need to introduce additional information.

There is a certain ambiguity in reconstructing objects in 3D purely from corresponding points, but this ambiguity is alleviated once certain prior knowledge of the scene is introduced. For example, from pairs of corresponding points alone (or even from more viewpoints), we cannot recover the absolute position of the scene on the earth (its latitude and longitude) or its absolute orientation. This is easy to understand: as shown in Fig. 3, even given the camera parameters, it is impossible to decide whether the corridor in (b) and (c) runs east-west or north-south, or what its latitude and longitude are; such absolute geographic information cannot be reconstructed with a camera.

Fig.3. Without introducing any other knowledge, the absolute geographic information of the scene cannot be judged by the photos obtained only from the camera.

In general, even in the best case, camera-based reconstruction determines the scene in the world coordinate system only up to a Euclidean transformation (a rotation plus a translation). Furthermore, if our camera is not calibrated (the intrinsics are unknown), then, taking the corridor in Fig. 3 as an example, we cannot determine the width and length of the corridor: it may be 3 meters long, or it may be just a toy corridor of only 30 centimeters; all are possible. Without introducing any prior on the scale of the scene and without camera calibration, we say that photo-based scene reconstruction is at best determined up to a similarity transformation (rotation, translation and scale).

To describe this phenomenon mathematically, let $\mathbf{X}_i$ denote the 3D points of a scene, and let $\mathrm{P}, \mathrm{P}^{\prime}$ denote a pair of cameras that project these 3D points to $\mathbf{x}_i, \mathbf{x}_i^{\prime}$. Suppose we have a similarity transformation $\mathrm{H}_S$:


$$
\mathrm{H}_S = \left[\begin{matrix} \mathrm{R} & \mathbf{t} \\ \mathbf{0}^{\mathrm{T}} & \lambda \end{matrix}\right] \tag{3}
$$

where $\mathrm{R}$ is a rotation matrix, $\mathbf{t}$ is a translation, and $\lambda^{-1}$ is the scale factor. Suppose we apply this similarity transformation to the 3D points, replacing $\mathbf{X}_i$ with $\mathrm{H}_S\mathbf{X}_i$, and replace the camera matrices $\mathrm{P}, \mathrm{P}^{\prime}$ with $\mathrm{P}\mathrm{H}_S^{-1}$ and $\mathrm{P}^{\prime}\mathrm{H}_S^{-1}$. What we find is that, because $\mathrm{P}\mathbf{X}_i = (\mathrm{P}\mathrm{H}_S^{-1})(\mathrm{H}_S\mathbf{X}_i)$, the positions of the projected points on the images do not change. Generally speaking, without other priors, the reconstruction algorithm can only guarantee that the projections on the images are correct; it cannot pin down the remaining information in 3D space.
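A quick numerical check of this invariance, as a self-contained sketch with made-up values (the random camera and the particular $\mathrm{R}$, $\mathbf{t}$, $\lambda$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(3, 4))                                 # an arbitrary camera
X = np.vstack([rng.normal(size=(3, 5)), np.ones((1, 5))])   # homogeneous 3D points

theta = 0.3                                  # a similarity H_S as in Eq. (3)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([[1.0], [2.0], [3.0]])
H_S = np.block([[R, t], [np.zeros((1, 3)), np.array([[0.5]])]])

x = P @ X                                    # original projections
x_new = (P @ np.linalg.inv(H_S)) @ (H_S @ X) # transformed scene + camera
print(np.allclose(x / x[2], x_new / x_new[2]))  # True: the images are identical
```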

Similarity ambiguity

Further, decomposing the camera parameters, we have $\mathrm{P} = \mathrm{K}[\mathrm{R}_P \mid \mathbf{t}_P]$, so after the similarity transformation:


$$
\mathrm{P}\mathrm{H}_S^{-1} = \mathrm{K} [\mathrm{R}_P\mathrm{R}^{-1} \mid \mathbf{t}^{\prime}] \tag{4}
$$

In general we do not care much about the translation $\mathbf{t}^{\prime}$. Notice that the similarity transformation does not change the camera intrinsics: $\mathrm{K}$ stays the same. That is to say, even for a calibrated camera, the best reconstruction result still carries similarity ambiguity; this is called similarity reconstruction (metric reconstruction). Fig. 4 shows a schematic of similarity ambiguity: we cannot determine the absolute size of the scene.

Fig.4. Similarity ambiguity and projective ambiguity in reconstruction.

If the focal lengths in both cameras' intrinsics are known, then the reconstruction can ideally reach similarity reconstruction, which carries similarity ambiguity. In other words, the reconstructed scene will differ from the real scene only in overall size.

Projective ambiguity

If we know nothing about the intrinsics or the relative pose between the cameras, the reconstruction of the entire scene is subject to projective ambiguity, as shown in panel (b) of Fig. 4. Similarly, we can take any invertible matrix $\mathrm{H}_P \in \mathbb{R}^{4 \times 4}$ as a projective transformation; using the method introduced earlier, we find that applying it to both the 3D points and the cameras does not affect the positions of the projected points on the images. The actual reconstructed 3D points therefore carry projective ambiguity, and this is called projective reconstruction. The difference from similarity reconstruction is that with known intrinsics the position of the camera center is fixed, whereas without calibrated intrinsics the camera center may move as well; Fig. 4 illustrates this point.

Affine ambiguity

If the two cameras differ only in position (they can be regarded as photos taken by the same camera at different positions) while the intrinsics are exactly the same, then the reconstruction can at best reach affine reconstruction; correspondingly, this reconstruction process introduces affine ambiguity. For example, if we know that the focal lengths of the different cameras are the same (the focal length is part of the intrinsics), then the reconstructed scene may still differ from the real one by rotation, translation, scaling or shear [7], but there will not be ambiguity as severe as the projective ambiguity shown in Fig. 4.

Metric reconstruction and Euclidean reconstruction

Generally speaking, metric reconstruction can also be used as another name for similarity reconstruction, because certain quantities in similarity reconstruction, such as the angle between lines and the ratios of line segments, are consistent between the reconstructed scene and the real scene. In addition, when we speak of Euclidean reconstruction, we generally use it as a synonym for metric reconstruction or similarity reconstruction: without additional knowledge such as the orientation of the scene in world coordinates, the depth of field, or even its longitude and latitude, we cannot truly achieve a Euclidean reconstruction, and such extra knowledge is beyond what camera-based reconstruction can provide.

Ideal points, the plane at infinity and the IAC

We have mentioned the ideal point, the plane at infinity and the IAC (Image of the Absolute Conic) in the blog posts [4,7] and in [8]. These geometric elements describe properties of projective space, including what stays invariant before and after a transformation. Here we review them and take them a bit further.

Simply put, projective transformations do not preserve the parallelism of parallel lines, which is why perspective arises, as shown in Fig. 5. The point at infinity where parallel lines meet is called an ideal point, and the plane made up of all ideal points is called the plane at infinity, denoted $\pi_{\infty}$.

Fig.5. The originally parallel road becomes "non-parallel" when imaged by the camera, converging at the vanishing point (the image of the point at infinity).

Before we talk about the absolute conic (AC) and the image of the absolute conic (IAC), consider an example from everyday life. We all know that the moon is very far from us; think of it as a circle at infinity (a special type of conic). Imagine driving down a straight road with the full moon high on your right: you will notice that no matter how fast you drive, the moon seems to follow you at the same position and size.

This romantic example of "the moon follows me" is exactly the effect of Euclidean transformations on the absolute conic in the plane at infinity. The absolute conic AC is the conic lying on the plane at infinity, while the image of the absolute conic IAC is its projection onto the imaging plane, as shown in Fig. 6. Usually we write $\Omega_{\infty}$ for the AC and $\omega$ for the IAC; $C$ denotes the camera center. When the distance is large enough to be regarded as infinite, we find intuitively that as long as the imaging plane undergoes only Euclidean transformations, the AC does not change. The reason is simple: any rotation and translation is negligible relative to infinity, so no Euclidean transformation can affect these properties. Therefore Euclidean transformations do not affect the size of the IAC.

Fig.6. The AC in the plane at infinity and the IAC in the imaging plane.

Why discuss these concepts here? Because both geometric transformations and 3D reconstruction involve quantities that change and quantities that stay invariant. When we need to constrain a changing quantity to stay fixed, we add conditions, such as holding the plane at infinity in place or fixing the shape and size of the AC; this reduces ambiguity and achieves a more accurate 3D reconstruction.

Hierarchical 3D reconstruction

By adding successive pieces of information on top of a projective reconstruction, we can obtain an affine reconstruction and then a similarity reconstruction of the scene.

We consider the hierarchical process of binocular 3D reconstruction. First, as the basis, assuming our cameras are not calibrated, we obtain a projective reconstruction of the scene from the corresponding points between the two images; then, by adding information, we obtain an affine reconstruction and a similarity reconstruction of the scene in turn.

For uncalibrated cameras, suppose we are given corresponding points $\mathbf{x}_i \leftrightarrow \mathbf{x}_i^{\prime}$; from equation (2) we can compute the fundamental matrix $\mathcal{F}$. Through the previous analysis, we know that a projective matrix $\mathrm{H}_P$ leaves all image projections unchanged, so the scene reconstruction carries projective ambiguity, as shown in Fig.5 below.

It is important to note that the corresponding point pairs mentioned here cannot lie on the line joining the two camera centers (the baseline), as we mentioned earlier.

Fig.5. Projective ambiguity of projective reconstruction. For a single pair of images, every possible scene recovered by projective reconstruction is projectively consistent.

Affine reconstruction

To go from projective reconstruction to affine reconstruction, recall what we discussed in [4]:

Affine transformations do not change the position of the plane at infinity.

In other words, to remove the projective ambiguity we need to add constraints that fix the position of the plane at infinity. To formalize the process mathematically, suppose we now have a projective reconstruction of the scene, namely a triple $(\mathrm{P}, \mathrm{P}^{\prime}, \{\mathbf{X}_i\})$, where $\mathrm{P}, \mathrm{P}^{\prime}$ are the camera matrices and $\{\mathbf{X}_i\}$ is the set of scene points. Suppose further that we have identified a plane $\pi$ as the true plane at infinity; in homogeneous coordinates it is a vector $\pi \in \mathbb{R}^4$. We need to send this plane to $(0,0,0,1)^{\mathrm{T}}$, so we look for a projective matrix $\mathrm{H}$ that maps $\pi$ to $(0,0,0,1)^{\mathrm{T}}$, i.e. $\mathrm{H}^{-\mathrm{T}}\pi = (0,0,0,1)^{\mathrm{T}}$. Such a matrix is:


$$
\mathrm{H} = \left[\begin{matrix} \mathrm{I} \mid \mathbf{0} \\ \pi^{\mathrm{T}} \end{matrix}\right] \tag{5}
$$

Then $\mathrm{H}$ is applied to the whole reconstruction: every reconstructed 3D point becomes $\mathrm{H}\mathbf{X}_i$ and the camera matrices become $\mathrm{P}\mathrm{H}^{-1}, \mathrm{P}^{\prime}\mathrm{H}^{-1}$. Note that equation (5) fails when the fourth coordinate of $\pi$ is zero, since $\mathrm{H}$ is then singular. By finding this projective matrix, we obtain an affine reconstruction.
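A sketch of this upgrade under the stated assumptions (the helper name `affine_upgrade` is hypothetical):

```python
import numpy as np

def affine_upgrade(P1, P2, X, pi):
    """Upgrade a projective reconstruction (P1, P2, {X_i}) to affine, given
    pi, the plane at infinity expressed in the projective frame (Eq. (5)).
    X: (4, N) homogeneous points; pi: length-4 vector with pi[3] != 0."""
    assert abs(pi[3]) > 1e-12, "Eq. (5) fails if the 4th coordinate of pi is 0"
    H = np.vstack([np.hstack([np.eye(3), np.zeros((3, 1))]),
                   np.reshape(pi, (1, 4))])       # H = [I | 0 ; pi^T]
    H_inv = np.linalg.inv(H)
    return P1 @ H_inv, P2 @ H_inv, H @ X          # cameras P H^{-1}, points H X
```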

However, as we said, the plane at infinity $\pi$ cannot be determined unless we add some additional information; below we give a few examples of the types of information sufficient to determine this plane.

Translational motion

Translational motion means that we know the two photos were taken by the same camera in two different poses, and that the pose change consists only of the translation component $\mathbf{t}$ of the camera matrix, i.e., an offset. Simply put, as shown in Fig.6, the yellow plane is a camera that has only translated, while the gray plane has both translated and rotated.

Fig.6. The green object is the imaged object; the yellow plane is a camera that has only translated, and the gray plane has both translated and rotated.

Recall the "moon follows me" example: for an object infinitely far away, a mere translation does not change its image position (because the translation distance is negligible compared with the distance to the object). Therefore the camera's translation does not change, between the two pictures, the image positions of points on the plane at infinity; let $\mathbf{X}$ denote such a point. As shown in Fig. 7, the epipoles of the two images lie at the same position in image coordinates, because the camera's translation is negligible relative to the depth of the corridor, which can thus be regarded as at infinity, so those points are unchanged. We can conclude that the 3D points whose projections are such invariant points lie at infinity. By matching points between the two pictures and comparing pixel positions (screening with two conditions: they are matched points, and their pixel positions are unchanged), we can find three or more such points at infinity, and the plane at infinity $\pi$ can then be found by least squares or analytically.

Fig.7. Pure translational motion does not change the image positions of very distant points (which can be regarded as at infinity).

Although this is feasible in principle, in practice the computation has serious numerical problems. In fact, under pure translation the fundamental matrix is skew-symmetric (anti-symmetric), which means we would also need to enforce this constraint when estimating the fundamental matrix. In practice, the most common constraint is the parallel-line constraint discussed below.

Parallel constraint

We know that affine transformations preserve the parallelism of parallel lines, while projective transformations may not. Based on this knowledge, we can look for three or more groups of lines in the scene that should be parallel in actual 3D space; because of perspective, the lines of each group intersect in the image, and we take these intersections as points at infinity. The plane at infinity $\pi$ can then be determined.

Fig.8. Find three groups of lines that should be parallel in the actual three-dimensional scene but intersect in the image because of perspective. Their intersection points are points at infinity, from which the plane at infinity can be determined.

This process sounds ideal, but because of noise, the lines of one group do not necessarily intersect at exactly the same point in the picture. In such cases we need robust estimation to handle the numerical problems, which we will discuss later.
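In the noise-free limit, both steps reduce to null-space computations, as in the minimal least-squares sketch below. The helper names are hypothetical; the ideal 3D points fed to `plane_at_infinity` would come from triangulating each vanishing-point pair within the projective reconstruction (e.g., with the DLT sketch earlier).

```python
import numpy as np

def vanishing_point(lines):
    """Least-squares intersection of noisy image lines belonging to one
    parallel group. lines: (N, 3) homogeneous line vectors l, with l^T x = 0
    for points x on the line; the best x is the smallest right singular vector."""
    _, _, Vt = np.linalg.svd(np.asarray(lines, float))
    return Vt[-1]

def plane_at_infinity(X_ideal):
    """Fit pi from >= 3 ideal points of the projective reconstruction.
    X_ideal: (N, 4) homogeneous 3D points known to lie at infinity, pi^T X = 0."""
    _, _, Vt = np.linalg.svd(np.asarray(X_ideal, float))
    return Vt[-1]
```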

If we observe Fig.8 carefully, we find that although affine reconstruction preserves parallelism, it does not guarantee orthogonality. Therefore the lower-right image of panel (b) is not guaranteed; in general we can only obtain the lower-left image of panel (b). Ensuring the orthogonality of the scene is considered later, in metric reconstruction.

Line segment ratio

As we know, affine transformations do not change the ratios of line segments along a line before and after the transformation; see the detailed description in [4]. This provides another way to compute the vanishing point: we can determine its position by introducing the known ratio of segments on a line in the real three-dimensional world, as shown in Fig.9 and Fig.10. The detailed calculation will be described in a later blog post.

Fig.9. The vanishing point is computed from the known ratio of line segments in the real world.

Fig.10. Projective ambiguity is eliminated by introducing prior knowledge of segment ratios.

Infinite homography matrix

Once the plane at infinity has been determined, the affine reconstruction is determined, and we then have a special matrix called the infinite homography $\mathrm{H}_{\infty}$. This is a 2D homography that transfers points via the plane at infinity: it maps the vanishing points of one camera's image to the corresponding vanishing points of the other. Suppose the vanishing point $\mathbf{x}$ of the picture taken by camera $\mathrm{P}$ corresponds to an object point $\mathbf{X}$ on the plane at infinity, and suppose the projection of that object point in the picture taken by the other camera $\mathrm{P}^{\prime}$ is $\mathbf{x}^{\prime}$. Then the infinite homography has the following property:


$$
\mathbf{x}^{\prime} = \mathrm{H}_{\infty} \mathbf{x} \tag{6}
$$

Suppose we know two camera matrices


$$
\begin{aligned} \mathrm{P} &= [\mathrm{M} \mid \mathbf{m}] \\ \mathrm{P}^{\prime} &= [\mathrm{M}^{\prime} \mid \mathbf{m}^{\prime}] \end{aligned} \tag{7}
$$

If they are the camera matrices of an affine reconstruction, then the infinite homography is $\mathrm{H}_{\infty} = \mathrm{M}^{\prime}\mathrm{M}^{-1}$. This is not hard to prove; we leave it to the reader. Combining with (6), we have:


$$
\mathbf{x}^{\prime} = \mathrm{M}^{\prime} \mathrm{M}^{-1} \mathbf{x} \tag{8}
$$

In other words, we can compute the infinite homography $\mathrm{H}_{\infty}$ by finding corresponding vanishing-point pairs in the two images. We can then normalize one of the camera matrices, so that (7) becomes:


$$
\begin{aligned} \mathrm{P} &= [\mathrm{I} \mid \mathbf{0}] \\ \mathrm{P}^{\prime} &= [\mathrm{M}^{\prime} \mid \mathbf{e}^{\prime}] \end{aligned} \tag{9}
$$

Now $\mathrm{H}_{\infty} = \mathrm{M}^{\prime}$; that is to say, by computing the infinite homography we can recover the camera matrices of the affine reconstruction.
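As a sketch, $\mathrm{H}_{\infty}$ can be estimated from four or more vanishing-point correspondences with the standard DLT for 2D homographies; the helper name `homography_dlt` is hypothetical, and normalization and robustness are again omitted.

```python
import numpy as np

def homography_dlt(x, xp):
    """Estimate a 2D homography H with xp ~ H x from >= 4 point pairs,
    e.g. corresponding vanishing points. x, xp: (N, 3) homogeneous points."""
    rows = []
    for (u, v, w), (up, vp, wp) in zip(x, xp):
        rows.append([0, 0, 0, -wp*u, -wp*v, -wp*w,  vp*u,  vp*v,  vp*w])
        rows.append([wp*u, wp*v, wp*w, 0, 0, 0, -up*u, -up*v, -up*w])
    _, _, Vt = np.linalg.svd(np.asarray(rows, float))
    return Vt[-1].reshape(3, 3)

# With P = [I | 0], the second affine camera can then be assembled as
# P' = [H_inf | e'] once the epipole e' is known, as in Eq. (9).
```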

One of the cameras is an affine camera

Suppose we know that one of the two cameras is an affine camera [10]. Of course, the affine camera is only an approximation of a projective camera; the basic assumption of the approximation is that the depth relief of the photographed object is negligible relative to the shooting distance, the low-relief condition mentioned in [11]. We know that an affine camera performs an affine transformation, so it does not move the plane at infinity. Moreover, the principal plane of an affine camera is the plane at infinity, and it is represented by the third row of the camera matrix. In the simplest case, normalize the affine camera matrix to $\mathrm{P} = [\mathrm{I} \mid \mathbf{0}]$, whose third row is $(0,0,1,0)^{\mathrm{T}}$. To fix the plane at infinity to $(0,0,0,1)^{\mathrm{T}}$, we then only need to:

  1. swap the last two columns of both camera matrices simultaneously;
  2. swap the last two coordinates of every 3D object point $\mathbf{X}_i$ simultaneously.

Similarity reconstruction/metric reconstruction

We know that affine transformations keep the position of the plane at infinity unchanged; we also know that similarity/metric transformations additionally keep the absolute conic AC on the plane at infinity unchanged before and after the transformation. Based on this, we remove the affine ambiguity from an affine reconstruction, and upgrade the result to a metric reconstruction, by fixing the position of the absolute conic. At the same time, since the absolute conic $\Omega_{\infty}$ lies on the plane at infinity, determining the position of the AC also determines the position of the plane at infinity.

Recall that in post [5] we discussed the mathematical expressions for conics and quadrics. In short, a conic can be expressed as:


$$
\mathbf{x}^{\mathrm{T}} \mathbf{C} \mathbf{x} = 0 \tag{10}
$$

for points $\mathbf{x} = (x_1, x_2, x_3)^{\mathrm{T}}$ on the curve, where the conic matrix $\mathbf{C} \in \mathbb{R}^{3 \times 3}$ is:


$$
\mathbf{C} = \left[\begin{matrix} a & b/2 & d/2 \\ b/2 & c & e/2 \\ d/2 & e/2 & f \end{matrix}\right] \tag{11}
$$
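A tiny sanity check of the representation in (10) and (11), with made-up values: the unit circle $x^2 + y^2 - 1 = 0$ corresponds to $(a,b,c,d,e,f) = (1,0,1,0,0,-1)$.

```python
import numpy as np

def conic_matrix(a, b, c, d, e, f):
    """Symmetric matrix of the conic a x^2 + b xy + c y^2 + d x + e y + f = 0."""
    return np.array([[a,   b/2, d/2],
                     [b/2, c,   e/2],
                     [d/2, e/2, f  ]])

C = conic_matrix(1, 0, 1, 0, 0, -1)               # unit circle
x = np.array([np.cos(0.7), np.sin(0.7), 1.0])     # homogeneous point on it
print(np.isclose(x @ C @ x, 0.0))                 # True: x^T C x = 0
```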

In practice, the easiest approach is to consider the IAC in one of the images. The IAC is the image of the AC and is therefore a conic. Back-projecting the IAC from the camera center forms a cone, as shown in Fig 6, which intersects the plane at infinity in the AC. It is important to note that the IAC is a property of the image itself, like points and lines in the image; it does not depend on any particular reconstruction, so it is invariant to the transformations involved in reconstruction.

Suppose an affine reconstruction is known, with one camera matrix $\mathrm{P} = [\mathrm{M} \mid \mathbf{m}]$, and let $\omega$ be the IAC obtained in the image of this camera. Then a metric reconstruction is obtained from this affine reconstruction by a 3D transformation with matrix:


$$
\mathrm{H} = \left[\begin{matrix}\mathrm{A}^{-1} & \\ & 1 \end{matrix}\right] \tag{12}
$$

where $\mathrm{A}$ can be obtained by Cholesky decomposition from $\mathrm{A}\mathrm{A}^{\mathrm{T}} = (\mathrm{M}^{\mathrm{T}}\omega\mathrm{M})^{-1}$. Of course, this requires $(\mathrm{M}^{\mathrm{T}}\omega\mathrm{M})^{-1}$ to be positive definite; otherwise the Cholesky decomposition does not exist. For the concrete proof see [1].
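A sketch of this upgrade, assuming noise-free input and the same convention as before (points transform as $\mathrm{H}\mathbf{X}_i$, cameras as $\mathrm{P}\mathrm{H}^{-1}$); the helper name `metric_upgrade` is hypothetical.

```python
import numpy as np

def metric_upgrade(P_aff, X_aff, omega):
    """Upgrade an affine reconstruction to metric via Eq. (12), given the
    IAC omega of the camera P_aff = [M | m]. X_aff: (4, N) homogeneous points."""
    M = P_aff[:, :3]
    S = np.linalg.inv(M.T @ omega @ M)   # must be positive definite
    A = np.linalg.cholesky(S)            # S = A A^T, lower-triangular A
    H = np.eye(4)
    H[:3, :3] = np.linalg.inv(A)         # H = diag(A^{-1}, 1) as in Eq. (12)
    return P_aff @ np.linalg.inv(H), H @ X_aff
```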

Thus, to upgrade from an affine reconstruction to a metric reconstruction, we need an expression for the IAC. To obtain this expression, just as in affine reconstruction, we need to introduce certain constraints from the scene.

Constraints derived from the orthogonality of the scene

Suppose we now have a pair of vanishing points $\mathbf{v}_1$ and $\mathbf{v}_2$ arising from parts of the scene that should themselves be orthogonal; for example, the three images in Fig.8 all pick out parts of the house scene that should be mutually perpendicular. For each such pair of vanishing points, since vanishing points are images of points on the plane at infinity and the metric transformation does not affect orthogonality, we have:


$$
\mathbf{v}_1^{\mathrm{T}}\,\omega\,\mathbf{v}_2 = 0 \tag{13}
$$

Each such pair gives one linear constraint on $\omega$; from enough pairs we can compute the expression of the IAC.
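As a sketch, each orthogonal pair contributes one linear equation in the six entries of the symmetric matrix $\omega$, which is determined only up to scale, so five generic pairs suffice; the helper name is hypothetical.

```python
import numpy as np

def iac_from_orthogonal_vps(pairs):
    """Solve v1^T omega v2 = 0 (Eq. (13)) for the symmetric 3x3 IAC omega.
    pairs: iterable of (v1, v2) homogeneous vanishing points of orthogonal
    scene directions; returns omega up to scale."""
    rows = []
    for (a, b, c), (d, e, f) in pairs:
        # coefficients of (w11, w12, w13, w22, w23, w33)
        rows.append([a*d, a*e + b*d, a*f + c*d, b*e, b*f + c*e, c*f])
    _, _, Vt = np.linalg.svd(np.asarray(rows, float))
    w11, w12, w13, w22, w23, w33 = Vt[-1]
    return np.array([[w11, w12, w13],
                     [w12, w22, w23],
                     [w13, w23, w33]])
```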

Constraints from intrinsic parameter calibration

Assuming that we know the camera intrinsics, with intrinsic matrix $\mathrm{K}$, we have $\omega = \mathrm{K}^{-\mathrm{T}}\mathrm{K}^{-1}$. From [6] we know:


$$
\mathrm{K} = \left[\begin{matrix}\alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1\end{matrix}\right] \tag{14}
$$

If $s = 0$, then $\omega_{12} = \omega_{21} = 0$.

If additionally $\alpha_x = \alpha_y$, then $\omega_{11} = \omega_{22}$.
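A quick numerical check with a hypothetical intrinsic matrix (zero skew, square pixels; the numbers are made up):

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],     # alpha_x = alpha_y = 800, s = 0
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
omega = np.linalg.inv(K).T @ np.linalg.inv(K)    # omega = K^{-T} K^{-1}
print(np.isclose(omega[0, 1], 0.0))              # s = 0    =>  omega_12 = 0
print(np.isclose(omega[0, 0], omega[1, 1]))      # ax = ay  =>  omega_11 = omega_22
```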


References

[1]. Hartley R, Zisserman A. Multiple view geometry in computer vision[M]. Cambridge university press, 2003. Chapter 10

[2]. blog.csdn.net/LoseInVain/…

[3]. blog.csdn.net/LoseInVain/…

[4]. blog.csdn.net/LoseInVain/…

[5]. blog.csdn.net/LoseInVain/…

[6]. blog.csdn.net/LoseInVain/…

[7]. blog.csdn.net/LoseInVain/…

[8]. www.cs.unc.edu/~marc/tutor.

[9]. blog.csdn.net/richardzjut…

[10]. blog.csdn.net/LoseInVain/…

[11]. blog.csdn.net/LoseInVain/…