Floor plan recognition and reconstruction is an important and challenging problem in interior decoration design. We propose a floor plan recognition algorithm based on differentiable rendering. It accurately identifies architectural elements, room types, and room sizes, and outputs an accurate 3D vectorized floor plan.

We use deep-learning-based segmentation and detection networks to identify architectural elements. Keypoint detection and clustering recognize the scale. Iterative optimization based on differentiable rendering performs the vectorization. This is the first floor plan vectorization method built on the segmentation of raster images. Compared with existing methods, our results are more accurate and generalize better.

Accurate floor plan recognition matters for two reasons. First, it significantly reduces the workload of manual online annotation and improves both efficiency and user experience. Second, building a floor plan library through manual annotation is expensive, and recognition based on differentiable rendering cuts that cost by more than 90%. This makes it feasible to build a large-scale floor plan library offline. Such a library accumulates user data assets and supports online house viewing, online design, and online decoration.

The algorithm is currently used for online floor plan import in Flat Designer. It is also used to build the D-end (designer-side) floor plan library and the C-end (consumer-side) standard floor plan library. The D-end library of Flat Designer has reached 5 million floor plans. The C-end standard library has a coverage rate of over 70% in the benchmark city of Hangzhou. The floor plan recognition algorithm based on differentiable rendering was accepted at CVPR 2021.

Background

Floor plans are usually created with dedicated drawing tools such as AutoCAD and HomeStyler. They help people quickly understand the structure of a home when buying or decorating it. While drawing, designers work with vector objects. To make the plans easier to share and store, the vector material is eventually converted into bitmaps, typically .jpg and .png images. This conversion is called rasterization.

Before using a floor plan for decoration design, designers must convert the raster floor plan back into vector form. This vectorization involves a lot of manual interaction and is time-consuming and laborious. Designers have to trace the walls, doors, and windows and determine the type and size of each room. To address this, several works have studied automated methods for floor plan vectorization.

Automatic floor plan vectorization methods generally fall into two categories. The first uses a segmentation model to identify floor plan elements. The second uses a detection model to detect wall junctions and then recovers their topological connections with integer programming.

Segmentation-based methods produce accurate pixel-level results for floor plan elements, but they cannot recover the vector structure. Detection-based methods can reconstruct the vector structure, but they generalize poorly: a single falsely detected junction can make the reconstruction of the whole plan fail. Detection-based methods also assume that junctions are orthogonal, so they cannot handle slanted walls.

Besides structural elements such as rooms, walls, doors, and windows, a floor plan also contains scales, room names, and decoration layouts (Figure 1). These elements are also important for accurate reconstruction, yet existing methods do not try to identify them.

Figure 1) Raster floor plan (top) and vector floor plan (bottom)

In general, existing methods ignore the rich multimodal information in floor plans and focus only on the most basic structures, such as walls, doors, and windows. When vectorizing walls and windows, some methods cannot handle non-orthogonal elements, and others cannot complete the vectorization at all. We want a recognition method that solves both problems. It should accurately identify every kind of floor plan element: walls, doors, windows and other structures, room types, room sizes, and decoration layout elements. It should also convert them into a vector representation.

Methods

Figure 2) Floor plan recognition system framework integrating the recognition and reconstruction modules

Figure 2 shows our system framework. Given an input floor plan image, we first locate the floor plan region with a detector. Within the detected region, a segmentation model segments the individual elements. Text detection and symbol detection then provide more accurate room type information. The scale module computes the endpoints of the scale line segments and reads their numbers. The reconstruction module combines the detected and segmented elements and vectorizes them through optimization based on differentiable rendering. Finally, adding the scale information yields an accurate 3D reconstruction. Each module is described in detail below.
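Once the room polygons are vectorized and the scale is known, room sizes follow directly. The sketch below assumes that each room is a simple polygon in pixel coordinates and that the scale is expressed in millimetres per pixel. Both are assumptions about the representation, and the helper names are hypothetical. It computes a room's area with the shoelace formula and converts it to square metres:

```python
def polygon_area_px(poly):
    """Area of a simple polygon in square pixels (shoelace formula).

    poly is a list of (x, y) vertices in order; either orientation works.
    """
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def room_area_m2(poly_px, mm_per_px):
    """px^2 times (mm/px)^2 gives mm^2; divide by 1e6 to get m^2."""
    return polygon_area_px(poly_px) * mm_per_px ** 2 / 1e6
```

For example, a 400 × 300 px rectangular room at 10 mm/px has an area of 12 m².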

Detection

The floor plan itself often occupies only about 50% of the input image, and much of the remaining space may be taken up by promotional copy. To extract accurate floor plan information, we first detect the floor plan region with a detection model. We adopt YOLOv4, a widely used lightweight detector, as the base network. The detected region is resized to 512 × 512 while preserving its aspect ratio, and then passed on for structural element extraction.

Text provides additional cues about room types. The room labels are largely the same across floor plans (for example, living/dining room, bedroom, and kitchen). We therefore detect room-type text with a detection model instead of using conventional OCR. Decoration symbol detection works in the same way as text detection, and the symbols also indicate the room type. For example, toilet and sink symbols suggest that a room is a bathroom. We use YOLOv4 again for symbols, but we change the data augmentation from the default Mosaic scheme. The goal is for the network to attend to every part of a decoration symbol.
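The aspect-ratio-preserving resize of the detected region can be sketched as a letterbox transform. This is only an illustration. The source states only that the aspect ratio is kept; centering the resized crop and padding the rest is our assumption. `letterbox_params` and `to_original` are hypothetical helpers. Keeping the transform parameters lets predictions made at 512 × 512 be mapped back to the original image.

```python
def letterbox_params(width, height, target=512):
    """Scale and (assumed) centered padding that fit a width x height crop into target x target."""
    scale = target / max(width, height)  # the longer side becomes `target`
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = (target - new_w) // 2, (target - new_h) // 2
    return scale, new_w, new_h, pad_x, pad_y


def to_original(x, y, crop_origin, params):
    """Map a point from the network input back to the original image coordinates."""
    scale, _, _, pad_x, pad_y = params
    ox, oy = crop_origin  # top-left corner of the detected floor plan region
    return ox + (x - pad_x) / scale, oy + (y - pad_y) / scale
```

For example, a 1024 × 768 crop is scaled by 0.5 to 512 × 384 and padded by 64 px at the top and bottom.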

Structural element extraction

Structural element extraction is an image segmentation task. We use DeepLabv3+ as the base network. It takes a 512 × 512 image and outputs a 512 × 512 × 15 feature map. Twelve of the 15 channels are class channels: background, wall, door, window, doorway, living/dining room, bedroom, kitchen, bathroom, study, balcony, and other rooms. The remaining three are the endpoint heatmaps of doors, windows, and doorways. Unlike classic image segmentation algorithms, ours uses a specially designed loss with four terms: a cross-entropy loss, a neighborhood loss, an endpoint regression loss, and a multi-task loss.
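To make the output layout concrete, the sketch below decodes one 15-channel prediction into a per-pixel label map and a set of endpoint candidates. It assumes the channels follow the order listed above: 12 class channels, then the door, window, and doorway endpoint heatmaps. It picks endpoints as thresholded 3 × 3 local maxima. The channel order, the threshold, and the peak-picking rule are assumptions, not details from the paper.

```python
CLASS_NAMES = ["background", "wall", "door", "window", "doorway",
               "living_dining_room", "bedroom", "kitchen", "bathroom",
               "study", "balcony", "other_room"]
ENDPOINT_MAPS = ["door_endpoint", "window_endpoint", "doorway_endpoint"]


def decode(feature, threshold=0.5):
    """feature[y][x] is a 15-vector: 12 class scores, then 3 endpoint heatmaps (assumed order)."""
    h, w = len(feature), len(feature[0])
    # per-pixel class label: argmax over the 12 class channels
    labels = [[max(range(12), key=lambda c: feature[y][x][c]) for x in range(w)]
              for y in range(h)]
    endpoints = {name: [] for name in ENDPOINT_MAPS}
    for k, name in enumerate(ENDPOINT_MAPS):
        ch = 12 + k
        for y in range(h):
            for x in range(w):
                value = feature[y][x][ch]
                if value < threshold:
                    continue
                window = [feature[j][i][ch]
                          for j in range(max(0, y - 1), min(h, y + 2))
                          for i in range(max(0, x - 1), min(w, x + 2))]
                if value >= max(window):  # keep 3x3 local maxima only
                    endpoints[name].append((x, y))
    return labels, endpoints
```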

Scale calculation

Existing methods ignore scale calculation, yet the scale is essential for 3D floor plan reconstruction. Even a small scale error leads to a large error in the computed floor area. We propose a systematic scale recognition method with four modules: line segment detection, number recognition, matching of line segments to numbers, and scale calculation.
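The sensitivity to scale errors can be made precise. Let s be the true scale in millimetres per pixel, A_px a room's area in pixels, and ε the relative error of the estimated scale. Because area grows with the square of the scale, the relative area error is roughly twice the scale error:

```latex
\begin{aligned}
A &= s^2 A_{\mathrm{px}}, \qquad \hat{s} = (1+\varepsilon)\, s, \\
\hat{A} &= \hat{s}^2 A_{\mathrm{px}} = (1+\varepsilon)^2 A, \\
\frac{\hat{A} - A}{A} &= 2\varepsilon + \varepsilon^2 .
\end{aligned}
```

For example, a 5% scale error (ε = 0.05) gives a 10.25% area error, so a 100 m² home would be reported as about 110.25 m².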

Figure 3) Scale calculation framework

Line segments are detected through their endpoints. Scales usually run along the sides of the floor plan, so we divide endpoints into four categories: top, bottom, left, and right. The loss is designed in the same way as the endpoint regression loss used in structural element extraction. The network is a distorted fully convolutional network with ResNet-50 as the backbone.

For number recognition, we do not use conventional OCR. Instead, we detect number regions directly with a detection model. Number regions fall into three categories: upright, rotated 90° clockwise, and rotated 90° counterclockwise. For each detected region, a number recognition module reads the individual digits, and a number quality regression module checks the quality of the reading. Together they yield an accurate scale number.

We formulate the matching between line segments and numbers as a bipartite matching problem and solve it with the Kuhn-Munkres (KM) algorithm. Matching produces multiple number-segment pairs, and each pair yields one scale estimate. Finally, we cluster these estimates with K-means and select the scale with the highest quality as the final result.
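The matching and selection steps can be sketched as follows. This is a minimal illustration that rests on several assumptions:

* The matching cost is simply the distance from a number's center to a segment's midpoint. The paper's cost is not given here.
* The KM step is a textbook Hungarian/Kuhn-Munkres implementation for minimum-cost assignment.
* The final rule keeps the largest 1-D K-means cluster and returns its highest-quality estimate. This is our reading of "select the scale with the highest quality".

All function names are hypothetical.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment via the Kuhn-Munkres (Hungarian) method.

    cost: n x m list of lists with n <= m and finite entries.
    Returns assign, where assign[i] is the column matched to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials of rows / columns
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: 1-based row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign


def match_and_estimate(segments, numbers, max_dist=50.0):
    """segments: [((x1, y1), (x2, y2)), ...] in pixels.
    numbers: [(value_mm, (cx, cy), quality), ...], with len(numbers) <= len(segments).
    Returns one (mm_per_px, quality) estimate per accepted pair."""
    mids = [((x1 + x2) / 2, (y1 + y2) / 2) for (x1, y1), (x2, y2) in segments]
    cost = [[math.dist(center, mid) for mid in mids] for _, center, _ in numbers]
    estimates = []
    for i, j in enumerate(hungarian(cost)):
        if cost[i][j] > max_dist:  # reject pairs that are too far apart
            continue
        (x1, y1), (x2, y2) = segments[j]
        value_mm, _, quality = numbers[i]
        estimates.append((value_mm / math.hypot(x2 - x1, y2 - y1), quality))
    return estimates


def nearest(centers, x):
    return min(range(len(centers)), key=lambda c: abs(x - centers[c]))


def kmeans_1d(values, k=2, iters=20):
    """Plain 1-D K-means with centers initialized evenly between min and max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / max(1, k - 1) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in values:
            groups[nearest(centers, x)].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def select_scale(estimates, k=2):
    """Hypothetical rule: keep the largest cluster, return its highest-quality estimate."""
    if len(estimates) < k:
        return max(estimates, key=lambda e: e[1])[0]
    centers = kmeans_1d([s for s, _ in estimates], k)
    labels = [nearest(centers, s) for s, _ in estimates]
    best = max(range(k), key=labels.count)
    members = [e for e, lab in zip(estimates, labels) if lab == best]
    return max(members, key=lambda e: e[1])[0]
```

Consider two correctly read numbers (4000 on a 400 px segment, 3000 on a 300 px segment) and one misread number (1000 on another 400 px segment). The outlier estimate of 2.5 mm/px falls into its own cluster, and the selected scale is 10 mm/px.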

Vectorization

The output of structural element extraction falls into two categories. Room boundary elements include walls, doors, windows, and doorways. Room type elements include living/dining rooms, bathrooms, and so on. Vectorization proceeds in two stages. First, each room's outline (the inner wall lines of the room) is derived from its room type segmentation. Second, the wall lines are derived from the room outlines together with the room boundary elements. Figure 4 shows the overall process.

Figure 4) Vectorization framework

Room contour optimization

In most homes, walls are horizontal or vertical for architectural and aesthetic reasons. We represent each room as a polygon. We first obtain an initial room polygon with the Douglas-Peucker algorithm and then optimize the polygon's vertex coordinates to obtain an accurate contour. The optimization alternates between two steps: room contour vertex optimization and room contour vertex reduction.
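For reference, the Douglas-Peucker step can be sketched as below. The recursive simplification is the standard algorithm. To apply it to a closed ring, we split the contour at the vertex farthest from its first point; this is one common approach and our own choice here, as is the tolerance `epsilon` in pixels. The paper does not specify how the room contour is extracted from the segmentation mask, so the input is assumed to be an ordered list of boundary points.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0:  # degenerate chord: fall back to point distance
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / norm


def douglas_peucker(points, epsilon):
    """Simplify an open polyline, always keeping both endpoints."""
    if len(points) < 3:
        return list(points)
    # find the interior point farthest from the chord between the endpoints
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


def simplify_closed(contour, epsilon):
    """Simplify a closed contour by splitting it at the vertex farthest from contour[0]."""
    far = max(range(len(contour)), key=lambda i: math.dist(contour[0], contour[i]))
    first = douglas_peucker(contour[: far + 1], epsilon)
    second = douglas_peucker(contour[far:] + [contour[0]], epsilon)
    return first[:-1] + second[:-1]
```

For example, a rectangle traced pixel by pixel reduces to its four corners.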

Room vertex coordinate optimization

We obtain the vertex coordinates by optimizing the following objective function:
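The objective function itself is not shown here. As a stand-in, the sketch below illustrates the general render-and-compare idea behind optimization based on differentiable rendering:

1. Rasterize the polygon softly, as a sigmoid of the signed distance to its boundary.
2. Compare the result with a target room mask by squared error.
3. Update the vertices by gradient descent.

Every specific choice here is an assumption rather than the paper's formulation. That includes the soft rasterizer, the squared-error loss, the temperature `tau`, and the use of central finite differences with a backtracking step in place of automatic differentiation. The code is plain Python for clarity and is only practical for tiny masks.

```python
import math


def _segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy + 1e-12)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - ax - t * dx, py - ay - t * dy)


def _inside(px, py, poly):
    """Even-odd (ray casting) point-in-polygon test."""
    inside = False
    for (ax, ay), (bx, by) in zip(poly, poly[1:] + poly[:1]):
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def soft_render(poly, size, tau=0.5):
    """Soft occupancy of each pixel center, sigmoid(signed distance / tau), row-major."""
    poly = list(poly)
    edges = list(zip(poly, poly[1:] + poly[:1]))
    image = []
    for y in range(size):
        for x in range(size):
            px, py = x + 0.5, y + 0.5
            d = min(_segment_distance(px, py, ax, ay, bx, by) for (ax, ay), (bx, by) in edges)
            signed = d if _inside(px, py, poly) else -d  # positive inside the polygon
            image.append(1.0 / (1.0 + math.exp(-signed / tau)))
    return image


def render_loss(poly, target, size):
    """Sum of squared differences between the soft rendering and the target mask."""
    return sum((r - t) ** 2 for r, t in zip(soft_render(poly, size), target))


def optimize_vertices(poly, target, size, steps=20, lr=0.05, h=1e-3):
    """Gradient descent on vertex coordinates; returns (vertices, final loss)."""
    def unpack(params):
        return list(zip(params[0::2], params[1::2]))

    params = [c for vertex in poly for c in vertex]
    loss = render_loss(unpack(params), target, size)
    for _ in range(steps):
        # central finite differences stand in for automatic differentiation
        grad = []
        for k in range(len(params)):
            plus, minus = params[:], params[:]
            plus[k] += h
            minus[k] -= h
            grad.append((render_loss(unpack(plus), target, size)
                         - render_loss(unpack(minus), target, size)) / (2 * h))
        # backtracking: halve the step until the loss decreases
        step = lr
        while step > 1e-4:
            trial = [p - step * g for p, g in zip(params, grad)]
            trial_loss = render_loss(unpack(trial), target, size)
            if trial_loss < loss:
                params, loss = trial, trial_loss
                break
            step /= 2
    return unpack(params), loss
```

A practical version would presumably use an autodiff framework. It would also add terms that encode priors such as the preference for horizontal and vertical walls mentioned above.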

Room contour vertex reduction

We remove vertices from the polygon when both of the following conditions are met.

Wall line optimization

Figure 5) Walls drawn along room interiors (left) and walls drawn as center lines (right)

In general, there are two ways to draw a floor plan. The first draws walls along the edges of rooms. In this case, the solution of the room contour optimization already gives the inner lines of each room. The second draws each wall as a center line, where each line represents one wall and carries thickness information. The two representations can be converted into each other. We handle the center-line representation with a method similar to room contour optimization. It alternates between two steps: wall junction coordinate optimization and wall junction reduction.
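The conversion between the two representations can be sketched for a single straight wall. Offsetting the center line by half the thickness along its normal gives the two wall faces. Averaging corresponding face endpoints recovers the center line. This sketch simplifies things: it assumes straight walls with parallel faces and corresponding endpoints, and it does not handle corner joins between walls. The helper names are hypothetical.

```python
import math


def centerline_to_faces(p, q, thickness):
    """Return the two face lines of a straight wall given its center line p-q."""
    (x1, y1), (x2, y2) = p, q
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        raise ValueError("degenerate wall")
    nx, ny = -(y2 - y1) / length, (x2 - x1) / length  # unit normal of the center line
    off = thickness / 2
    left = ((x1 + nx * off, y1 + ny * off), (x2 + nx * off, y2 + ny * off))
    right = ((x1 - nx * off, y1 - ny * off), (x2 - nx * off, y2 - ny * off))
    return left, right


def faces_to_centerline(face_a, face_b):
    """Inverse: average corresponding endpoints; thickness is the face-to-face distance."""
    (a1, a2), (b1, b2) = face_a, face_b
    p = ((a1[0] + b1[0]) / 2, (a1[1] + b1[1]) / 2)
    q = ((a2[0] + b2[0]) / 2, (a2[1] + b2[1]) / 2)
    dx, dy = a2[0] - a1[0], a2[1] - a1[1]
    # perpendicular distance from b1 to the line through face_a
    thickness = abs(dy * (b1[0] - a1[0]) - dx * (b1[1] - a1[1])) / math.hypot(dx, dy)
    return p, q, thickness
```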

Wall junction coordinate optimization

We obtain the wall junction coordinates by optimizing the following objective:

Wall junction reduction

We remove wall junctions when both of the following conditions are met.

Experiments

Quantitative evaluation

Table 1) Quantitative comparison with R2V and DFPR

We compare our segmentation-based recognition method with two existing methods, Raster-to-Vector [1] and Deep Floor Plan Recognition [2], abbreviated as R2V and DFPR. R2V implicitly assumes a Manhattan layout. We therefore removed slanted walls when generating its training labels and used a heuristic to determine junction types. For a fair comparison, we applied our floor plan detection module to both R2V and DFPR. As Table 1 shows, our method outperforms the existing methods on almost all categories and metrics.

Table 2) Quantitative evaluation of vectorization results

Table 3) Ablation study of scale recognition

Table 4) Average accuracy of decorative symbol detection

Table 5) Average accuracy of text detection

Table 3 reports the scale recognition results along two dimensions: the positive rate and the average error. A sample counts as positive if its error relative to the labeled value is within a given tolerance (2%, 5%, or 10%). The mean error over all samples is 4.6 mm. The K-means module and the number quality regression module together bring a 25% improvement. Tables 4 and 5 report the average accuracy of decoration symbol detection and text detection, respectively.
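The two measures can be computed as follows. This is a minimal sketch that makes two assumptions. The positive rate at tolerance t is the fraction of samples whose relative error |predicted - labeled| / labeled is at most t. The average error is the mean absolute difference, in whatever unit the scales are given. `scale_metrics` is a hypothetical helper.

```python
def scale_metrics(predicted, labelled, tolerances=(0.02, 0.05, 0.10)):
    """Return ({tolerance: positive rate}, mean absolute error)."""
    if not predicted or len(predicted) != len(labelled):
        raise ValueError("need two non-empty sequences of equal length")
    relative = [abs(p - g) / g for p, g in zip(predicted, labelled)]
    # a sample is positive at tolerance t if its relative error is within t
    rates = {t: sum(r <= t for r in relative) / len(relative) for t in tolerances}
    mean_abs = sum(abs(p - g) for p, g in zip(predicted, labelled)) / len(predicted)
    return rates, mean_abs
```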

Qualitative evaluation

Figure 6) Floor plan recognition and reconstruction results

Figure 7) More floor plan recognition and reconstruction results

Figure 6 shows floor plan recognition and reconstruction results. From left to right, the columns show the original floor plan, the image recognition result, the vectorized reconstruction, and the final 3D reconstruction. In the second column, different colors represent different floor plan elements. The second row shows that our method can reconstruct slanted walls. To measure generalization, we test directly on images from the open datasets Rent3D [3] and CubiCasa5K [4] without training on them, and still obtain high-quality reconstructions.

Figure 8) Floor plan recognition and reconstruction results on Rent3D (first three rows) and CubiCasa5K (last two rows)

Discussion and outlook

Our method has three known limitations. First, it assumes that rooms are separated by doors, windows, walls, doorways, and similar elements, so it cannot accurately identify open kitchens. Second, the scale detection module distinguishes only three orientation categories, so it cannot detect scales along slanted walls. Third, the vectorization module assumes that rooms are polygons, so it cannot reconstruct curved walls.

We formulate structural element extraction in floor plan recognition as a semantic segmentation problem. In production use, we found that online floor plans naturally follow a long-tail distribution. For marketing purposes, many real estate developers define their own fill textures for different rooms. Even the same developer may use different texture styles in different regions, and each such style appears in only a small number of floor plans. A semantic segmentation model trained on existing datasets degrades significantly when transferred to such images. There is no systematic solution yet for this kind of open-set problem. In engineering practice, incremental learning can provide a usable interim solution.

References

  1. Liu, Chen, et al. “Raster-to-vector: Revisiting floorplan transformation.” Proceedings of the IEEE International Conference on Computer Vision. 2017.
  2. Zeng, Zhiliang, et al. “Deep floor plan recognition using a multi-task network with room-boundary-guided attention.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
  3. Liu, Chenxi, et al. “Rent3D: Floor-plan priors for monocular layout estimation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
  4. Kalervo, Ahti, et al. “CubiCasa5K: A dataset and an improved multi-task model for floorplan image analysis.” Scandinavian Conference on Image Analysis. Springer, Cham, 2019.

Paper links

Paper:

Openaccess.thecvf.com/content/CVP…

Supplementary material:

Openaccess.thecvf.com/content/CVP…