Selected from arXiv

Compiled by Machine Heart

Contributor: Lu Xue


This paper introduces the Stacked Deconvolutional Network (SDN) for efficient semantic image segmentation. The method stacks multiple shallow deconvolutional networks and uses hierarchical supervision to aid network optimization, achieving state-of-the-art results on multiple datasets. Machine Heart introduces the paper below.





Link: https://arxiv.org/pdf/1708.04943.pdf


Abstract: Recent progress in semantic segmentation has been driven mainly by improving spatial resolution in fully convolutional networks (FCNs). To address this resolution problem, we propose a Stacked Deconvolutional Network (SDN) for semantic segmentation. In SDN, multiple shallow deconvolutional networks, called SDN units, are stacked one after another to integrate contextual information and ensure fine recovery of localization information. Inter-unit and intra-unit connections assist network training and enhance feature fusion, because they improve the flow of information and gradient propagation throughout the network. In addition, hierarchical supervision is applied during the upsampling process of each SDN unit, which keeps the feature representations discriminative and helps network optimization. We carried out comprehensive experiments and achieved state-of-the-art results on three datasets (PASCAL VOC 2012, CamVid, and GATECH). In particular, our best model, without CRF post-processing, achieved an intersection-over-union score of 86.6% on the test set.
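For readers unfamiliar with the intersection-over-union metric quoted in the abstract, here is a minimal sketch of a mean IoU computation over label maps. The function name `mean_iou` and the nested-list input format are illustrative assumptions, not the paper's evaluation code; official benchmark scripts also handle details such as ignored "void" pixels, which this sketch leaves out.

```python
from typing import Sequence


def mean_iou(pred: Sequence[Sequence[int]],
             truth: Sequence[Sequence[int]],
             num_classes: int) -> float:
    """Mean intersection-over-union across classes present in pred or truth.

    pred and truth are label maps of equal shape holding class indices
    in the range [0, num_classes). Hypothetical helper for illustration.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                # Pixel counts once for both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel belongs to the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Skip classes that appear in neither map to avoid dividing by zero.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


pred = [[0, 0],
        [1, 1]]
truth = [[0, 1],
         [1, 1]]
score = mean_iou(pred, truth, num_classes=2)  # class 0: 1/2, class 1: 2/3
```

In this toy case the per-class scores are 1/2 and 2/3, so the mean is 7/12.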




Figure 1. The architecture of our approach. The upper part shows the structure of the proposed stacked deconvolutional network (SDN); the lower part shows the detailed structure of (a) an SDN unit, (b) a down-sampling module, and (c) an up-sampling module.
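The data flow that Figure 1 describes (stacked encoder-decoder units, with connections inside each unit and between units) can be illustrated with a toy sketch on 1-D signals. Everything here is an assumption made for illustration: pair averaging stands in for the down-sampling module, value repetition stands in for deconvolution, and element-wise addition stands in for whatever feature fusion the paper actually uses. The names `sdn_unit` and `stacked_sdn` are hypothetical.

```python
from typing import List

Signal = List[float]


def downsample(x: Signal) -> Signal:
    """Halve the resolution by averaging neighbouring pairs (assumed stand-in)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x: Signal) -> Signal:
    """Double the resolution by repeating each value (assumed stand-in)."""
    return [v for v in x for _ in range(2)]


def add(a: Signal, b: Signal) -> Signal:
    """Element-wise fusion of two signals of equal length."""
    return [p + q for p, q in zip(a, b)]


def sdn_unit(x: Signal, depth: int) -> Signal:
    """One encoder-decoder unit; len(x) must be divisible by 2**depth."""
    skips = []
    for _ in range(depth):
        skips.append(x)               # keep the feature at this resolution
        x = downsample(x)
    for skip in reversed(skips):
        x = add(upsample(x), skip)    # intra-unit connection
    return x


def stacked_sdn(x: Signal, num_units: int, depth: int) -> Signal:
    """Stack units; each unit also receives the outputs of all earlier units."""
    outputs: List[Signal] = []
    for _ in range(num_units):
        unit_input = x
        for earlier in outputs:
            unit_input = add(unit_input, earlier)  # inter-unit connection
        outputs.append(sdn_unit(unit_input, depth))
    return outputs[-1]


features = [1.0] * 8
single = sdn_unit(features, depth=2)
stacked = stacked_sdn(features, num_units=2, depth=2)
```

The point of the sketch is only that each unit returns to its input resolution, which is what allows units to be stacked and linked; the real modules are learned convolutional layers.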




Figure 2. Hierarchical supervision with Score Map Connection during upsampling.
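One plausible reading of hierarchical supervision is that every upsampling stage emits a score map and each score map receives its own loss against the ground truth at the matching resolution. The sketch below follows that reading under stated assumptions: nearest-neighbour resizing of the labels, pixel-averaged cross-entropy, and equal weighting of the stages are all illustrative choices, and `hierarchical_loss` is a hypothetical name rather than the paper's implementation.

```python
import math
from typing import List

ScoreMap = List[List[List[float]]]  # [row][col][class] logits
LabelMap = List[List[int]]          # [row][col] class index


def resize_labels(labels: LabelMap, height: int, width: int) -> LabelMap:
    """Nearest-neighbour resize of a label map (assumed choice)."""
    src_h, src_w = len(labels), len(labels[0])
    return [[labels[r * src_h // height][c * src_w // width]
             for c in range(width)]
            for r in range(height)]


def cross_entropy(scores: ScoreMap, labels: LabelMap) -> float:
    """Pixel-averaged softmax cross-entropy."""
    total, count = 0.0, 0
    for score_row, label_row in zip(scores, labels):
        for logits, label in zip(score_row, label_row):
            peak = max(logits)  # subtract the max for numerical stability
            log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
            total += log_sum - logits[label]
            count += 1
    return total / count


def hierarchical_loss(score_maps: List[ScoreMap], labels: LabelMap) -> float:
    """Sum of per-stage losses, one for each intermediate score map."""
    loss = 0.0
    for scores in score_maps:
        resized = resize_labels(labels, len(scores), len(scores[0]))
        loss += cross_entropy(scores, resized)
    return loss


labels = [[0, 1],
          [1, 1]]
coarse = [[[0.0, 0.0]]]                               # 1x1 score map
fine = [[[0.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 0.0]]]                     # 2x2 score map
loss = hierarchical_loss([coarse, fine], labels)
```

With uniform logits over two classes, each stage contributes ln 2, so the total here is 2 ln 2. Supervising the coarse stages directly gives the early upsampling layers a gradient signal of their own instead of relying only on the final output.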




Figure 3. Different stacked SDN structures.




Figure 4. Results of our method on the PASCAL VOC 2012 validation set. Each column shows the input image (A), the segmentation result of the SDN_M1 network (B), the segmentation result of the SDN_M2 network (C), the segmentation result of the SDN_M3 network (D), and the ground truth (E).




Table 5. Experimental results of our method on PASCAL VOC 2012 test set.




Figure 5. Results of our method on the PASCAL VOC 2012 dataset. Each row shows, from left to right: (1) the input image, (2) the ground truth, and (3) the semantic segmentation result.




Figure 6. Results of our method on the CamVid dataset. Each column shows, from top to bottom: (1) the input image, (2) the semantic segmentation result, and (3) the ground truth.




Table 6. Experimental results of our method on the CamVid test set.




Table 7. Experimental results of our method on the GATECH test set.




Figure 7. Results of our method on the GATECH dataset. Each column shows, from top to bottom: (1) the input image, (2) the semantic segmentation result, and (3) the ground truth.