Semantic segmentation is the task of assigning every pixel in an image a category label, such as person, car, flower, or piece of furniture. In this article, the author surveys some of the best recent semantic segmentation ideas and solutions, amounting to a 2019 guide to semantic segmentation.
Original article by Derrick Mwiti; compiled by Heart of the Machine, with Nurhachu Null and Geek AI.
We can think of semantic segmentation as image classification at the pixel level. For example, in an image containing many cars, a segmentation model labels every car pixel with the same class, vehicle. A related task, instance segmentation, goes further and marks each object appearing in the image as a separate instance, which is useful for counting objects (for example, counting the number of customers in a mall).
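Concretely, a segmentation network outputs a stack of per-class score maps, and the predicted label map is the per-pixel argmax over classes. A minimal NumPy sketch (the classes and scores here are made up purely for illustration):

```python
import numpy as np

# Toy output of a segmentation model: one score map per class.
# Shape: (num_classes, height, width); the classes are hypothetical.
num_classes, h, w = 3, 2, 2          # 0=background, 1=person, 2=car
scores = np.zeros((num_classes, h, w))
scores[0] = 0.1                      # weak background score everywhere
scores[1, 0, :] = 0.9                # top row looks like "person"
scores[2, 1, :] = 0.8                # bottom row looks like "car"

# Semantic segmentation = image classification at the pixel level:
# pick the highest-scoring class independently for each pixel.
label_map = scores.argmax(axis=0)
print(label_map)                     # -> [[1 1] [2 2]]
```

An instance segmentation model would additionally separate two adjacent "car" regions into distinct instance masks rather than one shared class label.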
- Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation
- Fully Convolutional Networks for Semantic Segmentation
- U-Net: Convolutional Networks for Biomedical Image Segmentation
- The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation
- Multi-Scale Context Aggregation by Dilated Convolutions
- DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
- Rethinking Atrous Convolution for Semantic Image Segmentation
- Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
- FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation
- Improving Semantic Segmentation via Video Propagation and Label Relaxation
- Gated-SCNN: Gated Shape CNNs for Semantic Segmentation
Paper address: arxiv.org/pdf/1502.02…
The main contributions of this paper are as follows:
- An expectation-maximization (EM) algorithm for training from bounding-box or image-level annotations, applicable in both weakly supervised and semi-supervised settings.
- Evidence that combining weak and strong annotations improves performance. After combining annotations from the MS-COCO and PASCAL datasets, the authors reached 73.9% intersection-over-union (IoU) on PASCAL VOC 2012.
- Evidence that their method achieves better performance by combining a small number of pixel-level annotations with a large number of bounding-box (or image-level) annotations.
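The IoU numbers quoted throughout this guide come from comparing predicted and ground-truth label maps class by class. A minimal NumPy sketch of mean IoU (the label maps below are toy values):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two integer label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                 # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 prediction vs. ground truth with 2 classes.
pred   = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
print(mean_iou(pred, target, num_classes=2))   # (1/2 + 2/3) / 2
```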
Paper address: arxiv.org/pdf/1605.06…
In biomedical image processing, obtaining a category label for each cell in an image is very important, and one of the biggest challenges is that training images are hard to obtain and datasets are small. U-Net is a very well-known solution: it builds on the fully convolutional network and modifies it so that it can be trained on a small amount of image data while producing more accurate segmentations.
Paper address: https://arxiv.org/pdf/1505.04597.pdf
In this model, training uses the input images, their segmentation maps, and stochastic gradient descent. Data augmentation is used to teach the network the robustness and invariance it needs when very little training data is available. In one of the experiments, the model achieved 92% mIoU.
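The key detail in augmenting segmentation data is that every geometric transform applied to an image must also be applied to its label map. A minimal NumPy sketch using random flips only (U-Net itself additionally relies on elastic deformations):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, mask):
    """Apply the same random flips to an image and its label mask."""
    if rng.random() < 0.5:            # horizontal flip
        image, mask = np.flip(image, axis=1), np.flip(mask, axis=1)
    if rng.random() < 0.5:            # vertical flip
        image, mask = np.flip(image, axis=0), np.flip(mask, axis=0)
    return image, mask

image = np.arange(16.0).reshape(4, 4)
mask = (image > 7).astype(int)        # toy ground-truth labels
aug_image, aug_mask = augment(image, mask)

# The pixel-to-label correspondence is preserved by the joint transform.
assert np.array_equal(aug_mask, (aug_image > 7).astype(int))
```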
Paper address: https://arxiv.org/pdf/1611.09326.pdf
- Extends the DenseNet architecture to fully convolutional networks for semantic segmentation.
- Proposes an upsampling path built from dense blocks that performs better than other upsampling paths.
- Shows that the network achieves the best results on standard benchmarks.
This paper proposes a convolutional network module that aggregates multi-scale context information without losing resolution. The module, built primarily on dilated convolutions, can be plugged into existing architectures at any resolution.
Paper address: https://arxiv.org/abs/1511.07122
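A dilated convolution spaces out the kernel taps, enlarging the receptive field without pooling and without losing resolution. A minimal 1-D NumPy sketch (a 2-D version follows the same pattern):

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation):
    """1-D convolution with gaps of `dilation` between kernel taps."""
    k = len(kernel)
    span = (k - 1) * dilation + 1     # effective receptive field width
    out = np.zeros(len(signal) - span + 1)
    for i in range(len(out)):
        taps = signal[i : i + span : dilation]
        out[i] = np.dot(taps, kernel)
    return out

x = np.arange(8.0)                    # [0, 1, ..., 7]
k = np.array([1.0, 1.0, 1.0])         # simple summing kernel

# Same 3-tap kernel, growing receptive field: 3 inputs wide, then 5.
print(dilated_conv1d(x, k, dilation=1))   # [3. 6. 9. 12. 15. 18.]
print(dilated_conv1d(x, k, dilation=2))   # [6. 9. 12. 15.]
```

Stacking layers with exponentially increasing dilation rates (1, 2, 4, ...) is what lets the module aggregate context at many scales at once.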
In this paper, the author makes the following contributions to the semantic segmentation task:
- Atrous convolution, that is, convolution with upsampled filters, for dense prediction tasks.
- Atrous spatial pyramid pooling (ASPP) for segmenting objects at multiple scales.
- Better localization of object boundaries by combining DCNNs with fully connected CRFs.
Paper address: https://arxiv.org/abs/1606.00915
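The ASPP idea can be sketched as several dilated convolutions with different rates applied in parallel to the same feature map, with the branch outputs then combined; the sketch below is single-channel NumPy, and the kernel and rates are toy values, not the paper's configuration:

```python
import numpy as np

def dilated_conv2d(feat, kernel, rate):
    """3x3 dilated convolution with zero padding ('same' output size)."""
    padded = np.pad(feat, rate)        # pad so output matches input size
    out = np.zeros_like(feat)
    h, w = feat.shape
    for y in range(h):
        for x in range(w):
            # sample a 3x3 grid of taps spaced `rate` pixels apart
            taps = padded[y : y + 2 * rate + 1 : rate,
                          x : x + 2 * rate + 1 : rate]
            out[y, x] = np.sum(taps * kernel)
    return out

feat = np.random.default_rng(0).random((8, 8))
kernel = np.ones((3, 3)) / 9.0         # toy averaging kernel

# Parallel branches with different dilation rates, then concatenation.
rates = (1, 2, 4)
branches = [dilated_conv2d(feat, kernel, r) for r in rates]
aspp_out = np.stack(branches)          # shape: (len(rates), 8, 8)
print(aspp_out.shape)
```

Each branch sees the same input at a different effective scale, which is how ASPP handles objects of multiple sizes without an image pyramid.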
This paper addresses the major challenges of semantic segmentation, including:
- Reduced feature resolution caused by repeated max pooling and downsampling.
- Detecting objects at multiple scales.
- Reduced localization accuracy caused by DCNN invariance: an object-centric classifier needs to be invariant to spatial transformations, and that invariance hurts precise localization.
Paper address: https://arxiv.org/pdf/1706.05587.pdf
Without dense conditional random field (DenseCRF) post-processing, the DeepLabv3 model of this paper achieved 85.7% performance on the PASCAL VOC 2012 test set.
In this paper, the method, DeepLabv3+, achieved 89.0% and 82.1% performance on the PASCAL VOC 2012 and Cityscapes datasets, respectively, without any post-processing. The model improves segmentation results by adding a simple decoder module on top of DeepLabv3.
Paper address: https://arxiv.org/pdf/1802.02611v3.pdf
This paper proposes a joint upsampling module called Joint Pyramid Upsampling (JPU) to replace time- and memory-consuming dilated convolutions. It formulates the extraction of high-resolution feature maps as a joint upsampling problem and achieves good results.
Paper address: https://arxiv.org/pdf/1903.11816v1.pdf
In this method, a fully convolutional network (FCN) serves as the backbone, and JPU upsamples the final low-resolution feature map to obtain a high-resolution feature map. Replacing dilated convolutions with JPU incurs no performance loss.
Joint upsampling takes a low-resolution target image and a high-resolution guidance image, then generates a high-resolution target image by transferring the structure and details of the guidance image.
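The joint upsampling idea can be illustrated with a deliberately crude 1-D toy (this is a stand-in for the concept, not the JPU module): each high-resolution output sample copies the value of whichever surrounding low-resolution sample best matches the high-resolution guidance signal, so edges in the output snap to edges in the guidance.

```python
import numpy as np

def joint_upsample_1d(target_lr, guide_hr, scale):
    """Toy 1-D joint upsampling: each high-res sample copies the value
    of the surrounding low-res sample that is closer in guidance."""
    n = len(guide_hr)
    out = np.zeros(n)
    for i in range(n):
        lo = min(i // scale, len(target_lr) - 1)
        hi = min(lo + 1, len(target_lr) - 1)
        # guidance values at the centers of the two low-res cells
        g_lo = guide_hr[min(lo * scale, n - 1)]
        g_hi = guide_hr[min(hi * scale, n - 1)]
        pick = lo if abs(guide_hr[i] - g_lo) <= abs(guide_hr[i] - g_hi) else hi
        out[i] = target_lr[pick]
    return out

# Low-res target: a step from 0 to 1.  Sharp edge in the guidance.
target_lr = np.array([0.0, 1.0])
guide_hr  = np.array([0.0, 1.0, 1.0, 1.0])
print(joint_upsample_1d(target_lr, guide_hr, scale=2))  # [0. 1. 1. 1.]
```

Plain nearest-neighbor upsampling would place the step at [0, 0, 1, 1]; the guidance moves it to where the true edge lies.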
This paper proposes a video-based method that augments training sets by synthesizing new training samples, improving the accuracy of semantic segmentation networks. It explores the ability of video prediction models to predict future frames, and in turn to predict future labels.
Paper address: https://arxiv.org/pdf/1812.01593v3.pdf
- Label propagation: create new training samples by pairing propagated labels with the original future frames.
- Joint image-label propagation (JP): create new training samples by pairing propagated images with propagated labels.
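Label propagation can be sketched as warping the current frame's label map along the estimated motion to the next frame; here a uniform integer shift stands in for real motion estimation (a deliberate simplification of the paper's approach):

```python
import numpy as np

def propagate_labels(labels, motion):
    """Warp a label map by an integer (dy, dx) motion vector, filling
    newly exposed pixels with an 'ignore' label (-1)."""
    dy, dx = motion
    out = np.full_like(labels, -1)
    h, w = labels.shape
    src = labels[max(0, -dy) : h - max(0, dy), max(0, -dx) : w - max(0, dx)]
    out[max(0, dy) : h - max(0, -dy), max(0, dx) : w - max(0, -dx)] = src
    return out

labels_t = np.array([[1, 1, 0],
                     [1, 1, 0],
                     [0, 0, 0]])
# Suppose the object moved one pixel right between frame t and t+1.
labels_t1 = propagate_labels(labels_t, motion=(0, 1))
# The pair (frame_{t+1}, labels_t1) becomes a new training sample.
print(labels_t1)
```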
This paper is one of the most recent results in semantic segmentation (July 2019). The authors propose a two-stream CNN architecture in which the target's shape information is handled by a separate branch: the shape stream processes only boundary-related information. This is enforced by the model's gated convolutional layers (GCL) and local supervision.
Paper address: https://arxiv.org/pdf/1907.05740.pdf
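The gating idea can be sketched as follows: an attention map computed from both streams multiplicatively filters the shape stream's features, so only boundary-relevant activations pass through. This is a simplified single-channel sketch; the weights and shapes are illustrative, not the paper's exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_layer(shape_feat, texture_feat, w, b):
    """Gate the shape stream with an attention map derived from both
    streams: alpha = sigmoid(w . [shape, texture] + b)."""
    stacked = np.stack([shape_feat, texture_feat])      # (2, H, W)
    alpha = sigmoid(np.tensordot(w, stacked, axes=1) + b)
    return shape_feat * alpha                           # (H, W)

rng = np.random.default_rng(0)
shape_feat = rng.random((4, 4))       # toy shape-stream activations
texture_feat = rng.random((4, 4))     # toy texture-stream activations
w = np.array([0.5, 0.5])              # toy 1x1-conv weights, 2 channels
b = 0.0

gated = gated_layer(shape_feat, texture_feat, w, b)
print(gated.shape)                    # -> (4, 4)
```

Because the gate is a sigmoid in (0, 1), the layer can only attenuate shape activations, never amplify them, which is what keeps the shape stream focused on boundaries.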