Abstract: Yann LeCun once compared unsupervised learning to the cake and supervised learning to the icing on the cake, claiming that we only know how to make the icing but not the cake. In this article, we provide a recipe for training unsupervised learning algorithms to enhance satellite imagery.


This research is motivated by the growing availability of low-cost satellite imagery from the emerging commercial space industry, in which there is a trade-off among sensor quality, revisit rate, and cost. We investigate whether advanced image processing can soften this trade-off by improving the imagery returned by lower-quality sensors at the same cost.




Figure 1: Remote sensing using aircraft, commercial satellites, and space stations. This figure illustrates the potential crossover between different remote sensing activities, not the actual degree of crossover. Aerial remote sensing can be used to enhance sophisticated commercial satellite imagery, which in turn can be used to enhance low-resolution satellite imagery.

We embed image detail from high-resolution images in a deep neural network (DNN) and extract that detail when enhancing geographically similar images. As part of this study, we introduce perturbation layers suited to image enhancement tasks and use them to develop a novel deep neural network architecture.

Super-resolution technique

Image enhancement can take many forms, such as noise reduction and color adjustment. For satellite images, a common measure of image quality is the ground sampling distance (GSD), the physical distance on the ground represented by a single pixel. The image enhancement discussed in this article is the reduction (optimization) of the GSD of satellite images, namely super-resolution. Super-resolution improves image resolution by synthesizing sub-pixel information in the image. Common synthesis methods include the following (a minimal interpolation baseline is sketched after this list):

Interpolation between adjacent pixels in an image

Interpolation between adjacent frames in an image sequence

Frequency-domain filtering to reduce noise
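As a concrete reference point, here is a minimal sketch of the first method, pixel interpolation, using Pillow; the file names and the 2× factor are placeholders, not details from the experiments.

```python
# Minimal interpolation baseline: enlarge a low-resolution image 2x.
# File names are hypothetical placeholders.
from PIL import Image

lowres = Image.open("degraded_tile.png")
w, h = lowres.size
bilinear = lowres.resize((2 * w, 2 * h), Image.BILINEAR)  # linear interpolation
bicubic = lowres.resize((2 * w, 2 * h), Image.BICUBIC)    # bicubic interpolation
bicubic.save("bicubic_tile.png")
```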

In this study, we extend the above methods and apply deep learning techniques to geo-related image processing.




Figure 2: Super-resolution. To turn super-resolution from an ill-posed inverse problem into a well-posed optimization problem, we start from a high-resolution image, degrade its quality, and then optimize the super-resolution algorithm to recover the original image from the degraded one. The peak signal-to-noise ratio (PSNR) is used to evaluate the difference between the original and restored images.

To quantify the effect of the enhancement methods, we compare the peak signal-to-noise ratio (PSNR) before and after image enhancement. For the subsequent analysis, we also show how PSNR is distributed across regions of the image and how it correlates with image content.
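For reference, PSNR has a standard closed form, 10 log10(MAX² / MSE); the sketch below assumes 8-bit imagery (peak value 255):

```python
import numpy as np

def psnr(reference: np.ndarray, estimate: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels: 10 * log10(max_val^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```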

PSNR is the default choice for measuring the generative ability of a super-resolution algorithm. In a future article, we will cover learning a better cost function for super-resolution using generative adversarial networks.

Fully convolutional neural networks with perturbation layers

Before presenting the results, let's discuss the framework developed to perform super-resolution. Standard deep neural networks such as AlexNet, ResNet, VGG, and GoogLeNet are designed for image classification and object detection on low-resolution images; they are not well suited to super-resolution, whose output space grows exponentially with image size.

Since super-resolution is essentially a perturbation of a low-resolution image, we designed a new deep neural network, inspired by ResNet, composed of a sequence of perturbations of the identity map. The network grows one layer at a time by optimizing a convex combination of the previous layers and the current layer, introducing a trainable parameter (the bypass parameter) that measures the new layer's contribution to the final output.




Figure 3: Comparison of the convex perturbation layer proposed in this article with a ResNet layer. Both architectures combine a convolution block with an identity function; the convex perturbation allows this combination itself to be trained to an optimal mix. As the value of β decreases, the layer's contribution to the enhancement decreases.

This structure has the following benefits:

The network architecture is suitable for training extremely deep neural networks with skip connections and stochastic depth, consistent with modern training strategies

The bypass parameters quantify the contribution of each layer and give feedback on how deep the network should be

Each layer performs an approximately identity transformation, and different internal structures can be used to enhance the image

Each perturbation layer contains at least two convolution layers with a nonlinear ReLU layer between them. Adding more convolution layers within a perturbation layer increases its capacity to enhance the image, but makes training convergence harder. Adding further perturbation layers instead offers similar enhancement capacity without the convergence problems.
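The sketch below illustrates one plausible implementation of such a perturbation layer in TensorFlow; the sigmoid parameterization of the bypass parameter and the hidden filter count are our assumptions, not details from the article.

```python
import tensorflow as tf

class PerturbationLayer(tf.keras.layers.Layer):
    """Convex combination of the identity and a small convolutional block:
    y = (1 - beta) * x + beta * F(x), with beta trainable."""

    def __init__(self, hidden_filters=64, **kwargs):
        super().__init__(**kwargs)
        self.hidden_filters = hidden_filters

    def build(self, input_shape):
        channels = int(input_shape[-1])
        # Two convolutions with a ReLU between them, as in the text.
        self.conv1 = tf.keras.layers.Conv2D(
            self.hidden_filters, 3, padding="same", activation="relu")
        self.conv2 = tf.keras.layers.Conv2D(channels, 3, padding="same")
        # Raw bypass parameter; squashed to (0, 1) so the mix stays convex.
        self.beta_raw = self.add_weight(
            name="beta_raw", shape=(), initializer="zeros", trainable=True)

    def call(self, x):
        beta = tf.sigmoid(self.beta_raw)  # bypass parameter in (0, 1)
        return (1.0 - beta) * x + beta * self.conv2(self.conv1(x))
```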




Figure 4: Deep neural network with perturbation layers

The bypass parameters provide direct feedback on the effect of each perturbation layer. This feedback helps answer the question of how deep the neural network should be.




Figure 5: Bypass parameters during model training. This figure plots the bypass parameter weights over the course of training. In this particular training algorithm, each layer is trained in two stages: first, the new layer's parameters are trained on their own; second, all previously trained parameters are jointly optimized together with the new layer. The bypass parameter decreases as the number of network layers increases. Eventually, a new layer no longer changes the integer value of any pixel in the enhanced image (it does not converge as the other layers do); this is the definition of the sub-pixel threshold.
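A rough sketch of this two-stage, layer-at-a-time schedule is shown below, reusing the PerturbationLayer class sketched earlier; the placeholder data, maximum depth, and epoch counts are assumptions, not the authors' settings.

```python
import numpy as np
import tensorflow as tf

# Placeholder data: degraded inputs and original targets, e.g. 27x27
# 3-band patches scaled to [0, 1]. Real patches would come from GeoTIFFs.
x_train = np.random.rand(256, 27, 27, 3).astype("float32")
y_train = np.random.rand(256, 27, 27, 3).astype("float32")

def train_stage(model, epochs):
    # Minimizing MSE is equivalent to maximizing PSNR.
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
    model.fit(x_train, y_train, batch_size=64, epochs=epochs, verbose=0)

layers = []
for depth in range(8):                 # maximum depth is an assumption
    layers.append(PerturbationLayer())
    model = tf.keras.Sequential(layers)
    for lyr in layers[:-1]:            # stage 1: freeze earlier layers,
        lyr.trainable = False          # train only the new layer
    train_stage(model, epochs=5)
    for lyr in layers:                 # stage 2: jointly fine-tune all
        lyr.trainable = True           # layers trained so far
    train_stage(model, epochs=5)
```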

The experiment

Our preliminary experiment uses degraded 3-band GeoTIFF images of the Panama Canal to evaluate the image enhancement capability of the deep neural network. We used two GeoTIFF images (very large satellite images) provided by DigitalGlobe: one for training and one for testing. Rather than enhancing the entire image in a single pass through the deep neural network, we enhanced it one 27×27-pixel region at a time. Because a GeoTIFF image is very large, extracting 27×27-pixel regions provides ample training data for our deep neural network. More training images might improve the experiment, but in the experiment below we used these two GeoTIFF images to train the deep neural network as follows (a patch-sampling sketch appears after the list):

The two GeoTIFF images were resized to effectively reduce the image resolution

Random samples were drawn from the first GeoTIFF image to train the deep neural network, with only one layer of the model trained at a time. The weights of the deep neural network were trained to maximize the PSNR of its output

The deep neural network was used to enhance both degraded GeoTIFF images

The experimental results were compared with interpolation-based image enhancement algorithms
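The patch-sampling step can be sketched as follows; the array names are our assumptions, with the degraded image already resized to match the original (as in Figure 7).

```python
import numpy as np

def sample_patches(original, degraded, num_patches, patch=27, seed=None):
    """Draw aligned (degraded, original) patch pairs from two arrays of
    identical height x width x bands."""
    rng = np.random.default_rng(seed)
    h, w = original.shape[:2]
    xs, ys = [], []
    for _ in range(num_patches):
        i = int(rng.integers(0, h - patch))
        j = int(rng.integers(0, w - patch))
        xs.append(degraded[i:i + patch, j:j + patch])
        ys.append(original[i:i + patch, j:j + patch])
    return np.stack(xs), np.stack(ys)
```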

We used TensorFlow to create, train, and run inference with the deep neural networks on a 2015 NVIDIA Devbox with four Titan X graphics cards, although only one card was used for training. We trained the network with the Adam optimizer, whose hyperparameters affect training time and convergence speed. We did not fully explore the optimal choice of Adam hyperparameters, and it still took about 12 hours (on one Titan X card) to train each perturbation layer. The convergence of the bypass parameters (shown in Figure 5) helped us select the Adam hyperparameters and budget subsequent training time.
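For reference, this is what an explicit Adam configuration looks like in TensorFlow; the values shown are TensorFlow's documented defaults, not the settings used in these experiments.

```python
import tensorflow as tf

# Adam's hyperparameters govern training time and convergence speed.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,  # step size
    beta_1=0.9,           # decay rate for the first-moment estimate
    beta_2=0.999,         # decay rate for the second-moment estimate
    epsilon=1e-07)        # numerical-stability constant
```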

The experimental results

In this experiment, we used two GeoTIFF images of the Panama Canal, one for training and one for testing.




Figure 6: Satellite image of the Panama Canal. This is the raw training image for the deep neural network.

The first step is to create training data by degrading the GeoTIFF images. Resizing a GeoTIFF image produces a degraded image with an effectively coarser GSD and lower resolution. Using linear interpolation as a starting point, we can plot the distribution of PSNR over the whole degraded image.
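A sketch of this degradation step, assuming a 2× resize with Pillow (which reads a GeoTIFF's pixel data but drops its geospatial metadata); the file name is hypothetical:

```python
from PIL import Image

hires = Image.open("panama_canal_training.tif")            # hypothetical name
w, h = hires.size
degraded = hires.resize((w // 2, h // 2), Image.BILINEAR)  # coarser GSD
upsampled = degraded.resize((w, h), Image.BILINEAR)        # back to the original pixel grid
```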




Figure 7: PSNR distribution across the input image to the deep neural network. The input to the deep neural network is a degraded satellite image resized (enlarged 2× by linear interpolation) to match the size of the original GeoTIFF image. This figure shows where noise was introduced during degradation: blue regions gained more noise during the quality reduction, while red regions gained less. Blue areas typically correspond to regions with fine structure, such as boats, while red areas typically correspond to regions with coarser features, such as open water.

Figure 7 shows that a single PSNR number is not sufficient to describe the noise in a satellite image. In the degraded images, areas with more structure, such as ships, have lower PSNR values than areas with less structure, such as open water. When we train super-resolution algorithms to enhance degraded images, we want to enhance the areas we care about, and these are usually the areas containing structure.
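A per-tile PSNR map like those in Figures 7 to 9 can be computed as sketched below; the 27-pixel tile size mirrors the training patches but is otherwise our assumption.

```python
import numpy as np

def psnr_map(reference, estimate, tile=27, max_val=255.0):
    """PSNR computed on a grid of tiles, giving a spatial map rather
    than a single number for the whole image."""
    h, w = reference.shape[:2]
    rows, cols = h // tile, w // tile
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            ref = reference[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            est = estimate[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
            out[r, c] = np.inf if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
    return out
```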




Figure 8: PSNR gain after enhancement with the deep neural network. We plot the distribution of PSNR gains on a test image that was not used to train the deep neural network. Most areas of the image are enhanced. The blue areas correspond to regions with significantly less noise in the original image. The PSNR gain is measured relative to the initial linear-interpolation baseline.




Figure 9: PSNR gain compared with bicubic interpolation. On the GeoTIFF test image, we map the PSNR difference between our method and bicubic interpolation. Areas with higher initial noise also benefit.




Figure 10: PSNR changes for the deep-neural-network-based enhancement method compared with linear and bicubic interpolation. The PSNR change is computed over the entire GeoTIFF image and over the sub-regions of the GeoTIFF image containing ships. The enhancement of areas containing fine structure is significantly greater than that of open water.

The results in Figure 10 show that the deep-neural-network-based enhancement method yields significantly greater improvement in regions with more structure. Although the test and training images have the same GSD, differences in atmospheric conditions and cloud cover also affect the enhancement, which partly explains why performance on the test image is higher than on the training image. Image sharpness also affects the labeling of regions containing ships: inaccurate labels may include more water and thus reduce the measured gain in those regions. Experiments that control for these confounds are beyond the scope of this article.




Figure 11: Example enhancement of a ship in water. This image shows the enhancement applied to a degraded image of a ship. Since most of this region is water, its PSNR gain is smaller than that of regions containing only ships.

Other research areas

Prior work, including SRCNN, has applied super-resolution to non-satellite images and achieved similar enhancement when trained on ImageNet. These methods may be viable for satellite image enhancement, but our proposed method has one fundamental advantage: it exploits the location information of the imagery. Our proposed method also differs from existing methods in the following ways:

Satellite imagery is often an extreme case among the many applications of deep-neural-network-based machine learning algorithms

Overtraining is not necessarily bad for our algorithm, and we can draw on a more diverse set of image data

The perturbation layers provide information about how deep the deep neural network needs to be and the marginal performance improvement to expect from increasing the network depth

GeoTIFF images can contain more color channels than red, green, and blue; with simple modifications, our method can handle these additional channels (such as 8-band images)

Finally, we experimented with the number of convolution layers within a perturbation layer: increasing the number of convolution layers in each perturbation layer improved performance. We present the results of these experiments, based on 8-band images and the SpaceNet dataset, in Part 2.

Super Resolution on Satellite Imagery using Deep Learning, Part 1, by Patrick Hagerty

For more details, please refer to the original text: the official blog of CosmiQ Works on Medium.

This article was translated by the Ali Yunqi Community Organization.