Abstract: Yann LeCun once compared unsupervised learning to a cake and supervised learning to the icing on that cake, claiming that we know how to make the icing but not the cake. In this article, we provide a recipe for training unsupervised learning algorithms to enhance satellite imagery.
The research stems from the growing availability of low-cost satellite imagery in the emerging commercial space industry, where there is a trade-off between sensor quality, revisit rate, and cost. We investigate whether advanced image processing can ease this trade-off by improving the images returned by lower-quality sensors at the same cost.
We embed image details from high-resolution images in a deep neural network (DNN) and extract those details when enhancing geographically similar images. As part of this study, we introduce perturbation layers suited to image-enhancement tasks, yielding a novel deep neural network architecture.
Super-resolution technique
Image enhancement can take many forms, such as noise reduction and color adjustment. For satellite images, a common measure of image quality is the ground sampling distance (GSD): the physical distance on the ground represented by a single pixel in the image. The image enhancement discussed in this article is the reduction (improvement) of the GSD of satellite images, i.e., super-resolution. Super-resolution improves image resolution by synthesizing sub-pixel information in the image. Common synthesis methods include:
Interpolation between adjacent pixels in an image
Interpolation between adjacent frames in a sequence of images
Frequency-domain filtering to reduce noise
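As an illustration of the first approach, here is a minimal NumPy sketch of bilinear upscaling, which interpolates between adjacent pixels (our own toy example, not the method proposed in this article):

```python
import numpy as np

def bilinear_upscale(img, factor=2):
    """Upscale a 2-D image by linearly interpolating between adjacent pixels."""
    h, w = img.shape
    new_h, new_w = h * factor, w * factor
    # Fractional source coordinates for every output pixel.
    ys = np.linspace(0, h - 1, new_h)
    xs = np.linspace(0, w - 1, new_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

img = np.array([[0.0, 1.0], [2.0, 3.0]])
up = bilinear_upscale(img)  # shape (4, 4); corners keep the original values
```

No new information is created here; the interpolation only redistributes existing pixel values, which is exactly the limitation the deep-learning approach below tries to overcome.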
In this study, we extend the above methods and apply deep learning techniques to geo-related image processing.
To quantify the effect of the enhancement methods, we compare the peak signal-to-noise ratio (PSNR) before and after image enhancement. For later analysis, we also show the regional distribution of PSNR across the image and its correlation with image content.
PSNR is the standard, though imperfect, measure of a super-resolution algorithm's reconstruction quality. We will publish a future article on learning a better cost function for super-resolution using generative adversarial networks.
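PSNR is computed directly from the mean squared error between a reference image and its estimate. A small sketch, assuming 8-bit pixel values (peak value 255):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.zeros((8, 8))
b = np.full((8, 8), 5.0)
print(psnr(a, b))  # 20 * log10(255 / 5) ≈ 34.15 dB
```

Higher is better; a uniform offset of 5 gray levels already costs roughly 34 dB against the 8-bit peak.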
Fully convolutional neural networks with perturbation layers
Before presenting the results, let's discuss the framework we developed to perform super-resolution. Standard deep neural networks such as AlexNet, ResNet, VGG, and GoogLeNet are designed for image classification and object detection on low-resolution images; they are not suited to super-resolution, whose output space is exponentially larger.
Since super-resolution is essentially a perturbation of a low-resolution image, we designed a new deep neural network, inspired by ResNet, composed of a sequence of perturbations of the identity map. The network grows its structure one layer at a time by optimizing a convex combination of the previous layers' output and the new layer, introducing a trainable bypass parameter that measures the new layer's contribution to the final output.
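The convex combination of the identity path and a new layer can be sketched as follows. This is a NumPy toy with a scalar bypass parameter; the name `alpha` and the function shape are our own illustration, not the article's implementation:

```python
import numpy as np

def perturbation_layer(x, transform, alpha):
    """Convex combination of the identity map and a learned transform.

    alpha in [0, 1] is the trainable bypass parameter: alpha = 0 keeps the
    layer an exact identity, alpha = 1 uses only the new transform.
    """
    return (1.0 - alpha) * x + alpha * transform(x)

x = np.ones(4)
out = perturbation_layer(x, lambda v: v * 3.0, alpha=0.25)
# (0.75 * 1) + (0.25 * 3) = 1.5 per element
```

Because the layer starts near the identity, each new layer can be trained without destroying what earlier layers learned, and the fitted alpha reports how much that layer actually contributed.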
This structure has the following benefits:
The architecture supports training very deep neural networks with skip connections and stochastic depth, consistent with modern training strategies
The bypass parameters evaluate each layer's contribution and give feedback on how deep the network should be
Each layer performs an approximately identity transformation, and different internal structures can be used to enhance the image
Each perturbation layer contains at least two convolution layers with a nonlinear ReLU layer between them. Adding more convolution layers inside a perturbation layer increases its capacity to enhance the image, but makes training convergence harder. Adding more perturbation layers instead provides similar enhancement capacity without the convergence problems.
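The interior of one perturbation layer (two convolutions with a ReLU between them) can be sketched in 1-D for brevity; the kernels here are placeholders, not trained weights:

```python
import numpy as np

def conv_relu_conv(x, k1, k2):
    """Two convolutions with a ReLU nonlinearity between them (1-D toy)."""
    h = np.convolve(x, k1, mode="same")
    h = np.maximum(h, 0.0)          # ReLU clips negative activations
    return np.convolve(h, k2, mode="same")

x = np.array([0.0, 1.0, -2.0, 3.0])
y = conv_relu_conv(x, k1=np.array([1.0]), k2=np.array([1.0]))
# With identity kernels, only the ReLU acts: [0, 1, 0, 3]
```

In the real network the kernels are 2-D, learned, and applied per color band, but the conv → ReLU → conv composition is the same.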
The bypass parameters provide direct feedback on the effect of each perturbation layer. This feedback helps answer the question of how deep the neural network should be.
The experiment
Our preliminary experiment uses degraded versions of 3-band GeoTIFF images of the Panama Canal to evaluate the deep neural network's enhancement capability. We used two GeoTIFF images (very large satellite images) provided by DigitalGlobe: one for training and one for testing. Rather than enhancing the entire image in one pass of the network, we enhanced it region by region, 27×27 pixels at a time. Because GeoTIFF images are so large, extracting 27×27-pixel regions provides ample training data for the network. More training images might further improve the experiment, but in the following experiment we use just these two GeoTIFF images to train the deep neural network:
Both GeoTIFF images were resized to effectively reduce their resolution
Random samples from the first GeoTIFF image were used to train the deep neural network, one layer at a time; the weights were trained to maximize the PSNR of the network's output
The deep neural network was used to enhance both degraded GeoTIFF images
The results were compared against an interpolation-based image enhancement algorithm
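The random sampling step above might look like the following sketch, which crops random 27×27 patches from a large raster (function and variable names are our own):

```python
import numpy as np

def sample_patches(image, n, size=27, rng=None):
    """Randomly crop n size×size patches from an (H, W, C) image."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    patches = []
    for _ in range(n):
        y = rng.integers(0, h - size + 1)
        x = rng.integers(0, w - size + 1)
        patches.append(image[y:y + size, x:x + size])
    return np.stack(patches)

img = np.zeros((200, 300, 3))       # stand-in for a large GeoTIFF raster
batch = sample_patches(img, n=8)
print(batch.shape)  # (8, 27, 27, 3)
```

A single very large GeoTIFF contains millions of distinct 27×27 windows, which is why two images suffice as a training corpus.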
We used TensorFlow to create, train, and run the deep neural network on a 2015 NVIDIA Devbox with four Titan X graphics cards, only one of which was used for training. We trained with the ADAM optimizer, whose hyperparameters affect training time and convergence speed. We did not fully explore the optimal choice of ADAM parameters, and training each perturbation layer took about 12 hours on a single Titan X. The convergence rate of the bypass parameters (shown in Figure 5) helped us select the ADAM parameters and plan the subsequent training time.
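For reference, one ADAM update with its standard hyperparameters (learning rate, beta1, beta2, epsilon) can be sketched as follows; this is the textbook update rule, not code from the article:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM update: moment estimates, bias correction, parameter step."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for step t
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
# The first step moves theta by roughly -lr * sign(grad)
```

The learning rate `lr` and the decay rates `b1`/`b2` are the parameters whose choice trades off training time against convergence stability.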
The experimental results
In this experiment, we used two GeoTIFF images of the Panama Canal, one for training and one for testing.
The first step is to create training data by degrading the GeoTIFF images. Resizing a GeoTIFF image produces a degraded image with effectively reduced GSD and resolution. Using linear interpolation as a baseline, we can plot the distribution of PSNR over the whole degraded image.
Figure 7 shows that a single PSNR number is not sufficient to describe the noise in a satellite image. In the degraded image, areas with more structure, such as ships, have lower PSNR than areas with less structure, such as open water. When training a super-resolution algorithm to enhance degraded images, we want to enhance the areas we care about, and these are usually the areas containing structure.
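A regional PSNR distribution like the one in Figure 7 can be produced by computing PSNR per tile instead of over the whole image. A sketch under the same 8-bit assumption (tile size is arbitrary here):

```python
import numpy as np

def psnr_map(reference, estimate, tile=32, peak=255.0):
    """PSNR computed independently on each tile×tile block of the image."""
    h, w = reference.shape
    rows, cols = h // tile, w // tile
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            r = reference[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]
            e = estimate[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]
            mse = np.mean((r - e) ** 2)
            out[i, j] = np.inf if mse == 0 else 10 * np.log10(peak ** 2 / mse)
    return out

ref = np.zeros((64, 64))
est = ref.copy()
est[:32, :32] += 5.0  # error only in the top-left tile
m = psnr_map(ref, est)
# m[0, 0] ≈ 34.15 dB; the other tiles are error-free (infinite PSNR)
```

Plotting such a map as a heat map immediately shows that structured regions (ships) degrade more than flat regions (water), which a single image-wide PSNR hides.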
The results in Figure 10 show that the DNN-based enhancement significantly improves regions with more structure. Although the test image has the same GSD as the training image, differences in atmospheric conditions and cloud cover also affect the enhancement, which partly explains why the measured improvement is higher on the test image than on the training image. Image sharpness also affects the labeling of areas containing ships: inaccurate labels may include more water, reducing the measured gain in those regions. Experiments controlling for these effects are beyond the scope of this article.
Other research areas
Some prior work, such as SRCNN, has applied super-resolution to non-satellite images and achieved comparable enhancement when trained on ImageNet. These methods may also be feasible for satellite image enhancement, but our approach has one fundamental advantage: the geographic location information of the imagery. Our approach also differs from existing methods in the following ways:
Satellite imagery is often an extreme case among applications of DNN-based machine learning algorithms
Overtraining is not necessarily bad for our algorithm, since we can always obtain a more diverse set of image data
The perturbation layers provide information about how deep the network needs to be and the marginal performance improvement to expect from increasing its depth
GeoTIFF images can contain color channels beyond red, green, and blue; with simple modifications, our method can handle these additional channels (such as 8-band images)
Finally, we experimented with the number of convolution layers in each perturbation layer, and saw improved performance as this number increased. We present the results of those experiments, based on 8-band images and the SpaceNet dataset, in Part 2.
Super Resolution on Satellite Imagery using Deep Learning, Part 1, by Patrick Hagerty
For more details, please refer to the original text on the official CosmiQ Works blog on Medium.
This article was translated by the Ali Yunqi Community.