As mobile screens get sharper, with "Retina" displays like the iPhone's, the demand for high-definition pictures keeps growing. On QQ, Qzone, Weibo, WeChat and other social platforms, people willingly send and browse multi-megabyte high-definition pictures for a better visual experience. This creates problems of its own: viewing large images in full definition consumes a lot of bandwidth, drives up data costs, and slows loading, which makes for a poor user experience. In an era when time is money, who wants to spend it waiting for an image to load?
How to deliver a high-definition result while transferring only small images, without hurting the user experience, is therefore a problem worth studying. In October, Google published a paper on RAISR (Rapid and Accurate Image Super-Resolution), a technique that uses machine learning to convert low-resolution images into high-resolution ones. It can match or exceed the quality of the original image while saving up to 75% of bandwidth, and runs roughly 10 to 100 times faster than existing methods. RAISR quickly became the industry benchmark in this field.
Tencent Qzone and YouTu Lab recently launched their latest work in this field, TSR (Tencent Super Resolution). According to Leifeng.com, under the same test conditions TSR processes images 40% faster than RAISR, with visibly better output quality. TSR is also the industry's first mobile super-resolution technology built on deep neural networks, processing images in real time; it runs even on ordinary Android phones.
I. Super-resolution model
The structure of the super-resolution model is shown below:
1. Neural networks
First, the team built a 10-layer deep convolutional neural network. Compared with the networks currently studied in academia, it addresses the checkerboard artifacts and blurred textures that some image-processing approaches produce. The network extracts the overall characteristics of an image, identifies its texture and content, and then reconstructs high-definition detail based on that texture and content, achieving a visual result that goes well beyond the input image.
Deep convolutional neural network
By controlling the number of convolutional layers and the number of channels in each layer, the network avoids over-smoothed, texture-poor output while keeping the overall computation small. Thanks to this simplified design, the model performs well at a size of only 4.6 KB.
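To give a feel for how the layer count and per-layer channel count translate into model size, here is a purely illustrative Python sketch. The channel widths, kernel size and parameter precision below are assumptions made for the sake of the arithmetic, not TSR's published configuration.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of a single k x k convolution layer."""
    return in_ch * out_ch * k * k + out_ch

# A hypothetical 10-conv-layer network operating on the single Y channel.
channels = [1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 1]   # 11 entries -> 10 layers
total = sum(conv_params(i, o) for i, o in zip(channels, channels[1:]))

print("parameters:", total)                                  # 2755 for this configuration
print("size at 1 byte per parameter: %.1f KB" % (total / 1024))   # a few KB when quantized
```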
2. Separating the Y channel from CbCr
For image preprocessing, TSR uses quadratic interpolation, which works well for blurry UGC (user-generated content) images. Because the human eye is far more sensitive to brightness than to color, each image is split into its Y and CbCr channels, and only the Y-channel data is super-resolved, which improves processing speed.
(Note: YCbCr is a color space commonly used in continuous image processing for film and in digital photography systems. Cb and Cr are the blue-difference and red-difference chroma components, and Y is the luminance, which indicates the intensity of light.)
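A minimal sketch of this channel-separation idea, using OpenCV. The `super_resolve_y` function is a hypothetical stand-in for the TSR network, and the cubic interpolation used for the chroma channels is an assumption.

```python
import cv2
import numpy as np

def upscale(img_bgr, super_resolve_y):
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)
    y, cr, cb = cv2.split(ycrcb)

    # Only the luminance channel goes through the (expensive) network ...
    y_hr = super_resolve_y(y)

    # ... while the chroma channels, to which the eye is less sensitive,
    # are simply interpolated up to the target size.
    h, w = y_hr.shape[:2]
    cr_hr = cv2.resize(cr, (w, h), interpolation=cv2.INTER_CUBIC)
    cb_hr = cv2.resize(cb, (w, h), interpolation=cv2.INTER_CUBIC)

    merged = cv2.merge([y_hr, cr_hr, cb_hr])
    return cv2.cvtColor(merged, cv2.COLOR_YCrCb2BGR)
```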
3. PReLU activation function
The model uses PReLU (Parametric Rectified Linear Unit) as its activation function, which gives faster convergence and greater expressive power.
As the name implies, it is ReLU with learnable parameters. The definitions of the two, and the difference between them, are shown in the figure below.
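For readers without the figure, the difference can be written out in a few lines of NumPy; the default slope below is only illustrative, since in the model the slope is a learned parameter (typically one per channel).

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)       # negative inputs are zeroed out

def prelu(x, a=0.25):
    return np.where(x > 0, x, a * x)  # negative inputs keep a small, learnable slope `a`
```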
In addition, the network's parameters are learned with gradient descent based on Adam (Adaptive Moment Estimation).
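As a hedged sketch of how such a network might be assembled and trained with PReLU activations and Adam, here is a PyTorch example. The layer widths, the PixelShuffle upsampling, the MSE loss and the learning rate are illustrative assumptions rather than TSR's actual design.

```python
import torch
import torch.nn as nn

class TinySR(nn.Module):
    def __init__(self, channels=16, layers=10, scale=2):
        super().__init__()
        blocks = [nn.Conv2d(1, channels, 3, padding=1), nn.PReLU(channels)]
        for _ in range(layers - 2):
            blocks += [nn.Conv2d(channels, channels, 3, padding=1), nn.PReLU(channels)]
        blocks += [nn.Conv2d(channels, scale * scale, 3, padding=1),
                   nn.PixelShuffle(scale)]   # rearranges channels into a 2x larger Y plane
        self.net = nn.Sequential(*blocks)

    def forward(self, y_lr):          # input: low-resolution Y channel, shape (N, 1, H, W)
        return self.net(y_lr)         # output: (N, 1, 2H, 2W)

model = TinySR()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One illustrative training step on a random (low-res, high-res) Y-channel pair.
y_lr = torch.rand(8, 1, 64, 64)
y_hr = torch.rand(8, 1, 128, 128)
optimizer.zero_grad()
loss = loss_fn(model(y_lr), y_hr)
loss.backward()
optimizer.step()
```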
4. Image preprocessing
For model training, the team started from 10,000 real user pictures and expanded them into a training set of millions of samples through data-augmentation operations such as color, brightness and contrast adjustment, geometric transformation, and horizontal flipping. The width and height of each training image are then compressed to 1/2 of the original, so the overall size of the low-resolution input is only 1/4 of the original.
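A minimal sketch of how such (low-resolution, high-resolution) training pairs could be built; the specific augmentation ranges are assumptions, not the team's actual settings.

```python
import random
import cv2

def augment(img):
    if random.random() < 0.5:
        img = cv2.flip(img, 1)                        # horizontal flip
    alpha = random.uniform(0.8, 1.2)                  # contrast jitter
    beta = random.uniform(-20, 20)                    # brightness jitter
    return cv2.convertScaleAbs(img, alpha=alpha, beta=beta)

def make_pair(img):
    hr = augment(img)                                 # high-resolution target
    h, w = hr.shape[:2]
    lr = cv2.resize(hr, (w // 2, h // 2), interpolation=cv2.INTER_AREA)  # 1/4 the pixels
    return lr, hr
```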
5. Comparison and parameter tuning
After an image has been run through the super-resolution model described above, the result is compared against the original image, and the model parameters are adjusted according to that comparison.
Unlike the usual industry practice, in addition to comparing image quality by PSNR, the team introduced a visual evaluation system that uses real user pictures for subjective assessment when tuning the parameters.
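For reference, PSNR (peak signal-to-noise ratio) for 8-bit images can be computed as follows:

```python
import numpy as np

def psnr(a, b, max_val=255.0):
    """PSNR between two images of the same shape, in decibels."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```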
6. Evaluation results
TSR is compared with cutting-edge academic super-resolution methods in the figure below (NTIRE 2017 data, 400×300 enlarged to 800×600, hardware: Titan Xp workstation). TSR outperforms the others, including Google's RAISR, in both processing speed and image quality.
II. Applying super-resolution technology to mobile devices
At present, mainstream deep neural network models generally run on high-performance GPU servers in the back end, which places heavy demands on hardware. TSR, by contrast, is a deep learning architecture built for mobile devices.
TSR moves deep learning from the back end to the mobile device through the following key technologies:
1. Block acceleration divides the picture into many small tiles that are fed through the neural network independently. Its advantage is that it can fully exploit multi-core CPUs for parallel computing.
During tiling, the algorithm also analyzes the texture complexity of each region and handles it accordingly to speed up processing. As shown in the figure below, the blocks in the blue box can be accelerated through this intelligent identification.
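A rough sketch of the tile-based parallelism described above: split the image into blocks, process each block on a separate CPU core, and reassemble the result. The block size and the placeholder worker are assumptions; TSR's real scheduler additionally fast-paths low-texture blocks, which is not modeled here.

```python
import numpy as np
from multiprocessing import Pool

def process_block(block):
    # Placeholder for running the super-resolution network on one tile;
    # the identity keeps the sketch runnable.
    return block

def process_tiled(img, tile=64, workers=4):
    h, w = img.shape[:2]
    tiles, coords = [], []
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            tiles.append(img[y:y + tile, x:x + tile])
            coords.append((y, x))
    with Pool(workers) as pool:                  # tiles are processed in parallel
        results = pool.map(process_block, tiles)
    out = np.zeros_like(img)
    for (y, x), block in zip(coords, results):   # stitch the tiles back together
        out[y:y + block.shape[0], x:x + block.shape[1]] = block
    return out

if __name__ == "__main__":
    frame = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
    result = process_tiled(frame)
```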
2. Heterogeneous multi-core CPU/GPU acceleration intelligently divides the work according to the GPU and CPU capabilities of the user's phone and processes it on both together for better results. This may be an industry first.
RapidNet deeply integrates OpenCL-based GPU parallel-computing acceleration on Android and Metal-based acceleration on iOS. On ARM CPUs it makes full use of NEON SIMD instructions and thread-pool techniques.
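Stripped of the platform-specific details, the heterogeneous split can be pictured as dividing the tile list between a GPU worker and a CPU worker in proportion to their measured throughput; the ratio below is purely hypothetical.

```python
def split_work(tiles, gpu_speed=3.0, cpu_speed=1.0):
    """Hand the first part of the tile list to the GPU worker and the rest to the
    CPU worker, in proportion to their measured throughput."""
    cut = int(len(tiles) * gpu_speed / (gpu_speed + cpu_speed))
    return tiles[:cut], tiles[cut:]
```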
The TSR/RapidNet architecture is shown below
Compared with mainstream machine learning frameworks, RapidNet is reportedly more than 10 times faster and cuts memory consumption by 95%.
Comparison of TSR's processing results with the rest of the industry
4. Dynamic detection and dynamic model loading guarantee full coverage of mobile devices. TSR detects each phone's processing power and loads a different model in real time accordingly, so that every mobile client can use the technology.
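The dynamic-loading idea can be sketched as probing the device's capabilities and picking a model variant it can run in real time; the tiers, thresholds and file names here are all hypothetical.

```python
def pick_model(has_gpu, cpu_cores, ram_mb):
    """Choose a model variant this device can run in real time."""
    if has_gpu:
        return "tsr_full.bin"       # full network, GPU-accelerated
    if cpu_cores >= 4 and ram_mb >= 2048:
        return "tsr_small.bin"      # reduced channel count for mid-range CPUs
    return None                     # fall back to plain interpolation on low-end devices
```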
III. TSR image processing results
Let's look at TSR's results on real images. (Note: the original image is on the left and the super-resolved image is on the right.)
Comparison after TSR processing:
Detail comparison:
Comparison after TSR processing:
Detail comparison:
Comparison after TSR processing:
Detail comparison:
Comparison after TSR processing:
Detail comparison:
An ordinary user picture compressed by 75% and then processed by TSR, compared with the original picture:
IV. Comparison with RAISR and other technologies
Under the same processing conditions, the performance of TSR and RAISR compares as follows:
TSR beats RAISR, the previous industry benchmark, in both processing speed and output quality: it is 40% faster than RAISR, and the results are visibly better. Here it is in a picture.
As the comparison figure above shows, TSR restores fine detail and texture better than RAISR.
Second, according to the team, TSR is currently the only technology in the industry that brings deep-learning-based super-resolution to mobile devices. It runs well and produces good results even on ordinary phones.
In addition, RapidNet, the deep learning framework developed alongside TSR, is on average 20 times faster than Caffe2 and TensorFlow and brings deep learning to ordinary mobile phones.
V. Technical application scenarios
As mentioned at the beginning of this article, the technology can be applied to image processing across the industry, saving users up to 75% of their data traffic and greatly reducing the bandwidth needed for image transmission. Within Tencent, TSR is already used in Qzone; QQ, WeChat, Tiantian P Tu, animation and other scenarios are likely targets as well.
According to Leifeng.com, the technology can also intelligently repair old or blurry photos and turn ordinary pictures into high-definition ones.
Perhaps most importantly, TSR opens the door to AI-related deep learning models on mobile. Running deep neural networks used to require expensive GPUs; now ordinary users can run this technology on ordinary phones. Extended further, TSR could help advance face recognition, OCR, background recognition, beauty filters and other technologies in the future.
Reportedly, with the rise of AI, Tencent Qzone has also increased its investment in the area: together with YouTu Lab, it is conducting deeper research into intelligent image processing (including video content recognition and face recognition), as well as speech recognition and dialogue robots.
Original article by Leifeng.com. Reproduction without authorization is prohibited.